Commit graph

8 commits

Author SHA1 Message Date
zero323 a9cbfc4f6a [SPARK-18690][PYTHON][SQL] Backward compatibility of unbounded frames
## What changes were proposed in this pull request?

Makes `Window.unboundedPreceding` and `Window.unboundedFollowing` backward compatible.

## How was this patch tested?

Pyspark SQL unittests.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: zero323 <zero323@users.noreply.github.com>

Closes #16123 from zero323/SPARK-17845-follow-up.
2016-12-02 17:39:28 -08:00
Reynold Xin 6f20a92ca3 [SPARK-17845] [SQL] More self-evident window function frame boundary API
## What changes were proposed in this pull request?
This patch improves the window function frame boundary API to make it more obvious to read and to use. The two high level changes are:

1. Create Window.currentRow, Window.unboundedPreceding, Window.unboundedFollowing to indicate the special values in frame boundaries. These methods map to the special integral values so we are not breaking backward compatibility here. This change makes the frame boundaries more self-evident (instead of Long.MinValue, it becomes Window.unboundedPreceding).

2. In Python, for any value less than or equal to JVM's Long.MinValue, treat it as Window.unboundedPreceding. For any value larger than or equal to JVM's Long.MaxValue, treat it as Window.unboundedFollowing. Before this change, if the user specifies any value that is less than Long.MinValue but not -sys.maxsize (e.g. -sys.maxsize + 1), the number we pass over to the JVM would overflow, resulting in a frame that does not make sense.

Code example required to specify a frame before this patch:
```
Window.rowsBetween(-Long.MinValue, 0)
```

While the above code should still work, the new way is more obvious to read:
```
Window.rowsBetween(Window.unboundedPreceding, Window.currentRow)
```

## How was this patch tested?
- Updated DataFrameWindowSuite (for Scala/Java)
- Updated test_window_functions_cumulative_sum (for Python)
- Renamed DataFrameWindowSuite DataFrameWindowFunctionsSuite to better reflect its purpose

Author: Reynold Xin <rxin@databricks.com>

Closes #15438 from rxin/SPARK-17845.
2016-10-12 16:45:10 -07:00
Reynold Xin b515768f26 [SPARK-17844] Simplify DataFrame API for defining frame boundaries in window functions
## What changes were proposed in this pull request?
When I was creating the example code for SPARK-10496, I realized it was pretty convoluted to define the frame boundaries for window functions when there is no partition column or ordering column. The reason is that we don't provide a way to create a WindowSpec directly with the frame boundaries. We can trivially improve this by adding rowsBetween and rangeBetween to Window object.

As an example, to compute cumulative sum using the natural ordering, before this pr:
```
df.select('key, sum("value").over(Window.partitionBy(lit(1)).rowsBetween(Long.MinValue, 0)))
```

After this pr:
```
df.select('key, sum("value").over(Window.rowsBetween(Long.MinValue, 0)))
```

Note that you could argue there is no point specifying a window frame without partitionBy/orderBy -- but it is strange that only rowsBetween and rangeBetween are not the only two APIs not available.

This also fixes https://issues.apache.org/jira/browse/SPARK-17656 (removing _root_.scala).

## How was this patch tested?
Added test cases to compute cumulative sum in DataFrameWindowSuite for Scala/Java and tests.py for Python.

Author: Reynold Xin <rxin@databricks.com>

Closes #15412 from rxin/SPARK-17844.
2016-10-10 22:33:20 -07:00
zero323 8193a266b5 [SPARK-14058][PYTHON] Incorrect docstring in Window.order
## What changes were proposed in this pull request?

Replaces current docstring ("Creates a :class:`WindowSpec` with the partitioning defined.") with "Creates a :class:`WindowSpec` with the ordering defined."

## How was this patch tested?

PySpark unit tests (no regression introduced). No changes to the code.

Author: zero323 <matthew.szymkiewicz@gmail.com>

Closes #11877 from zero323/order-by-description.
2016-03-21 23:52:33 -07:00
Davies Liu 3a11e50e21 [SPARK-10373] [PYSPARK] move @since into pyspark from sql
cc mengxr

Author: Davies Liu <davies@databricks.com>

Closes #8657 from davies/move_since.
2015-09-08 20:56:22 -07:00
Davies Liu 11ed2b180e [SPARK-9978] [PYSPARK] [SQL] fix Window.orderBy and doc of ntile()
Author: Davies Liu <davies@databricks.com>

Closes #8213 from davies/fix_window.
2015-08-14 13:55:29 -07:00
Reynold Xin 0ac47083f7 [SPARK-8146] DataFrame Python API: Alias replace in df.na
Author: Reynold Xin <rxin@databricks.com>

Closes #6688 from rxin/df-alias-replace and squashes the following commits:

774c19c [Reynold Xin] [SPARK-8146] DataFrame Python API: Alias replace in DataFrameNaFunctions.
2015-06-07 01:21:02 -07:00
Davies Liu efe3bfdf49 [SPARK-7322, SPARK-7836, SPARK-7822][SQL] DataFrame window function related updates
1. ntile should take an integer as parameter.
2. Added Python API (based on #6364)
3. Update documentation of various DataFrame Python functions.

Author: Davies Liu <davies@databricks.com>
Author: Reynold Xin <rxin@databricks.com>

Closes #6374 from rxin/window-final and squashes the following commits:

69004c7 [Reynold Xin] Style fix.
288cea9 [Reynold Xin] Update documentaiton.
7cb8985 [Reynold Xin] Merge pull request #6364 from davies/window
66092b4 [Davies Liu] update docs
ed73cb4 [Reynold Xin] [SPARK-7322][SQL] Improve DataFrame window function documentation.
ef55132 [Davies Liu] Merge branch 'master' of github.com:apache/spark into window4
8936ade [Davies Liu] fix maxint in python 3
2649358 [Davies Liu] update docs
778e2c0 [Davies Liu] SPARK-7836 and SPARK-7822: Python API of window functions
2015-05-23 08:30:05 -07:00