spark-instrumented-optimizer/python/pyspark/sql
Reynold Xin 6f20a92ca3 [SPARK-17845] [SQL] More self-evident window function frame boundary API
## What changes were proposed in this pull request?
This patch improves the window function frame boundary API to make it easier to read and to use. The two high-level changes are:

1. Create Window.currentRow, Window.unboundedPreceding, Window.unboundedFollowing to indicate the special values in frame boundaries. These methods map to the special integral values so we are not breaking backward compatibility here. This change makes the frame boundaries more self-evident (instead of Long.MinValue, it becomes Window.unboundedPreceding).

2. In Python, for any value less than or equal to JVM's Long.MinValue, treat it as Window.unboundedPreceding. For any value larger than or equal to JVM's Long.MaxValue, treat it as Window.unboundedFollowing. Before this change, if the user specifies any value that is less than Long.MinValue but not -sys.maxsize (e.g. -sys.maxsize + 1), the number we pass over to the JVM would overflow, resulting in a frame that does not make sense.
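
The clamping rule in item 2 can be sketched in plain Python as follows. This is a minimal illustration, not the code in `window.py`: the constant names and the `clamp_frame_boundaries` helper are hypothetical, and the two limits are assumed to match the JVM's 64-bit `Long` range.

```python
# Assumed JVM Long range (64-bit signed); names here are illustrative only.
JAVA_MIN_LONG = -(1 << 63)
JAVA_MAX_LONG = (1 << 63) - 1

# Stand-ins for Window.unboundedPreceding, Window.unboundedFollowing
# and Window.currentRow, which map to special integral values.
UNBOUNDED_PRECEDING = JAVA_MIN_LONG
UNBOUNDED_FOLLOWING = JAVA_MAX_LONG
CURRENT_ROW = 0


def clamp_frame_boundaries(start, end):
    """Map out-of-range Python ints onto the unbounded markers.

    Python ints are arbitrary precision, so a value outside the Long
    range must be clamped before it is handed to the JVM.
    """
    if start <= JAVA_MIN_LONG:
        start = UNBOUNDED_PRECEDING
    if end >= JAVA_MAX_LONG:
        end = UNBOUNDED_FOLLOWING
    return start, end
```

With this rule, any start value at or below the lower limit behaves the same as the unbounded-preceding marker, while ordinary offsets such as `-3` or `0` pass through unchanged.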

Before this patch, specifying a frame required code like this:
```
Window.rowsBetween(Long.MinValue, 0)
```
```

While the above code should still work, the new way is more obvious to read:
```
Window.rowsBetween(Window.unboundedPreceding, Window.currentRow)
```
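
To make the meaning of such a frame concrete, here is a small pure-Python sketch of what a row-based frame selects for a running sum. It does not use Spark: `sum_over_rows_frame` and the constants are hypothetical stand-ins, and returning `0` for an empty frame is a simplification chosen for this sketch.

```python
# Illustrative stand-ins for the special frame boundary values.
UNBOUNDED_PRECEDING = -(1 << 63)
UNBOUNDED_FOLLOWING = (1 << 63) - 1
CURRENT_ROW = 0


def sum_over_rows_frame(values, start, end):
    """Sum each row's frame, where the frame spans the rows from
    offset `start` to offset `end` relative to the current row."""
    n = len(values)
    result = []
    for i in range(n):
        lo = max(0, i + start)      # clip the frame to the first row
        hi = min(n - 1, i + end)    # clip the frame to the last row
        frame = values[lo:hi + 1] if lo <= hi else []
        result.append(sum(frame))   # empty frame sums to 0 in this sketch
    return result


# Frame from the start of the partition to the current row: a cumulative sum.
running = sum_over_rows_frame([1, 2, 3, 4], UNBOUNDED_PRECEDING, CURRENT_ROW)
```

The named boundaries read as "everything up to and including this row", which is harder to see when the same frame is written with a raw minimum integer.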

## How was this patch tested?
- Updated DataFrameWindowSuite (for Scala/Java)
- Updated test_window_functions_cumulative_sum (for Python)
- Renamed DataFrameWindowSuite to DataFrameWindowFunctionsSuite to better reflect its purpose

Author: Reynold Xin <rxin@databricks.com>

Closes #15438 from rxin/SPARK-17845.
2016-10-12 16:45:10 -07:00
..
__init__.py [SPARK-16772][PYTHON][DOCS] Fix API doc references to UDFRegistration + Update "important classes" 2016-08-06 05:02:59 +01:00
catalog.py [SPARK-17338][SQL][FOLLOW-UP] add global temp view 2016-10-11 15:21:28 +08:00
column.py [SPARK-17215][SQL] Method SQLContext.parseDataType(dataTypeString: String) could be removed. 2016-08-24 23:36:04 -07:00
conf.py [SPARK-15464][ML][MLLIB][SQL][TESTS] Replace SQLContext and SparkContext with SparkSession using builder pattern in python test code 2016-05-23 18:14:48 -07:00
context.py [SPARK-17338][SQL] add global temp view 2016-10-10 15:48:57 +08:00
dataframe.py [SPARK-14761][SQL] Reject invalid join methods when join columns are not specified in PySpark DataFrame join. 2016-10-12 10:09:49 -07:00
functions.py [SPARK-16960][SQL] Deprecate approxCountDistinct, toDegrees and toRadians according to FunctionRegistry 2016-10-07 11:49:34 +01:00
group.py [MINOR][PYSPARK][DOC] Fix wrongly formatted examples in PySpark documentation 2016-07-06 10:45:51 -07:00
readwriter.py [SPARK-17805][PYSPARK] Fix in sqlContext.read.text when pass in list of paths 2016-10-07 00:27:55 -07:00
session.py [SPARK-17720][SQL] introduce static SQL conf 2016-10-11 20:27:08 -07:00
streaming.py [MINOR][PYSPARK][DOCS] Fix examples in PySpark documentation 2016-09-28 06:19:04 -04:00
tests.py [SPARK-17845] [SQL] More self-evident window function frame boundary API 2016-10-12 16:45:10 -07:00
types.py [SPARK-17215][SQL] Method SQLContext.parseDataType(dataTypeString: String) could be removed. 2016-08-24 23:36:04 -07:00
utils.py [SPARK-15953][WIP][STREAMING] Renamed ContinuousQuery to StreamingQuery 2016-06-15 10:46:07 -07:00
window.py [SPARK-17845] [SQL] More self-evident window function frame boundary API 2016-10-12 16:45:10 -07:00