spark-instrumented-optimizer

Reynold Xin b515768f26 [SPARK-17844] Simplify DataFrame API for defining frame boundaries in window functions

## What changes were proposed in this pull request?

When I was creating the example code for SPARK-10496, I realized it was pretty convoluted to define the frame boundaries for window functions when there is no partition column or ordering column. The reason is that we don't provide a way to create a WindowSpec directly with the frame boundaries. We can trivially improve this by adding rowsBetween and rangeBetween to the Window object.

As an example, to compute a cumulative sum using the natural ordering, before this PR:

```
df.select('key, sum("value").over(Window.partitionBy(lit(1)).rowsBetween(Long.MinValue, 0)))
```

After this PR:

```
df.select('key, sum("value").over(Window.rowsBetween(Long.MinValue, 0)))
```

Note that you could argue there is no point specifying a window frame without partitionBy/orderBy -- but it is strange that rowsBetween and rangeBetween are the only two APIs not available.

This also fixes https://issues.apache.org/jira/browse/SPARK-17656 (removing _root_.scala).

## How was this patch tested?

Added test cases to compute cumulative sum in DataFrameWindowSuite for Scala/Java and tests.py for Python.

Author: Reynold Xin <rxin@databricks.com>

Closes #15412 from rxin/SPARK-17844.

2016-10-10 22:33:20 -07:00
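To make the frame semantics in the commit message above concrete, here is a minimal plain-Python sketch of what a row-based frame sum computes. It is not Spark's implementation and does not use the Spark API: the function name `rows_between_sum` is hypothetical, `None` stands in for an unbounded boundary (the commit uses `Long.MinValue` for that on the Scala side), and returning `None` for an empty frame is an assumption modeled on SQL's null result for a sum over no rows.

```python
from typing import List, Optional, Sequence


def rows_between_sum(
    values: Sequence[float],
    start: Optional[int],
    end: Optional[int],
) -> List[Optional[float]]:
    """For each row i, sum the rows in the frame [i + start, i + end].

    Offsets are relative to the current row (0 is the current row,
    -1 the previous row). None means "unbounded" on that side.
    Hypothetical helper for illustration only.
    """
    n = len(values)
    result: List[Optional[float]] = []
    for i in range(n):
        # Clamp the frame to the rows that actually exist.
        lo = 0 if start is None else max(0, i + start)
        hi = n - 1 if end is None else min(n - 1, i + end)
        if lo > hi:
            # Empty frame: assumed to yield None, like a SQL null.
            result.append(None)
        else:
            result.append(sum(values[lo:hi + 1]))
    return result


# Cumulative sum in natural order: unbounded preceding up to the current row.
cumulative = rows_between_sum([1, 2, 3, 4], None, 0)

# Sliding frame: one row before through one row after the current row.
sliding = rows_between_sum([1, 2, 3, 4], -1, 1)
```

With `start=None` and `end=0` the frame grows by one row at each step, which is the cumulative sum the commit describes. A bounded pair such as `(-1, 1)` gives a moving sum instead.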
__init__.py	[SPARK-16772][PYTHON][DOCS] Fix API doc references to UDFRegistration + Update "important classes"	2016-08-06 05:02:59 +01:00
catalog.py	[SPARK-17338][SQL] add global temp view	2016-10-10 15:48:57 +08:00
column.py	[SPARK-17215][SQL] Method `SQLContext.parseDataType(dataTypeString: String)` could be removed.	2016-08-24 23:36:04 -07:00
conf.py	[SPARK-15464][ML][MLLIB][SQL][TESTS] Replace SQLContext and SparkContext with SparkSession using builder pattern in python test code	2016-05-23 18:14:48 -07:00
context.py	[SPARK-17338][SQL] add global temp view	2016-10-10 15:48:57 +08:00
dataframe.py	[SPARK-17338][SQL] add global temp view	2016-10-10 15:48:57 +08:00
functions.py	[SPARK-16960][SQL] Deprecate approxCountDistinct, toDegrees and toRadians according to FunctionRegistry	2016-10-07 11:49:34 +01:00
group.py	[MINOR][PYSPARK][DOC] Fix wrongly formatted examples in PySpark documentation	2016-07-06 10:45:51 -07:00
readwriter.py	[SPARK-17805][PYSPARK] Fix in sqlContext.read.text when pass in list of paths	2016-10-07 00:27:55 -07:00
session.py	[SPARK-17261] [PYSPARK] Using HiveContext after re-creating SparkContext in Spark 2.0 throws "Java.lang.illegalStateException: Cannot call methods on a stopped sparkContext"	2016-09-02 10:08:14 -07:00
streaming.py	[MINOR][PYSPARK][DOCS] Fix examples in PySpark documentation	2016-09-28 06:19:04 -04:00
tests.py	[SPARK-17844] Simplify DataFrame API for defining frame boundaries in window functions	2016-10-10 22:33:20 -07:00
types.py	[SPARK-17215][SQL] Method `SQLContext.parseDataType(dataTypeString: String)` could be removed.	2016-08-24 23:36:04 -07:00
utils.py	[SPARK-15953][WIP][STREAMING] Renamed ContinuousQuery to StreamingQuery	2016-06-15 10:46:07 -07:00
window.py	[SPARK-17844] Simplify DataFrame API for defining frame boundaries in window functions	2016-10-10 22:33:20 -07:00