spark-instrumented-optimizer

Pedro Rodriguez 560c658a74 [SPARK-8230][SQL] Add array/map size method

Pull Request for: https://issues.apache.org/jira/browse/SPARK-8230

The primary issue resolved is implementing array/map size for Spark SQL. Code is ready for review by a committer. Chen Hao is on the JIRA ticket, but I don't know his username on GitHub; rxin is also on the JIRA ticket.

Things to review:
1. Where to put the added functions namespace-wise. They seem to be part of a group of operations on collections that includes `sort_array` and `array_contains`, hence the names `collectionOperations.scala` and `_collection_functions` in Python.
2. In the Python code, should it be in a `1.5.0` function array or in a collections array?
3. Are there any missing methods on the `Size` case class? Many of these functions appear to have generated Java code; is that also needed in this case?
4. Something else?

Author: Pedro Rodriguez <ski.rodriguez@gmail.com>
Author: Pedro Rodriguez <prodriguez@trulia.com>

Closes #7462 from EntilZha/SPARK-8230 and squashes the following commits:

9a442ae [Pedro Rodriguez] fixed functions and sorted __all__
9aea3bb [Pedro Rodriguez] removed imports from python docs
15d4bf1 [Pedro Rodriguez] Added null test case and changed to nullSafeCodeGen
d88247c [Pedro Rodriguez] removed python code
bd5f0e4 [Pedro Rodriguez] removed duplicate function from rebase/merge
59931b4 [Pedro Rodriguez] fixed compile bug introduced when merging
c187175 [Pedro Rodriguez] updated code to add size to __all__ directly and removed redundant pretty print
130839f [Pedro Rodriguez] fixed failing test
aa9bade [Pedro Rodriguez] fix style
e093473 [Pedro Rodriguez] updated python code with docs, switched classes/traits implemented, added (failing) expression tests
0449377 [Pedro Rodriguez] refactored code to use better abstract classes/traits and implementations
9a1a2ff [Pedro Rodriguez] added unit tests for map size
2bfbcb6 [Pedro Rodriguez] added unit test for size
20df2b4 [Pedro Rodriguez] Finished working version of size function and added it to python
b503e75 [Pedro Rodriguez] First attempt at implementing size for maps and arrays
99a6a5c [Pedro Rodriguez] fixed failing test
cac75ac [Pedro Rodriguez] fix style
933d843 [Pedro Rodriguez] updated python code with docs, switched classes/traits implemented, added (failing) expression tests
42bb7d4 [Pedro Rodriguez] refactored code to use better abstract classes/traits and implementations
f9c3b8a [Pedro Rodriguez] added unit tests for map size
2515d9f [Pedro Rodriguez] added documentation
0e60541 [Pedro Rodriguez] added unit test for size
acf9853 [Pedro Rodriguez] Finished working version of size function and added it to python
84a5d38 [Pedro Rodriguez] First attempt at implementing size for maps and arrays

2015-07-21 00:53:20 -07:00
__init__.py	[SPARK-8060] Improve DataFrame Python test coverage and documentation.	2015-06-03 00:23:34 -07:00
column.py	[SPARK-8573] [SPARK-8568] [SQL] [PYSPARK] raise Exception if column is used in boolean expression	2015-06-23 15:51:16 -07:00
context.py	[SPARK-9114] [SQL] [PySpark] convert returned object from UDF into internal type	2015-07-20 12:14:47 -07:00
dataframe.py	[SPARK-7902] [SPARK-6289] [SPARK-8685] [SQL] [PYSPARK] Refactor of serialization for Python DataFrame	2015-07-09 14:43:38 -07:00
functions.py	[SPARK-8230][SQL] Add array/map size method	2015-07-21 00:53:20 -07:00
group.py	[SPARK-8770][SQL] Create BinaryOperator abstract class.	2015-07-01 21:14:13 -07:00
readwriter.py	[SPARK-9100] [SQL] Adds DataFrame reader/writer shortcut methods for ORC	2015-07-21 15:08:44 +08:00
tests.py	[SPARK-9114] [SQL] [PySpark] convert returned object from UDF into internal type	2015-07-20 12:14:47 -07:00
types.py	[SPARK-9101] [PySpark] Add missing NullType	2015-07-20 12:00:48 -07:00
utils.py	[SPARK-9166][SQL][PYSPARK] Capture and hide IllegalArgumentException in Python API	2015-07-19 00:32:56 -07:00
window.py	[SPARK-8146] DataFrame Python API: Alias replace in df.na	2015-06-07 01:21:02 -07:00