spark-instrumented-optimizer

Pedro Rodriguez d34548587a [SPARK-8231] [SQL] Add array_contains

This PR is based on #7580 , thanks to EntilZha

PR for work on https://issues.apache.org/jira/browse/SPARK-8231

Currently, I have an initial implementation for contains. Based on discussion on JIRA, it should behave same as Hive: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFArrayContains.java#L102-L128

Main points are:
1. If the array is empty, null, or the value is null, return false
2. If there is a type mismatch, throw error
3. If comparison is not supported, throw error

Closes #7580

Author: Pedro Rodriguez <prodriguez@trulia.com>
Author: Pedro Rodriguez <ski.rodriguez@gmail.com>
Author: Davies Liu <davies@databricks.com>

Closes #7949 from davies/array_contains and squashes the following commits:

d3c08bc [Davies Liu] use foreach() to avoid copy
bc3d1fe [Davies Liu] fix array_contains
719e37d [Davies Liu] Merge branch 'master' of github.com:apache/spark into array_contains
e352cf9 [Pedro Rodriguez] fixed diff from master
4d5b0ff [Pedro Rodriguez] added docs and another type check
ffc0591 [Pedro Rodriguez] fixed unit test
7a22deb [Pedro Rodriguez] Changed test to use strings instead of long/ints which are different between python 2 an 3
b5ffae8 [Pedro Rodriguez] fixed pyspark test
4e7dce3 [Pedro Rodriguez] added more docs
3082399 [Pedro Rodriguez] fixed unit test
46f9789 [Pedro Rodriguez] reverted change
d3ca013 [Pedro Rodriguez] Fixed type checking to match hive behavior, then added tests to insure this
8528027 [Pedro Rodriguez] added more tests
686e029 [Pedro Rodriguez] fix scala style
d262e9d [Pedro Rodriguez] reworked type checking code and added more tests
2517a58 [Pedro Rodriguez] removed unused import
28b4f71 [Pedro Rodriguez] fixed bug with type conversions and re-added tests
12f8795 [Pedro Rodriguez] fix scala style checks
e8a20a9 [Pedro Rodriguez] added python df (broken atm)
65b562c [Pedro Rodriguez] made array_contains nullable false
33b45aa [Pedro Rodriguez] reordered test
9623c64 [Pedro Rodriguez] fixed test
4b4425b [Pedro Rodriguez] changed Arrays in tests to Seqs
72cb4b1 [Pedro Rodriguez] added checkInputTypes and docs
69c46fb [Pedro Rodriguez] added tests and codegen
9e0bfc4 [Pedro Rodriguez] initial attempt at implementation

2015-08-04 22:34:02 -07:00
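
The "main points" in the commit message above can be illustrated with a small pure-Python sketch. This is not the Spark or Hive implementation: the function name array_contains_sketch is hypothetical, Python's None stands in for SQL null, and a Python type comparison stands in for the SQL type check. Point 3 (unsupported comparisons) is not modeled here.

```python
def array_contains_sketch(arr, value):
    """Hypothetical model of the rules in the commit message; not Spark code."""
    # Point 1: a null array, an empty array, or a null value yields False.
    if arr is None or len(arr) == 0 or value is None:
        return False
    # Point 2: a type mismatch between the elements and the value is an error.
    # None elements are skipped, since a null carries no element type here.
    for item in arr:
        if item is not None and type(item) is not type(value):
            raise TypeError(
                "element type %s does not match value type %s"
                % (type(item).__name__, type(value).__name__)
            )
    # Otherwise, report whether any element equals the value.
    return any(item == value for item in arr)


print(array_contains_sketch(["a", "b", "c"], "a"))
print(array_contains_sketch([], "a"))
print(array_contains_sketch(None, "a"))
```

Strings are used in the example calls because, as one of the squashed commits notes, long/int values differ between Python 2 and 3.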
..
ml	[SPARK-9447] [ML] [PYTHON] Added HasRawPredictionCol, HasProbabilityCol to RandomForestClassifier	2015-08-04 14:54:26 -07:00
mllib	[SPARK-6485] [MLLIB] [PYTHON] Add CoordinateMatrix/RowMatrix/IndexedRowMatrix to PySpark.	2015-08-04 16:30:03 -07:00
sql	[SPARK-8231] [SQL] Add array_contains	2015-08-04 22:34:02 -07:00
streaming	[SPARK-8564] [STREAMING] Add the Python API for Kinesis	2015-07-31 12:09:48 -07:00
__init__.py	[SPARK-4172] [PySpark] Progress API in Python	2015-02-17 13:36:43 -08:00
accumulators.py	[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod()	2015-06-26 08:12:22 -07:00
broadcast.py	[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod()	2015-06-26 08:12:22 -07:00
cloudpickle.py	[SPARK-9116] [SQL] [PYSPARK] support Python only UDT in __main__	2015-07-29 22:30:49 -07:00
conf.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
context.py	[SPARK-9144] Remove DAGScheduler.runLocallyWithinThread and spark.localExecution.enabled	2015-07-22 21:04:04 -07:00
daemon.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
files.py	[SPARK-3309] [PySpark] Put all public API in __all__	2014-09-03 11:49:45 -07:00
heapq3.py	[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod()	2015-06-26 08:12:22 -07:00
java_gateway.py	[SPARK-8850] [SQL] Enable Unsafe mode by default	2015-07-30 10:45:32 -07:00
join.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
profiler.py	[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod()	2015-06-26 08:12:22 -07:00
rdd.py	[SPARK-9144] Remove DAGScheduler.runLocallyWithinThread and spark.localExecution.enabled	2015-07-22 21:04:04 -07:00
rddsampler.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
resultiterable.py	[SPARK-3074] [PySpark] support groupByKey() with single huge key	2015-04-09 17:07:23 -07:00
serializers.py	[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod()	2015-06-26 08:12:22 -07:00
shell.py	[SPARK-9270] [PYSPARK] allow --name option in pyspark	2015-07-24 11:56:55 -07:00
shuffle.py	[SPARK-9116] [SQL] [PYSPARK] support Python only UDT in __main__	2015-07-29 22:30:49 -07:00
statcounter.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
status.py	[SPARK-4172] [PySpark] Progress API in Python	2015-02-17 13:36:43 -08:00
storagelevel.py	[SPARK-3417] Use new-style classes in PySpark	2014-09-08 15:45:36 -07:00
tests.py	[SPARK-9244] Increase some memory defaults	2015-07-22 15:28:09 -07:00
traceback_utils.py	[SPARK-1087] Move python traceback utilities into new traceback_utils.py file.	2014-09-15 19:28:17 -07:00
worker.py	[SPARK-6216] [PYSPARK] check python version of worker with driver	2015-05-18 12:55:13 -07:00