spark-instrumented-optimizer/python/pyspark/sql
Pedro Rodriguez d34548587a [SPARK-8231] [SQL] Add array_contains
This PR is based on #7580, thanks to EntilZha.

PR for work on https://issues.apache.org/jira/browse/SPARK-8231

Currently, I have an initial implementation for contains. Based on the discussion on JIRA, it should behave the same as Hive: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFArrayContains.java#L102-L128

Main points are:
1. If the array is empty or null, or the value is null, return false
2. If there is a type mismatch, throw an error
3. If the comparison is not supported, throw an error
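
The three points above can be modelled in a few lines of plain Python. This is only a sketch of the described semantics, not the code added by this PR: the name `array_contains_sketch` and the `_COMPARABLE` tuple are hypothetical, and which types count as "comparable" is an assumption made purely for illustration.

```python
# Assumption: this set of "comparable" types is illustrative only; the real
# check in Spark/Hive operates on SQL data types, not Python types.
_COMPARABLE = (bool, int, float, str, bytes)


def array_contains_sketch(arr, value):
    """Pure-Python model of the three rules listed above (hypothetical helper)."""
    # Point 1: an empty or null array, or a null value, yields False.
    if arr is None or len(arr) == 0 or value is None:
        return False
    # Point 3: refuse value types for which comparison is not supported.
    if not isinstance(value, _COMPARABLE):
        raise TypeError("comparison not supported for %s" % type(value).__name__)
    # Point 2: a non-null element whose type differs from the value is an error.
    for item in arr:
        if item is not None and type(item) is not type(value):
            raise TypeError("type mismatch: %s vs %s"
                            % (type(item).__name__, type(value).__name__))
    # Otherwise, report whether any element equals the value.
    return any(item == value for item in arr)
```

In PySpark itself the function is exposed as `pyspark.sql.functions.array_contains(col, value)`; the sketch above only mirrors the rules as stated in this message and uses strings in its checks, as the PySpark test does.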

Closes #7580

Author: Pedro Rodriguez <prodriguez@trulia.com>
Author: Pedro Rodriguez <ski.rodriguez@gmail.com>
Author: Davies Liu <davies@databricks.com>

Closes #7949 from davies/array_contains and squashes the following commits:

d3c08bc [Davies Liu] use foreach() to avoid copy
bc3d1fe [Davies Liu] fix array_contains
719e37d [Davies Liu] Merge branch 'master' of github.com:apache/spark into array_contains
e352cf9 [Pedro Rodriguez] fixed diff from master
4d5b0ff [Pedro Rodriguez] added docs and another type check
ffc0591 [Pedro Rodriguez] fixed unit test
7a22deb [Pedro Rodriguez] Changed test to use strings instead of long/ints which are different between python 2 an 3
b5ffae8 [Pedro Rodriguez] fixed pyspark test
4e7dce3 [Pedro Rodriguez] added more docs
3082399 [Pedro Rodriguez] fixed unit test
46f9789 [Pedro Rodriguez] reverted change
d3ca013 [Pedro Rodriguez] Fixed type checking to match hive behavior, then added tests to insure this
8528027 [Pedro Rodriguez] added more tests
686e029 [Pedro Rodriguez] fix scala style
d262e9d [Pedro Rodriguez] reworked type checking code and added more tests
2517a58 [Pedro Rodriguez] removed unused import
28b4f71 [Pedro Rodriguez] fixed bug with type conversions and re-added tests
12f8795 [Pedro Rodriguez] fix scala style checks
e8a20a9 [Pedro Rodriguez] added python df (broken atm)
65b562c [Pedro Rodriguez] made array_contains nullable false
33b45aa [Pedro Rodriguez] reordered test
9623c64 [Pedro Rodriguez] fixed test
4b4425b [Pedro Rodriguez] changed Arrays in tests to Seqs
72cb4b1 [Pedro Rodriguez] added checkInputTypes and docs
69c46fb [Pedro Rodriguez] added tests and codegen
9e0bfc4 [Pedro Rodriguez] initial attempt at implementation
2015-08-04 22:34:02 -07:00
__init__.py [SPARK-8060] Improve DataFrame Python test coverage and documentation. 2015-06-03 00:23:34 -07:00
column.py [SPARK-8573] [SPARK-8568] [SQL] [PYSPARK] raise Exception if column is used in booelan expression 2015-06-23 15:51:16 -07:00
context.py [SPARK-9116] [SQL] [PYSPARK] support Python only UDT in __main__ 2015-07-29 22:30:49 -07:00
dataframe.py [SPARK-7157][SQL] add sampleBy to DataFrame 2015-07-30 17:16:03 -07:00
functions.py [SPARK-8231] [SQL] Add array_contains 2015-08-04 22:34:02 -07:00
group.py [SPARK-8770][SQL] Create BinaryOperator abstract class. 2015-07-01 21:14:13 -07:00
readwriter.py [SPARK-9100] [SQL] Adds DataFrame reader/writer shortcut methods for ORC 2015-07-21 15:08:44 +08:00
tests.py [SPARK-9116] [SQL] [PYSPARK] support Python only UDT in __main__ 2015-07-29 22:30:49 -07:00
types.py [SPARK-9408] [PYSPARK] [MLLIB] Refactor linalg.py to /linalg 2015-07-30 16:57:38 -07:00
utils.py [SPARK-9166][SQL][PYSPARK] Capture and hide IllegalArgumentException in Python API 2015-07-19 00:32:56 -07:00
window.py [SPARK-8146] DataFrame Python API: Alias replace in df.na 2015-06-07 01:21:02 -07:00