spark-instrumented-optimizer


HyukjinKwon 8682bb11ae [SPARK-29627][PYTHON][SQL] Allow array_contains to take column instances

### What changes were proposed in this pull request?

This PR proposes to allow `array_contains` to take column instances.

### Why are the changes needed?

For consistent support in the Scala and Python APIs. Scala allows column instances in `array_contains`.

Scala:

```scala
import org.apache.spark.sql.functions._
val df = Seq(Array("a", "b", "c"), Array.empty[String]).toDF("data")
df.select(array_contains($"data", lit("a"))).show()
```

Python:

```python
from pyspark.sql.functions import array_contains, lit
df = spark.createDataFrame([(["a", "b", "c"],), ([],)], ['data'])
df.select(array_contains(df.data, lit("a"))).show()
```

However, the PySpark side does not allow this.

### Does this PR introduce any user-facing change?

Yes.

```python
from pyspark.sql.functions import array_contains, lit
df = spark.createDataFrame([(["a", "b", "c"],), ([],)], ['data'])
df.select(array_contains(df.data, lit("a"))).show()
```

Before:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/functions.py", line 1950, in array_contains
    return Column(sc._jvm.functions.array_contains(_to_java_column(col), value))
  File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1277, in __call__
  File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1241, in _build_args
  File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1228, in _get_args
  File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_collections.py", line 500, in convert
  File "/.../spark/python/pyspark/sql/column.py", line 344, in __iter__
    raise TypeError("Column is not iterable")
TypeError: Column is not iterable
```

After:

```
+-----------------------+
|array_contains(data, a)|
+-----------------------+
|                   true|
|                  false|
+-----------------------+
```

### How was this patch tested?

Manually tested and added a doctest.

Closes #26288 from HyukjinKwon/SPARK-29627.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>

2019-10-30 09:45:19 +09:00
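The traceback in the commit message shows the failure mode: a Python `Column` object was handed to the JVM call as-is, the gateway tried to convert it like a collection, and `Column.__iter__` raised. A minimal sketch of one way to avoid that is to unwrap a column argument to its underlying handle before the call and pass literals through unchanged. This is an illustration only, not the actual patch: the `Column` class, `to_jvm_arg`, and `array_contains_args` below are hypothetical stand-ins, and no Spark or Py4J code is involved.

```python
class Column:
    """Hypothetical stand-in for a PySpark Column: wraps a JVM-side handle."""

    def __init__(self, jc):
        self._jc = jc  # in this sketch the "handle" is just a string

    def __iter__(self):
        # Same error message as in the traceback in the commit message.
        raise TypeError("Column is not iterable")


def to_jvm_arg(value):
    """Unwrap a Column to its handle; pass any other value through unchanged."""
    return value._jc if isinstance(value, Column) else value


def array_contains_args(col, value):
    """Build the argument pair a JVM-side call would receive (sketch only)."""
    return (to_jvm_arg(col), to_jvm_arg(value))


# A column value is unwrapped, so nothing tries to iterate over it.
print(array_contains_args(Column("data"), Column("lit(a)")))
# A plain literal still works as before.
print(array_contains_args(Column("data"), "a"))
```

The design point is that the check happens on the Python side, so existing callers that pass plain literals keep working while column instances are also accepted.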
avro	[SPARK-28698][SQL] Support user-specified output schema in `to_avro`	2019-08-13 20:52:16 +08:00
tests	[SPARK-29402][PYTHON][TESTS] Added tests for grouped map pandas_udf with window	2019-10-11 16:19:13 -07:00
__init__.py	[SPARK-27463][PYTHON][FOLLOW-UP] Miscellaneous documentation and code cleanup of cogroup pandas UDF	2019-09-30 22:25:35 +09:00
catalog.py	[SPARK-28980][CORE][SQL][STREAMING][MLLIB] Remove most items deprecated in Spark 2.2.0 or earlier, for Spark 3	2019-09-09 10:19:40 -05:00
cogroup.py	[SPARK-27463][PYTHON][FOLLOW-UP] Miscellaneous documentation and code cleanup of cogroup pandas UDF	2019-09-30 22:25:35 +09:00
column.py	[SPARK-28031][PYSPARK][TEST] Improve doctest on over function of Column	2019-06-13 11:04:41 +09:00
conf.py	[SPARK-23698][PYTHON] Resolve undefined names in Python 3	2018-08-22 10:06:59 -07:00
context.py	[SPARK-28980][CORE][SQL][STREAMING][MLLIB] Remove most items deprecated in Spark 2.2.0 or earlier, for Spark 3	2019-09-09 10:19:40 -05:00
dataframe.py	[SPARK-27659][PYTHON] Allow PySpark to prefetch during toLocalIterator	2019-09-20 09:59:31 -07:00
functions.py	[SPARK-29627][PYTHON][SQL] Allow array_contains to take column instances	2019-10-30 09:45:19 +09:00
group.py	[SPARK-27463][PYTHON] Support Dataframe Cogroup via Pandas UDFs	2019-09-17 17:13:50 -07:00
readwriter.py	[SPARK-29444][FOLLOWUP] add doc and python parameter for ignoreNullFields in json generating	2019-10-24 10:25:04 -07:00
session.py	[SPARK-27995][PYTHON] Note the difference between str of Python 2 and 3 at Arrow optimized	2019-06-11 18:43:59 +09:00
streaming.py	[SPARK-24540][SQL] Support for multiple character delimiter in Spark CSV read	2019-10-15 15:44:51 -05:00
types.py	[SPARK-29041][PYTHON] Allows createDataFrame to accept bytes as binary type	2019-09-12 08:52:25 +09:00
udf.py	[SPARK-27463][PYTHON] Support Dataframe Cogroup via Pandas UDFs	2019-09-17 17:13:50 -07:00
utils.py	[SPARK-21045][PYTHON] Allow non-ascii string as an exception message from python execution in Python 2	2019-09-21 08:09:19 +09:00
window.py	[SPARK-28855][CORE][ML][SQL][STREAMING] Remove outdated usages of Experimental, Evolving annotations	2019-09-01 10:15:00 -05:00