spark-instrumented-optimizer

HyukjinKwon fda0e6e48d [SPARK-29240][PYTHON] Pass Py4J column instance to support PySpark column in element_at function		2019-09-27 11:04:55 -07:00

### What changes were proposed in this pull request?

This PR makes `element_at` in PySpark able to take PySpark `Column` instances.

### Why are the changes needed?

To match with Scala side. Seems it was intended but not working correctly as a bug.

### Does this PR introduce any user-facing change?

Yes. See below:

```python
from pyspark.sql import functions as F
x = spark.createDataFrame([([1,2,3],1),([4,5,6],2),([7,8,9],3)],['list','num'])
x.withColumn('aa',F.element_at('list',x.num.cast('int'))).show()
```

Before:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/functions.py", line 2059, in element_at
    return Column(sc._jvm.functions.element_at(_to_java_column(col), extraction))
  File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1277, in __call__
  File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1241, in _build_args
  File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1228, in _get_args
  File "/.../forked/spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_collections.py", line 500, in convert
  File "/.../spark/python/pyspark/sql/column.py", line 344, in __iter__
    raise TypeError("Column is not iterable")
TypeError: Column is not iterable
```

After:

```
+---------+---+---+
|     list|num| aa|
+---------+---+---+
|[1, 2, 3]|  1|  1|
|[4, 5, 6]|  2|  5|
|[7, 8, 9]|  3|  9|
+---------+---+---+
```

### How was this patch tested?

Manually tested against literal, Python native types, and PySpark column.

Closes #25950 from HyukjinKwon/SPARK-29240.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
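
The failure mode in the traceback above can be reproduced without Spark. The following is a toy sketch, not the actual patch: `JavaColumn`, `Column`, `convert_argument`, `element_at_before`, and `element_at_after` are hypothetical stand-ins invented for illustration. It assumes only what the traceback shows, namely that a Python-side column wrapper handed directly to the gateway ends up being iterated, and that passing the underlying JVM handle avoids this.

```python
class JavaColumn:
    """Hypothetical stand-in for the JVM-side column handle."""

    def __init__(self, expr):
        self.expr = expr


class Column:
    """Hypothetical stand-in for the Python-side column wrapper."""

    def __init__(self, expr):
        self._jc = JavaColumn(expr)

    def __iter__(self):
        # Mirrors the error raised in the traceback above.
        raise TypeError("Column is not iterable")


def convert_argument(arg):
    """Toy gateway converter (an assumption, not Py4J's real logic):
    JVM handles and primitives pass through, anything else is iterated."""
    if isinstance(arg, (JavaColumn, int, float, str)):
        return arg
    return list(arg)


def element_at_before(col, extraction):
    # The Python wrapper is handed to the gateway as-is and gets iterated.
    return ("element_at", col._jc, convert_argument(extraction))


def element_at_after(col, extraction):
    # Unwrap the Python wrapper to its JVM handle before the gateway sees it.
    if isinstance(extraction, Column):
        extraction = extraction._jc
    return ("element_at", col._jc, convert_argument(extraction))


lst, num = Column("list"), Column("num")
try:
    element_at_before(lst, num)
except TypeError as exc:
    print("before:", exc)
print("after:", element_at_after(lst, num)[2].expr)
```

In this sketch, plain Python values such as `1` still pass through unchanged, which corresponds to the commit's note that literals and native types were tested alongside columns.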
..
avro	[SPARK-28698][SQL] Support user-specified output schema in `to_avro`	2019-08-13 20:52:16 +08:00
tests	[SPARK-27659][PYTHON] Allow PySpark to prefetch during toLocalIterator	2019-09-20 09:59:31 -07:00
__init__.py	[SPARK-28980][CORE][SQL][STREAMING][MLLIB] Remove most items deprecated in Spark 2.2.0 or earlier, for Spark 3	2019-09-09 10:19:40 -05:00
catalog.py	[SPARK-28980][CORE][SQL][STREAMING][MLLIB] Remove most items deprecated in Spark 2.2.0 or earlier, for Spark 3	2019-09-09 10:19:40 -05:00
cogroup.py	[SPARK-27463][PYTHON] Support Dataframe Cogroup via Pandas UDFs	2019-09-17 17:13:50 -07:00
column.py	[SPARK-28031][PYSPARK][TEST] Improve doctest on over function of Column	2019-06-13 11:04:41 +09:00
conf.py	[SPARK-23698][PYTHON] Resolve undefined names in Python 3	2018-08-22 10:06:59 -07:00
context.py	[SPARK-28980][CORE][SQL][STREAMING][MLLIB] Remove most items deprecated in Spark 2.2.0 or earlier, for Spark 3	2019-09-09 10:19:40 -05:00
dataframe.py	[SPARK-27659][PYTHON] Allow PySpark to prefetch during toLocalIterator	2019-09-20 09:59:31 -07:00
functions.py	[SPARK-29240][PYTHON] Pass Py4J column instance to support PySpark column in element_at function	2019-09-27 11:04:55 -07:00
group.py	[SPARK-27463][PYTHON] Support Dataframe Cogroup via Pandas UDFs	2019-09-17 17:13:50 -07:00
readwriter.py	[SPARK-28977][DOCS][SQL] Fix DataFrameReader.json docs to doc that partition column can be numeric, date or timestamp type	2019-09-05 18:32:45 +09:00
session.py	[SPARK-27995][PYTHON] Note the difference between str of Python 2 and 3 at Arrow optimized	2019-06-11 18:43:59 +09:00
streaming.py	[SPARK-28651][SS] Force the schema of Streaming file source to be nullable	2019-08-09 18:54:55 +09:00
types.py	[SPARK-29041][PYTHON] Allows createDataFrame to accept bytes as binary type	2019-09-12 08:52:25 +09:00
udf.py	[SPARK-27463][PYTHON] Support Dataframe Cogroup via Pandas UDFs	2019-09-17 17:13:50 -07:00
utils.py	[SPARK-21045][PYTHON] Allow non-ascii string as an exception message from python execution in Python 2	2019-09-21 08:09:19 +09:00
window.py	[SPARK-28855][CORE][ML][SQL][STREAMING] Remove outdated usages of Experimental, Evolving annotations	2019-09-01 10:15:00 -05:00