8682bb11ae
### What changes were proposed in this pull request? This PR proposes to allow `array_contains` to take column instances. ### Why are the changes needed? For consistent support in Scala and Python APIs. Scala allows column instances at `array_contains` Scala: ```scala import org.apache.spark.sql.functions._ val df = Seq(Array("a", "b", "c"), Array.empty[String]).toDF("data") df.select(array_contains($"data", lit("a"))).show() ``` Python: ```python from pyspark.sql.functions import array_contains, lit df = spark.createDataFrame([(["a", "b", "c"],), ([],)], ['data']) df.select(array_contains(df.data, lit("a"))).show() ``` However, PySpark sides does not allow. ### Does this PR introduce any user-facing change? Yes. ```python from pyspark.sql.functions import array_contains, lit df = spark.createDataFrame([(["a", "b", "c"],), ([],)], ['data']) df.select(array_contains(df.data, lit("a"))).show() ``` **Before:** ``` Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/.../spark/python/pyspark/sql/functions.py", line 1950, in array_contains return Column(sc._jvm.functions.array_contains(_to_java_column(col), value)) File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1277, in __call__ File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1241, in _build_args File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1228, in _get_args File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_collections.py", line 500, in convert File "/.../spark/python/pyspark/sql/column.py", line 344, in __iter__ raise TypeError("Column is not iterable") TypeError: Column is not iterable ``` **After:** ``` +-----------------------+ |array_contains(data, a)| +-----------------------+ | true| | false| +-----------------------+ ``` ### How was this patch tested? Manually tested and added a doctest. Closes #26288 from HyukjinKwon/SPARK-29627. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org> |
||
---|---|---|
.. | ||
ml | ||
mllib | ||
sql | ||
streaming | ||
testing | ||
tests | ||
__init__.py | ||
_globals.py | ||
accumulators.py | ||
broadcast.py | ||
cloudpickle.py | ||
conf.py | ||
context.py | ||
daemon.py | ||
files.py | ||
find_spark_home.py | ||
heapq3.py | ||
java_gateway.py | ||
join.py | ||
profiler.py | ||
rdd.py | ||
rddsampler.py | ||
resourceinformation.py | ||
resultiterable.py | ||
serializers.py | ||
shell.py | ||
shuffle.py | ||
statcounter.py | ||
status.py | ||
storagelevel.py | ||
taskcontext.py | ||
traceback_utils.py | ||
util.py | ||
version.py | ||
worker.py |