[SPARK-32338][SQL][PYSPARK][FOLLOW-UP] Update slice to accept Column for start and length

### What changes were proposed in this pull request?

This is a follow-up of #29138, which added an overload of the `slice` function accepting `Column` for `start` and `length` in Scala.

This PR updates the equivalent Python function to accept `Column` as well.

### Why are the changes needed?

Now that the Scala version accepts `Column`, the Python version should accept it as well.

### Does this PR introduce _any_ user-facing change?

Yes. PySpark users will also be able to pass `Column` objects to the `start` and `length` parameters of the `slice` function.
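
For example, `length` can now be computed per row, which is not possible with a plain `int`. A minimal usage sketch (not taken from the PR; assumes an active Spark session):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import slice, lit, size

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([([1, 2, 3],), ([4, 5],)], ["x"])

# Drop the first element of each array: start at position 2 and take a
# per-row length of size(x) - 1.
df.select(slice(df.x, lit(2), size(df.x) - 1).alias("sliced")).collect()
# Expected: [Row(sliced=[2, 3]), Row(sliced=[5])]
```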

### How was this patch tested?

Added tests.

Closes #29195 from ueshin/issues/SPARK-32338/slice.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
commit 7b66882c9d (parent f8d29d371c)
Takuya UESHIN authored on 2020-07-23 13:53:50 +09:00; committed by HyukjinKwon
2 files changed, 15 insertions(+), 1 deletion(-)

```diff
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -2068,7 +2068,11 @@ def slice(x, start, length):
     [Row(sliced=[2, 3]), Row(sliced=[5])]
     """
     sc = SparkContext._active_spark_context
-    return Column(sc._jvm.functions.slice(_to_java_column(x), start, length))
+    return Column(sc._jvm.functions.slice(
+        _to_java_column(x),
+        start._jc if isinstance(start, Column) else start,
+        length._jc if isinstance(length, Column) else length
+    ))
 
 
 @since(2.4)
```
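
The change follows the standard py4j bridging pattern used throughout `functions.py`: a PySpark `Column` is a thin wrapper whose `_jc` attribute holds the underlying JVM `Column`, and unwrapping it lets py4j resolve the `(Column, Column, Column)` overload added in #29138, while plain Python ints keep resolving to the existing `(Column, Int, Int)` overload. A standalone sketch of the idea (the helper name is illustrative, not part of the PR):

```python
from pyspark.sql.column import Column

def _column_or_literal(arg):
    # Illustrative helper (not in the PR): hand py4j the wrapped JVM
    # Column (`_jc`) so the (Column, Column, Column) overload matches;
    # otherwise forward the Python literal for the (Column, Int, Int)
    # overload.
    return arg._jc if isinstance(arg, Column) else arg
```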


```diff
--- a/python/pyspark/sql/tests/test_functions.py
+++ b/python/pyspark/sql/tests/test_functions.py
@@ -292,6 +292,16 @@ class FunctionsTests(ReusedSQLTestCase):
         for result in results:
             self.assertEqual(result[0], '')
 
+    def test_slice(self):
+        from pyspark.sql.functions import slice, lit
+
+        df = self.spark.createDataFrame([([1, 2, 3],), ([4, 5],)], ['x'])
+
+        self.assertEquals(
+            df.select(slice(df.x, 2, 2).alias("sliced")).collect(),
+            df.select(slice(df.x, lit(2), lit(2)).alias("sliced")).collect(),
+        )
+
     def test_array_repeat(self):
         from pyspark.sql.functions import array_repeat, lit
```
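
A caveat that follows from the implementation: each argument is unwrapped independently, so at this commit `start` and `length` should both be literals or both be `Column`s; a mixed call such as `slice(df.x, size(df.x) - 1, 2)` would hand py4j an `int` next to a JVM `Column` and should match neither Scala overload. Wrapping the literal in `lit` keeps both arguments on the `Column` path. An illustrative sketch (not from the commit; assumes an active `spark` session):

```python
from pyspark.sql.functions import slice, lit, size

# Take the last two elements of each array. Both arguments are Columns,
# so the (Column, Column, Column) JVM overload is selected.
df = spark.createDataFrame([([1, 2, 3],), ([4, 5],)], ["x"])
df.select(slice(df.x, size(df.x) - 1, lit(2)).alias("sliced")).collect()
# Expected: [Row(sliced=[2, 3]), Row(sliced=[4, 5])]
```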