[SPARK-32338][SQL][PYSPARK][FOLLOW-UP] Update slice to accept Column for start and length

### What changes were proposed in this pull request?

This is a follow-up of #29138, which added an overload of the `slice` function accepting `Column` for `start` and `length` in Scala.

This PR updates the equivalent Python function to accept `Column` as well.

### Why are the changes needed?

Now that the Scala version accepts `Column`, the Python version should accept it as well.

### Does this PR introduce _any_ user-facing change?

Yes. PySpark users will also be able to pass `Column` objects to the `start` and `length` parameters of the `slice` function.
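
For example, `length` can now be computed per row, which is not possible with a plain `int`. A minimal usage sketch (not taken from the PR; assumes an active Spark session):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import slice, lit, size

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([([1, 2, 3],), ([4, 5],)], ["x"])

# Drop the first element of each array: start at position 2 and take a
# per-row length of size(x) - 1.
df.select(slice(df.x, lit(2), size(df.x) - 1).alias("sliced")).collect()
# Expected: [Row(sliced=[2, 3]), Row(sliced=[5])]
```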

### How was this patch tested?

Added tests.

Closes #29195 from ueshin/issues/SPARK-32338/slice.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
commit 7b66882c9d (parent f8d29d371c)
Takuya UESHIN authored on 2020-07-23 13:53:50 +09:00; committed by HyukjinKwon
2 files changed, 15 insertions(+), 1 deletion(-)

```diff
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -2068,7 +2068,11 @@ def slice(x, start, length):
     [Row(sliced=[2, 3]), Row(sliced=[5])]
     """
     sc = SparkContext._active_spark_context
-    return Column(sc._jvm.functions.slice(_to_java_column(x), start, length))
+    return Column(sc._jvm.functions.slice(
+        _to_java_column(x),
+        start._jc if isinstance(start, Column) else start,
+        length._jc if isinstance(length, Column) else length
+    ))
 
 
 @since(2.4)
```
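
The change follows the standard py4j bridging pattern used throughout `functions.py`: a PySpark `Column` is a thin wrapper whose `_jc` attribute holds the underlying JVM `Column`, and unwrapping it lets py4j resolve the `(Column, Column, Column)` overload added in #29138, while plain Python ints keep resolving to the existing `(Column, Int, Int)` overload. A standalone sketch of the idea (the helper name is illustrative, not part of the PR):

```python
from pyspark.sql.column import Column

def _column_or_literal(arg):
    # Illustrative helper (not in the PR): hand py4j the wrapped JVM
    # Column (`_jc`) so the (Column, Column, Column) overload matches;
    # otherwise forward the Python literal for the (Column, Int, Int)
    # overload.
    return arg._jc if isinstance(arg, Column) else arg
```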


```diff
--- a/python/pyspark/sql/tests/test_functions.py
+++ b/python/pyspark/sql/tests/test_functions.py
@@ -292,6 +292,16 @@ class FunctionsTests(ReusedSQLTestCase):
         for result in results:
             self.assertEqual(result[0], '')
 
+    def test_slice(self):
+        from pyspark.sql.functions import slice, lit
+
+        df = self.spark.createDataFrame([([1, 2, 3],), ([4, 5],)], ['x'])
+
+        self.assertEquals(
+            df.select(slice(df.x, 2, 2).alias("sliced")).collect(),
+            df.select(slice(df.x, lit(2), lit(2)).alias("sliced")).collect(),
+        )
+
     def test_array_repeat(self):
         from pyspark.sql.functions import array_repeat, lit
```
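
A caveat that follows from the implementation: each argument is unwrapped independently, so at this commit `start` and `length` should both be literals or both be `Column`s; a mixed call such as `slice(df.x, size(df.x) - 1, 2)` would hand py4j an `int` next to a JVM `Column` and should match neither Scala overload. Wrapping the literal in `lit` keeps both arguments on the `Column` path. An illustrative sketch (not from the commit; assumes an active `spark` session):

```python
from pyspark.sql.functions import slice, lit, size

# Take the last two elements of each array. Both arguments are Columns,
# so the (Column, Column, Column) JVM overload is selected.
df = spark.createDataFrame([([1, 2, 3],), ([4, 5],)], ["x"])
df.select(slice(df.x, size(df.x) - 1, lit(2)).alias("sliced")).collect()
# Expected: [Row(sliced=[2, 3]), Row(sliced=[4, 5])]
```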