[SPARK-36142][PYTHON] Follow Pandas when pow between fractional series with Na and bool literal

### What changes were proposed in this pull request?
Set the result to 1 when the exponent is 0 (or False).

### Why are the changes needed?
Currently, exponentiation between fractional series and bools is not consistent with pandas' behavior.
```
>>> pser = pd.Series([1, 2, np.nan], dtype=float)
>>> psser = ps.from_pandas(pser)
>>> pser ** False
0    1.0
1    1.0
2    1.0
dtype: float64
>>> psser ** False
0    1.0
1    1.0
2    NaN
dtype: float64
```
We ought to adjust that.

See more in [SPARK-36142](https://issues.apache.org/jira/browse/SPARK-36142)

### Does this PR introduce _any_ user-facing change?
Yes. Exponentiation between a fractional Series with missing values and a bool literal now returns a different result, one that follows pandas behavior.

### How was this patch tested?
- Add test_pow_with_float_nan ut
- Existing test in test_pow

Closes #33521 from Yikun/SPARK-36142.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit d52c2de08b)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
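
The branch order introduced by this patch can be illustrated with a minimal pure-Python sketch. It is not the Spark implementation: the function names `pow_before_patch` and `pow_after_patch` are hypothetical, and a missing value is modeled as `None` on the assumption that it reaches Spark as a null that propagates through `pow`.

```python
from typing import Optional, Union

Number = Union[bool, int, float]


def pow_before_patch(left: Optional[float], right: Number) -> Optional[float]:
    """Hypothetical model of the old rule: only a base of 1 is special-cased."""
    right = float(right)  # a bool operand is treated as 0.0 or 1.0
    if left == 1:
        return left
    if left is None:
        return None  # a missing base propagates, even for exponent 0
    return left ** right


def pow_after_patch(left: Optional[float], right: Number) -> Optional[float]:
    """Hypothetical model of the new rule: exponent 0 yields 1 before pow runs."""
    right = float(right)
    if left == 1:
        return left
    if right == 0:
        return 1.0  # matches pandas: NaN ** 0 == 1.0
    if left is None:
        return None
    return left ** right


values = [1.0, 2.0, None]
before = [pow_before_patch(v, False) for v in values]
after = [pow_after_patch(v, False) for v in values]
```

In this sketch the exponent-zero check sits ahead of the general power computation, so a missing base never reaches it when the exponent is `0` or `False`; any other exponent still leaves the missing value missing.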
2021-07-27 12:06:05 +09:00 · 2021-07-27 12:06:05 +09:00 · 139536c3ed
parent 1641812e97
commit 139536c3ed
2 changed files with 17 additions and 1 deletions
--- a/python/pyspark/pandas/data_type_ops/num_ops.py
+++ b/python/pyspark/pandas/data_type_ops/num_ops.py
@@ -79,7 +79,11 @@ class NumericOps(DataTypeOps):
            raise TypeError("Exponentiation can not be applied to given types.")

        def pow_func(left: Column, right: Any) -> Column:
-            return F.when(left == 1, left).otherwise(Column.__pow__(left, right))
+            return (
+                F.when(left == 1, left)
+                .when(SF.lit(right) == 0, 1)
+                .otherwise(Column.__pow__(left, right))
+            )

        right = transform_boolean_operand_to_numeric(right, spark_type=left.spark.data_type)
        return column_op(pow_func)(left, right)
--- a/python/pyspark/pandas/tests/data_type_ops/test_num_ops.py
+++ b/python/pyspark/pandas/tests/data_type_ops/test_num_ops.py
@@ -183,6 +183,18 @@ class NumOpsTest(PandasOnSparkTestCase, TestCasesUtils):
                else:
                    self.assertRaises(TypeError, lambda: psser ** psdf[n_col])

+    # TODO(SPARK-36031): Merge test_pow_with_nan into test_pow
+    def test_pow_with_float_nan(self):
+        for col in self.numeric_w_nan_df_cols:
+            if col == "float_w_nan":
+                pser, psser = self.numeric_w_nan_pdf[col], self.numeric_w_nan_psdf[col]
+                self.assert_eq(pser ** pser, psser ** psser)
+                self.assert_eq(pser ** pser.astype(bool), psser ** psser.astype(bool))
+                self.assert_eq(pser ** True, psser ** True)
+                self.assert_eq(pser ** False, psser ** False)
+                self.assert_eq(pser ** 1, psser ** 1)
+                self.assert_eq(pser ** 0, psser ** 0)
+
    def test_radd(self):
        pdf, psdf = self.pdf, self.psdf
        for col in self.numeric_df_cols: