[SPARK-36142][PYTHON] Follow Pandas when pow between fractional series with Na and bool literal

### What changes were proposed in this pull request?

Set the result to 1 when the exponent is 0 (or False).

### Why are the changes needed?
Currently, exponentiation between fractional series and bools is not consistent with pandas' behavior.
```
>>> pser = pd.Series([1, 2, np.nan], dtype=float)
>>> psser = ps.from_pandas(pser)
>>> pser ** False
0    1.0
1    1.0
2    1.0
dtype: float64
>>> psser ** False
0    1.0
1    1.0
2    NaN
dtype: float64
```
pandas-on-Spark should also return 1.0 for the missing value, as pandas does.

See more in [SPARK-36142](https://issues.apache.org/jira/browse/SPARK-36142)
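
The intended branch order can be illustrated on plain Python scalars. This is only a rough sketch, not the Spark implementation: `pow_like_patched_pow_func` and `pow_series` are hypothetical names, and `None` stands in for a missing value in a fractional Series.

```python
from typing import List, Optional


def pow_like_patched_pow_func(base: Optional[float], exponent: float) -> Optional[float]:
    """Hypothetical scalar model of the branch order; None models a missing value."""
    if base == 1:  # a base of 1 is returned unchanged
        return base
    if exponent == 0:  # exponent 0 (False == 0 in Python) yields 1, even for a missing base
        return 1.0
    if base is None:  # otherwise a missing base stays missing
        return None
    return base ** exponent  # ordinary exponentiation


def pow_series(values: List[Optional[float]], exponent: float) -> List[Optional[float]]:
    """Apply the scalar model element-wise, like a Series ** literal."""
    return [pow_like_patched_pow_func(v, exponent) for v in values]


print(pow_series([1.0, 2.0, None], False))  # [1.0, 1.0, 1.0]
print(pow_series([1.0, 2.0, None], True))   # [1.0, 2.0, None]
```

Checking the exponent before the missing-value case is what lets the missing entry become 1.0, matching the pandas output shown above.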

### Does this PR introduce _any_ user-facing change?
Yes. Exponentiation between a fractional Series with missing values and a bool literal now returns a different result, matching pandas behavior.

### How was this patch tested?
- Added the test_pow_with_float_nan unit test
- Existing tests in test_pow

Closes #33521 from Yikun/SPARK-36142.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit d52c2de08b)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Yikun Jiang 2021-07-27 12:06:05 +09:00 committed by Hyukjin Kwon
parent 1641812e97
commit 139536c3ed
2 changed files with 17 additions and 1 deletions


@@ -79,7 +79,11 @@ class NumericOps(DataTypeOps):
             raise TypeError("Exponentiation can not be applied to given types.")
 
         def pow_func(left: Column, right: Any) -> Column:
-            return F.when(left == 1, left).otherwise(Column.__pow__(left, right))
+            return (
+                F.when(left == 1, left)
+                .when(SF.lit(right) == 0, 1)
+                .otherwise(Column.__pow__(left, right))
+            )
 
         right = transform_boolean_operand_to_numeric(right, spark_type=left.spark.data_type)
         return column_op(pow_func)(left, right)


@@ -183,6 +183,18 @@ class NumOpsTest(PandasOnSparkTestCase, TestCasesUtils):
                 else:
                     self.assertRaises(TypeError, lambda: psser ** psdf[n_col])
 
+    # TODO(SPARK-36031): Merge test_pow_with_nan into test_pow
+    def test_pow_with_float_nan(self):
+        for col in self.numeric_w_nan_df_cols:
+            if col == "float_w_nan":
+                pser, psser = self.numeric_w_nan_pdf[col], self.numeric_w_nan_psdf[col]
+                self.assert_eq(pser ** pser, psser ** psser)
+                self.assert_eq(pser ** pser.astype(bool), psser ** psser.astype(bool))
+                self.assert_eq(pser ** True, psser ** True)
+                self.assert_eq(pser ** False, psser ** False)
+                self.assert_eq(pser ** 1, psser ** 1)
+                self.assert_eq(pser ** 0, psser ** 0)
+
     def test_radd(self):
         pdf, psdf = self.pdf, self.psdf
         for col in self.numeric_df_cols: