spark-instrumented-optimizer/python/pyspark/pandas
itholic b8508f4876 [SPARK-36388][SPARK-36386][PYTHON] Fix DataFrame groupby-rolling and groupby-expanding to follow pandas 1.3
### What changes were proposed in this pull request?

This PR proposes to fix `RollingGroupBy` and `ExpandingGroupBy` to follow the latest pandas behavior.

As of pandas 1.3, `RollingGroupBy` and `ExpandingGroupBy` no longer return the grouped-by column in the result values.

Before:
```python
>>> df = pd.DataFrame({"A": [1, 1, 2, 3], "B": [0, 1, 2, 3]})
>>> df.groupby("A").rolling(2).sum()
       A    B
A
1 0  NaN  NaN
  1  2.0  1.0
2 2  NaN  NaN
3 3  NaN  NaN
```

After:
```python
>>> df = pd.DataFrame({"A": [1, 1, 2, 3], "B": [0, 1, 2, 3]})
>>> df.groupby("A").rolling(2).sum()
       B
A
1 0  NaN
  1  1.0
2 2  NaN
3 3  NaN
```

### Why are the changes needed?

We should follow the behavior of pandas as much as possible.

### Does this PR introduce _any_ user-facing change?

Yes, the results of `RollingGroupBy` and `ExpandingGroupBy` change as described above.
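
For code that has to produce the same columns on either side of this change, one option is to drop the grouped-by column explicitly. The sketch below uses plain pandas rather than pandas-on-Spark, and `rolling_sum_without_keys` is a hypothetical helper name introduced only for illustration; it is not part of either library or of this PR.

```python
import pandas as pd


def rolling_sum_without_keys(df, key, window):
    """Hypothetical helper: groupby-rolling sum with the key column removed."""
    result = df.groupby(key).rolling(window).sum()
    # pandas < 1.3 keeps the grouped-by column in the values; pandas >= 1.3 does not.
    # errors="ignore" turns the drop into a no-op when the column is already absent.
    return result.drop(columns=[key], errors="ignore")


df = pd.DataFrame({"A": [1, 1, 2, 3], "B": [0, 1, 2, 3]})
result = rolling_sum_without_keys(df, "A", 2)
print(list(result.columns))  # ['B']
```

The grouped-by values remain available in the first level of the resulting index, so dropping the column does not lose information.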

### How was this patch tested?

Unit tests.

Closes #33646 from itholic/SPARK-36388.

Authored-by: itholic <haejoon.lee@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-08-10 10:12:52 +09:00
..
data_type_ops [SPARK-36192][PYTHON] Better error messages for DataTypeOps against lists 2021-08-03 16:25:49 +09:00
indexes [SPARK-36369][PYTHON] Fix Index.union to follow pandas 1.3 2021-08-09 11:10:01 +09:00
missing [SPARK-36260][PYTHON] Add set_categories to CategoricalAccessor and CategoricalIndex 2021-07-26 17:12:33 -07:00
plot [SPARK-35344][PYTHON] Support creating a Column of numpy literals in pandas API on Spark 2021-06-28 19:03:42 -07:00
spark [SPARK-35859][PYTHON] Cleanup type hints in pandas-on-Spark 2021-06-29 10:52:24 -07:00
tests [SPARK-36388][SPARK-36386][PYTHON] Fix DataFrame groupby-rolling and groupby-expanding to follow pandas 1.3 2021-08-10 10:12:52 +09:00
typedef [SPARK-36146][PYTHON][INFRA][TESTS] Upgrade Python version from 3.6 to 3.9 in GitHub Actions' linter/docs 2021-07-15 08:01:54 -07:00
usage_logging [SPARK-35499][PYTHON] Apply black to pandas API on Spark codes 2021-06-06 17:30:07 -07:00
__init__.py [SPARK-36253][PYTHON][DOCS] Add versionadded to the top of pandas-on-Spark package 2021-07-22 14:21:43 +09:00
_typing.py [SPARK-35944][PYTHON] Introduce Name and Label type aliases 2021-07-01 09:40:07 +09:00
accessors.py [SPARK-36338][PYTHON][SQL] Move distributed-sequence implementation to Scala side 2021-07-30 22:29:23 +09:00
base.py [SPARK-36333][PYTHON] Reuse isnull where the null check is needed 2021-07-29 15:33:11 -07:00
categorical.py [SPARK-36320][PYTHON] Fix Series/Index.copy() to drop extra columns 2021-07-28 18:39:53 +09:00
config.py [SPARK-36338][PYTHON][FOLLOW-UP] Keep the original default value as 'sequence' in default index in pandas on Spark 2021-07-31 08:31:10 +09:00
datetimes.py [SPARK-35453][PYTHON] Move Koalas accessor to pandas_on_spark accessor 2021-06-01 10:33:10 +09:00
exceptions.py [SPARK-35465][PYTHON] Set up the mypy configuration to enable disallow_untyped_defs check for pandas APIs on Spark module 2021-05-21 11:03:35 -07:00
extensions.py [SPARK-35859][PYTHON] Cleanup type hints in pandas-on-Spark 2021-06-29 10:52:24 -07:00
frame.py [SPARK-35811][PYTHON][FOLLOWUP] Deprecate DataFrame.to_spark_io 2021-08-04 16:20:29 +09:00
generic.py [SPARK-36350][PYTHON] Move some logic related to F.nanvl to DataTypeOps 2021-07-30 11:19:49 -07:00
groupby.py [SPARK-36388][SPARK-36386][PYTHON] Fix DataFrame groupby-rolling and groupby-expanding to follow pandas 1.3 2021-08-10 10:12:52 +09:00
indexing.py [SPARK-36338][PYTHON][SQL] Move distributed-sequence implementation to Scala side 2021-07-30 22:29:23 +09:00
internal.py [SPARK-36338][PYTHON][SQL] Move distributed-sequence implementation to Scala side 2021-07-30 22:29:23 +09:00
ml.py [SPARK-36146][PYTHON][INFRA][TESTS] Upgrade Python version from 3.6 to 3.9 in GitHub Actions' linter/docs 2021-07-15 08:01:54 -07:00
mlflow.py [SPARK-36254][PYTHON][FOLLOW-UP] Skip mlflow related tests in pandas on Spark 2021-07-30 22:28:19 +09:00
namespace.py [SPARK-35810][PYTHON][FOLLWUP] Deprecate ps.broadcast API 2021-07-22 17:10:03 +09:00
numpy_compat.py [SPARK-35344][PYTHON] Support creating a Column of numpy literals in pandas API on Spark 2021-06-28 19:03:42 -07:00
series.py [SPARK-36345][SPARK-36367][INFRA][PYTHON] Disable tests failed by the incompatible behavior of pandas 1.3 2021-08-03 14:02:18 +09:00
sql_processor.py [SPARK-35809][PYTHON] Add index_col argument for ps.sql 2021-07-22 17:08:34 +09:00
strings.py [SPARK-35761][PYTHON] Use type-annotation based pandas_udf or avoid specifying udf types to suppress warnings 2021-06-15 11:17:56 +09:00
utils.py [SPARK-35806][PYTHON] Mapping the mode argument to pandas in DataFrame.to_csv 2021-07-19 19:58:11 +09:00
window.py [SPARK-36388][SPARK-36386][PYTHON] Fix DataFrame groupby-rolling and groupby-expanding to follow pandas 1.3 2021-08-10 10:12:52 +09:00