spark-instrumented-optimizer

itholic b8508f4876 [SPARK-36388][SPARK-36386][PYTHON] Fix DataFrame groupby-rolling and groupby-expanding to follow pandas 1.3	2021-08-10 10:12:52 +09:00

### What changes were proposed in this pull request?

This PR proposes to fix `RollingGroupBy` and `ExpandingGroupBy` to follow the latest pandas behavior.

As of pandas 1.3, `RollingGroupBy` and `ExpandingGroupBy` no longer return the grouped-by column in the values.

Before:

```python
>>> df = pd.DataFrame({"A": [1, 1, 2, 3], "B": [0, 1, 2, 3]})
>>> df.groupby("A").rolling(2).sum()
       A    B
A
1 0  NaN  NaN
  1  2.0  1.0
2 2  NaN  NaN
3 3  NaN  NaN
```

After:

```python
>>> df = pd.DataFrame({"A": [1, 1, 2, 3], "B": [0, 1, 2, 3]})
>>> df.groupby("A").rolling(2).sum()
       B
A
1 0  NaN
  1  1.0
2 2  NaN
3 3  NaN
```

### Why are the changes needed?

We should follow the behavior of pandas as much as possible.

### Does this PR introduce _any_ user-facing change?

Yes, the results of `RollingGroupBy` and `ExpandingGroupBy` change as described above.

### How was this patch tested?

Unit tests.

Closes #33646 from itholic/SPARK-36388.

Authored-by: itholic <haejoon.lee@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
..
data_type_ops	[SPARK-36192][PYTHON] Better error messages for DataTypeOps against lists	2021-08-03 16:25:49 +09:00
indexes	[SPARK-36369][PYTHON] Fix Index.union to follow pandas 1.3	2021-08-09 11:10:01 +09:00
missing	[SPARK-36260][PYTHON] Add set_categories to CategoricalAccessor and CategoricalIndex	2021-07-26 17:12:33 -07:00
plot	[SPARK-35344][PYTHON] Support creating a Column of numpy literals in pandas API on Spark	2021-06-28 19:03:42 -07:00
spark	[SPARK-35859][PYTHON] Cleanup type hints in pandas-on-Spark	2021-06-29 10:52:24 -07:00
tests	[SPARK-36388][SPARK-36386][PYTHON] Fix DataFrame groupby-rolling and groupby-expanding to follow pandas 1.3	2021-08-10 10:12:52 +09:00
typedef	[SPARK-36146][PYTHON][INFRA][TESTS] Upgrade Python version from 3.6 to 3.9 in GitHub Actions' linter/docs	2021-07-15 08:01:54 -07:00
usage_logging	[SPARK-35499][PYTHON] Apply black to pandas API on Spark codes	2021-06-06 17:30:07 -07:00
__init__.py	[SPARK-36253][PYTHON][DOCS] Add versionadded to the top of pandas-on-Spark package	2021-07-22 14:21:43 +09:00
_typing.py	[SPARK-35944][PYTHON] Introduce Name and Label type aliases	2021-07-01 09:40:07 +09:00
accessors.py	[SPARK-36338][PYTHON][SQL] Move distributed-sequence implementation to Scala side	2021-07-30 22:29:23 +09:00
base.py	[SPARK-36333][PYTHON] Reuse isnull where the null check is needed	2021-07-29 15:33:11 -07:00
categorical.py	[SPARK-36320][PYTHON] Fix Series/Index.copy() to drop extra columns	2021-07-28 18:39:53 +09:00
config.py	[SPARK-36338][PYTHON][FOLLOW-UP] Keep the original default value as 'sequence' in default index in pandas on Spark	2021-07-31 08:31:10 +09:00
datetimes.py	[SPARK-35453][PYTHON] Move Koalas accessor to pandas_on_spark accessor	2021-06-01 10:33:10 +09:00
exceptions.py	[SPARK-35465][PYTHON] Set up the mypy configuration to enable disallow_untyped_defs check for pandas APIs on Spark module	2021-05-21 11:03:35 -07:00
extensions.py	[SPARK-35859][PYTHON] Cleanup type hints in pandas-on-Spark	2021-06-29 10:52:24 -07:00
frame.py	[SPARK-35811][PYTHON][FOLLOWUP] Deprecate DataFrame.to_spark_io	2021-08-04 16:20:29 +09:00
generic.py	[SPARK-36350][PYTHON] Move some logic related to F.nanvl to DataTypeOps	2021-07-30 11:19:49 -07:00
groupby.py	[SPARK-36388][SPARK-36386][PYTHON] Fix DataFrame groupby-rolling and groupby-expanding to follow pandas 1.3	2021-08-10 10:12:52 +09:00
indexing.py	[SPARK-36338][PYTHON][SQL] Move distributed-sequence implementation to Scala side	2021-07-30 22:29:23 +09:00
internal.py	[SPARK-36338][PYTHON][SQL] Move distributed-sequence implementation to Scala side	2021-07-30 22:29:23 +09:00
ml.py	[SPARK-36146][PYTHON][INFRA][TESTS] Upgrade Python version from 3.6 to 3.9 in GitHub Actions' linter/docs	2021-07-15 08:01:54 -07:00
mlflow.py	[SPARK-36254][PYTHON][FOLLOW-UP] Skip mlflow related tests in pandas on Spark	2021-07-30 22:28:19 +09:00
namespace.py	[SPARK-35810][PYTHON][FOLLWUP] Deprecate ps.broadcast API	2021-07-22 17:10:03 +09:00
numpy_compat.py	[SPARK-35344][PYTHON] Support creating a Column of numpy literals in pandas API on Spark	2021-06-28 19:03:42 -07:00
series.py	[SPARK-36345][SPARK-36367][INFRA][PYTHON] Disable tests failed by the incompatible behavior of pandas 1.3	2021-08-03 14:02:18 +09:00
sql_processor.py	[SPARK-35809][PYTHON] Add `index_col` argument for ps.sql	2021-07-22 17:08:34 +09:00
strings.py	[SPARK-35761][PYTHON] Use type-annotation based pandas_udf or avoid specifying udf types to suppress warnings	2021-06-15 11:17:56 +09:00
utils.py	[SPARK-35806][PYTHON] Mapping the `mode` argument to pandas in DataFrame.to_csv	2021-07-19 19:58:11 +09:00
window.py	[SPARK-36388][SPARK-36386][PYTHON] Fix DataFrame groupby-rolling and groupby-expanding to follow pandas 1.3	2021-08-10 10:12:52 +09:00