spark-instrumented-optimizer


Xinrong Meng 75fd1f5b82 2021-07-23 12:20:35 +09:00
[SPARK-36189][PYTHON] Improve bool, string, numeric DataTypeOps tests by avoiding joins

### What changes were proposed in this pull request?
Improve the bool, string, and numeric DataTypeOps tests by avoiding joins. Previously, these tests operated on two different Series. After this PR, they operate on a single DataFrame.

### Why are the changes needed?
A considerable number of DataTypeOps tests perform operations across different Series, which requires a join, and joins take a long time. Avoiding them shortens test duration. Most of the joins occur in the bool, string, and numeric DataTypeOps tests, so those are improved first.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Unit tests.

Closes #33402 from xinrong-databricks/datatypeops_diffframe.

Authored-by: Xinrong Meng <xinrong.meng@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
data_type_ops	[SPARK-36265][PYTHON] Use __getitem__ instead of getItem to suppress warnings	2021-07-23 11:27:31 +09:00
indexes	[SPARK-36249][PYTHON] Add remove_categories to CategoricalAccessor and CategoricalIndex	2021-07-22 17:06:12 +09:00
missing	[SPARK-36249][PYTHON] Add remove_categories to CategoricalAccessor and CategoricalIndex	2021-07-22 17:06:12 +09:00
plot	[SPARK-35344][PYTHON] Support creating a Column of numpy literals in pandas API on Spark	2021-06-28 19:03:42 -07:00
spark	[SPARK-35859][PYTHON] Cleanup type hints in pandas-on-Spark	2021-06-29 10:52:24 -07:00
tests	[SPARK-36189][PYTHON] Improve bool, string, numeric DataTypeOps tests by avoiding joins	2021-07-23 12:20:35 +09:00
typedef	[SPARK-36146][PYTHON][INFRA][TESTS] Upgrade Python version from 3.6 to 3.9 in GitHub Actions' linter/docs	2021-07-15 08:01:54 -07:00
usage_logging	[SPARK-35499][PYTHON] Apply black to pandas API on Spark codes	2021-06-06 17:30:07 -07:00
__init__.py	[SPARK-36253][PYTHON][DOCS] Add versionadded to the top of pandas-on-Spark package	2021-07-22 14:21:43 +09:00
_typing.py	[SPARK-35944][PYTHON] Introduce Name and Label type aliases	2021-07-01 09:40:07 +09:00
accessors.py	[SPARK-35944][PYTHON] Introduce Name and Label type aliases	2021-07-01 09:40:07 +09:00
base.py	[SPARK-36265][PYTHON] Use __getitem__ instead of getItem to suppress warnings	2021-07-23 11:27:31 +09:00
categorical.py	[SPARK-36249][PYTHON] Add remove_categories to CategoricalAccessor and CategoricalIndex	2021-07-22 17:06:12 +09:00
config.py	[SPARK-35499][PYTHON] Apply black to pandas API on Spark codes	2021-06-06 17:30:07 -07:00
datetimes.py	[SPARK-35453][PYTHON] Move Koalas accessor to pandas_on_spark accessor	2021-06-01 10:33:10 +09:00
exceptions.py	[SPARK-35465][PYTHON] Set up the mypy configuration to enable disallow_untyped_defs check for pandas APIs on Spark module	2021-05-21 11:03:35 -07:00
extensions.py	[SPARK-35859][PYTHON] Cleanup type hints in pandas-on-Spark	2021-06-29 10:52:24 -07:00
frame.py	[SPARK-36265][PYTHON] Use __getitem__ instead of getItem to suppress warnings	2021-07-23 11:27:31 +09:00
generic.py	[SPARK-35806][PYTHON] Mapping the `mode` argument to pandas in DataFrame.to_csv	2021-07-19 19:58:11 +09:00
groupby.py	[SPARK-35944][PYTHON] Introduce Name and Label type aliases	2021-07-01 09:40:07 +09:00
indexing.py	[SPARK-36146][PYTHON][INFRA][TESTS] Upgrade Python version from 3.6 to 3.9 in GitHub Actions' linter/docs	2021-07-15 08:01:54 -07:00
internal.py	[SPARK-36167][PYTHON] Revisit more InternalField managements	2021-07-15 19:25:20 -07:00
ml.py	[SPARK-36146][PYTHON][INFRA][TESTS] Upgrade Python version from 3.6 to 3.9 in GitHub Actions' linter/docs	2021-07-15 08:01:54 -07:00
mlflow.py	[SPARK-36146][PYTHON][INFRA][TESTS] Upgrade Python version from 3.6 to 3.9 in GitHub Actions' linter/docs	2021-07-15 08:01:54 -07:00
namespace.py	[SPARK-35810][PYTHON][FOLLWUP] Deprecate ps.broadcast API	2021-07-22 17:10:03 +09:00
numpy_compat.py	[SPARK-35344][PYTHON] Support creating a Column of numpy literals in pandas API on Spark	2021-06-28 19:03:42 -07:00
series.py	[SPARK-36167][PYTHON] Revisit more InternalField managements	2021-07-15 19:25:20 -07:00
sql_processor.py	[SPARK-35809][PYTHON] Add `index_col` argument for ps.sql	2021-07-22 17:08:34 +09:00
strings.py	[SPARK-35761][PYTHON] Use type-annotation based pandas_udf or avoid specifying udf types to suppress warnings	2021-06-15 11:17:56 +09:00
utils.py	[SPARK-35806][PYTHON] Mapping the `mode` argument to pandas in DataFrame.to_csv	2021-07-19 19:58:11 +09:00
window.py	[SPARK-35859][PYTHON] Cleanup type hints in pandas-on-Spark	2021-06-29 10:52:24 -07:00