spark-instrumented-optimizer/python/pyspark/pandas
Xinrong Meng 75fd1f5b82 [SPARK-36189][PYTHON] Improve bool, string, numeric DataTypeOps tests by avoiding joins
### What changes were proposed in this pull request?
Improve bool, string, numeric DataTypeOps tests by avoiding joins.

Previously, bool, string, and numeric DataTypeOps tests were conducted between two different Series.
After this PR, bool, string, and numeric DataTypeOps tests are performed on a single DataFrame.

### Why are the changes needed?
A considerable number of DataTypeOps tests operate on different Series, which requires a join and takes a long time.
Avoiding those joins shortens the test duration.

The majority of joins happen in bool, string, numeric DataTypeOps tests, so we improve them first.
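
A minimal sketch of the pattern described above, not code from the PR itself. It uses plain pandas so it runs without a Spark session; the column names `this` and `that` and the sample values are illustrative assumptions. In pandas-on-Spark the same frame would be converted with `ps.from_pandas(pdf)`, and the "before" form would need the `compute.ops_on_diff_frames` option because the two Series are anchored to different DataFrames.

```python
import pandas as pd

# Before: two independent Series. In pandas-on-Spark, combining Series that
# belong to different DataFrames is implemented with a join on the index.
left = pd.Series([1, 2, 3], name="this")
right = pd.Series([True, False, True], name="that")
expected = left + right

# After: both operands are columns of one DataFrame, so the binary operation
# stays within a single frame and no join is needed.
# (Column names "this" and "that" are hypothetical, chosen for illustration.)
pdf = pd.DataFrame({"this": [1, 2, 3], "that": [True, False, True]})
result = pdf["this"] + pdf["that"]

# Both forms produce the same values; only the cost of computing them differs
# once the data is backed by Spark.
print(result.tolist())
```

The single-frame form keeps the assertions of a test unchanged while moving the operands into one DataFrame, which is the trade-off the commit message describes.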

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Unit tests.

Closes #33402 from xinrong-databricks/datatypeops_diffframe.

Authored-by: Xinrong Meng <xinrong.meng@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-07-23 12:20:35 +09:00
..
data_type_ops [SPARK-36265][PYTHON] Use __getitem__ instead of getItem to suppress warnings 2021-07-23 11:27:31 +09:00
indexes [SPARK-36249][PYTHON] Add remove_categories to CategoricalAccessor and CategoricalIndex 2021-07-22 17:06:12 +09:00
missing [SPARK-36249][PYTHON] Add remove_categories to CategoricalAccessor and CategoricalIndex 2021-07-22 17:06:12 +09:00
plot [SPARK-35344][PYTHON] Support creating a Column of numpy literals in pandas API on Spark 2021-06-28 19:03:42 -07:00
spark [SPARK-35859][PYTHON] Cleanup type hints in pandas-on-Spark 2021-06-29 10:52:24 -07:00
tests [SPARK-36189][PYTHON] Improve bool, string, numeric DataTypeOps tests by avoiding joins 2021-07-23 12:20:35 +09:00
typedef [SPARK-36146][PYTHON][INFRA][TESTS] Upgrade Python version from 3.6 to 3.9 in GitHub Actions' linter/docs 2021-07-15 08:01:54 -07:00
usage_logging [SPARK-35499][PYTHON] Apply black to pandas API on Spark codes 2021-06-06 17:30:07 -07:00
__init__.py [SPARK-36253][PYTHON][DOCS] Add versionadded to the top of pandas-on-Spark package 2021-07-22 14:21:43 +09:00
_typing.py [SPARK-35944][PYTHON] Introduce Name and Label type aliases 2021-07-01 09:40:07 +09:00
accessors.py [SPARK-35944][PYTHON] Introduce Name and Label type aliases 2021-07-01 09:40:07 +09:00
base.py [SPARK-36265][PYTHON] Use __getitem__ instead of getItem to suppress warnings 2021-07-23 11:27:31 +09:00
categorical.py [SPARK-36249][PYTHON] Add remove_categories to CategoricalAccessor and CategoricalIndex 2021-07-22 17:06:12 +09:00
config.py [SPARK-35499][PYTHON] Apply black to pandas API on Spark codes 2021-06-06 17:30:07 -07:00
datetimes.py [SPARK-35453][PYTHON] Move Koalas accessor to pandas_on_spark accessor 2021-06-01 10:33:10 +09:00
exceptions.py [SPARK-35465][PYTHON] Set up the mypy configuration to enable disallow_untyped_defs check for pandas APIs on Spark module 2021-05-21 11:03:35 -07:00
extensions.py [SPARK-35859][PYTHON] Cleanup type hints in pandas-on-Spark 2021-06-29 10:52:24 -07:00
frame.py [SPARK-36265][PYTHON] Use __getitem__ instead of getItem to suppress warnings 2021-07-23 11:27:31 +09:00
generic.py [SPARK-35806][PYTHON] Mapping the mode argument to pandas in DataFrame.to_csv 2021-07-19 19:58:11 +09:00
groupby.py [SPARK-35944][PYTHON] Introduce Name and Label type aliases 2021-07-01 09:40:07 +09:00
indexing.py [SPARK-36146][PYTHON][INFRA][TESTS] Upgrade Python version from 3.6 to 3.9 in GitHub Actions' linter/docs 2021-07-15 08:01:54 -07:00
internal.py [SPARK-36167][PYTHON] Revisit more InternalField managements 2021-07-15 19:25:20 -07:00
ml.py [SPARK-36146][PYTHON][INFRA][TESTS] Upgrade Python version from 3.6 to 3.9 in GitHub Actions' linter/docs 2021-07-15 08:01:54 -07:00
mlflow.py [SPARK-36146][PYTHON][INFRA][TESTS] Upgrade Python version from 3.6 to 3.9 in GitHub Actions' linter/docs 2021-07-15 08:01:54 -07:00
namespace.py [SPARK-35810][PYTHON][FOLLWUP] Deprecate ps.broadcast API 2021-07-22 17:10:03 +09:00
numpy_compat.py [SPARK-35344][PYTHON] Support creating a Column of numpy literals in pandas API on Spark 2021-06-28 19:03:42 -07:00
series.py [SPARK-36167][PYTHON] Revisit more InternalField managements 2021-07-15 19:25:20 -07:00
sql_processor.py [SPARK-35809][PYTHON] Add index_col argument for ps.sql 2021-07-22 17:08:34 +09:00
strings.py [SPARK-35761][PYTHON] Use type-annotation based pandas_udf or avoid specifying udf types to suppress warnings 2021-06-15 11:17:56 +09:00
utils.py [SPARK-35806][PYTHON] Mapping the mode argument to pandas in DataFrame.to_csv 2021-07-19 19:58:11 +09:00
window.py [SPARK-35859][PYTHON] Cleanup type hints in pandas-on-Spark 2021-06-29 10:52:24 -07:00