ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Takuya UESHIN	d44e6c7f10	Revert "[SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures" This reverts commit `d1b24d8aba`.	2021-05-19 16:49:47 -07:00
Xinrong Meng	d1b24d8aba	[SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures ### What changes were proposed in this pull request? This PR is proposed for pandas APIs on Spark, in order to separate the arithmetic operations listed below into data-type-based structures. `__add__, __sub__, __mul__, __truediv__, __floordiv__, __pow__, __mod__, __radd__, __rsub__, __rmul__, __rtruediv__, __rfloordiv__, __rpow__, __rmod__` DataTypeOps and its subclasses are introduced. The existing behavior of each arithmetic operation should be preserved. ### Why are the changes needed? Currently, each arithmetic operation is defined in a single function that covers all data types, so it is difficult to extend or change its behavior for a specific data type. Introducing DataTypeOps would be the foundation for [pandas APIs on Spark: Separate basic operations into data type based structures.](https://docs.google.com/document/d/12MS6xK0hETYmrcl5b9pX5lgV4FmGVfpmcSKq--_oQlc/edit?usp=sharing). ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Tests are introduced under pyspark.pandas.tests.data_type_ops, with one test file per DataTypeOps class. Closes #32469 from xinrong-databricks/datatypeop_arith. Authored-by: Xinrong Meng <xinrong.meng@databricks.com> Signed-off-by: Takuya UESHIN <ueshin@databricks.com>	2021-05-19 15:05:32 -07:00
Xinrong Meng	4d2b559d92	[SPARK-34999][PYTHON] Consolidate PySpark testing utils ### What changes were proposed in this pull request? Consolidate PySpark testing utils by removing `python/pyspark/pandas/testing`, and then creating a file `pandasutils` under `python/pyspark/testing` for test utilities used in `pyspark/pandas`. ### Why are the changes needed? `python/pyspark/pandas/testing` holds test utilities for pandas-on-Spark, and `python/pyspark/testing` contains test utilities for PySpark. Consolidating them makes the code cleaner and easier to maintain. Updated import statements are as shown below: - from pyspark.testing.sqlutils import SQLTestUtils - from pyspark.testing.pandasutils import PandasOnSparkTestCase, TestUtils (PandasOnSparkTestCase is the original ReusedSQLTestCase in `python/pyspark/pandas/testing/utils.py`) Minor improvements include: - Usage of the missing library's requirement_message - `except ImportError` rather than a bare `except` - importing pyspark.pandas under the alias `ps` rather than `pp` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Unit tests under python/pyspark/pandas/tests. Closes #32177 from xinrong-databricks/port.merge_utils. Authored-by: Xinrong Meng <xinrong.meng@databricks.com> Signed-off-by: Takuya UESHIN <ueshin@databricks.com>	2021-04-22 13:07:35 -07:00
Xinrong Meng	4aee19efb4	[SPARK-35032][PYTHON] Port Koalas Index unit tests into PySpark ### What changes were proposed in this pull request? Now that we merged the Koalas main code into the PySpark code base (#32036), we should port the Koalas Index unit tests to PySpark. ### Why are the changes needed? Currently, the pandas-on-Spark modules are not tested fully. We should enable the Index unit tests. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Enable Index unit tests. Closes #32139 from xinrong-databricks/port.indexes_tests. Authored-by: Xinrong Meng <xinrong.meng@databricks.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2021-04-16 08:53:30 +09:00

4 commits
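
The SPARK-35338 message above describes moving arithmetic operations out of one shared function and into data-type-based structures. A minimal sketch of that idea follows, under stated assumptions: only the name `DataTypeOps` comes from the commit message. The subclasses `NumericOps` and `StringOps`, the helper `ops_for`, and the toy `Series` class are hypothetical names chosen for illustration, and plain Python lists stand in for Spark columns. This is not the code from the PR.

```python
class DataTypeOps:
    """Base class: every arithmetic operation is rejected by default."""

    pretty_name = "objects"

    def add(self, values, other):
        raise TypeError(f"Addition can not be applied to {self.pretty_name}.")

    def mul(self, values, other):
        raise TypeError(f"Multiplication can not be applied to {self.pretty_name}.")


class NumericOps(DataTypeOps):
    """Hypothetical subclass holding the rules for numeric data."""

    pretty_name = "numerics"

    def add(self, values, other):
        if isinstance(other, str):
            raise TypeError("A string can not be added to numerics.")
        return [v + other for v in values]

    def mul(self, values, other):
        if isinstance(other, str):
            raise TypeError("Numerics can not be multiplied by a string.")
        return [v * other for v in values]


class StringOps(DataTypeOps):
    """Hypothetical subclass holding the rules for string data."""

    pretty_name = "strings"

    def add(self, values, other):
        if not isinstance(other, str):
            raise TypeError("Only a string can be added to strings.")
        return [v + other for v in values]

    def mul(self, values, other):
        if isinstance(other, bool) or not isinstance(other, int):
            raise TypeError("Strings can only be multiplied by an integer.")
        return [v * other for v in values]


def ops_for(values):
    """Pick the ops object from the element types (hypothetical helper)."""
    if all(isinstance(v, str) for v in values):
        return StringOps()
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return NumericOps()
    return DataTypeOps()


class Series:
    """Toy container whose operators delegate to the data-type ops object."""

    def __init__(self, values):
        self.values = list(values)
        self._ops = ops_for(self.values)

    def __add__(self, other):
        return Series(self._ops.add(self.values, other))

    def __mul__(self, other):
        return Series(self._ops.mul(self.values, other))
```

In this sketch the operator methods on `Series` stay one line each, and changing how a given data type behaves means editing only its ops class, which is the extensibility argument the commit message makes.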