spark-instrumented-optimizer/python/pyspark
Xinrong Meng d1b24d8aba [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures
### What changes were proposed in this pull request?

The PR is proposed for **pandas APIs on Spark**, in order to separate arithmetic operations shown as below into data-type-based structures.
`__add__, __sub__, __mul__, __truediv__, __floordiv__, __pow__, __mod__,
__radd__, __rsub__, __rmul__, __rtruediv__, __rfloordiv__, __rpow__,__rmod__`

DataTypeOps and subclasses are introduced.

The existing behaviors of each arithmetic operation should be preserved.

### Why are the changes needed?

Currently, the same arithmetic operation of all data types is defined in one function, so it’s difficult to extend the behavior change based on the data types.

Introducing DataTypeOps would be the foundation for [pandas APIs on Spark: Separate basic operations into data type based structures.](https://docs.google.com/document/d/12MS6xK0hETYmrcl5b9pX5lgV4FmGVfpmcSKq--_oQlc/edit?usp=sharing).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Tests are introduced under pyspark.pandas.tests.data_type_ops. One test file per DataTypeOps class.

Closes #32469 from xinrong-databricks/datatypeop_arith.

Authored-by: Xinrong Meng <xinrong.meng@databricks.com>
Signed-off-by: Takuya UESHIN <ueshin@databricks.com>
2021-05-19 15:05:32 -07:00
..
cloudpickle [SPARK-33983][PYTHON] Update cloudpickle to v1.6.0 2021-01-04 10:36:31 -08:00
ml [MINOR][DOCS] Add required imports to CV, train validation split Pyspark ML examples 2021-05-15 08:13:54 -05:00
mllib [SPARK-35176][PYTHON] Standardize input validation error type 2021-05-03 15:34:24 +09:00
pandas [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures 2021-05-19 15:05:32 -07:00
resource [SPARK-32320][PYSPARK] Remove mutable default arguments 2020-12-08 09:35:36 +08:00
sql [SPARK-35418][SQL] Add sentences function to functions.{scala,py} 2021-05-19 20:07:28 +09:00
streaming [MINOR] update dstream.py with more accurate exceptions 2020-12-21 14:17:09 -08:00
testing [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures 2021-05-19 15:05:32 -07:00
tests [SPARK-34604][PYTHON][TESTS] Use eventually in TaskContextTestsWithWorkerReuse.test_task_context_correct_with_python_worker_reuse 2021-03-04 08:40:48 +09:00
__init__.py [SPARK-34630][PYTHON][FOLLOWUP] Add __version__ into pyspark init __all__ 2021-04-14 23:36:25 +02:00
__init__.pyi [SPARK-34630][PYTHON][FOLLOWUP] Add __version__ into pyspark init __all__ 2021-04-14 23:36:25 +02:00
_globals.py [SPARK-23328][PYTHON] Disallow default value None in na.replace/replace when 'to_replace' is not a dictionary 2018-02-09 14:21:10 +08:00
_typing.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
accumulators.py [SPARK-33254][PYTHON][DOCS] Migration to NumPy documentation style in Core (pyspark.*, pyspark.resource.*, etc.) 2020-11-16 10:21:50 +09:00
accumulators.pyi [SPARK-33002][PYTHON] Remove non-API annotations 2020-10-07 19:53:59 +09:00
broadcast.py [SPARK-33254][PYTHON][DOCS] Migration to NumPy documentation style in Core (pyspark.*, pyspark.resource.*, etc.) 2020-11-16 10:21:50 +09:00
broadcast.pyi [SPARK-33457][PYTHON] Adjust mypy configuration 2020-11-25 09:27:04 +09:00
conf.py [SPARK-33254][PYTHON][DOCS] Migration to NumPy documentation style in Core (pyspark.*, pyspark.resource.*, etc.) 2020-11-16 10:21:50 +09:00
conf.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
context.py [SPARK-32447][CORE][PYTHON][FOLLOW-UP] Fix other occurrences of 'python' to 'python3' 2020-12-13 10:41:47 +09:00
context.pyi [SPARK-33457][PYTHON] Adjust mypy configuration 2020-11-25 09:27:04 +09:00
daemon.py [SPARK-26175][PYTHON] Redirect the standard input of the forked child to devnull in daemon 2019-07-31 09:10:24 +09:00
files.py [SPARK-28206][PYTHON] Remove the legacy Epydoc in PySpark API documentation 2019-07-05 10:08:22 -07:00
files.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
find_spark_home.py [SPARK-32017][PYTHON][FOLLOW-UP] Rename HADOOP_VERSION to PYSPARK_HADOOP_VERSION in pip installation option 2021-01-05 17:21:32 +09:00
install.py [SPARK-33254][PYTHON][DOCS] Migration to NumPy documentation style in Core (pyspark.*, pyspark.resource.*, etc.) 2020-11-16 10:21:50 +09:00
java_gateway.py Spelling r common dev mlib external project streaming resource managers python 2020-11-27 10:22:45 -06:00
join.py [SPARK-14202] [PYTHON] Use generator expression instead of list comp in python_full_outer_jo… 2016-03-28 14:51:36 -07:00
profiler.py [SPARK-33254][PYTHON][DOCS] Migration to NumPy documentation style in Core (pyspark.*, pyspark.resource.*, etc.) 2020-11-16 10:21:50 +09:00
profiler.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
py.typed [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
rdd.py [SPARK-33730][PYTHON] Standardize warning types 2021-01-18 09:32:55 +09:00
rdd.pyi [SPARK-33457][PYTHON] Adjust mypy configuration 2020-11-25 09:27:04 +09:00
rddsampler.py [SPARK-4897] [PySpark] Python 3 support 2015-04-16 16:20:57 -07:00
resultiterable.py [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 2020-07-14 11:22:44 +09:00
resultiterable.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
serializers.py [SPARK-33254][PYTHON][DOCS] Migration to NumPy documentation style in Core (pyspark.*, pyspark.resource.*, etc.) 2020-11-16 10:21:50 +09:00
shell.py [SPARK-33363] Add prompt information related to the current task when pyspark/sparkR starts 2020-11-10 11:12:19 +09:00
shuffle.py Spelling r common dev mlib external project streaming resource managers python 2020-11-27 10:22:45 -06:00
statcounter.py [SPARK-33254][PYTHON][DOCS] Migration to NumPy documentation style in Core (pyspark.*, pyspark.resource.*, etc.) 2020-11-16 10:21:50 +09:00
statcounter.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
status.py [SPARK-4172] [PySpark] Progress API in Python 2015-02-17 13:36:43 -08:00
status.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
storagelevel.py [SPARK-31448][PYTHON] Fix storage level used in persist() in dataframe.py 2020-09-15 08:41:22 -05:00
storagelevel.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
taskcontext.py [SPARK-35176][PYTHON] Standardize input validation error type 2021-05-03 15:34:24 +09:00
taskcontext.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
traceback_utils.py [SPARK-1087] Move python traceback utilities into new traceback_utils.py file. 2014-09-15 19:28:17 -07:00
util.py [SPARK-33407][PYTHON] Simplify the exception message from Python UDFs (disabled by default) 2020-11-17 14:15:31 +09:00
version.py [SPARK-33662][BUILD] Setting version to 3.2.0-SNAPSHOT 2020-12-04 14:10:42 -08:00
version.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
worker.py [SPARK-33730][PYTHON][FOLLOW-UP] Consider the case when the current frame is None 2021-01-19 15:30:42 +09:00