spark-instrumented-optimizer

History

Bryan Cutler 43d9c7e7e5 [SPARK-30640][PYTHON][SQL] Prevent unnecessary copies of data during Arrow to Pandas conversion ### What changes were proposed in this pull request? Prevent unnecessary copies of data during conversion from Arrow to Pandas. ### Why are the changes needed? During conversion of pyarrow data to Pandas, columns are checked for timestamp types and then modified to correct for local timezone. If the data contains no timestamp types, then unnecessary copies of the data can be made. This is most prevalent when checking columns of a pandas DataFrame where each series is assigned back to the DataFrame, regardless if it had timestamps. See https://www.mail-archive.com/devarrow.apache.org/msg17008.html and ARROW-7596 for discussion. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests Closes #27358 from BryanCutler/pyspark-pandas-timestamp-copy-fix-SPARK-30640. Authored-by: Bryan Cutler <cutlerb@gmail.com> Signed-off-by: Bryan Cutler <cutlerb@gmail.com>		2020-01-26 15:21:06 -08:00
..
avro	[SPARK-27506][SQL][FOLLOWUP] Use option `avroSchema` to specify an evolved schema in `from_avro`	2019-12-30 18:14:21 +09:00
pandas	[SPARK-30640][PYTHON][SQL] Prevent unnecessary copies of data during Arrow to Pandas conversion	2020-01-26 15:21:06 -08:00
tests	[SPARK-30607][SQL][PYSPARK][SPARKR] Add overlay wrappers for SparkR and PySpark	2020-01-23 16:16:47 +09:00
__init__.py	[SPARK-30434][PYTHON][SQL] Move pandas related functionalities into 'pandas' sub-package	2020-01-09 10:22:50 +09:00
catalog.py	[SPARK-28980][CORE][SQL][STREAMING][MLLIB] Remove most items deprecated in Spark 2.2.0 or earlier, for Spark 3	2019-09-09 10:19:40 -05:00
column.py	[SPARK-30188][SQL] Resolve the failed unit tests when enable AQE	2020-01-13 22:55:19 +08:00
conf.py	[SPARK-23698][PYTHON] Resolve undefined names in Python 3	2018-08-22 10:06:59 -07:00
context.py	[MINOR][PYSPARK][DOCS] Fix typo in example documentation	2019-11-01 11:55:29 -07:00
dataframe.py	[SPARK-30539][PYTHON][SQL] Add DataFrame.tail in PySpark	2020-01-18 00:18:12 -08:00
functions.py	Revert "[SPARK-25496][SQL] Deprecate from_utc_timestamp and to_utc_timestamp"	2020-01-25 21:34:12 -08:00
group.py	[SPARK-30434][PYTHON][SQL] Move pandas related functionalities into 'pandas' sub-package	2020-01-09 10:22:50 +09:00
readwriter.py	[SPARK-30128][DOCS][PYTHON][SQL] Document/promote 'recursiveFileLookup' and 'pathGlobFilter' in file sources 'mergeSchema' in ORC	2019-12-23 09:57:42 +09:00
session.py	[SPARK-30434][FOLLOW-UP][PYTHON][SQL] Make the parameter list consistent in createDataFrame	2020-01-16 12:39:44 +09:00
streaming.py	[SPARK-30128][DOCS][PYTHON][SQL] Document/promote 'recursiveFileLookup' and 'pathGlobFilter' in file sources 'mergeSchema' in ORC	2019-12-23 09:57:42 +09:00
types.py	[SPARK-30252][SQL] Disallow negative scale of Decimal	2020-01-21 21:09:48 +08:00
udf.py	[SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types	2020-01-22 15:32:58 +09:00
utils.py	[SPARK-30434][PYTHON][SQL] Move pandas related functionalities into 'pandas' sub-package	2020-01-09 10:22:50 +09:00
window.py	[SPARK-30188][SQL] Resolve the failed unit tests when enable AQE	2020-01-13 22:55:19 +08:00