spark-instrumented-optimizer

Bryan Cutler 16990f9299 [SPARK-26566][PYTHON][SQL] Upgrade Apache Arrow to version 0.12.0

## What changes were proposed in this pull request?

Upgrade Apache Arrow to version 0.12.0. This includes the Java artifacts and fixes to enable usage with pyarrow 0.12.0.

Version 0.12.0 includes the following selected fixes/improvements relevant to Spark users:

* Safe cast fails from numpy float64 array with nans to integer, ARROW-4258
* Java, Reduce heap usage for variable width vectors, ARROW-4147
* Binary identity cast not implemented, ARROW-4101
* pyarrow open_stream deprecated, use ipc.open_stream, ARROW-4098
* conversion to date object no longer needed, ARROW-3910
* Error reading IPC file with no record batches, ARROW-3894
* Signed to unsigned integer cast yields incorrect results when type sizes are the same, ARROW-3790
* from_pandas gives incorrect results when converting floating point to bool, ARROW-3428
* Import pyarrow fails if scikit-learn is installed from conda (boost-cpp / libboost issue), ARROW-3048
* Java update to official Flatbuffers version 1.9.0, ARROW-3175

Complete list [here](https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.12.0)

PySpark requires the following fixes to work with PyArrow 0.12.0:

* Encrypted pyspark worker fails due to ChunkedStream missing closed property
* pyarrow now converts dates as objects by default, which causes error because type is assumed datetime64
* ArrowTests fails due to difference in raised error message
* pyarrow.open_stream deprecated
* tests fail because groupby adds index column with duplicate name

## How was this patch tested?

Ran unit tests with pyarrow versions 0.8.0, 0.10.0, 0.11.1, 0.12.0.

Closes #23657 from BryanCutler/arrow-upgrade-012.

Authored-by: Bryan Cutler <cutlerb@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

2019-01-29 14:18:45 +08:00
__init__.py	[SPARK-26032][PYTHON] Break large sql/tests.py files into smaller files	2018-11-14 14:51:11 +08:00
test_appsubmit.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_arrow.py	[SPARK-26566][PYTHON][SQL] Upgrade Apache Arrow to version 0.12.0	2019-01-29 14:18:45 +08:00
test_catalog.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_column.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_conf.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_context.py	[SPARK-26676][PYTHON] Make HiveContextSQLTests.test_unbounded_frames test compatible with Python 2 and PyPy	2019-01-21 14:27:17 -08:00
test_dataframe.py	[SPARK-23647][PYTHON][SQL] Adds more types for hint in pyspark	2018-12-01 10:37:03 +08:00
test_datasources.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_functions.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_group.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_pandas_udf.py	[SPARK-25811][PYSPARK] Raise a proper error when unsafe cast is detected by PyArrow	2019-01-22 14:54:41 +08:00
test_pandas_udf_grouped_agg.py	[SPARK-26364][PYTHON][TESTING] Clean up imports in test_pandas_udf*	2018-12-14 10:45:24 +08:00
test_pandas_udf_grouped_map.py	[SPARK-26566][PYTHON][SQL] Upgrade Apache Arrow to version 0.12.0	2019-01-29 14:18:45 +08:00
test_pandas_udf_scalar.py	[SPARK-26364][PYTHON][TESTING] Clean up imports in test_pandas_udf*	2018-12-14 10:45:24 +08:00
test_pandas_udf_window.py	[SPARK-24561][SQL][PYTHON] User-defined window aggregation functions with Pandas UDF (bounded window)	2018-12-18 09:15:21 +08:00
test_readwriter.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_serde.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_session.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_streaming.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_types.py	[SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype	2019-01-20 17:43:50 +08:00
test_udf.py	[SPARK-26293][SQL] Cast exception when having python udf in subquery	2018-12-11 14:16:51 +08:00
test_utils.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00