spark-instrumented-optimizer

Takuya UESHIN 8edae94fa7 [SPARK-26355][PYSPARK] Add a workaround for PyArrow 0.11.		2018-12-13 13:14:59 +08:00

## What changes were proposed in this pull request?

In PyArrow 0.11, there is an API-breaking change.

- [ARROW-1949](https://issues.apache.org/jira/browse/ARROW-1949) - [Python/C++] Add option to Array.from_pandas and pyarrow.array to perform unsafe casts.

This causes test failures in `ScalarPandasUDFTests.test_vectorized_udf_null_(byte\|short\|int\|long)`:

```
  File "/Users/ueshin/workspace/apache-spark/spark/python/pyspark/worker.py", line 377, in main
    process()
  File "/Users/ueshin/workspace/apache-spark/spark/python/pyspark/worker.py", line 372, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/Users/ueshin/workspace/apache-spark/spark/python/pyspark/serializers.py", line 317, in dump_stream
    batch = _create_batch(series, self._timezone)
  File "/Users/ueshin/workspace/apache-spark/spark/python/pyspark/serializers.py", line 286, in _create_batch
    arrs = [create_array(s, t) for s, t in series]
  File "/Users/ueshin/workspace/apache-spark/spark/python/pyspark/serializers.py", line 284, in create_array
    return pa.Array.from_pandas(s, mask=mask, type=t)
  File "pyarrow/array.pxi", line 474, in pyarrow.lib.Array.from_pandas
    return array(obj, mask=mask, type=type, safe=safe, from_pandas=True,
  File "pyarrow/array.pxi", line 169, in pyarrow.lib.array
    return _ndarray_to_array(values, mask, type, from_pandas, safe,
  File "pyarrow/array.pxi", line 69, in pyarrow.lib._ndarray_to_array
    check_status(NdarrayToArrow(pool, values, mask, from_pandas,
  File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
    raise ArrowInvalid(message)
ArrowInvalid: Floating point value truncated
```

We should add a workaround to support PyArrow 0.11.

## How was this patch tested?

In my local environment.

Closes #23305 from ueshin/issues/SPARK-26355/pyarrow_0.11.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
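The commit message above says only that a workaround is needed; it does not show the change itself. One plausible shape for such a workaround, sketched here as an assumption rather than as the actual patch, is to pass `safe=False` to `pa.Array.from_pandas` only when the installed PyArrow is 0.11 or newer, because older releases do not accept that keyword. The helper names `parse_version` and `from_pandas_kwargs` are hypothetical and exist only for this illustration.

```python
import re


def parse_version(version):
    """Turn a version string such as '0.11.0' into a tuple of ints.

    Parsing stops at the first component that does not start with a digit,
    so '0.11.1.dev3' becomes (0, 11, 1).
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def from_pandas_kwargs(pyarrow_version, mask=None, arrow_type=None):
    """Build keyword arguments for pa.Array.from_pandas (hypothetical helper).

    Assumption: on PyArrow >= 0.11 the 'safe' keyword exists and defaults to
    a checked cast, so we opt out with safe=False to keep the pre-0.11
    behaviour. On older versions the keyword is omitted entirely.
    """
    kwargs = {"mask": mask, "type": arrow_type}
    if parse_version(pyarrow_version) >= (0, 11):
        kwargs["safe"] = False
    return kwargs
```

With PyArrow installed, the call in the traceback could then be written as `pa.Array.from_pandas(s, **from_pandas_kwargs(pa.__version__, mask, t))`. Gating on the version rather than catching a `TypeError` keeps the intent explicit and avoids masking unrelated errors.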
__init__.py
test_appsubmit.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_arrow.py	[SPARK-25274][PYTHON][SQL] In toPandas with Arrow send un-ordered record batches to improve performance	2018-12-06 10:07:28 -08:00
test_catalog.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_column.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_conf.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_context.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_dataframe.py	[SPARK-23647][PYTHON][SQL] Adds more types for hint in pyspark	2018-12-01 10:37:03 +08:00
test_datasources.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_functions.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_group.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_pandas_udf.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_pandas_udf_grouped_agg.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_pandas_udf_grouped_map.py	[SPARK-26355][PYSPARK] Add a workaround for PyArrow 0.11.	2018-12-13 13:14:59 +08:00
test_pandas_udf_scalar.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_pandas_udf_window.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_readwriter.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_serde.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_session.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_streaming.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_types.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_udf.py	[SPARK-26293][SQL] Cast exception when having python udf in subquery	2018-12-11 14:16:51 +08:00
test_utils.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00