16990f9299
## What changes were proposed in this pull request?

Upgrade Apache Arrow to version 0.12.0. This includes the Java artifacts and fixes to enable usage with pyarrow 0.12.0.

Version 0.12.0 includes the following selected fixes/improvements relevant to Spark users:

* Safe cast fails from numpy float64 array with nans to integer, ARROW-4258
* Java, Reduce heap usage for variable width vectors, ARROW-4147
* Binary identity cast not implemented, ARROW-4101
* pyarrow open_stream deprecated, use ipc.open_stream, ARROW-4098
* conversion to date object no longer needed, ARROW-3910
* Error reading IPC file with no record batches, ARROW-3894
* Signed to unsigned integer cast yields incorrect results when type sizes are the same, ARROW-3790
* from_pandas gives incorrect results when converting floating point to bool, ARROW-3428
* Import pyarrow fails if scikit-learn is installed from conda (boost-cpp / libboost issue), ARROW-3048
* Java update to official Flatbuffers version 1.9.0, ARROW-3175

Complete list [here](https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.12.0)

PySpark requires the following fixes to work with PyArrow 0.12.0:

* Encrypted pyspark worker fails due to ChunkedStream missing closed property
* pyarrow now converts dates as objects by default, which causes error because type is assumed datetime64
* ArrowTests fails due to difference in raised error message
* pyarrow.open_stream deprecated
* tests fail because groupby adds index column with duplicate name

## How was this patch tested?

Ran unit tests with pyarrow versions 0.8.0, 0.10.0, 0.11.1, 0.12.0

Closes #23657 from BryanCutler/arrow-upgrade-012.

Authored-by: Bryan Cutler <cutlerb@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
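To illustrate the first PySpark fix listed above (a stream wrapper missing a `closed` property), here is a minimal standard-library sketch. The class name `ChunkedWriter`, its constructor arguments, and its chunking behaviour are hypothetical and are not Spark's actual `ChunkedStream`. The sketch only shows the general idea: a wrapper around a file-like sink should delegate `closed` to the wrapped object so that consumers which inspect that attribute keep working.

```python
import io


class ChunkedWriter(object):
    """Hypothetical write-only wrapper that forwards data in fixed-size chunks.

    This is an illustration only, not Spark's ChunkedStream.
    """

    def __init__(self, wrapped, chunk_size):
        self.wrapped = wrapped          # underlying file-like sink
        self.chunk_size = chunk_size    # number of bytes forwarded per chunk
        self.buffer = bytearray()       # bytes not yet forwarded

    def write(self, data):
        # Buffer incoming bytes and forward every complete chunk.
        self.buffer.extend(data)
        while len(self.buffer) >= self.chunk_size:
            chunk = bytes(self.buffer[:self.chunk_size])
            del self.buffer[:self.chunk_size]
            self.wrapped.write(chunk)
        return len(data)

    @property
    def closed(self):
        # Delegate to the wrapped sink so callers that check `closed`
        # see the real state instead of hitting an AttributeError.
        return self.wrapped.closed

    def close(self):
        # Forward any remaining partial chunk, then close the sink.
        if self.buffer:
            self.wrapped.write(bytes(self.buffer))
            self.buffer = bytearray()
        self.wrapped.close()


if __name__ == "__main__":
    sink = io.BytesIO()
    writer = ChunkedWriter(sink, chunk_size=4)
    writer.write(b"abcdef")
    print(writer.closed)
    writer.close()
    print(writer.closed)
```

Delegating the property rather than tracking a separate flag keeps the wrapper consistent with the sink even if the sink is closed from elsewhere.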
* ml
* mllib
* sql
* streaming
* testing
* tests
* __init__.py
* _globals.py
* accumulators.py
* broadcast.py
* cloudpickle.py
* conf.py
* context.py
* daemon.py
* files.py
* find_spark_home.py
* heapq3.py
* java_gateway.py
* join.py
* profiler.py
* rdd.py
* rddsampler.py
* resultiterable.py
* serializers.py
* shell.py
* shuffle.py
* statcounter.py
* status.py
* storagelevel.py
* taskcontext.py
* traceback_utils.py
* util.py
* version.py
* worker.py