e0538bd38c
### What changes were proposed in this pull request? Upgrade Apache Arrow to version 1.0.1 for the Java dependency and increase minimum version of PyArrow to 1.0.0. This release marks a transition to binary stability of the columnar format (which was already informally backward-compatible going back to December 2017) and a transition to Semantic Versioning for the Arrow software libraries. Also note that the Java arrow-memory artifact has been split to separate dependence on netty-buffer and allow users to select an allocator. Spark will continue to use `arrow-memory-netty` to maintain performance benefits. Version 1.0.0 - 1.0.0 include the following selected fixes/improvements relevant to Spark users: ARROW-9300 - [Java] Separate Netty Memory to its own module ARROW-9272 - [C++][Python] Reduce complexity in python to arrow conversion ARROW-9016 - [Java] Remove direct references to Netty/Unsafe Allocators ARROW-8664 - [Java] Add skip null check to all Vector types ARROW-8485 - [Integration][Java] Implement extension types integration ARROW-8434 - [C++] Ipc RecordBatchFileReader deserializes the Schema multiple times ARROW-8314 - [Python] Provide a method to select a subset of columns of a Table ARROW-8230 - [Java] Move Netty memory manager into a separate module ARROW-8229 - [Java] Move ArrowBuf into the Arrow package ARROW-7955 - [Java] Support large buffer for file/stream IPC ARROW-7831 - [Java] unnecessary buffer allocation when calling splitAndTransferTo on variable width vectors ARROW-6111 - [Java] Support LargeVarChar and LargeBinary types and add integration test with C++ ARROW-6110 - [Java] Support LargeList Type and add integration test with C++ ARROW-5760 - [C++] Optimize Take implementation ARROW-300 - [Format] Add body buffer compression option to IPC message protocol using LZ4 or ZSTD ARROW-9098 - RecordBatch::ToStructArray cannot handle record batches with 0 column ARROW-9066 - [Python] Raise correct error in isnull() ARROW-9223 - [Python] Fix to_pandas() export for timestamps within structs ARROW-9195 - [Java] Wrong usage of Unsafe.get from bytearray in ByteFunctionsHelper class ARROW-7610 - [Java] Finish support for 64 bit int allocations ARROW-8115 - [Python] Conversion when mixing NaT and datetime objects not working ARROW-8392 - [Java] Fix overflow related corner cases for vector value comparison ARROW-8537 - [C++] Performance regression from ARROW-8523 ARROW-8803 - [Java] Row count should be set before loading buffers in VectorLoader ARROW-8911 - [C++] Slicing a ChunkedArray with zero chunks segfaults View release notes here: https://arrow.apache.org/release/1.0.1.html https://arrow.apache.org/release/1.0.0.html ### Why are the changes needed? Upgrade brings fixes, improvements and stability guarantees. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing tests with pyarrow 1.0.0 and 1.0.1 Closes #29686 from BryanCutler/arrow-upgrade-100-SPARK-32312. Authored-by: Bryan Cutler <cutlerb@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org> |
||
---|---|---|
.. | ||
_static | ||
_templates | ||
development | ||
getting_started | ||
migration_guide | ||
reference | ||
user_guide | ||
conf.py | ||
index.rst |