e0538bd38c
### What changes were proposed in this pull request?

Upgrade Apache Arrow to version 1.0.1 for the Java dependency and raise the minimum version of PyArrow to 1.0.0. This release marks a transition to binary stability of the columnar format (which had already been informally backward-compatible going back to December 2017) and a transition to Semantic Versioning for the Arrow software libraries.

Also note that the Java `arrow-memory` artifact has been split to separate the dependence on `netty-buffer` and allow users to select an allocator. Spark will continue to use `arrow-memory-netty` to maintain performance benefits.

Versions 1.0.0 and 1.0.1 include the following selected fixes/improvements relevant to Spark users:

* ARROW-9300 - [Java] Separate Netty Memory to its own module
* ARROW-9272 - [C++][Python] Reduce complexity in python to arrow conversion
* ARROW-9016 - [Java] Remove direct references to Netty/Unsafe Allocators
* ARROW-8664 - [Java] Add skip null check to all Vector types
* ARROW-8485 - [Integration][Java] Implement extension types integration
* ARROW-8434 - [C++] Ipc RecordBatchFileReader deserializes the Schema multiple times
* ARROW-8314 - [Python] Provide a method to select a subset of columns of a Table
* ARROW-8230 - [Java] Move Netty memory manager into a separate module
* ARROW-8229 - [Java] Move ArrowBuf into the Arrow package
* ARROW-7955 - [Java] Support large buffer for file/stream IPC
* ARROW-7831 - [Java] Unnecessary buffer allocation when calling splitAndTransferTo on variable width vectors
* ARROW-6111 - [Java] Support LargeVarChar and LargeBinary types and add integration test with C++
* ARROW-6110 - [Java] Support LargeList Type and add integration test with C++
* ARROW-5760 - [C++] Optimize Take implementation
* ARROW-300 - [Format] Add body buffer compression option to IPC message protocol using LZ4 or ZSTD
* ARROW-9098 - RecordBatch::ToStructArray cannot handle record batches with 0 column
* ARROW-9066 - [Python] Raise correct error in isnull()
* ARROW-9223 - [Python] Fix to_pandas() export for timestamps within structs
* ARROW-9195 - [Java] Wrong usage of Unsafe.get from bytearray in ByteFunctionsHelper class
* ARROW-7610 - [Java] Finish support for 64 bit int allocations
* ARROW-8115 - [Python] Conversion when mixing NaT and datetime objects not working
* ARROW-8392 - [Java] Fix overflow related corner cases for vector value comparison
* ARROW-8537 - [C++] Performance regression from ARROW-8523
* ARROW-8803 - [Java] Row count should be set before loading buffers in VectorLoader
* ARROW-8911 - [C++] Slicing a ChunkedArray with zero chunks segfaults

View the release notes here:
https://arrow.apache.org/release/1.0.1.html
https://arrow.apache.org/release/1.0.0.html

### Why are the changes needed?

The upgrade brings fixes, improvements, and stability guarantees.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests with pyarrow 1.0.0 and 1.0.1

Closes #29686 from BryanCutler/arrow-upgrade-100-SPARK-32312.

Authored-by: Bryan Cutler <cutlerb@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
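Because the `arrow-memory` split means the allocator is now selected by which memory artifact is on the classpath, an application embedding Arrow directly could choose a different allocator than Spark does. A minimal sketch of the two choices in a Maven POM, assuming Arrow 1.0.1 and its `arrow-memory-unsafe` artifact (illustrative only; Spark itself declares `arrow-memory-netty`, as seen in this POM's dependencies):

```xml
<!-- Netty-based allocator: the choice Spark keeps for its performance benefits -->
<dependency>
  <groupId>org.apache.arrow</groupId>
  <artifactId>arrow-memory-netty</artifactId>
  <version>1.0.1</version>
  <scope>runtime</scope>
</dependency>

<!-- Hypothetical alternative: an application selecting the Unsafe-based allocator
     would declare this artifact instead of arrow-memory-netty -->
<dependency>
  <groupId>org.apache.arrow</groupId>
  <artifactId>arrow-memory-unsafe</artifactId>
  <version>1.0.1</version>
  <scope>runtime</scope>
</dependency>
```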
207 lines
7.1 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one or more
  ~ contributor license agreements.  See the NOTICE file distributed with
  ~ this work for additional information regarding copyright ownership.
  ~ The ASF licenses this file to You under the Apache License, Version 2.0
  ~ (the "License"); you may not use this file except in compliance with
  ~ the License.  You may obtain a copy of the License at
  ~
  ~    http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing, software
  ~ distributed under the License is distributed on an "AS IS" BASIS,
  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  ~ See the License for the specific language governing permissions and
  ~ limitations under the License.
-->

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-parent_2.12</artifactId>
    <version>3.1.0-SNAPSHOT</version>
    <relativePath>../../pom.xml</relativePath>
  </parent>

  <artifactId>spark-catalyst_2.12</artifactId>
  <packaging>jar</packaging>
  <name>Spark Project Catalyst</name>
  <url>http://spark.apache.org/</url>
  <properties>
    <sbt.project.name>catalyst</sbt.project.name>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-reflect</artifactId>
    </dependency>
    <dependency>
      <groupId>org.scala-lang.modules</groupId>
      <artifactId>scala-parser-combinators_${scala.binary.version}</artifactId>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_${scala.binary.version}</artifactId>
      <version>${project.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_${scala.binary.version}</artifactId>
      <version>${project.version}</version>
      <type>test-jar</type>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-tags_${scala.binary.version}</artifactId>
    </dependency>

    <!--
      This spark-tags test-dep is needed even though it isn't used in this module, otherwise testing-cmds that exclude
      them will yield errors.
    -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-tags_${scala.binary.version}</artifactId>
      <type>test-jar</type>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.mockito</groupId>
      <artifactId>mockito-core</artifactId>
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-unsafe_${scala.binary.version}</artifactId>
      <version>${project.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sketch_${scala.binary.version}</artifactId>
      <version>${project.version}</version>
    </dependency>
    <dependency>
      <groupId>org.scalacheck</groupId>
      <artifactId>scalacheck_${scala.binary.version}</artifactId>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.codehaus.janino</groupId>
      <artifactId>janino</artifactId>
    </dependency>
    <dependency>
      <groupId>org.codehaus.janino</groupId>
      <artifactId>commons-compiler</artifactId>
    </dependency>
    <dependency>
      <groupId>org.antlr</groupId>
      <artifactId>antlr4-runtime</artifactId>
    </dependency>
    <dependency>
      <groupId>commons-codec</groupId>
      <artifactId>commons-codec</artifactId>
    </dependency>
    <dependency>
      <groupId>com.univocity</groupId>
      <artifactId>univocity-parsers</artifactId>
      <type>jar</type>
    </dependency>
    <dependency>
      <groupId>org.apache.arrow</groupId>
      <artifactId>arrow-vector</artifactId>
    </dependency>
    <dependency>
      <groupId>org.apache.arrow</groupId>
      <artifactId>arrow-memory-netty</artifactId>
    </dependency>
  </dependencies>
  <build>
    <outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory>
    <testOutputDirectory>target/scala-${scala.binary.version}/test-classes</testOutputDirectory>
    <plugins>
      <!--
        This plugin forces the generation of jar containing catalyst test classes,
        so that the tests classes of external modules can use them. The two execution profiles
        are necessary - first one for 'mvn package', second one for 'mvn test-compile'. Ideally,
        'mvn compile' should not compile test classes and therefore should not need this.
        However, a closed due to "Cannot Reproduce" Maven bug (https://issues.apache.org/jira/browse/MNG-3559)
        causes the compilation to fail if catalyst test-jar is not generated. Hence, the
        second execution profile for 'mvn test-compile'.
      -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <executions>
          <execution>
            <id>prepare-test-jar</id>
            <phase>test-compile</phase>
            <goals>
              <goal>test-jar</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.scalatest</groupId>
        <artifactId>scalatest-maven-plugin</artifactId>
        <configuration>
          <argLine>-ea -Xmx4g -Xss4m -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true</argLine>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.antlr</groupId>
        <artifactId>antlr4-maven-plugin</artifactId>
        <executions>
          <execution>
            <goals>
              <goal>antlr4</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <visitor>true</visitor>
          <sourceDirectory>../catalyst/src/main/antlr4</sourceDirectory>
          <treatWarningsAsErrors>true</treatWarningsAsErrors>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>build-helper-maven-plugin</artifactId>
        <executions>
          <execution>
            <id>add-sources</id>
            <phase>generate-sources</phase>
            <goals>
              <goal>add-source</goal>
            </goals>
            <configuration>
              <sources>
                <source>src/main/scala-${scala.binary.version}</source>
              </sources>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>

  <profiles>
    <profile>
      <id>scala-2.13</id>
      <dependencies>
        <dependency>
          <groupId>org.scala-lang.modules</groupId>
          <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
        </dependency>
      </dependencies>
    </profile>
  </profiles>
</project>