[SPARK-34542][BUILD] Upgrade Parquet to 1.12.0

### What changes were proposed in this pull request?

New features in Parquet 1.12.0:
- PARQUET-41 - Add bloom filters to parquet statistics
- PARQUET-1373 - Encryption key management tools
- PARQUET-1396 - Example of using EncryptionPropertiesFactory and DecryptionPropertiesFactory
- PARQUET-1622 - Add BYTE_STREAM_SPLIT encoding
- PARQUET-1784 - Column-wise configuration
- PARQUET-1817 - Crypto Properties Factory
- PARQUET-1854 - Properties-Driven Interface to Parquet Encryption
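
For readers unfamiliar with the BYTE_STREAM_SPLIT encoding listed above (PARQUET-1622), here is a minimal sketch of the idea for 4-byte `FLOAT` values: byte `k` of every value is written to stream `k`, and the streams are concatenated. The object and method names are hypothetical, and little-endian value bytes are assumed. This is an illustration only, not the parquet-mr implementation.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical illustration of BYTE_STREAM_SPLIT for FLOAT values.
object ByteStreamSplitSketch {
  private val Width = 4 // bytes per FLOAT value

  // Encode: byte k of value i lands at offset k * n + i (stream k, slot i).
  def encode(values: Array[Float]): Array[Byte] = {
    val n = values.length
    val out = new Array[Byte](n * Width)
    val buf = ByteBuffer.allocate(Width).order(ByteOrder.LITTLE_ENDIAN)
    for (i <- 0 until n) {
      buf.putFloat(0, values(i)) // absolute put, position is untouched
      for (k <- 0 until Width) out(k * n + i) = buf.get(k)
    }
    out
  }

  // Decode: gather byte k of value i back from stream k and reassemble.
  def decode(bytes: Array[Byte]): Array[Float] = {
    require(bytes.length % Width == 0, "length must be a multiple of 4")
    val n = bytes.length / Width
    val buf = ByteBuffer.allocate(Width).order(ByteOrder.LITTLE_ENDIAN)
    Array.tabulate(n) { i =>
      for (k <- 0 until Width) buf.put(k, bytes(k * n + i))
      buf.getFloat(0)
    }
  }
}
```

The encoding does not shrink the data by itself. Grouping bytes of the same significance tends to make the result more compressible by a general-purpose codec.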

Parquet 1.12.0 release notes:
https://github.com/apache/parquet-mr/blob/apache-parquet-1.12.0/CHANGES.md

### Why are the changes needed?

- Bloom filters to improve filter performance
- ZSTD enhancement
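
As a rough illustration of why Bloom filters can improve filter performance: a reader evaluating an equality predicate can skip any row group whose filter reports the value as definitely absent. The sketch below is a toy model under stated assumptions. `ToyBloomFilter`, `RowGroup`, and `rowGroupsToScan` are hypothetical names, and the hashing scheme is not the one Parquet uses.

```scala
import scala.util.hashing.MurmurHash3

// Toy model only: hypothetical names, not Parquet's Bloom filter implementation.
object BloomFilterSketch {
  final class ToyBloomFilter(numBits: Int, numHashes: Int) {
    require(numBits > 0 && numHashes > 0)
    private val bits = new java.util.BitSet(numBits)

    // Derive the i-th bit position from a seeded hash of the value.
    private def position(value: String, i: Int): Int =
      Math.floorMod(MurmurHash3.stringHash(value, i), numBits)

    def add(value: String): Unit =
      (0 until numHashes).foreach(i => bits.set(position(value, i)))

    // false: the value is definitely absent. true: the value may be present.
    def mightContain(value: String): Boolean =
      (0 until numHashes).forall(i => bits.get(position(value, i)))
  }

  // Hypothetical row group: the values of one column plus a filter over them.
  final case class RowGroup(values: Seq[String]) {
    val filter: ToyBloomFilter = {
      val f = new ToyBloomFilter(numBits = 1024, numHashes = 3)
      values.foreach(f.add)
      f
    }
  }

  // For a predicate `col = target`, only groups answering "maybe" are read.
  def rowGroupsToScan(groups: Seq[RowGroup], target: String): Seq[RowGroup] =
    groups.filter(_.filter.mightContain(target))
}
```

A Bloom filter never produces false negatives, so skipping a row group on a `false` answer is safe. False positives only cost an unnecessary read.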

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit tests. The expected table size in `StatisticsSuite` is updated from 651 to 657.

Closes #31649 from wangyum/SPARK-34542.

Lead-authored-by: Yuming Wang <yumwang@ebay.com>
Co-authored-by: Yuming Wang <yumwang@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Yuming Wang 2021-03-27 07:56:29 -07:00 committed by Dongjoon Hyun
parent 468b944b00
commit cbffc12f90
4 changed files with 15 additions and 15 deletions

@@ -202,12 +202,12 @@ orc-shims/1.6.7//orc-shims-1.6.7.jar
 oro/2.0.8//oro-2.0.8.jar
 osgi-resource-locator/1.0.3//osgi-resource-locator-1.0.3.jar
 paranamer/2.8//paranamer-2.8.jar
-parquet-column/1.11.1//parquet-column-1.11.1.jar
-parquet-common/1.11.1//parquet-common-1.11.1.jar
-parquet-encoding/1.11.1//parquet-encoding-1.11.1.jar
-parquet-format-structures/1.11.1//parquet-format-structures-1.11.1.jar
-parquet-hadoop/1.11.1//parquet-hadoop-1.11.1.jar
-parquet-jackson/1.11.1//parquet-jackson-1.11.1.jar
+parquet-column/1.12.0//parquet-column-1.12.0.jar
+parquet-common/1.12.0//parquet-common-1.12.0.jar
+parquet-encoding/1.12.0//parquet-encoding-1.12.0.jar
+parquet-format-structures/1.12.0//parquet-format-structures-1.12.0.jar
+parquet-hadoop/1.12.0//parquet-hadoop-1.12.0.jar
+parquet-jackson/1.12.0//parquet-jackson-1.12.0.jar
 protobuf-java/2.5.0//protobuf-java-2.5.0.jar
 py4j/0.10.9.2//py4j-0.10.9.2.jar
 pyrolite/4.30//pyrolite-4.30.jar

@@ -173,12 +173,12 @@ orc-shims/1.6.7//orc-shims-1.6.7.jar
 oro/2.0.8//oro-2.0.8.jar
 osgi-resource-locator/1.0.3//osgi-resource-locator-1.0.3.jar
 paranamer/2.8//paranamer-2.8.jar
-parquet-column/1.11.1//parquet-column-1.11.1.jar
-parquet-common/1.11.1//parquet-common-1.11.1.jar
-parquet-encoding/1.11.1//parquet-encoding-1.11.1.jar
-parquet-format-structures/1.11.1//parquet-format-structures-1.11.1.jar
-parquet-hadoop/1.11.1//parquet-hadoop-1.11.1.jar
-parquet-jackson/1.11.1//parquet-jackson-1.11.1.jar
+parquet-column/1.12.0//parquet-column-1.12.0.jar
+parquet-common/1.12.0//parquet-common-1.12.0.jar
+parquet-encoding/1.12.0//parquet-encoding-1.12.0.jar
+parquet-format-structures/1.12.0//parquet-format-structures-1.12.0.jar
+parquet-hadoop/1.12.0//parquet-hadoop-1.12.0.jar
+parquet-jackson/1.12.0//parquet-jackson-1.12.0.jar
 protobuf-java/2.5.0//protobuf-java-2.5.0.jar
 py4j/0.10.9.2//py4j-0.10.9.2.jar
 pyrolite/4.30//pyrolite-4.30.jar

@@ -136,7 +136,7 @@
 <kafka.version>2.6.0</kafka.version>
 <!-- After 10.15.1.3, the minimum required version is JDK9 -->
 <derby.version>10.14.2.0</derby.version>
-<parquet.version>1.11.1</parquet.version>
+<parquet.version>1.12.0</parquet.version>
 <orc.version>1.6.7</orc.version>
 <jetty.version>9.4.37.v20210219</jetty.version>
 <jakartaservlet.version>4.0.3</jakartaservlet.version>
@@ -2095,7 +2095,7 @@
 <groupId>${hive.group}</groupId>
 <artifactId>hive-service-rpc</artifactId>
 </exclusion>
-<!-- parquet-hadoop-bundle:1.8.1 conflict with 1.10.1 -->
+<!-- parquet-hadoop-bundle:1.8.1 conflict with 1.12.0 -->
 <exclusion>
 <groupId>org.apache.parquet</groupId>
 <artifactId>parquet-hadoop-bundle</artifactId>

@@ -1502,7 +1502,7 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
 Seq(tbl, ext_tbl).foreach { tblName =>
 sql(s"INSERT INTO $tblName VALUES (1, 'a', '2019-12-13')")
-val expectedSize = 651
+val expectedSize = 657
 // analyze table
 sql(s"ANALYZE TABLE $tblName COMPUTE STATISTICS NOSCAN")
 var tableStats = getTableStats(tblName)