777b4502b2
## What changes were proposed in this pull request? When we compile and test Hadoop 3.2, we will hint the following two issues: 1. JobSummaryLevel is not a member of object org.apache.parquet.hadoop.ParquetOutputFormat. Fixed by [PARQUET-381](https://issues.apache.org/jira/browse/PARQUET-381)(Parquet 1.9.0) 2. java.lang.NoSuchFieldError: BROTLI at org.apache.parquet.hadoop.metadata.CompressionCodecName.<clinit>(CompressionCodecName.java:31). Fixed by [PARQUET-1143](https://issues.apache.org/jira/browse/PARQUET-1143)(Parquet 1.10.0) The reason is that the `parquet-hadoop-bundle-1.8.1.jar` conflicts with Parquet 1.10.1. I think it would be safe to upgrade Hive's parquet to 1.10.1 to workaround this issue. This is what Hive did when upgrading Parquet 1.8.1 to 1.10.0: [HIVE-17000](https://issues.apache.org/jira/browse/HIVE-17000) and [HIVE-19464](https://issues.apache.org/jira/browse/HIVE-19464). We can see that all changes are related to vectors, and vectors are disabled by default: see [HIVE-14826](https://issues.apache.org/jira/browse/HIVE-14826) and [HiveConf.java#L2723](https://github.com/apache/hive/blob/rel/release-2.3.4/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L2723). This pr removes [parquet-hadoop-bundle-1.8.1.jar](https://github.com/apache/parquet-mr/tree/master/parquet-hadoop-bundle) , so Hive serde will use [parquet-common-1.10.1.jar, parquet-column-1.10.1.jar and parquet-hadoop-1.10.1.jar](https://github.com/apache/spark/blob/master/dev/deps/spark-deps-hadoop-3.2#L185-L189). ## How was this patch tested? 1. manual tests 2. [upgrade Hive Parquet to 1.10.1 annd run Hadoop 3.2 test on jenkins](https://github.com/apache/spark/pull/24044#commits-pushed-0c3f962) Closes #24346 from wangyum/SPARK-27176. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: gatorsmile <gatorsmile@gmail.com> |
||
---|---|---|
.. | ||
create-release | ||
deps | ||
sparktestsupport | ||
tests | ||
.gitignore | ||
.rat-excludes | ||
.scalafmt.conf | ||
appveyor-guide.md | ||
appveyor-install-dependencies.ps1 | ||
change-scala-version.sh | ||
check-license | ||
checkstyle-suppressions.xml | ||
checkstyle.xml | ||
github_jira_sync.py | ||
lint-java | ||
lint-python | ||
lint-r | ||
lint-r.R | ||
lint-scala | ||
make-distribution.sh | ||
merge_spark_pr.py | ||
mima | ||
pip-sanity-check.py | ||
README.md | ||
requirements.txt | ||
run-pip-tests | ||
run-tests | ||
run-tests-jenkins | ||
run-tests-jenkins.py | ||
run-tests.py | ||
sbt-checkstyle | ||
scalafmt | ||
scalastyle | ||
test-dependencies.sh | ||
tox.ini |
Spark Developer Scripts
This directory contains scripts useful to developers when packaging, testing, or committing to Spark.
Many of these scripts require Apache credentials to work correctly.