spark-instrumented-optimizer/dev
Yuming Wang 777b4502b2 [SPARK-27176][FOLLOW-UP][SQL] Upgrade Hive parquet to 1.10.1 for hadoop-3.2
## What changes were proposed in this pull request?

When we compile and test Hadoop 3.2, we will hint the following two issues:
1. JobSummaryLevel is not a member of object org.apache.parquet.hadoop.ParquetOutputFormat. Fixed by [PARQUET-381](https://issues.apache.org/jira/browse/PARQUET-381)(Parquet 1.9.0)
2. java.lang.NoSuchFieldError: BROTLI
    at org.apache.parquet.hadoop.metadata.CompressionCodecName.<clinit>(CompressionCodecName.java:31). Fixed by [PARQUET-1143](https://issues.apache.org/jira/browse/PARQUET-1143)(Parquet 1.10.0)

The reason is that the `parquet-hadoop-bundle-1.8.1.jar` conflicts with Parquet 1.10.1.
I think it would be safe to upgrade Hive's parquet to 1.10.1 to workaround this issue.

This is what Hive did when upgrading Parquet 1.8.1 to 1.10.0: [HIVE-17000](https://issues.apache.org/jira/browse/HIVE-17000) and [HIVE-19464](https://issues.apache.org/jira/browse/HIVE-19464). We can see that all changes are related to vectors, and vectors are disabled by default: see [HIVE-14826](https://issues.apache.org/jira/browse/HIVE-14826) and [HiveConf.java#L2723](https://github.com/apache/hive/blob/rel/release-2.3.4/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L2723).

This pr removes [parquet-hadoop-bundle-1.8.1.jar](https://github.com/apache/parquet-mr/tree/master/parquet-hadoop-bundle) , so Hive serde will use [parquet-common-1.10.1.jar, parquet-column-1.10.1.jar and parquet-hadoop-1.10.1.jar](https://github.com/apache/spark/blob/master/dev/deps/spark-deps-hadoop-3.2#L185-L189).

## How was this patch tested?

1. manual tests
2. [upgrade Hive Parquet to 1.10.1 annd run Hadoop 3.2 test on jenkins](https://github.com/apache/spark/pull/24044#commits-pushed-0c3f962)

Closes #24346 from wangyum/SPARK-27176.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
2019-04-19 08:59:08 -07:00
..
create-release [SPARK-26132][BUILD][CORE] Remove support for Scala 2.11 in Spark 3.0.0 2019-03-25 10:46:42 -05:00
deps [SPARK-27176][FOLLOW-UP][SQL] Upgrade Hive parquet to 1.10.1 for hadoop-3.2 2019-04-19 08:59:08 -07:00
sparktestsupport [SPARK-26856][PYSPARK] Python support for from_avro and to_avro APIs 2019-03-11 10:15:07 +09:00
tests [MINOR] Fix typos in dev/* scripts. 2018-01-31 07:37:25 +09:00
.gitignore [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*.py to .gitignore file. 2018-01-31 00:51:00 +09:00
.rat-excludes [SPARK-27358][UI] Update jquery to 1.12.x to pick up security fixes 2019-04-05 12:54:01 -05:00
.scalafmt.conf [SPARK-26177] Config change followup to [] Automated formatting for Scala code 2018-12-03 10:03:51 -06:00
appveyor-guide.md [SPARK-26918][DOCS] All .md should have ASF license header 2019-03-30 19:49:45 -05:00
appveyor-install-dependencies.ps1 [SPARK-26212][BUILD][TEST-MAVEN] Upgrade maven version to 3.6.0 2018-12-01 07:06:18 -06:00
change-scala-version.sh [SPARK-26132][BUILD][CORE] Remove support for Scala 2.11 in Spark 3.0.0 2019-03-25 10:46:42 -05:00
check-license [MINOR][BUILD] Upgrade apache-rat to 0.13 2019-04-01 16:44:42 +09:00
checkstyle-suppressions.xml [MINOR][BUILD] Update all checkstyle dtd to use "https://checkstyle.org" 2019-02-25 11:25:53 -08:00
checkstyle.xml [MINOR][BUILD] Update all checkstyle dtd to use "https://checkstyle.org" 2019-02-25 11:25:53 -08:00
github_jira_sync.py [MINOR] Fix a bunch of typos 2018-01-02 07:10:19 +09:00
lint-java [SPARK-23063][K8S] K8s changes for publishing scripts (and a couple of other misses) 2018-01-13 21:34:28 -08:00
lint-python [BUILD] refactor dev/lint-python in to something readable 2018-11-20 12:38:40 -08:00
lint-r [SPARK-10328] [SPARKR] Fix generic for na.omit 2015-08-28 00:37:50 -07:00
lint-r.R [SPARK-22063][R] Fixes lint check failures in R by latest commit sha1 ID of lint-r 2017-10-01 18:42:45 +09:00
lint-scala [SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles 2019-03-15 08:20:42 +09:00
make-distribution.sh [SPARK-26095][BUILD] Disable parallelization in make-distibution.sh. 2018-11-16 15:57:38 -08:00
merge_spark_pr.py [SPARK-27277][INFRA] Recover from setting fix version failure in merge script 2019-03-26 21:14:07 +09:00
mima [SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles 2019-03-15 08:20:42 +09:00
pip-sanity-check.py [SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis 2019-01-17 19:40:39 -06:00
README.md Merge pull request #565 from pwendell/dev-scripts. Closes #565. 2014-02-08 23:13:34 -08:00
requirements.txt [SPARK-25270] lint-python: Add flake8 to find syntax errors and undefined names 2018-09-07 09:35:25 -07:00
run-pip-tests Fix typos detected by github.com/client9/misspell 2018-08-11 21:23:36 -05:00
run-tests [SPARK-22302][INFRA] Remove manual backports for subprocess and print explicit message for < Python 2.7 2017-10-22 02:22:35 +09:00
run-tests-jenkins [MINOR] Fix typos in dev/* scripts. 2018-01-31 07:37:25 +09:00
run-tests-jenkins.py [SPARK-27175][BUILD] Upgrade hadoop-3 to 3.2.0 2019-03-16 19:42:05 -05:00
run-tests.py [SPARK-25079][PYTHON] update python3 executable to 3.6.x 2019-04-19 10:03:50 +09:00
sbt-checkstyle [SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles 2019-03-15 08:20:42 +09:00
scalafmt [SPARK-26177] Automated formatting for Scala code 2018-11-29 08:54:31 -06:00
scalastyle [SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles 2019-03-15 08:20:42 +09:00
test-dependencies.sh [SPARK-27175][BUILD] Upgrade hadoop-3 to 3.2.0 2019-03-16 19:42:05 -05:00
tox.ini [SPARK-23367][BUILD] Include python document style checking 2018-10-27 08:20:42 -05:00

Spark Developer Scripts

This directory contains scripts useful to developers when packaging, testing, or committing to Spark.

Many of these scripts require Apache credentials to work correctly.