
Yuming Wang 777b4502b2 [SPARK-27176][FOLLOW-UP][SQL] Upgrade Hive parquet to 1.10.1 for hadoop-3.2	2019-04-19 08:59:08 -07:00

## What changes were proposed in this pull request?

When we compile and test against Hadoop 3.2, we hit the following two issues:

1. `JobSummaryLevel` is not a member of object `org.apache.parquet.hadoop.ParquetOutputFormat`. Fixed by [PARQUET-381](https://issues.apache.org/jira/browse/PARQUET-381) (Parquet 1.9.0).
2. `java.lang.NoSuchFieldError: BROTLI at org.apache.parquet.hadoop.metadata.CompressionCodecName.<clinit>(CompressionCodecName.java:31)`. Fixed by [PARQUET-1143](https://issues.apache.org/jira/browse/PARQUET-1143) (Parquet 1.10.0).

The reason is that `parquet-hadoop-bundle-1.8.1.jar` conflicts with Parquet 1.10.1. I think it would be safe to upgrade Hive's Parquet to 1.10.1 to work around this issue.

This is what Hive did when upgrading Parquet from 1.8.1 to 1.10.0: [HIVE-17000](https://issues.apache.org/jira/browse/HIVE-17000) and [HIVE-19464](https://issues.apache.org/jira/browse/HIVE-19464). All of those changes relate to vectorization, and vectorization is disabled by default: see [HIVE-14826](https://issues.apache.org/jira/browse/HIVE-14826) and [HiveConf.java#L2723](https://github.com/apache/hive/blob/rel/release-2.3.4/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L2723).

This PR removes [parquet-hadoop-bundle-1.8.1.jar](https://github.com/apache/parquet-mr/tree/master/parquet-hadoop-bundle), so the Hive serde will use [parquet-common-1.10.1.jar, parquet-column-1.10.1.jar and parquet-hadoop-1.10.1.jar](https://github.com/apache/spark/blob/master/dev/deps/spark-deps-hadoop-3.2#L185-L189).

## How was this patch tested?

1. Manual tests.
2. [Upgrade Hive Parquet to 1.10.1 and run Hadoop 3.2 test on jenkins](https://github.com/apache/spark/pull/24044#commits-pushed-0c3f962)

Closes #24346 from wangyum/SPARK-27176.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
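The conflict described in this commit comes down to parquet-mr classes from two different versions (1.8.1 via the bundle jar, 1.10.1 via the individual jars) ending up on the same classpath. As a rough illustration, the sketch below scans a dependency listing such as `dev/deps/spark-deps-hadoop-3.2` and reports when the parquet-mr jars disagree on version. It assumes one jar file name per line in the form `<artifact>-<version>.jar`; the function names and sample list are hypothetical and are not part of Spark's own `test-dependencies.sh`.

```python
import re

# Assumption: the listing has one jar file name per line, in the form
# "<artifact>-<version>.jar" (e.g. "parquet-hadoop-1.10.1.jar").
JAR_RE = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")

# parquet-format is released separately from parquet-mr with its own 2.x
# version line, so it is excluded from the consistency check.
EXCLUDED = {"parquet-format"}


def parquet_mr_versions(lines):
    """Map each parquet-mr artifact found in the listing to its version."""
    found = {}
    for line in lines:
        match = JAR_RE.match(line.strip())
        if not match:
            continue  # blank lines or entries not shaped like a jar name
        artifact = match.group("artifact")
        if artifact.startswith("parquet-") and artifact not in EXCLUDED:
            found[artifact] = match.group("version")
    return found


def find_parquet_conflict(lines):
    """Return {artifact: version} if the parquet-mr jars disagree, else {}."""
    versions = parquet_mr_versions(lines)
    return versions if len(set(versions.values())) > 1 else {}
```

Run against a hypothetical excerpt that still contains `parquet-hadoop-bundle-1.8.1.jar` alongside the 1.10.1 jars, `find_parquet_conflict` returns all three parquet-mr entries, since 1.8.1 and 1.10.1 disagree; once the bundle jar is removed, it returns an empty dict.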
create-release	[SPARK-26132][BUILD][CORE] Remove support for Scala 2.11 in Spark 3.0.0	2019-03-25 10:46:42 -05:00
deps	[SPARK-27176][FOLLOW-UP][SQL] Upgrade Hive parquet to 1.10.1 for hadoop-3.2	2019-04-19 08:59:08 -07:00
sparktestsupport	[SPARK-26856][PYSPARK] Python support for from_avro and to_avro APIs	2019-03-11 10:15:07 +09:00
tests	[MINOR] Fix typos in dev/* scripts.	2018-01-31 07:37:25 +09:00
.gitignore	[SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*.py to .gitignore file.	2018-01-31 00:51:00 +09:00
.rat-excludes	[SPARK-27358][UI] Update jquery to 1.12.x to pick up security fixes	2019-04-05 12:54:01 -05:00
.scalafmt.conf	[SPARK-26177] Config change followup to [] Automated formatting for Scala code	2018-12-03 10:03:51 -06:00
appveyor-guide.md	[SPARK-26918][DOCS] All .md should have ASF license header	2019-03-30 19:49:45 -05:00
appveyor-install-dependencies.ps1	[SPARK-26212][BUILD][TEST-MAVEN] Upgrade maven version to 3.6.0	2018-12-01 07:06:18 -06:00
change-scala-version.sh	[SPARK-26132][BUILD][CORE] Remove support for Scala 2.11 in Spark 3.0.0	2019-03-25 10:46:42 -05:00
check-license	[MINOR][BUILD] Upgrade apache-rat to 0.13	2019-04-01 16:44:42 +09:00
checkstyle-suppressions.xml	[MINOR][BUILD] Update all checkstyle dtd to use "https://checkstyle.org"	2019-02-25 11:25:53 -08:00
checkstyle.xml	[MINOR][BUILD] Update all checkstyle dtd to use "https://checkstyle.org"	2019-02-25 11:25:53 -08:00
github_jira_sync.py	[MINOR] Fix a bunch of typos	2018-01-02 07:10:19 +09:00
lint-java	[SPARK-23063][K8S] K8s changes for publishing scripts (and a couple of other misses)	2018-01-13 21:34:28 -08:00
lint-python	[BUILD] refactor dev/lint-python in to something readable	2018-11-20 12:38:40 -08:00
lint-r	[SPARK-10328] [SPARKR] Fix generic for na.omit	2015-08-28 00:37:50 -07:00
lint-r.R	[SPARK-22063][R] Fixes lint check failures in R by latest commit sha1 ID of lint-r	2017-10-01 18:42:45 +09:00
lint-scala	[SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles	2019-03-15 08:20:42 +09:00
make-distribution.sh	[SPARK-26095][BUILD] Disable parallelization in make-distibution.sh.	2018-11-16 15:57:38 -08:00
merge_spark_pr.py	[SPARK-27277][INFRA] Recover from setting fix version failure in merge script	2019-03-26 21:14:07 +09:00
mima	[SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles	2019-03-15 08:20:42 +09:00
pip-sanity-check.py	[SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis	2019-01-17 19:40:39 -06:00
README.md	Merge pull request #565 from pwendell/dev-scripts. Closes #565.	2014-02-08 23:13:34 -08:00
requirements.txt	[SPARK-25270] lint-python: Add flake8 to find syntax errors and undefined names	2018-09-07 09:35:25 -07:00
run-pip-tests	Fix typos detected by github.com/client9/misspell	2018-08-11 21:23:36 -05:00
run-tests	[SPARK-22302][INFRA] Remove manual backports for subprocess and print explicit message for < Python 2.7	2017-10-22 02:22:35 +09:00
run-tests-jenkins	[MINOR] Fix typos in dev/* scripts.	2018-01-31 07:37:25 +09:00
run-tests-jenkins.py	[SPARK-27175][BUILD] Upgrade hadoop-3 to 3.2.0	2019-03-16 19:42:05 -05:00
run-tests.py	[SPARK-25079][PYTHON] update python3 executable to 3.6.x	2019-04-19 10:03:50 +09:00
sbt-checkstyle	[SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles	2019-03-15 08:20:42 +09:00
scalafmt	[SPARK-26177] Automated formatting for Scala code	2018-11-29 08:54:31 -06:00
scalastyle	[SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles	2019-03-15 08:20:42 +09:00
test-dependencies.sh	[SPARK-27175][BUILD] Upgrade hadoop-3 to 3.2.0	2019-03-16 19:42:05 -05:00
tox.ini	[SPARK-23367][BUILD] Include python document style checking	2018-10-27 08:20:42 -05:00

README.md

Spark Developer Scripts

This directory contains scripts useful to developers when packaging, testing, or committing to Spark.

Many of these scripts require Apache credentials to work correctly.