
Chao Sun b6f46ca297 [SPARK-33212][BUILD] Upgrade to Hadoop 3.2.2 and move to shaded clients for Hadoop 3.x profile

### What changes were proposed in this pull request?

This:
1. switches Spark to use shaded Hadoop clients, namely hadoop-client-api and hadoop-client-runtime, for Hadoop 3.x.
2. upgrades the built-in version for Hadoop 3.x to Hadoop 3.2.2.

Note that for Hadoop 2.7, we'll still use the same modules such as hadoop-client.

To keep hadoop-3.2 as the default Hadoop profile, this defines the following Maven properties:

```
hadoop-client-api.artifact
hadoop-client-runtime.artifact
hadoop-client-minicluster.artifact
```

which default to:

```
hadoop-client-api
hadoop-client-runtime
hadoop-client-minicluster
```

but all switch to `hadoop-client` when the Hadoop profile is hadoop-2.7. A side effect of this is that we'll import the same dependency multiple times. For this I have to disable the Maven enforcer rule `banDuplicatePomDependencyVersions`.

Besides the above, there are the following changes:
- explicitly add a few dependencies which are imported via transitive dependencies from Hadoop jars, but are removed from the shaded client jars.
- removed the use of `ProxyUriUtils.getPath` from `ApplicationMaster`, which is a server-side/private API.
- modified `IsolatedClientLoader` to exclude `hadoop-auth` jars when the Hadoop version is 3.x. This change should only matter when we're not sharing Hadoop classes with Spark (which is _mostly_ used in tests).

### Why are the changes needed?

Hadoop 3.2.2 is released with new features and bug fixes, so it's good for the Spark community to adopt it. However, the latest Hadoop versions starting from Hadoop 3.2.1 have upgraded to use Guava 27+. To resolve Guava conflicts, this switches to the shaded client jars provided by Hadoop. This also has the benefit of not pulling in other 3rd party dependencies from the Hadoop side, which avoids more potential future conflicts.

### Does this PR introduce _any_ user-facing change?

When people use Spark with the `hadoop-provided` option, they should make sure the class path contains the `hadoop-client-api` and `hadoop-client-runtime` jars. In addition, they may need to make sure these jars appear before other Hadoop jars in the class path order. Otherwise, classes may be loaded from the other non-shaded Hadoop jars and cause potential conflicts.

### How was this patch tested?

Relying on existing tests.

Closes #30701 from sunchao/test-hadoop-3.2.2.

Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

2021-01-15 14:06:50 -08:00
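The class path ordering described in the commit message above can be illustrated with a small sketch. This is not part of the Spark build or of the patch; the helper name `shaded_first` and the jar paths in the usage note are hypothetical and chosen only for illustration. It assumes the shaded jars can be recognised by the file-name prefixes `hadoop-client-api` and `hadoop-client-runtime`, which are the artifact names given in the commit message.

```python
import os

# File-name prefixes of the shaded Hadoop 3.x client jars named in the commit message.
SHADED_PREFIXES = ("hadoop-client-api", "hadoop-client-runtime")


def shaded_first(classpath: str, sep: str = os.pathsep) -> str:
    """Return the class path with the shaded Hadoop client jars moved to the front.

    Hypothetical helper: the relative order inside each group is preserved,
    and empty entries are dropped.
    """
    entries = [e for e in classpath.split(sep) if e]
    shaded = [e for e in entries if os.path.basename(e).startswith(SHADED_PREFIXES)]
    others = [e for e in entries if not os.path.basename(e).startswith(SHADED_PREFIXES)]
    return sep.join(shaded + others)
```

For example, calling `shaded_first` on a class path where `hadoop-common` is listed ahead of `hadoop-client-runtime` would move both shaded client jars in front of it, so that the shaded classes are found first. Whether reordering is needed at all depends on how the class path is assembled in a given deployment.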
create-release	[SPARK-33927][BUILD] Fix Dockerfile for Spark release to work	2020-12-30 16:37:23 +09:00
deps	[SPARK-33212][BUILD] Upgrade to Hadoop 3.2.2 and move to shaded clients for Hadoop 3.x profile	2021-01-15 14:06:50 -08:00
sparktestsupport	[SPARK-32320][PYSPARK] Remove mutable default arguments	2020-12-08 09:35:36 +08:00
tests	Spelling r common dev mlib external project streaming resource managers python	2020-11-27 10:22:45 -06:00
.gitignore	[SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*.py to .gitignore file.	2018-01-31 00:51:00 +09:00
.rat-excludes	[SPARK-31953][SS] Add Spark Structured Streaming History Server Support	2020-12-02 17:11:51 -08:00
.scalafmt.conf	[SPARK-26177] Config change followup to [] Automated formatting for Scala code	2018-12-03 10:03:51 -06:00
appveyor-guide.md	Spelling r common dev mlib external project streaming resource managers python	2020-11-27 10:22:45 -06:00
appveyor-install-dependencies.ps1	[SPARK-33105][INFRA] Change default R arch from i386 to x64 and parametrize BINPREF	2020-10-10 13:48:26 +09:00
change-scala-version.sh	[SPARK-30012][CORE][SQL] Change classes extending scala collection classes to work with 2.13	2019-12-03 08:59:43 -08:00
check-license	[MINOR][INFRA] Suppress warning in check-license	2020-11-23 10:38:40 +09:00
checkstyle-suppressions.xml	[SPARK-29674][CORE] Update dropwizard metrics to 4.1.x for JDK 9+	2019-11-03 15:13:06 -08:00
checkstyle.xml	[MINOR] Fix google style guide address	2019-12-12 11:04:01 -06:00
github_jira_sync.py	Spelling r common dev mlib external project streaming resource managers python	2020-11-27 10:22:45 -06:00
lint-java	[SPARK-23063][K8S] K8s changes for publishing scripts (and a couple of other misses)	2018-01-13 21:34:28 -08:00
lint-python	[SPARK-33243][PYTHON][BUILD] Add numpydoc into documentation dependency	2020-10-27 14:03:57 +09:00
lint-r	[SPARK-29932][R][TESTS] lint-r should do non-zero exit in case of errors	2019-11-17 10:09:46 -08:00
lint-r.R	[MINOR][R] small tidying of sh scripts for R	2020-04-30 16:58:05 -07:00
lint-scala	[SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles	2019-03-15 08:20:42 +09:00
make-distribution.sh	[SPARK-31041][BUILD] Show Maven errors from within make-distribution.sh	2020-03-11 08:22:02 -05:00
merge_spark_pr.py	[MINOR] Fix usage print to guide pip3 to install jira-python library	2020-09-03 01:10:59 +09:00
mima	[SPARK-33510][BUILD] Update SBT to 1.4.4	2020-11-22 22:56:59 -08:00
pip-sanity-check.py	[SPARK-32319][PYSPARK] Disallow the use of unused imports	2020-08-08 08:51:57 -07:00
README.md	Merge pull request #565 from pwendell/dev-scripts. Closes #565 .	2014-02-08 23:13:34 -08:00
requirements.txt	[SPARK-33243][PYTHON][BUILD] Add numpydoc into documentation dependency	2020-10-27 14:03:57 +09:00
run-pip-tests	[SPARK-32419][PYTHON][BUILD] Avoid using subshell for Conda env (de)activation in pip packaging test	2020-07-25 13:09:23 +09:00
run-tests	[SPARK-29672][PYSPARK] update spark testing framework to use python3	2019-11-14 10:18:55 -08:00
run-tests-jenkins	[SPARK-33535][INFRA][TESTS] Export LANG to en_US.UTF-8 in run-tests-jenkins script	2020-11-24 09:50:10 -08:00
run-tests-jenkins.py	Spelling r common dev mlib external project streaming resource managers python	2020-11-27 10:22:45 -06:00
run-tests.py	[SPARK-33802][INFRA][FOLLOW-UP] Separate arguments properly for -c option in git command for PySpark coverage	2020-12-16 23:42:34 +09:00
sbt-checkstyle	[SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles	2019-03-15 08:20:42 +09:00
scalafmt	[SPARK-30570][BUILD] Update scalafmt plugin to 1.0.3 with onlyChangedFiles feature	2020-01-23 12:44:43 -08:00
scalastyle	Revert "[SPARK-30534][INFRA] Use mvn in `dev/scalastyle`"	2020-01-21 18:23:03 +09:00
test-dependencies.sh	[SPARK-20202][BUILD][SQL] Remove references to org.spark-project.hive (Hive 1.2.1)	2020-10-05 15:29:56 -07:00
tox.ini	[SPARK-33749][BUILD][PYTHON] Exclude target directory in pycodestyle and flake8	2020-12-11 14:15:56 +09:00

README.md

Spark Developer Scripts

This directory contains scripts useful to developers when packaging, testing, or committing to Spark.

Many of these scripts require Apache credentials to work correctly.