Chao Sun cb3fa6c936 [SPARK-33212][BUILD] Move to shaded clients for Hadoop 3.x profile

### What changes were proposed in this pull request?

This switches Spark to use shaded Hadoop clients, namely hadoop-client-api and hadoop-client-runtime, for Hadoop 3.x. For Hadoop 2.7, we'll still use the same modules such as hadoop-client.

To keep hadoop-3.2 as the default Hadoop profile, this defines the following Maven properties:

```
hadoop-client-api.artifact
hadoop-client-runtime.artifact
hadoop-client-minicluster.artifact
```

which default to:

```
hadoop-client-api
hadoop-client-runtime
hadoop-client-minicluster
```

but all switch to `hadoop-client` when the Hadoop profile is hadoop-2.7. A side effect of this is that we'll import the same dependency multiple times. For this I have to disable the Maven enforcer rule `banDuplicatePomDependencyVersions`.

Besides the above, there are the following changes:

- explicitly add a few dependencies which are imported via transitive dependencies from Hadoop jars, but are removed from the shaded client jars.
- removed the use of `ProxyUriUtils.getPath` from `ApplicationMaster`, which is a server-side/private API.
- modified `IsolatedClientLoader` to exclude `hadoop-auth` jars when the Hadoop version is 3.x. This change should only matter when we're not sharing Hadoop classes with Spark (which is _mostly_ used in tests).

### Why are the changes needed?

This serves two purposes:

- to unblock Spark from upgrading to Hadoop 3.2.2/3.3.0+. The latest Hadoop versions have upgraded to Guava 27+, and in order to adopt them in Spark we'll need to resolve the Guava conflicts. This takes the approach of switching to the shaded client jars provided by Hadoop.
- to avoid pulling 3rd party dependencies from Hadoop and avoid potential future conflicts.

### Does this PR introduce _any_ user-facing change?

When people use Spark with the `hadoop-provided` option, they should make sure the class path contains the `hadoop-client-api` and `hadoop-client-runtime` jars. In addition, they may need to make sure these jars appear before other Hadoop jars on the class path. Otherwise, classes may be loaded from the other non-shaded Hadoop jars and cause potential conflicts.

### How was this patch tested?

Relying on existing tests.

Closes #29843 from sunchao/SPARK-29250.

Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>

2020-10-22 03:21:34 +00:00
create-release	[SPARK-32982][BUILD] Remove hive-1.2 profiles in PIP installation option	2020-09-24 14:49:58 +09:00
deps	[SPARK-33212][BUILD] Move to shaded clients for Hadoop 3.x profile	2020-10-22 03:21:34 +00:00
sparktestsupport	[SPARK-32017][PYTHON][BUILD] Make Pyspark Hadoop 3.2+ Variant available in PyPI	2020-09-23 09:30:51 +09:00
tests	[MINOR] Fix typos in dev/* scripts.	2018-01-31 07:37:25 +09:00
.gitignore	[SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*.py to .gitignore file.	2018-01-31 00:51:00 +09:00
.rat-excludes	[SPARK-32723][WEBUI] Upgrade to jQuery 3.5.1	2020-09-30 21:30:17 -07:00
.scalafmt.conf	[SPARK-26177] Config change followup to [] Automated formatting for Scala code	2018-12-03 10:03:51 -06:00
appveyor-guide.md	[SPARK-26918][DOCS] All .md should have ASF license header	2019-03-30 19:49:45 -05:00
appveyor-install-dependencies.ps1	[SPARK-33105][INFRA] Change default R arch from i386 to x64 and parametrize BINPREF	2020-10-10 13:48:26 +09:00
change-scala-version.sh	[SPARK-30012][CORE][SQL] Change classes extending scala collection classes to work with 2.13	2019-12-03 08:59:43 -08:00
check-license	[MINOR][BUILD] Upgrade apache-rat to 0.13	2019-04-01 16:44:42 +09:00
checkstyle-suppressions.xml	[SPARK-29674][CORE] Update dropwizard metrics to 4.1.x for JDK 9+	2019-11-03 15:13:06 -08:00
checkstyle.xml	[MINOR] Fix google style guide address	2019-12-12 11:04:01 -06:00
github_jira_sync.py	[MINOR] Fix usage print to guide pip3 to install jira-python library	2020-09-03 01:10:59 +09:00
lint-java	[SPARK-23063][K8S] K8s changes for publishing scripts (and a couple of other misses)	2018-01-13 21:34:28 -08:00
lint-python	[SPARK-17333][PYSPARK] Enable mypy	2020-10-19 12:50:01 -07:00
lint-r	[SPARK-29932][R][TESTS] lint-r should do non-zero exit in case of errors	2019-11-17 10:09:46 -08:00
lint-r.R	[MINOR][R] small tidying of sh scripts for R	2020-04-30 16:58:05 -07:00
lint-scala	[SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles	2019-03-15 08:20:42 +09:00
make-distribution.sh	[SPARK-31041][BUILD] Show Maven errors from within make-distribution.sh	2020-03-11 08:22:02 -05:00
merge_spark_pr.py	[MINOR] Fix usage print to guide pip3 to install jira-python library	2020-09-03 01:10:59 +09:00
mima	[SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles	2019-03-15 08:20:42 +09:00
pip-sanity-check.py	[SPARK-32319][PYSPARK] Disallow the use of unused imports	2020-08-08 08:51:57 -07:00
README.md	Merge pull request #565 from pwendell/dev-scripts. Closes #565 .	2014-02-08 23:13:34 -08:00
requirements.txt	[SPARK-32204][SPARK-32182][DOCS] Add a quickstart page with Binder integration in PySpark documentation	2020-08-26 12:23:24 +09:00
run-pip-tests	[SPARK-32419][PYTHON][BUILD] Avoid using subshell for Conda env (de)activation in pip packaging test	2020-07-25 13:09:23 +09:00
run-tests	[SPARK-29672][PYSPARK] update spark testing framework to use python3	2019-11-14 10:18:55 -08:00
run-tests-jenkins	[SPARK-29672][PYSPARK] update spark testing framework to use python3	2019-11-14 10:18:55 -08:00
run-tests-jenkins.py	[SPARK-33082][SPARK-20202][BUILD][SQL][FOLLOW-UP] Remove Hive 1.2 workarounds and Hive 1.2 profile in Jenkins script	2020-10-09 03:04:26 -07:00
run-tests.py	[SPARK-33179][TESTS] Switch default Hadoop profile in run-tests.py	2020-10-19 15:54:52 +09:00
sbt-checkstyle	[SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles	2019-03-15 08:20:42 +09:00
scalafmt	[SPARK-30570][BUILD] Update scalafmt plugin to 1.0.3 with onlyChangedFiles feature	2020-01-23 12:44:43 -08:00
scalastyle	Revert "[SPARK-30534][INFRA] Use mvn in `dev/scalastyle`"	2020-01-21 18:23:03 +09:00
test-dependencies.sh	[SPARK-20202][BUILD][SQL] Remove references to org.spark-project.hive (Hive 1.2.1)	2020-10-05 15:29:56 -07:00
tox.ini	[SPARK-32714][PYTHON] Initial pyspark-stubs port	2020-09-24 14:15:36 +09:00

README.md

Spark Developer Scripts

This directory contains scripts useful to developers when packaging, testing, or committing to Spark.

Many of these scripts require Apache credentials to work correctly.