spark-instrumented-optimizer/dev
Dongjoon Hyun 290aa02179 [SPARK-33618][CORE] Use hadoop-client instead of hadoop-client-api to make hadoop-aws work
### What changes were proposed in this pull request?

This mostly reverts commit cb3fa6c936 (SPARK-33212), with three exceptions:
1. `SparkSubmitUtils` was recently updated by SPARK-33580.
2. `resource-managers/yarn/pom.xml` was recently updated by SPARK-33104 to add the `hadoop-yarn-server-resourcemanager` test dependency.
3. The `com.fasterxml.jackson.module:jackson-module-jaxb-annotations` dependency in the K8s module, recently updated by SPARK-33471, is adjusted rather than reverted.
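For context, here is a minimal sbt-style sketch of the dependency swap this revert effects. The module coordinates come from the PR title and SPARK-33212; the version and the sbt syntax are illustrative assumptions, since the actual change lives in Spark's Maven poms.

```scala
// Illustrative only: the real change is in Spark's Maven poms, and the
// version shown is an assumption matching the repro below.

// Before this revert (SPARK-33212): shaded client artifacts, which relocate
// Guava and break hadoop-aws (HADOOP-16080).
// libraryDependencies ++= Seq(
//   "org.apache.hadoop" % "hadoop-client-api"     % "3.2.0",
//   "org.apache.hadoop" % "hadoop-client-runtime" % "3.2.0"
// )

// After this revert: the classic unshaded client that hadoop-aws links against.
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "3.2.0"
```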

### Why are the changes needed?

According to [HADOOP-16080](https://issues.apache.org/jira/browse/HADOOP-16080), `hadoop-aws` has not worked with `hadoop-client-api` since Apache Hadoop 3.1.1. Write operations fail as shown below.

**1. Spark distribution with `-Phadoop-cloud`**

```scala
$ bin/spark-shell --conf spark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY_ID --conf spark.hadoop.fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY
20/11/30 23:01:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context available as 'sc' (master = local[*], app id = local-1606806088715).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.0-SNAPSHOT
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_272)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.read.parquet("s3a://dongjoon/users.parquet").show
20/11/30 23:01:34 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
+------+--------------+----------------+
|  name|favorite_color|favorite_numbers|
+------+--------------+----------------+
|Alyssa|          null|  [3, 9, 15, 20]|
|   Ben|           red|              []|
+------+--------------+----------------+

scala> Seq(1).toDF.write.parquet("s3a://dongjoon/out.parquet")
20/11/30 23:02:14 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
java.lang.NoSuchMethodError: org.apache.hadoop.util.SemaphoredDelegatingExecutor.<init>(Lcom/google/common/util/concurrent/ListeningExecutorService;IZ)V
```

**2. Spark distribution without `-Phadoop-cloud`**
```scala
$ bin/spark-shell --conf spark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY_ID --conf spark.hadoop.fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY -c spark.eventLog.enabled=true -c spark.eventLog.dir=s3a://dongjoon/spark-events/ --packages org.apache.hadoop:hadoop-aws:3.2.0,org.apache.hadoop:hadoop-common:3.2.0
...
java.lang.NoSuchMethodError: org.apache.hadoop.util.SemaphoredDelegatingExecutor.<init>(Lcom/google/common/util/concurrent/ListeningExecutorService;IZ)V
  at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:772)
```
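The `NoSuchMethodError` in both repros points at the root cause: the shaded client jars relocate Guava (the exact relocation prefix varies by Hadoop version), so the `SemaphoredDelegatingExecutor` constructor on the classpath no longer takes the unshaded `com.google.common.util.concurrent.ListeningExecutorService` that `hadoop-aws` 3.2.x was compiled against. A quick, hedged way to see which variant is actually loaded is a reflection probe from `spark-shell`; only the class name from the stack trace is assumed:

```scala
// Reflection probe: print the constructor signatures actually on the classpath.
// With the unshaded hadoop-client, one constructor takes
// com.google.common.util.concurrent.ListeningExecutorService; with the shaded
// hadoop-client-api, that parameter appears under a relocated Guava package,
// which is why hadoop-aws throws NoSuchMethodError.
val cls = Class.forName("org.apache.hadoop.util.SemaphoredDelegatingExecutor")
cls.getConstructors.map(_.toString).sorted.foreach(println)
```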

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CI.
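Beyond CI, a natural manual check is to rerun the failing write from the repro above on a distribution built with this revert. This sketch mirrors the repro command, with the bucket left as a placeholder:

```scala
// Run inside bin/spark-shell with the same s3a credentials as the repro.
// With the unshaded hadoop-client restored, this write should complete
// instead of throwing NoSuchMethodError. The bucket path is a placeholder.
Seq(1).toDF.write.parquet("s3a://<your-bucket>/out.parquet")
```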

Closes #30508 from dongjoon-hyun/SPARK-33212-REVERT.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-12-02 18:23:48 +09:00
| Name | Last commit | Date |
|------|-------------|------|
| create-release | Spelling r common dev mlib external project streaming resource managers python | 2020-11-27 10:22:45 -06:00 |
| deps | [SPARK-33618][CORE] Use hadoop-client instead of hadoop-client-api to make hadoop-aws work | 2020-12-02 18:23:48 +09:00 |
| sparktestsupport | [SPARK-33592] Fix: Pyspark ML Validator params in estimatorParamMaps may be lost after saving and reloading | 2020-12-01 09:36:42 +08:00 |
| tests | Spelling r common dev mlib external project streaming resource managers python | 2020-11-27 10:22:45 -06:00 |
| .gitignore | [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*.py to .gitignore file. | 2018-01-31 00:51:00 +09:00 |
| .rat-excludes | [SPARK-33483][INFRA][TESTS] Fix rat exclusion patterns and add a LICENSE | 2020-11-18 23:59:11 -08:00 |
| .scalafmt.conf | [SPARK-26177] Config change followup to [] Automated formatting for Scala code | 2018-12-03 10:03:51 -06:00 |
| appveyor-guide.md | Spelling r common dev mlib external project streaming resource managers python | 2020-11-27 10:22:45 -06:00 |
| appveyor-install-dependencies.ps1 | [SPARK-33105][INFRA] Change default R arch from i386 to x64 and parametrize BINPREF | 2020-10-10 13:48:26 +09:00 |
| change-scala-version.sh | [SPARK-30012][CORE][SQL] Change classes extending scala collection classes to work with 2.13 | 2019-12-03 08:59:43 -08:00 |
| check-license | [MINOR][INFRA] Suppress warning in check-license | 2020-11-23 10:38:40 +09:00 |
| checkstyle-suppressions.xml | [SPARK-29674][CORE] Update dropwizard metrics to 4.1.x for JDK 9+ | 2019-11-03 15:13:06 -08:00 |
| checkstyle.xml | [MINOR] Fix google style guide address | 2019-12-12 11:04:01 -06:00 |
| github_jira_sync.py | Spelling r common dev mlib external project streaming resource managers python | 2020-11-27 10:22:45 -06:00 |
| lint-java | [SPARK-23063][K8S] K8s changes for publishing scripts (and a couple of other misses) | 2018-01-13 21:34:28 -08:00 |
| lint-python | [SPARK-33243][PYTHON][BUILD] Add numpydoc into documentation dependency | 2020-10-27 14:03:57 +09:00 |
| lint-r | [SPARK-29932][R][TESTS] lint-r should do non-zero exit in case of errors | 2019-11-17 10:09:46 -08:00 |
| lint-r.R | [MINOR][R] small tidying of sh scripts for R | 2020-04-30 16:58:05 -07:00 |
| lint-scala | [SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles | 2019-03-15 08:20:42 +09:00 |
| make-distribution.sh | [SPARK-31041][BUILD] Show Maven errors from within make-distribution.sh | 2020-03-11 08:22:02 -05:00 |
| merge_spark_pr.py | [MINOR] Fix usage print to guide pip3 to install jira-python library | 2020-09-03 01:10:59 +09:00 |
| mima | [SPARK-33510][BUILD] Update SBT to 1.4.4 | 2020-11-22 22:56:59 -08:00 |
| pip-sanity-check.py | [SPARK-32319][PYSPARK] Disallow the use of unused imports | 2020-08-08 08:51:57 -07:00 |
| README.md | Merge pull request #565 from pwendell/dev-scripts. Closes #565. | 2014-02-08 23:13:34 -08:00 |
| requirements.txt | [SPARK-33243][PYTHON][BUILD] Add numpydoc into documentation dependency | 2020-10-27 14:03:57 +09:00 |
| run-pip-tests | [SPARK-32419][PYTHON][BUILD] Avoid using subshell for Conda env (de)activation in pip packaging test | 2020-07-25 13:09:23 +09:00 |
| run-tests | [SPARK-29672][PYSPARK] update spark testing framework to use python3 | 2019-11-14 10:18:55 -08:00 |
| run-tests-jenkins | [SPARK-33535][INFRA][TESTS] Export LANG to en_US.UTF-8 in run-tests-jenkins script | 2020-11-24 09:50:10 -08:00 |
| run-tests-jenkins.py | Spelling r common dev mlib external project streaming resource managers python | 2020-11-27 10:22:45 -06:00 |
| run-tests.py | Spelling r common dev mlib external project streaming resource managers python | 2020-11-27 10:22:45 -06:00 |
| sbt-checkstyle | [SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles | 2019-03-15 08:20:42 +09:00 |
| scalafmt | [SPARK-30570][BUILD] Update scalafmt plugin to 1.0.3 with onlyChangedFiles feature | 2020-01-23 12:44:43 -08:00 |
| scalastyle | Revert "[SPARK-30534][INFRA] Use mvn in dev/scalastyle" | 2020-01-21 18:23:03 +09:00 |
| test-dependencies.sh | [SPARK-20202][BUILD][SQL] Remove references to org.spark-project.hive (Hive 1.2.1) | 2020-10-05 15:29:56 -07:00 |
| tox.ini | [SPARK-32714][PYTHON] Initial pyspark-stubs port | 2020-09-24 14:15:36 +09:00 |

# Spark Developer Scripts

This directory contains scripts useful to developers when packaging, testing, or committing to Spark.

Many of these scripts require Apache credentials to work correctly.