689386b1c6
This patch modifies Spark's SBT build so that it no longer uses `retrieveManaged` / `lib_managed` to store its dependencies. The motivations for this change are nicely described on the JIRA ticket ([SPARK-7841](https://issues.apache.org/jira/browse/SPARK-7841)); my personal interest in doing this stems from the fact that `lib_managed` has caused me some pain while debugging dependency issues in another PR of mine. Removing our use of `lib_managed` would be trivial except for one snag: the Datanucleus JARs, required by Spark SQL's Hive integration, cannot be included in assembly JARs due to problems with merging OSGI `plugin.xml` files. As a result, several places in the packaging and deployment pipeline assume that these Datanucleus JARs are copied to `lib_managed/jars`. In the interest of maintaining compatibility, I have chosen to retain the `lib_managed/jars` directory _only_ for these Datanucleus JARs and have added custom code to `SparkBuild.scala` to automatically copy those JARs to that folder as part of the `assembly` task. `dev/mima` also depended on `lib_managed` in a hacky way in order to set classpaths when generating MiMa excludes; I've updated this to obtain the classpaths directly from SBT instead. /cc dragos marmbrus pwendell srowen Author: Josh Rosen <joshrosen@databricks.com> Closes #9575 from JoshRosen/SPARK-7841. |
||
---|---|---|
.. | ||
audit-release | ||
create-release | ||
sparktestsupport | ||
tests | ||
.gitignore | ||
change-scala-version.sh | ||
change-version-to-2.10.sh | ||
change-version-to-2.11.sh | ||
check-license | ||
github_jira_sync.py | ||
lint-python | ||
lint-r | ||
lint-r.R | ||
lint-scala | ||
merge_spark_pr.py | ||
mima | ||
README.md | ||
run-tests | ||
run-tests-jenkins | ||
run-tests-jenkins.py | ||
run-tests.py | ||
scalastyle |
Spark Developer Scripts
This directory contains scripts useful to developers when packaging, testing, or committing to Spark.
Many of these scripts require Apache credentials to work correctly.