# spark-instrumented-optimizer/dev

Latest commit a36a76ac43 by Holden Karau: [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed

## What changes were proposed in this pull request?

This PR provides a pip-installable PySpark package. It copies the JARs into the Python package and bundles them with the Python code, so that one version of the Python code cannot accidentally be used with a different version of the JARs. It does not yet publish to PyPI; that is the natural follow-up (SPARK-18129).
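
Because the JARs ship inside the Python package, a pip-installed PySpark can locate them relative to the installed module itself. Below is a minimal sketch of that idea, not the PR's actual code; the `pyspark/jars` location is an assumption about the installed layout:

```python
# Sketch: resolve SPARK_HOME for a pip-installed PySpark when the
# environment variable is not set, using the location of the installed
# pyspark package itself.
import os
import pyspark

spark_home = os.environ.get("SPARK_HOME")
if spark_home is None:
    # The package directory doubles as SPARK_HOME; the bundled JARs are
    # assumed to live under pyspark/jars in the pip-installed layout.
    spark_home = os.path.dirname(os.path.abspath(pyspark.__file__))

jars_dir = os.path.join(spark_home, "jars")
print("SPARK_HOME:", spark_home)
print("bundled JARs found:", os.path.isdir(jars_dir))
```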

Done:
- pip installable on conda (manually tested)
- installed via setup.py on a non-pip-managed system (RHEL) with YARN (manually tested)
- automated testing of the pip install (virtualenv); a sanity-check sketch follows this list
- packaging and signing with release-build*
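
As a reference for what the automated check verifies, here is a minimal sketch in the spirit of `dev/pip-sanity-check.py` (listed in the directory below), not the script itself: import the pip-installed package, run a trivial job, and check the result.

```python
# Sketch of a pip-install sanity check: import the installed package,
# run a trivial Spark job, and verify the result.
import sys

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("PipSanityCheck").getOrCreate()
total = spark.sparkContext.parallelize(range(100), 10).sum()
spark.stop()

if total != 4950:  # 0 + 1 + ... + 99
    print("Unexpected result: %d" % total, file=sys.stderr)
    sys.exit(1)
print("Successfully ran a job with pip-installed PySpark")
```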

Possible follow-up work:
- release-build update to publish to PyPI (SPARK-18128)
- figure out who owns the pyspark package name on production PyPI (is it someone within the project, should we ask PyPI, or should we publish under a different name such as ApachePySpark?)
- Windows support and/or testing (SPARK-18136)
- investigate the details of wheel caching and see if we can avoid cleaning the wheel cache during our tests
- decide how we want to number our dev/snapshot versions

Explicitly out of scope:
- Using pip-installed PySpark to start a standalone cluster
- Using pip-installed PySpark for non-Python Spark programs

*I've done some work to test release-build locally, but as a non-committer I've only been able to do local testing.

## How was this patch tested?

Automated testing with virtualenv, manual testing with conda, a system-wide install, and YARN integration; a sketch of the virtualenv flow follows.
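
The virtualenv flow is roughly: create a clean environment, pip install the built package, and run the sanity check. This is a sketch under assumptions; the real logic lives in `dev/run-pip-tests`, and the artifact path below is hypothetical:

```python
# Sketch of the automated virtualenv test flow (the real implementation
# is dev/run-pip-tests; paths here are hypothetical, POSIX layout assumed).
import subprocess
import venv

ENV_DIR = "/tmp/pyspark-pip-test"                # scratch environment
SDIST = "python/dist/pyspark-2.1.0.dev0.tar.gz"  # hypothetical built sdist

# Build a fresh virtualenv with pip available, wiping any previous one.
venv.EnvBuilder(with_pip=True, clear=True).create(ENV_DIR)

# Install the packaged PySpark into it and run the sanity check with
# the environment's own interpreter.
subprocess.check_call([ENV_DIR + "/bin/pip", "install", SDIST])
subprocess.check_call([ENV_DIR + "/bin/python", "dev/pip-sanity-check.py"])
```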

The release-build changes were tested locally as a non-committer (no testing of uploading artifacts to Apache staging websites).

Author: Holden Karau <holden@us.ibm.com>
Author: Juliet Hougland <juliet@cloudera.com>
Author: Juliet Hougland <not@myemail.com>

Closes #15659 from holdenk/SPARK-1267-pip-install-pyspark.
Committed: 2016-11-16 14:22:15 -08:00
| Name | Last commit | Date |
| --- | --- | --- |
| create-release/ | [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed | 2016-11-16 14:22:15 -08:00 |
| deps/ | [SPARK-18375][SPARK-18383][BUILD][CORE] Upgrade netty to 4.0.42.Final | 2016-11-12 09:49:14 +00:00 |
| sparktestsupport/ | [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed | 2016-11-16 14:22:15 -08:00 |
| tests/ | [SPARK-10359] Enumerate dependencies in a file and diff against it for new pull requests | 2015-12-30 12:47:42 -08:00 |
| .gitignore | [SPARK-6219] Reuse pep8.py | 2015-04-18 16:46:28 -07:00 |
| .rat-excludes | [SPARK-17303] Added spark-warehouse to dev/.rat-excludes | 2016-08-29 23:33:00 -07:00 |
| appveyor-guide.md | [SPARK-17200][PROJECT INFRA][BUILD][SPARKR] Automate building and testing on Windows (currently SparkR only) | 2016-09-08 08:26:59 -07:00 |
| appveyor-install-dependencies.ps1 | [SPARK-17200][PROJECT INFRA][BUILD][SPARKR] Automate building and testing on Windows (currently SparkR only) | 2016-09-08 08:26:59 -07:00 |
| change-scala-version.sh | [SPARK-9250] Make change-scala-version more helpful w.r.t. valid Scala versions | 2015-07-24 17:09:33 +01:00 |
| change-version-to-2.10.sh | [SPARK-9304] [BUILD] Improve backwards compatibility of SPARK-8401 | 2015-07-25 11:05:08 +01:00 |
| change-version-to-2.11.sh | [SPARK-9304] [BUILD] Improve backwards compatibility of SPARK-8401 | 2015-07-25 11:05:08 +01:00 |
| check-license | [SPARK-13596][BUILD] Move misc top-level build files into appropriate subdirs | 2016-03-07 14:48:02 -08:00 |
| checkstyle-suppressions.xml | [MINOR] Fix Java Lint errors introduced by #13286 and #13280 | 2016-06-08 14:51:00 +01:00 |
| checkstyle.xml | [SPARK-18420][BUILD] Fix the errors caused by lint check in Java | 2016-11-16 11:59:00 +00:00 |
| github_jira_sync.py | Fix install jira-python | 2015-05-23 09:14:07 -07:00 |
| lint-java | [SPARK-16967] move mesos to module | 2016-08-26 12:25:22 -07:00 |
| lint-python | [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed | 2016-11-16 14:22:15 -08:00 |
| lint-r | [SPARK-10328] [SPARKR] Fix generic for na.omit | 2015-08-28 00:37:50 -07:00 |
| lint-r.R | [SPARK-14074][SPARKR] Specify commit sha1 ID when using install_github to install intr package. | 2016-03-23 07:57:03 -07:00 |
| lint-scala | [SPARK-2627] [PySpark] have the build enforce PEP 8 automatically | 2014-08-06 12:58:24 -07:00 |
| make-distribution.sh | [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed | 2016-11-16 14:22:15 -08:00 |
| merge_spark_pr.py | [SPARK-9383][PROJECT-INFRA] PR merge script should reset back to previous branch when possible | 2016-01-13 11:56:30 -08:00 |
| mima | [SPARK-16967] move mesos to module | 2016-08-26 12:25:22 -07:00 |
| pip-sanity-check.py | [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed | 2016-11-16 14:22:15 -08:00 |
| README.md | Merge pull request #565 from pwendell/dev-scripts. Closes #565. | 2014-02-08 23:13:34 -08:00 |
| requirements.txt | [SPARK-10498][TOOLS][BUILD] Add requirements.txt file for dev python tools | 2016-01-24 11:48:28 -08:00 |
| run-pip-tests | [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed | 2016-11-16 14:22:15 -08:00 |
| run-tests | [SPARK-5161] Parallelize Python test execution | 2015-06-29 21:32:40 -07:00 |
| run-tests-jenkins | [SPARK-7018][BUILD] Refactor dev/run-tests-jenkins into Python | 2015-10-18 22:45:27 -07:00 |
| run-tests-jenkins.py | [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed | 2016-11-16 14:22:15 -08:00 |
| run-tests.py | [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed | 2016-11-16 14:22:15 -08:00 |
| scalastyle | [SPARK-16967] move mesos to module | 2016-08-26 12:25:22 -07:00 |
| test-dependencies.sh | [SPARK-16967] move mesos to module | 2016-08-26 12:25:22 -07:00 |
| tox.ini | [SPARK-13596][BUILD] Move misc top-level build files into appropriate subdirs | 2016-03-07 14:48:02 -08:00 |

## Spark Developer Scripts

This directory contains scripts useful to developers when packaging, testing, or committing to Spark.

Many of these scripts require Apache credentials to work correctly.