From 13c2b711e4c1d75d863aa2000d3ec444824ff8bc Mon Sep 17 00:00:00 2001
From: Hyukjin Kwon
Date: Wed, 29 Sep 2021 14:45:58 +0900
Subject: [PATCH] [MINOR][DOCS] Mention other Python dependency tools in documentation

### What changes were proposed in this pull request?

Self-contained documentation change: mention other Python dependency management tools such as Conda and pip, and link to the Python Package Management user guide.

### Why are the changes needed?

To give users more information about the Python dependency management options available in PySpark.
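As an illustrative sketch only (not part of this patch), one way to ship third-party dependencies is to pack a Conda environment and submit it alongside the application, roughly as the linked guide describes. The names below (`pyspark_conda_env`, `environment`, `app.py`) are placeholders, and `--archives` support depends on the Spark version and cluster manager:

```bash
# Create a Conda environment with the third-party packages and pack it
# into a relocatable archive with conda-pack.
conda create -y -n pyspark_conda_env -c conda-forge pandas conda-pack
conda activate pyspark_conda_env
conda pack -f -o pyspark_conda_env.tar.gz

# Ship the archive with the application; Spark unpacks it into ./environment
# in each executor's working directory, and PYSPARK_PYTHON points the workers
# at the packed interpreter.
export PYSPARK_DRIVER_PYTHON=python   # do not set in cluster modes
export PYSPARK_PYTHON=./environment/bin/python
./bin/spark-submit \
  --archives pyspark_conda_env.tar.gz#environment \
  app.py
```

A similar flow exists for pip/virtualenv users (e.g. via venv-pack), and zipped local modules can still be shipped with plain `--py-files`.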
### Does this PR introduce _any_ user-facing change?

Yes, documentation change.

### How was this patch tested?

Manually built the docs and checked the results:

Screen Shot 2021-09-29 at 10 11 56 AM
Screen Shot 2021-09-29 at 10 12 22 AM
Screen Shot 2021-09-29 at 10 12 42 AM

Closes #34134 from HyukjinKwon/minor-docs-py-deps.

Authored-by: Hyukjin Kwon
Signed-off-by: Hyukjin Kwon
---
 docs/quick-start.md             | 2 ++
 docs/rdd-programming-guide.md   | 8 ++++----
 docs/submitting-applications.md | 3 ++-
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/docs/quick-start.md b/docs/quick-start.md
index 958e1ba920..4d3f1e2b34 100644
--- a/docs/quick-start.md
+++ b/docs/quick-start.md
@@ -450,6 +450,8 @@ Lines with a: 46, Lines with b: 23
+Other dependency management tools such as Conda and pip can also be used for custom classes or third-party libraries. See also [Python Package Management](api/python/user_guide/python_packaging.html).
+
 # Where to Go from Here
 
 Congratulations on running your first Spark application!
diff --git a/docs/rdd-programming-guide.md b/docs/rdd-programming-guide.md
index 74412d4930..5c7121870f 100644
--- a/docs/rdd-programming-guide.md
+++ b/docs/rdd-programming-guide.md
@@ -241,12 +241,12 @@ For a complete list of options, run `spark-shell --help`. Behind the scenes,
 In the PySpark shell, a special interpreter-aware SparkContext is already created for you, in the
 variable called `sc`. Making your own SparkContext will not work. You can set which master the
 context connects to using the `--master` argument, and you can add Python .zip, .egg or .py files
-to the runtime path by passing a comma-separated list to `--py-files`. You can also add dependencies
+to the runtime path by passing a comma-separated list to `--py-files`. For third-party Python dependencies,
+see [Python Package Management](api/python/user_guide/python_packaging.html). You can also add dependencies
 (e.g. Spark Packages) to your shell session by supplying a comma-separated list of Maven coordinates
 to the `--packages` argument. Any additional repositories where dependencies might exist (e.g. Sonatype)
-can be passed to the `--repositories` argument. Any Python dependencies a Spark package has (listed in
-the requirements.txt of that package) must be manually installed using `pip` when necessary.
-For example, to run `bin/pyspark` on exactly four cores, use:
+can be passed to the `--repositories` argument. For example, to run
+`bin/pyspark` on exactly four cores, use:
 
 {% highlight bash %}
 $ ./bin/pyspark --master local[4]
diff --git a/docs/submitting-applications.md b/docs/submitting-applications.md
index 402dd0614f..39e3473ea8 100644
--- a/docs/submitting-applications.md
+++ b/docs/submitting-applications.md
@@ -35,7 +35,8 @@ script as shown here while passing your jar.
 
 For Python, you can use the `--py-files` argument of `spark-submit` to add `.py`, `.zip` or `.egg`
 files to be distributed with your application. If you depend on multiple Python files we recommend
-packaging them into a `.zip` or `.egg`.
+packaging them into a `.zip` or `.egg`. For third-party Python dependencies,
+see [Python Package Management](api/python/user_guide/python_packaging.html).
 
 # Launching Applications with spark-submit