From aa388cf3d0ff230eb0397876fe2db03bbe51658e Mon Sep 17 00:00:00 2001
From: HyukjinKwon
Date: Fri, 8 Jan 2021 09:28:31 +0900
Subject: [PATCH] [SPARK-34041][PYTHON][DOCS] Miscellaneous cleanup for new PySpark documentation

### What changes were proposed in this pull request?

This PR proposes to:
- Add a link to the PySpark quick start under "Programming Guides" in the main Spark docs
- Rename `ML` / `MLlib` to `MLlib (DataFrame-based)` / `MLlib (RDD-based)` in the API reference page
- Mention other user guides as well, because guides such as [ML](http://spark.apache.org/docs/latest/ml-guide.html) and [SQL](http://spark.apache.org/docs/latest/sql-programming-guide.html) are shared with other languages.
- Mention other migration guides as well, because PySpark can be affected by them.

### Why are the changes needed?

For better documentation.

### Does this PR introduce _any_ user-facing change?

It fixes user-facing docs. However, they have not been released yet.

### How was this patch tested?

Manually tested by running:

```bash
cd docs
SKIP_SCALADOC=1 SKIP_RDOC=1 SKIP_SQLDOC=1 jekyll serve --watch
```

Closes #31082 from HyukjinKwon/SPARK-34041.

Authored-by: HyukjinKwon
Signed-off-by: HyukjinKwon
---
 docs/_layouts/global.html                      |  1 +
 docs/index.md                                  |  2 ++
 python/docs/source/getting_started/index.rst   |  3 +++
 python/docs/source/migration_guide/index.rst   | 12 ++++++++++--
 python/docs/source/reference/pyspark.ml.rst    | 12 ++++++------
 python/docs/source/reference/pyspark.mllib.rst |  4 ++--
 python/docs/source/user_guide/index.rst        | 12 ++++++++++++
 7 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html
index de98f29acf..f10d46763c 100755
--- a/docs/_layouts/global.html
+++ b/docs/_layouts/global.html
@@ -84,6 +84,7 @@
 MLlib (Machine Learning)
 GraphX (Graph Processing)
 SparkR (R on Spark)
+ PySpark (Python on Spark)
diff --git a/docs/index.md b/docs/index.md
index 8fd169e63f..c4c2d722f9 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -113,6 +113,8 @@ options for deployment:
 * [Spark Streaming](streaming-programming-guide.html): processing data streams using DStreams (old API)
 * [MLlib](ml-guide.html): applying machine learning algorithms
 * [GraphX](graphx-programming-guide.html): processing graphs
+* [SparkR](sparkr.html): processing data with Spark in R
+* [PySpark](api/python/getting_started/index.html): processing data with Spark in Python
 
 **API Docs:**
 
diff --git a/python/docs/source/getting_started/index.rst b/python/docs/source/getting_started/index.rst
index 9fa3352ae2..38b9c935fc 100644
--- a/python/docs/source/getting_started/index.rst
+++ b/python/docs/source/getting_started/index.rst
@@ -21,6 +21,9 @@ Getting Started
 ===============
 
 This page summarizes the basic steps required to setup and get started with PySpark.
+There are more guides shared with other languages such as
+`Quick Start `_ in Programming Guides
+at `the Spark documentation `_.
 
 .. toctree::
    :maxdepth: 2
diff --git a/python/docs/source/migration_guide/index.rst b/python/docs/source/migration_guide/index.rst
index 41e36b16b3..88e768dc46 100644
--- a/python/docs/source/migration_guide/index.rst
+++ b/python/docs/source/migration_guide/index.rst
@@ -21,8 +21,6 @@ Migration Guide
 ===============
 
 This page describes the migration guide specific to PySpark.
-Many items of other migration guides can also be applied when migrating PySpark to higher versions because PySpark internally shares other components.
-Please also refer other migration guides such as `Migration Guide: SQL, Datasets and DataFrame `_.
 
 .. toctree::
    :maxdepth: 2
@@ -33,3 +31,13 @@ Please also refer other migration guides such as `Migration Guide: SQL, Datasets
    pyspark_2.2_to_2.3
    pyspark_1.4_to_1.5
    pyspark_1.0_1.2_to_1.3
+
+
+Many items of other migration guides can also be applied when migrating PySpark to higher versions because PySpark internally shares other components.
+Please also refer other migration guides:
+
+- `Migration Guide: Spark Core `_
+- `Migration Guide: SQL, Datasets and DataFrame `_
+- `Migration Guide: Structured Streaming `_
+- `Migration Guide: MLlib (Machine Learning) `_
+
diff --git a/python/docs/source/reference/pyspark.ml.rst b/python/docs/source/reference/pyspark.ml.rst
index 2de0ff65a3..cc904597d2 100644
--- a/python/docs/source/reference/pyspark.ml.rst
+++ b/python/docs/source/reference/pyspark.ml.rst
@@ -16,11 +16,11 @@
     under the License.
 
 
-ML
-==
+MLlib (DataFrame-based)
+=======================
 
-ML Pipeline APIs
-----------------
+Pipeline APIs
+-------------
 
 .. currentmodule:: pyspark.ml
 
@@ -188,8 +188,8 @@ Clustering
     PowerIterationClustering
 
 
-ML Functions
-----------------------------
+Functions
+---------
 
 .. currentmodule:: pyspark.ml.functions
 
diff --git a/python/docs/source/reference/pyspark.mllib.rst b/python/docs/source/reference/pyspark.mllib.rst
index df5ea017d0..12fc4798dd 100644
--- a/python/docs/source/reference/pyspark.mllib.rst
+++ b/python/docs/source/reference/pyspark.mllib.rst
@@ -16,8 +16,8 @@
     under the License.
 
 
-MLlib
-=====
+MLlib (RDD-based)
+=================
 
 Classification
 --------------
diff --git a/python/docs/source/user_guide/index.rst b/python/docs/source/user_guide/index.rst
index 3e535ce16b..704156b11d 100644
--- a/python/docs/source/user_guide/index.rst
+++ b/python/docs/source/user_guide/index.rst
@@ -20,9 +20,21 @@
 User Guide
 ==========
 
+This page is the guide for PySpark users which contains PySpark specific topics.
+
 .. toctree::
    :maxdepth: 2
 
    arrow_pandas
    python_packaging
 
+
+There are more guides shared with other languages in Programming Guides
+at `the Spark documentation `_.
+
+- `RDD Programming Guide `_
+- `Spark SQL, DataFrames and Datasets Guide `_
+- `Structured Streaming Programming Guide `_
+- `Spark Streaming Programming Guide `_
+- `Machine Learning Library (MLlib) Guide `_
+