[SPARK-34041][PYTHON][DOCS] Miscellaneous cleanup for new PySpark documentation

### What changes were proposed in this pull request?

This PR proposes to:
- Add a link to the quick start in the PySpark docs under "Programming Guides" in the main Spark docs
- Rename `ML` / `MLlib` to `MLlib (DataFrame-based)` / `MLlib (RDD-based)` in the API reference page
- Mention the other user guides as well, because guides such as [ML](http://spark.apache.org/docs/latest/ml-guide.html) and [SQL](http://spark.apache.org/docs/latest/sql-programming-guide.html) also apply to PySpark.
- Mention the other migration guides as well, because the changes they describe can also affect PySpark.

### Why are the changes needed?

For better documentation.

### Does this PR introduce _any_ user-facing change?

It fixes user-facing docs. However, these docs have not been released yet.

### How was this patch tested?

Manually tested by running:

```bash
cd docs
SKIP_SCALADOC=1 SKIP_RDOC=1 SKIP_SQLDOC=1 jekyll serve --watch
```
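
As a complement to browsing the served site manually, a small scripted check could confirm that the new relative link target (`api/python/getting_started/index.html`) actually exists in the built output. The `check_pages` function below is a hypothetical helper, not part of Spark's docs tooling. It assumes Jekyll wrote the site to `docs/_site` (Jekyll's default destination) and that the PySpark API docs were built, which the command above does not skip.

```bash
# Hypothetical helper, not part of Spark's docs tooling: report whether each
# relative page exists under a built docs site and fail if any is missing.
check_pages() {
  local site_dir="$1"
  shift
  local missing=0 page
  for page in "$@"; do
    if [ -f "$site_dir/$page" ]; then
      echo "ok: $page"
    else
      echo "missing: $page" >&2
      missing=$((missing + 1))
    fi
  done
  # Exit status is 0 only when every page was found.
  [ "$missing" -eq 0 ]
}
```

For example, running `check_pages docs/_site api/python/getting_started/index.html` from the repository root would exit non-zero if the PySpark getting started page was not generated. The external `http://spark.apache.org/...` links are not covered, because they point at the published site rather than the local build.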

Closes #31082 from HyukjinKwon/SPARK-34041.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
HyukjinKwon 2021-01-08 09:28:31 +09:00
parent 7b06acc28b
commit aa388cf3d0
7 changed files with 36 additions and 10 deletions

@@ -84,6 +84,7 @@
<a class="dropdown-item" href="ml-guide.html">MLlib (Machine Learning)</a>
<a class="dropdown-item" href="graphx-programming-guide.html">GraphX (Graph Processing)</a>
<a class="dropdown-item" href="sparkr.html">SparkR (R on Spark)</a>
<a class="dropdown-item" href="api/python/getting_started/index.html">PySpark (Python on Spark)</a>
</div>
</li>

@@ -113,6 +113,8 @@ options for deployment:
* [Spark Streaming](streaming-programming-guide.html): processing data streams using DStreams (old API)
* [MLlib](ml-guide.html): applying machine learning algorithms
* [GraphX](graphx-programming-guide.html): processing graphs
* [SparkR](sparkr.html): processing data with Spark in R
* [PySpark](api/python/getting_started/index.html): processing data with Spark in Python
**API Docs:**

@@ -21,6 +21,9 @@ Getting Started
===============
This page summarizes the basic steps required to setup and get started with PySpark.
There are more guides shared with other languages such as
`Quick Start <http://spark.apache.org/docs/latest/quick-start.html>`_ in Programming Guides
at `the Spark documentation <http://spark.apache.org/docs/latest/index.html#where-to-go-from-here>`_.
.. toctree::
:maxdepth: 2

@@ -21,8 +21,6 @@ Migration Guide
===============
This page describes the migration guide specific to PySpark.
Many items of other migration guides can also be applied when migrating PySpark to higher versions because PySpark internally shares other components.
Please also refer other migration guides such as `Migration Guide: SQL, Datasets and DataFrame <http://spark.apache.org/docs/latest/sql-migration-guide.html>`_.
.. toctree::
:maxdepth: 2
@@ -33,3 +31,13 @@ Please also refer other migration guides such as `Migration Guide: SQL, Datasets
pyspark_2.2_to_2.3
pyspark_1.4_to_1.5
pyspark_1.0_1.2_to_1.3
Many items of other migration guides can also be applied when migrating PySpark to higher versions because PySpark internally shares other components.
Please also refer other migration guides:
- `Migration Guide: Spark Core <http://spark.apache.org/docs/latest/core-migration-guide.html>`_
- `Migration Guide: SQL, Datasets and DataFrame <http://spark.apache.org/docs/latest/sql-migration-guide.html>`_
- `Migration Guide: Structured Streaming <http://spark.apache.org/docs/latest/ss-migration-guide.html>`_
- `Migration Guide: MLlib (Machine Learning) <http://spark.apache.org/docs/latest/ml-migration-guide.html>`_

@@ -16,11 +16,11 @@
under the License.
ML
==
MLlib (DataFrame-based)
=======================
ML Pipeline APIs
----------------
Pipeline APIs
-------------
.. currentmodule:: pyspark.ml
@@ -188,8 +188,8 @@ Clustering
PowerIterationClustering
ML Functions
----------------------------
Functions
---------
.. currentmodule:: pyspark.ml.functions

@@ -16,8 +16,8 @@
under the License.
MLlib
=====
MLlib (RDD-based)
=================
Classification
--------------

@@ -20,9 +20,21 @@
User Guide
==========
This page is the guide for PySpark users which contains PySpark specific topics.
.. toctree::
:maxdepth: 2
arrow_pandas
python_packaging
There are more guides shared with other languages in Programming Guides
at `the Spark documentation <http://spark.apache.org/docs/latest/index.html#where-to-go-from-here>`_.
- `RDD Programming Guide <http://spark.apache.org/docs/latest/rdd-programming-guide.html>`_
- `Spark SQL, DataFrames and Datasets Guide <http://spark.apache.org/docs/latest/sql-programming-guide.html>`_
- `Structured Streaming Programming Guide <http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html>`_
- `Spark Streaming Programming Guide <http://spark.apache.org/docs/latest/streaming-programming-guide.html>`_
- `Machine Learning Library (MLlib) Guide <http://spark.apache.org/docs/latest/ml-guide.html>`_