[SPARK-32507][DOCS][PYTHON] Add main page for PySpark documentation

### What changes were proposed in this pull request?

This PR proposes to add the main page of the PySpark documentation. The base work was finished in https://github.com/apache/spark/pull/29188.

### Why are the changes needed?

For better usability and readability of the PySpark documentation.

### Does this PR introduce _any_ user-facing change?

Yes, it adds a new main page, shown below:

![Screen Shot 2020-07-31 at 10 02 44 PM](https://user-images.githubusercontent.com/6477701/89037618-d2d68880-d379-11ea-9a44-562f2aa0e3fd.png)

### How was this patch tested?

Manually built the PySpark documentation.

```bash
cd python/docs
make clean html
```

Closes #29320 from HyukjinKwon/SPARK-32507.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
Committed by: HyukjinKwon, 2020-08-05 11:14:14 +09:00
parent 0660a0501d
commit 15b73339d9
3 changed files with 36 additions and 0 deletions

[Two binary image files changed, not shown in the diff; the added image is 32 KiB.]
@@ -21,8 +21,44 @@
PySpark Documentation
=====================
.. TODO(SPARK-32204): Add Binder integration at Live Notebook.
PySpark is an interface for Apache Spark in Python. It not only allows you to write
Spark applications using Python APIs, but also provides the PySpark shell for
interactively analyzing your data in a distributed environment. PySpark supports most
of Spark's features such as Spark SQL, DataFrame, Streaming, MLlib
(Machine Learning) and Spark Core.
.. image:: ../../../docs/img/pyspark-components.png
:alt: PySpark Components
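For illustration, a minimal sketch of starting a local PySpark session; the `local[*]` master URL and the app name are arbitrary choices, not anything mandated by this page:

```python
from pyspark.sql import SparkSession

# Entry point for DataFrame and SQL functionality; reuses an
# existing session if one is already running.
spark = (SparkSession.builder
         .master("local[*]")
         .appName("pyspark-docs-example")
         .getOrCreate())

print(spark.version)
```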
**Spark SQL and DataFrame**
Spark SQL is a Spark module for structured data processing. It provides
a programming abstraction called DataFrame and can also act as a distributed
SQL query engine.
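A short sketch of the two entry points described above, running the same query through the DataFrame API and through SQL on a temporary view (the sample data is made up):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()

# The same question asked twice: once via the DataFrame API,
# once via SQL against a temporary view.
df = spark.createDataFrame([("Alice", 34), ("Bob", 29)], ["name", "age"])
df.filter(df.age > 30).show()

df.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 30").show()
```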
**Streaming**
Running on top of Spark, the streaming feature in Apache Spark enables powerful
interactive and analytical applications across both streaming and historical data,
while inheriting Spark's ease of use and fault tolerance characteristics.
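A small Structured Streaming sketch, assuming the built-in `rate` source (which generates timestamped rows for testing) and the console sink:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()

# Generate one (timestamp, value) row per second and print each
# micro-batch to the console for roughly ten seconds.
stream = spark.readStream.format("rate").option("rowsPerSecond", 1).load()

query = (stream.writeStream
         .format("console")
         .outputMode("append")
         .start())
query.awaitTermination(10)
query.stop()
```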
**MLlib**
Built on top of Spark, MLlib is a scalable machine learning library that provides
a uniform set of high-level APIs that help users create and tune practical machine
learning pipelines.
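A minimal pipeline sketch in that spirit, tokenizing text, hashing it into feature vectors, and fitting a logistic regression; the toy training data is invented for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import HashingTF, Tokenizer

spark = SparkSession.builder.master("local[*]").getOrCreate()

training = spark.createDataFrame([
    ("spark is great", 1.0),
    ("hadoop mapreduce", 0.0),
], ["text", "label"])

# Chain the stages into a single estimator and fit it in one call.
tokenizer = Tokenizer(inputCol="text", outputCol="words")
hashing_tf = HashingTF(inputCol="words", outputCol="features")
lr = LogisticRegression(maxIter=10)

model = Pipeline(stages=[tokenizer, hashing_tf, lr]).fit(training)
model.transform(training).select("text", "prediction").show()
```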
**Spark Core**
Spark Core is the underlying general execution engine for the Spark platform that all
other functionality is built on top of. It provides the RDD (Resilient Distributed
Dataset) abstraction and in-memory computing capabilities.
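A short RDD sketch under the same assumptions as above (local session, toy data), showing in-memory caching and a couple of actions:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()
sc = spark.sparkContext

# Distribute a range, keep it in memory, and run two actions on it.
rdd = sc.parallelize(range(1, 101)).cache()
print(rdd.sum())                                  # 5050
print(rdd.filter(lambda x: x % 2 == 0).count())   # 50
```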
.. toctree::
:maxdepth: 2
:hidden:
getting_started/index
user_guide/index