[SPARK-32507][DOCS][PYTHON] Add main page for PySpark documentation

### What changes were proposed in this pull request?

This PR proposes to add the main page of the PySpark documentation. The base work was finished in https://github.com/apache/spark/pull/29188.

### Why are the changes needed?

For better usability and readability of the PySpark documentation.

### Does this PR introduce _any_ user-facing change?

Yes, it adds a new main page, shown below:

![Screen Shot 2020-07-31 at 10 02 44 PM](https://user-images.githubusercontent.com/6477701/89037618-d2d68880-d379-11ea-9a44-562f2aa0e3fd.png)

### How was this patch tested?

Manually built the PySpark documentation.

```bash
cd python/docs
make clean html
```

Closes #29320 from HyukjinKwon/SPARK-32507.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
Committed by: HyukjinKwon, 2020-08-05 11:14:14 +09:00
parent 0660a0501d
commit 15b73339d9
3 changed files with 36 additions and 0 deletions

[Two binary image files changed, not shown in the diff; the added image is 32 KiB.]
@@ -21,8 +21,44 @@
PySpark Documentation
=====================
.. TODO(SPARK-32204): Add Binder integration at Live Notebook.
PySpark is an interface for Apache Spark in Python. It not only allows you to write
Spark applications using Python APIs, but also provides the PySpark shell for
interactively analyzing your data in a distributed environment. PySpark supports most
of Spark's features such as Spark SQL, DataFrame, Streaming, MLlib
(Machine Learning) and Spark Core.
.. image:: ../../../docs/img/pyspark-components.png
:alt: PySpark Components
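For illustration, a minimal sketch of starting a local PySpark session; the `local[*]` master URL and the app name are arbitrary choices, not anything mandated by this page:

```python
from pyspark.sql import SparkSession

# Entry point for DataFrame and SQL functionality; reuses an
# existing session if one is already running.
spark = (SparkSession.builder
         .master("local[*]")
         .appName("pyspark-docs-example")
         .getOrCreate())

print(spark.version)
```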
**Spark SQL and DataFrame**
Spark SQL is a Spark module for structured data processing. It provides
a programming abstraction called DataFrame and can also act as a distributed
SQL query engine.
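A short sketch of the two entry points described above, running the same query through the DataFrame API and through SQL on a temporary view (the sample data is made up):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()

# The same question asked twice: once via the DataFrame API,
# once via SQL against a temporary view.
df = spark.createDataFrame([("Alice", 34), ("Bob", 29)], ["name", "age"])
df.filter(df.age > 30).show()

df.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 30").show()
```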
**Streaming**
Running on top of Spark, the streaming feature in Apache Spark enables powerful
interactive and analytical applications across both streaming and historical data,
while inheriting Spark's ease of use and fault tolerance characteristics.
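A small Structured Streaming sketch, assuming the built-in `rate` source (which generates timestamped rows for testing) and the console sink:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()

# Generate one (timestamp, value) row per second and print each
# micro-batch to the console for roughly ten seconds.
stream = spark.readStream.format("rate").option("rowsPerSecond", 1).load()

query = (stream.writeStream
         .format("console")
         .outputMode("append")
         .start())
query.awaitTermination(10)
query.stop()
```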
**MLlib**
Built on top of Spark, MLlib is a scalable machine learning library that provides
a uniform set of high-level APIs that help users create and tune practical machine
learning pipelines.
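A minimal pipeline sketch in that spirit, tokenizing text, hashing it into feature vectors, and fitting a logistic regression; the toy training data is invented for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import HashingTF, Tokenizer

spark = SparkSession.builder.master("local[*]").getOrCreate()

training = spark.createDataFrame([
    ("spark is great", 1.0),
    ("hadoop mapreduce", 0.0),
], ["text", "label"])

# Chain the stages into a single estimator and fit it in one call.
tokenizer = Tokenizer(inputCol="text", outputCol="words")
hashing_tf = HashingTF(inputCol="words", outputCol="features")
lr = LogisticRegression(maxIter=10)

model = Pipeline(stages=[tokenizer, hashing_tf, lr]).fit(training)
model.transform(training).select("text", "prediction").show()
```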
**Spark Core**
Spark Core is the underlying general execution engine for the Spark platform that all
other functionality is built on top of. It provides the RDD (Resilient Distributed
Dataset) abstraction and in-memory computing capabilities.
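A short RDD sketch under the same assumptions as above (local session, toy data), showing in-memory caching and a couple of actions:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()
sc = spark.sparkContext

# Distribute a range, keep it in memory, and run two actions on it.
rdd = sc.parallelize(range(1, 101)).cache()
print(rdd.sum())                                  # 5050
print(rdd.filter(lambda x: x % 2 == 0).count())   # 50
```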
.. toctree::
:maxdepth: 2
:hidden:
getting_started/index
user_guide/index