[SPARK-32507][DOCS][PYTHON] Add main page for PySpark documentation
### What changes were proposed in this pull request?

This PR proposes to write the main page of the PySpark documentation. The base work was finished at https://github.com/apache/spark/pull/29188.

### Why are the changes needed?

For better usability and readability of the PySpark documentation.

### Does this PR introduce _any_ user-facing change?

Yes, it creates a new main page as below:

![Screen Shot 2020-07-31 at 10 02 44 PM](https://user-images.githubusercontent.com/6477701/89037618-d2d68880-d379-11ea-9a44-562f2aa0e3fd.png)

### How was this patch tested?

Manually built the PySpark documentation:

```bash
cd python
make clean html
```

Closes #29320 from HyukjinKwon/SPARK-32507.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
parent 0660a0501d
commit 15b73339d9
BIN docs/img/pyspark-components.png (new file, 32 KiB; binary file not shown)
BIN docs/img/pyspark-components.pptx (new file; binary file not shown)
@@ -21,8 +21,44 @@
PySpark Documentation
=====================
.. TODO(SPARK-32204): Add Binder integration at Live Notebook.
PySpark is an interface for Apache Spark in Python. It not only allows you to write
Spark applications using Python APIs, but also provides the PySpark shell for
interactively analyzing your data in a distributed environment. PySpark supports most
of Spark's features such as Spark SQL, DataFrame, Streaming, MLlib
(Machine Learning) and Spark Core.
.. image:: ../../../docs/img/pyspark-components.png
   :alt: PySpark Components
**Spark SQL and DataFrame**

Spark SQL is a Spark module for structured data processing. It provides
a programming abstraction called DataFrame and can also act as a distributed
SQL query engine.
**Streaming**

Running on top of Spark, the streaming feature in Apache Spark enables powerful
interactive and analytical applications across both streaming and historical data,
while inheriting Spark's ease of use and fault tolerance characteristics.
**MLlib**

Built on top of Spark, MLlib is a scalable machine learning library that provides
a uniform set of high-level APIs that help users create and tune practical machine
learning pipelines.
**Spark Core**

Spark Core is the underlying general execution engine for the Spark platform that all
other functionality is built on top of. It provides an RDD (Resilient Distributed Dataset)
and in-memory computing capabilities.
.. toctree::
   :maxdepth: 2
   :hidden:

   getting_started/index
   user_guide/index