[SPARK-35645][PYTHON][DOCS] Merge contents and remove obsolete pages in Getting Started section

### What changes were proposed in this pull request?

This PR revises the installation page to describe `pip install pyspark[pandas_on_spark]`, and removes the separate pandas-on-Spark installation and videos/blog posts pages.

### Why are the changes needed?

The pandas-on-Spark installation instructions are merged into the PySpark installation page. As for the videos/blog posts, the project is now named pandas API on Spark, so the old Koalas blog posts and videos are obsolete.

### Does this PR introduce _any_ user-facing change?

No for end users, because the docs have not been released yet.

### How was this patch tested?

I manually built the docs and checked the output.

Closes #33018 from HyukjinKwon/SPARK-35645.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
4 changed files with 5 additions and 283 deletions


@@ -25,18 +25,12 @@ There are more guides shared with other languages such as
`Quick Start <https://spark.apache.org/docs/latest/quick-start.html>`_ in Programming Guides
at `the Spark documentation <https://spark.apache.org/docs/latest/index.html#where-to-go-from-here>`_.
.. TODO(SPARK-35588): Merge PySpark quickstart and 10 minutes to pandas API on Spark.
.. toctree::
    :maxdepth: 2

    install
    quickstart

For pandas API on Spark:

.. toctree::
    :maxdepth: 2

    ps_install
    ps_10mins
    ps_videos_blogs


@@ -46,7 +46,10 @@ If you want to install extra dependencies for a specific component, you can inst
.. code-block:: bash

    # Spark SQL
    pip install pyspark[sql]

    # pandas API on Spark
    pip install pyspark[pandas_on_spark]
For PySpark with/without a specific Hadoop version, you can install it by using the ``PYSPARK_HADOOP_VERSION`` environment variable as below:
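A minimal sketch of such an invocation (the ``2.7`` value here is an illustrative assumption; check the release documentation for the values it actually supports):

.. code-block:: bash

    # Install PySpark built against a specific Hadoop version.
    # "2.7" is an illustrative value, not necessarily supported by every release.
    PYSPARK_HADOOP_VERSION=2.7 pip install pyspark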


@@ -1,145 +0,0 @@
============
Installation
============
Pandas API on Spark requires PySpark, so please make sure PySpark is available.
To install pandas API on Spark, you can use:
- `Conda <https://anaconda.org/conda-forge/koalas>`__
- `PyPI <https://pypi.org/project/koalas>`__
- `Installation from source <../development/ps_contributing.rst#environment-setup>`__
To install PySpark, you can use:
- `Installation with the official release channel <https://spark.apache.org/downloads.html>`__
- `Conda <https://anaconda.org/conda-forge/pyspark>`__
- `PyPI <https://pypi.org/project/pyspark>`__
- `Installation from source <https://github.com/apache/spark#building-spark>`__
Python version support
----------------------
Officially Python 3.5 to 3.8.
.. note::
    Python 3.5 support is deprecated and will be dropped in a future release.
    At that point, existing Python 3.5 workflows that use pandas API on Spark will continue to work without
    modification, but Python 3.5 users will no longer get access to the latest pandas-on-Spark features
    and bugfixes. We recommend that you upgrade to Python 3.6 or newer.
Installing pandas API on Spark
-------------------------------
Installing with Conda
~~~~~~~~~~~~~~~~~~~~~~
First you will need `Conda <http://conda.pydata.org/docs/>`__ to be installed.
After that, we should create a new conda environment. A conda environment is similar to a
virtualenv in that it allows you to specify a specific version of Python and a set of libraries.
Run the following commands from a terminal window::

    conda create --name koalas-dev-env

This will create a minimal environment with only Python installed in it.
To put yourself inside this environment, run::

    conda activate koalas-dev-env

The final step required is to install pandas API on Spark. This can be done with the
following command::

    conda install -c conda-forge koalas

To install a specific version of pandas API on Spark::

    conda install -c conda-forge koalas=1.3.0
Installing from PyPI
~~~~~~~~~~~~~~~~~~~~
Pandas API on Spark can be installed via pip from
`PyPI <https://pypi.org/project/koalas>`__::

    pip install koalas
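Either way, you can sanity-check the result from a terminal (a minimal sketch, assuming the ``databricks.koalas`` import path that Koalas releases ship under)::

    python -c "import databricks.koalas as ks; print(ks.__version__)"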
Installing from source
~~~~~~~~~~~~~~~~~~~~~~
See the `Contribution Guide <../development/ps_contributing.rst#environment-setup>`__ for complete instructions.
Installing PySpark
------------------
Installing with the official release channel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can install PySpark by downloading a release in `the official release channel <https://spark.apache.org/downloads.html>`__.
Once you download the release, un-tar it first as below::

    tar xzvf spark-2.4.4-bin-hadoop2.7.tgz

After that, make sure to set the ``SPARK_HOME`` environment variable to the directory you untar-ed::

    cd spark-2.4.4-bin-hadoop2.7
    export SPARK_HOME=`pwd`

Also, make sure your ``PYTHONPATH`` can find PySpark and Py4J under ``$SPARK_HOME/python/lib``::

    export PYTHONPATH=$(ZIPS=("$SPARK_HOME"/python/lib/*.zip); IFS=:; echo "${ZIPS[*]}"):$PYTHONPATH
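For clarity, the one-liner above simply joins every ``.zip`` under ``$SPARK_HOME/python/lib`` with ``:``. Written out by hand it is roughly equivalent to the following (a sketch; the exact Py4J zip name depends on the release you downloaded)::

    export PYTHONPATH="$SPARK_HOME/python/lib/pyspark.zip:$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH"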
Installing with Conda
~~~~~~~~~~~~~~~~~~~~~~
PySpark can be installed via `Conda <https://anaconda.org/conda-forge/pyspark>`__::

    conda install -c conda-forge pyspark
Installing with PyPI
~~~~~~~~~~~~~~~~~~~~~~
PySpark can be installed via pip from `PyPI <https://pypi.org/project/pyspark>`__::

    pip install pyspark
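A quick way to confirm which version was installed (a minimal sketch)::

    python -c "import pyspark; print(pyspark.__version__)"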
Installing from source
~~~~~~~~~~~~~~~~~~~~~~
To install PySpark from source, refer to `Building Spark <https://github.com/apache/spark#building-spark>`__.
Likewise, make sure you set the ``SPARK_HOME`` environment variable to the git-cloned directory, and that your
``PYTHONPATH`` environment variable can find PySpark and Py4J under ``$SPARK_HOME/python/lib``::

    export PYTHONPATH=$(ZIPS=("$SPARK_HOME"/python/lib/*.zip); IFS=:; echo "${ZIPS[*]}"):$PYTHONPATH
Dependencies
------------
============= ================
Package       Required version
============= ================
`pandas`      >=0.23.2
`pyspark`     >=2.4.0
`pyarrow`     >=0.10
`numpy`       >=1.14
============= ================
Optional dependencies
~~~~~~~~~~~~~~~~~~~~~
============= ================
Package       Required version
============= ================
`mlflow`      >=1.0
`plotly`      >=4.8
`matplotlib`  >=3.0.0,<3.3.0
============= ================
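If you want the optional dependencies pinned to these ranges, one way to install them in a single command (a sketch; the quotes keep the shell from interpreting ``>`` and ``<``)::

    pip install "mlflow>=1.0" "plotly>=4.8" "matplotlib>=3.0.0,<3.3.0"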


@@ -1,130 +0,0 @@
======================
Koalas Talks and Blogs
======================
Blog Posts
----------
- `Interoperability between Koalas and Apache Spark (Aug 11, 2020) <https://databricks.com/blog/2020/08/11/interoperability-between-koalas-and-apache-spark.html>`_
- `Introducing Koalas 1.0 (Jun 24, 2020) <https://databricks.com/blog/2020/06/24/introducing-koalas-1-0.html>`_
- `10 Minutes from pandas to Koalas on Apache Spark (Mar 31, 2020) <https://databricks.com/blog/2020/03/31/10-minutes-from-pandas-to-koalas-on-apache-spark.html>`_
- `Guest Blog: How Virgin Hyperloop One Reduced Processing Time from Hours to Minutes with Koalas (Aug 22, 2019) <https://databricks.com/blog/2019/08/22/guest-blog-how-virgin-hyperloop-one-reduced-processing-time-from-hours-to-minutes-with-koalas.html>`_
- `Koalas: Easy Transition from pandas to Apache Spark (Apr 24, 2019) <https://databricks.com/blog/2019/04/24/koalas-easy-transition-from-pandas-to-apache-spark.html>`_
Data + AI Summit 2020 EUROPE (Nov 18-19, 2020)
----------------------------------------------
Project Zen: Making Spark Pythonic
==================================
.. raw:: html

    <iframe width="560" height="315" src="https://www.youtube.com/embed/-vJLTEOdLvA" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
Koalas: Interoperability Between Koalas and Apache Spark
========================================================
.. raw:: html

    <iframe width="560" height="315" src="https://www.youtube.com/embed/eI0Wh2Epo0Q" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
Spark + AI Summit 2020 (Jun 24, 2020)
-------------------------------------
Introducing Apache Spark 3.0: A retrospective of the Last 10 Years, and a Look Forward to the Next 10 Years to Come.
====================================================================================================================
.. raw:: html

    <iframe width="560" height="315" src="https://www.youtube.com/embed/OLJKIogf2nU?start=555" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
Koalas: Making an Easy Transition from Pandas to Apache Spark
=============================================================
.. raw:: html

    <iframe width="560" height="315" src="https://www.youtube.com/embed/G_-9VbyHcx8" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
Koalas: Pandas on Apache Spark
==============================
.. raw:: html

    <iframe width="560" height="315" src="https://www.youtube.com/embed/iUpBSHoqzLM" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
Webinar @ Databricks (Mar 27, 2020)
-----------------------------------
Reducing Time-To-Insight for Virgin Hyperloop's Data
====================================================
.. raw:: html

    <iframe width="560" height="315" src="https://player.vimeo.com/video/397032070" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>
PyData New York 2019 (Nov 4, 2019)
----------------------------------
Pandas vs Koalas: The Ultimate Showdown
=======================================
.. raw:: html

    <iframe width="560" height="315" src="https://www.youtube.com/embed/xcGEQUURAuk?start=1470" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
Spark + AI Summit Europe 2019 (Oct 16, 2019)
--------------------------------------------
New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, and Koalas
=======================================================================================
.. raw:: html

    <iframe width="560" height="315" src="https://www.youtube.com/embed/scM_WQMhB3A?start=1470" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
Koalas: Making an Easy Transition from Pandas to Apache Spark
=============================================================
.. raw:: html

    <iframe width="560" height="315" src="https://www.youtube.com/embed/Wfj2Vuse7as" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
Koalas: Pandas on Apache Spark
==============================
.. raw:: html

    <iframe width="560" height="315" src="https://www.youtube.com/embed/NpAMbzerAp0" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
PyBay 2019 (Aug 17, 2019)
-------------------------
Koalas Easy Transition from pandas to Apache Spark
==================================================
.. raw:: html

    <iframe width="560" height="315" src="https://www.youtube.com/embed/cMDLoGkidEE" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
Spark + AI Summit 2019 (Apr 24, 2019)
-------------------------------------
Official Announcement of Koalas Open Source Project
===================================================
.. raw:: html

    <iframe width="560" height="315" src="https://www.youtube.com/embed/Shzb15DZ9Qg" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>