diff --git a/python/docs/source/getting_started/index.rst b/python/docs/source/getting_started/index.rst
index ed23f5a3d4..595d6822ee 100644
--- a/python/docs/source/getting_started/index.rst
+++ b/python/docs/source/getting_started/index.rst
@@ -25,18 +25,12 @@ There are more guides shared with other languages such as
 `Quick Start <https://spark.apache.org/docs/latest/quick-start.html>`_ in Programming Guides
 at `the Spark documentation <https://spark.apache.org/docs/latest/index.html>`_.
 
+.. TODO(SPARK-35588): Merge PySpark quickstart and 10 minutes to pandas API on Spark.
+
 .. toctree::
    :maxdepth: 2
 
    install
    quickstart
-
-For pandas API on Spark:
-
-.. toctree::
-   :maxdepth: 2
-
-   ps_install
    ps_10mins
-   ps_videos_blogs
 
diff --git a/python/docs/source/getting_started/install.rst b/python/docs/source/getting_started/install.rst
index 3d518932ff..df5431082b 100644
--- a/python/docs/source/getting_started/install.rst
+++ b/python/docs/source/getting_started/install.rst
@@ -46,7 +46,10 @@ If you want to install extra dependencies for a specific component, you can inst
 
 .. code-block:: bash
 
+    # Spark SQL
     pip install pyspark[sql]
+    # pandas API on Spark
+    pip install pyspark[pandas_on_spark]
 
 For PySpark with/without a specific Hadoop version, you can install it by using ``PYSPARK_HADOOP_VERSION`` environment variables as below:
 
diff --git a/python/docs/source/getting_started/ps_install.rst b/python/docs/source/getting_started/ps_install.rst
deleted file mode 100644
index 974895a073..0000000000
--- a/python/docs/source/getting_started/ps_install.rst
+++ /dev/null
@@ -1,145 +0,0 @@
-============
-Installation
-============
-
-Pandas API on Spark requires PySpark so please make sure your PySpark is available.
-
-To install pandas API on Spark, you can use:
-
-- `Conda `__
-- `PyPI `__
-- `Installation from source <../development/ps_contributing.rst#environment-setup>`__
-
-To install PySpark, you can use:
-
-- `Installation with the official release channel `__
-- `Conda `__
-- `PyPI `__
-- `Installation from source `__
-
-
-Python version support
-----------------------
-
-Officially Python 3.5 to 3.8.
-
-.. note::
-    Python 3.5 support is deprecated and will be dropped in the future release.
-    At that point, existing Python 3.5 workflows that use pandas API on Spark will continue to work without
-    modification, but Python 3.5 users will no longer get access to the latest pandas-on-Spark features
-    and bugfixes. We recommend that you upgrade to Python 3.6 or newer.
-
-Installing pandas API on Spark
--------------------------------
-
-Installing with Conda
-~~~~~~~~~~~~~~~~~~~~~~
-
-First you will need `Conda `__ to be installed.
-After that, we should create a new conda environment. A conda environment is similar with a
-virtualenv that allows you to specify a specific version of Python and set of libraries.
-Run the following commands from a terminal window::
-
-    conda create --name koalas-dev-env
-
-This will create a minimal environment with only Python installed in it.
-To put your self inside this environment run::
-
-    conda activate koalas-dev-env
-
-The final step required is to install pandas API on Spark. This can be done with the
-following command::
-
-    conda install -c conda-forge koalas
-
-To install a specific version of pandas API on Spark::
-
-    conda install -c conda-forge koalas=1.3.0
-
-
-Installing from PyPI
-~~~~~~~~~~~~~~~~~~~~
-
-Pandas API on Spark can be installed via pip from
-`PyPI `__::
-
-    pip install koalas
-
-
-Installing from source
-~~~~~~~~~~~~~~~~~~~~~~
-
-See the `Contribution Guide <../development/ps_contributing.rst#environment-setup>`__ for complete instructions.
-
-
-Installing PySpark
-------------------
-
-Installing with the official release channel
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-You can install PySpark by downloading a release in `the official release channel `__.
-Once you download the release, un-tar it first as below::
-
-    tar xzvf spark-2.4.4-bin-hadoop2.7.tgz
-
-After that, make sure set ``SPARK_HOME`` environment variable to indicate the directory you untar-ed::
-
-    cd spark-2.4.4-bin-hadoop2.7
-    export SPARK_HOME=`pwd`
-
-Also, make sure your ``PYTHONPATH`` can find the PySpark and Py4J under ``$SPARK_HOME/python/lib``::
-
-    export PYTHONPATH=$(ZIPS=("$SPARK_HOME"/python/lib/*.zip); IFS=:; echo "${ZIPS[*]}"):$PYTHONPATH
-
-
-Installing with Conda
-~~~~~~~~~~~~~~~~~~~~~~
-
-PySpark can be installed via `Conda `__::
-
-    conda install -c conda-forge pyspark
-
-
-Installing with PyPI
-~~~~~~~~~~~~~~~~~~~~~~
-
-PySpark can be installed via pip from `PyPI `__::
-
-    pip install pyspark
-
-
-Installing from source
-~~~~~~~~~~~~~~~~~~~~~~
-
-To install PySpark from source, refer `Building Spark `__.
-
-Likewise, make sure you set ``SPARK_HOME`` environment variable to the git-cloned directory, and your
-``PYTHONPATH`` environment variable can find the PySpark and Py4J under ``$SPARK_HOME/python/lib``::
-
-    export PYTHONPATH=$(ZIPS=("$SPARK_HOME"/python/lib/*.zip); IFS=:; echo "${ZIPS[*]}"):$PYTHONPATH
-
-
-Dependencies
-------------
-
-============= ================
-Package       Required version
-============= ================
-`pandas`      >=0.23.2
-`pyspark`     >=2.4.0
-`pyarrow`     >=0.10
-`numpy`       >=1.14
-============= ================
-
-
-Optional dependencies
-~~~~~~~~~~~~~~~~~~~~~
-
-============= ================
-Package       Required version
-============= ================
-`mlflow`      >=1.0
-`plotly`      >=4.8
-`matplotlib`  >=3.0.0,<3.3.0
-============= ================
diff --git a/python/docs/source/getting_started/ps_videos_blogs.rst b/python/docs/source/getting_started/ps_videos_blogs.rst
deleted file mode 100644
index a9c37ab598..0000000000
--- a/python/docs/source/getting_started/ps_videos_blogs.rst
+++ /dev/null
@@ -1,130 +0,0 @@
-======================
-Koalas Talks and Blogs
-======================
-
-Blog Posts
-----------
-
-- `Interoperability between Koalas and Apache Spark (Aug 11, 2020) `_
-- `Introducing Koalas 1.0 (Jun 24, 2020) `_
-- `10 Minutes from pandas to Koalas on Apache Spark (Mar 31, 2020) `_
-- `Guest Blog: How Virgin Hyperloop One Reduced Processing Time from Hours to Minutes with Koalas (Aug 22, 2019) `_
-- `Koalas: Easy Transition from pandas to Apache Spark (Apr 24, 2019) `_
-
-
-Data + AI Summit 2020 EUROPE (Nov 18-19, 2020)
-----------------------------------------------
-
-Project Zen: Making Spark Pythonic
-==================================
-
-.. raw:: html
-
-
-
-Koalas: Interoperability Between Koalas and Apache Spark
-========================================================
-
-.. raw:: html
-
-
-
-Spark + AI Summit 2020 (Jun 24, 2020)
--------------------------------------
-
-Introducing Apache Spark 3.0: A retrospective of the Last 10 Years, and a Look Forward to the Next 10 Years to Come.
-====================================================================================================================
-
-.. raw:: html
-
-
-
-Koalas: Making an Easy Transition from Pandas to Apache Spark
-=============================================================
-
-.. raw:: html
-
-
-Koalas: Pandas on Apache Spark
-==============================
-
-.. raw:: html
-
-
-
-Webinar @ Databricks (Mar 27, 2020)
------------------------------------
-
-Reducing Time-To-Insight for Virgin Hyperloop's Data
-====================================================
-
-.. raw:: html
-
-
-
-PyData New York 2019 (Nov 4, 2019)
-----------------------------------
-
-Pandas vs Koalas: The Ultimate Showdown
-=======================================
-
-.. raw:: html
-
-
-
-Spark + AI Summit Europe 2019 (Oct 16, 2019)
---------------------------------------------
-
-New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, and Koalas
-=======================================================================================
-
-.. raw:: html
-
-
-Koalas: Making an Easy Transition from Pandas to Apache Spark
-=============================================================
-
-.. raw:: html
-
-
-Koalas: Pandas on Apache Spark
-==============================
-
-.. raw:: html
-
-
-
-PyBay 2019 (Aug 17, 2019)
--------------------------
-
-Koalas Easy Transition from pandas to Apache Spark
-==================================================
-
-.. raw:: html
-
-
-
-Spark + AI Summit 2019 (Apr 24, 2019)
--------------------------------------
-
-Official Announcement of Koalas Open Source Project
-===================================================
-
-.. raw:: html
-
-
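
Not part of the patch above, but a handy smoke test while reviewing it: after installing with the new ``pandas_on_spark`` extra, the ported API should be importable. This is a minimal sketch, assuming a PySpark build that already ships the merged Koalas code as ``pyspark.pandas``; the module itself is not touched by this diff.

.. code-block:: python

    # Minimal sanity check after `pip install pyspark[pandas_on_spark]`.
    # Assumes this PySpark build ships pyspark.pandas (the code these docs
    # describe); a SparkSession is created automatically on first use.
    import pyspark.pandas as ps

    psdf = ps.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
    print(psdf.mean())       # column means, computed by Spark under the hood
    print(psdf.to_pandas())  # collect the result into a plain pandas DataFrame

If the import fails, the build predates the Koalas port, and only the standalone ``koalas`` package described in the removed ``ps_install.rst`` applies.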