### What changes were proposed in this pull request?

This PR proposes to port the Koalas documentation to the PySpark documentation as its initial step. It ports the documentation almost as is, except for these differences:

- Renamed import from `databricks.koalas` to `pyspark.pandas`.
- Renamed `to_koalas` -> `to_pandas_on_spark`
- Renamed `(Series|DataFrame).koalas` -> `(Series|DataFrame).pandas_on_spark`
- Added a `ps_` prefix in the RST file names of Koalas documentation

Other than that,

- Excluded `python/docs/build/html` in linter
- Fixed GA dependency installation

### Why are the changes needed?

To document pandas APIs on Spark.

### Does this PR introduce _any_ user-facing change?

Yes, it adds new documentation.

### How was this patch tested?

Manually built the docs and checked the output.

Closes #32726 from HyukjinKwon/SPARK-35587.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

..  Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements.  See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership.  The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License.  You may obtain a copy of the License at

..    http://www.apache.org/licenses/LICENSE-2.0

..  Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied.  See the License for the
    specific language governing permissions and limitations
    under the License.

.. PySpark documentation master file

=====================
PySpark Documentation
=====================

|binder|_ | `GitHub <https://github.com/apache/spark>`_ | `Issues <https://issues.apache.org/jira/projects/SPARK/issues>`_ | |examples|_ | `Community <https://spark.apache.org/community.html>`_

PySpark is an interface for Apache Spark in Python. It not only allows you to write
Spark applications using Python APIs, but also provides the PySpark shell for
interactively analyzing your data in a distributed environment. PySpark supports most
of Spark's features such as Spark SQL, DataFrame, Streaming, MLlib
(Machine Learning) and Spark Core.

.. image:: ../../../docs/img/pyspark-components.png
    :alt: PySpark Components

**Spark SQL and DataFrame**

Spark SQL is a Spark module for structured data processing. It provides
a programming abstraction called DataFrame and can also act as a distributed
SQL query engine.

**pandas APIs on Spark**

pandas APIs on Spark allow you to scale out your pandas workload.
With this package, you can:

* Be immediately productive with Spark, with no learning curve, if you are already familiar with pandas.
* Have a single codebase that works both with pandas (tests, smaller datasets) and with Spark (distributed datasets).
* Switch between the pandas API and PySpark API contexts easily without any overhead.

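As a minimal sketch of the single-codebase point above (not taken from the Spark
sources), the hypothetical helper ``total_by_key`` below uses only calls that exist in
both pandas and ``pyspark.pandas``. The column names and sample data are made up for
illustration. The Spark half is left commented out because it assumes a working PySpark
installation that ships ``pyspark.pandas``.

.. code-block:: python

    import pandas as pd


    def total_by_key(df):
        """Sum the ``value`` column per ``key`` using calls shared by both APIs."""
        return df.groupby("key")["value"].sum().sort_index()


    # Plain pandas: runs locally, for example in unit tests or on small datasets.
    pdf = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
    local_totals = total_by_key(pdf)
    print(local_totals.to_dict())

    # pandas API on Spark: the same helper applied to a distributed DataFrame.
    # Commented out because it needs PySpark and a JVM to be available.
    # import pyspark.pandas as ps
    # psdf = ps.from_pandas(pdf)
    # spark_totals = total_by_key(psdf).to_pandas()

Because the helper never imports either library itself, the caller decides whether the
computation runs locally or on Spark by choosing which kind of DataFrame to pass in.
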
**Streaming**

Running on top of Spark, the streaming feature in Apache Spark enables powerful
interactive and analytical applications across both streaming and historical data,
while inheriting Spark's ease of use and fault tolerance characteristics.

**MLlib**

Built on top of Spark, MLlib is a scalable machine learning library that provides
a uniform set of high-level APIs that help users create and tune practical machine
learning pipelines.

**Spark Core**

Spark Core is the underlying general execution engine for the Spark platform that all
other functionality is built on top of. It provides an RDD (Resilient Distributed Dataset)
and in-memory computing capabilities.

.. toctree::
    :maxdepth: 2
    :hidden:

    getting_started/index
    user_guide/index
    reference/index
    development/index
    migration_guide/index