..  Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements.  See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership.  The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License.  You may obtain a copy of the License at

..  http://www.apache.org/licenses/LICENSE-2.0

..  Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied.  See the License for the
    specific language governing permissions and limitations
    under the License.

===============
Spark Streaming
===============

Core Classes
------------

.. currentmodule:: pyspark.streaming

.. autosummary::
    :toctree: api/

    StreamingContext
    DStream

Streaming Management
--------------------

.. currentmodule:: pyspark.streaming

.. autosummary::
    :toctree: api/

    StreamingContext.addStreamingListener
    StreamingContext.awaitTermination
    StreamingContext.awaitTerminationOrTimeout
    StreamingContext.checkpoint
    StreamingContext.getActive
    StreamingContext.getActiveOrCreate
    StreamingContext.getOrCreate
    StreamingContext.remember
    StreamingContext.sparkContext
    StreamingContext.start
    StreamingContext.stop
    StreamingContext.transform
    StreamingContext.union

Input and Output
----------------

.. autosummary::
    :toctree: api/

    StreamingContext.binaryRecordsStream
    StreamingContext.queueStream
    StreamingContext.socketTextStream
    StreamingContext.textFileStream
    DStream.pprint
    DStream.saveAsTextFiles

Transformations and Actions
---------------------------

.. currentmodule:: pyspark.streaming

.. autosummary::
    :toctree: api/

    DStream.cache
    DStream.checkpoint
    DStream.cogroup
    DStream.combineByKey
    DStream.context
    DStream.count
    DStream.countByValue
    DStream.countByValueAndWindow
    DStream.countByWindow
    DStream.filter
    DStream.flatMap
    DStream.flatMapValues
    DStream.foreachRDD
    DStream.fullOuterJoin
    DStream.glom
    DStream.groupByKey
    DStream.groupByKeyAndWindow
    DStream.join
    DStream.leftOuterJoin
    DStream.map
    DStream.mapPartitions
    DStream.mapPartitionsWithIndex
    DStream.mapValues
    DStream.partitionBy
    DStream.persist
    DStream.reduce
    DStream.reduceByKey
    DStream.reduceByKeyAndWindow
    DStream.reduceByWindow
    DStream.repartition
    DStream.rightOuterJoin
    DStream.slice
    DStream.transform
    DStream.transformWith
    DStream.union
    DStream.updateStateByKey
    DStream.window
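The stateful and windowed transformations above take user-supplied functions. A sketch of the two most common callback shapes, written as plain Python so they can be reasoned about without a running cluster (the function names are illustrative):

```python
def update_running_count(new_values, last_state):
    # Callback shape expected by DStream.updateStateByKey: receives the list
    # of new values for a key in the current batch plus the previous state
    # (None on the first batch), and returns the new state.
    return sum(new_values) + (last_state or 0)


def add_counts(a, b):
    # Forward reduce function for DStream.reduceByKeyAndWindow.
    return a + b


def subtract_counts(a, b):
    # Inverse reduce function: supplying it lets Spark "subtract" values that
    # slide out of the window instead of re-reducing the whole window.
    return a - b
```

These would be used as, e.g., ``pairs.updateStateByKey(update_running_count)`` or ``pairs.reduceByKeyAndWindow(add_counts, subtract_counts, 30, 10)`` for a 30-second window sliding every 10 seconds.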

Kinesis
-------

.. currentmodule:: pyspark.streaming.kinesis

.. autosummary::
    :toctree: api/

    KinesisUtils.createStream
    InitialPositionInStream.LATEST
    InitialPositionInStream.TRIM_HORIZON