spark-instrumented-optimizer/python/docs/source/reference/ps_groupby.rst
Hyukjin Kwon 3d158f9c91 [SPARK-35587][PYTHON][DOCS] Initial porting of Koalas documentation
### What changes were proposed in this pull request?

This PR proposes to port Koalas documentation to PySpark documentation as its initial step.
It ports almost as is except these differences:

- Renamed import from `databricks.koalas` to `pyspark.pandas`.
- Renamed `to_koalas` -> `to_pandas_on_spark`
- Renamed `(Series|DataFrame).koalas` -> `(Series|DataFrame).pandas_on_spark`
- Added a `ps_` prefix in the RST file names of Koalas documentation

Other than that,

- Excluded `python/docs/build/html` in linter
- Fixed GA dependency installation

### Why are the changes needed?

To document pandas APIs on Spark.

### Does this PR introduce _any_ user-facing change?

Yes, it adds new documentation.

### How was this patch tested?

Manually built the docs and checked the output.

Closes #32726 from HyukjinKwon/SPARK-35587.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-06-04 11:11:09 +09:00

.. _api.groupby:

=======
GroupBy
=======
.. currentmodule:: pyspark.pandas

GroupBy objects are returned by groupby calls: :func:`DataFrame.groupby`, :func:`Series.groupby`, etc.

.. currentmodule:: pyspark.pandas.groupby

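As a minimal sketch of how these objects are typically obtained, the snippet below groups a small DataFrame and sums one column per group. The function name ``team_totals``, the column names, and the data are hypothetical and chosen only for illustration; running it assumes a working Spark installation.

```python
def team_totals():
    """Group a small pandas-on-Spark DataFrame and sum one column per group."""
    # Imported lazily so that loading this snippet does not itself require
    # a Spark installation; calling the function does.
    import pyspark.pandas as ps

    # Hypothetical example data; the column names are made up for illustration.
    psdf = ps.DataFrame(
        {"team": ["a", "b", "a", "b"], "points": [1, 2, 3, 4]}
    )

    # DataFrame.groupby returns a DataFrameGroupBy object.
    grouped = psdf.groupby("team")
    # Selecting a single column from it gives a SeriesGroupBy object.
    by_team = grouped["points"]

    # Aggregate, then collect the (small) result to the driver as a plain dict.
    totals = by_team.sum().sort_index().to_pandas().to_dict()
    return type(grouped).__name__, type(by_team).__name__, totals
```

With Spark available, ``team_totals()`` should return ``('DataFrameGroupBy', 'SeriesGroupBy', {'a': 4, 'b': 6})``. The result is converted with ``to_pandas()`` only because it is tiny; larger results are usually better kept distributed.
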
Indexing, iteration
-------------------
.. autosummary::
   :toctree: api/

   GroupBy.get_group

Function application
--------------------
.. autosummary::
   :toctree: api/

   GroupBy.apply
   GroupBy.transform

The following methods are available only for `DataFrameGroupBy` objects.

.. autosummary::
   :toctree: api/

   DataFrameGroupBy.agg
   DataFrameGroupBy.aggregate

Computations / Descriptive Stats
--------------------------------
.. autosummary::
   :toctree: api/

   GroupBy.all
   GroupBy.any
   GroupBy.count
   GroupBy.cumcount
   GroupBy.cummax
   GroupBy.cummin
   GroupBy.cumprod
   GroupBy.cumsum
   GroupBy.filter
   GroupBy.first
   GroupBy.last
   GroupBy.max
   GroupBy.mean
   GroupBy.median
   GroupBy.min
   GroupBy.rank
   GroupBy.std
   GroupBy.sum
   GroupBy.var
   GroupBy.nunique
   GroupBy.size
   GroupBy.diff
   GroupBy.idxmax
   GroupBy.idxmin
   GroupBy.fillna
   GroupBy.bfill
   GroupBy.ffill
   GroupBy.head
   GroupBy.backfill
   GroupBy.shift
   GroupBy.tail

The following methods are available only for `DataFrameGroupBy` objects.

.. autosummary::
   :toctree: api/

   DataFrameGroupBy.describe

The following methods are available only for `SeriesGroupBy` objects.

.. autosummary::
   :toctree: api/

   SeriesGroupBy.nsmallest
   SeriesGroupBy.nlargest
   SeriesGroupBy.value_counts
   SeriesGroupBy.unique