921abc51cf
### What changes were proposed in this pull request?

This PR proposes to restructure API files according to the layout, see https://github.com/apache/spark/pull/32799.
Now the pandas APIs on Spark are under a separate directory which is same level as other modules such as Spark SQL.

```bash
tree reference
```

**Before:**

```
reference
├── index.rst
├── ps_extensions.rst
├── ps_frame.rst
├── ps_general_functions.rst
├── ps_groupby.rst
├── ps_indexing.rst
├── ps_io.rst
├── ps_ml.rst
├── ps_series.rst
├── ps_window.rst
├── pyspark.ml.rst
├── pyspark.mllib.rst
├── pyspark.pandas.rst
├── pyspark.resource.rst
├── pyspark.rst
├── pyspark.sql.rst
├── pyspark.ss.rst
└── pyspark.streaming.rst
```

**After:**

```
reference
├── index.rst
├── pyspark.ml.rst
├── pyspark.mllib.rst
├── pyspark.pandas
│   ├── extensions.rst
│   ├── frame.rst
│   ├── general_functions.rst
│   ├── groupby.rst
│   ├── index.rst
│   ├── indexing.rst
│   ├── io.rst
│   ├── ml.rst
│   ├── series.rst
│   └── window.rst
├── pyspark.resource.rst
├── pyspark.rst
├── pyspark.sql.rst
├── pyspark.ss.rst
└── pyspark.streaming.rst
```

### Why are the changes needed?

To make the directory structure easier to follow.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually built and tested the docs.

Closes #32812 from HyukjinKwon/SPARK-35646-followup.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
328 lines
6.2 KiB
ReStructuredText
.. _api.dataframe:

=========
DataFrame
=========
.. currentmodule:: pyspark.pandas

Constructor
-----------
.. autosummary::
   :toctree: api/

   DataFrame

Attributes and underlying data
------------------------------

.. autosummary::
   :toctree: api/

   DataFrame.index
   DataFrame.columns
   DataFrame.empty

.. autosummary::
   :toctree: api/

   DataFrame.dtypes
   DataFrame.shape
   DataFrame.axes
   DataFrame.ndim
   DataFrame.size
   DataFrame.select_dtypes
   DataFrame.values

Conversion
----------
.. autosummary::
   :toctree: api/

   DataFrame.copy
   DataFrame.isna
   DataFrame.astype
   DataFrame.isnull
   DataFrame.notna
   DataFrame.notnull
   DataFrame.pad
   DataFrame.bool

Indexing, iteration
-------------------
.. autosummary::
   :toctree: api/

   DataFrame.at
   DataFrame.iat
   DataFrame.head
   DataFrame.idxmax
   DataFrame.idxmin
   DataFrame.loc
   DataFrame.iloc
   DataFrame.items
   DataFrame.iteritems
   DataFrame.iterrows
   DataFrame.itertuples
   DataFrame.keys
   DataFrame.pop
   DataFrame.tail
   DataFrame.xs
   DataFrame.get
   DataFrame.where
   DataFrame.mask
   DataFrame.query

Binary operator functions
-------------------------
.. autosummary::
   :toctree: api/

   DataFrame.add
   DataFrame.radd
   DataFrame.div
   DataFrame.rdiv
   DataFrame.truediv
   DataFrame.rtruediv
   DataFrame.mul
   DataFrame.rmul
   DataFrame.sub
   DataFrame.rsub
   DataFrame.pow
   DataFrame.rpow
   DataFrame.mod
   DataFrame.rmod
   DataFrame.floordiv
   DataFrame.rfloordiv
   DataFrame.lt
   DataFrame.gt
   DataFrame.le
   DataFrame.ge
   DataFrame.ne
   DataFrame.eq
   DataFrame.dot

Function application, GroupBy & Window
--------------------------------------
.. autosummary::
   :toctree: api/

   DataFrame.apply
   DataFrame.applymap
   DataFrame.pipe
   DataFrame.agg
   DataFrame.aggregate
   DataFrame.groupby
   DataFrame.rolling
   DataFrame.expanding
   DataFrame.transform

.. _api.dataframe.stats:

Computations / Descriptive Stats
--------------------------------
.. autosummary::
   :toctree: api/

   DataFrame.abs
   DataFrame.all
   DataFrame.any
   DataFrame.clip
   DataFrame.corr
   DataFrame.count
   DataFrame.describe
   DataFrame.kurt
   DataFrame.kurtosis
   DataFrame.mad
   DataFrame.max
   DataFrame.mean
   DataFrame.min
   DataFrame.median
   DataFrame.pct_change
   DataFrame.prod
   DataFrame.product
   DataFrame.quantile
   DataFrame.nunique
   DataFrame.sem
   DataFrame.skew
   DataFrame.sum
   DataFrame.std
   DataFrame.var
   DataFrame.cummin
   DataFrame.cummax
   DataFrame.cumsum
   DataFrame.cumprod
   DataFrame.round
   DataFrame.diff
   DataFrame.eval

Reindexing / Selection / Label manipulation
-------------------------------------------
.. autosummary::
   :toctree: api/

   DataFrame.add_prefix
   DataFrame.add_suffix
   DataFrame.align
   DataFrame.at_time
   DataFrame.between_time
   DataFrame.drop
   DataFrame.droplevel
   DataFrame.drop_duplicates
   DataFrame.duplicated
   DataFrame.equals
   DataFrame.filter
   DataFrame.first
   DataFrame.head
   DataFrame.last
   DataFrame.rename
   DataFrame.rename_axis
   DataFrame.reset_index
   DataFrame.set_index
   DataFrame.swapaxes
   DataFrame.swaplevel
   DataFrame.take
   DataFrame.isin
   DataFrame.sample
   DataFrame.truncate

.. _api.dataframe.missing:

Missing data handling
---------------------
.. autosummary::
   :toctree: api/

   DataFrame.backfill
   DataFrame.dropna
   DataFrame.fillna
   DataFrame.replace
   DataFrame.bfill
   DataFrame.ffill

Reshaping, sorting, transposing
-------------------------------
.. autosummary::
   :toctree: api/

   DataFrame.pivot_table
   DataFrame.pivot
   DataFrame.sort_index
   DataFrame.sort_values
   DataFrame.nlargest
   DataFrame.nsmallest
   DataFrame.stack
   DataFrame.unstack
   DataFrame.melt
   DataFrame.explode
   DataFrame.squeeze
   DataFrame.T
   DataFrame.transpose
   DataFrame.reindex
   DataFrame.reindex_like
   DataFrame.rank

Combining / joining / merging
-----------------------------
.. autosummary::
   :toctree: api/

   DataFrame.append
   DataFrame.assign
   DataFrame.merge
   DataFrame.join
   DataFrame.update
   DataFrame.insert

Time series-related
-------------------
.. autosummary::
   :toctree: api/

   DataFrame.shift
   DataFrame.first_valid_index
   DataFrame.last_valid_index

Serialization / IO / Conversion
-------------------------------
.. autosummary::
   :toctree: api/

   DataFrame.from_records
   DataFrame.info
   DataFrame.to_table
   DataFrame.to_delta
   DataFrame.to_parquet
   DataFrame.to_spark_io
   DataFrame.to_csv
   DataFrame.to_pandas
   DataFrame.to_html
   DataFrame.to_numpy
   DataFrame.to_pandas_on_spark
   DataFrame.to_spark
   DataFrame.to_string
   DataFrame.to_json
   DataFrame.to_dict
   DataFrame.to_excel
   DataFrame.to_clipboard
   DataFrame.to_markdown
   DataFrame.to_records
   DataFrame.to_latex
   DataFrame.style

Spark-related
-------------
``DataFrame.spark`` provides features that do not exist in pandas but
exist in Spark. These can be accessed by ``DataFrame.spark.<function/property>``.

.. autosummary::
   :toctree: api/

   DataFrame.spark.schema
   DataFrame.spark.print_schema
   DataFrame.spark.frame
   DataFrame.spark.cache
   DataFrame.spark.persist
   DataFrame.spark.hint
   DataFrame.spark.to_table
   DataFrame.spark.to_spark_io
   DataFrame.spark.explain
   DataFrame.spark.apply
   DataFrame.spark.repartition
   DataFrame.spark.coalesce
   DataFrame.spark.checkpoint
   DataFrame.spark.local_checkpoint

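As a rough illustration of the ``DataFrame.spark`` accessor, the sketch below reads the Spark schema and the underlying Spark DataFrame, then runs a Spark-side filter through ``spark.apply``. The helper name ``spark_side_summary`` and the sample data are assumptions made for this example, and the demo part only runs where PySpark is installed.

```python
def spark_side_summary(psdf):
    """Collect a few Spark-level facts about a pandas-on-Spark DataFrame.

    Hypothetical helper for illustration; not part of pyspark.pandas.
    """
    schema = psdf.spark.schema()  # pyspark.sql.types.StructType
    sdf = psdf.spark.frame()      # underlying pyspark.sql.DataFrame
    return {
        "fields": [field.name for field in schema.fields],
        "spark_columns": list(sdf.columns),
    }


if __name__ == "__main__":
    try:
        import pyspark.pandas as ps
    except ImportError:
        print("PySpark is not installed; skipping the demo.")
    else:
        # Sample data is made up for the example.
        psdf = ps.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
        print(spark_side_summary(psdf))
        psdf.spark.print_schema()
        # spark.apply takes a function from Spark DataFrame to Spark DataFrame.
        filtered = psdf.spark.apply(lambda sdf: sdf.where(sdf["a"] > 1))
        print(filtered.to_pandas())
```

By default ``spark.schema()`` and ``spark.frame()`` do not include the index; both accept an ``index_col`` argument when the index should be kept as a Spark column.
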
.. _api.dataframe.plot:

Plotting
--------
``DataFrame.plot`` is both a callable method and a namespace attribute for
specific plotting methods of the form ``DataFrame.plot.<kind>``.

.. autosummary::
   :toctree: api/

   DataFrame.plot
   DataFrame.plot.area
   DataFrame.plot.barh
   DataFrame.plot.bar
   DataFrame.plot.hist
   DataFrame.plot.line
   DataFrame.plot.pie
   DataFrame.plot.scatter
   DataFrame.plot.density
   DataFrame.hist
   DataFrame.kde

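The two ways of calling ``DataFrame.plot`` can be sketched as follows. The helper ``bar_two_ways`` and the sample columns are hypothetical, and the demo assumes that PySpark and a plotting backend (plotly is assumed here) are installed.

```python
def bar_two_ways(psdf, x, y):
    """Request the same bar chart via the callable and the namespace form.

    Hypothetical helper for illustration; not part of pyspark.pandas.
    """
    via_call = psdf.plot(kind="bar", x=x, y=y)  # DataFrame.plot as a method
    via_namespace = psdf.plot.bar(x=x, y=y)     # DataFrame.plot.<kind>
    return via_call, via_namespace


if __name__ == "__main__":
    try:
        import pyspark.pandas as ps

        # Sample data is made up for the example.
        psdf = ps.DataFrame({"lab": ["A", "B", "C"], "val": [10, 30, 20]})
        for figure in bar_two_ways(psdf, x="lab", y="val"):
            print(type(figure))
    except ImportError:
        print("PySpark or the plotting backend is not installed; skipping the demo.")
```

Both forms describe the same chart; the namespace form tends to read better when the kind is fixed in the code, while the callable form is convenient when the kind is chosen at run time.
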
Koalas-specific
---------------
``DataFrame.pandas_on_spark`` provides Koalas-specific features that exist only in Koalas.
These can be accessed by ``DataFrame.pandas_on_spark.<function/property>``.

.. autosummary::
   :toctree: api/

   DataFrame.pandas_on_spark.attach_id_column
   DataFrame.pandas_on_spark.apply_batch
   DataFrame.pandas_on_spark.transform_batch
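
A minimal sketch of the ``DataFrame.pandas_on_spark`` accessor follows: it attaches an id column and then applies a plain pandas function batch by batch. The helper names ``add_total`` and ``with_id_and_total`` and the column names ``a``, ``b``, ``id`` and ``total`` are assumptions made for this example; the demo part only runs where PySpark is installed.

```python
def add_total(pdf):
    """Receives one pandas DataFrame batch and adds a row-wise total."""
    return pdf.assign(total=pdf["a"] + pdf["b"])


def with_id_and_total(psdf):
    """Attach an id column, then apply a pandas function to each batch.

    Hypothetical helper for illustration; not part of pyspark.pandas.
    """
    with_id = psdf.pandas_on_spark.attach_id_column("distributed-sequence", "id")
    return with_id.pandas_on_spark.apply_batch(add_total)


if __name__ == "__main__":
    try:
        import pyspark.pandas as ps
    except ImportError:
        print("PySpark is not installed; skipping the demo.")
    else:
        # Sample data is made up for the example.
        psdf = ps.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
        print(with_id_and_total(psdf).to_pandas())
```

The function passed to ``apply_batch`` sees one batch of rows at a time rather than the whole frame, so it is best suited to row-wise logic such as the total above; anything that depends on all rows at once may give per-batch results.
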