[SPARK-36865][PYTHON][DOCS] Add PySpark API document of session_window

### What changes were proposed in this pull request?

This PR adds the PySpark API documentation for `session_window`.
The function's docstring did not comply with the numpydoc format, so this PR also fixes it.
In addition, the API documentation for `window` had no `Parameters` section, so this PR adds one.

### Why are the changes needed?

To provide PySpark users with the API documentation for the newly added function.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Ran `make html` in `python/docs` and got the following docs.

[window]
![time-window-python-doc-after](https://user-images.githubusercontent.com/4736016/134963797-ce25b268-20ca-48e3-ac8d-cbcbd85ebb3e.png)

[session_window]
![session-window-python-doc-after](https://user-images.githubusercontent.com/4736016/134963853-dd9d8417-139b-41ee-9924-14544b1a91af.png)

Closes #34118 from sarutak/python-session-window-doc.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
(cherry picked from commit 5a32e41e9c)
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
Kousuke Saruta 2021-09-30 16:51:12 +09:00 committed by Jungtaek Lim
parent 939c4d93b5
commit 8b2b6bb0d3
2 changed files with 36 additions and 0 deletions


@ -497,6 +497,7 @@ Functions
second
sentences
sequence
session_window
sha1
sha2
shiftleft


@ -2300,6 +2300,29 @@ def window(timeColumn, windowDuration, slideDuration=None, startTime=None):
.. versionadded:: 2.0.0
Parameters
----------
timeColumn : :class:`~pyspark.sql.Column`
The column or the expression to use as the timestamp for windowing by time.
The time column must be of TimestampType.
windowDuration : str
A string specifying the width of the window, e.g. `10 minutes`,
`1 second`. Check `org.apache.spark.unsafe.types.CalendarInterval` for
valid duration identifiers. Note that the duration is a fixed length of
time, and does not vary over time according to a calendar. For example,
`1 day` always means 86,400,000 milliseconds, not a calendar day.
slideDuration : str, optional
A new window will be generated every `slideDuration`. Must be less than
or equal to the `windowDuration`. Check
`org.apache.spark.unsafe.types.CalendarInterval` for valid duration
identifiers. This duration is likewise absolute, and does not vary
according to a calendar.
startTime : str, optional
The offset with respect to 1970-01-01 00:00:00 UTC with which to start
window intervals. For example, in order to have hourly tumbling windows that
start 15 minutes past the hour, e.g. 12:15-13:15, 13:15-14:15... provide
`startTime` as `15 minutes`.
Examples
--------
>>> df = spark.createDataFrame([("2016-03-11 09:00:07", 1)]).toDF("date", "val")
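The `windowDuration`, `slideDuration`, and `startTime` semantics documented above can be illustrated with a minimal pure-Python sketch. It does not use PySpark and is not Spark's implementation; the helper name `assign_windows` and its use of `timedelta` values instead of duration strings are assumptions made for illustration only.

```python
from datetime import datetime, timedelta, timezone

# Reference point for window alignment, as in the `startTime` description.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def assign_windows(ts, window, slide=None, start=timedelta(0)):
    """Return every [start, end) window that contains `ts`.

    Hypothetical helper: `window`, `slide`, and `start` are timedeltas
    standing in for windowDuration, slideDuration, and startTime.
    """
    slide = slide or window  # no slide given means tumbling windows
    # Distance from `ts` back to the latest window start aligned to EPOCH + start.
    offset = (ts - EPOCH - start) % slide
    w_start = ts - offset
    windows = []
    # Walk backwards one slide at a time while the window still covers `ts`.
    while w_start + window > ts:
        windows.append((w_start, w_start + window))
        w_start -= slide
    return sorted(windows)


ts = datetime(2016, 3, 11, 9, 0, 7, tzinfo=timezone.utc)
# Hourly tumbling windows that start 15 minutes past the hour.
hourly = assign_windows(ts, timedelta(hours=1), start=timedelta(minutes=15))
# 10-second windows that slide every 5 seconds.
sliding = assign_windows(ts, timedelta(seconds=10), timedelta(seconds=5))
```

With a slide shorter than the window, a single timestamp belongs to more than one window, which is why the sketch returns a list.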
@ -2347,7 +2370,19 @@ def session_window(timeColumn, gapDuration):
input row.
The output column will be a struct called 'session_window' by default with the nested columns
'start' and 'end', where 'start' and 'end' will be of :class:`pyspark.sql.types.TimestampType`.
.. versionadded:: 3.2.0
Parameters
----------
timeColumn : :class:`~pyspark.sql.Column`
The column or the expression to use as the timestamp for windowing by time.
The time column must be of TimestampType.
gapDuration : :class:`~pyspark.sql.Column` or str
A column or string specifying the timeout of the session. It could be static value,
e.g. `10 minutes`, `1 second`, or an expression/UDF that specifies gap
duration dynamically based on the input row.
Examples
--------
>>> df = spark.createDataFrame([("2016-03-11 09:00:07", 1)]).toDF("date", "val")
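To make the `gapDuration` description concrete, here is a minimal pure-Python sketch of session grouping with either a static or a per-row gap. It does not call PySpark and is not Spark's implementation; the helper name `sessionize`, the `(timestamp, payload)` event shape, and the treatment of an event that lands exactly on a session's end are assumptions made for illustration only.

```python
from datetime import datetime, timedelta


def sessionize(events, gap):
    """Group (timestamp, payload) events into [start, end) sessions.

    Hypothetical helper: `gap` is a timedelta (static gap) or a callable
    taking the payload and returning a timedelta (dynamic, per-row gap).
    """
    sessions = []
    for ts, payload in sorted(events, key=lambda e: e[0]):
        row_gap = gap(payload) if callable(gap) else gap
        end = ts + row_gap
        # Assumption: an event strictly before the current end extends the session.
        if sessions and ts < sessions[-1][1]:
            start, current_end = sessions[-1]
            sessions[-1] = (start, max(current_end, end))
        else:
            sessions.append((ts, end))
    return sessions


base = datetime(2016, 3, 11, 9, 0, 0)
events = [
    (base + timedelta(seconds=7), 1),
    (base + timedelta(seconds=10), 2),
    (base + timedelta(seconds=30), 3),
]
# Static gap: the first two events share a session, the third starts a new one.
static_sessions = sessionize(events, timedelta(seconds=5))
# Dynamic gap: each row supplies its own timeout, here 10 seconds per payload unit.
dynamic_sessions = sessionize(events, lambda payload: timedelta(seconds=10 * payload))
```

Unlike the fixed-length windows produced by `window`, a session's length depends on the data: each new event within the gap pushes the end of the session further out.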