spark-instrumented-optimizer/python/pyspark/sql
Enrico Minack f90eb6a5db [SPARK-36263][SQL][PYTHON] Add Dataframe.observation to PySpark
### What changes were proposed in this pull request?
With SPARK-34806 we can now easily add an equivalent for `Dataset.observe(Observation, Column, Column*)` to PySpark's `DataFrame` API.

### Why are the changes needed?
This further aligns the Python DataFrame API with the Scala Dataset API.

### Does this PR introduce _any_ user-facing change?
Yes, it adds the `Observation` class and the `DataFrame.observe` method.
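
A minimal sketch of how the new API might be used, assuming a PySpark build that includes this change. The helper name `observe_row_stats`, the observation name, and the metric aliases are hypothetical and chosen only for illustration.

```python
def observe_row_stats(df, column):
    """Run one action on `df` and return observed metrics for it.

    Hypothetical helper for illustration; it is not part of PySpark.
    """
    # Imported inside the function so this module loads without Spark installed.
    from pyspark.sql import Observation
    from pyspark.sql import functions as F

    observation = Observation("row stats")  # the name is arbitrary
    observed_df = df.observe(
        observation,
        F.count(F.lit(1)).alias("rows"),           # number of rows seen
        F.max(F.col(column)).alias("max_value"),   # largest value in `column`
    )
    # Metrics are only populated once an action has run on the observed DataFrame.
    observed_df.collect()
    # `get` is a property that waits until the metrics are available.
    return observation.get
```

With an active `SparkSession` named `spark`, `observe_row_stats(spark.range(100), "id")["rows"]` should report the row count. Indexing by alias is used here because the container type returned by `Observation.get` may differ between PySpark versions.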

### How was this patch tested?
Adds test `test_observe` to `pyspark.sql.tests.test_dataframe`.

Closes #33484 from EnricoMi/branch-observation-python.

Authored-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-07-28 01:39:34 +08:00
avro [SPARK-34300][PYSPARK][DOCS][MINOR] Fix some typos and syntax issues in docstrings and output of dev/lint-python 2021-02-02 09:30:50 +09:00
pandas [SPARK-36146][PYTHON][INFRA][TESTS] Upgrade Python version from 3.6 to 3.9 in GitHub Actions' linter/docs 2021-07-15 08:01:54 -07:00
tests [SPARK-36263][SQL][PYTHON] Add Dataframe.observation to PySpark 2021-07-28 01:39:34 +08:00
__init__.py [SPARK-36263][SQL][PYTHON] Add Dataframe.observation to PySpark 2021-07-28 01:39:34 +08:00
__init__.pyi [SPARK-36263][SQL][PYTHON] Add Dataframe.observation to PySpark 2021-07-28 01:39:34 +08:00
_typing.pyi [SPARK-36211][PYTHON] Correct typing of udf return value 2021-07-27 09:07:22 +02:00
catalog.py [SPARK-36258][PYTHON] Exposing functionExists in pyspark sql catalog 2021-07-23 19:15:41 +09:00
catalog.pyi [SPARK-36258][PYTHON] Exposing functionExists in pyspark sql catalog 2021-07-23 19:15:41 +09:00
column.py [SPARK-36160][PYTHON][DOCS] Clarifying documentation for pyspark sql/column 2021-07-16 21:32:53 +09:00
column.pyi [SPARK-34630][PYTHON][SQL] Added typehint for pyspark.sql.Column.contains 2021-03-24 15:21:19 +01:00
conf.py [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 2020-07-14 11:22:44 +09:00
conf.pyi [SPARK-35019][PYTHON][SQL] Fix type hints mismatches in pyspark.sql.* 2021-04-13 11:21:13 +09:00
context.py [MINOR][DOCS] Avoid some python docs where first sentence has "e.g." or similar 2021-05-12 10:38:59 +09:00
context.pyi [SPARK-35019][PYTHON][SQL] Fix type hints mismatches in pyspark.sql.* 2021-04-13 11:21:13 +09:00
dataframe.py [SPARK-36263][SQL][PYTHON] Add Dataframe.observation to PySpark 2021-07-28 01:39:34 +08:00
dataframe.pyi [SPARK-36263][SQL][PYTHON] Add Dataframe.observation to PySpark 2021-07-28 01:39:34 +08:00
functions.py [SPARK-34893][SS] Support session window natively 2021-07-16 20:38:16 +09:00
functions.pyi [SPARK-36211][PYTHON] Correct typing of udf return value 2021-07-27 09:07:22 +02:00
group.py [SPARK-36226][PYTHON][DOCS] Improve python docstring links to other classes 2021-07-23 19:17:51 +09:00
group.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
observation.py [SPARK-36263][SQL][PYTHON] Add Dataframe.observation to PySpark 2021-07-28 01:39:34 +08:00
observation.pyi [SPARK-36263][SQL][PYTHON] Add Dataframe.observation to PySpark 2021-07-28 01:39:34 +08:00
readwriter.py [SPARK-36181][PYTHON] Update pyspark sql readwriter documentation 2021-07-19 19:50:42 +09:00
readwriter.pyi [SPARK-33566][CORE][SQL][SS][PYTHON] Make unescapedQuoteHandling option configurable when read CSV 2020-11-27 15:47:39 +09:00
session.py [SPARK-36226][PYTHON][DOCS] Improve python docstring links to other classes 2021-07-23 19:17:51 +09:00
session.pyi [SPARK-33457][PYTHON] Adjust mypy configuration 2020-11-25 09:27:04 +09:00
streaming.py [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page 2021-06-01 10:58:49 +09:00
streaming.pyi [SPARK-33836][SS][PYTHON] Expose DataStreamReader.table and DataStreamWriter.toTable 2020-12-21 19:42:59 +09:00
types.py [SPARK-36226][PYTHON][DOCS] Improve python docstring links to other classes 2021-07-23 19:17:51 +09:00
types.pyi [SPARK-33457][PYTHON] Adjust mypy configuration 2020-11-25 09:27:04 +09:00
udf.py [SPARK-34408][PYTHON] Refactor spark.udf.register to share the same path to generate UDF instance 2021-02-11 10:57:02 +09:00
udf.pyi [SPARK-33457][PYTHON] Adjust mypy configuration 2020-11-25 09:27:04 +09:00
utils.py Spelling r common dev mlib external project streaming resource managers python 2020-11-27 10:22:45 -06:00
window.py [SPARK-33250][PYTHON][DOCS] Migration to NumPy documentation style in SQL (pyspark.sql.*) 2020-11-03 10:00:49 +09:00
window.pyi [SPARK-33250][PYTHON][DOCS] Migration to NumPy documentation style in SQL (pyspark.sql.*) 2020-11-03 10:00:49 +09:00