spark-instrumented-optimizer/python/pyspark/sql
zero323 01f20394ac [SPARK-30569][SQL][PYSPARK][SPARKR] Add percentile_approx DSL functions
### What changes were proposed in this pull request?

- Adds following overloaded variants to Scala `o.a.s.sql.functions`:

  - `percentile_approx(e: Column, percentage: Array[Double], accuracy: Long): Column`
  - `percentile_approx(columnName: String, percentage: Array[Double], accuracy: Long): Column`
  - `percentile_approx(e: Column, percentage: Double, accuracy: Long): Column`
  - `percentile_approx(columnName: String, percentage: Double, accuracy: Long): Column`
  - `percentile_approx(e: Column, percentage: Seq[Double], accuracy: Long): Column` (primarily for
Python interop).
  - `percentile_approx(columnName: String, percentage: Seq[Double], accuracy: Long): Column`

- Adds `percentile_approx` to `pyspark.sql.functions`.

- Adds `percentile_approx` function to SparkR.

### Why are the changes needed?

Currently we support `percentile_approx` only in SQL expression. It is inconvenient and makes this function relatively unknown.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

New unit tests for SparkR an PySpark.

As for now there are no additional tests in Scala API ‒ `ApproximatePercentile` is well tested and Python (including docstrings) and R tests provide additional tests, so it seems unnecessary.

Closes #27278 from zero323/SPARK-30569.

Lead-authored-by: zero323 <mszymkiewicz@gmail.com>
Co-authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-17 10:44:21 +09:00
..
avro [SPARK-27506][SQL][FOLLOWUP] Use option avroSchema to specify an evolved schema in from_avro 2019-12-30 18:14:21 +09:00
pandas [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy 2020-02-18 20:39:50 +08:00
tests [SPARK-30569][SQL][PYSPARK][SPARKR] Add percentile_approx DSL functions 2020-03-17 10:44:21 +09:00
__init__.py [SPARK-30434][PYTHON][SQL] Move pandas related functionalities into 'pandas' sub-package 2020-01-09 10:22:50 +09:00
catalog.py [SPARK-28980][CORE][SQL][STREAMING][MLLIB] Remove most items deprecated in Spark 2.2.0 or earlier, for Spark 3 2019-09-09 10:19:40 -05:00
column.py [SPARK-30859][PYSPARK][DOCS][MINOR] Fixed docstring syntax issues preventing proper compilation of documentation 2020-02-18 16:46:45 +09:00
conf.py [SPARK-23698][PYTHON] Resolve undefined names in Python 3 2018-08-22 10:06:59 -07:00
context.py [SPARK-30856][SQL][PYSPARK] Fix SQLContext.getOrCreate() when SparkContext is restarted 2020-02-20 12:21:24 +09:00
dataframe.py [SPARK-30764][SQL] Improve the readability of EXPLAIN FORMATTED style 2020-02-21 23:36:14 +08:00
functions.py [SPARK-30569][SQL][PYSPARK][SPARKR] Add percentile_approx DSL functions 2020-03-17 10:44:21 +09:00
group.py [SPARK-30434][PYTHON][SQL] Move pandas related functionalities into 'pandas' sub-package 2020-01-09 10:22:50 +09:00
readwriter.py [SPARK-31030][SQL] Backward Compatibility for Parsing and formatting Datetime 2020-03-11 14:11:13 +08:00
session.py [SPARK-30856][SQL][PYSPARK] Fix SQLContext.getOrCreate() when SparkContext is restarted 2020-02-20 12:21:24 +09:00
streaming.py [SPARK-31030][SQL] Backward Compatibility for Parsing and formatting Datetime 2020-03-11 14:11:13 +08:00
types.py [SPARK-30941][PYSPARK] Add a note to asDict to document its behavior when there are duplicate fields 2020-03-09 11:06:45 -07:00
udf.py [SPARK-30722][PYTHON][DOCS] Update documentation for Pandas UDF with Python type hints 2020-02-12 10:49:46 +09:00
utils.py [SPARK-30434][PYTHON][SQL] Move pandas related functionalities into 'pandas' sub-package 2020-01-09 10:22:50 +09:00
window.py [SPARK-30188][SQL] Resolve the failed unit tests when enable AQE 2020-01-13 22:55:19 +08:00