[SPARK-32306][SQL][DOCS] Clarify the result of percentile_approx()

### What changes were proposed in this pull request?
Describe the result of the `percentile_approx()` function and its synonym `approx_percentile()` more precisely. The proposed sentence clarifies that the function returns **one of the elements** (or an array of elements) from the input column.
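
To illustrate why the "one of the elements" wording matters, here is a minimal pure-Python sketch, not Spark's implementation. It assumes an exact nearest-rank rule as a stand-in for the approximate algorithm, and contrasts it with linear interpolation, which can return a value absent from the input. The helper names `nearest_rank_percentile` and `interpolated_percentile` are hypothetical and exist only for this example.

```python
import math


def nearest_rank_percentile(values, percentage):
    """Exact nearest-rank percentile; the result is always an element of `values`.

    Illustrative stand-in only: Spark computes an approximation, not this rule.
    """
    if not 0.0 <= percentage <= 1.0:
        raise ValueError("percentage must be between 0.0 and 1.0")
    ordered = sorted(values)
    # Smallest 1-based rank whose element covers at least `percentage` of the values.
    rank = max(1, math.ceil(percentage * len(ordered)))
    return ordered[rank - 1]


def interpolated_percentile(values, percentage):
    """Linearly interpolated percentile; the result may not be in `values`."""
    if not 0.0 <= percentage <= 1.0:
        raise ValueError("percentage must be between 0.0 and 1.0")
    ordered = sorted(values)
    position = percentage * (len(ordered) - 1)
    lower, upper = math.floor(position), math.ceil(position)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


data = [1, 2, 10, 20]
element_median = nearest_rank_percentile(data, 0.5)       # an element of `data`
interpolated_median = interpolated_percentile(data, 0.5)  # between 2 and 10
```

A reader expecting interpolation might be surprised to get back a value that appears verbatim in the column; the revised description is meant to head off that confusion.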

### Why are the changes needed?
To improve the Spark docs and avoid misunderstanding of the function's behavior.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
`./dev/scalastyle`

Closes #29835 from MaxGekk/doc-percentile_approx.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
Max Gekk 2020-09-22 12:45:19 -07:00 committed by Liang-Chi Hsieh
parent fba5736c50
commit 7c14f177eb
4 changed files with 17 additions and 10 deletions

@@ -1417,8 +1417,10 @@ setMethod("quarter",
 })
 #' @details
-#' \code{percentile_approx} Returns the approximate percentile value of
-#' numeric column at the given percentage.
+#' \code{percentile_approx} Returns the approximate \code{percentile} of the numeric column
+#' \code{col} which is the smallest value in the ordered \code{col} values (sorted from least to
+#' greatest) such that no more than \code{percentage} of \code{col} values is less than the value
+#' or equal to that value.
 #'
 #' @param percentage Numeric percentage at which percentile should be computed
 #' All values should be between 0 and 1.

@@ -592,7 +592,9 @@ def nanvl(col1, col2):
 @since(3.1)
 def percentile_approx(col, percentage, accuracy=10000):
-"""Returns the approximate percentile value of numeric column col at the given percentage.
+"""Returns the approximate `percentile` of the numeric column `col` which is the smallest value
+in the ordered `col` values (sorted from least to greatest) such that no more than `percentage`
+of `col` values is less than the value or equal to that value.
 The value of percentage must be between 0.0 and 1.0.
 The accuracy parameter (default: 10000)

@@ -49,11 +49,13 @@ import org.apache.spark.sql.types._
 */
 @ExpressionDescription(
 usage = """
-_FUNC_(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric
-column `col` at the given percentage. The value of percentage must be between 0.0
-and 1.0. The `accuracy` parameter (default: 10000) is a positive numeric literal which
-controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields
-better accuracy, `1.0/accuracy` is the relative error of the approximation.
+_FUNC_(col, percentage [, accuracy]) - Returns the approximate `percentile` of the numeric
+column `col` which is the smallest value in the ordered `col` values (sorted from least to
+greatest) such that no more than `percentage` of `col` values is less than the value
+or equal to that value. The value of percentage must be between 0.0 and 1.0. The `accuracy`
+parameter (default: 10000) is a positive numeric literal which controls approximation accuracy
+at the cost of memory. Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is
+the relative error of the approximation.
 When `percentage` is an array, each value of the percentage array must be between 0.0 and 1.0.
 In this case, returns the approximate percentile array of column `col` at the given
 percentage array.

@@ -684,8 +684,9 @@ object functions {
 def min(columnName: String): Column = min(Column(columnName))
 /**
-* Aggregate function: returns and array of the approximate percentile values
-* of numeric column col at the given percentages.
+* Aggregate function: returns the approximate `percentile` of the numeric column `col` which
+* is the smallest value in the ordered `col` values (sorted from least to greatest) such that
+* no more than `percentage` of `col` values is less than the value or equal to that value.
 *
 * If percentage is an array, each value must be between 0.0 and 1.0.
 * If it is a single floating point value, it must be between 0.0 and 1.0.