[SPARK-13380][SQL][DOCUMENT] Document Rand(seed) and Randn(seed) Return Indeterministic Results When Data Partitions are not fixed.

The `rand` and `randn` functions are commonly called with a `seed` argument. Intuitively, their results should be deterministic whenever a `seed` value is provided. MS SQL Server, for example, also has a `rand` function, and its documentation describes the `seed` parameter as follows: ```Seed is an integer expression (tinyint, smallint, or int) that gives the seed value. If seed is not specified, the SQL Server Database Engine assigns a seed value at random. For a specified seed value, the result returned is always the same.```

Update: the current implementation cannot generate deterministic results when the data partitions are not fixed. This PR documents this limitation in the function descriptions and corrects the `randn` error message, which referred to `rand`.
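
To illustrate why a fixed seed is not enough, here is a minimal, Spark-free sketch. It assumes each partition draws from its own generator seeded with `seed + partitionIndex`; that per-partition seeding is an assumption about the implementation, and `scala.util.Random`, `RandSeedSketch`, and `seededRand` are hypothetical stand-ins, not Spark APIs.

```scala
import scala.util.Random

object RandSeedSketch {
  // Hypothetical stand-in for a seeded rand(): every partition gets its own
  // generator, seeded with seed + partitionIndex (assumed seeding scheme).
  def seededRand(partitions: Seq[Seq[String]], seed: Long): Seq[(String, Double)] =
    partitions.zipWithIndex.flatMap { case (rows, partitionIndex) =>
      val rng = new Random(seed + partitionIndex)
      rows.map(row => row -> rng.nextDouble())
    }

  def main(args: Array[String]): Unit = {
    val rows = Seq("a", "b", "c", "d")

    // The same four rows, laid out as two partitions and as one partition.
    val twoPartitions = rows.grouped(2).toList
    val onePartition = List(rows)

    val first = seededRand(twoPartitions, seed = 42L)
    val again = seededRand(twoPartitions, seed = 42L)
    val regrouped = seededRand(onePartition, seed = 42L)

    // Same seed and same partition layout: the values repeat.
    println(s"same layout, same values: ${first == again}")
    // Same seed but a different layout: rows "c" and "d" now continue the
    // stream of the seed-42 generator instead of starting a seed-43 stream.
    println(s"different layout, same values: ${first == regrouped}")
  }
}
```

In this sketch, the seed only fixes the random stream within each partition, so which value a given row receives depends on the partition it lands in and on its position there.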

jkbradley hit an issue and provided an example in the following JIRA: https://issues.apache.org/jira/browse/SPARK-13333

Author: gatorsmile <gatorsmile@gmail.com>

Closes #11232 from gatorsmile/randSeed.
gatorsmile 2016-02-18 21:19:36 -08:00 committed by Reynold Xin
parent 95e1ab223e
commit c776fce99b
2 changed files with 5 additions and 1 deletions

@ -85,7 +85,7 @@ case class Randn(seed: Long) extends RDG {
def this(seed: Expression) = this(seed match {
case IntegerLiteral(s) => s
- case _ => throw new AnalysisException("Input argument to rand must be an integer literal.")
+ case _ => throw new AnalysisException("Input argument to randn must be an integer literal.")
})
override def genCode(ctx: CodegenContext, ev: ExprCode): String = {

@ -1052,6 +1052,8 @@ object functions extends LegacyFunctions {
/**
* Generate a random column with i.i.d. samples from U[0.0, 1.0].
*
+ * Note that this is indeterministic when data partitions are not fixed.
+ *
* @group normal_funcs
* @since 1.4.0
*/
@ -1068,6 +1070,8 @@ object functions extends LegacyFunctions {
/**
* Generate a column with i.i.d. samples from the standard normal distribution.
*
+ * Note that this is indeterministic when data partitions are not fixed.
+ *
* @group normal_funcs
* @since 1.4.0
*/