[SPARK-27328][SQL] Add 'deprecated' in ExpressionDescription for extended usage and SQL doc

## What changes were proposed in this pull request?

This PR proposes two things:

1. Add a `deprecated` field to `ExpressionDescription` so that it can be shown in our SQL function documentation (https://spark.apache.org/docs/latest/api/sql/) and via `DESCRIBE FUNCTION EXTENDED`.

2. While I am here, add some more restrictions for `note()` and `since()`. It looks like some documentation is broken due to a malformed `note`:

    ![Screen Shot 2019-03-31 at 3 00 53 PM](https://user-images.githubusercontent.com/6477701/55285518-a3e88500-53c8-11e9-9e99-41d857794fbe.png)

    It should start with 4 spaces and end with a newline. I added some asserts and fixed the existing instances while I am here. This is technically a breaking change, but I think it's too trivial to note somewhere (and we're in Spark 3.0.0).
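
As a rough illustration of which strings the new restriction accepts, here is a small Python sketch. It follows the wording of the error message in this PR (a leading newline plus 4 spaces, and a trailing newline plus two spaces). The helper name `is_well_formed` is hypothetical, and this is not the actual check in `ExpressionInfo`, which is written in Java and may be looser.

```python
def is_well_formed(text: str) -> bool:
    """Return True if a `note`/`deprecated` value follows the expected layout.

    Assumption: the layout is "newline + 4 spaces + body + newline + 2 spaces",
    taken from the error message in this PR. This is not Spark's own code.
    """
    if text == "":
        # An empty string means the field was not provided, which is fine.
        return True
    return text.startswith("\n    ") and text.endswith("\n  ")


# The single-line form that this PR replaces in the existing expressions.
old_style = "The function is non-deterministic."
# The triple-quoted form after the fix, written out with explicit escapes.
new_style = "\n    The function is non-deterministic.\n  "

print(is_well_formed(old_style))
print(is_well_formed(new_style))
```

With this reading, the old single-line `note` values are rejected and the triple-quoted values introduced by this PR pass.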

This PR adds the `deprecated` property to `from_utc_timestamp` and `to_utc_timestamp` (they are deprecated as of #24195) as examples of using this field.

The deprecation information is now shown as follows:

- **SQL documentation:**

    ![Screen Shot 2019-03-31 at 3 07 31 PM](https://user-images.githubusercontent.com/6477701/55285537-2113fa00-53c9-11e9-9932-f5693a03332d.png)

- **`DESCRIBE FUNCTION EXTENDED from_utc_timestamp;`**:

    ```
    Function: from_utc_timestamp
    Class: org.apache.spark.sql.catalyst.expressions.FromUTCTimestamp
    Usage: from_utc_timestamp(timestamp, timezone) - Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders that time as a timestamp in the given time zone. For example, 'GMT+1' would yield '2017-07-14 03:40:00.0'.
    Extended Usage:
        Examples:
          > SELECT from_utc_timestamp('2016-08-31', 'Asia/Seoul');
           2016-08-31 09:00:00

        Since: 1.5.0

        Deprecated:
          Deprecated since 3.0.0. See SPARK-25496.

    ```
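
To make the layout of the output above easier to follow, here is a Python sketch of how the extended description appears to be put together: `arguments` and `examples` are concatenated, then optional `Note`, `Since` and `Deprecated` sections are appended. The function name `make_extended` is hypothetical, and the indentation widths are an assumption inferred from the `DESCRIBE FUNCTION EXTENDED` output, not copied from the Java source.

```python
def make_extended(arguments, examples, note, since, deprecated):
    """Assemble an extended description from its parts (illustrative only)."""
    extended = arguments + examples
    if extended == "":
        extended = "\n    No example/argument for _FUNC_.\n"
    if note != "":
        extended += "\n    Note:\n      " + note.strip() + "\n"
    if since != "":
        extended += "\n    Since: " + since + "\n"
    if deprecated != "":
        # The section header is added here, so the field itself holds only the body.
        extended += "\n    Deprecated:\n      " + deprecated.strip() + "\n"
    return extended


examples = (
    "\n    Examples:\n"
    "      > SELECT from_utc_timestamp('2016-08-31', 'Asia/Seoul');\n"
    "       2016-08-31 09:00:00\n  "
)
deprecated = "\n    Deprecated since 3.0.0. See SPARK-25496.\n  "

result = make_extended("", examples, "", "1.5.0", deprecated)
print(result)
```

The point of the sketch is that `deprecated` carries only the message body. The `Deprecated:` header and its indentation come from the assembly step, which is why the field has to follow a fixed layout.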

## How was this patch tested?

Manually tested via:

- For documentation verification:

    ```
    $ cd sql
    $ sh create-docs.sh
    ```

- For checking description:

    ```
    $ ./bin/spark-sql
    ```
    ```
    spark-sql> DESCRIBE FUNCTION EXTENDED from_utc_timestamp;
    spark-sql> DESCRIBE FUNCTION EXTENDED to_utc_timestamp;
    ```

Closes #24259 from HyukjinKwon/SPARK-27328.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Hyukjin Kwon 2019-04-09 13:49:42 +08:00 committed by Wenchen Fan
parent 051336d9dd
commit f16dfb9129
9 changed files with 136 additions and 36 deletions


@ -31,35 +31,58 @@ import java.lang.annotation.RetentionPolicy;
* `usage()` will be used for the function usage in brief way.
*
* These below are concatenated and used for the function usage in verbose way, suppose arguments,
* examples, note and since will be provided.
* examples, note, since and deprecated will be provided.
*
* `arguments()` describes arguments for the expression. This should follow the format as below:
* `arguments()` describes arguments for the expression.
*
* Arguments:
* * arg0 - ...
* ....
* * arg1 - ...
* ....
*
* `examples()` describes examples for the expression. This should follow the format as below:
*
* Examples:
* > SELECT ...;
* ...
* > SELECT ...;
* ...
* `examples()` describes examples for the expression.
*
* `note()` contains some notes for the expression optionally.
*
* `since()` contains version information for the expression. Version is specified by,
* for example, "2.2.0".
*
* We can refer the function name by `_FUNC_`, in `usage`, `arguments` and `examples`, as it's
* registered in `FunctionRegistry`.
* `deprecated()` contains deprecation information for the expression optionally, for example,
* "Deprecated since 2.2.0. Use something else instead".
*
* Note that, if `extended()` is defined, `arguments()`, `examples()`, `note()` and `since()` will
* be ignored and `extended()` will be used for the extended description for backward
* compatibility.
* The format, in particular for `arguments()`, `examples()`,`note()`, `since()` and
* `deprecated()`, should strictly be as follows.
*
* <pre>
* <code>@ExpressionDescription(
* ...
* arguments = """
* Arguments:
* * arg0 - ...
* ....
* * arg1 - ...
* ....
* """,
* examples = """
* Examples:
* > SELECT ...;
* ...
* > SELECT ...;
* ...
* """,
* note = """
* ...
* """,
* since = "3.0.0",
* deprecated = """
* ...
* """)
* </code>
* </pre>
*
* We can refer the function name by `_FUNC_`, in `usage()`, `arguments()` and `examples()` as
* it is registered in `FunctionRegistry`.
*
* Note that, if `extended()` is defined, `arguments()`, `examples()`, `note()`, `since()` and
* `deprecated()` should be not defined together. `extended()` exists for backward compatibility.
*
* Note this contents are used in the SparkSQL documentation for built-in functions. The contents
* here are considered as a Markdown text and then rendered.
*/
@DeveloperApi
@Retention(RetentionPolicy.RUNTIME)
@ -70,4 +93,5 @@ public @interface ExpressionDescription {
String examples() default "";
String note() default "";
String since() default "";
String deprecated() default "";
}


@ -30,6 +30,7 @@ public class ExpressionInfo {
private String examples;
private String note;
private String since;
private String deprecated;
public String getClassName() {
return className;
@ -63,6 +64,10 @@ public class ExpressionInfo {
return note;
}
public String getDeprecated() {
return deprecated;
}
public String getDb() {
return db;
}
@ -75,13 +80,15 @@ public class ExpressionInfo {
String arguments,
String examples,
String note,
String since) {
String since,
String deprecated) {
assert name != null;
assert arguments != null;
assert examples != null;
assert examples.isEmpty() || examples.contains(" Examples:");
assert note != null;
assert since != null;
assert deprecated != null;
this.className = className;
this.db = db;
@ -91,6 +98,7 @@ public class ExpressionInfo {
this.examples = examples;
this.note = note;
this.since = since;
this.deprecated = deprecated;
// Make the extended description.
this.extended = arguments + examples;
@ -98,25 +106,44 @@ public class ExpressionInfo {
this.extended = "\n No example/argument for _FUNC_.\n";
}
if (!note.isEmpty()) {
if (!note.contains("    ") || !note.endsWith("  ")) {
throw new IllegalArgumentException("'note' is malformed in the expression [" +
this.name + "]. It should start with a newline and 4 leading spaces; end " +
"with a newline and two spaces; however, got [" + note + "].");
}
this.extended += "\n Note:\n " + note.trim() + "\n";
}
if (!since.isEmpty()) {
if (Integer.parseInt(since.split("\\.")[0]) < 0) {
throw new IllegalArgumentException("'since' is malformed in the expression [" +
this.name + "]. It should not start with a negative number; however, " +
"got [" + since + "].");
}
this.extended += "\n Since: " + since + "\n";
}
if (!deprecated.isEmpty()) {
if (!deprecated.contains("    ") || !deprecated.endsWith("  ")) {
throw new IllegalArgumentException("'deprecated' is malformed in the " +
"expression [" + this.name + "]. It should start with a newline and 4 " +
"leading spaces; end with a newline and two spaces; however, got [" +
deprecated + "].");
}
this.extended += "\n Deprecated:\n " + deprecated.trim() + "\n";
}
}
public ExpressionInfo(String className, String name) {
this(className, null, name, null, "", "", "", "");
this(className, null, name, null, "", "", "", "", "");
}
public ExpressionInfo(String className, String db, String name) {
this(className, db, name, null, "", "", "", "");
this(className, db, name, null, "", "", "", "", "");
}
// This is to keep the original constructor just in case.
public ExpressionInfo(String className, String db, String name, String usage, String extended) {
// `arguments` and `examples` are concatenated for the extended description. So, here
// simply pass the `extended` as `arguments` and an empty string for `examples`.
this(className, db, name, usage, extended, "", "", "");
this(className, db, name, usage, extended, "", "", "", "");
}
}


@ -621,7 +621,7 @@ object FunctionRegistry {
val clazz = scala.reflect.classTag[Cast].runtimeClass
val usage = "_FUNC_(expr) - Casts the value `expr` to the target data type `_FUNC_`."
val expressionInfo =
new ExpressionInfo(clazz.getCanonicalName, null, name, usage, "", "", "", "")
new ExpressionInfo(clazz.getCanonicalName, null, name, usage, "", "", "", "", "")
(name, (expressionInfo, builder))
}
@ -641,7 +641,8 @@ object FunctionRegistry {
df.arguments(),
df.examples(),
df.note(),
df.since())
df.since(),
df.deprecated())
} else {
// This exists for the backward compatibility with old `ExpressionDescription`s defining
// the extended description in `extended()`.


@ -958,7 +958,9 @@ case class ArraySort(child: Expression) extends UnaryExpression with ArraySortLi
> SELECT _FUNC_(array(1, 20, null, 3));
[20,null,3,1]
""",
note = "The function is non-deterministic.",
note = """
The function is non-deterministic.
""",
since = "2.4.0")
case class Shuffle(child: Expression, randomSeed: Option[Long] = None)
extends UnaryExpression with ExpectsInputTypes with Stateful with ExpressionWithRandomSeed {
@ -1042,7 +1044,9 @@ case class Shuffle(child: Expression, randomSeed: Option[Long] = None)
[3,4,1,2]
""",
since = "1.5.0",
note = "Reverse logic for arrays is available since 2.4.0."
note = """
Reverse logic for arrays is available since 2.4.0.
"""
)
case class Reverse(child: Expression) extends UnaryExpression with ImplicitCastInputTypes {
@ -2056,7 +2060,9 @@ case class ElementAt(left: Expression, right: Expression)
> SELECT _FUNC_(array(1, 2, 3), array(4, 5), array(6));
[1,2,3,4,5,6]
""",
note = "Concat logic for arrays is available since 2.4.0.")
note = """
Concat logic for arrays is available since 2.4.0.
""")
case class Concat(children: Seq[Expression]) extends ComplexTypeMergingExpression {
private def allowedTypes: Seq[AbstractDataType] = Seq(StringType, BinaryType, ArrayType)


@ -288,6 +288,7 @@ object CreateStruct extends FunctionBuilder {
"",
"",
"",
"",
"")
("struct", (info, this))
}


@ -1018,7 +1018,10 @@ case class TimeAdd(start: Expression, interval: Expression, timeZoneId: Option[S
> SELECT from_utc_timestamp('2016-08-31', 'Asia/Seoul');
2016-08-31 09:00:00
""",
since = "1.5.0")
since = "1.5.0",
deprecated = """
Deprecated since 3.0.0. See SPARK-25496.
""")
// scalastyle:on line.size.limit
case class FromUTCTimestamp(left: Expression, right: Expression)
extends BinaryExpression with ImplicitCastInputTypes {
@ -1229,7 +1232,10 @@ case class MonthsBetween(
> SELECT _FUNC_('2016-08-31', 'Asia/Seoul');
2016-08-30 15:00:00
""",
since = "1.5.0")
since = "1.5.0",
deprecated = """
Deprecated since 3.0.0. See SPARK-25496.
""")
// scalastyle:on line.size.limit
case class ToUTCTimestamp(left: Expression, right: Expression)
extends BinaryExpression with ImplicitCastInputTypes {


@ -125,7 +125,9 @@ case class CurrentDatabase() extends LeafExpression with Unevaluable {
> SELECT _FUNC_();
46707d92-02f4-4817-8116-a4c3b23e6266
""",
note = "The function is non-deterministic.")
note = """
The function is non-deterministic.
""")
// scalastyle:on line.size.limit
case class Uuid(randomSeed: Option[Long] = None) extends LeafExpression with Stateful
with ExpressionWithRandomSeed {


@ -78,7 +78,9 @@ trait ExpressionWithRandomSeed {
> SELECT _FUNC_(null);
0.8446490682263027
""",
note = "The function is non-deterministic in general case.")
note = """
The function is non-deterministic in general case.
""")
// scalastyle:on line.size.limit
case class Rand(child: Expression) extends RDG with ExpressionWithRandomSeed {
@ -118,7 +120,9 @@ object Rand {
> SELECT _FUNC_(null);
1.1164209726833079
""",
note = "The function is non-deterministic in general case.")
note = """
The function is non-deterministic in general case.
""")
// scalastyle:on line.size.limit
case class Randn(child: Expression) extends RDG with ExpressionWithRandomSeed {


@ -20,7 +20,7 @@ import os
from collections import namedtuple
ExpressionInfo = namedtuple(
"ExpressionInfo", "className name usage arguments examples note since")
"ExpressionInfo", "className name usage arguments examples note since deprecated")
def _list_function_infos(jvm):
@ -42,7 +42,8 @@ def _list_function_infos(jvm):
arguments=jinfo.getArguments().replace("_FUNC_", name),
examples=jinfo.getExamples().replace("_FUNC_", name),
note=jinfo.getNote(),
since=jinfo.getSince()))
since=jinfo.getSince(),
deprecated=jinfo.getDeprecated()))
return sorted(infos, key=lambda i: i.name)
@ -136,6 +137,27 @@ def _make_pretty_note(note):
return "**Note:**\n%s\n" % note
def _make_pretty_deprecated(deprecated):
"""
Makes the deprecated description pretty and returns a formatted string if `deprecated`
is not an empty string. Otherwise, returns None.
Expected input:
...
Expected output:
**Deprecated:**
...
"""
if deprecated != "":
deprecated = "\n".join(map(lambda n: n[4:], deprecated.split("\n")))
return "**Deprecated:**\n%s\n" % deprecated
def generate_sql_markdown(jvm, path):
"""
Generates a markdown file after listing the function information. The output file
@ -162,6 +184,10 @@ def generate_sql_markdown(jvm, path):
**Since:** SINCE
**Deprecated:**
DEPRECATED
<br/>
"""
@ -174,6 +200,7 @@ def generate_sql_markdown(jvm, path):
examples = _make_pretty_examples(info.examples)
note = _make_pretty_note(info.note)
since = info.since
deprecated = _make_pretty_deprecated(info.deprecated)
mdfile.write("### %s\n\n" % name)
if usage is not None:
@ -186,6 +213,8 @@ def generate_sql_markdown(jvm, path):
mdfile.write(note)
if since is not None and since != "":
mdfile.write("**Since:** %s\n\n" % since.strip())
if deprecated is not None:
mdfile.write(deprecated)
mdfile.write("<br/>\n\n")
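
For anyone who wants to try the Markdown step in isolation, below is a standalone sketch of the `_make_pretty_deprecated` helper from the diff above, with indentation restored and an explicit `return None` added for the empty case. The name `make_pretty_deprecated` (without the leading underscore) and the sample input are only for illustration. The sample assumes the field follows the layout used elsewhere in this PR, 4 leading spaces per line.

```python
def make_pretty_deprecated(deprecated):
    """Format the `deprecated` text for Markdown, or return None if it is empty."""
    if deprecated != "":
        # Drop the 4-space indentation that each line of the field carries.
        deprecated = "\n".join(map(lambda n: n[4:], deprecated.split("\n")))
        return "**Deprecated:**\n%s\n" % deprecated
    return None


sample = "\n    Deprecated since 3.0.0. See SPARK-25496.\n  "
pretty = make_pretty_deprecated(sample)
print(repr(pretty))
```

The slice `n[4:]` is applied to every line, including the last line that holds only two spaces, so that line becomes empty and the block ends with a blank line before the `<br/>` separator.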