spark-instrumented-optimizer/sql/gen-sql-api-docs.py

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import os
from collections import namedtuple
from pyspark.java_gateway import launch_gateway

# History (blame annotation for the line below, 2019-04-09 01:49:42 -04:00):
# [SPARK-27328][SQL] Add 'deprecated' in ExpressionDescription for extended usage and SQL doc.
# Closes #24259 from HyukjinKwon/SPARK-27328.
# Authored-by: Hyukjin Kwon <gurwls223@apache.org>
# Signed-off-by: Wenchen Fan <wenchen@databricks.com>
ExpressionInfo = namedtuple(
    "ExpressionInfo", "className name usage arguments examples note since deprecated")


def _list_function_infos(jvm):
    """
    Returns a list of function information via JVM. Sorts wrapped expression infos by name
    and returns them.
    """

    jinfos = jvm.org.apache.spark.sql.api.python.PythonSQLUtils.listBuiltinFunctionInfos()
    infos = []
    for jinfo in jinfos:
        name = jinfo.getName()
        usage = jinfo.getUsage()
        usage = usage.replace("_FUNC_", name) if usage is not None else usage
        infos.append(ExpressionInfo(
            className=jinfo.getClassName(),
            name=name,
            usage=usage,
            arguments=jinfo.getArguments().replace("_FUNC_", name),
            examples=jinfo.getExamples().replace("_FUNC_", name),
            note=jinfo.getNote(),
            since=jinfo.getSince(),
            deprecated=jinfo.getDeprecated()))
    return sorted(infos, key=lambda i: i.name)
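

# Illustrative sketch, not part of the original script: the `_FUNC_` placeholder
# substitution and the name-based sort done by `_list_function_infos` can be tried
# without a JVM. The `_DemoInfo` tuple, the helper name, and the sample function
# names and usage strings are all hypothetical.
def _example_substitute_and_sort():
    from collections import namedtuple

    _DemoInfo = namedtuple("_DemoInfo", "name usage")
    raw = [
        ("zeta_fn", "_FUNC_(x) - hypothetical usage text"),
        ("alpha_fn", "_FUNC_(x) - hypothetical usage text"),
        ("noop_fn", None),  # a missing usage is left as None rather than substituted
    ]
    infos = []
    for name, usage in raw:
        usage = usage.replace("_FUNC_", name) if usage is not None else usage
        infos.append(_DemoInfo(name=name, usage=usage))
    # Sort by function name so the generated page lists functions alphabetically.
    return sorted(infos, key=lambda i: i.name)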


def _make_pretty_usage(usage):
    """
    Makes the usage description pretty and returns a formatted string if `usage`
    is not an empty string. Otherwise, returns None.
    """

    if usage is not None and usage.strip() != "":
        usage = "\n".join(map(lambda u: u.strip(), usage.split("\n")))
        return "%s\n\n" % usage


def _make_pretty_arguments(arguments):
    """
    Makes the arguments description pretty and returns a formatted string if `arguments`
    starts with the argument prefix. Otherwise, returns None.

    Expected input:

        Arguments:
          * arg0 - ...
              ...
          * arg0 - ...
              ...

    Expected output:
    **Arguments:**

    * arg0 - ...
        ...
    * arg0 - ...
        ...

    """

    if arguments.startswith("\n    Arguments:"):
        arguments = "\n".join(map(lambda u: u[6:], arguments.strip().split("\n")[1:]))
        return "**Arguments:**\n\n%s\n\n" % arguments
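

# Illustrative sketch, not part of the original script: how the six-character slice in
# `_make_pretty_arguments` strips the indentation of each argument line. The helper name
# and the sample text are hypothetical. The sample assumes the "Arguments:" header is
# indented by four spaces and each item by six.
def _example_dedent_arguments():
    sample = (
        "\n    Arguments:"
        "\n      * str - a string expression"
        "\n      * len - an integer expression"
        "\n  ")
    # Drop the surrounding whitespace and the "Arguments:" header line, then cut six
    # leading characters from every remaining line.
    body = "\n".join(line[6:] for line in sample.strip().split("\n")[1:])
    return "**Arguments:**\n\n%s\n\n" % body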


def _make_pretty_examples(examples):
    """
    Makes the examples description pretty and returns a formatted string if `examples`
    starts with the example prefix. Otherwise, returns None.

    Expected input:

        Examples:
          > SELECT ...;
           ...
          > SELECT ...;
           ...

    Expected output:
    **Examples:**

    ```
    > SELECT ...;
     ...
    > SELECT ...;
     ...
    ```

    """

    if examples.startswith("\n    Examples:"):
        examples = "\n".join(map(lambda u: u[6:], examples.strip().split("\n")[1:]))
        return "**Examples:**\n\n```\n%s\n```\n\n" % examples


def _make_pretty_note(note):
    """
    Makes the note description pretty and returns a formatted string if `note` is not
    an empty string. Otherwise, returns None.

    Expected input:

        ...

    Expected output:
    **Note:**

    ...

    """

    if note != "":
        note = "\n".join(map(lambda n: n[4:], note.split("\n")))
        return "**Note:**\n%s\n" % note


def _make_pretty_deprecated(deprecated):
    """
    Makes the deprecated description pretty and returns a formatted string if `deprecated`
    is not an empty string. Otherwise, returns None.

    Expected input:

        ...

    Expected output:
    **Deprecated:**

    ...

    """

    if deprecated != "":
        deprecated = "\n".join(map(lambda n: n[4:], deprecated.split("\n")))
        return "**Deprecated:**\n%s\n" % deprecated
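

# Illustrative sketch, not part of the original script: the note and deprecated helpers
# both cut four leading characters from every line of their input. The helper name is
# hypothetical, and the exact sample layout (a leading newline, four spaces of
# indentation, and a trailing newline followed by two spaces) is an assumption about
# how the text arrives from the JVM side.
def _example_dedent_deprecated():
    sample = "\n    Deprecated since 3.0.0. See SPARK-25496.\n  "
    # The first and last pieces are shorter than four characters, so slicing
    # reduces them to empty strings, which become the blank lines around the text.
    body = "\n".join(line[4:] for line in sample.split("\n"))
    return "**Deprecated:**\n%s\n" % body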


def generate_sql_markdown(jvm, path):
    """
    Generates a markdown file after listing the function information. The output file
    is created in `path`.

    Expected output:
    ### NAME

    USAGE

    **Arguments:**

    ARGUMENTS

    **Examples:**

    ```
    EXAMPLES
    ```

    **Note:**

    NOTE

    **Since:** SINCE

    **Deprecated:**

    DEPRECATED

    <br/>

    """

    with open(path, 'w') as mdfile:
        for info in _list_function_infos(jvm):
            name = info.name
            usage = _make_pretty_usage(info.usage)
            arguments = _make_pretty_arguments(info.arguments)
            examples = _make_pretty_examples(info.examples)
            note = _make_pretty_note(info.note)
            since = info.since
            deprecated = _make_pretty_deprecated(info.deprecated)

            mdfile.write("### %s\n\n" % name)
            if usage is not None:
                mdfile.write("%s\n\n" % usage.strip())
            if arguments is not None:
                mdfile.write(arguments)
            if examples is not None:
                mdfile.write(examples)
            if note is not None:
                mdfile.write(note)
            if since is not None and since != "":
                mdfile.write("**Since:** %s\n\n" % since.strip())
            if deprecated is not None:
                mdfile.write(deprecated)
            mdfile.write("<br/>\n\n")


if __name__ == "__main__":
    jvm = launch_gateway().jvm
    spark_root_dir = os.path.dirname(os.path.dirname(__file__))
    markdown_file_path = os.path.join(spark_root_dir, "sql/docs/index.md")
    generate_sql_markdown(jvm, markdown_file_path)