From 3e203c985c0fb7434776b854ecca6fc553e24d58 Mon Sep 17 00:00:00 2001
From: Dilip Biswal
Date: Wed, 29 Jan 2020 08:41:40 -0600
Subject: [PATCH] [SPARK-28801][DOC][FOLLOW-UP] Setup links and address other review comments

### What changes were proposed in this pull request?
- Sets up links between related sections.
- Adds a "Related Clauses" section to the SELECT page and to each clause page.
- Changes the left-hand side menu to reflect the current status of the doc.
- Other minor cleanups.

### Why are the changes needed?
Currently Spark lacks documentation on the supported SQL constructs, causing confusion among users who sometimes have to read the code to understand the usage. These changes aim to address this issue.

### Does this PR introduce any user-facing change?
Yes.

### How was this patch tested?
Tested using jekyll build --serve.

Closes #27371 from dilipbiswal/select_finalization.

Authored-by: Dilip Biswal
Signed-off-by: Sean Owen
---
 docs/_data/menu-sql.yaml                      | 32 +++++++------------
 docs/sql-ref-syntax-qry-select-clusterby.md   | 17 ++++++++--
 ...sql-ref-syntax-qry-select-distribute-by.md | 14 ++++++--
 docs/sql-ref-syntax-qry-select-groupby.md     | 27 +++++++++++-----
 docs/sql-ref-syntax-qry-select-having.md      | 27 +++++++++++-----
 docs/sql-ref-syntax-qry-select-limit.md       | 29 ++++++++++++-----
 docs/sql-ref-syntax-qry-select-orderby.md     | 14 ++++++--
 docs/sql-ref-syntax-qry-select-sortby.md      | 13 +++++++-
 docs/sql-ref-syntax-qry-select-where.md       | 19 ++++++++---
 docs/sql-ref-syntax-qry-select.md             | 10 ++++++
 docs/sql-ref-syntax-qry.md                    | 25 +++++++++++----
 11 files changed, 164 insertions(+), 63 deletions(-)

diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml index 0ffe55c3a3..7673731778 100644 --- a/docs/_data/menu-sql.yaml +++ b/docs/_data/menu-sql.yaml @@ -123,37 +123,27 @@ - text: SELECT url: sql-ref-syntax-qry-select.html subitems: - - text: DISTINCT Clause - url: sql-ref-syntax-qry-select-distinct.html - - text: Joins - url: sql-ref-syntax-qry-select-join.html - - text: ORDER BY 
Clause - url: sql-ref-syntax-qry-select-orderby.html + - text: WHERE Clause + url: sql-ref-syntax-qry-select-where.html - text: GROUP BY Clause url: sql-ref-syntax-qry-select-groupby.html - text: HAVING Clause url: sql-ref-syntax-qry-select-having.html + - text: ORDER BY Clause + url: sql-ref-syntax-qry-select-orderby.html + - text: SORT BY Clause + url: sql-ref-syntax-qry-select-sortby.html + - text: CLUSTER BY Clause + url: sql-ref-syntax-qry-select-clusterby.html + - text: DISTRIBUTE BY Clause + url: sql-ref-syntax-qry-select-distribute-by.html - text: LIMIT Clause url: sql-ref-syntax-qry-select-limit.html - - text: Set operations - url: sql-ref-syntax-qry-select-setops.html - text: USE database url: sql-ref-syntax-qry-select-usedb.html - - text: Common Table Expression(CTE) - url: sql-ref-syntax-qry-select-cte.html - - text: Subqueries - url: sql-ref-syntax-qry-select-subqueries.html - - text: Query hints - url: sql-ref-syntax-qry-select-hints.html - - text: SAMPLING - url: sql-ref-syntax-qry-sampling.html - - text: WINDOWING ANALYTIC FUNCTIONS - url: sql-ref-syntax-qry-window.html - - text: AGGREGATION (CUBE/ROLLUP/GROUPING) - url: sql-ref-syntax-qry-aggregation.html - text: EXPLAIN url: sql-ref-syntax-qry-explain.html - - text: Auxilarry Statements + - text: Auxiliary Statements url: sql-ref-syntax-aux.html subitems: - text: Analyze statement diff --git a/docs/sql-ref-syntax-qry-select-clusterby.md b/docs/sql-ref-syntax-qry-select-clusterby.md index 4e59a3e55a..c96c441921 100644 --- a/docs/sql-ref-syntax-qry-select-clusterby.md +++ b/docs/sql-ref-syntax-qry-select-clusterby.md @@ -20,9 +20,10 @@ license: | --- The CLUSTER BY clause is used to first repartition the data based on the input expressions and then sort the data within each partition. This is -semantically equivalent to performing a DISTRIBUTE BY followed by -a SORT BY. This clause only ensures that the resultant rows are -sorted within each partition and does not guarantee a total order of output. 
+semantically equivalent to performing a +[DISTRIBUTE BY](sql-ref-syntax-qry-select-distribute-by.html) followed by a +[SORT BY](sql-ref-syntax-qry-select-sortby.html). This clause only ensures that the +resultant rows are sorted within each partition and does not guarantee a total order of output. ### Syntax {% highlight sql %} @@ -86,3 +87,13 @@ SELECT age, name FROM person CLUSTER BY age; |16 |Jack N | +---+-------+ {% endhighlight %} + +### Related Clauses +- [SELECT Main](sql-ref-syntax-qry-select.html) +- [WHERE Clause](sql-ref-syntax-qry-select-where.html) +- [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html) +- [HAVING Clause](sql-ref-syntax-qry-select-having.html) +- [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html) +- [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html) +- [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html) +- [LIMIT Clause](sql-ref-syntax-qry-select-limit.html) \ No newline at end of file diff --git a/docs/sql-ref-syntax-qry-select-distribute-by.md b/docs/sql-ref-syntax-qry-select-distribute-by.md index a1b3fcbfb5..e706ccf039 100644 --- a/docs/sql-ref-syntax-qry-select-distribute-by.md +++ b/docs/sql-ref-syntax-qry-select-distribute-by.md @@ -19,8 +19,8 @@ license: | limitations under the License. --- The DISTRIBUTE BY clause is used to repartition the data based -on the input expressions. Unlike the `CLUSTER BY` clause, this does not -sort the data within each partition. +on the input expressions. Unlike the [CLUSTER BY](sql-ref-syntax-qry-select-clusterby.html) +clause, this does not sort the data within each partition. 
### Syntax {% highlight sql %} @@ -82,3 +82,13 @@ SELECT age, name FROM person DISTRIBUTE BY age; |16 |Jack N | +---+-------+ {% endhighlight %} + +### Related Clauses +- [SELECT Main](sql-ref-syntax-qry-select.html) +- [WHERE Clause](sql-ref-syntax-qry-select-where.html) +- [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html) +- [HAVING Clause](sql-ref-syntax-qry-select-having.html) +- [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html) +- [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html) +- [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html) +- [LIMIT Clause](sql-ref-syntax-qry-select-limit.html) \ No newline at end of file diff --git a/docs/sql-ref-syntax-qry-select-groupby.md b/docs/sql-ref-syntax-qry-select-groupby.md index e47ca0bf3c..ab1c5d6eb5 100644 --- a/docs/sql-ref-syntax-qry-select-groupby.md +++ b/docs/sql-ref-syntax-qry-select-groupby.md @@ -73,14 +73,15 @@ GROUP BY [ GROUPING SETS grouping_sets ] group_expression [ , group_expression [ ### Examples {% highlight sql %} CREATE TABLE dealer (id INT, city STRING, car_model STRING, quantity INT); -INSERT INTO dealer VALUES (100, 'Fremont', 'Honda Civic', 10), - (100, 'Fremont', 'Honda Accord', 15), - (100, 'Fremont', 'Honda CRV', 7), - (200, 'Dublin', 'Honda Civic', 20), - (200, 'Dublin', 'Honda Accord', 10), - (200, 'Dublin', 'Honda CRV', 3), - (300, 'San Jose', 'Honda Civic', 5), - (300, 'San Jose', 'Honda Accord', 8); +INSERT INTO dealer VALUES + (100, 'Fremont', 'Honda Civic', 10), + (100, 'Fremont', 'Honda Accord', 15), + (100, 'Fremont', 'Honda CRV', 7), + (200, 'Dublin', 'Honda Civic', 20), + (200, 'Dublin', 'Honda Accord', 10), + (200, 'Dublin', 'Honda CRV', 3), + (300, 'San Jose', 'Honda Civic', 5), + (300, 'San Jose', 'Honda Accord', 8); -- Sum of quantity per dealership. Group by `id`. 
SELECT id, sum(quantity) FROM dealer GROUP BY id ORDER BY id; @@ -223,3 +224,13 @@ SELECT city, car_model, sum(quantity) AS sum FROM dealer +--------+------------+---+ {% endhighlight %} + +### Related Clauses +- [SELECT Main](sql-ref-syntax-qry-select.html) +- [WHERE Clause](sql-ref-syntax-qry-select-where.html) +- [HAVING Clause](sql-ref-syntax-qry-select-having.html) +- [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html) +- [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html) +- [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html) +- [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html) +- [LIMIT Clause](sql-ref-syntax-qry-select-limit.html) diff --git a/docs/sql-ref-syntax-qry-select-having.md b/docs/sql-ref-syntax-qry-select-having.md index 428d8556e7..94d9be649f 100644 --- a/docs/sql-ref-syntax-qry-select-having.md +++ b/docs/sql-ref-syntax-qry-select-having.md @@ -49,14 +49,15 @@ HAVING boolean_expression ### Examples {% highlight sql %} CREATE TABLE dealer (id INT, city STRING, car_model STRING, quantity INT); -INSERT INTO dealer VALUES (100, 'Fremont', 'Honda Civic', 10), - (100, 'Fremont', 'Honda Accord', 15), - (100, 'Fremont', 'Honda CRV', 7), - (200, 'Dublin', 'Honda Civic', 20), - (200, 'Dublin', 'Honda Accord', 10), - (200, 'Dublin', 'Honda CRV', 3), - (300, 'San Jose', 'Honda Civic', 5), - (300, 'San Jose', 'Honda Accord', 8); +INSERT INTO dealer VALUES + (100, 'Fremont', 'Honda Civic', 10), + (100, 'Fremont', 'Honda Accord', 15), + (100, 'Fremont', 'Honda CRV', 7), + (200, 'Dublin', 'Honda Civic', 20), + (200, 'Dublin', 'Honda Accord', 10), + (200, 'Dublin', 'Honda CRV', 3), + (300, 'San Jose', 'Honda Civic', 5), + (300, 'San Jose', 'Honda Accord', 8); -- `HAVING` clause referring to column in `GROUP BY`.
SELECT city, sum(quantity) AS sum FROM dealer GROUP BY city HAVING city = 'Fremont'; @@ -117,3 +118,13 @@ SELECT sum(quantity) AS sum FROM dealer HAVING sum(quantity) > 10; +---+ {% endhighlight %} + +### Related Clauses +- [SELECT Main](sql-ref-syntax-qry-select.html) +- [WHERE Clause](sql-ref-syntax-qry-select-where.html) +- [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html) +- [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html) +- [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html) +- [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html) +- [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html) +- [LIMIT Clause](sql-ref-syntax-qry-select-limit.html) diff --git a/docs/sql-ref-syntax-qry-select-limit.md b/docs/sql-ref-syntax-qry-select-limit.md index 609bfb98a0..2b9999cc40 100644 --- a/docs/sql-ref-syntax-qry-select-limit.md +++ b/docs/sql-ref-syntax-qry-select-limit.md @@ -18,8 +18,10 @@ license: | See the License for the specific language governing permissions and limitations under the License. --- -The LIMIT clause is used to constrain the number of rows returned by the SELECT statement. -In general, this clause is used in conjuction with ORDER BY to ensure that the results are deterministic. +The LIMIT clause is used to constrain the number of rows returned by +the [SELECT](sql-ref-syntax-qry-select.html) statement. In general, this clause +is used in conjunction with [ORDER BY](sql-ref-syntax-qry-select-orderby.html) to +ensure that the results are deterministic. ### Syntax {% highlight sql %} @@ -42,12 +44,13 @@ LIMIT { ALL | integer_expression } ### Examples {% highlight sql %} CREATE TABLE person (name STRING, age INT); -INSERT INTO person VALUES ('Zen Hui', 25), - ('Anil B', 18), - ('Shone S', 16), - ('Mike A', 25), - ('John A', 18), - ('Jack N', 16); +INSERT INTO person VALUES + ('Zen Hui', 25), + ('Anil B', 18), + ('Shone S', 16), + ('Mike A', 25), + ('John A', 18), + ('Jack N', 16); -- Select the first two rows.
SELECT name, age FROM person ORDER BY name LIMIT 2; @@ -86,3 +89,13 @@ SELECT name, age FROM person ORDER BY name LIMIT length('SPARK') |Shone S| 16| +-------+---+ {% endhighlight %} + +### Related Clauses +- [SELECT Main](sql-ref-syntax-qry-select.html) +- [WHERE Clause](sql-ref-syntax-qry-select-where.html) +- [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html) +- [HAVING Clause](sql-ref-syntax-qry-select-having.html) +- [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html) +- [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html) +- [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html) +- [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html) diff --git a/docs/sql-ref-syntax-qry-select-orderby.md b/docs/sql-ref-syntax-qry-select-orderby.md index 1a5d2d404e..c4b4ced0b7 100644 --- a/docs/sql-ref-syntax-qry-select-orderby.md +++ b/docs/sql-ref-syntax-qry-select-orderby.md @@ -19,8 +19,8 @@ license: | limitations under the License. --- The ORDER BY clause is used to return the result rows in a sorted manner -in the user specified order. Unlike the SORT BY clause, this clause guarantees -a total order in the output. +in the user specified order. Unlike the [SORT BY](sql-ref-syntax-qry-select-sortby.html) +clause, this clause guarantees a total order in the output. 
### Syntax {% highlight sql %} @@ -141,3 +141,13 @@ SELECT * FROM person ORDER BY name ASC, age DESC; |300|Mike |80 | +---+-----+----+ {% endhighlight %} + +### Related Clauses +- [SELECT Main](sql-ref-syntax-qry-select.html) +- [WHERE Clause](sql-ref-syntax-qry-select-where.html) +- [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html) +- [HAVING Clause](sql-ref-syntax-qry-select-having.html) +- [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html) +- [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html) +- [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html) +- [LIMIT Clause](sql-ref-syntax-qry-select-limit.html) diff --git a/docs/sql-ref-syntax-qry-select-sortby.md b/docs/sql-ref-syntax-qry-select-sortby.md index ee2e006a79..c0a491b78e 100644 --- a/docs/sql-ref-syntax-qry-select-sortby.md +++ b/docs/sql-ref-syntax-qry-select-sortby.md @@ -21,7 +21,8 @@ license: | The SORT BY clause is used to return the result rows sorted within each partition in the user specified order. When there is more than one partition SORT BY may return result that is partially ordered. This is different -than ORDER BY clause which guarantees a total order of the output. +from the [ORDER BY](sql-ref-syntax-qry-select-orderby.html) clause, which guarantees a +total order of the output.
### Syntax {% highlight sql %} @@ -174,3 +175,13 @@ SELECT /*+ REPARTITION(zip_code) */ name, age, zip_code FROM person |Lalit B.|null|94511 | +--------+----+--------+ {% endhighlight %} + +### Related Clauses +- [SELECT Main](sql-ref-syntax-qry-select.html) +- [WHERE Clause](sql-ref-syntax-qry-select-where.html) +- [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html) +- [HAVING Clause](sql-ref-syntax-qry-select-having.html) +- [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html) +- [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html) +- [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html) +- [LIMIT Clause](sql-ref-syntax-qry-select-limit.html) \ No newline at end of file diff --git a/docs/sql-ref-syntax-qry-select-where.md b/docs/sql-ref-syntax-qry-select-where.md index 09fab64bbe..a493623df4 100644 --- a/docs/sql-ref-syntax-qry-select-where.md +++ b/docs/sql-ref-syntax-qry-select-where.md @@ -39,10 +39,11 @@ WHERE boolean_expression ### Examples {% highlight sql %} CREATE TABLE person (id INT, name STRING, age INT); -INSERT INTO person VALUES (100, 'John', 30), - (200, 'Mary', NULL), - (300, 'Mike', 80), - (400, 'Dan', 50); +INSERT INTO person VALUES + (100, 'John', 30), + (200, 'Mary', NULL), + (300, 'Mike', 80), + (400, 'Dan', 50); -- Comparison operator in `WHERE` clause. 
SELECT * FROM person WHERE id > 200 ORDER BY id; @@ -111,3 +112,13 @@ WHERE EXISTS ( +---+----+----+ {% endhighlight %} + +### Related Clauses +- [SELECT Main](sql-ref-syntax-qry-select.html) +- [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html) +- [HAVING Clause](sql-ref-syntax-qry-select-having.html) +- [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html) +- [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html) +- [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html) +- [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html) +- [LIMIT Clause](sql-ref-syntax-qry-select-limit.html) diff --git a/docs/sql-ref-syntax-qry-select.md b/docs/sql-ref-syntax-qry-select.md index 05feda5f9a..00bd719004 100644 --- a/docs/sql-ref-syntax-qry-select.md +++ b/docs/sql-ref-syntax-qry-select.md @@ -134,3 +134,13 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression [ , ... ] } be referenced in the widow definitions in the query. + +### Related Clauses +- [WHERE Clause](sql-ref-syntax-qry-select-where.html) +- [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html) +- [HAVING Clause](sql-ref-syntax-qry-select-having.html) +- [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html) +- [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html) +- [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html) +- [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html) +- [LIMIT Clause](sql-ref-syntax-qry-select-limit.html) \ No newline at end of file diff --git a/docs/sql-ref-syntax-qry.md b/docs/sql-ref-syntax-qry.md index eb0e73d00e..cd7c0ffccd 100644 --- a/docs/sql-ref-syntax-qry.md +++ b/docs/sql-ref-syntax-qry.md @@ -1,7 +1,7 @@ --- layout: global -title: Reference -displayTitle: Reference +title: Data Retrieval +displayTitle: Data Retrieval license: | Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. 
See the NOTICE file distributed with @@ -19,7 +19,20 @@ license: | limitations under the License. --- -Spark SQL is a Apache Spark's module for working with structured data. -This guide is a reference for Structured Query Language (SQL) for Apache -Spark. This document describes the SQL constructs supported by Spark in detail -along with usage examples when applicable. +Spark supports a SELECT statement that is used to retrieve rows +from one or more tables according to the specified clauses. The full syntax +and a brief description of the supported clauses are explained in the +[SELECT](sql-ref-syntax-qry-select.html) section. Spark also provides the +ability to generate the logical and physical plans for a given query using the +[EXPLAIN](sql-ref-syntax-qry-explain.html) statement. + + +- [WHERE Clause](sql-ref-syntax-qry-select-where.html) +- [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html) +- [HAVING Clause](sql-ref-syntax-qry-select-having.html) +- [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html) +- [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html) +- [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html) +- [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html) +- [LIMIT Clause](sql-ref-syntax-qry-select-limit.html) +- [EXPLAIN Statement](sql-ref-syntax-qry-explain.html)
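The CLUSTER BY, DISTRIBUTE BY, SORT BY, and ORDER BY descriptions in this patch differ mainly in where ordering is guaranteed. What follows is a toy Python model of those documented semantics, not Spark's implementation. Several choices here are assumptions made only for illustration: the `distribute_by`, `sort_by`, `cluster_by`, and `order_by` helpers are hypothetical, the partition count is fixed at 3, and a plain modulo stands in for Spark's hash partitioner. The rows reuse the `person` data from the LIMIT examples.

```python
from operator import itemgetter
from typing import Callable, List, Tuple

Row = Tuple[int, str]  # (age, name), mirroring the `person` table in the LIMIT examples
Key = Callable[[Row], int]


def distribute_by(rows: List[Row], key: Key, num_partitions: int) -> List[List[Row]]:
    """Toy DISTRIBUTE BY: rows with equal keys always land in the same partition.

    Assumption: a plain modulo stands in for Spark's hash partitioner.
    """
    partitions: List[List[Row]] = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[key(row) % num_partitions].append(row)
    return partitions


def sort_by(partitions: List[List[Row]], key: Key) -> List[List[Row]]:
    """Toy SORT BY: sorts each partition independently, with no order across partitions."""
    return [sorted(part, key=key) for part in partitions]


def cluster_by(rows: List[Row], key: Key, num_partitions: int) -> List[List[Row]]:
    """Toy CLUSTER BY: DISTRIBUTE BY followed by SORT BY on the same key."""
    return sort_by(distribute_by(rows, key, num_partitions), key)


def order_by(rows: List[Row], key: Key) -> List[Row]:
    """Toy ORDER BY: one total order over all rows."""
    return sorted(rows, key=key)


people: List[Row] = [(25, "Zen Hui"), (18, "Anil B"), (16, "Shone S"),
                     (25, "Mike A"), (18, "John A"), (16, "Jack N")]
age = itemgetter(0)

clustered = cluster_by(people, age, num_partitions=3)
flat = [row for part in clustered for row in part]
print([row[0] for row in flat])                   # [18, 18, 16, 16, 25, 25]
print([row[0] for row in order_by(people, age)])  # [16, 16, 18, 18, 25, 25]
```

Every partition comes out sorted, and all rows that share an age end up in the same partition. The concatenated CLUSTER BY output (18, 18, 16, 16, 25, 25) is still not globally ordered, however. Only `order_by` produces a total order, which matches the guarantee documented for ORDER BY.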