[SPARK-28801][DOC][FOLLOW-UP] Setup links and address other review comments

### What changes were proposed in this pull request?

- Set up links between related sections.
- Add "Related sections" for each section.
- Change the left-hand side menu to reflect the current status of the doc.
- Other minor cleanups.

### Why are the changes needed?
Currently, Spark lacks documentation on the supported SQL constructs, which
causes confusion among users, who sometimes have to read the code to understand
the usage. This PR aims to address that issue.

### Does this PR introduce any user-facing change?
Yes.

### How was this patch tested?
Tested using `jekyll build --serve`.

Closes #27371 from dilipbiswal/select_finalization.

Authored-by: Dilip Biswal <dkbiswal@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
Dilip Biswal 2020-01-29 08:41:40 -06:00 committed by Sean Owen
parent ec1fb6b4e1
commit 3e203c985c
11 changed files with 164 additions and 63 deletions

View file

@ -123,37 +123,27 @@
- text: SELECT
url: sql-ref-syntax-qry-select.html
subitems:
- text: DISTINCT Clause
url: sql-ref-syntax-qry-select-distinct.html
- text: Joins
url: sql-ref-syntax-qry-select-join.html
- text: ORDER BY Clause
url: sql-ref-syntax-qry-select-orderby.html
- text: WHERE Clause
url: sql-ref-syntax-qry-select-where.html
- text: GROUP BY Clause
url: sql-ref-syntax-qry-select-groupby.html
- text: HAVING Clause
url: sql-ref-syntax-qry-select-having.html
- text: ORDER BY Clause
url: sql-ref-syntax-qry-select-orderby.html
- text: SORT BY Clause
url: sql-ref-syntax-qry-select-sortby.html
- text: CLUSTER BY Clause
url: sql-ref-syntax-qry-select-clusterby.html
- text: DISTRIBUTE BY Clause
url: sql-ref-syntax-qry-select-distribute-by.html
- text: LIMIT Clause
url: sql-ref-syntax-qry-select-limit.html
- text: Set operations
url: sql-ref-syntax-qry-select-setops.html
- text: USE database
url: sql-ref-syntax-qry-select-usedb.html
- text: Common Table Expression(CTE)
url: sql-ref-syntax-qry-select-cte.html
- text: Subqueries
url: sql-ref-syntax-qry-select-subqueries.html
- text: Query hints
url: sql-ref-syntax-qry-select-hints.html
- text: SAMPLING
url: sql-ref-syntax-qry-sampling.html
- text: WINDOWING ANALYTIC FUNCTIONS
url: sql-ref-syntax-qry-window.html
- text: AGGREGATION (CUBE/ROLLUP/GROUPING)
url: sql-ref-syntax-qry-aggregation.html
- text: EXPLAIN
url: sql-ref-syntax-qry-explain.html
- text: Auxilarry Statements
- text: Auxiliary Statements
url: sql-ref-syntax-aux.html
subitems:
- text: Analyze statement

View file

@ -20,9 +20,10 @@ license: |
---
The <code>CLUSTER BY</code> clause is used to first repartition the data based
on the input expressions and then sort the data within each partition. This is
semantically equivalent to performing a <code>DISTRIBUTE BY</code> followed by
a <code>SORT BY</code>. This clause only ensures that the resultant rows are
sorted within each partition and does not guarantee a total order of output.
semantically equivalent to performing a
[DISTRIBUTE BY](sql-ref-syntax-qry-select-distribute-by.html) followed by a
[SORT BY](sql-ref-syntax-qry-select-sortby.html). This clause only ensures that the
resultant rows are sorted within each partition and does not guarantee a total order of output.
### Syntax
{% highlight sql %}
@ -86,3 +87,13 @@ SELECT age, name FROM person CLUSTER BY age;
|16 |Jack N |
+---+-------+
{% endhighlight %}
### Related Clauses
- [SELECT Main](sql-ref-syntax-qry-select.html)
- [WHERE Clause](sql-ref-syntax-qry-select-where.html)
- [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
- [HAVING Clause](sql-ref-syntax-qry-select-having.html)
- [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
- [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
- [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
- [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)

View file

@ -19,8 +19,8 @@ license: |
limitations under the License.
---
The <code>DISTRIBUTE BY</code> clause is used to repartition the data based
on the input expressions. Unlike the `CLUSTER BY` clause, this does not
sort the data within each partition.
on the input expressions. Unlike the [CLUSTER BY](sql-ref-syntax-qry-select-clusterby.html)
clause, this does not sort the data within each partition.
### Syntax
{% highlight sql %}
@ -82,3 +82,13 @@ SELECT age, name FROM person DISTRIBUTE BY age;
|16 |Jack N |
+---+-------+
{% endhighlight %}
### Related Clauses
- [SELECT Main](sql-ref-syntax-qry-select.html)
- [WHERE Clause](sql-ref-syntax-qry-select-where.html)
- [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
- [HAVING Clause](sql-ref-syntax-qry-select-having.html)
- [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
- [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
- [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html)
- [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)

View file

@ -73,14 +73,15 @@ GROUP BY [ GROUPING SETS grouping_sets ] group_expression [ , group_expression [
### Examples
{% highlight sql %}
CREATE TABLE dealer (id INT, city STRING, car_model STRING, quantity INT);
INSERT INTO dealer VALUES (100, 'Fremont', 'Honda Civic', 10),
(100, 'Fremont', 'Honda Accord', 15),
(100, 'Fremont', 'Honda CRV', 7),
(200, 'Dublin', 'Honda Civic', 20),
(200, 'Dublin', 'Honda Accord', 10),
(200, 'Dublin', 'Honda CRV', 3),
(300, 'San Jose', 'Honda Civic', 5),
(300, 'San Jose', 'Honda Accord', 8);
INSERT INTO dealer VALUES
(100, 'Fremont', 'Honda Civic', 10),
(100, 'Fremont', 'Honda Accord', 15),
(100, 'Fremont', 'Honda CRV', 7),
(200, 'Dublin', 'Honda Civic', 20),
(200, 'Dublin', 'Honda Accord', 10),
(200, 'Dublin', 'Honda CRV', 3),
(300, 'San Jose', 'Honda Civic', 5),
(300, 'San Jose', 'Honda Accord', 8);
-- Sum of quantity per dealership. Group by `id`.
SELECT id, sum(quantity) FROM dealer GROUP BY id ORDER BY id;
@ -223,3 +224,13 @@ SELECT city, car_model, sum(quantity) AS sum FROM dealer
+--------+------------+---+
{% endhighlight %}
### Related clauses
- [SELECT Main](sql-ref-syntax-qry-select.html)
- [WHERE Clause](sql-ref-syntax-qry-select-where.html)
- [HAVING Clause](sql-ref-syntax-qry-select-having.html)
- [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
- [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
- [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html)
- [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
- [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)

View file

@ -49,14 +49,15 @@ HAVING boolean_expression
### Examples
{% highlight sql %}
CREATE TABLE dealer (id INT, city STRING, car_model STRING, quantity INT);
INSERT INTO dealer VALUES (100, 'Fremont', 'Honda Civic', 10),
(100, 'Fremont', 'Honda Accord', 15),
(100, 'Fremont', 'Honda CRV', 7),
(200, 'Dublin', 'Honda Civic', 20),
(200, 'Dublin', 'Honda Accord', 10),
(200, 'Dublin', 'Honda CRV', 3),
(300, 'San Jose', 'Honda Civic', 5),
(300, 'San Jose', 'Honda Accord', 8);
INSERT INTO dealer VALUES
(100, 'Fremont', 'Honda Civic', 10),
(100, 'Fremont', 'Honda Accord', 15),
(100, 'Fremont', 'Honda CRV', 7),
(200, 'Dublin', 'Honda Civic', 20),
(200, 'Dublin', 'Honda Accord', 10),
(200, 'Dublin', 'Honda CRV', 3),
(300, 'San Jose', 'Honda Civic', 5),
(300, 'San Jose', 'Honda Accord', 8);
-- `HAVING` clause referring to column in `GROUP BY`.
SELECT city, sum(quantity) AS sum FROM dealer GROUP BY city HAVING city = 'Fremont';
@ -117,3 +118,13 @@ SELECT sum(quantity) AS sum FROM dealer HAVING sum(quantity) > 10;
+---+
{% endhighlight %}
### Related Clauses
- [SELECT Main](sql-ref-syntax-qry-select.html)
- [WHERE Clause](sql-ref-syntax-qry-select-where.html)
- [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
- [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
- [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
- [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html)
- [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
- [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)

View file

@ -18,8 +18,10 @@ license: |
See the License for the specific language governing permissions and
limitations under the License.
---
The <code>LIMIT</code> clause is used to constrain the number of rows returned by the <code>SELECT</code> statement.
In general, this clause is used in conjuction with <code>ORDER BY</code> to ensure that the results are deterministic.
The <code>LIMIT</code> clause is used to constrain the number of rows returned by
the [SELECT](sql-ref-syntax-qry-select.html) statement. In general, this clause
is used in conjuction with [ORDER BY](sql-ref-syntax-qry-select-orderby.html) to
ensure that the results are deterministic.
### Syntax
{% highlight sql %}
@ -42,12 +44,13 @@ LIMIT { ALL | integer_expression }
### Examples
{% highlight sql %}
CREATE TABLE person (name STRING, age INT);
INSERT INTO person VALUES ('Zen Hui', 25),
('Anil B', 18),
('Shone S', 16),
('Mike A', 25),
('John A', 18),
('Jack N', 16);
INSERT INTO person VALUES
('Zen Hui', 25),
('Anil B', 18),
('Shone S', 16),
('Mike A', 25),
('John A', 18),
('Jack N', 16);
-- Select the first two rows.
SELECT name, age FROM person ORDER BY name LIMIT 2;
@ -86,3 +89,13 @@ SELECT name, age FROM person ORDER BY name LIMIT length('SPARK')
|Shone S| 16|
+-------+---+
{% endhighlight %}
### Related Clauses
- [SELECT Main](sql-ref-syntax-qry-select.html)
- [WHERE Clause](sql-ref-syntax-qry-select-where.html)
- [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
- [HAVING Clause](sql-ref-syntax-qry-select-having.html)
- [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
- [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
- [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html)
- [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)

View file

@ -19,8 +19,8 @@ license: |
limitations under the License.
---
The <code>ORDER BY</code> clause is used to return the result rows in a sorted manner
in the user specified order. Unlike the <code>SORT BY</code> clause, this clause guarantees
a total order in the output.
in the user specified order. Unlike the [SORT BY](sql-ref-syntax-qry-select-sortby.html)
clause, this clause guarantees a total order in the output.
### Syntax
{% highlight sql %}
@ -141,3 +141,13 @@ SELECT * FROM person ORDER BY name ASC, age DESC;
|300|Mike |80 |
+---+-----+----+
{% endhighlight %}
### Related Clauses
- [SELECT Main](sql-ref-syntax-qry-select.html)
- [WHERE Clause](sql-ref-syntax-qry-select-where.html)
- [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
- [HAVING Clause](sql-ref-syntax-qry-select-having.html)
- [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
- [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html)
- [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
- [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)

View file

@ -21,7 +21,8 @@ license: |
The <code>SORT BY</code> clause is used to return the result rows sorted
within each partition in the user specified order. When there is more than one partition
<code>SORT BY</code> may return result that is partially ordered. This is different
than <code>ORDER BY</code> clause which guarantees a total order of the output.
than [ORDER BY](sql-ref-syntax-qry-select-orderby.html) clause which guarantees a
total order of the output.
### Syntax
{% highlight sql %}
@ -174,3 +175,13 @@ SELECT /*+ REPARTITION(zip_code) */ name, age, zip_code FROM person
|Lalit B.|null|94511 |
+--------+----+--------+
{% endhighlight %}
### Related Clauses
- [SELECT Main](sql-ref-syntax-qry-select.html)
- [WHERE Clause](sql-ref-syntax-qry-select-where.html)
- [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
- [HAVING Clause](sql-ref-syntax-qry-select-having.html)
- [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
- [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html)
- [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
- [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)

View file

@ -39,10 +39,11 @@ WHERE boolean_expression
### Examples
{% highlight sql %}
CREATE TABLE person (id INT, name STRING, age INT);
INSERT INTO person VALUES (100, 'John', 30),
(200, 'Mary', NULL),
(300, 'Mike', 80),
(400, 'Dan', 50);
INSERT INTO person VALUES
(100, 'John', 30),
(200, 'Mary', NULL),
(300, 'Mike', 80),
(400, 'Dan', 50);
-- Comparison operator in `WHERE` clause.
SELECT * FROM person WHERE id > 200 ORDER BY id;
@ -111,3 +112,13 @@ WHERE EXISTS (
+---+----+----+
{% endhighlight %}
### Related Clauses
- [SELECT Main](sql-ref-syntax-qry-select.html)
- [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
- [HAVING Clause](sql-ref-syntax-qry-select-having.html)
- [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
- [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
- [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html)
- [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
- [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)

View file

@ -134,3 +134,13 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression [ , ... ] }
be referenced in the widow definitions in the query.
</dd>
</dl>
### Related Clauses
- [WHERE Clause](sql-ref-syntax-qry-select-where.html)
- [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
- [HAVING Clause](sql-ref-syntax-qry-select-having.html)
- [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
- [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
- [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html)
- [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
- [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)

View file

@ -1,7 +1,7 @@
---
layout: global
title: Reference
displayTitle: Reference
title: Data Retrieval
displayTitle: Data Retrieval
license: |
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
@ -19,7 +19,20 @@ license: |
limitations under the License.
---
Spark SQL is a Apache Spark's module for working with structured data.
This guide is a reference for Structured Query Language (SQL) for Apache
Spark. This document describes the SQL constructs supported by Spark in detail
along with usage examples when applicable.
Spark supports <code>SELECT</code> statement that is used to retrieve rows
from one or more tables according to the specified clauses. The full syntax
and brief description of supported clauses are explained in
[SELECT](sql-ref-syntax-qry-select.html) section. Spark also provides the
ability to generate logical and physical plan for a given query using
[EXPLAIN](sql-ref-syntax-qry-explain.html) statement.
- [WHERE Clause](sql-ref-syntax-qry-select-where.html)
- [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
- [HAVING Clause](sql-ref-syntax-qry-select-having.html)
- [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
- [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
- [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html)
- [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
- [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
- [EXPLAIN Statement](sql-ref-syntax-qry-explain.html)