[SPARK-28801][DOC][FOLLOW-UP] Setup links and address other review comments
### What changes were proposed in this pull request?
- Sets up links between related sections.
- Adds "Related sections" for each section.
- Changes the left-hand side menu to reflect the current status of the doc.
- Other minor cleanups.

### Why are the changes needed?
Currently Spark lacks documentation on the supported SQL constructs, causing confusion among users who sometimes have to look at the code to understand the usage. This is aimed at addressing this issue.

### Does this PR introduce any user-facing change?
Yes.

### How was this patch tested?
Tested using jekyll build --serve

Closes #27371 from dilipbiswal/select_finalization.

Authored-by: Dilip Biswal <dkbiswal@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
parent ec1fb6b4e1
commit 3e203c985c
@ -123,37 +123,27 @@
|
|||
- text: SELECT
|
||||
url: sql-ref-syntax-qry-select.html
|
||||
subitems:
|
||||
- text: DISTINCT Clause
|
||||
url: sql-ref-syntax-qry-select-distinct.html
|
||||
- text: Joins
|
||||
url: sql-ref-syntax-qry-select-join.html
|
||||
- text: ORDER BY Clause
|
||||
url: sql-ref-syntax-qry-select-orderby.html
|
||||
- text: WHERE Clause
|
||||
url: sql-ref-syntax-qry-select-where.html
|
||||
- text: GROUP BY Clause
|
||||
url: sql-ref-syntax-qry-select-groupby.html
|
||||
- text: HAVING Clause
|
||||
url: sql-ref-syntax-qry-select-having.html
|
||||
- text: ORDER BY Clause
|
||||
url: sql-ref-syntax-qry-select-orderby.html
|
||||
- text: SORT BY Clause
|
||||
url: sql-ref-syntax-qry-select-sortby.html
|
||||
- text: CLUSTER BY Clause
|
||||
url: sql-ref-syntax-qry-select-clusterby.html
|
||||
- text: DISTRIBUTE BY Clause
|
||||
url: sql-ref-syntax-qry-select-distribute-by.html
|
||||
- text: LIMIT Clause
|
||||
url: sql-ref-syntax-qry-select-limit.html
|
||||
- text: Set operations
|
||||
url: sql-ref-syntax-qry-select-setops.html
|
||||
- text: USE database
|
||||
url: sql-ref-syntax-qry-select-usedb.html
|
||||
- text: Common Table Expression(CTE)
|
||||
url: sql-ref-syntax-qry-select-cte.html
|
||||
- text: Subqueries
|
||||
url: sql-ref-syntax-qry-select-subqueries.html
|
||||
- text: Query hints
|
||||
url: sql-ref-syntax-qry-select-hints.html
|
||||
- text: SAMPLING
|
||||
url: sql-ref-syntax-qry-sampling.html
|
||||
- text: WINDOWING ANALYTIC FUNCTIONS
|
||||
url: sql-ref-syntax-qry-window.html
|
||||
- text: AGGREGATION (CUBE/ROLLUP/GROUPING)
|
||||
url: sql-ref-syntax-qry-aggregation.html
|
||||
- text: EXPLAIN
|
||||
url: sql-ref-syntax-qry-explain.html
|
||||
- text: Auxilarry Statements
|
||||
- text: Auxiliary Statements
|
||||
url: sql-ref-syntax-aux.html
|
||||
subitems:
|
||||
- text: Analyze statement
|
||||
|
|
|
@ -20,9 +20,10 @@ license: |
|
|||
---
|
||||
The <code>CLUSTER BY</code> clause is used to first repartition the data based
|
||||
on the input expressions and then sort the data within each partition. This is
|
||||
semantically equivalent to performing a <code>DISTRIBUTE BY</code> followed by
|
||||
a <code>SORT BY</code>. This clause only ensures that the resultant rows are
|
||||
sorted within each partition and does not guarantee a total order of output.
|
||||
semantically equivalent to performing a
|
||||
[DISTRIBUTE BY](sql-ref-syntax-qry-select-distribute-by.html) followed by a
|
||||
[SORT BY](sql-ref-syntax-qry-select-sortby.html). This clause only ensures that the
|
||||
resultant rows are sorted within each partition and does not guarantee a total order of output.
|
||||
|
||||
### Syntax
|
||||
{% highlight sql %}
|
||||
|
@ -86,3 +87,13 @@ SELECT age, name FROM person CLUSTER BY age;
|
|||
|16 |Jack N |
|
||||
+---+-------+
|
||||
{% endhighlight %}
|
||||
|
||||
### Related Clauses
|
||||
- [SELECT Main](sql-ref-syntax-qry-select.html)
|
||||
- [WHERE Clause](sql-ref-syntax-qry-select-where.html)
|
||||
- [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
|
||||
- [HAVING Clause](sql-ref-syntax-qry-select-having.html)
|
||||
- [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
|
||||
- [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
|
||||
- [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
|
||||
- [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
|
|
@ -19,8 +19,8 @@ license: |
|
|||
limitations under the License.
|
||||
---
|
||||
The <code>DISTRIBUTE BY</code> clause is used to repartition the data based
|
||||
on the input expressions. Unlike the `CLUSTER BY` clause, this does not
|
||||
sort the data within each partition.
|
||||
on the input expressions. Unlike the [CLUSTER BY](sql-ref-syntax-qry-select-clusterby.html)
|
||||
clause, this does not sort the data within each partition.
|
||||
|
||||
### Syntax
|
||||
{% highlight sql %}
|
||||
|
@ -82,3 +82,13 @@ SELECT age, name FROM person DISTRIBUTE BY age;
|
|||
|16 |Jack N |
|
||||
+---+-------+
|
||||
{% endhighlight %}
|
||||
|
||||
### Related Clauses
|
||||
- [SELECT Main](sql-ref-syntax-qry-select.html)
|
||||
- [WHERE Clause](sql-ref-syntax-qry-select-where.html)
|
||||
- [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
|
||||
- [HAVING Clause](sql-ref-syntax-qry-select-having.html)
|
||||
- [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
|
||||
- [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
|
||||
- [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html)
|
||||
- [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
|
|
@ -73,14 +73,15 @@ GROUP BY [ GROUPING SETS grouping_sets ] group_expression [ , group_expression [
|
|||
### Examples
|
||||
{% highlight sql %}
|
||||
CREATE TABLE dealer (id INT, city STRING, car_model STRING, quantity INT);
|
||||
INSERT INTO dealer VALUES (100, 'Fremont', 'Honda Civic', 10),
|
||||
(100, 'Fremont', 'Honda Accord', 15),
|
||||
(100, 'Fremont', 'Honda CRV', 7),
|
||||
(200, 'Dublin', 'Honda Civic', 20),
|
||||
(200, 'Dublin', 'Honda Accord', 10),
|
||||
(200, 'Dublin', 'Honda CRV', 3),
|
||||
(300, 'San Jose', 'Honda Civic', 5),
|
||||
(300, 'San Jose', 'Honda Accord', 8);
|
||||
INSERT INTO dealer VALUES
|
||||
(100, 'Fremont', 'Honda Civic', 10),
|
||||
(100, 'Fremont', 'Honda Accord', 15),
|
||||
(100, 'Fremont', 'Honda CRV', 7),
|
||||
(200, 'Dublin', 'Honda Civic', 20),
|
||||
(200, 'Dublin', 'Honda Accord', 10),
|
||||
(200, 'Dublin', 'Honda CRV', 3),
|
||||
(300, 'San Jose', 'Honda Civic', 5),
|
||||
(300, 'San Jose', 'Honda Accord', 8);
|
||||
|
||||
-- Sum of quantity per dealership. Group by `id`.
|
||||
SELECT id, sum(quantity) FROM dealer GROUP BY id ORDER BY id;
|
||||
|
@ -223,3 +224,13 @@ SELECT city, car_model, sum(quantity) AS sum FROM dealer
|
|||
+--------+------------+---+
|
||||
|
||||
{% endhighlight %}
|
||||
|
||||
### Related clauses
|
||||
- [SELECT Main](sql-ref-syntax-qry-select.html)
|
||||
- [WHERE Clause](sql-ref-syntax-qry-select-where.html)
|
||||
- [HAVING Clause](sql-ref-syntax-qry-select-having.html)
|
||||
- [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
|
||||
- [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
|
||||
- [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html)
|
||||
- [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
|
||||
- [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
|
||||
|
|
|
@ -49,14 +49,15 @@ HAVING boolean_expression
|
|||
### Examples
|
||||
{% highlight sql %}
|
||||
CREATE TABLE dealer (id INT, city STRING, car_model STRING, quantity INT);
|
||||
INSERT INTO dealer VALUES (100, 'Fremont', 'Honda Civic', 10),
|
||||
(100, 'Fremont', 'Honda Accord', 15),
|
||||
(100, 'Fremont', 'Honda CRV', 7),
|
||||
(200, 'Dublin', 'Honda Civic', 20),
|
||||
(200, 'Dublin', 'Honda Accord', 10),
|
||||
(200, 'Dublin', 'Honda CRV', 3),
|
||||
(300, 'San Jose', 'Honda Civic', 5),
|
||||
(300, 'San Jose', 'Honda Accord', 8);
|
||||
INSERT INTO dealer VALUES
|
||||
(100, 'Fremont', 'Honda Civic', 10),
|
||||
(100, 'Fremont', 'Honda Accord', 15),
|
||||
(100, 'Fremont', 'Honda CRV', 7),
|
||||
(200, 'Dublin', 'Honda Civic', 20),
|
||||
(200, 'Dublin', 'Honda Accord', 10),
|
||||
(200, 'Dublin', 'Honda CRV', 3),
|
||||
(300, 'San Jose', 'Honda Civic', 5),
|
||||
(300, 'San Jose', 'Honda Accord', 8);
|
||||
|
||||
-- `HAVING` clause referring to column in `GROUP BY`.
|
||||
SELECT city, sum(quantity) AS sum FROM dealer GROUP BY city HAVING city = 'Fremont';
|
||||
|
@ -117,3 +118,13 @@ SELECT sum(quantity) AS sum FROM dealer HAVING sum(quantity) > 10;
|
|||
+---+
|
||||
|
||||
{% endhighlight %}
|
||||
|
||||
### Related Clauses
|
||||
- [SELECT Main](sql-ref-syntax-qry-select.html)
|
||||
- [WHERE Clause](sql-ref-syntax-qry-select-where.html)
|
||||
- [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
|
||||
- [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
|
||||
- [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
|
||||
- [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html)
|
||||
- [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
|
||||
- [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
|
||||
|
|
|
@ -18,8 +18,10 @@ license: |
|
|||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
---
|
||||
The <code>LIMIT</code> clause is used to constrain the number of rows returned by the <code>SELECT</code> statement.
|
||||
In general, this clause is used in conjunction with <code>ORDER BY</code> to ensure that the results are deterministic.
|
||||
The <code>LIMIT</code> clause is used to constrain the number of rows returned by
|
||||
the [SELECT](sql-ref-syntax-qry-select.html) statement. In general, this clause
|
||||
is used in conjunction with [ORDER BY](sql-ref-syntax-qry-select-orderby.html) to
|
||||
ensure that the results are deterministic.
|
||||
|
||||
### Syntax
|
||||
{% highlight sql %}
|
||||
|
@ -42,12 +44,13 @@ LIMIT { ALL | integer_expression }
|
|||
### Examples
|
||||
{% highlight sql %}
|
||||
CREATE TABLE person (name STRING, age INT);
|
||||
INSERT INTO person VALUES ('Zen Hui', 25),
|
||||
('Anil B', 18),
|
||||
('Shone S', 16),
|
||||
('Mike A', 25),
|
||||
('John A', 18),
|
||||
('Jack N', 16);
|
||||
INSERT INTO person VALUES
|
||||
('Zen Hui', 25),
|
||||
('Anil B', 18),
|
||||
('Shone S', 16),
|
||||
('Mike A', 25),
|
||||
('John A', 18),
|
||||
('Jack N', 16);
|
||||
|
||||
-- Select the first two rows.
|
||||
SELECT name, age FROM person ORDER BY name LIMIT 2;
|
||||
|
@ -86,3 +89,13 @@ SELECT name, age FROM person ORDER BY name LIMIT length('SPARK')
|
|||
|Shone S| 16|
|
||||
+-------+---+
|
||||
{% endhighlight %}
|
||||
|
||||
### Related Clauses
|
||||
- [SELECT Main](sql-ref-syntax-qry-select.html)
|
||||
- [WHERE Clause](sql-ref-syntax-qry-select-where.html)
|
||||
- [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
|
||||
- [HAVING Clause](sql-ref-syntax-qry-select-having.html)
|
||||
- [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
|
||||
- [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
|
||||
- [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html)
|
||||
- [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
|
||||
|
|
|
@ -19,8 +19,8 @@ license: |
|
|||
limitations under the License.
|
||||
---
|
||||
The <code>ORDER BY</code> clause is used to return the result rows in a sorted manner
|
||||
in the user specified order. Unlike the <code>SORT BY</code> clause, this clause guarantees
|
||||
a total order in the output.
|
||||
in the user specified order. Unlike the [SORT BY](sql-ref-syntax-qry-select-sortby.html)
|
||||
clause, this clause guarantees a total order in the output.
|
||||
|
||||
### Syntax
|
||||
{% highlight sql %}
|
||||
|
@ -141,3 +141,13 @@ SELECT * FROM person ORDER BY name ASC, age DESC;
|
|||
|300|Mike |80 |
|
||||
+---+-----+----+
|
||||
{% endhighlight %}
|
||||
|
||||
### Related Clauses
|
||||
- [SELECT Main](sql-ref-syntax-qry-select.html)
|
||||
- [WHERE Clause](sql-ref-syntax-qry-select-where.html)
|
||||
- [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
|
||||
- [HAVING Clause](sql-ref-syntax-qry-select-having.html)
|
||||
- [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
|
||||
- [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html)
|
||||
- [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
|
||||
- [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
|
||||
|
|
|
@ -21,7 +21,8 @@ license: |
|
|||
The <code>SORT BY</code> clause is used to return the result rows sorted
|
||||
within each partition in the user specified order. When there is more than one partition
|
||||
<code>SORT BY</code> may return result that is partially ordered. This is different
|
||||
than <code>ORDER BY</code> clause which guarantees a total order of the output.
|
||||
than [ORDER BY](sql-ref-syntax-qry-select-orderby.html) clause which guarantees a
|
||||
total order of the output.
|
||||
|
||||
### Syntax
|
||||
{% highlight sql %}
|
||||
|
@ -174,3 +175,13 @@ SELECT /*+ REPARTITION(zip_code) */ name, age, zip_code FROM person
|
|||
|Lalit B.|null|94511 |
|
||||
+--------+----+--------+
|
||||
{% endhighlight %}
|
||||
|
||||
### Related Clauses
|
||||
- [SELECT Main](sql-ref-syntax-qry-select.html)
|
||||
- [WHERE Clause](sql-ref-syntax-qry-select-where.html)
|
||||
- [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
|
||||
- [HAVING Clause](sql-ref-syntax-qry-select-having.html)
|
||||
- [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
|
||||
- [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html)
|
||||
- [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
|
||||
- [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
|
|
@ -39,10 +39,11 @@ WHERE boolean_expression
|
|||
### Examples
|
||||
{% highlight sql %}
|
||||
CREATE TABLE person (id INT, name STRING, age INT);
|
||||
INSERT INTO person VALUES (100, 'John', 30),
|
||||
(200, 'Mary', NULL),
|
||||
(300, 'Mike', 80),
|
||||
(400, 'Dan', 50);
|
||||
INSERT INTO person VALUES
|
||||
(100, 'John', 30),
|
||||
(200, 'Mary', NULL),
|
||||
(300, 'Mike', 80),
|
||||
(400, 'Dan', 50);
|
||||
|
||||
-- Comparison operator in `WHERE` clause.
|
||||
SELECT * FROM person WHERE id > 200 ORDER BY id;
|
||||
|
@ -111,3 +112,13 @@ WHERE EXISTS (
|
|||
+---+----+----+
|
||||
|
||||
{% endhighlight %}
|
||||
|
||||
### Related Clauses
|
||||
- [SELECT Main](sql-ref-syntax-qry-select.html)
|
||||
- [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
|
||||
- [HAVING Clause](sql-ref-syntax-qry-select-having.html)
|
||||
- [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
|
||||
- [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
|
||||
- [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html)
|
||||
- [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
|
||||
- [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
|
||||
|
|
|
@ -134,3 +134,13 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression [ , ... ] }
|
|||
be referenced in the window definitions in the query.
|
||||
</dd>
|
||||
</dl>
|
||||
|
||||
### Related Clauses
|
||||
- [WHERE Clause](sql-ref-syntax-qry-select-where.html)
|
||||
- [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
|
||||
- [HAVING Clause](sql-ref-syntax-qry-select-having.html)
|
||||
- [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
|
||||
- [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
|
||||
- [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html)
|
||||
- [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
|
||||
- [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
|
|
@ -1,7 +1,7 @@
|
|||
---
|
||||
layout: global
|
||||
title: Reference
|
||||
displayTitle: Reference
|
||||
title: Data Retrieval
|
||||
displayTitle: Data Retrieval
|
||||
license: |
|
||||
Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
contributor license agreements. See the NOTICE file distributed with
|
||||
|
@ -19,7 +19,20 @@ license: |
|
|||
limitations under the License.
|
||||
---
|
||||
|
||||
Spark SQL is Apache Spark's module for working with structured data.
|
||||
This guide is a reference for Structured Query Language (SQL) for Apache
|
||||
Spark. This document describes the SQL constructs supported by Spark in detail
|
||||
along with usage examples when applicable.
|
||||
Spark supports <code>SELECT</code> statement that is used to retrieve rows
|
||||
from one or more tables according to the specified clauses. The full syntax
|
||||
and brief description of supported clauses are explained in
|
||||
[SELECT](sql-ref-syntax-qry-select.html) section. Spark also provides the
|
||||
ability to generate logical and physical plan for a given query using
|
||||
[EXPLAIN](sql-ref-syntax-qry-explain.html) statement.
|
||||
|
||||
|
||||
- [WHERE Clause](sql-ref-syntax-qry-select-where.html)
|
||||
- [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
|
||||
- [HAVING Clause](sql-ref-syntax-qry-select-having.html)
|
||||
- [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
|
||||
- [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
|
||||
- [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html)
|
||||
- [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
|
||||
- [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
|
||||
- [EXPLAIN Statement](sql-ref-syntax-qry-explain.html)
|
||||
|
|
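The pages touched by this change describe how the sorting and partitioning clauses relate: `CLUSTER BY` is equivalent to `DISTRIBUTE BY` followed by `SORT BY`, and only `ORDER BY` guarantees a total order. The sketch below is a plain-Python model of that relationship, not Spark code. The helper names, the use of Python's `hash` for partition assignment, and the choice of three partitions are all assumptions made for illustration; Spark's real partitioner and partition count will differ. The sample rows are the `person` rows from the `LIMIT` example in the diff.

```python
def distribute_by(rows, key, num_partitions):
    """Model of DISTRIBUTE BY: rows with equal keys land in the same partition.

    Assumption: partition index is hash(key) % num_partitions; Spark's
    actual hash partitioning is different.
    """
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[hash(key(row)) % num_partitions].append(row)
    return partitions


def sort_by(partitions, key):
    """Model of SORT BY: sort inside each partition, never across partitions."""
    return [sorted(partition, key=key) for partition in partitions]


def cluster_by(rows, key, num_partitions):
    """Model of CLUSTER BY: DISTRIBUTE BY followed by SORT BY on the same key."""
    return sort_by(distribute_by(rows, key, num_partitions), key)


def order_by(rows, key):
    """Model of ORDER BY: a single total order over all rows."""
    return sorted(rows, key=key)


def age(row):
    return row[1]


# Rows taken from the `person` table used in the LIMIT example.
people = [
    ("Zen Hui", 25),
    ("Anil B", 18),
    ("Shone S", 16),
    ("Mike A", 25),
    ("John A", 18),
    ("Jack N", 16),
]

clustered = cluster_by(people, age, num_partitions=3)
flattened = [row for partition in clustered for row in partition]

if __name__ == "__main__":
    for index, partition in enumerate(clustered):
        print(index, partition)
    print(order_by(people, age))
```

In this model every partition is sorted by `age`, yet reading the partitions back to back does not yield a globally sorted result, which is the distinction the `CLUSTER BY`, `SORT BY`, and `ORDER BY` pages draw.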