spark-instrumented-optimizer/docs/sql-ref-syntax-qry-select-setops.md
Takeshi Yamamuro 179289f0bf [SPARK-31383][SQL][DOC] Clean up the SQL documents in docs/sql-ref*
### What changes were proposed in this pull request?

This PR intends to clean up the SQL documents in `docs/sql-ref*`.
The main changes are as follows:

 - Fixes wrong syntax and capitalizes sub-titles
 - Adds some DDL queries in `Examples` so that users can run the examples there
 - Makes query output in `Examples` follow the `Dataset.showString` (right-aligned) format
 - Adds/removes spaces, indents, or blank lines to follow the format below:

```
---
license...
---

### Description

Describes what the syntax is.

### Syntax

{% highlight sql %}
SELECT...
    WHERE... // 4 indents after the second line
    ...
{% endhighlight %}

### Parameters

<dl>

  <dt><code><em>Param Name</em></code></dt>
  <dd>
    Param Description
  </dd>
  ...
</dl>

### Examples

{% highlight sql %}
-- It is better that users are able to execute example queries here.
-- So, we prepare test data in the first section if possible.
CREATE TABLE t (key STRING, value DOUBLE);
INSERT INTO t VALUES
    ('a', 1.0), ('a', 2.0), ('b', 3.0), ('c', 4.0);

-- query output has 2 indents and it follows the `Dataset.showString`
-- format (right-aligned).
SELECT * FROM t;
  +---+-----+
  |key|value|
  +---+-----+
  |  a|  1.0|
  |  a|  2.0|
  |  b|  3.0|
  |  c|  4.0|
  +---+-----+

-- Query statements after the second line have 4 indents.
SELECT key, SUM(value)
    FROM t
    GROUP BY key;
  +---+----------+
  |key|sum(value)|
  +---+----------+
  |  c|       4.0|
  |  b|       3.0|
  |  a|       3.0|
  +---+----------+
...
{% endhighlight %}

### Related Statements

 * [XXX](xxx.html)
 * ...
```

### Why are the changes needed?

Most of the changes in this PR are pretty minor, but I think consistent formats/rules for writing documents are important for long-term maintenance in our community.

### Does this PR introduce any user-facing change?

Yes.

### How was this patch tested?

Manually checked.

Closes #28151 from maropu/MakeRightAligned.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-04-12 23:40:36 -05:00

---
layout: global
title: Set Operators
displayTitle: Set Operators
license: |
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
---

### Description

Set operators are used to combine two input relations into a single one. Spark SQL supports three types of set operators:

 - `EXCEPT` or `MINUS`
 - `INTERSECT`
 - `UNION`

Note that input relations must have the same number of columns and compatible data types for the respective columns.

### EXCEPT

`EXCEPT` and `EXCEPT ALL` return the rows that are found in the first (left) relation but not in the second (right) one. `EXCEPT` (alternatively, `EXCEPT DISTINCT`) returns only distinct rows, while `EXCEPT ALL` does not remove duplicates from the result rows. Note that `MINUS` is an alias for `EXCEPT`.

#### Syntax

{% highlight sql %}
[ ( ] relation [ ) ] EXCEPT | MINUS [ ALL | DISTINCT ] [ ( ] relation [ ) ]
{% endhighlight %}

#### Examples

{% highlight sql %}
-- Assumed setup, not part of the original page: an integer column `c`
-- populated with the rows shown in the outputs below.
CREATE TABLE number1 (c INT);
INSERT INTO number1 VALUES (3), (1), (2), (2), (3), (4);
CREATE TABLE number2 (c INT);
INSERT INTO number2 VALUES (5), (1), (2), (2);

-- Use number1 and number2 tables to demonstrate set operators in this page.
SELECT * FROM number1;
  +---+
  |  c|
  +---+
  |  3|
  |  1|
  |  2|
  |  2|
  |  3|
  |  4|
  +---+

SELECT * FROM number2;
  +---+
  |  c|
  +---+
  |  5|
  |  1|
  |  2|
  |  2|
  +---+

SELECT c FROM number1 EXCEPT SELECT c FROM number2;
  +---+
  |  c|
  +---+
  |  3|
  |  4|
  +---+

SELECT c FROM number1 MINUS SELECT c FROM number2;
  +---+
  |  c|
  +---+
  |  3|
  |  4|
  +---+

SELECT c FROM number1 EXCEPT ALL (SELECT c FROM number2);
  +---+
  |  c|
  +---+
  |  3|
  |  3|
  |  4|
  +---+

SELECT c FROM number1 MINUS ALL (SELECT c FROM number2);
  +---+
  |  c|
  +---+
  |  3|
  |  3|
  |  4|
  +---+
{% endhighlight %}
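
As a rough illustration of how `EXCEPT` and `EXCEPT ALL` differ, the following Python sketch models the two relations as lists of values. It is not how Spark evaluates these operators; the function names `except_distinct` and `except_all` are hypothetical, and the results are sorted only to make the output deterministic (Spark does not guarantee row order).

```python
from collections import Counter

# Rows of the single-column relations number1 and number2 used in this page.
number1 = [3, 1, 2, 2, 3, 4]
number2 = [5, 1, 2, 2]


def except_distinct(left, right):
    """Model of EXCEPT / EXCEPT DISTINCT: distinct left rows absent from right."""
    return sorted(set(left) - set(right))


def except_all(left, right):
    """Model of EXCEPT ALL: each right row cancels at most one equal left row."""
    # Counter subtraction keeps only positive counts, so rows that are fully
    # cancelled (or that exist only on the right) drop out of the result.
    remaining = Counter(left) - Counter(right)
    return sorted(remaining.elements())


print(except_distinct(number1, number2))  # [3, 4]
print(except_all(number1, number2))       # [3, 3, 4]
```

Swapping the arguments gives a different answer (`except_distinct(number2, number1)` yields `[5]`), which reflects that `EXCEPT` is not symmetric.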

### INTERSECT

`INTERSECT` and `INTERSECT ALL` return the rows that are found in both relations. `INTERSECT` (alternatively, `INTERSECT DISTINCT`) returns only distinct rows, while `INTERSECT ALL` does not remove duplicates from the result rows.

#### Syntax

{% highlight sql %}
[ ( ] relation [ ) ] INTERSECT [ ALL | DISTINCT ] [ ( ] relation [ ) ]
{% endhighlight %}

#### Examples

{% highlight sql %}
(SELECT c FROM number1) INTERSECT (SELECT c FROM number2);
  +---+
  |  c|
  +---+
  |  1|
  |  2|
  +---+

(SELECT c FROM number1) INTERSECT DISTINCT (SELECT c FROM number2);
  +---+
  |  c|
  +---+
  |  1|
  |  2|
  +---+

(SELECT c FROM number1) INTERSECT ALL (SELECT c FROM number2);
  +---+
  |  c|
  +---+
  |  1|
  |  2|
  |  2|
  +---+
{% endhighlight %}
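
The duplicate handling of `INTERSECT` versus `INTERSECT ALL` can be sketched in plain Python by treating each relation as a multiset. This is only a model of the semantics, not Spark's implementation; `intersect_distinct` and `intersect_all` are hypothetical helper names, and sorting is used purely for a stable display order.

```python
from collections import Counter

# Rows of the single-column relations number1 and number2 used in this page.
number1 = [3, 1, 2, 2, 3, 4]
number2 = [5, 1, 2, 2]


def intersect_distinct(left, right):
    """Model of INTERSECT / INTERSECT DISTINCT: distinct rows present in both."""
    return sorted(set(left) & set(right))


def intersect_all(left, right):
    """Model of INTERSECT ALL: a row appears min(left count, right count) times."""
    common = Counter(left) & Counter(right)  # per-row minimum of the two counts
    return sorted(common.elements())


print(intersect_distinct(number1, number2))  # [1, 2]
print(intersect_all(number1, number2))       # [1, 2, 2]
```

The value `2` occurs twice in both inputs, so `INTERSECT ALL` keeps both copies, while `3` occurs only in `number1` and is dropped by both variants.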

### UNION

`UNION` and `UNION ALL` return the rows that are found in either relation. `UNION` (alternatively, `UNION DISTINCT`) returns only distinct rows, while `UNION ALL` does not remove duplicates from the result rows.

#### Syntax

{% highlight sql %}
[ ( ] relation [ ) ] UNION [ ALL | DISTINCT ] [ ( ] relation [ ) ]
{% endhighlight %}

#### Examples

{% highlight sql %}
(SELECT c FROM number1) UNION (SELECT c FROM number2);
  +---+
  |  c|
  +---+
  |  1|
  |  3|
  |  5|
  |  4|
  |  2|
  +---+

(SELECT c FROM number1) UNION DISTINCT (SELECT c FROM number2);
  +---+
  |  c|
  +---+
  |  1|
  |  3|
  |  5|
  |  4|
  |  2|
  +---+

SELECT c FROM number1 UNION ALL (SELECT c FROM number2);
  +---+
  |  c|
  +---+
  |  3|
  |  1|
  |  2|
  |  2|
  |  3|
  |  4|
  |  5|
  |  1|
  |  2|
  |  2|
  +---+
{% endhighlight %}
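
A minimal Python sketch of the `UNION` versus `UNION ALL` distinction, assuming each relation is represented as a list of values. The helper names `union_distinct` and `union_all` are hypothetical, and the code only mirrors the semantics described here rather than Spark's execution; the distinct result is sorted for readability, whereas Spark returns rows in no guaranteed order.

```python
# Rows of the single-column relations number1 and number2 used in this page.
number1 = [3, 1, 2, 2, 3, 4]
number2 = [5, 1, 2, 2]


def union_distinct(left, right):
    """Model of UNION / UNION DISTINCT: every distinct row from either input."""
    return sorted(set(left) | set(right))


def union_all(left, right):
    """Model of UNION ALL: plain concatenation, duplicates are kept."""
    return list(left) + list(right)


print(union_distinct(number1, number2))  # [1, 2, 3, 4, 5]
print(union_all(number1, number2))       # [3, 1, 2, 2, 3, 4, 5, 1, 2, 2]
```

`UNION ALL` keeps all ten input rows, while `UNION` collapses them to the five distinct values.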