Commit graph

9 commits

Author SHA1 Message Date
Wenchen Fan 87409c42bc [SPARK-31891][SQL][DOCS][FOLLOWUP] Fix typo in the description of MSCK REPAIR TABLE
### What changes were proposed in this pull request?
Fix a typo and highlight that `ADD PARTITIONS` is the default.

### Why are the changes needed?
Fix a typo that can mislead users.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
n/a

Closes #31633 from MaxGekk/repair-table-drop-partitions-followup.

Lead-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Co-authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2021-02-24 21:13:58 +09:00
Max Gekk 7f27d33a3c [SPARK-31891][SQL] Support MSCK REPAIR TABLE .. [{ADD|DROP|SYNC} PARTITIONS]
### What changes were proposed in this pull request?

In this PR, I propose to extend the `MSCK REPAIR TABLE` command to support the new options `{ADD|DROP|SYNC} PARTITIONS`. In particular:

1. Extend the logical node `RepairTable`, and add two new flags `enableAddPartitions` and `enableDropPartitions`.
2. Add similar flags to the v1 execution node `AlterTableRecoverPartitionsCommand`.
3. Add a new method `dropPartitions()` to `AlterTableRecoverPartitionsCommand` that drops partitions from the catalog if their locations no longer exist in the file system.
4. Update public docs about the `MSCK REPAIR TABLE` command:
<img width="1037" alt="Screenshot 2021-02-16 at 13 46 39" src="https://user-images.githubusercontent.com/1580697/108052607-7446d280-705d-11eb-8e25-7398254787a4.png">
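
The new options map onto the two flags added in item 1. A minimal sketch of that mapping, assuming (as the follow-up commit above states) that omitting the clause behaves like `ADD PARTITIONS`; `RepairFlags` and `parse_repair_option` are hypothetical names for illustration, not Spark's parser or `RepairTable` code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RepairFlags:
    """Hypothetical mirror of the two flags added to the RepairTable node."""

    enable_add_partitions: bool
    enable_drop_partitions: bool


def parse_repair_option(option: Optional[str] = None) -> RepairFlags:
    """Map the keyword of an optional {ADD|DROP|SYNC} PARTITIONS clause to flags.

    `option` is the keyword before PARTITIONS, or None if the clause is omitted.
    """
    # Omitting the clause behaves like ADD PARTITIONS, the default.
    keyword = (option or "ADD").upper()
    if keyword == "ADD":
        # Register partition directories that are missing from the catalog.
        return RepairFlags(enable_add_partitions=True, enable_drop_partitions=False)
    if keyword == "DROP":
        # Remove catalog partitions whose locations no longer exist.
        return RepairFlags(enable_add_partitions=False, enable_drop_partitions=True)
    if keyword == "SYNC":
        # SYNC combines ADD and DROP.
        return RepairFlags(enable_add_partitions=True, enable_drop_partitions=True)
    raise ValueError(f"unsupported option: {option} PARTITIONS")


if __name__ == "__main__":
    # Corresponds to: MSCK REPAIR TABLE tbl2 SYNC PARTITIONS
    print(parse_repair_option("SYNC"))
```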

Closes #31097

### Why are the changes needed?
- The changes allow recovering tables with removed partitions. The example below illustrates the problem:
```sql
spark-sql> create table tbl2 (col int, part int) partitioned by (part);
spark-sql> insert into tbl2 partition (part=1) select 1;
spark-sql> insert into tbl2 partition (part=0) select 0;
spark-sql> show table extended like 'tbl2' partition (part = 0);
default	tbl2	false	Partition Values: [part=0]
Location: file:/Users/maximgekk/proj/apache-spark/spark-warehouse/tbl2/part=0
...
```
Remove the partition (part = 0) from the filesystem:
```
$ rm -rf /Users/maximgekk/proj/apache-spark/spark-warehouse/tbl2/part=0
```
Even after recovering, we cannot query the table:
```sql
spark-sql> msck repair table tbl2;
spark-sql> select * from tbl2;
21/01/08 22:49:13 ERROR SparkSQLDriver: Failed in [select * from tbl2]
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/Users/maximgekk/proj/apache-spark/spark-warehouse/tbl2/part=0
```

- To have feature parity with Hive: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE)

### Does this PR introduce _any_ user-facing change?
Yes. After the changes, we can query the recovered table:
```sql
spark-sql> msck repair table tbl2 sync partitions;
spark-sql> select * from tbl2;
1	1
spark-sql> show partitions tbl2;
part=1
```
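
For the `tbl2` example, the difference between plain `MSCK REPAIR TABLE` (ADD) and `SYNC PARTITIONS` comes down to which side of a catalog versus file-system comparison is applied. A minimal sketch of that comparison, assuming a single partition column stored as `part=<value>` directories directly under the table directory and a catalog modeled as a plain set of partition specs; `list_partition_dirs` and `repair_partitions` are hypothetical helpers, not Spark's `AlterTableRecoverPartitionsCommand`, which also handles multi-level partitions among other details.

```python
import os
import tempfile
from typing import Set, Tuple


def list_partition_dirs(table_dir: str, column: str) -> Set[str]:
    """Return specs such as 'part=1' for each '<column>=<value>' subdirectory."""
    prefix = f"{column}="
    return {
        name
        for name in os.listdir(table_dir)
        if name.startswith(prefix) and os.path.isdir(os.path.join(table_dir, name))
    }


def repair_partitions(
    catalog: Set[str], table_dir: str, column: str, add: bool, drop: bool
) -> Tuple[Set[str], Set[str]]:
    """Reconcile `catalog` with the file system in place; return (added, dropped).

    Assumes each partition's location is `table_dir/<spec>`.
    """
    on_disk = list_partition_dirs(table_dir, column)
    # ADD: directories present on disk but unknown to the catalog.
    added = (on_disk - catalog) if add else set()
    # DROP: catalog entries whose directory no longer exists.
    dropped = (
        {spec for spec in catalog if not os.path.isdir(os.path.join(table_dir, spec))}
        if drop
        else set()
    )
    catalog |= added
    catalog -= dropped
    return added, dropped


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as table_dir:
        # Mirror tbl2 after `rm -rf .../part=0`: only part=1 is left on disk.
        os.makedirs(os.path.join(table_dir, "part=1"))
        catalog = {"part=0", "part=1"}
        print(repair_partitions(catalog, table_dir, "part", add=True, drop=False))  # (set(), set())
        print(repair_partitions(catalog, table_dir, "part", add=True, drop=True))  # (set(), {'part=0'})
        print(catalog)  # {'part=1'}
```

The ADD-only call changes nothing because the stale `part=0` entry is never checked against the file system, which matches the failing query in the example above.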

### How was this patch tested?
- By running the modified test suite:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *MsckRepairTableParserSuite"
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *PlanResolutionSuite"
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableRecoverPartitionsSuite"
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableRecoverPartitionsParallelSuite"
```
- Added unified v1 and v2 tests for `MSCK REPAIR TABLE`:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *MsckRepairTableSuite"
```

Closes #31499 from MaxGekk/repair-table-drop-partitions.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-02-23 13:45:15 -08:00
Max Gekk 6ea4b5fda7 [SPARK-34401][SQL][DOCS] Update docs about altering cached tables/views
### What changes were proposed in this pull request?
Update public docs of SQL commands about altering cached tables/views. For instance:
<img width="869" alt="Screenshot 2021-02-08 at 15 11 48" src="https://user-images.githubusercontent.com/1580697/107217940-fd3b8980-6a1f-11eb-98b9-9b2e3fe7f4ef.png">

### Why are the changes needed?
To inform users about how commands behave when altering cached tables or views.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the command below and manually checking the docs:
```
$ SKIP_API=1 SKIP_SCALADOC=1 SKIP_PYTHONDOC=1 SKIP_RDOC=1 jekyll serve --watch
```

Closes #31524 from MaxGekk/doc-cmd-caching.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-02-22 04:32:09 +00:00
Huaxin Gao a75dc80a76 [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
### What changes were proposed in this pull request?
Remove the unneeded embedded inline HTML markup in favor of basic Markdown syntax.
Please see #28414
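
To illustrate the kind of rewrite involved, a minimal sketch that turns parameter entries written in the HTML `<dl>`/`<dt>`/`<dd>` style (the style shown in the SPARK-31383 template further down) into Markdown bullets. The regex, the `params_html_to_markdown` helper, and the target layout (a bold name followed by an indented description) are assumptions for illustration; the exact layout adopted is not spelled out in this commit message.

```python
import re

# Hypothetical pattern for one parameter entry in the old <dl> HTML style.
_PARAM_ENTRY = re.compile(
    r"<dt><code><em>(?P<name>.*?)</em></code></dt>\s*<dd>\s*(?P<desc>.*?)\s*</dd>",
    re.DOTALL,
)


def params_html_to_markdown(html: str) -> str:
    """Rewrite each <dt>/<dd> parameter entry as a Markdown bullet item."""
    items = [
        f"* **{m.group('name')}**\n\n    {m.group('desc')}"
        for m in _PARAM_ENTRY.finditer(html)
    ]
    return "\n\n".join(items)


if __name__ == "__main__":
    html = """<dl>
  <dt><code><em>Param Name</em></code></dt>
  <dd>
    Param Description
  </dd>
</dl>"""
    print(params_html_to_markdown(html))
```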

### Why are the changes needed?
To make the docs cleaner and easier to edit in Markdown editors.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manually built and checked.

Closes #28451 from huaxingao/html_cleanup.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-05-10 12:57:25 -05:00
Huaxin Gao 75da05038b [MINOR][SQL][DOCS] Remove two leading spaces from sql tables
### What changes were proposed in this pull request?
Remove two leading spaces from SQL output tables.

### Why are the changes needed?

Follow the format of other references such as https://docs.snowflake.com/en/sql-reference/constructs/join.html, https://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_10002.htm, https://www.postgresql.org/docs/10/sql-select.html.

### Does this PR introduce any user-facing change?

before
```
SELECT * FROM  test;
  +-+
  ...
  +-+
```
after
```
SELECT * FROM  test;
+-+
...
+-+
```
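
The before/after change above is mechanical. A minimal sketch of it as a text transformation, assuming query-output lines are exactly those that start with two spaces followed by `+`, `|`, or `...`; `dedent_result_tables` is a hypothetical helper, not a script from the Spark repository.

```python
import re

# A query-output line indented by exactly two spaces: a table border (+),
# a row (|), or an elision marker (...).
_INDENTED_RESULT = re.compile(r"^  (?=[+|]|\.\.\.)")


def dedent_result_tables(example: str) -> str:
    """Remove the two leading spaces from query-output lines of a SQL example."""
    return "\n".join(
        _INDENTED_RESULT.sub("", line, count=1) for line in example.split("\n")
    )


if __name__ == "__main__":
    before = "SELECT * FROM  test;\n  +-+\n  ...\n  +-+"
    print(dedent_result_tables(before))
```

Continuation lines of a query, which use four spaces of indentation, are left alone because the character after the first two spaces is another space.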

### How was this patch tested?
Manually built and checked.

Closes #28348 from huaxingao/sql-format.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
2020-05-01 10:11:43 -07:00
Takeshi Yamamuro 179289f0bf [SPARK-31383][SQL][DOC] Clean up the SQL documents in docs/sql-ref*
### What changes were proposed in this pull request?

This PR intends to clean up the SQL documents in `docs/sql-ref*`.
The main changes are as follows:

 - Fixes wrong syntax and capitalizes sub-titles
 - Adds some DDL queries in `Examples` so that users can run the examples there
 - Makes query output in `Examples` follow the `Dataset.showString` (right-aligned) format
 - Adds/removes spaces, indents, or blank lines to follow the format below:

```
---
license...
---

### Description

Describes what the syntax is.

### Syntax

{% highlight sql %}
SELECT...
    WHERE... // 4 indents after the second line
    ...
{% endhighlight %}

### Parameters

<dl>

  <dt><code><em>Param Name</em></code></dt>
  <dd>
    Param Description
  </dd>
  ...
</dl>

### Examples

{% highlight sql %}
-- It is better that users are able to execute example queries here.
-- So, we prepare test data in the first section if possible.
CREATE TABLE t (key STRING, value DOUBLE);
INSERT INTO t VALUES
    ('a', 1.0), ('a', 2.0), ('b', 3.0), ('c', 4.0);

-- query output has 2 indents and it follows the `Dataset.showString`
-- format (right-aligned).
SELECT * FROM t;
  +---+-----+
  |key|value|
  +---+-----+
  |  a|  1.0|
  |  a|  2.0|
  |  b|  3.0|
  |  c|  4.0|
  +---+-----+

-- Query statements after the second line have 4 indents.
SELECT key, SUM(value)
    FROM t
    GROUP BY key;
  +---+----------+
  |key|sum(value)|
  +---+----------+
  |  c|       4.0|
  |  b|       3.0|
  |  a|       3.0|
  +---+----------+
...
{% endhighlight %}

### Related Statements

 * [XXX](xxx.html)
 * ...
```
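
The right-aligned `Dataset.showString` layout that the template's example output follows can be reproduced in a few lines. A minimal sketch, assuming cells are already formatted as strings and that Spark's minimum column width is 3 characters; `show_string` is a hypothetical helper, and unlike Spark it does not truncate long values (20 characters by default) or account for wide Unicode characters. The `indent` parameter reproduces the two-space indentation the template uses.

```python
from typing import Sequence


def show_string(
    header: Sequence[str], rows: Sequence[Sequence[str]], indent: int = 0
) -> str:
    """Render string cells as a right-aligned table like Dataset.showString."""
    # Every column is at least 3 characters wide.
    widths = [
        max(3, len(name), *(len(row[i]) for row in rows))
        for i, name in enumerate(header)
    ]
    border = "+" + "+".join("-" * w for w in widths) + "+"

    def render(cells: Sequence[str]) -> str:
        # Right-align each cell by left-padding it to the column width.
        return "|" + "|".join(c.rjust(w) for c, w in zip(cells, widths)) + "|"

    lines = [border, render(header), border, *(render(r) for r in rows), border]
    prefix = " " * indent
    return "\n".join(prefix + text for text in lines)


if __name__ == "__main__":
    rows = [["a", "1.0"], ["a", "2.0"], ["b", "3.0"], ["c", "4.0"]]
    print(show_string(["key", "value"], rows, indent=2))
```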

### Why are the changes needed?

Most changes in this PR are minor, but I think consistent formats and rules for writing documents are important for long-term maintenance in our community.

### Does this PR introduce any user-facing change?

Yes.

### How was this patch tested?

Manually checked.

Closes #28151 from maropu/MakeRightAligned.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-04-12 23:40:36 -05:00
Huaxin Gao babefdee1c [SPARK-30085][SQL][DOC] Standardize sql reference
### What changes were proposed in this pull request?
Standardize the SQL reference.

### Why are the changes needed?
To have consistent docs

### Does this PR introduce any user-facing change?
Yes

### How was this patch tested?
Tested using jekyll build --serve

Closes #26721 from huaxingao/spark-30085.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-12-02 09:05:40 -06:00
Huaxin Gao 5a512e86e9 [SPARK-28800][DOC][SQL] Document REPAIR TABLE statement in SQL Reference
### What changes were proposed in this pull request?
Document REPAIR TABLE statement in SQL Reference.

### Why are the changes needed?
To complete SQL reference.

### Does this PR introduce any user-facing change?
Yes.

After the change, the docs render as follows:
![image](https://user-images.githubusercontent.com/13592258/66271480-461f7480-e813-11e9-9b40-cbffec1221ae.png)

![image](https://user-images.githubusercontent.com/13592258/66261968-4fb1c980-e78c-11e9-9db0-fcd6f458fd39.png)

### How was this patch tested?
Tested using jekyll build --serve

Closes #25884 from huaxingao/spark-28800.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-10-06 11:19:13 -05:00
Dilip Biswal a5df5ff0fd [SPARK-28734][DOC] Initial table of content in the left hand side bar for SQL doc
## What changes were proposed in this pull request?
This is an initial PR that creates the table of contents for the SQL reference guide. The left sidebar will display additional menu items corresponding to the supported SQL constructs. Once this PR is merged, we will fill in the content incrementally. Additionally, this PR contains a minor change to make the left sidebar scrollable; currently it is not possible to scroll in the left-hand side window.

## How was this patch tested?
Used jekyll build and serve to verify.

Closes #25459 from dilipbiswal/ref-doc.

Authored-by: Dilip Biswal <dbiswal@us.ibm.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
2019-08-18 23:17:50 -07:00