[SPARK-28794][SQL][DOC] Documentation for Create table Command

### What changes were proposed in this pull request?
Document the CREATE TABLE statement in the SQL Reference Guide.

### Why are the changes needed?
Adding documentation for SQL reference.

### Does this PR introduce any user-facing change?
yes

Before:
There was no documentation for this.

### How was this patch tested?
Used `jekyll build` and `jekyll serve` to verify.

Closes #26759 from PavithraRamachandran/create_doc.

Authored-by: Pavithra Ramachandran <pavi.rams@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
Pavithra Ramachandran 2020-01-23 11:29:13 -06:00 committed by Sean Owen
parent 3c8b3609a1
commit afe70b3b53
4 changed files with 345 additions and 1 deletion


@@ -0,0 +1,115 @@
---
layout: global
title: CREATE DATASOURCE TABLE
displayTitle: CREATE DATASOURCE TABLE
license: |
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements. See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License. You may obtain a copy of the License at
  http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
---
### Description
The `CREATE TABLE` statement defines a new table using a Data Source.
### Syntax
{% highlight sql %}
CREATE TABLE [ IF NOT EXISTS ] table_identifier
[ ( col_name1 col_type1 [ COMMENT col_comment1 ], ... ) ]
USING data_source
[ OPTIONS ( key1=val1, key2=val2, ... ) ]
[ PARTITIONED BY ( col_name1, col_name2, ... ) ]
[ CLUSTERED BY ( col_name3, col_name4, ... )
[ SORTED BY ( col_name [ ASC | DESC ], ... ) ]
INTO num_buckets BUCKETS ]
[ LOCATION path ]
[ COMMENT table_comment ]
[ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ]
[ AS select_statement ]
{% endhighlight %}
### Parameters
<dl>
<dt><code><em>table_identifier</em></code></dt>
<dd>
Specifies a table name, which may be optionally qualified with a database name.<br><br>
<b>Syntax:</b>
<code>
[ database_name. ] table_name
</code>
</dd>
</dl>
<dl>
<dt><code><em>USING data_source</em></code></dt>
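<dd>Data Source is the input format used to create the table. Data source can be CSV, TEXT, JSON, JDBC, PARQUET, ORC, etc.</dd>
</dl>
<dl>
<dt><code><em>OPTIONS</em></code></dt>
<dd>Options of the data source, passed to it as key-value pairs, such as <code>header</code> for CSV.</dd>
</dl>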
<dl>
<dt><code><em>PARTITIONED BY</em></code></dt>
<dd>Partitions are created on the table, based on the columns specified.</dd>
</dl>
<dl>
<dt><code><em>CLUSTERED BY</em></code></dt>
<dd>
The table (or each of its partitions, if it is partitioned) is divided into a fixed number of buckets based on the columns specified for bucketing.<br><br>
<b>NOTE:</b> Bucketing is an optimization technique that uses buckets (and bucketing columns) to determine data partitioning and avoid a data shuffle, for example in joins or aggregations on the bucketing columns.
</dd>
<dt><code><em>SORTED BY</em></code></dt>
<dd>Determines the order in which the data is stored in buckets. The default is ascending order.</dd>
</dl>
<dl>
<dt><code><em>LOCATION</em></code></dt>
<dd>Path to the directory where table data is stored, which could be a path on distributed storage like HDFS, etc.</dd>
</dl>
<dl>
<dt><code><em>COMMENT</em></code></dt>
<dd>A string literal to describe the table.</dd>
</dl>
<dl>
<dt><code><em>TBLPROPERTIES</em></code></dt>
<dd>A list of key-value pairs used to tag the table definition, such as <code>created.by.user</code>, <code>owner</code>, etc.
</dd>
</dl>
<dl>
<dt><code><em>AS select_statement</em></code></dt>
<dd>The table is populated using the data from the select statement.</dd>
</dl>
### Examples
{% highlight sql %}
-- Use a data source
CREATE TABLE Student (Id INT, name STRING, age INT) USING CSV;
-- Use data from another table
CREATE TABLE StudentInfo USING CSV
AS SELECT * FROM Student;
-- Partitioned and bucketed
CREATE TABLE StudentBucketed (Id INT, name STRING, age INT)
USING CSV
PARTITIONED BY (age)
CLUSTERED BY (Id) INTO 4 BUCKETS;
{% endhighlight %}
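The clauses not covered by the examples above can be sketched as follows. This is an illustration under assumptions: the table names, the `/tmp/student_csv` path and the `created.by.user` value are hypothetical, and `header` and `sep` are used only as examples of options accepted by Spark's built-in CSV source.

{% highlight sql %}
-- Hypothetical table: pipe-delimited CSV files with a header row, stored at an explicit path
CREATE TABLE IF NOT EXISTS StudentCsv (Id INT, name STRING, age INT)
USING CSV
OPTIONS ('header' = 'true', 'sep' = '|')
LOCATION '/tmp/student_csv'
COMMENT 'Student records stored as CSV'
TBLPROPERTIES ('created.by.user' = 'xxxx');
-- Hypothetical table: partitioned by age, bucketed by Id, rows sorted by Id within each bucket
CREATE TABLE IF NOT EXISTS StudentSorted (Id INT, name STRING, age INT)
USING PARQUET
PARTITIONED BY (age)
CLUSTERED BY (Id) SORTED BY (Id ASC) INTO 4 BUCKETS;
-- Hypothetical CTAS: the schema comes from the query, so no column list is given
CREATE TABLE StudentParquet
USING PARQUET
PARTITIONED BY (age)
AS SELECT Id, name, age FROM Student;
{% endhighlight %}

Because the query determines the schema in the last statement, PARTITIONED BY there can only name columns that the SELECT list produces.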
### Related Statements
* [CREATE TABLE USING HIVE FORMAT](sql-ref-syntax-ddl-create-table-hiveformat.html)
* [CREATE TABLE LIKE](sql-ref-syntax-ddl-create-table-like.html)


@@ -0,0 +1,122 @@
---
layout: global
title: CREATE HIVEFORMAT TABLE
displayTitle: CREATE HIVEFORMAT TABLE
license: |
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements. See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License. You may obtain a copy of the License at
  http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
---
### Description
The `CREATE TABLE` statement defines a new table using Hive format.
### Syntax
{% highlight sql %}
CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier
[ ( col_name1[:] col_type1 [ COMMENT col_comment1 ], ... ) ]
[ COMMENT table_comment ]
[ PARTITIONED BY ( col_name2[:] col_type2 [ COMMENT col_comment2 ], ... )
| ( col_name1, col_name2, ... ) ]
[ ROW FORMAT row_format ]
[ STORED AS file_format ]
[ LOCATION path ]
[ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ]
[ AS select_statement ]
{% endhighlight %}
### Parameters
<dl>
<dt><code><em>table_identifier</em></code></dt>
<dd>
Specifies a table name, which may be optionally qualified with a database name.<br><br>
<b>Syntax:</b>
<code>
[ database_name. ] table_name
</code>
</dd>
</dl>
<dl>
<dt><code><em>EXTERNAL</em></code></dt>
<dd>The table is defined using the path provided in LOCATION, and does not use the default location for this table.</dd>
</dl>
<dl>
<dt><code><em>PARTITIONED BY</em></code></dt>
<dd>Partitions are created on the table, based on the columns specified.</dd>
</dl>
<dl>
<dt><code><em>ROW FORMAT</em></code></dt>
<dd>Use the SERDE clause to specify a custom SerDe for this table. Otherwise, use the DELIMITED clause to use the native SerDe and specify the delimiter, escape character, null character, and so on.</dd>
</dl>
<dl>
<dt><code><em>STORED AS</em></code></dt>
<dd>File format for table storage, such as TEXTFILE, ORC, PARQUET, etc.</dd>
</dl>
<dl>
<dt><code><em>LOCATION</em></code></dt>
<dd>Path to the directory where table data is stored, which could be a path on distributed storage like HDFS, etc.</dd>
</dl>
<dl>
<dt><code><em>COMMENT</em></code></dt>
<dd>A string literal to describe the table.</dd>
</dl>
<dl>
<dt><code><em>TBLPROPERTIES</em></code></dt>
<dd>
A list of key-value pairs used to tag the table definition, such as <code>created.by.user</code>, <code>owner</code>, etc.
</dd>
</dl>
<dl>
<dt><code><em>AS select_statement</em></code></dt>
<dd>The table is populated using the data from the select statement.</dd>
</dl>
### Examples
{% highlight sql %}
-- Use a comment and load data from another table into the created table
CREATE TABLE StudentInfo
COMMENT 'Table is created using existing data'
AS SELECT * FROM Student;
-- Partitioned table, with the partition column declared in PARTITIONED BY
CREATE TABLE StudentPartitioned (Id INT, name STRING)
PARTITIONED BY (age INT)
TBLPROPERTIES ('owner'='xxxx');
-- Partitioned table, with the partition columns taken from the column list
CREATE TABLE StudentByNameAge (Id INT, name STRING, age INT)
PARTITIONED BY (name, age);
-- Use a row format and a file format
CREATE TABLE StudentText (Id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
{% endhighlight %}
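The remaining Hive-format clauses can be sketched in the same way. This assumes the session has Hive support enabled (for example, a `SparkSession` built with `enableHiveSupport()`). The table names and the `/data/student_ext` path are hypothetical, and `LazySimpleSerDe` with its `field.delim` property merely stands in for any custom SerDe.

{% highlight sql %}
-- Hypothetical external table: dropping it leaves the files under LOCATION in place
CREATE EXTERNAL TABLE IF NOT EXISTS StudentExternal (Id INT, name STRING)
COMMENT 'Student records kept outside the warehouse directory'
PARTITIONED BY (age INT)
STORED AS ORC
LOCATION '/data/student_ext';
-- Hypothetical table with an explicitly named SerDe instead of ROW FORMAT DELIMITED
CREATE TABLE IF NOT EXISTS StudentSerde (Id INT, name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('field.delim' = ',')
STORED AS TEXTFILE;
-- Hypothetical CTAS that writes the query result as ORC files
CREATE TABLE StudentOrc
STORED AS ORC
AS SELECT * FROM Student;
{% endhighlight %}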
### Related Statements
* [CREATE TABLE USING DATASOURCE](sql-ref-syntax-ddl-create-table-datasource.html)
* [CREATE TABLE LIKE](sql-ref-syntax-ddl-create-table-like.html)


@@ -0,0 +1,97 @@
---
layout: global
title: CREATE TABLE LIKE
displayTitle: CREATE TABLE LIKE
license: |
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements. See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License. You may obtain a copy of the License at
  http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
---
### Description
The `CREATE TABLE` statement defines a new table using the definition/metadata of an existing table or view.
### Syntax
{% highlight sql %}
CREATE TABLE [ IF NOT EXISTS ] table_identifier LIKE source_table_identifier
[ USING data_source ]
[ ROW FORMAT row_format ]
[ STORED AS file_format ]
[ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ]
[ LOCATION path ]
{% endhighlight %}
### Parameters
<dl>
<dt><code><em>table_identifier</em></code></dt>
<dd>
Specifies a table name, which may be optionally qualified with a database name.<br><br>
<b>Syntax:</b>
<code>
[ database_name. ] table_name
</code>
</dd>
</dl>
<dl>
<dt><code><em>USING data_source</em></code></dt>
<dd>Data Source is the input format used to create the table. Data source can be CSV, TEXT, JSON, JDBC, PARQUET, ORC, etc.</dd>
</dl>
<dl>
<dt><code><em>ROW FORMAT</em></code></dt>
<dd>Use the SERDE clause to specify a custom SerDe for this table. Otherwise, use the DELIMITED clause to use the native SerDe and specify the delimiter, escape character, null character, and so on.</dd>
</dl>
<dl>
<dt><code><em>STORED AS</em></code></dt>
<dd>File format for table storage, such as TEXTFILE, ORC, PARQUET, etc.</dd>
</dl>
<dl>
<dt><code><em>TBLPROPERTIES</em></code></dt>
<dd>A list of key-value pairs used to tag the table definition, such as <code>created.by.user</code>, <code>owner</code>, etc.
</dd>
</dl>
<dl>
<dt><code><em>LOCATION</em></code></dt>
<dd>Path to the directory where table data is stored, which could be a path on distributed storage like HDFS, etc. Specifying a location creates the new table as an external table.</dd>
</dl>
### Examples
{% highlight sql %}
-- Create a table using an existing table
CREATE TABLE Student_Dupli LIKE Student;
-- Create a table like another table, using a data source
CREATE TABLE Student_Dupli_Csv LIKE Student USING CSV;
-- The table is created as an external table at the specified location
CREATE TABLE Student_Dupli_Ext LIKE Student LOCATION '/root1/home';
-- Create a table like another table, using a row format
CREATE TABLE Student_Dupli_Text LIKE Student
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES ('owner'='xxxx');
{% endhighlight %}
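A few more cases can be sketched as follows; the table and view names are hypothetical. Because only the definition of the source is copied, a freshly created table starts out empty, and, as the description above notes, the source may also be a view.

{% highlight sql %}
-- Skip creation if a table with this name already exists, and tag the new table
CREATE TABLE IF NOT EXISTS StudentBackup LIKE Student
TBLPROPERTIES ('owner'='xxxx');
-- Only Student's definition was copied, so a freshly created StudentBackup has no rows
SELECT COUNT(*) FROM StudentBackup;
-- Hypothetical view used as the source: the new table takes the view's schema
CREATE VIEW IF NOT EXISTS StudentView AS SELECT Id, name FROM Student;
CREATE TABLE IF NOT EXISTS StudentFromView LIKE StudentView;
{% endhighlight %}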
### Related Statements
* [CREATE TABLE USING DATASOURCE](sql-ref-syntax-ddl-create-table-datasource.html)
* [CREATE TABLE USING HIVE FORMAT](sql-ref-syntax-ddl-create-table-hiveformat.html)


@@ -19,4 +19,14 @@ license: |
  limitations under the License.
---
**This page is under construction**
### Description
The `CREATE TABLE` statement is used to define a table in an existing database.
The `CREATE TABLE` statements are:
* [CREATE TABLE USING DATASOURCE](sql-ref-syntax-ddl-create-table-datasource.html)
* [CREATE TABLE USING HIVE FORMAT](sql-ref-syntax-ddl-create-table-hiveformat.html)
* [CREATE TABLE LIKE](sql-ref-syntax-ddl-create-table-like.html)
### Related Statements
- [ALTER TABLE](sql-ref-syntax-ddl-alter-table.html)
- [DROP TABLE](sql-ref-syntax-ddl-drop-table.html)