- text: Getting Started
  url: sql-getting-started.html
  subitems:
    - text: "Starting Point: SparkSession"
      url: sql-getting-started.html#starting-point-sparksession
    - text: Creating DataFrames
      url: sql-getting-started.html#creating-dataframes
    - text: Untyped Dataset Operations (DataFrame operations)
      url: sql-getting-started.html#untyped-dataset-operations-aka-dataframe-operations
    - text: Running SQL Queries Programmatically
      url: sql-getting-started.html#running-sql-queries-programmatically
    - text: Global Temporary View
      url: sql-getting-started.html#global-temporary-view
    - text: Creating Datasets
      url: sql-getting-started.html#creating-datasets
    - text: Interoperating with RDDs
      url: sql-getting-started.html#interoperating-with-rdds
    - text: Scalar Functions
      url: sql-getting-started.html#scalar-functions
    - text: Aggregations
      url: sql-getting-started.html#aggregations
- text: Data Sources
  url: sql-data-sources.html
  subitems:
    - text: "Generic Load/Save Functions"
      url: sql-data-sources-load-save-functions.html
    - text: "Generic File Source Options"
      url: sql-data-sources-generic-options.html
    - text: Parquet Files
      url: sql-data-sources-parquet.html
    - text: ORC Files
      url: sql-data-sources-orc.html
    - text: JSON Files
      url: sql-data-sources-json.html
    - text: Hive Tables
      url: sql-data-sources-hive-tables.html
    - text: JDBC To Other Databases
      url: sql-data-sources-jdbc.html
    - text: Avro Files
      url: sql-data-sources-avro.html
    - text: Whole Binary Files
      url: sql-data-sources-binaryFile.html
    - text: Troubleshooting
      url: sql-data-sources-troubleshooting.html
- text: Performance Tuning
  url: sql-performance-tuning.html
  subitems:
    - text: Caching Data In Memory
      url: sql-performance-tuning.html#caching-data-in-memory
    - text: Other Configuration Options
      url: sql-performance-tuning.html#other-configuration-options
    - text: Join Strategy Hints for SQL Queries
      url: sql-performance-tuning.html#join-strategy-hints-for-sql-queries
- text: Distributed SQL Engine
  url: sql-distributed-sql-engine.html
  subitems:
    - text: "Running the Thrift JDBC/ODBC server"
      url: sql-distributed-sql-engine.html#running-the-thrift-jdbcodbc-server
    - text: Running the Spark SQL CLI
      url: sql-distributed-sql-engine.html#running-the-spark-sql-cli
- text: PySpark Usage Guide for Pandas with Apache Arrow
  url: sql-pyspark-pandas-with-arrow.html
  subitems:
    - text: Apache Arrow in Spark
      url: sql-pyspark-pandas-with-arrow.html#apache-arrow-in-spark
    - text: "Enabling for Conversion to/from Pandas"
      url: sql-pyspark-pandas-with-arrow.html#enabling-for-conversion-tofrom-pandas
    - text: "Pandas UDFs (a.k.a. Vectorized UDFs)"
      url: sql-pyspark-pandas-with-arrow.html#pandas-udfs-aka-vectorized-udfs
    - text: "Pandas Function APIs"
      url: sql-pyspark-pandas-with-arrow.html#pandas-function-apis
    - text: Usage Notes
      url: sql-pyspark-pandas-with-arrow.html#usage-notes
- text: Migration Guide
  url: sql-migration-old.html
- text: SQL Reference
  url: sql-ref.html
  subitems:
    - text: Data Types
      url: sql-ref-datatypes.html
    - text: Null Semantics
      url: sql-ref-null-semantics.html
    - text: NaN Semantics
      url: sql-ref-nan-semantics.html
    - text: ANSI Compliance
      url: sql-ref-ansi-compliance.html
      subitems:
        - text: Arithmetic Operations
          url: sql-ref-ansi-compliance.html#arithmetic-operations
        - text: Type Conversion
          url: sql-ref-ansi-compliance.html#type-conversion
        - text: SQL Keywords
          url: sql-ref-ansi-compliance.html#sql-keywords
    - text: SQL Syntax
      url: sql-ref-syntax.html
      subitems:
        - text: Data Definition Statements
          url: sql-ref-syntax-ddl.html
          subitems:
            - text: ALTER DATABASE
              url: sql-ref-syntax-ddl-alter-database.html
            - text: ALTER TABLE
              url: sql-ref-syntax-ddl-alter-table.html
            - text: ALTER VIEW
              url: sql-ref-syntax-ddl-alter-view.html
            - text: CREATE DATABASE
              url: sql-ref-syntax-ddl-create-database.html
            - text: CREATE FUNCTION
              url: sql-ref-syntax-ddl-create-function.html
            - text: CREATE TABLE
              url: sql-ref-syntax-ddl-create-table.html
            - text: CREATE VIEW
              url: sql-ref-syntax-ddl-create-view.html
            - text: DROP DATABASE
              url: sql-ref-syntax-ddl-drop-database.html
            - text: DROP FUNCTION
              url: sql-ref-syntax-ddl-drop-function.html
            - text: DROP TABLE
              url: sql-ref-syntax-ddl-drop-table.html
            - text: DROP VIEW
              url: sql-ref-syntax-ddl-drop-view.html
            - text: TRUNCATE TABLE
              url: sql-ref-syntax-ddl-truncate-table.html
            - text: REPAIR TABLE
              url: sql-ref-syntax-ddl-repair-table.html
            - text: USE DATABASE
              url: sql-ref-syntax-qry-select-usedb.html
        - text: Data Manipulation Statements
          url: sql-ref-syntax-dml.html
          subitems:
            - text: INSERT
              url: sql-ref-syntax-dml-insert.html
            - text: LOAD
              url: sql-ref-syntax-dml-load.html
        - text: Data Retrieval (Queries)
          url: sql-ref-syntax-qry.html
          subitems:
            - text: SELECT
              url: sql-ref-syntax-qry-select.html
              subitems:
                - text: WHERE Clause
                  url: sql-ref-syntax-qry-select-where.html
                - text: GROUP BY Clause
                  url: sql-ref-syntax-qry-select-groupby.html
                - text: HAVING Clause
                  url: sql-ref-syntax-qry-select-having.html
                - text: ORDER BY Clause
                  url: sql-ref-syntax-qry-select-orderby.html
                - text: SORT BY Clause
                  url: sql-ref-syntax-qry-select-sortby.html
                - text: CLUSTER BY Clause
                  url: sql-ref-syntax-qry-select-clusterby.html
                - text: DISTRIBUTE BY Clause
                  url: sql-ref-syntax-qry-select-distribute-by.html
                - text: LIMIT Clause
                  url: sql-ref-syntax-qry-select-limit.html
                - text: Join Hints
                  url: sql-ref-syntax-qry-select-hints.html
            - text: EXPLAIN
              url: sql-ref-syntax-qry-explain.html
        - text: Auxiliary Statements
          url: sql-ref-syntax-aux.html
          subitems:
            - text: ANALYZE
              url: sql-ref-syntax-aux-analyze.html
              subitems:
                - text: ANALYZE TABLE
                  url: sql-ref-syntax-aux-analyze-table.html
            - text: CACHE
              url: sql-ref-syntax-aux-cache.html
              subitems:
                - text: CACHE TABLE
                  url: sql-ref-syntax-aux-cache-cache-table.html
                - text: UNCACHE TABLE
                  url: sql-ref-syntax-aux-cache-uncache-table.html
                - text: CLEAR CACHE
                  url: sql-ref-syntax-aux-cache-clear-cache.html
                - text: REFRESH TABLE
                  url: sql-ref-syntax-aux-refresh-table.html
                - text: REFRESH
                  url: sql-ref-syntax-aux-cache-refresh.html
            - text: DESCRIBE
              url: sql-ref-syntax-aux-describe.html
              subitems:
                - text: DESCRIBE DATABASE
                  url: sql-ref-syntax-aux-describe-database.html
                - text: DESCRIBE TABLE
                  url: sql-ref-syntax-aux-describe-table.html
                - text: DESCRIBE FUNCTION
                  url: sql-ref-syntax-aux-describe-function.html
                - text: DESCRIBE QUERY
                  url: sql-ref-syntax-aux-describe-query.html
            - text: SHOW
              url: sql-ref-syntax-aux-show.html
              subitems:
                - text: SHOW COLUMNS
                  url: sql-ref-syntax-aux-show-columns.html
                - text: SHOW DATABASES
                  url: sql-ref-syntax-aux-show-databases.html
                - text: SHOW FUNCTIONS
                  url: sql-ref-syntax-aux-show-functions.html
                - text: SHOW TABLE
                  url: sql-ref-syntax-aux-show-table.html
                - text: SHOW TABLES
                  url: sql-ref-syntax-aux-show-tables.html
                - text: SHOW TBLPROPERTIES
                  url: sql-ref-syntax-aux-show-tblproperties.html
                - text: SHOW PARTITIONS
                  url: sql-ref-syntax-aux-show-partitions.html
                - text: SHOW CREATE TABLE
                  url: sql-ref-syntax-aux-show-create-table.html
                - text: SHOW VIEWS
                  url: sql-ref-syntax-aux-show-views.html
            - text: CONFIGURATION MANAGEMENT
              url: sql-ref-syntax-aux-conf-mgmt.html
              subitems:
                - text: SET
                  url: sql-ref-syntax-aux-conf-mgmt-set.html
                - text: RESET
                  url: sql-ref-syntax-aux-conf-mgmt-reset.html
            - text: RESOURCE MANAGEMENT
              url: sql-ref-syntax-aux-resource-mgmt.html
              subitems:
                - text: ADD FILE
                  url: sql-ref-syntax-aux-resource-mgmt-add-file.html
                - text: ADD JAR
                  url: sql-ref-syntax-aux-resource-mgmt-add-jar.html
                - text: LIST FILE
                  url: sql-ref-syntax-aux-resource-mgmt-list-file.html
                - text: LIST JAR
                  url: sql-ref-syntax-aux-resource-mgmt-list-jar.html
    - text: Functions
      url: sql-ref-functions.html
      subitems:
        - text: Built-in Functions
          url: sql-ref-functions-builtin.html
          subitems:
            - text: Built-in Aggregate Functions
              url: sql-ref-functions-builtin-aggregate.html
            - text: Built-in Array Functions
              url: sql-ref-functions-builtin-array.html
            - text: Built-in Date Time Functions
              url: sql-ref-functions-builtin-date-time.html
        - text: UDFs (User-Defined Functions)
          url: sql-ref-functions-udf.html
          subitems:
            - text: Scalar UDFs (User-Defined Functions)
              url: sql-ref-functions-udf-scalar.html
            - text: UDAFs (User-Defined Aggregate Functions)
              url: sql-ref-functions-udf-aggregate.html
            - text: Integration with Hive UDFs/UDAFs/UDTFs
              url: sql-ref-functions-udf-hive.html
    - text: Datetime Pattern
      url: sql-ref-datetime-pattern.html