7d4eb38bbc
### What changes were proposed in this pull request?

Currently, there is no migration section for PySpark, Spark Core and Structured Streaming. It is difficult for users to know what to do when they upgrade.

This PR proposes to create a "Migration Guide" tab in the Spark documentation.

![Screen Shot 2019-09-11 at 7 02 05 PM](https://user-images.githubusercontent.com/6477701/64688126-ad712f80-d4c6-11e9-8672-9a2c56c05bf8.png)

![Screen Shot 2019-09-11 at 7 27 15 PM](https://user-images.githubusercontent.com/6477701/64689915-389ff480-d4ca-11e9-8c54-7f46095d0d23.png)

This page will contain migration guides for Spark SQL, PySpark, SparkR, MLlib, Structured Streaming and Core. Basically it is a refactoring.

Some new information is added; I will leave inline comments on it for easier review.

1. **MLlib**

   Merge [ml-guide.html#migration-guide](https://spark.apache.org/docs/latest/ml-guide.html#migration-guide) and [ml-migration-guides.html](https://spark.apache.org/docs/latest/ml-migration-guides.html)

   ```
   'docs/ml-guide.md'
           ↓ Merge new/old migration guides
   'docs/ml-migration-guide.md'
   ```

2. **PySpark**

   Extract PySpark specific items from https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html

   ```
   'docs/sql-migration-guide-upgrade.md'
           ↓ Extract PySpark specific items
   'docs/pyspark-migration-guide.md'
   ```

3. **SparkR**

   Move [sparkr.html#migration-guide](https://spark.apache.org/docs/latest/sparkr.html#migration-guide) into a separate file, and extract SparkR specific items from [sql-migration-guide-upgrade.html](https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html)

   ```
   'docs/sparkr.md'                      'docs/sql-migration-guide-upgrade.md'
    Move migration guide section ↘     ↙ Extract SparkR specific items
                        docs/sparkr-migration-guide.md
   ```

4. **Core**

   Newly created at `'docs/core-migration-guide.md'`. I skimmed resolved JIRAs at 3.0.0 and found some items to note.

5. **Structured Streaming**

   Newly created at `'docs/ss-migration-guide.md'`. I skimmed resolved JIRAs at 3.0.0 and found some items to note.

6. **SQL**

   Merged [sql-migration-guide-upgrade.html](https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html) and [sql-migration-guide-hive-compatibility.html](https://spark.apache.org/docs/latest/sql-migration-guide-hive-compatibility.html)

   ```
   'docs/sql-migration-guide-hive-compatibility.md'     'docs/sql-migration-guide-upgrade.md'
    Move Hive compatibility section ↘                   ↙ Left over after filtering PySpark and SparkR items
                                 'docs/sql-migration-guide.md'
   ```

### Why are the changes needed?

So that users in production can effectively migrate to higher versions, and detect behaviour or breaking changes before upgrading and/or migrating.

### Does this PR introduce any user-facing change?

Yes, this changes Spark's documentation at https://spark.apache.org/docs/latest/index.html.

### How was this patch tested?

Manually built the doc. This can be verified as below:

```bash
cd docs
SKIP_API=1 jekyll build
open _site/index.html
```

Closes #25757 from HyukjinKwon/migration-doc.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
232 lines
9.6 KiB
YAML
- text: Getting Started
  url: sql-getting-started.html
  subitems:
    - text: "Starting Point: SparkSession"
      url: sql-getting-started.html#starting-point-sparksession
    - text: Creating DataFrames
      url: sql-getting-started.html#creating-dataframes
    - text: Untyped Dataset Operations (DataFrame operations)
      url: sql-getting-started.html#untyped-dataset-operations-aka-dataframe-operations
    - text: Running SQL Queries Programmatically
      url: sql-getting-started.html#running-sql-queries-programmatically
    - text: Global Temporary View
      url: sql-getting-started.html#global-temporary-view
    - text: Creating Datasets
      url: sql-getting-started.html#creating-datasets
    - text: Interoperating with RDDs
      url: sql-getting-started.html#interoperating-with-rdds
    - text: Aggregations
      url: sql-getting-started.html#aggregations
- text: Data Sources
  url: sql-data-sources.html
  subitems:
    - text: "Generic Load/Save Functions"
      url: sql-data-sources-load-save-functions.html
    - text: Parquet Files
      url: sql-data-sources-parquet.html
    - text: ORC Files
      url: sql-data-sources-orc.html
    - text: JSON Files
      url: sql-data-sources-json.html
    - text: Hive Tables
      url: sql-data-sources-hive-tables.html
    - text: JDBC To Other Databases
      url: sql-data-sources-jdbc.html
    - text: Avro Files
      url: sql-data-sources-avro.html
    - text: Troubleshooting
      url: sql-data-sources-troubleshooting.html
- text: Performance Tuning
  url: sql-performance-tuning.html
  subitems:
    - text: Caching Data In Memory
      url: sql-performance-tuning.html#caching-data-in-memory
    - text: Other Configuration Options
      url: sql-performance-tuning.html#other-configuration-options
    - text: Broadcast Hint for SQL Queries
      url: sql-performance-tuning.html#broadcast-hint-for-sql-queries
- text: Distributed SQL Engine
  url: sql-distributed-sql-engine.html
  subitems:
    - text: "Running the Thrift JDBC/ODBC server"
      url: sql-distributed-sql-engine.html#running-the-thrift-jdbcodbc-server
    - text: Running the Spark SQL CLI
      url: sql-distributed-sql-engine.html#running-the-spark-sql-cli
- text: PySpark Usage Guide for Pandas with Apache Arrow
  url: sql-pyspark-pandas-with-arrow.html
  subitems:
    - text: Apache Arrow in Spark
      url: sql-pyspark-pandas-with-arrow.html#apache-arrow-in-spark
    - text: "Enabling for Conversion to/from Pandas"
      url: sql-pyspark-pandas-with-arrow.html#enabling-for-conversion-tofrom-pandas
    - text: "Pandas UDFs (a.k.a. Vectorized UDFs)"
      url: sql-pyspark-pandas-with-arrow.html#pandas-udfs-aka-vectorized-udfs
    - text: Usage Notes
      url: sql-pyspark-pandas-with-arrow.html#usage-notes
- text: Migration Guide
  url: sql-migration-old.html
- text: SQL Reference
  url: sql-ref.html
  subitems:
    - text: Data Types
      url: sql-ref-datatypes.html
    - text: Null Semantics
      url: sql-ref-null-semantics.html
    - text: NaN Semantics
      url: sql-ref-nan-semantics.html
    - text: SQL Syntax
      url: sql-ref-syntax.html
      subitems:
        - text: Data Definition Statements
          url: sql-ref-syntax-ddl.html
          subitems:
            - text: ALTER DATABASE
              url: sql-ref-syntax-ddl-alter-database.html
            - text: ALTER TABLE
              url: sql-ref-syntax-ddl-alter-table.html
            - text: ALTER VIEW
              url: sql-ref-syntax-ddl-alter-view.html
            - text: CREATE DATABASE
              url: sql-ref-syntax-ddl-create-database.html
            - text: CREATE FUNCTION
              url: sql-ref-syntax-ddl-create-function.html
            - text: CREATE TABLE
              url: sql-ref-syntax-ddl-create-table.html
            - text: CREATE VIEW
              url: sql-ref-syntax-ddl-create-view.html
            - text: DROP DATABASE
              url: sql-ref-syntax-ddl-drop-database.html
            - text: DROP FUNCTION
              url: sql-ref-syntax-ddl-drop-function.html
            - text: DROP TABLE
              url: sql-ref-syntax-ddl-drop-table.html
            - text: DROP VIEW
              url: sql-ref-syntax-ddl-drop-view.html
            - text: TRUNCATE TABLE
              url: sql-ref-syntax-ddl-truncate-table.html
            - text: REPAIR TABLE
              url: sql-ref-syntax-ddl-repair-table.html
        - text: Data Manipulation Statements
          url: sql-ref-syntax-dml.html
          subitems:
            - text: INSERT
              url: sql-ref-syntax-dml-insert.html
            - text: LOAD
              url: sql-ref-syntax-dml-load.html
        - text: Data Retrieval(Queries)
          url: sql-ref-syntax-qry.html
          subitems:
            - text: SELECT
              url: sql-ref-syntax-qry-select.html
              subitems:
                - text: DISTINCT Clause
                  url: sql-ref-syntax-qry-select-distinct.html
                - text: Joins
                  url: sql-ref-syntax-qry-select-join.html
                - text: ORDER BY Clause
                  url: sql-ref-syntax-qry-select-orderby.html
                - text: GROUP BY Clause
                  url: sql-ref-syntax-qry-select-groupby.html
                - text: HAVING Clause
                  url: sql-ref-syntax-qry-select-having.html
                - text: LIMIT Clause
                  url: sql-ref-syntax-qry-select-limit.html
                - text: Set operations
                  url: sql-ref-syntax-qry-select-setops.html
                - text: Common Table Expression(CTE)
                  url: sql-ref-syntax-qry-select-cte.html
                - text: Subqueries
                  url: sql-ref-syntax-qry-select-subqueries.html
                - text: Query hints
                  url: sql-ref-syntax-qry-select-hints.html
            - text: SAMPLING
              url: sql-ref-syntax-qry-sampling.html
            - text: WINDOWING ANALYTIC FUNCTIONS
              url: sql-ref-syntax-qry-window.html
            - text: AGGREGATION (CUBE/ROLLUP/GROUPING)
              url: sql-ref-syntax-qry-aggregation.html
            - text: EXPLAIN
              url: sql-ref-syntax-qry-explain.html
        - text: Auxiliary Statements
          url: sql-ref-syntax-aux.html
          subitems:
            - text: Analyze statement
              url: sql-ref-syntax-aux-analyze.html
              subitems:
                - text: ANALYZE TABLE
                  url: sql-ref-syntax-aux-analyze-table.html
            - text: Caching statements
              url: sql-ref-syntax-aux-cache.html
              subitems:
                - text: CACHE TABLE
                  url: sql-ref-syntax-aux-cache-cache-table.html
                - text: UNCACHE TABLE
                  url: sql-ref-syntax-aux-cache-uncache-table.html
                - text: CLEAR CACHE
                  url: sql-ref-syntax-aux-cache-clear-cache.html
                - text: REFRESH TABLE
                  url: sql-ref-syntax-aux-refresh-table.html
            - text: Describe Commands
              url: sql-ref-syntax-aux-describe.html
              subitems:
                - text: DESCRIBE DATABASE
                  url: sql-ref-syntax-aux-describe-database.html
                - text: DESCRIBE TABLE
                  url: sql-ref-syntax-aux-describe-table.html
                - text: DESCRIBE FUNCTION
                  url: sql-ref-syntax-aux-describe-function.html
                - text: DESCRIBE QUERY
                  url: sql-ref-syntax-aux-describe-query.html
            - text: Show commands
              url: sql-ref-syntax-aux-show.html
              subitems:
                - text: SHOW COLUMNS
                  url: sql-ref-syntax-aux-show-columns.html
                - text: SHOW DATABASES
                  url: sql-ref-syntax-aux-show-databases.html
                - text: SHOW FUNCTIONS
                  url: sql-ref-syntax-aux-show-functions.html
                - text: SHOW TABLE
                  url: sql-ref-syntax-aux-show-table.html
                - text: SHOW TABLES
                  url: sql-ref-syntax-aux-show-tables.html
                - text: SHOW TBLPROPERTIES
                  url: sql-ref-syntax-aux-show-tblproperties.html
                - text: SHOW PARTITIONS
                  url: sql-ref-syntax-aux-show-partitions.html
                - text: SHOW CREATE TABLE
                  url: sql-ref-syntax-aux-show-create-table.html
            - text: Configuration Management Commands
              url: sql-ref-syntax-aux-conf-mgmt.html
              subitems:
                - text: SET
                  url: sql-ref-syntax-aux-conf-mgmt-set.html
                - text: RESET
                  url: sql-ref-syntax-aux-conf-mgmt-reset.html
            - text: Resource Management Commands
              url: sql-ref-syntax-aux-resource-mgmt.html
              subitems:
                - text: ADD FILE
                  url: sql-ref-syntax-aux-resource-mgmt-add-file.html
                - text: ADD JAR
                  url: sql-ref-syntax-aux-resource-mgmt-add-jar.html
    - text: Functions
      url: sql-ref-functions.html
      subitems:
        - text: Builtin Functions
          url: sql-ref-functions-builtin.html
          subitems:
            - text: Scalar functions
              url: sql-ref-functions-builtin-scalar.html
            - text: Aggregate functions
              url: sql-ref-functions-builtin-aggregate.html
        - text: User defined Functions
          url: sql-ref-functions-udf.html
          subitems:
            - text: Scalar functions
              url: sql-ref-functions-udf-scalar.html
            - text: Aggregate functions
              url: sql-ref-functions-udf-aggregate.html
    - text: Arithmetic operations
      url: sql-ref-arithmetic-ops.html