a3d83948b8
### What changes were proposed in this pull request? This PR is to audit the migration guides in Spark 3.0 release: - correct the grammar errors - clean up some items - replace HTML table by markdown table ### Why are the changes needed? N/A ### Does this PR introduce any user-facing change? No ### How was this patch tested? Screenshot: ![screencapture-127-0-0-1-4000-sql-migration-guide-html-2020-04-04-21_36_29](https://user-images.githubusercontent.com/11567269/78467043-9477d800-76bd-11ea-8ab0-3d51ea5e9fa5.png) ![Screen Shot 2020-04-04 at 9 28 13 PM](https://user-images.githubusercontent.com/11567269/78467045-98a3f580-76bd-11ea-9e4b-927bf12e683a.png) ![Screen Shot 2020-04-04 at 9 28 02 PM](https://user-images.githubusercontent.com/11567269/78467046-98a3f580-76bd-11ea-8ea3-9f13cb8d200b.png) ![Screen Shot 2020-04-04 at 9 21 40 PM](https://user-images.githubusercontent.com/11567269/78467047-993c8c00-76bd-11ea-8c29-91afc68eb590.png) Closes #28125 from gatorsmile/updateMigrationGuide3.0. Authored-by: gatorsmile <gatorsmile@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>
34 lines
2 KiB
Markdown
34 lines
2 KiB
Markdown
---
|
|
layout: global
|
|
title: "Migration Guide: Structured Streaming"
|
|
displayTitle: "Migration Guide: Structured Streaming"
|
|
license: |
|
|
Licensed to the Apache Software Foundation (ASF) under one or more
|
|
contributor license agreements. See the NOTICE file distributed with
|
|
this work for additional information regarding copyright ownership.
|
|
The ASF licenses this file to You under the Apache License, Version 2.0
|
|
(the "License"); you may not use this file except in compliance with
|
|
the License. You may obtain a copy of the License at
|
|
|
|
http://www.apache.org/licenses/LICENSE-2.0
|
|
|
|
Unless required by applicable law or agreed to in writing, software
|
|
distributed under the License is distributed on an "AS IS" BASIS,
|
|
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
See the License for the specific language governing permissions and
|
|
limitations under the License.
|
|
---
|
|
|
|
* Table of contents
|
|
{:toc}
|
|
|
|
Note that this migration guide describes the items specific to Structured Streaming.
|
|
Many items of SQL migration can be applied when migrating Structured Streaming to higher versions.
|
|
Please refer [Migration Guide: SQL, Datasets and DataFrame](sql-migration-guide.html).
|
|
|
|
## Upgrading from Structured Streaming 2.4 to 3.0
|
|
|
|
- In Spark 3.0, Structured Streaming forces the source schema into nullable when file-based datasources such as text, json, csv, parquet and orc are used via `spark.readStream(...)`. Previously, it respected the nullability in source schema; however, it caused issues tricky to debug with NPE. To restore the previous behavior, set `spark.sql.streaming.fileSource.schema.forceNullable` to `false`.
|
|
|
|
- Spark 3.0 fixes the correctness issue on Stream-stream outer join, which changes the schema of state. (See [SPARK-26154](https://issues.apache.org/jira/browse/SPARK-26154) for more details). If you start your query from checkpoint constructed from Spark 2.x which uses stream-stream outer join, Spark 3.0 fails the query. To recalculate outputs, discard the checkpoint and replay previous inputs.
|