### What changes were proposed in this pull request?
* improve docs in `docs/job-scheduling.md`
* add migration guide docs in `docs/core-migration-guide.md`
### Why are the changes needed?
Help users migrate.
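As a hedged illustration of the documented feature (assuming the existing `spark.scheduler.allocation.file` fair-scheduler key is what gained remote support; the HDFS URI is a made-up example):
```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Minimal sketch: the fair-scheduler allocation file may now live on a
// Hadoop-compatible file system rather than only the local disk.
val conf = new SparkConf()
  .set("spark.scheduler.mode", "FAIR")
  .set("spark.scheduler.allocation.file", "hdfs://nameservice/spark/fairscheduler.xml")

val spark = SparkSession.builder.config(conf).appName("pool-file-demo").getOrCreate()
```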
### Does this PR introduce _any_ user-facing change?
Yes.
### How was this patch tested?
Pass CI
Closes #33794 from ulysses-you/SPARK-35083-f.
Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Kent Yao <yao@apache.org>
(cherry picked from commit 90cbf9ca3e)
Signed-off-by: Kent Yao <yao@apache.org>
…mote scheduler pool files support"
This reverts commit e3902d1975. The feature is an improvement rather than a behavior change.
Closes #33789 from gengliangwang/revertDoc.
Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit b36b1c7e8a)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
### What changes were proposed in this pull request?
Add remote scheduler pool files support to the migration guide.
### Why are the changes needed?
To highlight this useful feature.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass existing tests.
Closes #33785 from Ngone51/SPARK-35083-follow-up.
Lead-authored-by: yi.wu <yi.wu@databricks.com>
Co-authored-by: wuyi <yi.wu@databricks.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit e3902d1975)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
### What changes were proposed in this pull request?
This PR aims to support a new configuration, `spark.kubernetes.driver.service.deleteOnTermination`, to clean up `Driver Service` resource during app termination.
### Why are the changes needed?
The K8s `Service` is an important resource, and it is sometimes controlled by a quota:
```
$ k describe quota
Name: service
Namespace: default
Resource Used Hard
-------- ---- ----
services 1 3
```
Apache Spark creates a service for the driver whose lifecycle matches that of the driver pod.
This means a new Spark job submission fails once the leftover services of completed Spark jobs have used up the service quota.
**BEFORE**
```
$ k get pod
NAME READY STATUS RESTARTS AGE
org-apache-spark-examples-sparkpi-a32c9278e7061b4d-driver 0/1 Completed 0 31m
org-apache-spark-examples-sparkpi-a9f1f578e721ef62-driver 0/1 Completed 0 78s
$ k get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 80m
org-apache-spark-examples-sparkpi-a32c9278e7061b4d-driver-svc ClusterIP None <none> 7078/TCP,7079/TCP,4040/TCP 31m
org-apache-spark-examples-sparkpi-a9f1f578e721ef62-driver-svc ClusterIP None <none> 7078/TCP,7079/TCP,4040/TCP 80s
$ k describe quota
Name: service
Namespace: default
Resource Used Hard
-------- ---- ----
services 3 3
$ bin/spark-submit...
Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException:
Failure executing: POST at: https://192.168.64.50:8443/api/v1/namespaces/default/services.
Message: Forbidden! User minikube doesn't have permission.
services "org-apache-spark-examples-sparkpi-843f6978e722819c-driver-svc" is forbidden:
exceeded quota: service, requested: services=1, used: services=3, limited: services=3.
```
**AFTER**
```
$ k get pod
NAME READY STATUS RESTARTS AGE
org-apache-spark-examples-sparkpi-23d5f278e77731a7-driver 0/1 Completed 0 26s
org-apache-spark-examples-sparkpi-d1292278e7768ed4-driver 0/1 Completed 0 67s
org-apache-spark-examples-sparkpi-e5bedf78e776ea9d-driver 0/1 Completed 0 44s
$ k get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 172m
$ k describe quota
Name: service
Namespace: default
Resource Used Hard
-------- ---- ----
services 1 3
```
### Does this PR introduce _any_ user-facing change?
Yes, this PR adds a new configuration, `spark.kubernetes.driver.service.deleteOnTermination`, and enables it by default.
The change is documented in the migration guide.
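For users who relied on the surviving service, a minimal sketch of opting out (assuming the flag is set alongside the usual K8s submission configs):
```scala
import org.apache.spark.SparkConf

// Minimal sketch: the new flag defaults to true, so the driver service is
// deleted when the application terminates. Setting it to false restores the
// pre-3.2 behavior of keeping the service around.
val conf = new SparkConf()
  .set("spark.kubernetes.driver.service.deleteOnTermination", "false")
```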
### How was this patch tested?
Pass the CIs.
Also tested manually with the K8s IT:
```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j.properties
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- PVs with local storage
- Launcher client dependencies
- SPARK-33615: Launcher client archives
- SPARK-33748: Launcher python client respecting PYSPARK_PYTHON
- SPARK-33748: Launcher python client respecting spark.pyspark.python and spark.pyspark.driver.python
- Launcher python client dependencies using a zip file
- Test basic decommissioning
- Test basic decommissioning with shuffle cleanup
- Test decommissioning with dynamic allocation & shuffle cleanups
- Test decommissioning timeouts
- Run SparkR on simple dataframe.R example
Run completed in 19 minutes, 9 seconds.
Total number of tests run: 27
Suites: completed 2, aborted 0
Tests: succeeded 27, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```
Closes #32226 from dongjoon-hyun/SPARK-35131.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Deprecate Apache Mesos support for Spark 3.2.0 by adding documentation to this effect.
### Why are the changes needed?
Apache Mesos is ceasing development (https://lists.apache.org/thread.html/rab2a820507f7c846e54a847398ab20f47698ec5bce0c8e182bfe51ba%40%3Cdev.mesos.apache.org%3E); at some point we'll want to drop support, so deprecate it now.
This doesn't mean it'll go away in 3.3.0.
### Does this PR introduce _any_ user-facing change?
No, docs only.
### How was this patch tested?
N/A
Closes #32150 from srowen/SPARK-35050.
Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Deprecating `spark.launcher.childConectionTimeout` in favor of `spark.launcher.childConnectionTimeout`
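A minimal sketch of moving to the preferred spelling via `SparkLauncher` (the jar path is a hypothetical placeholder):
```scala
import org.apache.spark.launcher.SparkLauncher

// Minimal sketch: use the correctly spelled key going forward; the
// misspelled spark.launcher.childConectionTimeout still works but is deprecated.
val launcher = new SparkLauncher()
  .setAppResource("/path/to/app.jar") // hypothetical application jar
  .setConf("spark.launcher.childConnectionTimeout", "30000")
```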
### Why are the changes needed?
srowen suggested it in https://github.com/apache/spark/pull/30323#discussion_r521449342
### How was this patch tested?
No testing. Not even compiled.
Closes #30679 from jsoref/spelling-connection.
Authored-by: Josh Soref <jsoref@users.noreply.github.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
This PR aims to enable `spark.hadoopRDD.ignoreEmptySplits` by default for Apache Spark 3.2.0.
### Why are the changes needed?
Although this is a safe improvement, it hasn't been enabled by default in order to avoid an explicit behavior change. This PR switches the default in Apache Spark 3.2.0.
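A minimal sketch of restoring the old behavior, should a job rely on partitions being created for empty splits:
```scala
import org.apache.spark.SparkConf

// Minimal sketch: re-enable partitions for empty Hadoop input splits,
// i.e. the pre-3.2 default.
val conf = new SparkConf()
  .set("spark.hadoopRDD.ignoreEmptySplits", "false")
```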
### Does this PR introduce _any_ user-facing change?
Yes, the behavior change is documented.
### How was this patch tested?
Pass the existing CIs.
Closes #31909 from dongjoon-hyun/SPARK-34809.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Apache Spark 3.0 introduced `spark.eventLog.compression.codec` configuration.
For Apache Spark 3.2, this PR aims to set `zstd` as the default value for `spark.eventLog.compression.codec` configuration.
This only affects creating a new log file.
### Why are the changes needed?
The main purpose of event logs is archiving. Many logs are generated and occupy storage, but most are never accessed by users.
**1. Save storage resources (and money)**
In general, ZSTD is much smaller than LZ4.
For example, in case of TPCDS (Scale 200) log, ZSTD generates about 3 times smaller log files than LZ4.
| CODEC | SIZE (bytes) |
|---------|-------------|
| LZ4 | 184001434|
| ZSTD | 64522396|
The uncompressed file is 17.6 times bigger:
```
-rw-r--r-- 1 dongjoon staff 1135464691 Feb 21 22:31 spark-a1843ead29834f46b1125a03eca32679
-rw-r--r-- 1 dongjoon staff 64522396 Feb 21 22:31 spark-a1843ead29834f46b1125a03eca32679.zstd
```
**2. Better Usability**
Spark-generated LZ4 event log files cannot be decompressed with the standard `lz4` CLI (Spark writes the LZ4 block format rather than the framed format the CLI expects), while ZSTD event log files can be. This makes Spark's LZ4 event logs inconvenient for users who want to decompress and inspect them.
```
$ lz4 -d spark-d3deba027bd34435ba849e14fc2c42ef.lz4
Decoding file spark-d3deba027bd34435ba849e14fc2c42ef
Error 44 : Unrecognized header : file cannot be decoded
```
```
$ zstd -d spark-a1843ead29834f46b1125a03eca32679.zstd
spark-a1843ead29834f46b1125a03eca32679.zstd: 1135464691 bytes
```
**3. Speed**
The following results are collected by running [lzbench](https://github.com/inikep/lzbench) on the above Spark event log. Note that
- This is not a direct comparison of Spark compression/decompression codec.
- `lzbench` is an in-memory benchmark, so it doesn't show the benefit of reduced network traffic from ZSTD's smaller output.
Here,
- To get ZSTD 1.4.8-1 result, `lzbench` `master` branch is used because Spark is using ZSTD 1.4.8.
- To get LZ4 1.7.5 result, `lzbench` `v1.7` branch is used because Spark is using LZ4 1.7.1.
```
Compressor name Compress. Decompress. Compr. size Ratio Filename
memcpy 7393 MB/s 7166 MB/s 1135464691 100.00 spark-a1843ead29834f46b1125a03eca32679
zstd 1.4.8 -1 1344 MB/s 3351 MB/s 56665767 4.99 spark-a1843ead29834f46b1125a03eca32679
lz4 1.7.5 1385 MB/s 4782 MB/s 127662168 11.24 spark-a1843ead29834f46b1125a03eca32679
```
### Does this PR introduce _any_ user-facing change?
- No for apps that don't use `spark.eventLog.compress`, since it is disabled by default.
- No for apps that set `spark.eventLog.compression.codec` explicitly, since this PR only changes the default value.
- Yes for apps that use `spark.eventLog.compress` without setting `spark.eventLog.compression.codec`; previously the `spark.io.compression.codec` value was used, whose default is `lz4`.
So this JIRA issue, SPARK-34503, is labeled with `releasenotes`.
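For apps that need the old codec, a minimal sketch of pinning LZ4 explicitly:
```scala
import org.apache.spark.SparkConf

// Minimal sketch: keep LZ4-compressed event logs if downstream tooling
// still expects them; otherwise the new zstd default applies.
val conf = new SparkConf()
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.compress", "true")
  .set("spark.eventLog.compression.codec", "lz4")
```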
### How was this patch tested?
Pass the updated UT.
Closes #31618 from dongjoon-hyun/SPARK-34503.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR is a followup of https://github.com/apache/spark/pull/28788 and proposes to update the migration guide.
### Why are the changes needed?
To tell users about the behaviour change.
### Does this PR introduce _any_ user-facing change?
Yes, it updates migration guides for users.
### How was this patch tested?
GitHub Actions' documentation build should test it.
Closes #30903 from HyukjinKwon/SPARK-31960-followup.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to enable `spark.storage.replication.proactive` by default for Apache Spark 3.2.0.
### Why are the changes needed?
`spark.storage.replication.proactive` was added by SPARK-15355 in Apache Spark 2.2.0 and has been helpful when block manager loss occurs frequently, as in K8s environments.
### Does this PR introduce _any_ user-facing change?
Yes, this will make the Spark jobs more robust.
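A minimal sketch of opting out, for users who prefer the previous default:
```scala
import org.apache.spark.SparkConf

// Minimal sketch: disable proactive replication of lost cached block
// replicas, i.e. the pre-3.2 default.
val conf = new SparkConf()
  .set("spark.storage.replication.proactive", "false")
```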
### How was this patch tested?
Pass the existing UTs.
Closes #30876 from dongjoon-hyun/SPARK-33870.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This is a follow-up of #29278.
This PR renames the config that allows/disallows creating a `SparkContext` in executors, as per the comment https://github.com/apache/spark/pull/29278#pullrequestreview-460256338.
### Why are the changes needed?
The config name `spark.executor.allowSparkContext` is more reasonable.
### Does this PR introduce _any_ user-facing change?
Yes, the config name is changed.
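A minimal sketch of adopting the new name (replacing the `spark.driver.allowSparkContextInExecutors` key from #29278):
```scala
import org.apache.spark.SparkConf

// Minimal sketch: true permits creating a SparkContext inside executors,
// which is disallowed by default.
val conf = new SparkConf()
  .set("spark.executor.allowSparkContext", "true")
```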
### How was this patch tested?
Updated tests.
Closes #29340 from ueshin/issues/SPARK-32160/change_config_name.
Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This is a follow-up of #28986.
This PR adds a config to allow or disallow creating a `SparkContext` in executors:
- `spark.driver.allowSparkContextInExecutors`
### Why are the changes needed?
Some users or libraries actually create `SparkContext` in executors.
We shouldn't break their workloads.
### Does this PR introduce _any_ user-facing change?
Yes, users will be able to create `SparkContext` in executors with the config enabled.
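A minimal sketch of the kind of legacy setup this protects, using the flag name as introduced in this PR (later renamed, as noted above):
```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: with the flag at its default (false), constructing a
// SparkContext inside an executor fails fast; enabling it keeps legacy
// workloads that do so running.
val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("legacy-demo")
  .set("spark.driver.allowSparkContextInExecutors", "true")
val sc = new SparkContext(conf)
```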
### How was this patch tested?
More tests are added.
Closes #29278 from ueshin/issues/SPARK-32160/add_configs.
Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Add migration guide for removed accumulator v1 APIs.
### Why are the changes needed?
Provide better guidance for users' migration.
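For instance, a minimal sketch of the migration path (assuming the guide points users from the removed `SparkContext.accumulator` v1 API to the `AccumulatorV2`-based helpers):
```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch of the migration:
//   val acc = sc.accumulator(0)   // v1 API, removed
// becomes the built-in AccumulatorV2 helper:
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("acc-demo"))
val acc = sc.longAccumulator("counter")
sc.parallelize(1 to 100).foreach(_ => acc.add(1))
println(acc.value) // 100
```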
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Pass Jenkins.
Closes #28309 from Ngone51/SPARK-16775-migration-guide.
Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
This PR is a follow-up of https://github.com/apache/spark/pull/28003:
- add a migration guide
- add an end-to-end test case.
### Why are the changes needed?
The original PR made a major behavior change to the user-facing RESET command.
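A minimal sketch of exercising the command (assuming the documented behavior: after this change, `RESET` restores configs to their startup values rather than simply clearing everything):
```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch, not the end-to-end test added by this PR: exercise RESET
// and observe that the session config returns to its startup value.
val spark = SparkSession.builder.master("local[2]").appName("reset-demo").getOrCreate()
spark.sql("SET spark.sql.shuffle.partitions=10")
println(spark.conf.get("spark.sql.shuffle.partitions")) // 10
spark.sql("RESET")
println(spark.conf.get("spark.sql.shuffle.partitions")) // back to the startup value
```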
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Added a new end-to-end test
Closes #28265 from gatorsmile/spark-31234followup.
Authored-by: gatorsmile <gatorsmile@gmail.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
### What changes were proposed in this pull request?
Update the documentation and shell scripts to warn users about the deprecation of support for multiple workers on the same host.
### Why are the changes needed?
This is a sub-task of [SPARK-30978](https://issues.apache.org/jira/browse/SPARK-30978), which plans to remove support for multiple workers entirely in Spark 3.1. This PR takes the first step by deprecating it in Spark 3.0.
### Does this PR introduce any user-facing change?
Yes, users see a warning when they run the start-worker script.
### How was this patch tested?
Tested manually.
Closes #27768 from Ngone51/deprecate_spark_worker_instances.
Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>
### What changes were proposed in this pull request?
When `spark.shuffle.useOldFetchProtocol` is enabled, switch off direct disk reading of host-local shuffle blocks and fall back to remote block fetching, thereby avoiding the `GetLocalDirsForExecutors` block transfer message introduced in Spark 3.0.0.
### Why are the changes needed?
In `[SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host`, a new block transfer message, `GetLocalDirsForExecutors`, was introduced. This message can be sent to the external shuffle service, and since it is not supported by previous versions of the external shuffle service, it must be avoided when `spark.shuffle.useOldFetchProtocol` is true.
In the migration guide I changed the exception type, as `org.apache.spark.network.shuffle.protocol.BlockTransferMessage.Decoder#fromByteBuffer` throws an `IllegalArgumentException` with the given text and uses the message type, which is just a simple number (a byte). I have checked that this is true for version 2.4.4 as well.
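A minimal sketch of the configuration this change keys on:
```scala
import org.apache.spark.SparkConf

// Minimal sketch: when running against a pre-3.0 external shuffle service,
// keep the old fetch protocol; with this change that also disables
// host-local disk reads, which would otherwise send the unsupported
// GetLocalDirsForExecutors message.
val conf = new SparkConf()
  .set("spark.shuffle.useOldFetchProtocol", "true")
```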
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
This specific case (the extra boolean that switches off the host-local disk reading feature) is not tested, but existing tests were run.
Closes #26869 from attilapiros/SPARK-30235.
Authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
SPARK-29397 added new interfaces for creating driver and executor
plugins. These were added in a new, more isolated package that does
not pollute the main o.a.s package.
The old interface is now redundant. Since it's a DeveloperApi and
we're about to have a new major release, let's remove it instead of
carrying more baggage forward.
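A hedged sketch of the replacement API in the `org.apache.spark.api.plugin` package (`com.example.MyPlugin` is a hypothetical class name):
```scala
import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, SparkPlugin}

// Minimal sketch: both interfaces ship default no-op methods, so an empty
// implementation compiles; real plugins override init/shutdown hooks.
class MyPlugin extends SparkPlugin {
  override def driverPlugin(): DriverPlugin = new DriverPlugin {}       // driver-side hooks
  override def executorPlugin(): ExecutorPlugin = new ExecutorPlugin {} // executor-side hooks
}
// Registered via: spark.plugins=com.example.MyPlugin
```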
Closes #26390 from vanzin/SPARK-29399.
Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Credit to vanzin as he found and commented on this while reviewing #25670 - [comment](https://github.com/apache/spark/pull/25670#discussion_r325383512).
This patch proposes to specify UTF-8 explicitly while reading/writing event log files.
### Why are the changes needed?
The event log file is read/written using the default character set of the JVM process, which opens the chance of problems when reading event log files produced on other machines. Spark's de facto standard character set is UTF-8, so it should be set explicitly.
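A minimal sketch of the reading side under that convention (the event log path is hypothetical):
```scala
import java.nio.charset.StandardCharsets
import scala.io.Source

// Minimal sketch: read an event log with an explicit charset instead of the
// JVM default, mirroring what this patch does on Spark's read/write paths.
val source = Source.fromFile("/tmp/spark-events/app-123", StandardCharsets.UTF_8.name())
try source.getLines().take(3).foreach(println)
finally source.close()
```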
### Does this PR introduce any user-facing change?
Yes, for end users who have been running Spark processes with a default charset other than UTF-8, especially their driver JVM processes. No otherwise.
### How was this patch tested?
Existing UTs, as ReplayListenerSuite contains "end-to-end" event logging/reading tests (both uncompressed/compressed).
Closes #25845 from HeartSaVioR/SPARK-29160.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Currently, there is no migration section for PySpark, SparkCore and Structured Streaming.
It is difficult for users to know what to do when they upgrade.
This PR proposes to create a "Migration Guide" tab in the Spark documentation.
![Screen Shot 2019-09-11 at 7 02 05 PM](https://user-images.githubusercontent.com/6477701/64688126-ad712f80-d4c6-11e9-8672-9a2c56c05bf8.png)
![Screen Shot 2019-09-11 at 7 27 15 PM](https://user-images.githubusercontent.com/6477701/64689915-389ff480-d4ca-11e9-8c54-7f46095d0d23.png)
This page will contain migration guides for Spark SQL, PySpark, SparkR, MLlib, Structured Streaming and Core. Basically it is a refactoring.
Some new information has been added; I will leave inline comments for easier review.
1. **MLlib**
Merge [ml-guide.html#migration-guide](https://spark.apache.org/docs/latest/ml-guide.html#migration-guide) and [ml-migration-guides.html](https://spark.apache.org/docs/latest/ml-migration-guides.html)
```
'docs/ml-guide.md'
↓ Merge new/old migration guides
'docs/ml-migration-guide.md'
```
2. **PySpark**
Extract PySpark specific items from https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html
```
'docs/sql-migration-guide-upgrade.md'
↓ Extract PySpark specific items
'docs/pyspark-migration-guide.md'
```
3. **SparkR**
Move [sparkr.html#migration-guide](https://spark.apache.org/docs/latest/sparkr.html#migration-guide) into a separate file, and extract from [sql-migration-guide-upgrade.html](https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html)
```
'docs/sparkr.md' 'docs/sql-migration-guide-upgrade.md'
Move migration guide section ↘ ↙ Extract SparkR specific items
docs/sparkr-migration-guide.md
```
4. **Core**
Newly created at `'docs/core-migration-guide.md'`. I skimmed resolved JIRAs at 3.0.0 and found some items to note.
5. **Structured Streaming**
Newly created at `'docs/ss-migration-guide.md'`. I skimmed resolved JIRAs at 3.0.0 and found some items to note.
6. **SQL**
Merged [sql-migration-guide-upgrade.html](https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html) and [sql-migration-guide-hive-compatibility.html](https://spark.apache.org/docs/latest/sql-migration-guide-hive-compatibility.html)
```
'docs/sql-migration-guide-hive-compatibility.md' 'docs/sql-migration-guide-upgrade.md'
Move Hive compatibility section ↘ ↙ Left over after filtering PySpark and SparkR items
'docs/sql-migration-guide.md'
```
### Why are the changes needed?
To help users in production effectively migrate to higher versions, and to detect behavioural or breaking changes before upgrading and/or migrating.
### Does this PR introduce any user-facing change?
Yes, this changes Spark's documentation at https://spark.apache.org/docs/latest/index.html.
### How was this patch tested?
Manually build the doc. This can be verified as below:
```bash
cd docs
SKIP_API=1 jekyll build
open _site/index.html
```
Closes #25757 from HyukjinKwon/migration-doc.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>