ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Gengliang Wang	5d45a415f3	Preparing Spark release v3.2.0-rc7	2021-10-06 11:45:26 +00:00
Ye Zhou	88f4809142	[SPARK-36892][CORE] Disable batch fetch for a shuffle when push based shuffle is enabled We found an issue where user configured both AQE and push based shuffle, but the job started to hang after running some stages. We took the thread dump from the Executors, which showed the task is still waiting to fetch shuffle blocks. Proposed changes in the PR to fix the issue. ### What changes were proposed in this pull request? Disabled Batch fetch when push based shuffle is enabled. ### Why are the changes needed? Without this patch, enabling AQE and Push based shuffle will have a chance to hang the tasks. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Tested the PR within our PR, with Spark shell and the queries are: sql("""SELECT CASE WHEN rand() < 0.8 THEN 100 ELSE CAST(rand() * 30000000 AS INT) END AS s_item_id, CAST(rand() * 100 AS INT) AS s_quantity, DATE_ADD(current_date(), - CAST(rand() * 360 AS INT)) AS s_date FROM RANGE(1000000000)""").createOrReplaceTempView("sales") // Dynamically coalesce partitions sql("""SELECT s_date, sum(s_quantity) AS q FROM sales GROUP BY s_date ORDER BY q DESC""").collect Unit tests to be added. Closes #34156 from zhouyejoe/SPARK-36892. Authored-by: Ye Zhou <yezhou@linkedin.com> Signed-off-by: Gengliang Wang <gengliang@apache.org> (cherry picked from commit `31b6f614d3`) Signed-off-by: Gengliang Wang <gengliang@apache.org>	2021-10-06 15:42:40 +08:00
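The rule described in SPARK-36892 can be sketched as a small predicate. This is a simplified illustration, not Spark's actual implementation: the helper name `effective_batch_fetch` is hypothetical, and it models only the single condition this commit adds, ignoring the other conditions Spark checks before batching shuffle block fetches.

```python
def effective_batch_fetch(conf: dict) -> bool:
    """Hypothetical helper: decide whether batch fetch stays on.

    Models only the condition from SPARK-36892: batch fetch is
    turned off whenever push-based shuffle is enabled.
    """
    # Batch fetch as requested by the user (assumed default: "true").
    requested = conf.get(
        "spark.sql.adaptive.fetchShuffleBlocksInBatch", "true"
    ).lower() == "true"
    # Push-based shuffle switch (assumed default: "false").
    push_based = conf.get("spark.shuffle.push.enabled", "false").lower() == "true"
    return requested and not push_based


if __name__ == "__main__":
    print(effective_batch_fetch({"spark.shuffle.push.enabled": "true"}))
    print(effective_batch_fetch({}))
```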
Hyukjin Kwon	939c4d93b5	[MINOR][DOCS] Mention other Python dependency tools in documentation ### What changes were proposed in this pull request? Self-contained. ### Why are the changes needed? To give users more information on the Python dependency management options available in PySpark. ### Does this PR introduce _any_ user-facing change? Yes, documentation change. ### How was this patch tested? Manually built the docs and checked the results: <img width="918" alt="Screen Shot 2021-09-29 at 10 11 56 AM" src="https://user-images.githubusercontent.com/6477701/135186536-2f271378-d06b-4c6b-a4be-691ce395db9f.png"> <img width="976" alt="Screen Shot 2021-09-29 at 10 12 22 AM" src="https://user-images.githubusercontent.com/6477701/135186541-0f4c5615-bc49-48e2-affd-dc2f5c0334bf.png"> <img width="920" alt="Screen Shot 2021-09-29 at 10 12 42 AM" src="https://user-images.githubusercontent.com/6477701/135186551-0b613096-7c86-4562-b345-ddd60208367b.png"> Closes #34134 from HyukjinKwon/minor-docs-py-deps. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org> (cherry picked from commit `13c2b711e4`) Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2021-09-29 14:46:18 +09:00
Gengliang Wang	4bd358474b	Preparing development version 3.2.1-SNAPSHOT	2021-09-28 10:53:42 +00:00
Gengliang Wang	dde73e2e1c	Preparing Spark release v3.2.0-rc6	2021-09-28 10:53:35 +00:00
Gengliang Wang	0c57bb8f7f	Preparing development version 3.2.1-SNAPSHOT	2021-09-27 08:24:50 +00:00
Gengliang Wang	49aea14c5a	Preparing Spark release v3.2.0-rc5	2021-09-27 08:24:44 +00:00
Gengliang Wang	2348cce37e	Preparing development version 3.2.1-SNAPSHOT	2021-09-26 12:28:46 +00:00
Gengliang Wang	2ed8c08c5b	Preparing Spark release v3.2.0-rc5	2021-09-26 12:28:40 +00:00
Gengliang Wang	da722d43cb	Preparing development version 3.2.1-SNAPSHOT	2021-09-24 10:03:23 +00:00
Gengliang Wang	9e35703211	Preparing Spark release v3.2.0-rc5	2021-09-24 10:03:16 +00:00
Gengliang Wang	0fb7127f85	Preparing development version 3.2.1-SNAPSHOT	2021-09-23 08:46:28 +00:00
Gengliang Wang	b609f2fe0c	Preparing Spark release v3.2.0-rc4	2021-09-23 08:46:22 +00:00
jiaoqb	d203ed51ca	[SPARK-36791][DOCS] Fix spelling mistakes in running-on-yarn.md file where JHS_POST should be JHS_HOST ### What changes were proposed in this pull request? The PR fixes SPARK-36791 by replacing JHS_POST with JHS_HOST ### Why are the changes needed? There are spelling mistakes in running-on-yarn.md file where JHS_POST should be JHS_HOST ### Does this PR introduce any user-facing change? No ### How was this patch tested? Not needed for docs Closes #34031 from jiaoqingbo/jiaoqingbo. Authored-by: jiaoqb <jiaoqb@asiainfo.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org> (cherry picked from commit `8a1a91bd71`) Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2021-09-23 12:48:06 +09:00
Gengliang Wang	b0249851f6	Preparing development version 3.2.1-SNAPSHOT	2021-09-18 11:30:12 +00:00
Gengliang Wang	96044e9735	Preparing Spark release v3.2.0-rc3	2021-09-18 11:30:06 +00:00
Dongjoon Hyun	fbd24621ce	[SPARK-36759][BUILD][FOLLOWUP] Update version in scala-2.12 profile and doc ### What changes were proposed in this pull request? This is a follow-up to fix the leftover during switching the Scala version. ### Why are the changes needed? This should be consistent. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? This is not tested by UT. We need to check manually. There is no more `2.12.14`. ``` $ git grep 2.12.14 R/pkg/tests/fulltests/test_sparkSQL.R: c(as.Date("2012-12-14"), as.Date("2013-12-15"), as.Date("2014-12-16"))) data/mllib/ridge-data/lpsa.data:3.5307626,0.987291634724086 -0.36279314978779 -0.922212414640967 0.232904453212813 -0.522940888712441 1.79270085261407 0.342627053981254 1.26288870310799 sql/hive/src/test/resources/data/files/over10k:-3\|454\|65705\|4294967468\|62.12\|14.32\|true\|mike white\|2013-03-01 09:11:58.703087\|40.18\|joggying ``` Closes #34020 from dongjoon-hyun/SPARK-36759-2. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit `adbea252db`) Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2021-09-16 05:11:05 -07:00
Gengliang Wang	d20ed030a8	[SPARK-36775][DOCS] Add documentation for ANSI store assignment rules ### What changes were proposed in this pull request? Add documentation for ANSI store assignment rules for - the valid source/target type combinations - the runtime error that occurs on numeric overflow ### Why are the changes needed? Better docs ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Build docs and preview: ![image](https://user-images.githubusercontent.com/1097932/133554600-8c80c0a9-8753-4c01-94d0-994d8082e319.png) Closes #34014 from gengliangwang/addStoreAssignDoc. Authored-by: Gengliang Wang <gengliang@apache.org> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit `ff7705ad2a`) Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2021-09-16 15:50:57 +08:00
Liang-Chi Hsieh	d22182e474	[SPARK-34479][SQL][DOC][FOLLOWUP] Add zstandard to avro supported codecs ### What changes were proposed in this pull request? Adding `zstandard` to avro supported codecs. ### Why are the changes needed? To improve the document. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Doc only. Closes #33943 from viirya/minor-doc. Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com> (cherry picked from commit `647ffe655f`) Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>	2021-09-08 23:21:38 -07:00
Kousuke Saruta	a41dc4516e	[SPARK-36153][SQL][DOCS][FOLLOWUP] Fix the description about the possible values of `spark.sql.catalogImplementation` property ### What changes were proposed in this pull request? This PR fixes the description about the possible values of `spark.sql.catalogImplementation` property. It was added in SPARK-36153 (#33362) but the possible values are `hive` or `in-memory` rather than `true` or `false`. ### Why are the changes needed? To fix wrong description. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? I just confirmed `in-memory` and `hive` are the valid values with SparkShell. Closes #33923 from sarutak/fix-doc-about-catalogImplementation. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org> (cherry picked from commit `a5fe5d368c`) Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2021-09-07 11:39:53 +09:00
Hyukjin Kwon	e9f2e34261	[SPARK-36631][R] Ask users if they want to download and install SparkR in non Spark scripts ### What changes were proposed in this pull request? This PR proposes to ask users if they want to download and install SparkR when they install SparkR from CRAN. `SPARKR_ASK_INSTALLATION` environment variable was added in case other notebook projects are affected. ### Why are the changes needed? This is required for CRAN. Currently SparkR is removed: https://cran.r-project.org/web/packages/SparkR/index.html. See also https://lists.apache.org/thread.html/r02b9046273a518e347dfe85f864d23d63d3502c6c1edd33df17a3b86%40%3Cdev.spark.apache.org%3E ### Does this PR introduce _any_ user-facing change? Yes, `sparkR.session(...)` will ask if users want to download and install Spark package or not if they are in the plain R shell or `Rscript`. ### How was this patch tested? R shell Valid input (`n`): ``` > sparkR.session(master="local") Spark not found in SPARK_HOME: Will you download and install (or reuse if it exists) Spark package under the cache [/.../Caches/spark]? (y/n): n ``` ``` Error in sparkCheckInstall(sparkHome, master, deployMode) : Please make sure Spark package is installed in this machine. - If there is one, set the path in sparkHome parameter or environment variable SPARK_HOME. - If not, you may run install.spark function to do the job. ``` Invalid input: ``` > sparkR.session(master="local") Spark not found in SPARK_HOME: Will you download and install (or reuse if it exists) Spark package under the cache [/.../Caches/spark]? (y/n): abc ``` ``` Will you download and install (or reuse if it exists) Spark package under the cache [/.../Caches/spark]? (y/n): ``` Valid input (`y`): ``` > sparkR.session(master="local") Will you download and install (or reuse if it exists) Spark package under the cache [/.../Caches/spark]? (y/n): y Spark not found in the cache directory. Installation will start. MirrorUrl not provided. 
Looking for preferred site from apache website... Preferred mirror site found: https://ftp.riken.jp/net/apache/spark Downloading spark-3.3.0 for Hadoop 2.7 from: - https://ftp.riken.jp/net/apache/spark/spark-3.3.0/spark-3.3.0-bin-hadoop2.7.tgz trying URL 'https://ftp.riken.jp/net/apache/spark/spark-3.3.0/spark-3.3.0-bin-hadoop2.7.tgz' ... ``` Rscript ``` cat tmp.R ``` ``` library(SparkR, lib.loc = c(file.path(".", "R", "lib"))) sparkR.session(master="local") ``` ``` Rscript tmp.R ``` Valid input (`n`): ``` Spark not found in SPARK_HOME: Will you download and install (or reuse if it exists) Spark package under the cache [/.../Caches/spark]? (y/n): n ``` ``` Error in sparkCheckInstall(sparkHome, master, deployMode) : Please make sure Spark package is installed in this machine. - If there is one, set the path in sparkHome parameter or environment variable SPARK_HOME. - If not, you may run install.spark function to do the job. Calls: sparkR.session -> sparkCheckInstall ``` Invalid input: ``` Spark not found in SPARK_HOME: Will you download and install (or reuse if it exists) Spark package under the cache [/.../Caches/spark]? (y/n): abc ``` ``` Will you download and install (or reuse if it exists) Spark package under the cache [/.../Caches/spark]? (y/n): ``` Valid input (`y`): ``` ... Spark not found in SPARK_HOME: Will you download and install (or reuse if it exists) Spark package under the cache [/.../Caches/spark]? (y/n): y Spark not found in the cache directory. Installation will start. MirrorUrl not provided. Looking for preferred site from apache website... Preferred mirror site found: https://ftp.riken.jp/net/apache/spark Downloading spark-3.3.0 for Hadoop 2.7 from: ... ``` `bin/sparkR` and `bin/spark-submit *.R` are not affected (tested). Closes #33887 from HyukjinKwon/SPARK-36631. 
Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org> (cherry picked from commit `e983ba8fce`) Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2021-09-02 13:27:55 +09:00
Gengliang Wang	1bad04d028	Preparing development version 3.2.1-SNAPSHOT	2021-08-31 17:04:14 +00:00
Gengliang Wang	03f5d23e96	Preparing Spark release v3.2.0-rc2	2021-08-31 17:04:08 +00:00
Yuanjian Li	f50f2d474c	[SPARK-35611][SS][FOLLOW-UP] Improve the user guide document ### What changes were proposed in this pull request? Improve the user guide document. ### Why are the changes needed? Make the user guide clear. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Doc change only. Closes #33854 from xuanyuanking/SPARK-35611-follow. Authored-by: Yuanjian Li <yuanjian.li@databricks.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org> (cherry picked from commit `dd3f0fa8c2`) Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2021-08-27 10:27:37 +09:00
Leona Yoda	36be232eea	[SPARK-36541][DOCS][PYTHON] Replace the word Koalas to pandas-on-Spark ### What changes were proposed in this pull request? Replace the images in the pandas-on-Spark documentation because those images use the word Koalas. ### Why are the changes needed? Images in the "Transform and apply a function" documentation still use the word Koalas, although the word was replaced with pandas-on-Spark by https://github.com/apache/spark/pull/32835. The wording in those images should match. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? `make html` Screen shots ![130179112-8485fdde-b422-4834-8b23-fe69e7402118](https://user-images.githubusercontent.com/14937752/130186051-d6ff65f0-c121-40bd-b4f1-2fbc10e76f3e.png) ![130179239-8dae7812-4d81-4f8c-8558-b75e4eae3787](https://user-images.githubusercontent.com/14937752/130186063-17d4a95f-0b9d-49d3-85c7-13ea07e4b6bb.png) ![130179273-10f9fbc3-0a62-4e1a-ab6e-7049d75653a1](https://user-images.githubusercontent.com/14937752/130186074-7d684669-b9ef-4a4e-8a2d-c63bb9800ddb.png) ![130179311-616545af-dde2-4dec-807f-dde0a0d4bfbe](https://user-images.githubusercontent.com/14937752/130186095-20669673-b1d3-4552-97bf-86bbc1a5d43b.png) Environment - Windows 10 - Google Chrome 92.0.4515.159 [images.pptx](https://github.com/apache/spark/files/7029087/images.pptx) Closes #33786 from yoda-mon/replace-pyspark-doc-images. Authored-by: Leona Yoda <yodal@oss.nttdata.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org> (cherry picked from commit `aeb3da2798`) Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2021-08-26 19:03:11 +09:00
Max Gekk	5198c0c316	[SPARK-35581][SPARK-36567][SQL][DOCS][FOLLOWUP] Update the SQL migration guide about foldable special datetime values ### What changes were proposed in this pull request? In the PR, I propose to update an existing item in the SQL migration guide, and mention that Spark 3.2 supports foldable special datetime values as well. <img width="1292" alt="Screenshot 2021-08-25 at 23 29 51" src="https://user-images.githubusercontent.com/1580697/130860184-27f0ba56-6c2d-4a5a-91a8-195f2f8aa5da.png"> ### Why are the changes needed? To inform users about actual Spark SQL behavior introduced by https://github.com/apache/spark/pull/33816 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By generating docs, and checking results manually. Closes #33840 from MaxGekk/special-datetime-cast-migr-guide. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit `c4e739fb4b`) Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2021-08-26 10:02:15 +08:00
Kousuke Saruta	beabf91ea1	[SPARK-35236][SQL][DOCS][FOLLOWUP] Mention ARCHIVE as an acceptable resource type for CREATE FUNCTION statement ### What changes were proposed in this pull request? This PR modifies `sql-ref-syntax-ddl-create-function.md` to mention `ARCHIVE` as an acceptable resource type for `CREATE FUNCTION` statement. `ARCHIVE` is acceptable as of SPARK-35236 (#32359). ### Why are the changes needed? To maintain the document. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? `SKIP_API=1 bundle exec jekyll build` ![create-function-archive](https://user-images.githubusercontent.com/4736016/130630637-dcddfd8c-543b-4d21-997c-d2deaf917a4f.png) Closes #33823 from sarutak/create-function-archive. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org> (cherry picked from commit `bd0a4950ae`) Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2021-08-25 10:05:00 +09:00
Gengliang Wang	5463caac0d	Revert "[SPARK-34415][ML] Randomization in hyperparameter optimization" ### What changes were proposed in this pull request? Revert `397b843890` and `5a48eb8d00` ### Why are the changes needed? As discussed in https://github.com/apache/spark/pull/33800#issuecomment-904140869, there is correctness issue in the current implementation. Let's revert the code changes from branch 3.2 and fix it on master branch later ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Ci tests Closes #33819 from gengliangwang/revert-SPARK-34415. Authored-by: Gengliang Wang <gengliang@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit `de932f51ce`) Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2021-08-24 13:39:29 -07:00
Gengliang Wang	eea7d0037e	[SPARK-36557][DOCS] Update the MAVEN_OPTS in Spark build docs ### What changes were proposed in this pull request? As Jacek Laskowski pointed out in the dev list, there is StackOverflowError if compiling Spark with the current MAVEN_OPTS in Spark documentation. We should update it with `-Xss64m` to avoid it. ### Why are the changes needed? Correct the documentation ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manual test. The MAVEN_OPTS is consistent with our github action build. Closes #33804 from gengliangwang/updateBuildDoc. Authored-by: Gengliang Wang <gengliang@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org> (cherry picked from commit `3da0e9500f`) Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2021-08-23 09:46:41 +09:00
Venkata krishnan Sowrirajan	0f2e318894	[SPARK-36374][FOLLOW-UP] Change config key spark.shuffle.server.mergedShuffleFileManagerImpl to spark.shuffle.push.server.mergedShuffleFileManagerImpl ### What changes were proposed in this pull request? Minor change to rename the config key from `spark.shuffle.server.mergedShuffleFileManagerImpl` to `spark.shuffle.push.server.mergedShuffleFileManagerImpl`. This was missed in https://github.com/apache/spark/pull/33615. ### Why are the changes needed? To keep the config names consistent ### Does this PR introduce _any_ user-facing change? Yes, this is a change in the config key name. However, the config has not been released yet, so technically there is no user-facing change. ### How was this patch tested? Existing tests. Closes #33799 from venkata91/SPARK-36374-follow-up. Authored-by: Venkata krishnan Sowrirajan <vsowrirajan@linkedin.com> Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com> (cherry picked from commit `7b2842e986`) Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>	2021-08-22 01:29:36 -05:00
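For anyone carrying the pre-rename key in a configuration built against an unreleased snapshot, the rename can be applied mechanically. A minimal sketch, assuming the configuration is held as a plain dict; the helper name `migrate_conf` and the example class name are hypothetical.

```python
OLD_KEY = "spark.shuffle.server.mergedShuffleFileManagerImpl"
NEW_KEY = "spark.shuffle.push.server.mergedShuffleFileManagerImpl"


def migrate_conf(conf: dict) -> dict:
    """Return a copy of conf with the old key moved to the new key.

    If both keys are present, the value already stored under the
    new key wins and the old key is dropped.
    """
    migrated = dict(conf)
    if OLD_KEY in migrated:
        value = migrated.pop(OLD_KEY)
        migrated.setdefault(NEW_KEY, value)
    return migrated


if __name__ == "__main__":
    # "com.example.MyMergeManager" is a made-up class name for illustration.
    print(migrate_conf({OLD_KEY: "com.example.MyMergeManager"}))
```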
Liang-Chi Hsieh	212a21ee4f	[MINOR][SS][DOCS] Update doc for streaming deduplication ### What changes were proposed in this pull request? This patch fixes an error about streaming deduplication in Structured Streaming, and also updates an item about unsupported operations. ### Why are the changes needed? Update the user document. ### Does this PR introduce _any_ user-facing change? No. It's a doc only change. ### How was this patch tested? Doc only change. Closes #33801 from viirya/minor-ss-deduplication. Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com> (cherry picked from commit `5876e04de2`) Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>	2021-08-21 18:20:27 -07:00
Angerszhuuuu	45c4b751f3	[SPARK-36549][SQL] Add taskStatus supports multiple value to monitoring doc ### What changes were proposed in this pull request? In the stage-related REST API, the `taskStatus` parameter is supported as a list ``` QueryParam("taskStatus") taskStatus: JList[TaskStatus] ``` In a REST request it should be written like ``` taskStatus=SUCCESS&taskStatus=FAILED ``` This is useful but not shown in the doc, and many users don't know how to write list parameters. So add this feature to the monitoring doc too. ### Why are the changes needed? Make doc clear ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? With restful request ``` http://localhost:4040/api/v1/applications/local-1629432414554/stages/0?details=true&taskStatus=FAILED ``` Restful request result tasks ``` "tasks" : { "0" : { "taskId" : 0, "index" : 0, "attempt" : 0, "launchTime" : "2021-08-20T04:06:55.515GMT", "duration" : 273, "executorId" : "driver", "host" : "host", "status" : "FAILED", "taskLocality" : "PROCESS_LOCAL", "speculative" : false, "accumulatorUpdates" : [ ], "errorMessage" : "java.lang.RuntimeException\n\tat org.apache.spark.ui.UISuite.$anonfun$new$8(UISuite.scala:95)\n\tat scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)\n\tat scala.collection.Iterator.foreach(Iterator.scala:943)\n\tat scala.collection.Iterator.foreach$(Iterator.scala:943)\n\tat org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)\n\tat org.apache.spark.rdd.RDD.$anonfun$foreach$2(RDD.scala:1003)\n\tat org.apache.spark.rdd.RDD.$anonfun$foreach$2$adapted(RDD.scala:1003)\n\tat org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2254)\n\tat org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)\n\tat org.apache.spark.scheduler.Task.run(Task.scala:136)\n\tat org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:507)\n\tat org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1468)\n\tat 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:510)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:748)\n", "taskMetrics" : { "executorDeserializeTime" : 0, "executorDeserializeCpuTime" : 0, "executorRunTime" : 206, "executorCpuTime" : 0, "resultSize" : 0, "jvmGcTime" : 0, "resultSerializationTime" : 0, "memoryBytesSpilled" : 0, "diskBytesSpilled" : 0, "peakExecutionMemory" : 0, "inputMetrics" : { "bytesRead" : 0, "recordsRead" : 0 }, "outputMetrics" : { "bytesWritten" : 0, "recordsWritten" : 0 }, "shuffleReadMetrics" : { "remoteBlocksFetched" : 0, "localBlocksFetched" : 0, "fetchWaitTime" : 0, "remoteBytesRead" : 0, "remoteBytesReadToDisk" : 0, "localBytesRead" : 0, "recordsRead" : 0 }, "shuffleWriteMetrics" : { "bytesWritten" : 0, "writeTime" : 0, "recordsWritten" : 0 } }, "executorLogs" : { }, "schedulerDelay" : 67, "gettingResultTime" : 0 } }, ``` With restful request ``` http://localhost:4040/api/v1/applications/local-1629432414554/stages/0?details=true&taskStatus=FAILED&taskStatus=SUCCESS ``` Restful result tasks ``` "tasks" : { "1" : { "taskId" : 1, "index" : 1, "attempt" : 0, "launchTime" : "2021-08-20T04:06:55.786GMT", "duration" : 16, "executorId" : "driver", "host" : "host", "status" : "SUCCESS", "taskLocality" : "PROCESS_LOCAL", "speculative" : false, "accumulatorUpdates" : [ ], "taskMetrics" : { "executorDeserializeTime" : 2, "executorDeserializeCpuTime" : 2638000, "executorRunTime" : 2, "executorCpuTime" : 1993000, "resultSize" : 837, "jvmGcTime" : 0, "resultSerializationTime" : 0, "memoryBytesSpilled" : 0, "diskBytesSpilled" : 0, "peakExecutionMemory" : 0, "inputMetrics" : { "bytesRead" : 0, "recordsRead" : 0 }, "outputMetrics" : { "bytesWritten" : 0, "recordsWritten" : 0 }, "shuffleReadMetrics" : { "remoteBlocksFetched" : 0, "localBlocksFetched" : 0, "fetchWaitTime" : 0, 
"remoteBytesRead" : 0, "remoteBytesReadToDisk" : 0, "localBytesRead" : 0, "recordsRead" : 0 }, "shuffleWriteMetrics" : { "bytesWritten" : 0, "writeTime" : 0, "recordsWritten" : 0 } }, "executorLogs" : { }, "schedulerDelay" : 12, "gettingResultTime" : 0 }, "0" : { "taskId" : 0, "index" : 0, "attempt" : 0, "launchTime" : "2021-08-20T04:06:55.515GMT", "duration" : 273, "executorId" : "driver", "host" : "host", "status" : "FAILED", "taskLocality" : "PROCESS_LOCAL", "speculative" : false, "accumulatorUpdates" : [ ], "errorMessage" : "java.lang.RuntimeException\n\tat org.apache.spark.ui.UISuite.$anonfun$new$8(UISuite.scala:95)\n\tat scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)\n\tat scala.collection.Iterator.foreach(Iterator.scala:943)\n\tat scala.collection.Iterator.foreach$(Iterator.scala:943)\n\tat org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)\n\tat org.apache.spark.rdd.RDD.$anonfun$foreach$2(RDD.scala:1003)\n\tat org.apache.spark.rdd.RDD.$anonfun$foreach$2$adapted(RDD.scala:1003)\n\tat org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2254)\n\tat org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)\n\tat org.apache.spark.scheduler.Task.run(Task.scala:136)\n\tat org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:507)\n\tat org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1468)\n\tat org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:510)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:748)\n", "taskMetrics" : { "executorDeserializeTime" : 0, "executorDeserializeCpuTime" : 0, "executorRunTime" : 206, "executorCpuTime" : 0, "resultSize" : 0, "jvmGcTime" : 0, "resultSerializationTime" : 0, "memoryBytesSpilled" : 0, "diskBytesSpilled" : 0, "peakExecutionMemory" : 0, 
"inputMetrics" : { "bytesRead" : 0, "recordsRead" : 0 }, "outputMetrics" : { "bytesWritten" : 0, "recordsWritten" : 0 }, "shuffleReadMetrics" : { "remoteBlocksFetched" : 0, "localBlocksFetched" : 0, "fetchWaitTime" : 0, "remoteBytesRead" : 0, "remoteBytesReadToDisk" : 0, "localBytesRead" : 0, "recordsRead" : 0 }, "shuffleWriteMetrics" : { "bytesWritten" : 0, "writeTime" : 0, "recordsWritten" : 0 } }, "executorLogs" : { }, "schedulerDelay" : 67, "gettingResultTime" : 0 } }, ``` Closes #33793 from AngersZhuuuu/SPARK-36549. Authored-by: Angerszhuuuu <angers.zhu@gmail.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org> (cherry picked from commit `5740d5641d`) Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>	2021-08-22 09:45:34 +09:00
ulysses-you	e0d2d8f1a6	[SPARK-35083][CORE][FOLLLOWUP] Improve docs and migration guide ### What changes were proposed in this pull request? * improve docs in `docs/job-scheduling.md` * add migration guide docs in `docs/core-migration-guide.md` ### Why are the changes needed? Help user to migrate. ### Does this PR introduce _any_ user-facing change? yes ### How was this patch tested? Pass CI Closes #33794 from ulysses-you/SPARK-35083-f. Authored-by: ulysses-you <ulyssesyou18@gmail.com> Signed-off-by: Kent Yao <yao@apache.org> (cherry picked from commit `90cbf9ca3e`) Signed-off-by: Kent Yao <yao@apache.org>	2021-08-20 21:33:06 +08:00
Gengliang Wang	69be513c5e	Preparing development version 3.2.1-SNAPSHOT	2021-08-20 12:40:47 +00:00
Gengliang Wang	6bb3523d8e	Preparing Spark release v3.2.0-rc1	2021-08-20 12:40:40 +00:00
Gengliang Wang	fafdc1482b	Revert "Preparing Spark release v3.2.0-rc1" This reverts commit `8e58fafb05`.	2021-08-20 20:07:02 +08:00
Gengliang Wang	c829ed53ff	Revert "Preparing development version 3.2.1-SNAPSHOT" This reverts commit `4f1d21571d`.	2021-08-20 20:07:01 +08:00
Gengliang Wang	f47a519721	[SPARK-36551][BUILD] Add sphinx-plotly-directive in Spark release Dockerfile ### What changes were proposed in this pull request? After https://github.com/apache/spark/pull/32726, Python doc build requires `sphinx-plotly-directive`. This PR is to install it from `spark-rm/Dockerfile` to make sure `do-release-docker.sh` can run successfully. Also, this PR mentions it in the README of docs. ### Why are the changes needed? Fix release script and update README of docs ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manual test locally. Closes #33797 from gengliangwang/fixReleaseDocker. Authored-by: Gengliang Wang <gengliang@apache.org> Signed-off-by: Gengliang Wang <gengliang@apache.org> (cherry picked from commit `42eebb84f5`) Signed-off-by: Gengliang Wang <gengliang@apache.org>	2021-08-20 20:02:44 +08:00
Yuanjian Li	36c24a03bd	[SPARK-35312][SS][FOLLOW-UP] More documents and checking logic for the new options ### What changes were proposed in this pull request? Add more documents and checking logic for the new options `minOffsetsPerTrigger` and `maxTriggerDelay`. ### Why are the changes needed? Have a clear description of the behavior introduced in SPARK-35312 ### Does this PR introduce _any_ user-facing change? Yes. If the user sets minOffsetsPerTrigger > maxOffsetsPerTrigger, the new code will throw an AnalysisException. The original behavior was to silently ignore maxOffsetsPerTrigger. ### How was this patch tested? Existing tests. Closes #33792 from xuanyuanking/SPARK-35312-follow. Authored-by: Yuanjian Li <yuanjian.li@databricks.com> Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com> (cherry picked from commit `a0b24019ed`) Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>	2021-08-20 10:41:54 +09:00
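The new check from SPARK-35312's follow-up can be illustrated outside Spark. This is a sketch of the rule only, not the Spark code: the function name `check_trigger_offsets` is hypothetical, options are assumed to arrive as a string-valued dict, and a plain `ValueError` stands in for Spark's `AnalysisException`.

```python
def check_trigger_offsets(options: dict) -> None:
    """Reject option sets where the minimum exceeds the maximum."""
    min_offsets = options.get("minOffsetsPerTrigger")
    max_offsets = options.get("maxOffsetsPerTrigger")
    # Only compare when both options are set.
    if min_offsets is not None and max_offsets is not None:
        if int(min_offsets) > int(max_offsets):
            raise ValueError(
                "minOffsetsPerTrigger (%s) must not be greater than "
                "maxOffsetsPerTrigger (%s)" % (min_offsets, max_offsets)
            )


if __name__ == "__main__":
    check_trigger_offsets({"minOffsetsPerTrigger": "10", "maxOffsetsPerTrigger": "100"})
    try:
        check_trigger_offsets({"minOffsetsPerTrigger": "100", "maxOffsetsPerTrigger": "10"})
    except ValueError as err:
        print(err)
```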
Gengliang Wang	4f1d21571d	Preparing development version 3.2.1-SNAPSHOT	2021-08-19 14:08:32 +00:00
Gengliang Wang	8e58fafb05	Preparing Spark release v3.2.0-rc1	2021-08-19 14:08:26 +00:00
Gengliang Wang	fb56627f21	Revert "[SPARK-35083][FOLLOW-UP][CORE] Add migration guide for the remote scheduler pool files support" This reverts commit `e3902d1975`. The feature is an improvement rather than a behavior change. Closes #33789 from gengliangwang/revertDoc. Authored-by: Gengliang Wang <gengliang@apache.org> Signed-off-by: Gengliang Wang <gengliang@apache.org> (cherry picked from commit `b36b1c7e8a`) Signed-off-by: Gengliang Wang <gengliang@apache.org>	2021-08-19 21:30:19 +08:00
yi.wu	9544c24560	[SPARK-35083][FOLLOW-UP][CORE] Add migration guide for the remote scheduler pool files support ### What changes were proposed in this pull request? Add remote scheduler pool files support to the migration guide. ### Why are the changes needed? To highlight this useful support. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass existing tests. Closes #33785 from Ngone51/SPARK-35083-follow-up. Lead-authored-by: yi.wu <yi.wu@databricks.com> Co-authored-by: wuyi <yi.wu@databricks.com> Signed-off-by: Gengliang Wang <gengliang@apache.org> (cherry picked from commit `e3902d1975`) Signed-off-by: Gengliang Wang <gengliang@apache.org>	2021-08-19 16:29:19 +08:00
Wenchen Fan	8f3b4c4b7d	[SPARK-33687][SQL][DOC][FOLLOWUP] Merge the doc pages of ANALYZE TABLE and ANALYZE TABLES ### What changes were proposed in this pull request? This is a followup of https://github.com/apache/spark/pull/30648 ANALYZE TABLE and TABLES are essentially the same command, it's weird to put them in 2 different doc pages. This PR proposes to merge them into one doc page. ### Why are the changes needed? simplify the doc ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? N/A Closes #33781 from cloud-fan/doc. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit `07d173a8b0`) Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2021-08-19 11:04:20 +08:00
Wenchen Fan	5107ad3157	[SPARK-36535][SQL] Refine the sql reference doc ### What changes were proposed in this pull request? Refine the SQL reference doc: - remove useless subitems in the sidebar - remove useless sub-menu-pages (e.g. `sql-ref-syntax-aux.md`) - avoid using `#####` in `sql-ref-literals.md` ### Why are the changes needed? The subitems in the sidebar are quite useless, as the menu page serves the same purpose: <img width="1040" alt="WX20210817-2358402x" src="https://user-images.githubusercontent.com/3182036/129765924-d7e69bc1-e351-4581-a6de-f2468022f372.png"> It's also extra work to keep the menu page and sidebar subitems in sync (the ANSI compliance page is already out of sync). The sub-menu-pages are only referenced by the sidebar, and duplicate the content of the menu page. As a result, `sql-ref-syntax-aux.md` is already outdated compared to the menu page. It's easier to just look at the menu page. The `#####` is not rendered properly: <img width="776" alt="WX20210818-0001192x" src="https://user-images.githubusercontent.com/3182036/129766760-6f385443-e597-44aa-888d-14d128d45f84.png"> It's better to avoid using it. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? N/A Closes #33767 from cloud-fan/doc. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com> (cherry picked from commit `4b015e8d7d`) Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2021-08-17 12:46:49 -07:00
Gengliang Wang	70635b4b26	Revert "[SPARK-35028][SQL] ANSI mode: disallow group by aliases" ### What changes were proposed in this pull request? Revert [[SPARK-35028][SQL] ANSI mode: disallow group by aliases ](https://github.com/apache/spark/pull/32129) ### Why are the changes needed? It turns out that many users are using the group by alias feature. Spark has its precedence rule when alias names conflict with column names in Group by clause: always use the table column. This should be reasonable and acceptable. Also, external DBMS such as PostgreSQL and MySQL allow grouping by alias, too. As we are going to announce ANSI mode GA in Spark 3.2, I suggest allowing the group by alias in ANSI mode. ### Does this PR introduce _any_ user-facing change? No, the feature is not released yet. ### How was this patch tested? Unit tests Closes #33758 from gengliangwang/revertGroupByAlias. Authored-by: Gengliang Wang <gengliang@apache.org> Signed-off-by: Gengliang Wang <gengliang@apache.org> (cherry picked from commit `8bfb4f1e72`) Signed-off-by: Gengliang Wang <gengliang@apache.org>	2021-08-17 20:24:09 +08:00
Yuanjian Li	4caa43e398	[SPARK-36041][SS][DOCS] Introduce the RocksDBStateStoreProvider in the programming guide ### What changes were proposed in this pull request? Add the document for the new RocksDBStateStoreProvider. ### Why are the changes needed? User guide for the new feature. ### Does this PR introduce _any_ user-facing change? No, doc only. ### How was this patch tested? Doc only. Closes #33683 from xuanyuanking/SPARK-36041. Authored-by: Yuanjian Li <yuanjian.li@databricks.com> Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com> (cherry picked from commit `3d57e00a7f`) Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>	2021-08-16 12:32:19 -07:00
Venkata krishnan Sowrirajan	233af3d239	[SPARK-36374][SHUFFLE][DOC] Push-based shuffle high level user documentation ### What changes were proposed in this pull request? Document the push-based shuffle feature with a high level overview of the feature and corresponding configuration options for both shuffle server side as well as client side. This is how the changes to the doc looks on the browser ([img](https://user-images.githubusercontent.com/8871522/129231582-ad86ee2f-246f-4b42-9528-4ccd693e86d2.png)) ### Why are the changes needed? Helps users understand the feature ### Does this PR introduce _any_ user-facing change? Docs ### How was this patch tested? N/A Closes #33615 from venkata91/SPARK-36374. Authored-by: Venkata krishnan Sowrirajan <vsowrirajan@linkedin.com> Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com> (cherry picked from commit `2270ecf32f`) Signed-off-by: Mridul Muralidharan <mridulatgmail.com>	2021-08-16 10:25:33 -05:00
Liang-Chi Hsieh	3aa933b162	[SPARK-36465][SS] Dynamic gap duration in session window ### What changes were proposed in this pull request? This patch supports dynamic gap duration in session window. ### Why are the changes needed? The gap duration used in session window for now is a static value. To support more complex usage, it is better to support dynamic gap duration, which determines the gap duration by looking at the current data. For example, in our use case, we may have a different gap depending on a certain column in the input rows. ### Does this PR introduce _any_ user-facing change? Yes, users can specify dynamic gap duration. ### How was this patch tested? Modified existing tests and added a new test. Closes #33691 from viirya/dynamic-session-window-gap. Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com> (cherry picked from commit `8b8d91cf64`) Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>	2021-08-16 11:06:16 +09:00
Max Gekk	8dbcbebc36	[SPARK-36468][SQL][DOCS] Update docs about ANSI interval literals ### What changes were proposed in this pull request? In the PR, I propose to update the doc page https://spark.apache.org/docs/latest/sql-ref-literals.html#interval-literal, and describe formats of ANSI interval literals. <img width="1032" alt="Screenshot 2021-08-11 at 10 31 36" src="https://user-images.githubusercontent.com/1580697/128988454-7a6ac435-409b-4961-9b79-ebecfb141d5e.png"> <img width="1030" alt="Screenshot 2021-08-10 at 20 58 04" src="https://user-images.githubusercontent.com/1580697/128912018-a4ea3ee5-f252-49c7-a90e-5beaf7ac868f.png"> ### Why are the changes needed? To improve UX with Spark SQL, and inform users about recently added ANSI interval literals. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manually checked the generated docs: ``` $ SKIP_API=1 SKIP_RDOC=1 SKIP_PYTHONDOC=1 SKIP_SCALADOC=1 bundle exec jekyll build ``` Closes #33693 from MaxGekk/doc-ansi-interval-literals. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Max Gekk <max.gekk@gmail.com> (cherry picked from commit `bbf988bd73`) Signed-off-by: Max Gekk <max.gekk@gmail.com>	2021-08-11 13:38:52 +03:00

3267 commits