### What changes were proposed in this pull request?
When `spark.shuffle.useOldFetchProtocol` is enabled, switch off direct disk reading of host-local shuffle blocks and fall back to remote block fetching, thereby avoiding the `GetLocalDirsForExecutors` block transfer message introduced in Spark 3.0.0.
### Why are the changes needed?
`[SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host` introduced a new block transfer message, `GetLocalDirsForExecutors`. This message can be sent to the external shuffle service, but it is not supported by previous versions of the external shuffle service, so it must be avoided when `spark.shuffle.useOldFetchProtocol` is true.
In the migration guide I changed the documented exception type, since `org.apache.spark.network.shuffle.protocol.BlockTransferMessage.Decoder#fromByteBuffer`
throws an IllegalArgumentException with the given text and refers to the message type only by its numeric (byte) identifier. I have checked that this is true for version 2.4.4 too.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
This specific case (the extra boolean that switches off the host-local disk reading feature) is not covered by a dedicated test, but the existing tests were run.
Closes#26869 from attilapiros/SPARK-30235.
Authored-by: "attilapiros" <piros.attila.zsolt@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
### What changes were proposed in this pull request?
Use `TypeCoercion.findWiderTypeForTwo()` instead of `TypeCoercion.findTightestCommonType()` while preprocessing `inputTypes` in `ArrayContains`.
### Why are the changes needed?
`TypeCoercion.findWiderTypeForTwo()` also handles cases for DecimalType.
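A small, hedged illustration of the kind of query this affects (assumes a running SparkSession named `spark`; the literals are made up):
```scala
// The array elements and the search value have different decimal precision/scale here.
// findTightestCommonType has no common type for them, while findWiderTypeForTwo can
// widen both sides to a common DecimalType, so the expression resolves cleanly.
spark.sql("SELECT array_contains(array(1.10, 2.20), 1.1)").show()
```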
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Test cases to be added.
Closes#26811 from amanomer/29600.
Authored-by: Aman Omer <amanomer1996@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
This reverts commit 709387d660.
See https://issues.apache.org/jira/browse/SPARK-27300?focusedCommentId=16990048&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16990048 and previous mailing list discussions.
### What changes were proposed in this pull request?
Revert the addition of skeleton graph API modules for Spark 3.0.
### Why are the changes needed?
It does not appear that content will be added to the module for Spark 3, so I propose avoiding committing to the modules, which are no-ops now, in the upcoming major 3.0 release.
### Does this PR introduce any user-facing change?
No, the modules were not released.
### How was this patch tested?
Existing tests, but mostly N/A.
Closes#26928 from srowen/Revert27300.
Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
It's an obvious bug: currently, when analyzing partition stats, we compare the newly computed partition stats against the old table-level stats (instead of the old partition stats) to decide whether to update the stats or not.
### Why are the changes needed?
bug fix
### Does this PR introduce any user-facing change?
no
### How was this patch tested?
add new tests
Closes#26908 from wzhfy/failto_update_part_stats.
Authored-by: Zhenhua Wang <wzh_zju@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
A followup for #26699: clear the needless size field for the interval column cache, which reduces the memory cost.
### Why are the changes needed?
followup
### Does this PR introduce any user-facing change?
no
### How was this patch tested?
existing ut.
Closes#26906 from yaooqinn/SPARK-30066-f.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
### What changes were proposed in this pull request?
Refactor test cases added by https://github.com/apache/spark/pull/26714, to improve code compactness.
### How was this patch tested?
Tested locally.
Closes#26916 from jiangxb1987/SPARK-25100.
Authored-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Currently Spark uses `ObjectInspectorCopyOption.JAVA` as the object inspector copy option, which converts any string to a UTF-8 string. When writing non-UTF-8 data, the replacement bytes `EFBFBD` appear.
We should use `ObjectInspectorCopyOption.DEFAULT` so the bytes are passed through unchanged.
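A minimal sketch of the kind of change described, using the standard Hive `ObjectInspectorUtils` API (this is illustrative, not the exact Spark diff):
```scala
import org.apache.hadoop.hive.serde2.objectinspector.{ObjectInspector, ObjectInspectorUtils}
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.ObjectInspectorCopyOption

// Sketch only: `deserializerOI` stands in for the object inspector obtained from the
// table's deserializer. JAVA re-encodes strings as UTF-8 (invalid bytes become EFBFBD),
// while DEFAULT passes the raw bytes through unchanged.
def standardOI(deserializerOI: ObjectInspector): ObjectInspector =
  ObjectInspectorUtils.getStandardObjectInspector(deserializerOI, ObjectInspectorCopyOption.DEFAULT)
```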
### Why are the changes needed?
Here is the way to reproduce:
1. Create a file containing the hex bytes `AABBCC`, which are not valid UTF-8.
2. create table test1 (c string) location '$file_path';
3. select hex(c) from test1; // AABBCC
4. create table test2 (c string) as select c from test1;
5. select hex(c) from test2; // EFBFBDEFBFBDEFBFBD
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Closes#26831 from ulysses-you/SPARK-30201.
Authored-by: ulysses <youxiduo@weidian.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
This PR applies the current namespace for the single-part table name if the current catalog is a non-session catalog.
Note that the reason the current namespace is not applied for the session catalog is that the single-part name could be referencing a temp view which doesn't belong to any namespaces. The empty namespace for a table inside the session catalog is resolved by the session catalog implementation.
### Why are the changes needed?
It's fixing the following bug where the current namespace is not respected:
```
sql("CREATE TABLE testcat.ns.t USING foo AS SELECT 1 AS id")
sql("USE testcat.ns")
sql("SHOW CURRENT NAMESPACE").show
+-------+---------+
|catalog|namespace|
+-------+---------+
|testcat|       ns|
+-------+---------+
// `t` is not resolved since the current namespace `ns` is not used.
sql("DESCRIBE t").show
Failed to analyze query: org.apache.spark.sql.AnalysisException: Table not found: t;;
```
### Does this PR introduce any user-facing change?
Yes, the above `DESCRIBE` command will succeed.
### How was this patch tested?
Added tests.
Closes#26894 from imback82/current_namespace.
Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
1. Revert "Preparing development version 3.0.1-SNAPSHOT": 56dcd79
2. Revert "Preparing Spark release v3.0.0-preview2-rc2": c216ef1
### Why are the changes needed?
Shouldn't change master.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
manual test:
https://github.com/apache/spark/compare/5de5e46..wangyum:revert-master
Closes #26915 from wangyum/revert-master.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Yuming Wang <wgyumg@gmail.com>
### What changes were proposed in this pull request?
This PR fixes the incorrect PySpark version when releasing preview versions.
### Why are the changes needed?
Failed to make Spark binary distribution:
```
cp: cannot stat 'spark-3.0.0-preview2-bin-hadoop2.7/python/dist/pyspark-3.0.0.dev02.tar.gz': No such file or directory
gpg: can't open 'pyspark-3.0.0.dev02.tar.gz': No such file or directory
gpg: signing failed: No such file or directory
gpg: pyspark-3.0.0.dev02.tar.gz: No such file or directory
```
```
yumwang@ubuntu-3513086:~/spark-release/output$ ll spark-3.0.0-preview2-bin-hadoop2.7/python/dist/
total 214140
drwxr-xr-x 2 yumwang stack 4096 Dec 16 06:17 ./
drwxr-xr-x 9 yumwang stack 4096 Dec 16 06:17 ../
-rw-r--r-- 1 yumwang stack 219267173 Dec 16 06:17 pyspark-3.0.0.dev2.tar.gz
```
```
/usr/local/lib/python3.6/dist-packages/setuptools/dist.py:476: UserWarning: Normalizing '3.0.0.dev02' to '3.0.0.dev2'
normalized_version,
```
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
manual test:
```
LM-SHC-16502798:spark yumwang$ SPARK_VERSION=3.0.0-preview2
LM-SHC-16502798:spark yumwang$ echo "$SPARK_VERSION" | sed -e "s/-/./" -e "s/SNAPSHOT/dev0/" -e "s/preview/dev/"
3.0.0.dev2
```
Closes#26909 from wangyum/SPARK-30268.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
In the PR, I propose to move all tests that use deprecated Spark APIs to separate test classes, and add the annotation:
```scala
deprecated("This test suite will be removed.", "3.0.0")
```
The annotation suppresses warnings from already deprecated methods and classes.
### Why are the changes needed?
The warnings about deprecated Spark APIs in tests do not indicate any issues because the tests use such APIs intentionally. Eliminating these warnings makes it easier to spot other warnings that could point to real problems.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
By existing test suites and by
- DeprecatedAvroFunctionsSuite
- DeprecatedDateFunctionsSuite
- DeprecatedDatasetAggregatorSuite
- DeprecatedStreamingAggregationSuite
- DeprecatedWholeStageCodegenSuite
Closes#26885 from MaxGekk/eliminate-deprecate-warnings.
Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
Expose `gaussians` in PySpark.
### Why are the changes needed?
A `GaussianMixtureModel` contains two sets of coefficients: `weights` and `gaussians`. However, `gaussians` is not exposed on the Python side.
### Does this PR introduce any user-facing change?
Yes. `GaussianMixtureModel.gaussians` is exposed in PySpark.
### How was this patch tested?
add doctest
Closes#26882 from huaxingao/spark-30247.
Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
Even if we set spark.history.fs.numReplayThreads to a large number, such as 30,
the history server still replays logs slowly.
We found that if there is a straggler in a batch of replay tasks, all the other threads will wait for this
straggler.
In this PR, we record the logs which are being replayed in a `processing` collection,
so that the replay tasks can execute asynchronously.
This accelerates log replay for the history server.
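A hedged sketch of the idea (names are assumptions, not the actual FsHistoryProvider code): remember which logs are currently being replayed so a scan pass can skip them and move on instead of waiting for a slow straggler in the batch.
```scala
import java.util.concurrent.ConcurrentHashMap

// Illustrative only: a concurrent set of in-flight log paths.
object ReplayTracker {
  private val processing = ConcurrentHashMap.newKeySet[String]()

  def tryStart(logPath: String): Boolean = processing.add(logPath) // false if already in flight
  def finish(logPath: String): Unit = { processing.remove(logPath); () }
}
```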
No.
UT.
Closes#25797 from turboFei/SPARK-29043.
Authored-by: turbofei <fwang12@ebay.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
### What changes were proposed in this pull request?
SPARK-30209 discusses adding additional metrics such as stageId, attemptId and taskId for max metrics. The required data is already available in LiveStageMetrics; we need to capture it and pass these metrics on for display in the UI. To minimize the memory used, we save the maximum of each metric id per stage, so the additional memory usage per stage is (#metrics * 4 * sizeof(Long)).
Then the max for each metric id is calculated across all stages and passed to the stringValue method, so the memory used is minimal. Ran the benchmark for runtime: Stage.Proc time has increased to around 1.5-2.5x, but the Aggregate time has decreased.
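A hedged sketch of the bookkeeping described above (the names are assumptions, not the actual LiveStageMetrics fields): per stage, keep only the running maximum of each metric id together with the task that produced it, so the extra memory is a few Longs per metric.
```scala
import scala.collection.mutable

final case class MaxMetric(value: Long, stageId: Int, attemptId: Int, taskId: Long)

class StageMaxTracker(stageId: Int, attemptId: Int) {
  private val maxPerMetric = mutable.Map.empty[Long, MaxMetric]

  // Keep only the maximum value seen so far for each metric id in this stage.
  def update(metricId: Long, value: Long, taskId: Long): Unit = {
    if (maxPerMetric.get(metricId).forall(_.value < value)) {
      maxPerMetric(metricId) = MaxMetric(value, stageId, attemptId, taskId)
    }
  }
}
```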
### Why are the changes needed?
These additional metrics stageId, attemptId and taskId could help in debugging the jobs quicker. For a given operator, it will be easy to identify the task which is taking maximum time to complete from the SQL tab itself.
### Does this PR introduce any user-facing change?
Yes. stageId, attemptId and taskId are shown only for executor-side metrics. For driver metrics, "(driver)" is displayed on the UI.
![image (3)](https://user-images.githubusercontent.com/50492963/70763041-929d9980-1d07-11ea-940f-88ac6bdce9b5.png)
"Driver"
![image (4)](https://user-images.githubusercontent.com/50492963/70763043-94675d00-1d07-11ea-95ab-3478728cb435.png)
### How was this patch tested?
Manually tested, ran benchmark script for runtime.
Closes#26843 from nartal1/SPARK-30209.
Authored-by: Niranjan Artal <nartal@nvidia.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
### What changes were proposed in this pull request?
Include `$SPARK_DIST_CLASSPATH` in the class path when launching `CoarseGrainedExecutorBackend` on Kubernetes executors using the provided `entrypoint.sh`.
### Why are the changes needed?
For user provided Hadoop, `$SPARK_DIST_CLASSPATH` contains the required jars.
### Does this PR introduce any user-facing change?
no
### How was this patch tested?
Tested on Kubernetes 1.14, Spark 2.4.4, Hadoop 3.2.1. Adding `$SPARK_DIST_CLASSPATH` to the `-cp` param of entrypoint.sh enables launching the executors correctly.
Closes#26493 from sshakeri/master.
Authored-by: Shahin Shakeri <shahin.shakeri@pwc.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
### What changes were proposed in this pull request?
This PR adds the documentation of the new `mode` added to `Dataset.explain`.
### Why are the changes needed?
To let users know the new modes.
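For reference, a small usage example of the documented API (assumes a running SparkSession named `spark`; in Spark 3.0 the supported mode strings are "simple", "extended", "codegen", "cost" and "formatted"):
```scala
// Print the plan in different formats via the new string-based explain modes.
val df = spark.range(10).filter("id > 5")
df.explain("formatted")
df.explain("cost")
```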
### Does this PR introduce any user-facing change?
No (doc-only change).
### How was this patch tested?
Manually built the doc:
![Screen Shot 2019-12-16 at 3 34 28 PM](https://user-images.githubusercontent.com/6477701/70884617-d64f1680-2019-11ea-9336-247ade7f8768.png)
Closes#26903 from HyukjinKwon/SPARK-30200-doc.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
update DS v2 API to support add/alter column with column position
### Why are the changes needed?
We have a parser rule for column position, but we fail the query if it's specified, because the builtin catalog can't support add/alter column with column position.
Since we have the catalog plugin API now, we should let the catalog implementation decide whether it supports column position or not.
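A hedged illustration of the column-position syntax discussed above (`testcat` is an assumed v2 catalog configured in the session; whether the position is honored is now up to the catalog implementation):
```scala
// Add a column at a specific position, then move an existing column to the front.
spark.sql("ALTER TABLE testcat.ns.tbl ADD COLUMN middle_name STRING AFTER first_name")
spark.sql("ALTER TABLE testcat.ns.tbl ALTER COLUMN middle_name FIRST")
```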
### Does this PR introduce any user-facing change?
not yet
### How was this patch tested?
new tests
Closes#26817 from cloud-fan/parser.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
As discussed in https://github.com/apache/spark/pull/26741#discussion_r357504518, `LookupCatalog.AsTemporaryViewIdentifier` is no longer used and can be removed.
### Why are the changes needed?
Code clean up
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Removed tests that solely tested the `AsTemporaryViewIdentifier` extractor.
Closes#26897 from imback82/30104-followup.
Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
This PR proposes to fix the documentation for the slide function: it fixes a spacing issue and adds some parameter-related info.
### Why are the changes needed?
Documentation improvement
### Does this PR introduce any user-facing change?
No (doc-only change).
### How was this patch tested?
Manually tested by documentation build.
Closes#26896 from bboutkov/pyspark_doc_fix.
Authored-by: Boris Boutkov <boris.boutkov@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR mainly targets:
1. Expose only explain(mode: String) in Scala side
2. Clean up related codes
- Hide `ExplainMode` under the private `execution` package. No particular reason, just because `ExplainUtils` exists there.
- Use the `case object` + `trait` pattern in `ExplainMode`, mirroring `ParseMode`.
- Move `Dataset.toExplainString` to `QueryExecution.explainString`, mirroring `QueryExecution.simpleString`, and deduplicate the code at `ExplainCommand`.
- Use `ExplainMode` in `ExplainCommand` too.
- Add `explainString` to `PythonSQLUtils` to avoid unexpected PySpark test failures while refactoring the Scala side.
### Why are the changes needed?
To minimise the exposed APIs, deduplicate, and clean up.
### Does this PR introduce any user-facing change?
`Dataset.explain(mode: ExplainMode)` will be removed (which only exists in master).
### How was this patch tested?
Manually tested; existing tests should cover this.
Closes#26898 from HyukjinKwon/SPARK-30200-followup.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR adds support for not adding commits to the master branch when releasing preview versions.
### Why are the changes needed?
Currently we need to manually revert such changes, for example:
![image](https://user-images.githubusercontent.com/5399861/70788945-f9d15180-1dcc-11ea-81f5-c0d89c28440a.png)
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
manual test
Closes#26879 from wangyum/SPARK-30253.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Yuming Wang <wgyumg@gmail.com>
### What changes were proposed in this pull request?
In the PR, I propose to replace `setJacksonOptions()` in `JSONOptions` with `buildJsonFactory()`, which builds a `JsonFactory` using `JsonFactoryBuilder`. This avoids using the **deprecated** feature configurations from `JsonParser.Feature`.
### Why are the changes needed?
- The changes eliminate the following compilation warnings in `sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala`:
```
Warning:Warning:line (137)Java enum ALLOW_NUMERIC_LEADING_ZEROS in Java enum Feature is deprecated: see corresponding Javadoc for more information.
factory.configure(JsonParser.Feature.ALLOW_NUMERIC_LEADING_ZEROS, allowNumericLeadingZeros)
Warning:Warning:line (138)Java enum ALLOW_NON_NUMERIC_NUMBERS in Java enum Feature is deprecated: see corresponding Javadoc for more information.
factory.configure(JsonParser.Feature.ALLOW_NON_NUMERIC_NUMBERS, allowNonNumericNumbers)
Warning:Warning:line (139)Java enum ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER in Java enum Feature is deprecated: see corresponding Javadoc for more information.
factory.configure(JsonParser.Feature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER,
Warning:Warning:line (141)Java enum ALLOW_UNQUOTED_CONTROL_CHARS in Java enum Feature is deprecated: see corresponding Javadoc for more information.
factory.configure(JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS, allowUnquotedControlChars)
```
- This puts building the `JsonFactory` and setting the options from `JSONOptions` in one place, so we will not forget to call `setJacksonOptions` in the future.
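A hedged sketch of the builder-based approach (the option values here are illustrative, not the full set handled by `JSONOptions`); the non-deprecated `JsonReadFeature` flags replace the deprecated `JsonParser.Feature` ones:
```scala
import com.fasterxml.jackson.core.JsonFactoryBuilder
import com.fasterxml.jackson.core.json.JsonReadFeature

// Build a JsonFactory with the read features configured up front instead of
// mutating a factory afterwards via the deprecated JsonParser.Feature enum.
val factory = new JsonFactoryBuilder()
  .configure(JsonReadFeature.ALLOW_LEADING_ZEROS_FOR_NUMBERS, true) // was ALLOW_NUMERIC_LEADING_ZEROS
  .configure(JsonReadFeature.ALLOW_NON_NUMERIC_NUMBERS, true)
  .configure(JsonReadFeature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER, true)
  .configure(JsonReadFeature.ALLOW_UNESCAPED_CONTROL_CHARS, true)   // was ALLOW_UNQUOTED_CONTROL_CHARS
  .build()
```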
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
By `JsonSuite`, `JsonFunctionsSuite`, `JsonExpressionsSuite`.
Closes#26797 from MaxGekk/eliminate-warning.
Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
This PR adds [a GitHub workflow to automatically close stale PRs](https://github.com/marketplace/actions/close-stale-issues).
### Why are the changes needed?
This will help cut down the number of open but stale PRs and keep the PR queue manageable.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
I'm not sure how to test this PR without impacting real PRs on the repo.
See: https://github.com/actions/stale/issues/32
Closes #26877 from nchammas/SPARK-30173-stale-prs.
Authored-by: Nicholas Chammas <nicholas.chammas@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
The PR adds a new config option to configure an address for the
proxy server, and a new handler that intercepts redirects and replaces
the URL with one pointing at the proxy server. This is needed on top
of the "proxy base path" support because redirects use full URLs, not
just absolute paths from the server's root.
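An illustrative sketch only, not the actual handler added here: rewrite a redirect's Location so that the scheme, host and port come from the configured proxy address while the path and query of the original target are preserved.
```scala
import java.net.URI

// Replace the authority of the redirect target with the proxy's authority.
def rewriteRedirect(location: String, proxyUri: URI): String = {
  val target = URI.create(location)
  new URI(proxyUri.getScheme, proxyUri.getAuthority,
    target.getPath, target.getQuery, target.getFragment).toString
}
```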
### Why are the changes needed?
Spark's web UI has support for generating links to paths with a
prefix, to support a proxy server, but those do not apply when
the UI is responding with redirects. In that case, Spark is sending
its own URL back to the client, and if it's behind a dumb proxy
server that doesn't do rewriting (like when using stunnel for HTTPS
support) then the client will see the wrong URL and may fail.
### Does this PR introduce any user-facing change?
Yes. It's a new UI option.
### How was this patch tested?
Tested with added unit test, with Spark behind stunnel, and in a
more complicated app using a different HTTPS proxy.
Closes#26873 from vanzin/SPARK-30240.
Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
Fix a bug: when invoking saveAsNewAPIHadoopDataset to store data, the job fails because the TaskCommitMessage class hasn't been registered, if the serializer is KryoSerializer and spark.kryo.registrationRequired is true.
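A hedged repro sketch (paths and values are illustrative): before the fix, a job like this failed with a "Class is not registered" Kryo error for the task commit message class, because spark.kryo.registrationRequired=true makes registration mandatory.
```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.output.{FileOutputFormat, TextOutputFormat}
import org.apache.spark.sql.SparkSession

// Kryo with mandatory registration, as described above.
val spark = SparkSession.builder()
  .master("local[2]")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .config("spark.kryo.registrationRequired", "true")
  .getOrCreate()

val job = Job.getInstance()
job.setOutputKeyClass(classOf[Text])
job.setOutputValueClass(classOf[Text])
job.setOutputFormatClass(classOf[TextOutputFormat[Text, Text]])
FileOutputFormat.setOutputPath(job, new Path("/tmp/kryo-out"))

// Writing through the new Hadoop API triggers serialization of the commit message.
spark.sparkContext
  .parallelize(Seq(("a", "1"), ("b", "2")))
  .map { case (k, v) => (new Text(k), new Text(v)) }
  .saveAsNewAPIHadoopDataset(job.getConfiguration)
```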
## How was this patch tested?
UT
Closes#26714 from deshanxiao/SPARK-25100.
Authored-by: xiaodeshan <xiaodeshan@xiaomi.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Fix a bug: CREATE TABLE throws an error when the session catalog is specified explicitly.
### Why are the changes needed?
Currently, Spark throws an error when the session catalog is specified explicitly in the "CREATE TABLE" and "CREATE TABLE AS SELECT" commands, e.g.:
> CREATE TABLE spark_catalog.tbl USING json AS SELECT 1 AS i;
the error message is like below:
> 19/12/14 10:56:08 INFO HiveMetaStore: 0: get_table : db=spark_catalog tbl=tbl
> 19/12/14 10:56:08 INFO audit: ugi=fuwhu ip=unknown-ip-addr cmd=get_table : db=spark_catalog tbl=tbl
> 19/12/14 10:56:08 INFO HiveMetaStore: 0: get_database: spark_catalog
> 19/12/14 10:56:08 INFO audit: ugi=fuwhu ip=unknown-ip-addr cmd=get_database: spark_catalog
> 19/12/14 10:56:08 WARN ObjectStore: Failed to get database spark_catalog, returning NoSuchObjectException
> Error in query: Database 'spark_catalog' not found;
### Does this PR introduce any user-facing change?
Yes, after this PR, "CREATE TABLE" and "CREATE TABLE AS SELECT" can complete successfully when the session catalog "spark_catalog" is specified explicitly.
### How was this patch tested?
New unit tests added.
Closes#26887 from fuwhu/SPARK-30259.
Authored-by: fuwhu <bestwwg@163.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This pr is a followup of #26861 to address minor comments from viirya.
### Why are the changes needed?
For better error messages.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Manually tested.
Closes#26886 from maropu/SPARK-30231-FOLLOWUP.
Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
The value of non-Spark config properties ignored in spark-submit is no longer logged.
### Why are the changes needed?
The value isn't really needed in the logs, and could contain potentially sensitive info. While we can redact the values selectively too, I figured it's more robust to just not log them at all here, as the values aren't important in this log statement.
### Does this PR introduce any user-facing change?
Other than the change to logging above, no.
### How was this patch tested?
Existing tests
Closes#26893 from srowen/SPARK-30263.
Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Columnar execution support for interval types
### Why are the changes needed?
Support caching tables with interval columns.
It improves performance too.
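A small usage example of what this enables (the query is illustrative; assumes a running SparkSession named `spark`):
```scala
// Caching a DataFrame that contains an interval column now goes through the
// columnar in-memory cache instead of being rejected.
val df = spark.sql("SELECT id, INTERVAL 1 DAY AS i FROM range(10)")
df.cache()
df.count() // materializes the cache, including the interval column
```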
### Does this PR introduce any user-facing change?
Yes, caching a table with interval columns is now accepted.
### How was this patch tested?
add ut
Closes#26699 from yaooqinn/SPARK-30066.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Add a timeout configuration for StreamingQuery.stop()
### Why are the changes needed?
The stop() method on a Streaming Query awaits the termination of the stream execution thread. However, the stream execution thread may block forever depending on the streaming source implementation (like in Kafka, which runs UninterruptibleThreads).
This causes control flow applications to hang indefinitely as well. We'd like to introduce a timeout to stop the execution thread, so that the control flow thread can decide to do an action if a timeout is hit.
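A hedged example of the behavior described (the exact config key is an assumption based on this description, not a confirmed name): with a stop timeout configured, stop() throws a TimeoutException instead of blocking forever when the stream execution thread cannot be stopped in time.
```scala
// Assumed config key for the new timeout; treat it as illustrative.
spark.conf.set("spark.sql.streaming.stopTimeout", "10s")

val query = spark.readStream.format("rate").load()
  .writeStream.format("console").start()

try {
  query.stop()
} catch {
  case e: java.util.concurrent.TimeoutException =>
    // The control-flow application can now decide what to do: retry, alert, or fail.
    println(s"Stream did not stop within the timeout: ${e.getMessage}")
}
```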
### Does this PR introduce any user-facing change?
By default, no. If the timeout configuration is set, then a TimeoutException will be thrown if a stream cannot be stopped within the given timeout.
### How was this patch tested?
Unit tests
Closes#26771 from brkyvz/stopTimeout.
Lead-authored-by: Burak Yavuz <brkyvz@gmail.com>
Co-authored-by: Burak Yavuz <burak@databricks.com>
Signed-off-by: Burak Yavuz <brkyvz@gmail.com>
### What changes were proposed in this pull request?
In the current implementation of `SparkShellLoggingFilter`, if the log level of the root logger and the log level of a message are different, whether a message should be logged is decided based on log4j's configuration, but whether the message should be output to the REPL's console is not considered.
So, if the log level of the root logger is `DEBUG`, the log level of the REPL's logger is `WARN` and the log level of a message is `INFO`, the message will be output to the REPL's console even though `INFO < WARN`.
https://github.com/apache/spark/pull/26798/files#diff-bfd5810d8aa78ad90150e806d830bb78L237
The ideal behavior should be as follows, and this change implements it.
1. If the log level of a message is greater than or equal to the log level of the root logger, the message should be logged but whether the message is output to the REPL's console should be decided based on whether the log level of the message is greater than or equal to the log level of the REPL's logger.
2. If a log level or custom appenders are explicitly defined for a category, whether a log message via the logger corresponding to the category is logged and output to the REPL's console should be decided based on the log level of the category.
We can confirm whether a log level or appenders are explicitly set to a logger for a category by `Logger#getLevel` and `Logger#getAllAppenders.hasMoreElements`.
### Why are the changes needed?
This is a bug that breaks compatibility.
#9816 enabled the REPL's log4j configuration to override the root logger, but #23675 seems to have broken that feature.
You can see one example when you modify the default log4j configuration as follows.
```
# Change the log level for rootCategory to DEBUG
log4j.rootCategory=DEBUG, console
...
# The log level for repl.Main remains WARN
log4j.logger.org.apache.spark.repl.Main=WARN
```
If you launch the REPL with this configuration, INFO level logs appear even though the log level for the REPL is WARN.
```
・・・
19/12/08 23:31:38 INFO Utils: Successfully started service 'sparkDriver' on port 33083.
19/12/08 23:31:38 INFO SparkEnv: Registering MapOutputTracker
19/12/08 23:31:38 INFO SparkEnv: Registering BlockManagerMaster
19/12/08 23:31:38 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/12/08 23:31:38 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/12/08 23:31:38 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
・・・
```
Before #23675 was applied, those INFO level logs were not shown with the same log4j.properties.
### Does this PR introduce any user-facing change?
Yes. The logging behavior for REPL is fixed.
### How was this patch tested?
Manual test and newly added unit test.
Closes#26798 from sarutak/fix-spark-shell-loglevel.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
### What changes were proposed in this pull request?
Adding tooltips to the Stages tab for better usability.
### Why are the changes needed?
There are a few common points of confusion in the UI that could be clarified with tooltips. We
should add tooltips to explain.
### Does this PR introduce any user-facing change?
Yes
![image](https://user-images.githubusercontent.com/29914590/70693889-5a389400-1ce4-11ea-91bb-ee1e997a5c35.png)
### How was this patch tested?
Manual
Closes#26859 from sharangk/tooltip1.
Authored-by: sharan.gk <sharan.gk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
- Reverts commits 1f94bf4 and d6be46e
- Switches python to python3 in Docker release image.
### Why are the changes needed?
`dev/make-distribution.sh` and `python/setup.py` use python3.
https://github.com/apache/spark/pull/26844/files#diff-ba2c046d92a1d2b5b417788bfb5cb5f8L236
https://github.com/apache/spark/pull/26330/files#diff-8cf6167d58ce775a08acafcfe6f40966
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
manual test:
```
yumwang@ubuntu-3513086:~/spark$ dev/create-release/do-release-docker.sh -n -d /home/yumwang/spark-release
Output directory already exists. Overwrite and continue? [y/n] y
Branch [branch-2.4]: master
Current branch version is 3.0.0-SNAPSHOT.
Release [3.0.0]: 3.0.0-preview2
RC # [1]:
This is a dry run. Please confirm the ref that will be built for testing.
Ref [master]:
ASF user [yumwang]:
Full name [Yuming Wang]:
GPG key [yumwang@apache.org]: DBD447010C1B4F7DAD3F7DFD6E1B4122F6A3A338
================
Release details:
BRANCH: master
VERSION: 3.0.0-preview2
TAG: v3.0.0-preview2-rc1
NEXT: 3.0.1-SNAPSHOT
ASF USER: yumwang
GPG KEY: DBD447010C1B4F7DAD3F7DFD6E1B4122F6A3A338
FULL NAME: Yuming Wang
E-MAIL: yumwang@apache.org
================
Is this info correct [y/n]? y
GPG passphrase:
========================
= Building spark-rm image with tag latest...
Command: docker build -t spark-rm:latest --build-arg UID=110302528 /home/yumwang/spark/dev/create-release/spark-rm
Log file: docker-build.log
Building v3.0.0-preview2-rc1; output will be at /home/yumwang/spark-release/output
gpg: directory '/home/spark-rm/.gnupg' created
gpg: keybox '/home/spark-rm/.gnupg/pubring.kbx' created
gpg: /home/spark-rm/.gnupg/trustdb.gpg: trustdb created
gpg: key 6E1B4122F6A3A338: public key "Yuming Wang <yumwang@apache.org>" imported
gpg: key 6E1B4122F6A3A338: secret key imported
gpg: Total number processed: 1
gpg: imported: 1
gpg: secret keys read: 1
gpg: secret keys imported: 1
========================
= Creating release tag v3.0.0-preview2-rc1...
Command: /opt/spark-rm/release-tag.sh
Log file: tag.log
It may take some time for the tag to be synchronized to github.
Press enter when you've verified that the new tag (v3.0.0-preview2-rc1) is available.
========================
= Building Spark...
Command: /opt/spark-rm/release-build.sh package
Log file: build.log
========================
= Building documentation...
Command: /opt/spark-rm/release-build.sh docs
Log file: docs.log
========================
= Publishing release
Command: /opt/spark-rm/release-build.sh publish-release
Log file: publish.log
```
Generated doc:
![image](https://user-images.githubusercontent.com/5399861/70693075-a7723100-1cf7-11ea-9f88-9356a02349a1.png)
Closes#26848 from wangyum/SPARK-30216.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
This reverts commit cada5beef7.
Closes#26883 from gengliangwang/revert.
Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
If a table name is qualified with session catalog name `spark_catalog`, the `DROP TABLE` command fails.
For example, the following
```
sql("CREATE TABLE tbl USING json AS SELECT 1 AS i")
sql("DROP TABLE spark_catalog.tbl")
```
fails with:
```
org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'spark_catalog' not found;
at org.apache.spark.sql.catalyst.catalog.ExternalCatalog.requireDbExists(ExternalCatalog.scala:42)
at org.apache.spark.sql.catalyst.catalog.ExternalCatalog.requireDbExists$(ExternalCatalog.scala:40)
at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.requireDbExists(InMemoryCatalog.scala:45)
at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.tableExists(InMemoryCatalog.scala:336)
```
This PR correctly resolves `spark_catalog` as a catalog.
### Why are the changes needed?
It's fixing a bug.
### Does this PR introduce any user-facing change?
Yes, now, the `spark_catalog.tbl` in the above example is dropped as expected.
### How was this patch tested?
Added a test.
Closes#26878 from imback82/fix_drop_table.
Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
This PR intends to support the explain modes implemented in #26829 for PySpark.
### Why are the changes needed?
For better debugging info in PySpark DataFrames.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Added UTs.
Closes#26861 from maropu/ExplainModeInPython.
Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This patch adds a close() method to the DataWriter interface, which becomes the place to clean up resources.
### Why are the changes needed?
The lifecycle of a DataWriter instance ends at either commit() or abort(). That makes data source implementors feel they can place resource cleanup on either side, but abort() can be called when commit() fails, so they have to ensure they don't double-clean-up if the cleanup is not idempotent.
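A hedged sketch of the pattern this enables (the writer and its output are illustrative, not a real source): commit() and abort() deal only with the commit protocol, while close() is the single, idempotent place that releases the underlying resource.
```scala
import java.io.BufferedWriter
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.write.{DataWriter, WriterCommitMessage}

class TextDataWriter(out: BufferedWriter, commitMsg: WriterCommitMessage)
    extends DataWriter[InternalRow] {
  override def write(record: InternalRow): Unit = out.write(record.getString(0))
  override def commit(): WriterCommitMessage = { out.flush(); commitMsg }
  override def abort(): Unit = ()          // nothing to roll back here; cleanup lives in close()
  override def close(): Unit = out.close() // called exactly once, after commit() or abort()
}
```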
### Does this PR introduce any user-facing change?
Depends on the definition of user; if they're developers of custom DSv2 source, they have to add close() in their DataWriter implementations. It's OK to just add close() with empty content as they should have already dealt with resource cleanup in commit/abort, but they would love to migrate the resource cleanup logic to close() as it avoids double cleanup. If they're just end users using the provided DSv2 source (regardless of built-in/3rd party), no change.
### How was this patch tested?
Existing tests.
Closes#26855 from HeartSaVioR/SPARK-30227.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
Add DropFunctionStatement and make DROP FUNCTION go through the same catalog/table resolution framework of v2 commands.
### Why are the changes needed?
It's important to make all the commands have the same name resolution behavior, to avoid confusion when running commands like
`DROP FUNCTION namespace.function`.
### Does this PR introduce any user-facing change?
Yes. When running `DROP FUNCTION namespace.function`, Spark fails the command if the current catalog is set to a v2 catalog.
### How was this patch tested?
Unit tests.
Closes#26854 from planga82/feature/SPARK-30040_DropFunctionV2Catalog.
Authored-by: Pablo Langa <soypab@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR exposes the existing logic for nested schema pruning to all sources, which is in line with the description of `SupportsPushDownRequiredColumns`.
Right now, `SchemaPruning` (rule, not helper utility) is applied in the optimizer directly on certain instances of `Table` ignoring `SupportsPushDownRequiredColumns` that is part of `ScanBuilder`. I think it would be cleaner to perform schema pruning and filter push-down in one place. Therefore, this PR moves all the logic into `V2ScanRelationPushDown`.
### Why are the changes needed?
This change allows all V2 data sources to benefit from nested column pruning (if they support it).
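An illustrative example (the catalog and table names are assumptions): when only a nested field is selected, a v2 source implementing `SupportsPushDownRequiredColumns` now receives a pruned read schema via `V2ScanRelationPushDown`.
```scala
// Only `address.city` is needed, so the source can read just that nested field.
val cities = spark.table("testcat.db.people").select("address.city")
cities.explain() // ReadSchema should list only the pruned struct field
```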
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
This PR mostly relies on existing tests. On top, it adds one test to verify that top-level schema pruning works as well as one test for predicates with subqueries.
Closes#26751 from aokolnychyi/nested-schema-pruning-ds-v2.
Authored-by: Anton Okolnychyi <aokolnychyi@apple.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
### What changes were proposed in this pull request?
Check the partition column data type and only allow string and integral types in hive partition pruning.
### Why are the changes needed?
Currently we only support string and integral types in Hive partition pruning, but the check is done only for literals. If the predicate is `InSet`, there is no literal, so we may pass an unsupported partition predicate to Hive and cause problems.
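A hedged illustration (the table and column are assumptions): with enough IN values the predicate is converted to an `InSet`, and before this fix a non string/integral partition column (e.g. a date) could slip past the literal-based check and reach the metastore.
```scala
// Enough values that the optimizer turns the IN list into an InSet predicate.
val manyDates = (1 to 20).map(d => f"'2019-12-$d%02d'").mkString(", ")
spark.sql(s"SELECT * FROM hive_part_table WHERE part_date IN ($manyDates)").collect()
```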
### Does this PR introduce any user-facing change?
Yes, this fixes a bug: a query that failed before can now run.
### How was this patch tested?
a new test
Closes#26871 from cloud-fan/bug.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Since [25001](https://github.com/apache/spark/pull/25001), Spark supports the LIKE ... ESCAPE syntax.
But '%' and '_' are reserved chars in the `Like` expression, so we cannot use them as the escape char.
### Why are the changes needed?
Avoid unexpected problems when using the LIKE ... ESCAPE syntax.
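A hedged illustration of the restriction (assumes a running SparkSession named `spark`): an ordinary character still works as the escape character, while the reserved wildcards '%' and '_' are rejected.
```scala
spark.sql("SELECT 'a_b' LIKE 'a#_b' ESCAPE '#'").show() // fine: '#' escapes the '_' wildcard
spark.sql("SELECT 'a_b' LIKE 'a%_b' ESCAPE '%'").show() // now fails: '%' cannot be the escape char
```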
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Add UT.
Closes#26860 from ulysses-you/SPARK-30230.
Authored-by: ulysses <youxiduo@weidian.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
### What changes were proposed in this pull request?
This PR proposes to add `PushedFilters` into metadata to show the pushed filters in Parquet DSv2 implementation. In case of ORC, it is already added at https://github.com/apache/spark/pull/24719/files#diff-0fc82694b20da3cd2cbb07206920eef7R62-R64
### Why are the changes needed?
To let users debug the pushed filters, and to match the ORC behavior.
### Does this PR introduce any user-facing change?
```scala
spark.range(10).write.mode("overwrite").parquet("/tmp/foo")
spark.read.parquet("/tmp/foo").filter("5 > id").explain()
```
**Before:**
```
== Physical Plan ==
*(1) Project [id#20L]
+- *(1) Filter (isnotnull(id#20L) AND (5 > id#20L))
+- *(1) ColumnarToRow
+- BatchScan[id#20L] ParquetScan Location: InMemoryFileIndex[file:/tmp/foo], ReadSchema: struct<id:bigint>
```
**After:**
```
== Physical Plan ==
*(1) Project [id#13L]
+- *(1) Filter (isnotnull(id#13L) AND (5 > id#13L))
+- *(1) ColumnarToRow
+- BatchScan[id#13L] ParquetScan Location: InMemoryFileIndex[file:/tmp/foo], ReadSchema: struct<id:bigint>, PushedFilters: [IsNotNull(id), LessThan(id,5)]
```
### How was this patch tested?
Unit tests were added and the change was manually tested.
Closes#26857 from HyukjinKwon/SPARK-30162.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>