ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
HyukjinKwon	d98c216e19	[SPARK-31960][YARN][DOCS][FOLLOW-UP] Document the behaviour change of Hadoop's classpath propagation in migration guide ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/28788, and proposes to update migration guide. ### Why are the changes needed? To tell users about the behaviour change. ### Does this PR introduce _any_ user-facing change? Yes, it updates migration guides for users. ### How was this patch tested? GitHub Actions' documentation build should test it. Closes #30903 from HyukjinKwon/SPARK-31960-followup. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-12-23 18:04:28 +09:00
Dongjoon Hyun	90d6f86001	[SPARK-33870][CORE] Enable spark.storage.replication.proactive by default ### What changes were proposed in this pull request? This PR aims to enable `spark.storage.replication.proactive` by default for Apache Spark 3.2.0. ### Why are the changes needed? `spark.storage.replication.proactive` is added by SPARK-15355 at Apache Spark 2.2.0 and has been helpful when the block manager loss occurs frequently like K8s environment. ### Does this PR introduce _any_ user-facing change? Yes, this will make the Spark jobs more robust. ### How was this patch tested? Pass the existing UTs. Closes #30876 from dongjoon-hyun/SPARK-33870. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-12-22 21:59:53 -08:00
Kent Yao	a3dd8dacee	[SPARK-33877][SQL] SQL reference documents for INSERT w/ a column list We support a column list of INSERT for Spark v3.1.0 (See: SPARK-32976 (https://github.com/apache/spark/pull/29893)). So, this PR targets at documenting it in the SQL documents. ### What changes were proposed in this pull request? improve doc ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? doc ### How was this patch tested? passing GA doc gen. ![image](https://user-images.githubusercontent.com/8326978/102954876-8994fa00-450f-11eb-81f9-931af6d1f69b.png) ![image](https://user-images.githubusercontent.com/8326978/102954900-99acd980-450f-11eb-9733-115ad37d2319.png) ![image](https://user-images.githubusercontent.com/8326978/102954935-af220380-450f-11eb-9aaa-fdae0725d41e.png) ![image](https://user-images.githubusercontent.com/8326978/102954949-bc3ef280-450f-11eb-8a0d-d7b688efa7bb.png) Closes #30888 from yaooqinn/SPARK-33877. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-12-22 19:46:37 -08:00
ulysses-you	bc46d273e0	[SPARK-33840][DOCS] Add spark.sql.files.minPartitionNum to performance tuning doc ### What changes were proposed in this pull request? Add `spark.sql.files.minPartitionNum` and its description to sql-performance-tuning.md. ### Why are the changes needed? Help users find it. ### Does this PR introduce _any_ user-facing change? Yes, it's the doc. ### How was this patch tested? Pass CI. Closes #30838 from ulysses-you/SPARK-33840. Authored-by: ulysses-you <ulyssesyou18@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-12-18 20:27:16 +09:00
Liang-Chi Hsieh	42e1831ebb	[SPARK-33797][SS][DOCS] Update SS doc about State Store and task locality ### What changes were proposed in this pull request? This updates SS documentation to document about State Store and task locality. ### Why are the changes needed? During running some tests for structured streaming, I found state store locality becomes an issue sometimes and it is not very straightforward for end-users. It'd be great if we can document it. ### Does this PR introduce _any_ user-facing change? No, only doc change. ### How was this patch tested? No, only doc change. Closes #30789 from viirya/ss-statestore-doc. Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>	2020-12-18 10:48:51 +09:00
Gengliang Wang	dd042f58e7	[SPARK-33796][DOCS] Show hidden text from the left menu of Spark Doc ### What changes were proposed in this pull request? If the text in the left menu of Spark is too long, it will be hidden. ![sql1](https://user-images.githubusercontent.com/1097932/102249583-5ae7a580-3eb7-11eb-813c-f2e2fe019d28.jpeg) This PR is to fix the style issue. ### Why are the changes needed? Improve the UI of Spark documentation. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manual test After changes: ![sql2](https://user-images.githubusercontent.com/1097932/102249603-5fac5980-3eb7-11eb-806d-4e7b8248e6b6.jpeg) Closes #30786 from gengliangwang/fixDocStyle. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-12-16 10:07:35 +09:00
David McWhorter	87c58367cd	[SPARK-22256][MESOS] Introduce spark.mesos.driver.memoryOverhead ### What changes were proposed in this pull request? This is a simple change to support allocating a specified amount of overhead memory for the driver's mesos container. This is already supported for executors. ### Why are the changes needed? This is needed to keep the driver process from exceeding memory limits and being killed off when running on mesos. ### Does this PR introduce _any_ user-facing change? Yes, it adds a `spark.mesos.driver.memoryOverhead` configuration option. Documentation changes for this option are included in the PR. ### How was this patch tested? Test cases covering allocation of driver memory overhead are included in the changes. ### Other notes This is a second attempt to get this change reviewed, accepted and merged. The original pull request was closed as stale back in January: https://github.com/apache/spark/pull/21006. For this pull request, I took the original change by pmackles, rebased it onto the current master branch, and added a test case that was requested in the original code review. I'm happy to make any further edits or do anything needed so that this can be included in a future spark release. I keep having to build custom spark distributions so that we can use spark within our mesos clusters. Closes #30739 from dmcwhorter/dmcwhorter-SPARK-22256. Lead-authored-by: David McWhorter <david_mcwhorter@premierinc.com> Co-authored-by: Paul Mackles <pmackles@adobe.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-12-15 14:00:38 -08:00
HyukjinKwon	a99a47ca1d	[SPARK-33748][K8S] Respect environment variables and configurations for Python executables ### What changes were proposed in this pull request? This PR proposes: - Respect `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables, or `spark.pyspark.python` and `spark.pyspark.driver.python` configurations in Kubernetes just like other cluster types in Spark. - Deprecate `spark.kubernetes.pyspark.pythonVersion` and guide users to set the environment variables and configurations for Python executables. NOTE that `spark.kubernetes.pyspark.pythonVersion` is already a no-op configuration without this PR. Default is `3` and other values are disallowed. - In order for Python executable settings to be consistently used, fix `spark.archives` option to unpack into the current working directory in the driver of Kubernetes' cluster mode. This behaviour is identical with Yarn's cluster mode. By doing this, users can leverage Conda or virtualenv in cluster mode as below: ```bash conda create -y -n pyspark_conda_env -c conda-forge pyarrow pandas conda-pack conda activate pyspark_conda_env conda pack -f -o pyspark_conda_env.tar.gz PYSPARK_PYTHON=./environment/bin/python spark-submit --archives pyspark_conda_env.tar.gz#environment app.py ``` - Remove unused code such as `extractS3Key` and `renameResourcesToLocalFS` ### Why are the changes needed? - To provide consistent support of PySpark by using `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables, or `spark.pyspark.python` and `spark.pyspark.driver.python` configurations. - To provide Conda and virtualenv support via `spark.archives` options. ### Does this PR introduce _any_ user-facing change? Yes: - `spark.kubernetes.pyspark.pythonVersion` is deprecated. - `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables, and `spark.pyspark.python` and `spark.pyspark.driver.python` configurations are respected. ### How was this patch tested? 
Manually tested via: ```bash minikube delete minikube start --cpus 12 --memory 16384 kubectl create namespace spark-integration-test cat <<EOF | kubectl apply -f - apiVersion: v1 kind: ServiceAccount metadata: name: spark namespace: spark-integration-test EOF kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=spark-integration-test:spark --namespace=spark-integration-test dev/make-distribution.sh --pip --tgz -Pkubernetes resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh --spark-tgz `pwd`/spark-3.2.0-SNAPSHOT-bin-3.2.0.tgz --service-account spark --namespace spark-integration-test ``` Unit tests were also added. Closes #30735 from HyukjinKwon/SPARK-33748. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-12-15 08:56:45 +09:00
Linhong Liu	b7c8210135	[SPARK-33142][SPARK-33647][SQL][FOLLOW-UP] Add docs and test cases ### What changes were proposed in this pull request? Addressed comments in PR #30567, including: 1. add test case for SPARK-33647 and SPARK-33142 2. add migration guide 3. add `getRawTempView` and `getRawGlobalTempView` to return the raw view info (i.e. TemporaryViewRelation) 4. other minor code clean ### Why are the changes needed? Code clean and more test cases ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing and newly added test cases Closes #30666 from linhongliu-db/SPARK-33142-followup. Lead-authored-by: Linhong Liu <linhong.liu@databricks.com> Co-authored-by: Linhong Liu <67896261+linhongliu-db@users.noreply.github.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-12-14 08:31:50 +00:00
Gengliang Wang	6e862792fb	[SPARK-33723][SQL] ANSI mode: Casting String to Date should throw exception on parse error ### What changes were proposed in this pull request? Currently, when casting a string as timestamp type in ANSI mode, Spark throws a runtime exception on parsing error. However, the result for casting a string to date is always null. We should throw an exception on parsing error as well. ### Why are the changes needed? Add missing feature for ANSI mode ### Does this PR introduce _any_ user-facing change? Yes for ANSI mode, Casting string to date will throw an exception on parsing error ### How was this patch tested? Unit test Closes #30687 from gengliangwang/castDate. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-12-14 10:22:37 +09:00
Takeshi Yamamuro	8197ee3b15	[SPARK-33690][SQL] Escape meta-characters in showString ### What changes were proposed in this pull request? This PR intends to escape meta-characters (e.g., \n and \t) in `Dataset.showString`. Before this PR: ``` scala> Seq("aaa\nbbb\t\tccccc").toDF("value").show() +--------------+ | value| +--------------+ |aaa bbb ccccc| +--------------+ ``` After this PR: ``` +-----------------+ | value| +-----------------+ |aaa\nbbb\t\tccccc| +-----------------+ ``` ### Why are the changes needed? For better output. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added a unit test. Closes #30647 from maropu/EscapeMetaInShow. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-12-13 15:04:23 -08:00
Gengliang Wang	9959d49942	[SPARK-33719][DOC] Add make_date/make_timestamp/make_interval into the doc of ANSI Compliance ### What changes were proposed in this pull request? Add make_date/make_timestamp/make_interval into the doc of ANSI Compliance ### Why are the changes needed? Users can know that these functions throw runtime exceptions under ANSI mode if the result is not valid. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Build doc and check it in browser: ![image](https://user-images.githubusercontent.com/1097932/101608930-34a79e80-39bb-11eb-9294-9d9b8c3f6faa.png) Closes #30683 from gengliangwang/improveDoc. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-12-09 19:47:20 +09:00
Kent Yao	c88eddac3b	[SPARK-33641][SQL][DOC][FOLLOW-UP] Add migration guide for CHAR VARCHAR types ### What changes were proposed in this pull request? Add migration guide for CHAR VARCHAR types ### Why are the changes needed? for migration ### Does this PR introduce _any_ user-facing change? doc change ### How was this patch tested? passing ci Closes #30654 from yaooqinn/SPARK-33641-F. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-12-09 06:44:10 +00:00
Dongjoon Hyun	031c5ef280	[SPARK-33679][SQL] Enable spark.sql.adaptive.enabled by default ### What changes were proposed in this pull request? This PR aims to enable `spark.sql.adaptive.enabled` by default for Apache Spark 3.2.0. ### Why are the changes needed? By switching the default for Apache Spark 3.2, the whole community can focus more seriously on stabilizing this feature in various situations. ### Does this PR introduce _any_ user-facing change? Yes, but this is an improvement and it's supposed to have no bugs. ### How was this patch tested? Pass the CIs. Closes #30628 from dongjoon-hyun/SPARK-33679. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-12-07 23:10:35 -08:00
Dongjoon Hyun	de9818f043	[SPARK-33662][BUILD] Setting version to 3.2.0-SNAPSHOT ### What changes were proposed in this pull request? This PR aims to update `master` branch version to 3.2.0-SNAPSHOT. ### Why are the changes needed? Start to prepare Apache Spark 3.2.0. ### Does this PR introduce _any_ user-facing change? N/A. ### How was this patch tested? Pass the CIs. Closes #30606 from dongjoon-hyun/SPARK-3.2. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-12-04 14:10:42 -08:00
german	d671e053e9	[SPARK-33660][DOCS][SS] Fix Kafka Headers Documentation ### What changes were proposed in this pull request? Update the Kafka headers documentation: the type is no longer a map but an array [jira](https://issues.apache.org/jira/browse/SPARK-33660) ### Why are the changes needed? To help users ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? It is only documentation Closes #30605 from Gschiavon/SPARK-33660-fix-kafka-headers-documentation. Authored-by: german <germanschiavon@gmail.com> Signed-off-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>	2020-12-05 06:51:54 +09:00
HyukjinKwon	990bee9c58	[SPARK-33615][K8S] Make 'spark.archives' work in Kubernetes ### What changes were proposed in this pull request? This PR proposes to make the `spark.archives` configuration work in Kubernetes. It works without a problem in standalone cluster but there seems to be a bug in Kubernetes. It fails to fetch the file on the driver side as below: ``` 20/12/03 13:33:53 INFO SparkContext: Added JAR file:/tmp/spark-75004286-c83a-4369-b624-14c5d2d2a748/spark-examples_2.12-3.1.0-SNAPSHOT.jar at spark://spark-test-app-48ae737628cee6f8-driver-svc.spark-integration-test.svc:7078/jars/spark-examples_2.12-3.1.0-SNAPSHOT.jar with timestamp 1607002432558 20/12/03 13:33:53 INFO SparkContext: Added archive file:///tmp/tmp4542734800151332666.txt.tar.gz#test_tar_gz at spark://spark-test-app-48ae737628cee6f8-driver-svc.spark-integration-test.svc:7078/files/tmp4542734800151332666.txt.tar.gz with timestamp 1607002432558 20/12/03 13:33:53 INFO TransportClientFactory: Successfully created connection to spark-test-app-48ae737628cee6f8-driver-svc.spark-integration-test.svc/172.17.0.4:7078 after 83 ms (47 ms spent in bootstraps) 20/12/03 13:33:53 INFO Utils: Fetching spark://spark-test-app-48ae737628cee6f8-driver-svc.spark-integration-test.svc:7078/files/tmp4542734800151332666.txt.tar.gz to /tmp/spark-66573e24-27a3-427c-99f4-36f06d9e9cd5/fetchFileTemp2665785666227461849.tmp 20/12/03 13:33:53 ERROR SparkContext: Error initializing SparkContext. java.lang.RuntimeException: Stream '/files/tmp4542734800151332666.txt.tar.gz' was not found. at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:242) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:142) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) ``` This is because `spark.archives` was not actually added on the driver side correctly. 
The changes here fix it by adding and resolving URIs correctly. ### Why are the changes needed? `spark.archives` feature can be leveraged for many things such as Conda support. We should make it work in Kubernetes as well. This is a bug fix too. ### Does this PR introduce _any_ user-facing change? No, this feature is not out yet. ### How was this patch tested? I manually tested with Minikube 1.15.1. For an environment issue (?), I had to use a custom namespace, service account and roles. `default` service account does not work for me and complains it doesn't have permissions to get/list pods, etc. ```bash minikube delete minikube start --cpus 12 --memory 16384 kubectl create namespace spark-integration-test cat <<EOF | kubectl apply -f - apiVersion: v1 kind: ServiceAccount metadata: name: spark namespace: spark-integration-test EOF kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=spark-integration-test:spark --namespace=spark-integration-test dev/make-distribution.sh --pip --tgz -Pkubernetes resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh --spark-tgz `pwd`/spark-3.1.0-SNAPSHOT-bin-3.2.0.tgz --service-account spark --namespace spark-integration-test ``` Closes #30581 from HyukjinKwon/SPARK-33615. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-12-04 19:37:03 +09:00
Gengliang Wang	e8380665c7	[SPARK-33658][SQL] Suggest using Datetime conversion functions for invalid ANSI casting ### What changes were proposed in this pull request? Suggest that users use Datetime conversion functions in the error message of invalid ANSI explicit casting. ### Why are the changes needed? In ANSI mode, explicit cast between DateTime types and Numeric types is not allowed. Now that we have introduced the new functions `UNIX_SECONDS`/`UNIX_MILLIS`/`UNIX_MICROS`/`UNIX_DATE`/`DATE_FROM_UNIX_DATE`, we can show suggestions to users so that they can complete these type conversions precisely and easily in ANSI mode. ### Does this PR introduce _any_ user-facing change? Yes, better error messages ### How was this patch tested? Unit test Closes #30603 from gengliangwang/improveErrorMsgOfExplicitCast. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-12-04 16:24:41 +09:00
Gengliang Wang	29e415deac	[SPARK-33649][SQL][DOC] Improve the doc of spark.sql.ansi.enabled ### What changes were proposed in this pull request? Improve the documentation of SQL configuration `spark.sql.ansi.enabled` ### Why are the changes needed? As there are more and more new features under the SQL configuration `spark.sql.ansi.enabled`, we should make it more clear about: 1. what exactly it is 2. where can users find all the features of the ANSI mode 3. whether all the features are exactly from the SQL standard ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? It's just doc change. Closes #30593 from gengliangwang/reviseAnsiDoc. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>	2020-12-04 10:58:41 +08:00
yangjie01	92bfbcb2e3	[SPARK-33631][DOCS][TEST] Clean up spark.core.connection.ack.wait.timeout from configuration.md ### What changes were proposed in this pull request? SPARK-9767 removed `ConnectionManager` and related files. The configuration `spark.core.connection.ack.wait.timeout`, previously used by `ConnectionManager`, is no longer used by other Spark code, but it still exists in `configuration.md`. So this PR cleans up the unused configuration item `spark.core.connection.ack.wait.timeout` from `configuration.md`. ### Why are the changes needed? Clean up unused configuration from `configuration.md`. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass the Jenkins or GitHub Action Closes #30569 from LuciferYang/SPARK-33631. Authored-by: yangjie01 <yangjie01@baidu.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-12-02 12:58:41 -08:00
Gabor Somogyi	e5bb2937f6	[SPARK-32032][SS] Avoid infinite wait in driver because of KafkaConsumer.poll(long) API ### What changes were proposed in this pull request? Deprecated `KafkaConsumer.poll(long)` API calls may cause infinite wait in the driver. In this PR I've added a new `AdminClient` based offset fetching which is turned off by default. There is a new flag named `spark.sql.streaming.kafka.useDeprecatedOffsetFetching` (default: `true`) which can be set to `false` to reach the newly added functionality. The Structured Streaming migration guide contains more information what migration consideration must be done. Please see the following [doc](https://docs.google.com/document/d/1gAh0pKgZUgyqO2Re3sAy-fdYpe_SxpJ6DkeXE8R1P7E/edit?usp=sharing) for further details. The PR contains the following changes: * Added `AdminClient` based offset fetching * GroupId prefix feature removed from driver but only in `AdminClient` based approach (`AdminClient` doesn't need any GroupId) * GroupId override feature removed from driver but only in `AdminClient` based approach (`AdminClient` doesn't need any GroupId) * Additional unit tests * Code comment changes * Minor bugfixes here and there * Removed Kafka auto topic creation feature but only in `AdminClient` based approach (please see doc for rationale). In short, it's super hidden, not sure anybody ever used in production + error prone. * Added documentation to `ss-migration-guide` and `structured-streaming-kafka-integration` ### Why are the changes needed? Driver may hang forever. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing + additional unit tests. Cluster test with simple Kafka topic to another topic query. Documentation: ``` cd docs/ SKIP_API=1 jekyll build ``` Manual webpage check. Closes #29729 from gaborgsomogyi/SPARK-32032. 
Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com> Signed-off-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>	2020-12-01 20:34:00 +09:00
Jungtaek Lim (HeartSaVioR)	52e5cc46bc	[SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files ### What changes were proposed in this pull request? This patch proposes to provide a new option to specify time-to-live (TTL) for output file entries in FileStreamSink. TTL is defined via current timestamp - the last modified time for the file. This patch will filter out outdated output files in metadata while compacting batches (other batches don't have functionality to clean entries), which helps metadata to not grow linearly, as well as filtered out files will be "eventually" no longer seen in reader queries which leverage File(Stream)Source. ### Why are the changes needed? The metadata log greatly helps to easily achieve exactly-once but given the output path is open to arbitrary readers, there's no way to compact the metadata log, which ends up growing the metadata file as query runs for long time, especially for compacted batch. Lots of end users have been reporting the issue: see comments in [SPARK-24295](https://issues.apache.org/jira/browse/SPARK-24295) and [SPARK-29995](https://issues.apache.org/jira/browse/SPARK-29995), and [SPARK-30462](https://issues.apache.org/jira/browse/SPARK-30462). (There're some reports from end users which include their workarounds: SPARK-24295) ### Does this PR introduce any user-facing change? No, as the configuration is new and by default it is not applied. ### How was this patch tested? New UT. Closes #28363 from HeartSaVioR/SPARK-27188-v2. Lead-authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com> Co-authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com> Signed-off-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>	2020-12-01 14:42:48 +09:00
HyukjinKwon	1a042cc414	[SPARK-33530][CORE] Support --archives and spark.archives option natively ### What changes were proposed in this pull request? TL;DR: - This PR completes the support of archives in Spark itself instead of Yarn-only - It makes `--archives` option work in other cluster modes too and adds `spark.archives` configuration. - After this PR, PySpark users can leverage Conda to ship Python packages together as below: ```bash conda create -y -n pyspark_env -c conda-forge pyarrow==2.0.0 pandas==1.1.4 conda-pack==0.5.0 conda activate pyspark_env conda pack -f -o pyspark_env.tar.gz PYSPARK_DRIVER_PYTHON=python PYSPARK_PYTHON=./environment/bin/python pyspark --archives pyspark_env.tar.gz#environment ``` - Issue a warning that undocumented and hidden behavior of partial archive handling in `spark.files` / `SparkContext.addFile` will be deprecated, and users can use `spark.archives` and `SparkContext.addArchive`. This PR proposes to add Spark's native `--archives` in Spark submit, and `spark.archives` configuration. Currently, both are supported only in Yarn mode: ```bash ./bin/spark-submit --help ``` ``` Options: ... Spark on YARN only: --queue QUEUE_NAME The YARN queue to submit to (Default: "default"). --archives ARCHIVES Comma separated list of archives to be extracted into the working directory of each executor. ``` This `archives` feature is often useful when you have to ship a directory and unpack it into executors. One example is native libraries to use e.g. JNI. Another example is to ship Python packages together by Conda environment. Especially for Conda, PySpark currently does not have a nice way to ship a package that works in general, please see also https://hyukjin-spark.readthedocs.io/en/stable/user_guide/python_packaging.html#using-zipped-virtual-environment (PySpark new documentation demo for 3.1.0). 
The neatest way is arguably to use Conda environment by shipping zipped Conda environment but this is currently dependent on this archive feature. NOTE that we are able to use `spark.files` by relying on its undocumented behaviour that untars `tar.gz` but I don't think we should document such ways and promote people to more rely on it. Also, note that this PR does not target to add the feature parity of `spark.files.overwrite`, `spark.files.useFetchCache`, etc. yet. I documented that this is an experimental feature as well. ### Why are the changes needed? To complete the feature parity, and to provide a better support of shipping Python libraries together with Conda env. ### Does this PR introduce _any_ user-facing change? Yes, this makes `--archives` works in Spark instead of Yarn-only, and adds a new configuration `spark.archives`. ### How was this patch tested? I added unittests. Also, manually tested in standalone cluster, local-cluster, and local modes. Closes #30486 from HyukjinKwon/native-archive. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-12-01 13:43:02 +09:00
Wenchen Fan	5cfbdddefe	[SPARK-33480][SQL] Support char/varchar type ### What changes were proposed in this pull request? This PR adds the char/varchar type which is kind of a variant of string type: 1. Char type is fixed-length string. When comparing char type values, we need to pad the shorter one to the longer length. 2. Varchar type is string with a length limitation. To implement the char/varchar semantic, this PR: 1. Do string length check when writing to char/varchar type columns. 2. Do string padding when reading char type columns. We don't do it at the writing side to save storage space. 3. Do string padding when comparing char type column with string literal or another char type column. (string literal is fixed length so should be treated as char type as well) To simplify the implementation, this PR doesn't propagate char/varchar type info through functions/operators(e.g. `substring`). That said, a column can only be char/varchar type if it's a table column, not a derived column like `SELECT substring(col)`. To be safe, this PR doesn't add char/varchar type to the query engine(expression input check, internal row framework, codegen framework, etc.). We will replace char/varchar type by string type with metadata (`Attribute.metadata` or `StructField.metadata`) that includes the original type string before it goes into the query engine. That said, the existing code will not see char/varchar type but only string type. char/varchar type may come from several places: 1. v1 table from hive catalog. 2. v2 table from v2 catalog. 3. user-specified schema in `spark.read.schema` and `spark.readStream.schema` 4. `Column.cast` 5. schema string in places like `from_json`, pandas UDF, etc. These places use SQL parser which replaces char/varchar with string already, even before this PR. This PR covers all the above cases, implements the length check and padding feature by looking at string type with special metadata. ### Why are the changes needed? 
char and varchar are standard SQL types. varchar is widely used in other databases instead of string type. ### Does this PR introduce _any_ user-facing change? For hive tables: now the table insertion fails if the value exceeds char/varchar length. Previously we truncate the value silently. For other tables: 1. now char type is allowed. 2. now we have length check when inserting to varchar columns. Previously we write the value as it is. ### How was this patch tested? new tests Closes #30412 from cloud-fan/char. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-11-30 09:23:05 +00:00
Josh Soref	485145326a	[MINOR] Spelling bin core docs external mllib repl ### What changes were proposed in this pull request? This PR intends to fix typos in the sub-modules: * `bin` * `core` * `docs` * `external` * `mllib` * `repl` * `pom.xml` Split per srowen https://github.com/apache/spark/pull/30323#issuecomment-728981618 NOTE: The misspellings have been reported at `706a726f87 (commitcomment-44064356)` ### Why are the changes needed? Misspelled words make it harder to read / understand content. ### Does this PR introduce _any_ user-facing change? There are various fixes to documentation, etc... ### How was this patch tested? No testing was performed Closes #30530 from jsoref/spelling-bin-core-docs-external-mllib-repl. Authored-by: Josh Soref <jsoref@users.noreply.github.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-11-30 13:59:51 +09:00
liucht	3d54774fb9	[SPARK-33517][SQL][DOCS] Fix the correct menu items and page links in PySpark Usage Guide for Pandas with Apache Arrow ### What changes were proposed in this pull request? Change "Apache Arrow in Spark" to "Apache Arrow in PySpark" and the link to "/sql-pyspark-pandas-with-arrow.html#apache-arrow-in-pyspark" ### Why are the changes needed? When I click on the menu item it doesn't point to the correct page, and from the parent menu I can infer that the correct menu item name and link should be "Apache Arrow in PySpark", like this: ![image](https://user-images.githubusercontent.com/28332082/99954725-2b64e200-2dbe-11eb-9576-cf6a3d758980.png) ### Does this PR introduce any user-facing change? Yes, clicking on the menu item will take you to the correct guide page ### How was this patch tested? Manually build the doc. This can be verified as below: cd docs SKIP_API=1 jekyll build open _site/sql-pyspark-pandas-with-arrow.html Closes #30466 from liucht-inspur/master. Authored-by: liucht <liucht@inspur.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-11-30 10:03:18 +09:00
Kazuaki Ishizaki	b94ff1e870	[SPARK-33590][DOCS][SQL] Add missing sub-bullets in Spark SQL Guide ### What changes were proposed in this pull request? Add the missing sub-bullets in the left side of `Spark SQL Guide` ### Why are the changes needed? The three sub-bullets in the left side is not consistent with the contents (five bullets) in the right side. ![image](https://user-images.githubusercontent.com/1315079/100546388-7a21e880-32a4-11eb-922d-62a52f4f9f9b.png) ### Does this PR introduce _any_ user-facing change? Yes, you can see more lines in the left menu. ### How was this patch tested? Manually build the doc as follows. This can be verified as attached: ``` cd docs SKIP_API=1 jekyll build firefox _site/sql-pyspark-pandas-with-arrow.html ``` ![image](https://user-images.githubusercontent.com/1315079/100546399-8ad25e80-32a4-11eb-80ac-44af0aebc717.png) Closes #30537 from kiszk/SPARK-33590. Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-11-29 11:24:58 -08:00
luluorta	35ded12fc6	[SPARK-33141][SQL] Capture SQL configs when creating permanent views ### What changes were proposed in this pull request? This PR makes CreateViewCommand/AlterViewAsCommand capturing runtime SQL configs and store them as view properties. These configs will be applied during the parsing and analysis phases of the view resolution. Users can set `spark.sql.legacy.useCurrentConfigsForView` to `true` to restore the behavior before. ### Why are the changes needed? This PR is a sub-task of [SPARK-33138](https://issues.apache.org/jira/browse/SPARK-33138) that proposes to unify temp view and permanent view behaviors. This PR makes permanent views mimicking the temp view behavior that "fixes" view semantic by directly storing resolved LogicalPlan. For example, if a user uses spark 2.4 to create a view that contains null values from division-by-zero expressions, she may not want that other users' queries which reference her view throw exceptions when running on spark 3.x with ansi mode on. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? added UT + existing UTs (improved) Closes #30289 from luluorta/SPARK-33141. Authored-by: luluorta <luluorta@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-11-27 13:32:25 +00:00
xuewei.linxuewei	b9f2f78de5	[SPARK-33498][SQL] Datetime parsing should fail if the input string can't be parsed, or the pattern string is invalid ### What changes were proposed in this pull request? Datetime parsing should fail if the input string can't be parsed, or the pattern string is invalid, when ANSI mode is enabled. This patch updates GetTimeStamp, UnixTimeStamp, ToUnixTimeStamp and Cast. ### Why are the changes needed? For ANSI mode. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Added UTs and existing UTs. Closes #30442 from leanken/leanken-SPARK-33498. Authored-by: xuewei.linxuewei <xuewei.linxuewei@alibaba-inc.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-11-27 13:24:11 +00:00
Gengliang Wang	05921814e2	[SPARK-33479][DOC][FOLLOWUP] DocSearch: Support filtering search results by version ### What changes were proposed in this pull request? In the discussion https://github.com/apache/spark/pull/30292#issuecomment-725613417, we planned to apply a new API key for each Spark release. However, it turns that DocSearch supports crawling multiple URLs from one website and filtering by fact key: https://docsearch.algolia.com/docs/config-file/#using-regular-expressions Thanks to the help from shortcuts, our Spark doc supports multiple version now: https://github.com/algolia/docsearch-configs/pull/2868 This PR is to add the fact key in the search script and update the instruction in the comment. ### Why are the changes needed? To support filtering Spark documentation search results by the current document version. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manual test Closes #30469 from gengliangwang/apiKeyFollowUp. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-11-24 09:27:44 +09:00
CC Highman	d338af3101	[SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source ### What changes were proposed in this pull request? Two new options, _modifiedBefore_ and _modifiedAfter_, are provided, each expecting a value in 'YYYY-MM-DDTHH:mm:ss' format. _PartitioningAwareFileIndex_ considers these options during the process of checking for files, just before considering applied _PathFilters_ such as `pathGlobFilter`. In order to filter file results, a new PathFilter class was derived for this purpose. General house-keeping around classes extending PathFilter was performed for neatness. It became apparent that support was needed to handle multiple potential path filters. Logic was introduced for this purpose and the associated tests written. ### Why are the changes needed? When loading files from a data source, there can often be thousands of files within a given file path. In many cases I've seen, we want to start loading from a folder path and ideally be able to begin loading files having modification dates past a certain point. This would mean that out of thousands of potential files, only the ones with modification dates greater than the specified timestamp would be considered. This saves a lot of time and significantly reduces the complexity of managing this in code. ### Does this PR introduce _any_ user-facing change? This PR introduces an option that can be used with batch-based Spark file data sources. A documentation update was made to reflect an example and usage of the new data source option. 
Example Usages _Load all CSV files modified after date:_ `spark.read.format("csv").option("modifiedAfter","2020-06-15T05:00:00").load()` _Load all CSV files modified before date:_ `spark.read.format("csv").option("modifiedBefore","2020-06-15T05:00:00").load()` _Load all CSV files modified between two dates:_ `spark.read.format("csv").option("modifiedAfter","2019-01-15T05:00:00").option("modifiedBefore","2020-06-15T05:00:00").load() ` ### How was this patch tested? A handful of unit tests were added to support the positive, negative, and edge case code paths. It's also live in a handful of our Databricks dev environments. (quoted from cchighman) Closes #30411 from HeartSaVioR/SPARK-31962. Lead-authored-by: CC Highman <christopher.highman@microsoft.com> Co-authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com> Signed-off-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>	2020-11-23 08:30:41 +09:00
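The filtering idea described in this commit can be sketched outside Spark. The following is a minimal illustration in plain Python, not Spark's `PathFilter` implementation: `FileEntry`, `parse_ts`, and `filter_by_mtime` are hypothetical names, and treating both bounds as strict comparisons is an assumption (the commit message only says "greater than the specified timestamp" for `modifiedAfter`).

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple, Optional


class FileEntry(NamedTuple):
    """Hypothetical stand-in for a file listing entry."""
    path: str
    modified: datetime  # last-modification time of the file


def parse_ts(value: Optional[str]) -> Optional[datetime]:
    # Option values follow the 'YYYY-MM-DDTHH:mm:ss' shape from the commit
    # message, e.g. "2020-06-15T05:00:00".
    if value is None:
        return None
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")


def filter_by_mtime(
    files: Iterable[FileEntry],
    modified_after: Optional[str] = None,
    modified_before: Optional[str] = None,
) -> List[str]:
    """Keep paths whose modification time lies inside the requested window.

    Assumption: both bounds are exclusive. Either bound may be omitted.
    """
    after = parse_ts(modified_after)
    before = parse_ts(modified_before)
    kept = []
    for entry in files:
        if after is not None and not entry.modified > after:
            continue  # too old for modifiedAfter
        if before is not None and not entry.modified < before:
            continue  # too new for modifiedBefore
        kept.append(entry.path)
    return kept
```

Passing both bounds corresponds to the "modified between two dates" usage above; passing neither keeps every file.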
ulysses	3384bda453	[SPARK-33468][SQL] ParseUrl in ANSI mode should fail if input string is not a valid url ### What changes were proposed in this pull request? With `ParseUrl`, instead of returning null we throw an exception if the input string is not a valid URL. ### Why are the changes needed? For ANSI mode. ### Does this PR introduce _any_ user-facing change? Yes, users will get an exception if they `set spark.sql.ansi.enabled=true`. ### How was this patch tested? Add test. Closes #30399 from ulysses-you/SPARK-33468. Lead-authored-by: ulysses <youxiduo@weidian.com> Co-authored-by: ulysses-you <youxiduo@weidian.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-11-20 13:23:08 +00:00
liucht	cbc8be24c8	[SPARK-33422][DOC] Fix the correct display of left menu item ### What changes were proposed in this pull request? Limit the height of the menu area on the left to display vertical scroll bar ### Why are the changes needed? The bottom menu item cannot be displayed when the left menu tree is long ### Does this PR introduce any user-facing change? Yes, if the menu item shows more, you'll see it by pulling down the vertical scroll bar before: ![image](https://user-images.githubusercontent.com/28332082/98805115-16995d80-2452-11eb-933a-3b72c14bea78.png) after: ![image](https://user-images.githubusercontent.com/28332082/98805418-7e4fa880-2452-11eb-9a9b-8d265078297c.png) ### How was this patch tested? NA Closes #30335 from liucht-inspur/master. Authored-by: liucht <liucht@inspur.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-11-20 22:19:35 +09:00
Gengliang Wang	4267ca98fa	[SPARK-33479][DOC] Make the API Key of DocSearch configurable ### What changes were proposed in this pull request? Make the API key of DocSearch configurable and avoid hardcoding in the HTML template ### Why are the changes needed? After https://github.com/apache/spark/pull/30292, our Spark documentation site supports searching. However, the default API key always points to the latest release doc. We have to set different API keys for different releases. Otherwise, the search results are always based on the latest documentation(https://spark.apache.org/docs/latest/) even when visiting the documentation of previous releases. As per discussion in https://github.com/apache/spark/pull/30292#issuecomment-725613417, we should make the API key configurable and set different values for different releases. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manual test Closes #30409 from gengliangwang/apiKey. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-11-19 11:20:18 +09:00
zero323	56a8510e19	[SPARK-33304][R][SQL] Add from_avro and to_avro functions to SparkR ### What changes were proposed in this pull request? Adds `from_avro` and `to_avro` functions to SparkR. ### Why are the changes needed? Feature parity. ### Does this PR introduce _any_ user-facing change? New functions exposed in SparkR API. ### How was this patch tested? New unit tests. Closes #30216 from zero323/SPARK-33304. Authored-by: zero323 <mszymkiewicz@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-11-19 09:52:29 +09:00
Gengliang Wang	9a4c79073b	[SPARK-33354][SQL] New explicit cast syntax rules in ANSI mode ### What changes were proposed in this pull request? In section 6.13 of the ANSI SQL standard, there are syntax rules for valid combinations of the source and target data types. ![image](https://user-images.githubusercontent.com/1097932/98212874-17356f80-1ef9-11eb-8f2b-385f32db404a.png) Comparing the ANSI CAST syntax rules with the current default behavior of Spark: ![image](https://user-images.githubusercontent.com/1097932/98789831-b7870a80-23b7-11eb-9b5f-469a42e0ee4a.png) To make Spark's ANSI mode more ANSI SQL Compatible，I propose to disallow the following casting in ANSI mode: ``` TimeStamp <=> Boolean Date <=> Boolean Numeric <=> Timestamp Numeric <=> Date Numeric <=> Binary String <=> Array String <=> Map String <=> Struct ``` The following castings are considered invalid in ANSI SQL standard, but they are quite straight forward. Let's Allow them for now ``` Numeric <=> Boolean String <=> Binary ``` ### Why are the changes needed? Better ANSI SQL compliance ### Does this PR introduce _any_ user-facing change? Yes, the following castings will not be allowed in ANSI mode: ``` TimeStamp <=> Boolean Date <=> Boolean Numeric <=> Timestamp Numeric <=> Date Numeric <=> Binary String <=> Array String <=> Map String <=> Struct ``` ### How was this patch tested? Unit test The ANSI Compliance doc preview: ![image](https://user-images.githubusercontent.com/1097932/98946017-2cd20880-24a8-11eb-8161-65749bfdd03a.png) Closes #30260 from gengliangwang/ansiCanCast. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-11-19 09:23:36 +09:00
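The cast rules listed in this commit can be encoded as a small lookup table. The sketch below is only an illustration of that list, not Spark's `Cast` logic: the type-family names are lowercase labels chosen here, `listed_as_disallowed` is a hypothetical helper, and pairs that the commit message does not mention are simply reported as not listed.

```python
# Pairs the commit message says are disallowed in ANSI mode. "<=>" in the
# message means both directions, so each pair is stored as an unordered set.
DISALLOWED_IN_ANSI = {
    frozenset(pair)
    for pair in [
        ("timestamp", "boolean"),
        ("date", "boolean"),
        ("numeric", "timestamp"),
        ("numeric", "date"),
        ("numeric", "binary"),
        ("string", "array"),
        ("string", "map"),
        ("string", "struct"),
    ]
}

# Pairs the message calls invalid in the ANSI standard but keeps allowed for now.
KEPT_ALLOWED_FOR_NOW = {
    frozenset(("numeric", "boolean")),
    frozenset(("string", "binary")),
}


def listed_as_disallowed(source: str, target: str) -> bool:
    """Return True if the (source, target) family pair is on the disallowed list."""
    return frozenset((source.lower(), target.lower())) in DISALLOWED_IN_ANSI
```

Storing unordered pairs keeps the table as short as the list in the message while still answering lookups in either direction.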
Dongjoon Hyun	594c7c613a	[SPARK-33476][CORE] Generalize ExecutorSource to expose user-given file system schemes ### What changes were proposed in this pull request? This PR aims to generalize executor metrics to support user-given file system schemes instead of the fixed `file,hdfs` scheme. ### Why are the changes needed? For the users using only cloud storages like `S3A`, we need to be able to expose `S3A` metrics. Also, we can skip unused `hdfs` metrics. ### Does this PR introduce _any_ user-facing change? Yes, but compatible for the existing users which uses `hdfs` and `file` filesystem scheme only. ### How was this patch tested? Manually do the following. ``` $ build/sbt -Phadoop-cloud package $ sbin/start-master.sh; sbin/start-slave.sh spark://$(hostname):7077 $ bin/spark-shell --master spark://$(hostname):7077 -c spark.executor.metrics.fileSystemSchemes=file,s3a -c spark.metrics.conf.executor.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink scala> spark.read.textFile("s3a://dongjoon/README.md").collect() ``` Separately, launch `jconsole` and check `.executor.filesystem.s3a.`. Also, confirm that there is no `.executor.filesystem.hdfs.` ``` $ jconsole ``` ![Screen Shot 2020-11-17 at 9 26 03 PM](https://user-images.githubusercontent.com/9700541/99487609-94121180-291b-11eb-9ed2-964546146981.png) Closes #30405 from dongjoon-hyun/SPARK-33476. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-11-18 08:04:14 -08:00
Pascal Gillet	9ab0f82a59	[SPARK-23499][MESOS] Support for priority queues in Mesos scheduler ### What changes were proposed in this pull request? I push this PR as I could not re-open the stale one https://github.com/apache/spark/pull/20665 . As for Yarn or Kubernetes, Mesos users should be able to specify priority queues to define a workload management policy for queued drivers in the Mesos Cluster Dispatcher. This would ensure scheduling order while enqueuing Spark applications for a Mesos cluster. ### Why are the changes needed? Currently, submitted drivers are kept in order of their submission: the first driver added to the queue will be the first one to be executed (FIFO), regardless of their priority. See https://issues.apache.org/jira/projects/SPARK/issues/SPARK-23499 for more details. ### Does this PR introduce _any_ user-facing change? The MesosClusterDispatcher UI shows now Spark jobs along with the queue to which they are submitted. ### How was this patch tested? Unit tests. Also, this feature has been in production for 3 years now as we use a modified Spark 2.4.0 since then. Closes #30352 from pgillet/mesos-scheduler-priority-queue. Lead-authored-by: Pascal Gillet <pascal.gillet@stack-labs.com> Co-authored-by: pgillet <pascalgillet@ymail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-11-16 16:54:08 -08:00
xuewei.linxuewei	b5eca18af0	[SPARK-33460][SQL] Accessing map values should fail if key is not found ### What changes were proposed in this pull request? Instead of returning NULL, throws runtime NoSuchElementException towards invalid key accessing in map-like functions, such as element_at, GetMapValue, when ANSI mode is on. ### Why are the changes needed? For ANSI mode. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Added UT and Existing UT. Closes #30386 from leanken/leanken-SPARK-33460. Authored-by: xuewei.linxuewei <xuewei.linxuewei@alibaba-inc.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-11-16 16:14:31 +00:00
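The behavior change for map lookups can be modeled in a few lines. This is a plain-Python sketch of the semantics described above, not Spark code: `element_at_map` is a hypothetical function, and Python's `KeyError` stands in for the `NoSuchElementException` named in the commit message.

```python
from typing import Any, Mapping, Optional


def element_at_map(mapping: Mapping[Any, Any], key: Any, ansi_enabled: bool = False) -> Optional[Any]:
    """Look up `key`, mimicking the two modes described in the commit.

    With ansi_enabled=False a missing key yields None (the NULL-returning
    behavior); with ansi_enabled=True a missing key raises instead.
    """
    if key in mapping:
        return mapping[key]
    if ansi_enabled:
        # Stand-in for the runtime NoSuchElementException mentioned above.
        raise KeyError(f"Key {key!r} does not exist.")
    return None
```

A present key returns the same value in both modes; only the missing-key path differs.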
aof00	0933f1c6c2	[SPARK-33451][DOCS] Change to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes' in documentation ### What changes were proposed in this pull request? In the 'Optimizing Skew Join' section of the following two pages: 1. [https://spark.apache.org/docs/3.0.0/sql-performance-tuning.html](https://spark.apache.org/docs/3.0.0/sql-performance-tuning.html) 2. [https://spark.apache.org/docs/3.0.1/sql-performance-tuning.html](https://spark.apache.org/docs/3.0.1/sql-performance-tuning.html) The configuration 'spark.sql.adaptive.skewedPartitionThresholdInBytes' should be changed to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes', The former is missing the 'skewJoin'. ### Why are the changes needed? To document the correct name of configuration ### Does this PR introduce _any_ user-facing change? Yes, this is a user-facing doc change. ### How was this patch tested? Jenkins / CI builds in this PR. Closes #30376 from aof00/doc_change. Authored-by: aof00 <x14562573449@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-11-16 10:32:00 +09:00
Thomas Graves	acfd846753	[SPARK-33288][SPARK-32661][K8S] Stage level scheduling support for Kubernetes ### What changes were proposed in this pull request? This adds support for Stage level scheduling to kubernetes. Kubernetes can support dynamic allocation via the shuffle tracking option which means we can support stage level scheduling by getting new executors. The main changes here are having the k8s cluster manager pass the resource profile id into the executors and then the ExecutorsPodsAllocator has to request executors based on the individual resource profiles. I tried to keep code changes here to a minimum. I specifically choose to leave the ExecutorPodsSnapshot the way it was and construct the resource profile to pod states on the fly, with a fast path when not using other resource profiles, to keep the impact to a minimum. This results in the main changes required are just wrapping the allocation logic in a for loop over each profile. The other main change is in the basic feature step we have to look at the resources in the ResourceProfile to request pods with the correct resources. Much of the other logic like in the executor life cycle manager doesn't need to be resource profile. This also adds support for [SPARK-32661]Spark executors on K8S should request extra memory for off-heap allocations because the stage level scheduling api has support for this and it made sense to make consistent with YARN. This was started with PR https://github.com/apache/spark/pull/29477 but never updated so I just did it here. To do this I moved a few functions around that were now used by both YARN and kubernetes so you will see some changes in Utils. ### Why are the changes needed? Add the feature to Kubernetes based on customer feedback. ### Does this PR introduce _any_ user-facing change? Yes the feature now works with K8s, but not underlying API changes. ### How was this patch tested? Tested manually on kubernetes cluster and with unit tests. 
Closes #30204 from tgravescs/stagek8sOrigSnapshotsRebase. Lead-authored-by: Thomas Graves <tgraves@apache.org> Co-authored-by: Thomas Graves <tgraves@nvidia.com> Signed-off-by: Thomas Graves <tgraves@apache.org>	2020-11-13 16:04:13 -06:00
xuewei.linxuewei	234711a328	Revert "[SPARK-33139][SQL] protect setActionSession and clearActiveSession" ### What changes were proposed in this pull request? In [SPARK-33139] we defined `setActionSession` and `clearActiveSession` as deprecated API, it turns out it is widely used, and after discussion, even if without this PR, it should work with unify view feature, it might only be a risk if user really abuse using these two API. So revert the PR is needed. [SPARK-33139] has two commit, include a follow up. Revert them both. ### Why are the changes needed? Revert. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing UT. Closes #30367 from leanken/leanken-revert-SPARK-33139. Authored-by: xuewei.linxuewei <xuewei.linxuewei@alibaba-inc.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-11-13 13:35:45 +00:00
gengjiaan	f80fe213bd	[SPARK-33166][DOC] Provide Search Function in Spark docs site ### What changes were proposed in this pull request? Over the last few releases, our Spark documentation https://spark.apache.org/docs/latest/ has become richer. It would be nice to provide a search function to help our users find content faster. [DocSearch](https://docsearch.algolia.com/) is entirely free and automated. This PR uses it to provide a search function. A screenshot is shown below: ![overview](https://user-images.githubusercontent.com/8486025/98756802-30d82a80-23c3-11eb-9ca2-73bb20fb54c4.png) ### Why are the changes needed? So that users of the Spark documentation can find the information they need efficiently. ### Does this PR introduce _any_ user-facing change? Yes ### How was this patch tested? Built on my machine and checked in a browser. Closes #30292 from beliefer/SPARK-33166. Lead-authored-by: gengjiaan <gengjiaan@360.cn> Co-authored-by: beliefer <beliefer@163.com> Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>	2020-11-13 16:51:06 +08:00
Liang-Chi Hsieh	2c64b731ae	[SPARK-33259][SS] Disable streaming query with possible correctness issue by default ### What changes were proposed in this pull request? This patch proposes to disable the streaming query with possible correctness issue in chained stateful operators. The behavior can be controlled by a SQL config, so if users understand the risk and still want to run the query, they can disable the check. ### Why are the changes needed? The possible correctness in chained stateful operators in streaming query is not straightforward for users. From users perspective, it will be considered as a Spark bug. It is also possible the worse case, users are not aware of the correctness issue and use wrong results. A better approach should be to disable such queries and let users choose to run the query if they understand there is such risk, instead of implicitly running the query and let users to find out correctness issue by themselves and report this known to Spark community. ### Does this PR introduce _any_ user-facing change? Yes. Streaming query with possible correctness issue will be blocked to run, except for users explicitly disable the SQL config. ### How was this patch tested? Unit test. Closes #30210 from viirya/SPARK-33259. Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-11-12 15:31:57 -08:00
Kent Yao	4335af075a	[MINOR][DOC] spark.executor.memoryOverhead is not cluster-mode only ### What changes were proposed in this pull request? Remove "in cluster mode" from the description of `spark.executor.memoryOverhead` ### Why are the changes needed? Fix a correctness issue in the documentation. ### Does this PR introduce _any_ user-facing change? Yes, users will no longer be confused by the description of `spark.executor.memoryOverhead`. ### How was this patch tested? Pass GA doc generation. Closes #30311 from yaooqinn/minordoc. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-11-12 18:53:06 +09:00
xuewei.linxuewei	6d31daeb6a	[SPARK-33386][SQL] Accessing array elements in ElementAt/Elt/GetArrayItem should failed if index is out of bound ### What changes were proposed in this pull request? Instead of returning NULL, throw a runtime ArrayIndexOutOfBoundsException when ANSI mode is enabled for the `element_at`, `elt`, and `GetArrayItem` functions. ### Why are the changes needed? For ANSI mode. ### Does this PR introduce any user-facing change? When `spark.sql.ansi.enabled` = true, Spark will throw `ArrayIndexOutOfBoundsException` when an array element is accessed with an out-of-range index. ### How was this patch tested? Added UT and existing UT. Closes #30297 from leanken/leanken-SPARK-33386. Authored-by: xuewei.linxuewei <xuewei.linxuewei@alibaba-inc.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-11-12 08:50:32 +00:00
Kent Yao	036c11b0d4	[SPARK-33397][YARN][DOC] Fix generating md to html for available-patterns-for-shs-custom-executor-log-url ### What changes were proposed in this pull request? 1. replace `{{}}` with `{{}}` 2. using `<code></code>` in td-tag ### Why are the changes needed? to fix this. ![image](https://user-images.githubusercontent.com/8326978/98544155-8c74bc00-22ce-11eb-8889-8dacb726b762.png) ### Does this PR introduce _any_ user-facing change? yes, you will see the correct online doc with this change ![image](https://user-images.githubusercontent.com/8326978/98545256-2e48d880-22d0-11eb-9dd9-b8cae3df8659.png) ### How was this patch tested? shown as the above pic via jekyll serve. Closes #30298 from yaooqinn/SPARK-33397. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-11-10 10:15:55 +09:00
Chao Sun	1a704793f4	[SPARK-33290][SQL][DOCS][FOLLOW-UP] Update SQL migration guide ### What changes were proposed in this pull request? Update SQL migration guide for SPARK-33290 ### Why are the changes needed? Make the change better documented. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? N/A Closes #30256 from sunchao/SPARK-33290-2. Authored-by: Chao Sun <sunchao@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-11-05 10:09:28 -08:00
Sarvesh Dave	e66201b30b	[MINOR][SS][DOCS] Update join type in stream static joins code examples ### What changes were proposed in this pull request? Update join type in stream static joins code examples in structured streaming programming guide. 1) Scala, Java and Python examples have a common issue. The join keyword is "right_join"; it should be "left_outer". _Reasons:_ a) This code snippet is an example of "left outer join", as the streaming df is on the left and the static df is on the right. Also, a right outer join between a stream df (left) and a static df (right) is not supported. b) The keyword "right_join/left_join" is unsupported and it should be "right_outer/left_outer". So, all of these code snippets have been updated to "left_outer". 2) The R example is correct, but it is an example of "right_outer" with the static df (left) and the streaming df (right). It is changed to "left_outer" to make it consistent with the other three examples in Scala, Java and Python. ### Why are the changes needed? To fix the mistake in the example code of the documentation. ### Does this PR introduce _any_ user-facing change? Yes, it is a user-facing change (but documentation update only). 
Screenshots 1: Scala/Java/python example (similar issue) _Before:_ <img width="941" alt="Screenshot 2020-11-05 at 12 16 09 AM" src="https://user-images.githubusercontent.com/62717942/98155351-19e59400-1efc-11eb-8142-e6a25a5e6497.png"> _After:_ <img width="922" alt="Screenshot 2020-11-05 at 12 17 12 AM" src="https://user-images.githubusercontent.com/62717942/98155503-5d400280-1efc-11eb-96e1-5ba0f3c35c82.png"> Screenshots 2: R example (Make it consistent with above change) _Before:_ <img width="896" alt="Screenshot 2020-11-05 at 12 19 57 AM" src="https://user-images.githubusercontent.com/62717942/98155685-ac863300-1efc-11eb-93bc-b7ca4dd34634.png"> _After:_ <img width="919" alt="Screenshot 2020-11-05 at 12 20 51 AM" src="https://user-images.githubusercontent.com/62717942/98155739-c0ca3000-1efc-11eb-8f95-a7538fa784b7.png"> ### How was this patch tested? The change was tested locally. 1) cd docs/ SKIP_API=1 jekyll build 2) Verify docs/_site/structured-streaming-programming-guide.html file in browser. Closes #30252 from sarveshdave1/doc-update-stream-static-joins. Authored-by: Sarvesh Dave <sarveshdave1@gmail.com> Signed-off-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>	2020-11-05 16:22:31 +09:00
Luca Canali	b7fff03973	[SPARK-31711][CORE] Register the executor source with the metrics system when running in local mode ### What changes were proposed in this pull request? This PR proposes to register the executor source with the Spark metrics system when running in local mode. ### Why are the changes needed? The Apache Spark metrics system provides many useful insights on the Spark workload. In particular, the [executor source metrics](https://github.com/apache/spark/blob/master/docs/monitoring.md#component-instance--executor) provide detailed info, including the number of active tasks, I/O metrics, and several task metrics details. The executor source metrics, contrary to other sources (for example ExecutorMetrics source), is not available when running in local mode. Having executor metrics in local mode can be useful when testing and troubleshooting Spark workloads in a development environment. The metrics can be fed to a dashboard to see the evolution of resource usage and can be used to troubleshoot performance, as [in this example](https://github.com/cerndb/spark-dashboard). Currently users will have to deploy on a cluster to be able to collect executor source metrics, while the possibility of having them in local mode is handy for testing. ### Does this PR introduce _any_ user-facing change? - This PR exposes executor source metrics data when running in local mode. ### How was this patch tested? - Manually tested by running in local mode and inspecting the metrics listed in http://localhost:4040/metrics/json/ - Also added a test in `SourceConfigSuite` Closes #28528 from LucaCanali/metricsWithLocalMode. Authored-by: Luca Canali <luca.canali@cern.ch> Signed-off-by: Thomas Graves <tgraves@apache.org>	2020-11-04 16:48:55 -06:00
Wenchen Fan	034070a23a	Revert "[SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema size" This reverts commit `0c943cd2fb`.	2020-11-04 12:30:38 +08:00
Gengliang Wang	2b6dfa5f7b	[SPARK-20044][UI] Support Spark UI behind front-end reverse proxy using a path prefix Revert proxy url ### What changes were proposed in this pull request? Allow to run the Spark web UI behind a reverse proxy with URLs prefixed by a context root, like www.mydomain.com/spark. In particular, this allows to access multiple Spark clusters through the same virtual host, only distinguishing them by context root, like www.mydomain.com/cluster1, www.mydomain.com/cluster2, and it allows to run the Spark UI in a common cookie domain (for SSO) with other services. ### Why are the changes needed? This PR is to take over https://github.com/apache/spark/pull/17455. After changes, Spark allows showing customized prefix URL in all the `href` links of the HTML pages. ### Does this PR introduce _any_ user-facing change? Yes, all the links of UI pages will be contains the value of `spark.ui.reverseProxyUrl` if it is configurated. ### How was this patch tested? New HTML Unit tests in MasterSuite Manual UI testing for master, worker and app UI with an nginx proxy Spark config: ``` spark.ui.port 8080 spark.ui.reverseProxy=true spark.ui.reverseProxyUrl=/path/to/spark/ ``` nginx config: ``` server { listen 9000; set $SPARK_MASTER http://127.0.0.1:8080; # split spark UI path into prefix and local path within master UI location ~ ^(/path/to/spark/) { # strip prefix when forwarding request rewrite /path/to/spark(/.*) $1 break; #rewrite /path/to/spark/ "/" ; # forward to spark master UI proxy_pass $SPARK_MASTER; proxy_intercept_errors on; error_page 301 302 307 = handle_redirects; } location handle_redirects { set $saved_redirect_location '$upstream_http_location'; proxy_pass $saved_redirect_location; } } ``` Closes #29820 from gengliangwang/revertProxyURL. Lead-authored-by: Gengliang Wang <gengliang.wang@databricks.com> Co-authored-by: Oliver Köth <okoeth@de.ibm.com> Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>	2020-11-01 23:57:57 +08:00
Thomas Graves	72ad9dcd5d	[SPARK-32037][CORE] Rename blacklisting feature ### What changes were proposed in this pull request? this PR renames the blacklisting feature. I ended up using "excludeOnFailure" or "excluded" in most cases but there is a mix. I renamed the BlacklistTracker to HealthTracker, but for the TaskSetBlacklist HealthTracker didn't make sense to me since its not the health of the taskset itself but rather tracking the things its excluded on so I renamed it to be TaskSetExcludeList. Everything else I tried to use the context and in most cases excluded made sense. It made more sense to me then blocked since you are basically excluding those executors and nodes from scheduling tasks on them. Then can be unexcluded later after timeouts and such. The configs I changed the name to use excludeOnFailure which I thought explained it. I unfortunately couldn't get rid of some of them because its part of the event listener and history files. To keep backwards compatibility I kept the events and some of the parsing so that the history server would still properly read older history files. It is not forward compatible though - meaning a new application write the "Excluded" events so the older history server won't properly read display them as being blacklisted. A few of the files below are showing up as deleted and recreated even though I did a git mv on them. I'm not sure why. ### Why are the changes needed? get rid of problematic language ### Does this PR introduce _any_ user-facing change? Config name changes but the old configs still work but are deprecated. ### How was this patch tested? updated tests and also manually tested the UI changes and manually tested the history server reading older versions of history files and vice versa. Closes #29906 from tgravescs/SPARK-32037. Lead-authored-by: Thomas Graves <tgraves@nvidia.com> Co-authored-by: Thomas Graves <tgraves@apache.org> Signed-off-by: Thomas Graves <tgraves@apache.org>	2020-10-30 17:16:53 -05:00
angerszhu	0c943cd2fb	[SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether to pad null values when the value size is less than the schema size ### What changes were proposed in this pull request? Add a configuration to control the legacy behavior of whether to pad null values when the value size is less than the schema size, since we can't decide whether it is a bug and some users need the behavior to be the same as Hive's. ### Why are the changes needed? Provides a compatible choice between historical behavior and Hive ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing UT Closes #30156 from AngersZhuuuu/SPARK-33284. Lead-authored-by: angerszhu <angers.zhu@gmail.com> Co-authored-by: AngersZhuuuu <angers.zhu@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-10-30 14:11:25 +09:00
Max Gekk	b409025641	[SPARK-33281][SQL] Return SQL schema instead of Catalog string from the `SchemaOfCsv` expression ### What changes were proposed in this pull request? Return schema in SQL format instead of Catalog string from the SchemaOfCsv expression. ### Why are the changes needed? To unify output of the `schema_of_json()` and `schema_of_csv()`. ### Does this PR introduce _any_ user-facing change? Yes, but `schema_of_csv()` is usually used in combination with `from_csv()`, so the format of the schema shouldn't matter much. Before: ``` > SELECT schema_of_csv('1,abc'); struct<_c0:int,_c1:string> ``` After: ``` > SELECT schema_of_csv('1,abc'); STRUCT<`_c0`: INT, `_c1`: STRING> ``` ### How was this patch tested? By existing test suites `CsvFunctionsSuite` and `CsvExpressionsSuite`. Closes #30180 from MaxGekk/schema_of_csv-sql-schema. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-10-29 21:02:10 +09:00
Max Gekk	9d5e48ea95	[SPARK-33270][SQL] Return SQL schema instead of Catalog string from the `SchemaOfJson` expression ### What changes were proposed in this pull request? Return schema in SQL format instead of Catalog string from the `SchemaOfJson` expression. ### Why are the changes needed? In some cases, `from_json()` cannot parse schemas returned by `schema_of_json`, for instance, when JSON fields have spaces (gaps). Such fields will be quoted after the changes, and can be parsed by `from_json()`. Here is the example: ```scala val in = Seq("""{"a b": 1}""").toDS() in.select(from_json('value, schema_of_json("""{"a b": 100}""")) as "parsed") ``` raises the exception: ``` == SQL == struct<a b:bigint> ------^^^ at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:263) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:130) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseTableSchema(ParseDriver.scala:76) at org.apache.spark.sql.types.DataType$.fromDDL(DataType.scala:131) at org.apache.spark.sql.catalyst.expressions.ExprUtils$.evalTypeExpr(ExprUtils.scala:33) at org.apache.spark.sql.catalyst.expressions.JsonToStructs.<init>(jsonExpressions.scala:537) at org.apache.spark.sql.functions$.from_json(functions.scala:4141) ``` ### Does this PR introduce _any_ user-facing change? Yes. For example, `schema_of_json` for the input `{"col":0}`. Before: `struct<col:bigint>` After: `STRUCT<`col`: BIGINT>` ### How was this patch tested? By existing test suites `JsonFunctionsSuite` and `JsonExpressionsSuite`. Closes #30172 from MaxGekk/schema_of_json-sql-schema. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-10-29 10:30:41 +09:00
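The quoting change described in SPARK-33270 and SPARK-33281 can be illustrated with a minimal sketch. This is not Spark's implementation: `to_sql_struct` is a hypothetical helper, and escaping an embedded backtick by doubling it is an assumption made for the illustration. It only shows why backtick-quoted field names survive a round trip through a DDL parser when they contain spaces.

```python
from typing import List, Tuple


def to_sql_struct(fields: List[Tuple[str, str]]) -> str:
    """Render (name, type) pairs as a SQL-style STRUCT string.

    Hypothetical helper for illustration only; not part of Spark.
    """
    parts = []
    for name, type_name in fields:
        # Quote every field name so that names with spaces stay one token.
        # Doubling embedded backticks is an assumed escaping rule.
        quoted = "`" + name.replace("`", "``") + "`"
        parts.append(f"{quoted}: {type_name.upper()}")
    return "STRUCT<" + ", ".join(parts) + ">"


# The unquoted catalog form struct<a b:bigint> is ambiguous to a parser;
# the quoted form below is not.
print(to_sql_struct([("a b", "bigint")]))
```

With the inputs from the two commit messages, the helper reproduces the "After" strings shown there.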
Takeshi Yamamuro	c2bea045e3	[SPARK-33264][SQL][DOCS] Add a dedicated page for SQL-on-file in SQL documents ### What changes were proposed in this pull request? This PR intends to add a dedicated page for SQL-on-file in SQL documents. This comes from the comment: https://github.com/apache/spark/pull/30095/files#r508965149 ### Why are the changes needed? For better documentations. ### Does this PR introduce _any_ user-facing change? <img width="544" alt="Screen Shot 2020-10-28 at 9 56 59" src="https://user-images.githubusercontent.com/692303/97378051-c1fbcb80-1904-11eb-86c0-a88c5269d41c.png"> ### How was this patch tested? N/A Closes #30165 from maropu/DocForFile. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-10-28 11:21:35 +09:00
Stuart White	7d11d972c3	[SPARK-33246][SQL][DOCS] Correct documentation for null semantics of "NULL AND False" ### What changes were proposed in this pull request? The documentation of the Spark SQL null semantics states that "NULL AND False" yields NULL. This is incorrect. "NULL AND False" yields False. ``` Seq[(java.lang.Boolean, java.lang.Boolean)]( (null, false) ) .toDF("left_operand", "right_operand") .withColumn("AND", 'left_operand && 'right_operand) .show(truncate = false) +------------+-------------+-----+ \|left_operand\|right_operand\|AND \| +------------+-------------+-----+ \|null \|false \|false\| +------------+-------------+-----+ ``` I propose the documentation be updated to reflect that "NULL AND False" yields False. This contribution is my original work and I license it to the project under the project’s open source license. ### Why are the changes needed? This change improves the accuracy of the documentation. ### Does this PR introduce _any_ user-facing change? Yes. This PR introduces a fix to the documentation. ### How was this patch tested? Since this is only a documentation change, no tests were added. Closes #30161 from stwhit/SPARK-33246. Authored-by: Stuart White <stuart@spotright.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-10-28 08:36:14 +09:00
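The null semantics corrected in SPARK-33246 follow standard three-valued logic. A minimal sketch, using Python's `None` as a stand-in for SQL `NULL` (the function name `sql_and` is an assumption, not a Spark API):

```python
from typing import Optional


def sql_and(left: Optional[bool], right: Optional[bool]) -> Optional[bool]:
    """Three-valued AND; None stands in for SQL NULL."""
    # A definite False on either side decides the result,
    # even when the other operand is NULL.
    if left is False or right is False:
        return False
    # With no False present, any NULL makes the result unknown.
    if left is None or right is None:
        return None
    return True


# Print the full truth table.
for l in (True, False, None):
    for r in (True, False, None):
        print(l, "AND", r, "->", sql_and(l, r))
```

The key row is `None AND False`, which evaluates to `False` rather than `None`, matching the DataFrame output in the commit message.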
HyukjinKwon	9818f079aa	[SPARK-33243][PYTHON][BUILD] Add numpydoc into documentation dependency ### What changes were proposed in this pull request? This PR proposes to initiate the migration to NumPy documentation style (from reST style) in PySpark docstrings. This PR also adds one migration example of `SparkContext`. - Before: ... ![Screen Shot 2020-10-26 at 7 02 05 PM](https://user-images.githubusercontent.com/6477701/97161090-a8ea0200-17c0-11eb-8204-0e70d18fc571.png) ... ![Screen Shot 2020-10-26 at 7 02 09 PM](https://user-images.githubusercontent.com/6477701/97161100-aab3c580-17c0-11eb-92ad-f5ad4441ce16.png) ... - After: ... ![Screen Shot 2020-10-26 at 7 24 08 PM](https://user-images.githubusercontent.com/6477701/97161219-d636b000-17c0-11eb-80ab-d17a570ecb4b.png) ... See also https://numpydoc.readthedocs.io/en/latest/format.html ### Why are the changes needed? There are many reasons for switching to NumPy documentation style. 1. Arguably reST style doesn't fit well when the docstring grows large because it provides (arguably) less structures and syntax. 2. NumPy documentation style provides a better human readable docstring format. For example, notebook users often just do `help(...)` by `pydoc`. 3. NumPy documentation style is pretty commonly used in data science libraries, for example, pandas, numpy, Dask, Koalas, matplotlib, ... Using NumPy documentation style can give users a consistent documentation style. ### Does this PR introduce _any_ user-facing change? The dependency itself doesn't change anything user-facing. The documentation change in `SparkContext` does, as shown above. ### How was this patch tested? Manually tested via running `cd python` and `make clean html`. Closes #30149 from HyukjinKwon/SPARK-33243. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-10-27 14:03:57 +09:00
Shiqi Sun	f659527727	[SPARK-30821][K8S] Handle executor failure with multiple containers Handle executor failure with multiple containers Added a spark property spark.kubernetes.executor.checkAllContainers, with default being false. When it's true, the executor snapshot will take all containers in the executor into consideration when deciding whether the executor is in "Running" state, if the pod restart policy is "Never". Also, added the new spark property to the doc. ### What changes were proposed in this pull request? Checking of all containers in the executor pod when reporting executor status, if the `spark.kubernetes.executor.checkAllContainers` property is set to true. ### Why are the changes needed? Currently, a pod remains "running" as long as there is at least one running container. This prevents Spark from noticing when a container has failed in an executor pod with multiple containers. With this change, user can configure the behavior to be different. Namely, if any container in the executor pod has failed, either the executor process or one of its sidecars, the pod is considered to be failed, and it will be rescheduled. ### Does this PR introduce _any_ user-facing change? Yes, new spark property added. User is now able to choose whether to turn on this feature using the `spark.kubernetes.executor.checkAllContainers` property. ### How was this patch tested? Unit test was added and all passed. I tried to run integration test by following the instruction [here](https://spark.apache.org/developer-tools.html) (section "Testing K8S") and also [here](https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/README.md), but I wasn't able to run it smoothly as it fails to talk with minikube cluster. Maybe it's because my minikube version is too new (I'm using v1.13.1)...? Since I've been trying it for two days and still can't make it work, I decided to submit this PR and hopefully the Jenkins test will pass. 
Closes #29924 from huskysun/exec-sidecar-failure. Authored-by: Shiqi Sun <s.sun@salesforce.com> Signed-off-by: Holden Karau <hkarau@apple.com>	2020-10-24 09:55:57 -07:00
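The decision rule described in SPARK-30821 can be sketched as follows. This is a simplified model under stated assumptions, not Spark's Kubernetes code: `ContainerStatus` and `executor_is_running` are hypothetical names, and a container is reduced to a single `running` flag.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContainerStatus:
    """Hypothetical, simplified view of one container in an executor pod."""
    name: str
    running: bool


def executor_is_running(
    containers: List[ContainerStatus],
    check_all_containers: bool,
    restart_policy: str = "Never",
) -> bool:
    # With the flag on and restart policy "Never", every container
    # (executor and sidecars) must be running.
    if check_all_containers and restart_policy == "Never":
        return all(c.running for c in containers)
    # Otherwise the pod counts as running while at least one container is.
    return any(c.running for c in containers)


pod = [ContainerStatus("executor", False), ContainerStatus("sidecar", True)]
print(executor_is_running(pod, check_all_containers=False))
print(executor_is_running(pod, check_all_containers=True))
```

In this model, a pod whose executor container has failed while a sidecar keeps running is reported as running only when the flag is off, which is the situation the commit message describes.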
Max Gekk	ba13b94f6b	[SPARK-33210][SQL] Set the rebasing mode for parquet INT96 type to `EXCEPTION` by default ### What changes were proposed in this pull request? 1. Set the default value for the SQL configs `spark.sql.legacy.parquet.int96RebaseModeInWrite` and `spark.sql.legacy.parquet.int96RebaseModeInRead` to `EXCEPTION`. 2. Update the SQL migration guide. ### Why are the changes needed? Current default value `LEGACY` may lead to shifting timestamps in read or in write. We should leave the decision about rebasing to users. ### Does this PR introduce _any_ user-facing change? Yes ### How was this patch tested? By existing test suites like `ParquetIOSuite`. Closes #30121 from MaxGekk/int96-exception-by-default. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-10-22 03:04:29 +00:00
Kent Yao	dcb0820433	[SPARK-32785][SQL][DOCS][FOLLOWUP] Update migration guide for incomplete interval literals ### What changes were proposed in this pull request? Address comments https://github.com/apache/spark/pull/29635#discussion_r507241899 to improve migration guide ### Why are the changes needed? improve migration guide ### Does this PR introduce _any_ user-facing change? No, only doc update ### How was this patch tested? passing GitHub action Closes #30113 from yaooqinn/SPARK-32785-F. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-10-21 15:51:16 +09:00
Keiji Yoshida	46ad325e56	[MINOR][DOCS] Fix the description about to_avro and from_avro functions ### What changes were proposed in this pull request? This pull request changes the description about `to_avro` and `from_avro` functions to include Python as a supported language as the functions have been supported in Python since Apache Spark 3.0.0 [[SPARK-26856](https://issues.apache.org/jira/browse/SPARK-26856)]. ### Why are the changes needed? Same as above. ### Does this PR introduce _any_ user-facing change? Yes. The description changed by this pull request is on https://spark.apache.org/docs/latest/sql-data-sources-avro.html#to_avro-and-from_avro. ### How was this patch tested? Tested manually by building and checking the document in the local environment. Closes #30105 from kjmrknsn/fix-docs-sql-data-sources-avro. Authored-by: Keiji Yoshida <kjmrknsn@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-10-21 00:36:45 +09:00
liaoaoyuan97	f65a24412b	[SPARK-33181][SQL][DOCS] Document Load Table Directly from File in SQL Select Reference ### What changes were proposed in this pull request? Add the link to the feature: "Run SQL on files directly" to SQL reference documentation page ### Why are the changes needed? To make SQL Reference complete ### Does this PR introduce _any_ user-facing change? yes. Previously, reading in sql from file directly is not included in the documentation: https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select.html, not listed in from_items. The new link is added to the select statement documentation, like the below: ![image](https://user-images.githubusercontent.com/16770242/96517999-c34f3900-121e-11eb-8d56-c4ba0432855e.png) ![image](https://user-images.githubusercontent.com/16770242/96518808-8126f700-1220-11eb-8c98-fb398eee0330.png) ### How was this patch tested? Manually built and tested Closes #30095 from liaoaoyuan97/master. Authored-by: liaoaoyuan97 <al3468@columbia.edu> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-10-20 10:23:58 +09:00
Keiji Yoshida	d2f328aba6	[MINOR][DOCS] Fix the link to the pickle module page in RDD Programming Guide ### What changes were proposed in this pull request? This pull request changes the link to the pickle module page from https://docs.python.org/2/library/pickle.html to https://docs.python.org/3/library/pickle.html in RDD Programming Guide. ### Why are the changes needed? Since Python 2 is no longer supported and it is preferable to refer to the pickle module page of Python 3. ### Does this PR introduce _any_ user-facing change? Yes. Before: the `Pickle` link's destination page was https://docs.python.org/2/library/pickle.html After: the `Pickle` link's destination page is https://docs.python.org/3/library/pickle.html ### How was this patch tested? By building the documentation site and check the link's destination page is changed correctly in the local environment. Closes #30081 from kjmrknsn/docs-fix-pickle-link. Authored-by: Keiji Yoshida <kjmrknsn@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-10-18 17:13:55 +09:00
Liang-Chi Hsieh	2c4599db4b	[MINOR][SS][DOCS] Update Structured Streaming guide doc and update code typo ### What changes were proposed in this pull request? This is a minor change to update structured-streaming-programming-guide and typos in code. ### Why are the changes needed? Keep the user-facing document correct and updated. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Unit tests. Closes #30074 from viirya/ss-minor. Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-10-16 22:18:12 -07:00
xuewei.linxuewei	306872eefa	[SPARK-33139][SQL] protect setActiveSession and clearActiveSession ### What changes were proposed in this pull request? This PR is a sub-task of [SPARK-33138](https://issues.apache.org/jira/browse/SPARK-33138). In order to make SQLConf.get reliable and stable, we need to make sure users can't pollute the SQLConf and SparkSession Context by calling setActiveSession and clearActiveSession. Changes in the PR: * add legacy config spark.sql.legacy.allowModifyActiveSession to fall back to the old behavior if users do need to call these two APIs. * by default, if users call these two APIs, an exception is thrown * add two extra internal and private APIs, setActiveSessionInternal and clearActiveSessionInternal, for current internal usage * change all internal references to the new internal APIs except for SQLContext.setActive and SQLContext.clearActive ### Why are the changes needed? Make SQLConf.get reliable and stable. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? * Add UT in SparkSessionBuilderSuite to test the legacy config * Existing test Closes #30042 from leanken/leanken-SPARK-33139. Authored-by: xuewei.linxuewei <xuewei.linxuewei@alibaba-inc.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-10-16 06:05:17 +00:00
Dongjoon Hyun	8e7c39089f	[SPARK-33155][K8S] spark.kubernetes.pyspark.pythonVersion allows only '3' ### What changes were proposed in this pull request? This PR makes `spark.kubernetes.pyspark.pythonVersion` allow only `3`. In other words, it will reject `2` for `Python 2`. - [x] Configuration description and check is updated. - [x] Documentation is updated - [x] Unit test cases are updated. - [x] Docker image script is updated. ### Why are the changes needed? After SPARK-32138, Apache Spark 3.1 dropped Python 2 support. ### Does this PR introduce _any_ user-facing change? Yes, but Python 2 support is already dropped officially. ### How was this patch tested? Pass the CI. Closes #30049 from dongjoon-hyun/SPARK-DROP-PYTHON2. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-10-15 01:51:01 -07:00
xuewei.linxuewei	dc697a8b59	[SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero ### What changes were proposed in this pull request? As [SPARK-13860](https://issues.apache.org/jira/browse/SPARK-13860) stated, TPCDS Query 39 returns wrong results using SparkSQL. The root cause is that when stddev_samp is applied to a single-element set, the TPCDS answer expects null, whereas SparkSQL returns Double.NaN, which caused the wrong result. Add an extra legacy config to fall back to the NaN logic, and return null by default to align with the TPCDS standard. ### Why are the changes needed? SQL correctness issue. ### Does this PR introduce any user-facing change? Yes. See sql-migration-guide In Spark 3.1, statistical aggregation functions, including `std`, `stddev`, `stddev_samp`, `variance`, `var_samp`, `skewness`, `kurtosis`, `covar_samp`, and `corr`, will return `NULL` instead of `Double.NaN` when `DivideByZero` occurs during expression evaluation, for example, when `stddev_samp` is applied to a single-element set. In Spark version 3.0 and earlier, it will return `Double.NaN` in such a case. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.statisticalAggregate` to `true`. ### How was this patch tested? Updated DataFrameAggregateSuite/DataFrameWindowFunctionsSuite to test both default and legacy behavior. Adjusted DataFrameWindowFunctionsSuite/SQLQueryTestSuite and some R cases to the default return-null behavior. Closes #29983 from leanken/leanken-SPARK-13860. Authored-by: xuewei.linxuewei <xuewei.linxuewei@alibaba-inc.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-10-13 13:21:45 +00:00
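The behavior change in SPARK-13860 comes down to what happens when the sample-variance denominator `n - 1` is zero. A minimal sketch in plain Python, where `None` stands in for SQL `NULL` and the `legacy` parameter is a hypothetical stand-in for `spark.sql.legacy.statisticalAggregate` (this is not Spark's aggregate implementation, and returning `None` for empty input is an assumption of the sketch):

```python
import math
from typing import Optional, Sequence


def stddev_samp(values: Sequence[float], legacy: bool = False) -> Optional[float]:
    """Sample standard deviation; None stands in for SQL NULL."""
    n = len(values)
    if n == 0:
        # Assumed here: no input rows yields NULL in both modes.
        return None
    if n == 1:
        # The denominator (n - 1) is zero: NaN in legacy mode, NULL otherwise.
        return math.nan if legacy else None
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (n - 1))


print(stddev_samp([1.0]))               # new default behavior
print(stddev_samp([1.0], legacy=True))  # pre-3.1 behavior
print(stddev_samp([1.0, 3.0]))
```

Returning `NULL` matters downstream because comparisons and filters treat `NaN` as an ordinary value, while `NULL` propagates as unknown.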
manubatham20	4a47b3e110	[DOC][MINOR] pySpark usage - removed repeated keyword causing confusion ### What changes were proposed in this pull request? While explaining pySpark usage, use of repeated synonymous words were causing confusion. Removed "instead of a JAR" word, to keep it more readable. ### Why are the changes needed? To keep the docs more readable and easy to understand. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? No code changes, minor documentation change only. No tests added. Closes #29956 from manubatham20/patch-1. Authored-by: manubatham20 <manubatham2006@gmail.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-10-08 07:52:00 -05:00
Dongjoon Hyun	008a2ad1f8	[SPARK-20202][BUILD][SQL] Remove references to org.spark-project.hive (Hive 1.2.1) ### What changes were proposed in this pull request? As of today, - SPARK-30034 Apache Spark 3.0.0 switched its default Hive execution engine from Hive 1.2 to Hive 2.3. This removes the direct dependency to the forked Hive 1.2.1 in maven repository. - SPARK-32981 Apache Spark 3.1.0(`master` branch) removed Hive 1.2 related artifacts from Apache Spark binary distributions. This PR(SPARK-20202) aims to remove the following usage of unofficial Apache Hive fork completely from Apache Spark master for Apache Spark 3.1.0. ``` <hive.group>org.spark-project.hive</hive.group> <hive.version>1.2.1.spark2</hive.version> ``` For the forked Hive 1.2.1.spark2 users, Apache Spark 2.4(LTS) and 3.0 (~ 2021.12) will provide it. ### Why are the changes needed? - First, Apache Spark community should not use the unofficial forked release of another Apache project. - Second, Apache Hive 1.2.1 was released at 2015-06-26 and the forked Hive `1.2.1.spark2` exposed many unfixable bugs in Apache because the forked `1.2.1.spark2` is not maintained at all. Apache Hive 2.3.0 was released at 2017-07-19 and it has been used with less number of bugs compared with `1.2.1.spark2`. Many bugs still exist in `hive-1.2` profile and new Apache Spark unit tests are added with `HiveUtils.isHive23` condition so far. ### Does this PR introduce _any_ user-facing change? No. This is a dev-only change. PRBuilder will not accept `[test-hive1.2]` on master and `branch-3.1`. ### How was this patch tested? 1. SBT/Hadoop 3.2/Hive 2.3 (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129366) 2. SBT/Hadoop 2.7/Hive 2.3 (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129382) 3. SBT/Hadoop 3.2/Hive 1.2 (This has not been supported already due to Hive 1.2 doesn't work with Hadoop 3.2.) 4. 
SBT/Hadoop 2.7/Hive 1.2 (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129383, This is rejected) Closes #29936 from dongjoon-hyun/SPARK-REMOVE-HIVE1. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-10-05 15:29:56 -07:00
Kousuke Saruta	005999721f	[SPARK-33046][DOCS] Update how to build doc for Scala 2.13 with sbt ### What changes were proposed in this pull request? This PR fixes the description of how to build Spark for Scala 2.13 with sbt. In the current doc, how to build Spark for Scala 2.13 with sbt is described like: ![scala-2 13-build-before](https://user-images.githubusercontent.com/4736016/94816248-80c3e900-0436-11eb-9bc2-99af5786971a.png) But the build fails with this command because the scala-2.13 profile is not enabled and scala-parallel-collections is absent. ``` [error] /home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala:23: object parallel is not a member of package collection ``` The correct command should be: ``` build/sbt -Pscala-2.13 compile ``` ### Why are the changes needed? The build command is wrong. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? I checked that `sbt -Pscala-2.13` is correct with the following command: ``` build/sbt -Dscala.version=2.13.3 -Phive -Phive-thriftserver -Pyarn -Pkubernetes compile ``` I also built the modified doc and checked the generated html: ![spark-scala-2 13-build-doc-after](https://user-images.githubusercontent.com/4736016/94869259-f2745500-047f-11eb-89e5-20816f3ed24d.png) Closes #29921 from sarutak/fix-scala-2.13-build-doc. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-10-01 18:01:23 -05:00
iRakson	d3dbe1a907	[SQL][DOC][MINOR] Corrects input table names in the examples of CREATE FUNCTION doc ### What changes were proposed in this pull request? Fix Typo ### Why are the changes needed? To maintain consistency. Correct table name should be used for SELECT command. ### Does this PR introduce _any_ user-facing change? Yes. Now CREATE FUNCTION doc will show the correct name of table. ### How was this patch tested? Manually. Doc changes. Closes #29920 from iRakson/fixTypo. Authored-by: iRakson <raksonrakesh@gmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-10-01 20:50:16 +09:00
Peter Toth	28ed3a512a	[SPARK-32723][WEBUI] Upgrade to jQuery 3.5.1 ### What changes were proposed in this pull request? Upgrade to the latest available version of jQuery (3.5.1). ### Why are the changes needed? There are some CVEs reported (CVE-2020-11022, CVE-2020-11023) affecting older versions of jQuery. Although Spark UI is read-only and those CVEs don't seem to affect Spark, using the latest version of this library can help to handle vulnerability reports of security scans. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manual tests and checked the jQuery 3.5 upgrade guide. Closes #29902 from peter-toth/SPARK-32723-upgrade-to-jquery-3.5.1. Authored-by: Peter Toth <peter.toth@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-09-30 21:30:17 -07:00
GuoPhilipse	3bdbb5546d	[SPARK-31753][SQL][DOCS][FOLLOW-UP] Add missing keywords in the SQL docs ### What changes were proposed in this pull request? update sql-ref docs, the following key words will be added in this PR. CLUSTERED BY SORTED BY INTO num_buckets BUCKETS ### Why are the changes needed? let more users know the sql key words usage ### Does this PR introduce _any_ user-facing change? No ![image](https://user-images.githubusercontent.com/46367746/94428281-0a6b8080-01c3-11eb-9ff3-899f8da602ca.png) ![image](https://user-images.githubusercontent.com/46367746/94428285-0d667100-01c3-11eb-8a54-90e7641d917b.png) ![image](https://user-images.githubusercontent.com/46367746/94428288-0f303480-01c3-11eb-9e1d-023538aa6e2d.png) ### How was this patch tested? generate html test Closes #29883 from GuoPhilipse/add-sql-missing-keywords. Lead-authored-by: GuoPhilipse <46367746+GuoPhilipse@users.noreply.github.com> Co-authored-by: GuoPhilipse <guofei_ok@126.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-10-01 08:15:53 +09:00
Dongjoon Hyun	ece8d8e22c	[SPARK-33006][K8S][DOCS] Add dynamic PVC usage example into K8s doc ### What changes were proposed in this pull request? This updates K8s document to describe new dynamic PVC features. ### Why are the changes needed? This will help the user use the new features easily. ### Does this PR introduce _any_ user-facing change? Yes, but it's a doc updates. ### How was this patch tested? Manual. <img width="847" alt="Screen Shot 2020-09-28 at 3 54 53 PM" src="https://user-images.githubusercontent.com/9700541/94494923-3ed04400-01a5-11eb-81f9-127db42d4256.png"> <img width="779" alt="Screen Shot 2020-09-28 at 3 55 07 PM" src="https://user-images.githubusercontent.com/9700541/94494930-4394f800-01a5-11eb-9387-50ebc14af477.png"> Closes #29897 from dongjoon-hyun/SPARK-33006. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-09-30 09:27:57 -07:00
Dongjoon Hyun	cc06266ade	[SPARK-33019][CORE] Use spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=1 by default ### What changes were proposed in this pull request? Apache Spark 3.1's default Hadoop profile is `hadoop-3.2`. Instead of having a warning documentation, this PR aims to use a consistent and safer version of Apache Hadoop file output committer algorithm which is `v1`. This will prevent a silent correctness regression during migration from Apache Spark 2.4/3.0 to Apache Spark 3.1.0. Of course, if there is a user-provided configuration, `spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2`, that will be used still. ### Why are the changes needed? Apache Spark provides multiple distributions with Hadoop 2.7 and Hadoop 3.2. `spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version` depends on the Hadoop version. Apache Hadoop 3.0 switches the default algorithm from `v1` to `v2` and now there exists a discussion to remove `v2`. We had better provide a consistent default behavior of `v1` across various Spark distributions. - [MAPREDUCE-7282](https://issues.apache.org/jira/browse/MAPREDUCE-7282) MR v2 commit algorithm should be deprecated and not the default ### Does this PR introduce _any_ user-facing change? Yes. This changes the default behavior. Users can override this conf. ### How was this patch tested? Manual. BEFORE (spark-3.0.1-bin-hadoop3.2) ```scala scala> sc.version res0: String = 3.0.1 scala> sc.hadoopConfiguration.get("mapreduce.fileoutputcommitter.algorithm.version") res1: String = 2 ``` AFTER ```scala scala> sc.hadoopConfiguration.get("mapreduce.fileoutputcommitter.algorithm.version") res0: String = 1 ``` Closes #29895 from dongjoon-hyun/SPARK-DEFAUT-COMMITTER. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-09-29 12:02:45 -07:00
Kousuke Saruta	790d9ef2d3	[SPARK-32955][DOCS] An item in the navigation bar in the WebUI has a wrong link ### What changes were proposed in this pull request? This PR fixes a link in `_layouts/global.html`. The item `More` in the navigation bar in the WebUI links to `api.html` but it seems to be wrong. This PR also removes `api.md` because it and `api.html` generated from it are not referred to from anywhere. ### Why are the changes needed? Fix the wrong link. ### Does this PR introduce _any_ user-facing change? Yes. "More" item no longer links to `api.html`. ### How was this patch tested? `SKIP_API=1 jekyll build` and confirmed that the item no longer links to `api.html`. I also confirmed `api.md` and `api.html` are no longer referred to from anywhere by the following command. ``` $ grep -Erl "api\.(html\|md)" docs ``` Closes #29821 from sarutak/fix-api-doc-link. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-09-22 14:46:27 +09:00
itholic	9c653c957f	[SPARK-32189][DOCS][PYTHON] Development - Setting up IDEs ### What changes were proposed in this pull request? This PR proposes to document the way of setting up IDEs ![스크린샷 2020-09-21 오전 10 43 12](https://user-images.githubusercontent.com/44108233/93727715-5c2a6e80-fbf7-11ea-821b-555723b00bc8.png) ![스크린샷 2020-09-21 오전 10 43 45](https://user-images.githubusercontent.com/44108233/93727716-5f255f00-fbf7-11ea-9c6c-7b8a973bc511.png) ### Why are the changes needed? To let users know how to setup IDEs ### Does this PR introduce _any_ user-facing change? Yes, it adds a new page in the documentation about setting IDEs. ### How was this patch tested? Manually built the doc. Closes #29781 from itholic/SPARK-32189. Authored-by: itholic <haejoon309@naver.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-09-21 12:29:17 +09:00
Udbhav30	88e87bc8eb	[SPARK-32887][DOC] Correct the typo for SHOW TABLE ### What changes were proposed in this pull request? Correct the typo in the Show Table document ### Why are the changes needed? The current Show Table document results in a parse error, so it is misleading to users ### Does this PR introduce _any_ user-facing change? Yes, the Show Table document is corrected now ### How was this patch tested? NA Closes #29758 from Udbhav30/showtable. Authored-by: Udbhav30 <u.agrawal30@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-09-17 09:25:17 -07:00
bowen.li	0549c20c6f	[SPARK-32865][DOC] python section in quickstart page doesn't display SPARK_VERSION correctly ### What changes were proposed in this pull request? In https://github.com/apache/spark/blame/master/docs/quick-start.md#L402,it should be `{{site.SPARK_VERSION}}` rather than `{site.SPARK_VERSION}` ### Why are the changes needed? SPARK_VERSION isn't displayed correctly, as shown below ![image](https://user-images.githubusercontent.com/1892692/93006726-d03c8680-f514-11ea-85e3-1d7cfb682ef2.png) ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? tested locally, as shown below ![image](https://user-images.githubusercontent.com/1892692/93006712-a6835f80-f514-11ea-8d78-6831c9d65265.png) Closes #29738 from bowenli86/doc. Authored-by: bowen.li <bowenli86@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-09-12 21:45:55 -07:00
Jungtaek Lim (HeartSaVioR)	8f61005723	[SPARK-32456][SS][FOLLOWUP] Update doc to note about using SQL statement with streaming Dataset ### What changes were proposed in this pull request? This patch proposes to update the doc (both SS guide doc and Dataset dropDuplicates method doc) to leave a note to check on using SQL statements with streaming Dataset. Once end users create a temp view based on a streaming Dataset, they won't bother thinking about "streaming" and will do whatever they do with a batch query. In many cases it works, but not smoothly when streaming aggregation is involved. They still need to be concerned about maintaining the state store. ### Why are the changes needed? Although SPARK-32456 fixed the weird error message, as a side effect some operations are enabled on streaming workload via SQL statement, which is error-prone if end users don't indicate what they're doing. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Only doc change. Closes #29461 from HeartSaVioR/SPARK-32456-FOLLOWUP-DOC. Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-09-10 08:10:32 +00:00
HyukjinKwon	c336ae39cd	[SPARK-32186][DOCS][PYTHON] Development - Debugging ### What changes were proposed in this pull request? This PR proposes to document the way of debugging PySpark. It's pretty much self-descriptive. I made a demo site to review it more effectively: https://hyukjin-spark.readthedocs.io/en/stable/development/debugging.html ### Why are the changes needed? To let users know how to debug PySpark applications. ### Does this PR introduce _any_ user-facing change? Yes, it adds a new page in the documentation about debugging PySpark. ### How was this patch tested? Manually built the doc. Closes #29639 from HyukjinKwon/SPARK-32186. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-09-08 10:32:22 +09:00
Kent Yao	de44e9cfa0	[SPARK-32785][SQL] Interval with dangling parts should not results null ### What changes were proposed in this pull request? Bugfix for incomplete interval values, e.g. interval '1', interval '1 day 2'. Currently these cases result in null, but we should fail them with IllegalArgumentException. ### Why are the changes needed? correctness ### Does this PR introduce _any_ user-facing change? Yes, incomplete intervals will now throw an exception. #### before ``` bin/spark-sql -S -e "select interval '1', interval '+', interval '1 day -'" NULL NULL NULL ``` #### after ``` -- !query select interval '1' -- !query schema struct<> -- !query output org.apache.spark.sql.catalyst.parser.ParseException Cannot parse the INTERVAL value: 1(line 1, pos 7) == SQL == select interval '1' ``` ### How was this patch tested? unit tests added Closes #29635 from yaooqinn/SPARK-32785. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-09-07 05:11:30 +00:00
Wenchen Fan	ccc0250a08	[SPARK-32718][SQL] Remove unnecessary keywords for interval units ### What changes were proposed in this pull request? Remove the YEAR, MONTH, DAY, HOUR, MINUTE, SECOND keywords. They are not useful in the parser, as we need to support plural like YEARS, so the parser has to accept the general identifier as interval unit anyway. ### Why are the changes needed? These keywords are reserved in ANSI. If Spark has these keywords, then they become reserved under ANSI mode. This makes Spark not able to run TPCDS queries as they use YEAR as alias name. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added `TPCDSQueryANSISuite`, to make sure Spark with ANSI mode can run TPCDS queries. Closes #29560 from cloud-fan/keyword. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-08-29 14:06:01 -07:00
HyukjinKwon	c154629171	[SPARK-32183][DOCS][PYTHON] User Guide - PySpark Usage Guide for Pandas with Apache Arrow ### What changes were proposed in this pull request? This PR proposes to move Arrow usage guide from Spark documentation site to PySpark documentation site (at "User Guide"). Here is the demo for reviewing quicker: https://hyukjin-spark.readthedocs.io/en/stable/user_guide/arrow_pandas.html ### Why are the changes needed? To have a single place for PySpark users, and better documentation. ### Does this PR introduce _any_ user-facing change? Yes, it will move https://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html to our PySpark documentation. ### How was this patch tested? ```bash cd docs SKIP_SCALADOC=1 SKIP_RDOC=1 SKIP_SQLDOC=1 jekyll serve --watch ``` and ```bash cd python/docs make clean html ``` Closes #29548 from HyukjinKwon/SPARK-32183. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-08-28 15:09:06 +09:00
waleedfateem	8749b2b6fa	[SPARK-32701][CORE][DOCS] mapreduce.fileoutputcommitter.algorithm.version default value The current documentation states that the default value of spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version is 1 which is not entirely true since this configuration isn't set anywhere in Spark but rather inherited from the Hadoop FileOutputCommitter class. ### What changes were proposed in this pull request? I'm submitting this change, to clarify that the default value will entirely depend on the Hadoop version of the runtime environment. ### Why are the changes needed? An application would end up using algorithm version 1 on certain environments but without any changes the same exact application will use version 2 on environments running Hadoop 3.0 and later. This can have pretty bad consequences in certain scenarios, for example, two tasks can partially overwrite their output if speculation is enabled. Also, please refer to the following JIRA: https://issues.apache.org/jira/browse/MAPREDUCE-7282 ### Does this PR introduce _any_ user-facing change? Yes. Configuration page content was modified where previously we explicitly highlighted that the default version for the FileOutputCommitter algorithm was v1, this now has changed to "Dependent on environment" with additional information in the description column to elaborate. ### How was this patch tested? Checked changes locally in browser Closes #29541 from waleedfateem/SPARK-32701. Authored-by: waleedfateem <waleed.fateem@gmail.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-08-27 09:05:50 -05:00
Dale Clarke	ed51a7f083	[SPARK-30654] Bootstrap4 docs upgrade ### What changes were proposed in this pull request? We are using an older version of Bootstrap (v. 2.1.0) for the online documentation site. Bootstrap 2.x was moved to EOL in Aug 2013 and Bootstrap 3.x was moved to EOL in July 2019 (https://github.com/twbs/release). Older versions of Bootstrap are also getting flagged in security scans for various CVEs: https://snyk.io/vuln/SNYK-JS-BOOTSTRAP-72889 https://snyk.io/vuln/SNYK-JS-BOOTSTRAP-173700 https://snyk.io/vuln/npm:bootstrap:20180529 https://snyk.io/vuln/npm:bootstrap:20160627 I haven't validated each CVE, but it would probably be good practice to resolve any potential issues and get on a supported release. The bad news is that there have been quite a few changes between Bootstrap 2 and Bootstrap 4. I've tried updating the library, refactoring/tweaking the CSS and JS to maintain a similar appearance and functionality, and testing the documentation. This is a fairly large change so I'm sure additional testing and fixes will be needed. ### How was this patch tested? This has been manually tested, but as there is a lot of documentation it is possible issues were missed. Additional testing and feedback is welcomed. If it appears a whole section was missed let me know and I'll take a pass at addressing that section. Closes #27369 from clarkead/bootstrap4-docs-upgrade. Authored-by: Dale Clarke <a.dale.clarke@gmail.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-08-27 09:03:39 -05:00
Terry Kim	baaa756dee	[SPARK-32516][SQL][FOLLOWUP] 'path' option cannot coexist with path parameter for DataFrameWriter.save(), DataStreamReader.load() and DataStreamWriter.start() ### What changes were proposed in this pull request? This is a follow up PR to #29328 to apply the same constraint where `path` option cannot coexist with path parameter to `DataFrameWriter.save()`, `DataStreamReader.load()` and `DataStreamWriter.start()`. ### Why are the changes needed? The current behavior silently overwrites the `path` option if path parameter is passed to `DataFrameWriter.save()`, `DataStreamReader.load()` and `DataStreamWriter.start()`. For example, ``` Seq(1).toDF.write.option("path", "/tmp/path1").parquet("/tmp/path2") ``` will write the result to `/tmp/path2`. ### Does this PR introduce _any_ user-facing change? Yes, if `path` option coexists with path parameter to any of the above methods, it will throw `AnalysisException`: ``` scala> Seq(1).toDF.write.option("path", "/tmp/path1").parquet("/tmp/path2") org.apache.spark.sql.AnalysisException: There is a 'path' option set and save() is called with a path parameter. Either remove the path option, or call save() without the parameter. To ignore this check, set 'spark.sql.legacy.pathOptionBehavior.enabled' to 'true'.; ``` The user can restore the previous behavior by setting `spark.sql.legacy.pathOptionBehavior.enabled` to `true`. ### How was this patch tested? Added new tests. Closes #29543 from imback82/path_option. Authored-by: Terry Kim <yuminkim@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-08-27 06:21:04 +00:00
HyukjinKwon	b54103016a	[SPARK-32204][SPARK-32182][DOCS] Add a quickstart page with Binder integration in PySpark documentation ### What changes were proposed in this pull request? This PR proposes to: - add a notebook with a Binder integration which allows users to try PySpark in a live notebook. Please [try this here](https://mybinder.org/v2/gh/HyukjinKwon/spark/SPARK-32204?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart.ipynb). - reuse this notebook as a quickstart guide in PySpark documentation. Note that Binder turns a Git repo into a collection of interactive notebooks. It works based on a Docker image. Once somebody builds it, other people can reuse the image against a specific commit. Therefore, if we run Binder with the images based on released tags in Spark, virtually all users can instantly launch the Jupyter notebooks. <br/> I made a simple demo to make it easier to review. Please see: - [Main page](https://hyukjin-spark.readthedocs.io/en/stable/). Note that the link ("Live Notebook") in the main page wouldn't work since this PR is not merged yet. - [Quickstart page](https://hyukjin-spark.readthedocs.io/en/stable/getting_started/quickstart.html) <br/> When reviewing the notebook file itself, please give me direct feedback, which I will appreciate and address. Another way might be: - open [here](https://mybinder.org/v2/gh/HyukjinKwon/spark/SPARK-32204?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart.ipynb). - edit / change / update the notebook. Please feel free to change whatever you want. I can apply it as is or update it slightly when I apply it to this PR. - download it as a `.ipynb` file: ![Screen Shot 2020-08-20 at 10 12 19 PM](https://user-images.githubusercontent.com/6477701/90774311-3e38c800-e332-11ea-8476-699a653984db.png) - upload the `.ipynb` file here in a GitHub comment. Then I will push a commit with that file, crediting you correctly, of course. - alternatively, push a commit into this PR right away if that's easier for you (if you're a committer). References: - https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html - https://databricks.com/jp/blog/2020/03/31/10-minutes-from-pandas-to-koalas-on-apache-spark.html - my own blog post .. :-) and https://koalas.readthedocs.io/en/latest/getting_started/10min.html ### Why are the changes needed? To improve PySpark's usability. The current quickstart for Python users are very friendly. ### Does this PR introduce _any_ user-facing change? Yes, it will add a documentation page, and expose a live notebook to PySpark users. ### How was this patch tested? Manually tested, and GitHub Actions builds will test. Closes #29491 from HyukjinKwon/SPARK-32204. Lead-authored-by: HyukjinKwon <gurwls223@apache.org> Co-authored-by: Fokko Driesprong <fokko@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-08-26 12:23:24 +09:00
Kent Yao	1f3bb51757	[SPARK-32683][DOCS][SQL] Fix doc error and add migration guide for datetime pattern F ### What changes were proposed in this pull request? This PR fixes the doc error and add a migration guide for datetime pattern. ### Why are the changes needed? This is a bug of the doc that we inherited from JDK https://bugs.openjdk.java.net/browse/JDK-8169482 The SimpleDateFormatter(F Day of week in month) we used in 2.x and the DatetimeFormatter(F week-of-month) we use now both have the opposite meanings to what they declared in the java docs. And unfortunately, this also leads to silent data change in Spark too. The `week-of-month` is actually the pattern `W` in DatetimeFormatter, which is banned to use in Spark 3.x. If we want to keep pattern `F`, we need to accept the behavior change with proper migration guide and fix the doc in Spark ### Does this PR introduce _any_ user-facing change? Yes, doc changed ### How was this patch tested? passing ci doc generating job Closes #29538 from yaooqinn/SPARK-32683. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-08-25 13:17:03 +00:00
Terry Kim	e3a88a9767	[SPARK-32516][SQL] 'path' option cannot coexist with load()'s path parameters ### What changes were proposed in this pull request? This PR proposes to make the behavior consistent for the `path` option when loading dataframes with a single path (e.g., `option("path", path).format("parquet").load(path)` vs. `option("path", path).parquet(path)`) by disallowing the `path` option to coexist with `load`'s path parameters. ### Why are the changes needed? The current behavior is inconsistent: ```scala scala> Seq(1).toDF.write.mode("overwrite").parquet("/tmp/test") scala> spark.read.option("path", "/tmp/test").format("parquet").load("/tmp/test").show +-----+ \|value\| +-----+ \| 1\| +-----+ scala> spark.read.option("path", "/tmp/test").parquet("/tmp/test").show +-----+ \|value\| +-----+ \| 1\| \| 1\| +-----+ ``` ### Does this PR introduce _any_ user-facing change? Yes, now if the `path` option is specified along with `load`'s path parameters, it would fail: ```scala scala> Seq(1).toDF.write.mode("overwrite").parquet("/tmp/test") scala> spark.read.option("path", "/tmp/test").format("parquet").load("/tmp/test").show org.apache.spark.sql.AnalysisException: There is a path option set and load() is called with path parameters. Either remove the path option or move it into the load() parameters.; at org.apache.spark.sql.DataFrameReader.verifyPathOptionDoesNotExist(DataFrameReader.scala:310) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:232) ... 47 elided scala> spark.read.option("path", "/tmp/test").parquet("/tmp/test").show org.apache.spark.sql.AnalysisException: There is a path option set and load() is called with path parameters. Either remove the path option or move it into the load() parameters.; at org.apache.spark.sql.DataFrameReader.verifyPathOptionDoesNotExist(DataFrameReader.scala:310) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:250) at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:778) at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:756) ... 47 elided ``` The user can restore the previous behavior by setting `spark.sql.legacy.pathOptionBehavior.enabled` to `true`. ### How was this patch tested? Added a test Closes #29328 from imback82/dfw_option. Authored-by: Terry Kim <yuminkim@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-08-24 16:30:30 +00:00
Huaxin Gao	db74fd0d33	[SPARK-32552][SQL][DOCS] Complete the documentation for Table-valued Function ### What changes were proposed in this pull request? There are two types of TVF. We only documented one type. Adding the doc for the 2nd type. ### Why are the changes needed? complete Table-valued Function doc ### Does this PR introduce _any_ user-facing change? <img width="1099" alt="Screen Shot 2020-08-06 at 5 30 25 PM" src="https://user-images.githubusercontent.com/13592258/89595926-c5eae680-d80a-11ea-918b-0c3646f9930e.png"> <img width="1100" alt="Screen Shot 2020-08-06 at 5 30 49 PM" src="https://user-images.githubusercontent.com/13592258/89595929-c84d4080-d80a-11ea-9803-30eb502ccd05.png"> <img width="1101" alt="Screen Shot 2020-08-06 at 5 31 19 PM" src="https://user-images.githubusercontent.com/13592258/89595931-ca170400-d80a-11ea-8812-2f009746edac.png"> <img width="1100" alt="Screen Shot 2020-08-06 at 5 31 40 PM" src="https://user-images.githubusercontent.com/13592258/89595934-cb483100-d80a-11ea-9e18-9357aa9f2c5c.png"> ### How was this patch tested? Manually build and check Closes #29355 from huaxingao/tvf. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-08-24 09:43:41 +09:00
Yuanjian Li	8b26c69ce7	[SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some operations ### What changes were proposed in this pull request? Rephrase the description for some operations to make it clearer. ### Why are the changes needed? Add more detail in the document. ### Does this PR introduce _any_ user-facing change? No, document only. ### How was this patch tested? Document only. Closes #29269 from xuanyuanking/SPARK-31792-follow. Authored-by: Yuanjian Li <yuanjian.li@databricks.com> Signed-off-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>	2020-08-22 21:32:23 +09:00
Brandon Jiang	1450b5e095	[MINOR][DOCS] fix typo for docs,log message and comments ### What changes were proposed in this pull request? Fix typo for docs, log messages and comments ### Why are the changes needed? typo fix to increase readability ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? manual test has been performed to test the updated Closes #29443 from brandonJY/spell-fix-doc. Authored-by: Brandon Jiang <Brandon.jiang.a@outlook.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-08-22 06:45:35 +09:00
Chao Sun	bf221debd0	[SPARK-32674][DOC] Add suggestion for parallel directory listing in tuning doc ### What changes were proposed in this pull request? This adds some tuning guide for increasing parallelism of directory listing. ### Why are the changes needed? Sometimes when job input has large number of directories, the listing can become a bottleneck. There are a few parameters to tune this. This adds some info to Spark tuning guide to make the knowledge better shared. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? N/A Closes #29498 from sunchao/SPARK-32674. Authored-by: Chao Sun <sunchao@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-08-21 16:48:54 +09:00
Gengliang Wang	1b39215a65	[SPARK-32018][FOLLOWUP][DOC] Add migration guide for decimal value overflow in sum aggregation ### What changes were proposed in this pull request? Add migration guide for decimal value overflow behavior in sum aggregation, introduced in https://github.com/apache/spark/pull/29026 ### Why are the changes needed? Add migration guide for the behavior changes from 3.0 to 3.1. See also: https://github.com/apache/spark/pull/29450#issuecomment-675222779 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Build docs and preview: ![image](https://user-images.githubusercontent.com/1097932/90589256-8b7e3380-e192-11ea-8ff1-05a447c20722.png) Closes #29458 from gengliangwang/migrationGuideDecimalOverflow. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>	2020-08-19 11:37:53 +08:00
Luca Canali	21e0dd0461	[SPARK-32119][FOLLOWUP][DOC] Update monitoring doc following the improvement in SPARK-32119 ### What changes were proposed in this pull request? Update monitoring doc following the improvement/fix in SPARK-32119. ### Why are the changes needed? SPARK-32119 removes the limitations listed in the monitoring doc "Distribution of the jar files containing the plugin code is currently not done by Spark." ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Not relevant Closes #29463 from LucaCanali/followupSPARK32119. Authored-by: Luca Canali <luca.canali@cern.ch> Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>	2020-08-18 18:53:34 +09:00
Kousuke Saruta	9a79bbc8b6	[SPARK-32610][DOCS] Fix the link to metrics.dropwizard.io in monitoring.md to refer the proper version ### What changes were proposed in this pull request? This PR fixes the links to metrics.dropwizard.io in monitoring.md so that they refer to the proper version of the library. ### Why are the changes needed? There are links to metrics.dropwizard.io in monitoring.md, but the link targets refer to version 3.1.0, while we use 4.1.1. Now that users can create their own metrics using the dropwizard library, it's better to fix the links to refer to the proper version. ### Does this PR introduce _any_ user-facing change? Yes. The modified links refer to version 4.1.1. ### How was this patch tested? Build the docs and visit all the modified links. Closes #29426 from sarutak/fix-dropwizard-url. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-08-16 12:07:37 -05:00
HyukjinKwon	9dec67717b	[SPARK-32584][PYTHON][DOCS] Exclude _images and _sources that are generated by Sphinx in Jekyll build ### What changes were proposed in this pull request? This PR proposes to `include` `_images` and `_sources` directories, generated from Sphinx, in Jekyll build. For the `_images` directory: after SPARK-31851, we add some images to use within the pages built by Sphinx. It copies the images into the `_images` directory. Later, when Jekyll builds, the underscore directories are ignored by default, which ends up with missing images in the main doc. Before: ![Screen Shot 2020-08-11 at 1 52 46 PM](https://user-images.githubusercontent.com/6477701/89859104-2e571080-dbdb-11ea-817c-c04bbcd4088e.png) After: ![Screen Shot 2020-08-11 at 1 49 00 PM](https://user-images.githubusercontent.com/6477701/89859105-30b96a80-dbdb-11ea-85c6-8a135eddf613.png) For the `_sources` directory, please refer [here](https://github.com/sphinx-contrib/sphinx-pretty-searchresults#source-links) and [here](https://www.sphinx-doc.org/en/master/usage/configuration.html#confval-html_copy_source). They are generated by default and used by default in the documentation by Sphinx, so we had better include them. ### Why are the changes needed? To show the images correctly in PySpark documentation. ### Does this PR introduce _any_ user-facing change? No, only in unreleased branches. ### How was this patch tested? Manually tested via: ```bash SKIP_SCALADOC=1 SKIP_RDOC=1 SKIP_SQLDOC=1 jekyll serve --watch ``` Closes #29402 from HyukjinKwon/SPARK-32584. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-08-11 15:15:30 +09:00
Luca Canali	99f50c6286	[SPARK-32409][DOC] Document dependency between spark.metrics.staticSources.enabled and JVMSource registration ### What changes were proposed in this pull request? Document the dependency between the config `spark.metrics.staticSources.enabled` and JVMSource registration. ### Why are the changes needed? This PR just documents the dependency between config `spark.metrics.staticSources.enabled` and JVM source registration. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manually tested. Closes #29203 from LucaCanali/bugJVMMetricsRegistration. Authored-by: Luca Canali <luca.canali@cern.ch> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-08-10 09:32:01 -07:00
Dongjoon Hyun	b421bf0196	[SPARK-32517][CORE] Add StorageLevel.DISK_ONLY_3 ### What changes were proposed in this pull request? This PR aims to add `StorageLevel.DISK_ONLY_3` as a built-in `StorageLevel`. ### Why are the changes needed? In a YARN cluster, HDFS usually provides storage with replication factor 3. So, we can save the result to HDFS to get `StorageLevel.DISK_ONLY_3` technically. However, disaggregated clusters or clusters without storage services are on the rise. Previously, in that situation, the users were able to use similar `MEMORY_AND_DISK_2` or a user-created `StorageLevel`. This PR aims to support those use cases officially for better UX. ### Does this PR introduce _any_ user-facing change? Yes. This provides a new built-in option. ### How was this patch tested? Pass the GitHub Action or Jenkins with the revised test cases. Closes #29331 from dongjoon-hyun/SPARK-32517. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-08-10 07:33:06 -07:00
Takeshi Yamamuro	bf4ac3bacc	[SPARK-32554][K8S][DOCS] Remove the words "experimental" in the k8s document ### What changes were proposed in this pull request? This PR targets at dropping the words "experimental" in the k8s document from the primary branch. This update comes from a thread in the spark-dev mailing list: http://apache-spark-developers-list.1001551.n3.nabble.com/spark-on-k8s-is-still-experimental-td29942.html ### Why are the changes needed? To prepare a GA announcement for the k8s scheduler in the next feature release (v3.1.0) ### Does this PR introduce _any_ user-facing change? Yes BEFORE: <img width="938" alt="Screen Shot 2020-08-10 at 21 17 48" src="https://user-images.githubusercontent.com/692303/89781831-0752fd00-db4f-11ea-843a-67fb23fc8f71.png"> AFTER: <img width="874" alt="Screen Shot 2020-08-10 at 21 17 21" src="https://user-images.githubusercontent.com/692303/89781816-01f5b280-db4f-11ea-9ab4-4d1012bad80e.png"> ### How was this patch tested? N/A Closes #29368 from maropu/UpdateDocForK8S. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-08-10 06:38:19 -07:00
Liang-Chi Hsieh	f9f992e9a4	[SPARK-32191][PYTHON][DOCS] Port migration guide for PySpark docs ### What changes were proposed in this pull request? This proposes to port old PySpark migration guide to new PySpark docs. ### Why are the changes needed? Better documentation. ### Does this PR introduce _any_ user-facing change? No. Documentation only. ### How was this patch tested? Generated document locally. <img width="1521" alt="Screen Shot 2020-08-07 at 1 53 20 PM" src="https://user-images.githubusercontent.com/68855/89687618-672e7700-d8b5-11ea-8f29-67a9ab271fa8.png"> Closes #29385 from viirya/SPARK-32191. Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-08-10 15:41:32 +09:00
Max Gekk	3a437ed22b	[SPARK-32501][SQL] Convert null to "null" in structs, maps and arrays while casting to strings ### What changes were proposed in this pull request? Convert `NULL` elements of maps, structs and arrays to the `"null"` string while converting maps/struct/array values to strings. The SQL config `spark.sql.legacy.omitNestedNullInCast.enabled` controls the behaviour. When it is `true`, `NULL` elements of structs/maps/arrays will be omitted; otherwise, when it is `false`, `NULL` elements will be converted to `"null"`. ### Why are the changes needed? 1. It is impossible to distinguish empty string and null, for instance: ```scala scala> Seq(Seq(""), Seq(null)).toDF().show +-----+ \|value\| +-----+ \| []\| \| []\| +-----+ ``` 2. Inconsistent NULL conversions for top-level values and nested columns, for instance: ```scala scala> sql("select named_struct('c', null), null").show +---------------------+----+ \|named_struct(c, NULL)\|NULL\| +---------------------+----+ \| []\|null\| +---------------------+----+ ``` 3. `.show()` is different from conversions to Hive strings, and as a consequence its output is different from `spark-sql` (sql tests): ```sql spark-sql> select named_struct('c', null) as struct; {"c":null} ``` ```scala scala> sql("select named_struct('c', null) as struct").show +------+ \|struct\| +------+ \| []\| +------+ ``` 4. It is impossible to distinguish empty struct/array from struct/array with null in the current implementation: ```scala scala> Seq[Seq[String]](Seq(), Seq(null)).toDF.show() +-----+ \|value\| +-----+ \| []\| \| []\| +-----+ ``` ### Does this PR introduce _any_ user-facing change? Yes, before: ```scala scala> Seq(Seq(""), Seq(null)).toDF().show +-----+ \|value\| +-----+ \| []\| \| []\| +-----+ ``` After: ```scala scala> Seq(Seq(""), Seq(null)).toDF().show +------+ \| value\| +------+ \| []\| \|[null]\| +------+ ``` ### How was this patch tested? By existing test suite `CastSuite`. Closes #29311 from MaxGekk/nested-null-to-string. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-08-05 12:03:36 +00:00
HyukjinKwon	15b73339d9	[SPARK-32507][DOCS][PYTHON] Add main page for PySpark documentation ### What changes were proposed in this pull request? This PR proposes to write the main page of PySpark documentation. The base work is finished at https://github.com/apache/spark/pull/29188. ### Why are the changes needed? For better usability and readability in PySpark documentation. ### Does this PR introduce _any_ user-facing change? Yes, it creates a new main page as below: ![Screen Shot 2020-07-31 at 10 02 44 PM](https://user-images.githubusercontent.com/6477701/89037618-d2d68880-d379-11ea-9a44-562f2aa0e3fd.png) ### How was this patch tested? Manually built the PySpark documentation. ```bash cd python make clean html ``` Closes #29320 from HyukjinKwon/SPARK-32507. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-08-05 11:14:14 +09:00
Kousuke Saruta	0660a0501d	[SPARK-32525][DOCS] The layout of monitoring.html is broken ### What changes were proposed in this pull request? This PR fixes the layout of monitoring.html broken after SPARK-31566(#28354). The cause is there are 2 `<td>` tags not closed in `monitoring.md`. ### Why are the changes needed? This is a bug. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Build docs and the following screenshots are before/after. * Before fixed ![broken-doc](https://user-images.githubusercontent.com/4736016/89257873-fba09b80-d661-11ea-90da-06cbc0783011.png) * After fixed. ![fixed-doc2](https://user-images.githubusercontent.com/4736016/89257910-0fe49880-d662-11ea-9a85-7a1ecb1d38d6.png) Of course, the table is still rendered correctly. ![fixed-doc1](https://user-images.githubusercontent.com/4736016/89257948-225ed200-d662-11ea-80fd-d9254b44d4a0.png) Closes #29345 from sarutak/fix-monitoring.md. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>	2020-08-04 23:27:05 +08:00
Max Gekk	7eb6f45688	[SPARK-32499][SQL] Use `{}` in conversions maps and structs to strings ### What changes were proposed in this pull request? Change casting of map and struct values to strings by using the `{}` brackets instead of `[]`. The behavior is controlled by the SQL config `spark.sql.legacy.castComplexTypesToString.enabled`. When it is `true`, `CAST` wraps maps and structs by `[]` in casting to strings. Otherwise, if this is `false`, which is the default, maps and structs are wrapped by `{}`. ### Why are the changes needed? - To distinguish structs/maps from arrays. - To make `show`'s output consistent with Hive and conversions to Hive strings. - To display dataframe content in the same form by `spark-sql` and `show` - To be consistent with the `*.sql` tests ### Does this PR introduce _any_ user-facing change? Yes ### How was this patch tested? By existing test suite `CastSuite`. Closes #29308 from MaxGekk/show-struct-map. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-08-04 14:57:09 +00:00
Takuya UESHIN	7deb67c28f	[SPARK-32160][CORE][PYSPARK][FOLLOWUP] Change the config name to switch allow/disallow SparkContext in executors ### What changes were proposed in this pull request? This is a follow-up of #29278. This PR changes the config name to switch allow/disallow `SparkContext` in executors as per the comment https://github.com/apache/spark/pull/29278#pullrequestreview-460256338. ### Why are the changes needed? The config name `spark.executor.allowSparkContext` is more reasonable. ### Does this PR introduce _any_ user-facing change? Yes, the config name is changed. ### How was this patch tested? Updated tests. Closes #29340 from ueshin/issues/SPARK-32160/change_config_name. Authored-by: Takuya UESHIN <ueshin@databricks.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-08-04 12:45:06 +09:00
Max Gekk	9bbe8c7418	[MINOR][SQL] Fix versions in the SQL migration guide for Spark 3.1 ### What changes were proposed in this pull request? Change _To restore the behavior before Spark 3.0_ to _To restore the behavior before Spark 3.1_ in the SQL migration guide while telling about the behaviour before new version 3.1. ### Why are the changes needed? To have correct info in the SQL migration guide. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? N/A Closes #29336 from MaxGekk/fix-version-in-sql-migration. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-08-04 11:23:28 +09:00
Takuya UESHIN	8014b0b5d6	[SPARK-32160][CORE][PYSPARK] Add a config to switch allow/disallow to create SparkContext in executors ### What changes were proposed in this pull request? This is a follow-up of #28986. This PR adds a config to switch allow/disallow to create `SparkContext` in executors. - `spark.driver.allowSparkContextInExecutors` ### Why are the changes needed? Some users or libraries actually create `SparkContext` in executors. We shouldn't break their workloads. ### Does this PR introduce _any_ user-facing change? Yes, users will be able to create `SparkContext` in executors with the config enabled. ### How was this patch tested? More tests are added. Closes #29278 from ueshin/issues/SPARK-32160/add_configs. Authored-by: Takuya UESHIN <ueshin@databricks.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-07-31 17:28:35 +09:00
HyukjinKwon	e1d7321034	[SPARK-32478][R][SQL] Error message to show the schema mismatch in gapply with Arrow vectorization ### What changes were proposed in this pull request? This PR proposes to: 1. Fix the error message when the output schema is mismatched with the R DataFrame returned by the given function. For example, ```R df <- createDataFrame(list(list(a=1L, b="2"))) count(gapply(df, "a", function(key, group) { group }, structType("a int, b int"))) ``` Before: ``` Error in handleErrors(returnStatus, conn) : ... java.lang.UnsupportedOperationException ... ``` After: ``` Error in handleErrors(returnStatus, conn) : ... java.lang.AssertionError: assertion failed: Invalid schema from gapply: expected IntegerType, IntegerType, got IntegerType, StringType ... ``` 2. Update documentation about the schema matching for `gapply` and `dapply`. ### Why are the changes needed? To show which schema is not matched, and let users know what's going on. ### Does this PR introduce _any_ user-facing change? Yes, the error message is updated as above, and the documentation is updated. ### How was this patch tested? Manually tested, and unit tests were added. Closes #29283 from HyukjinKwon/r-vectorized-error. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-07-30 15:16:02 +09:00
Max Gekk	99a855575c	[SPARK-32431][SQL] Check duplicate nested columns in read from in-built datasources ### What changes were proposed in this pull request? When `spark.sql.caseSensitive` is `false` (by default), check that there are not duplicate column names on the same level (top level or nested levels) in reading from in-built datasources Parquet, ORC, Avro and JSON. If such duplicate columns exist, throw the exception: ``` org.apache.spark.sql.AnalysisException: Found duplicate column(s) in the data schema: ``` ### Why are the changes needed? To make handling of duplicate nested columns is similar to handling of duplicate top-level columns i. e. output the same error when `spark.sql.caseSensitive` is `false`: ```Scala org.apache.spark.sql.AnalysisException: Found duplicate column(s) in the data schema: `camelcase` ``` Checking of top-level duplicates was introduced by https://github.com/apache/spark/pull/17758. ### Does this PR introduce _any_ user-facing change? Yes. For the example from SPARK-32431: ORC: ```scala java.io.IOException: Error reading file: file:/private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tc0000gn/T/spark-c02c2f9a-0cdc-4859-94fc-b9c809ca58b1/part-00001-63e8c3f0-7131-4ec9-be02-30b3fdd276f4-c000.snappy.orc at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1329) at org.apache.orc.mapreduce.OrcMapreduceRecordReader.ensureBatch(OrcMapreduceRecordReader.java:78) ... 
Caused by: java.io.EOFException: Read past end of RLE integer from compressed stream Stream for column 3 kind DATA position: 6 length: 6 range: 0 offset: 12 limit: 12 range 0 = 0 to 6 uncompressed: 3 to 3 at org.apache.orc.impl.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:61) at org.apache.orc.impl.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:323) ``` JSON: ```scala +------------+ \|StructColumn\| +------------+ \| [,,]\| +------------+ ``` Parquet: ```scala +------------+ \|StructColumn\| +------------+ \| [0,, 1]\| +------------+ ``` Avro: ```scala +------------+ \|StructColumn\| +------------+ \| [,,]\| +------------+ ``` After the changes, Parquet, ORC, JSON and Avro output the same error: ```scala Found duplicate column(s) in the data schema: `camelcase`; org.apache.spark.sql.AnalysisException: Found duplicate column(s) in the data schema: `camelcase`; at org.apache.spark.sql.util.SchemaUtils$.checkColumnNameDuplication(SchemaUtils.scala:112) at org.apache.spark.sql.util.SchemaUtils$.checkSchemaColumnNameDuplication(SchemaUtils.scala:51) at org.apache.spark.sql.util.SchemaUtils$.checkSchemaColumnNameDuplication(SchemaUtils.scala:67) ``` ### How was this patch tested? Run modified test suites: ``` $ build/sbt "sql/test:testOnly org.apache.spark.sql.FileBasedDataSourceSuite" $ build/sbt "avro/test:testOnly org.apache.spark.sql.avro.*" ``` and added new UT to `SchemaUtilsSuite`. Closes #29234 from MaxGekk/nested-case-insensitive-column. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-07-30 06:05:55 +00:00
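The per-level duplicate check described in the commit above can be illustrated with a minimal sketch. It assumes a schema modelled as a plain nested `dict` (a hypothetical stand-in, not Spark's `StructType`), and the function name `find_duplicate_columns` is invented for illustration; it is not the `SchemaUtils` implementation.

```python
from collections import Counter


def find_duplicate_columns(schema, case_sensitive=False, path=""):
    """Return (level, name) pairs for column names duplicated on one level.

    `schema` maps a column name to either a type string or a nested dict
    (a struct). This layout is an assumption made for the sketch.
    """
    def normalize(name):
        # With case-insensitive resolution, `camelCase` and `camelcase` collide.
        return name if case_sensitive else name.lower()

    counts = Counter(normalize(name) for name in schema)
    level = path or "<top level>"
    duplicates = [(level, name) for name, n in sorted(counts.items()) if n > 1]

    # Recurse into nested structs so every level is checked independently.
    for name, field in schema.items():
        if isinstance(field, dict):
            child_path = f"{path}.{name}" if path else name
            duplicates.extend(
                find_duplicate_columns(field, case_sensitive, child_path)
            )
    return duplicates


schema = {
    "StructColumn": {"camelcase": "long", "LowerCase": "long", "camelCase": "long"}
}
print(find_duplicate_columns(schema))
print(find_duplicate_columns(schema, case_sensitive=True))
```

With case-insensitive matching the nested `camelcase`/`camelCase` pair is reported; with case-sensitive matching nothing is reported, which mirrors the role of `spark.sql.caseSensitive` in the description.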
HyukjinKwon	89d9b7cc64	[SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode ### What changes were proposed in this pull request? This PR proposes: 1. To introduce `InheritableThread` class, that works identically with `threading.Thread` but it can inherit the inheritable attributes of a JVM thread such as `InheritableThreadLocal`. This was a problem from the pinned thread mode, see also https://github.com/apache/spark/pull/24898. Now it works as below: ```python import pyspark spark.sparkContext.setLocalProperty("a", "hi") def print_prop(): print(spark.sparkContext.getLocalProperty("a")) pyspark.InheritableThread(target=print_prop).start() ``` ``` hi ``` 2. Also, it adds the resource leak fix into `InheritableThread`. Py4J leaks the thread and does not close the connection from Python to JVM. In `InheritableThread`, it manually closes the connections when PVM garbage collection happens. So, JVM threads finish safely. 
I manually verified by profiling but there's also another easy way to verify: ```bash PYSPARK_PIN_THREAD=true ./bin/pyspark ``` ```python >>> from threading import Thread >>> Thread(target=lambda: spark.range(1000).collect()).start() >>> Thread(target=lambda: spark.range(1000).collect()).start() >>> Thread(target=lambda: spark.range(1000).collect()).start() >>> spark._jvm._gateway_client.deque deque([<py4j.clientserver.ClientServerConnection object at 0x119f7aba8>, <py4j.clientserver.ClientServerConnection object at 0x119fc9b70>, <py4j.clientserver.ClientServerConnection object at 0x119fc9e10>, <py4j.clientserver.ClientServerConnection object at 0x11a015358>, <py4j.clientserver.ClientServerConnection object at 0x119fc00f0>]) >>> Thread(target=lambda: spark.range(1000).collect()).start() >>> spark._jvm._gateway_client.deque deque([<py4j.clientserver.ClientServerConnection object at 0x119f7aba8>, <py4j.clientserver.ClientServerConnection object at 0x119fc9b70>, <py4j.clientserver.ClientServerConnection object at 0x119fc9e10>, <py4j.clientserver.ClientServerConnection object at 0x11a015358>, <py4j.clientserver.ClientServerConnection object at 0x119fc08d0>, <py4j.clientserver.ClientServerConnection object at 0x119fc00f0>]) ``` This issue is fixed now. 3. Because now we have a fix for the issue here, it also proposes to deprecate `collectWithJobGroup` which was a temporary workaround added to avoid this leak issue. ### Why are the changes needed? To support pinned thread mode properly without a resource leak, and a proper inheritable local properties. ### Does this PR introduce _any_ user-facing change? Yes, it adds an API `InheritableThread` class for pinned thread mode. ### How was this patch tested? Manually tested as described above, and unit test was added as well. Closes #28968 from HyukjinKwon/SPARK-32010. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-07-30 10:15:25 +09:00
Thomas Graves	e926d419d3	[SPARK-30322][DOCS] Add stage level scheduling docs ### What changes were proposed in this pull request? Document the stage level scheduling feature. ### Why are the changes needed? Document the stage level scheduling feature. ### Does this PR introduce _any_ user-facing change? Documentation. ### How was this patch tested? n/a docs only Closes #29292 from tgravescs/SPARK-30322. Authored-by: Thomas Graves <tgraves@nvidia.com> Signed-off-by: Thomas Graves <tgraves@apache.org>	2020-07-29 13:46:28 -05:00
HyukjinKwon	5491c08bf1	Revert "[SPARK-31525][SQL] Return an empty list for df.head() when df is empty" This reverts commit `44a5258ac2`.	2020-07-29 12:07:35 +09:00
Xiaochang Wu	44c868b73a	[SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs ### What changes were proposed in this pull request? Rewrite a clearer and complete BLAS native acceleration enabling guide. ### Why are the changes needed? The document of enabling BLAS native acceleration in ML guide (https://spark.apache.org/docs/latest/ml-guide.html#dependencies) is incomplete and unclear to the user. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? N/A Closes #29139 from xwu99/blas-doc. Lead-authored-by: Xiaochang Wu <xiaochang.wu@intel.com> Co-authored-by: Wu, Xiaochang <xiaochang.wu@intel.com> Signed-off-by: Huaxin Gao <huaxing@us.ibm.com>	2020-07-28 08:36:11 -07:00
Tianshi Zhu	44a5258ac2	[SPARK-31525][SQL] Return an empty list for df.head() when df is empty ### What changes were proposed in this pull request? return an empty list instead of None when calling `df.head()` ### Why are the changes needed? `df.head()` and `df.head(1)` are inconsistent when df is empty. ### Does this PR introduce _any_ user-facing change? Yes. If a user relies on `df.head()` to return None, things like `if df.head() is None:` will be broken. ### How was this patch tested? Closes #29214 from tianshizz/SPARK-31525. Authored-by: Tianshi Zhu <zhutianshirea@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-07-28 12:32:19 +09:00
GuoPhilipse	8de43338be	[SPARK-31753][SQL][DOCS] Add missing keywords in the SQL docs ### What changes were proposed in this pull request? update sql-ref docs, the following key words will be added in this PR. CASE/ELSE WHEN/THEN MAP KEYS TERMINATED BY NULL DEFINED AS LINES TERMINATED BY ESCAPED BY COLLECTION ITEMS TERMINATED BY PIVOT LATERAL VIEW OUTER? ROW FORMAT SERDE ROW FORMAT DELIMITED FIELDS TERMINATED BY IGNORE NULLS FIRST LAST ### Why are the changes needed? let more users know the sql key words usage ### Does this PR introduce _any_ user-facing change? ![image](https://user-images.githubusercontent.com/46367746/88148830-c6dc1f80-cc31-11ea-81ea-13bc9dc34550.png) ![image](https://user-images.githubusercontent.com/46367746/88148968-fb4fdb80-cc31-11ea-8649-e8297cf5813e.png) ![image](https://user-images.githubusercontent.com/46367746/88149000-073b9d80-cc32-11ea-9aa4-f914ecd72663.png) ![image](https://user-images.githubusercontent.com/46367746/88149021-0f93d880-cc32-11ea-86ed-7db8672b5aac.png) ### How was this patch tested? No Closes #29056 from GuoPhilipse/add-missing-keywords. Lead-authored-by: GuoPhilipse <guofei_ok@126.com> Co-authored-by: GuoPhilipse <46367746+GuoPhilipse@users.noreply.github.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-07-28 09:41:53 +09:00
Kent Yao	d315ebf3a7	[SPARK-32424][SQL] Fix silent data change for timestamp parsing if overflow happens ### What changes were proposed in this pull request? When using `Seconds.toMicros` API to convert epoch seconds to microseconds, ```scala /** * Equivalent to * {link #convert(long, TimeUnit) MICROSECONDS.convert(duration, this)}. * param duration the duration * return the converted duration, * or {code Long.MIN_VALUE} if conversion would negatively * overflow, or {code Long.MAX_VALUE} if it would positively overflow. */ ``` This PR change it to `Math.multiplyExact(epochSeconds, MICROS_PER_SECOND)` ### Why are the changes needed? fix silent data change between 3.x and 2.x ``` ~/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200722  bin/spark-sql -S -e "select to_timestamp('300000', 'y');" +294247-01-10 12:00:54.775807 ``` ``` kentyaohulk  ~/Downloads/spark/spark-2.4.5-bin-hadoop2.7  bin/spark-sql -S -e "select to_timestamp('300000', 'y');" 284550-10-19 15:58:1010.448384 ``` ### Does this PR introduce _any_ user-facing change? Yes, we will raise `ArithmeticException` instead of giving the wrong answer if overflow. ### How was this patch tested? add unit test Closes #29220 from yaooqinn/SPARK-32424. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-07-27 17:03:14 +00:00
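The difference between a saturating conversion and an exact one, as described in the commit above, can be sketched in Python. Python integers do not overflow, so the 64-bit bounds are emulated explicitly; the function names are hypothetical, and `OverflowError` stands in for Java's `ArithmeticException`.

```python
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1
MICROS_PER_SECOND = 1_000_000


def saturating_to_micros(epoch_seconds):
    """Clamp to the 64-bit range on overflow, silently changing the value."""
    micros = epoch_seconds * MICROS_PER_SECOND
    return max(INT64_MIN, min(INT64_MAX, micros))


def exact_to_micros(epoch_seconds):
    """Raise instead of returning a wrong value when the result overflows."""
    micros = epoch_seconds * MICROS_PER_SECOND
    if not INT64_MIN <= micros <= INT64_MAX:
        raise OverflowError("long overflow")
    return micros


huge = 10 ** 13  # seconds; times 1_000_000 this exceeds the 64-bit range
print(saturating_to_micros(huge) == INT64_MAX)
try:
    exact_to_micros(huge)
except OverflowError as error:
    print("rejected:", error)
```

The saturating variant returns a plausible-looking but wrong timestamp, which is the silent data change the commit removes; the exact variant surfaces the problem to the caller.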
HyukjinKwon	6ab29b37cf	[SPARK-32179][SPARK-32188][PYTHON][DOCS] Replace and redesign the documentation base ### What changes were proposed in this pull request? This PR proposes to redesign the PySpark documentation. I made a demo site to make it easier to review: https://hyukjin-spark.readthedocs.io/en/stable/reference/index.html. Here is the initial draft for the final PySpark docs shape: https://hyukjin-spark.readthedocs.io/en/latest/index.html. In more details, this PR proposes: 1. Use [pydata_sphinx_theme](https://github.com/pandas-dev/pydata-sphinx-theme) theme - [pandas](https://pandas.pydata.org/docs/) and [Koalas](https://koalas.readthedocs.io/en/latest/) use this theme. The CSS overwrite is ported from Koalas. The colours in the CSS were actually chosen by designers to use in Spark. 2. Use the Sphinx option to separate `source` and `build` directories as the documentation pages will likely grow. 3. Port current API documentation into the new style. It mimics Koalas and pandas to use the theme most effectively. One disadvantage of this approach is that you should list up APIs or classes; however, I think this isn't a big issue in PySpark since we're being conservative on adding APIs. I also intentionally listed classes only instead of functions in ML and MLlib to make it relatively easier to manage. ### Why are the changes needed? Often I hear the complaints, from the users, that current PySpark documentation is pretty messy to read - https://spark.apache.org/docs/latest/api/python/index.html compared other projects such as [pandas](https://pandas.pydata.org/docs/) and [Koalas](https://koalas.readthedocs.io/en/latest/). It would be nicer if we can make it more organised instead of just listing all classes, methods and attributes to make it easier to navigate. Also, the documentation has been there from almost the very first version of PySpark. Maybe it's time to update it. ### Does this PR introduce _any_ user-facing change? 
Yes, PySpark API documentation will be redesigned. ### How was this patch tested? Manually tested, and the demo site was made to show. Closes #29188 from HyukjinKwon/SPARK-32179. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-07-27 17:49:21 +09:00
HyukjinKwon	bfa5d57bbd	[SPARK-32452][R][SQL] Bump up the minimum Arrow version as 1.0.0 in SparkR ### What changes were proposed in this pull request? This PR proposes to set the minimum Arrow version as 1.0.0 to minimise the maintenance overhead and keep the minimal version up to date. Other required changes to support 1.0.0 were already made in SPARK-32451. ### Why are the changes needed? R side, people rather aggressively encourage people to use the latest version, and SparkR vectorization is very experimental that was added from Spark 3.0. Also, we're technically not testing old Arrow versions in SparkR for now. ### Does this PR introduce _any_ user-facing change? Yes, users wouldn't be able to use SparkR with old Arrow. ### How was this patch tested? GitHub Actions and AppVeyor are already testing them. Closes #29253 from HyukjinKwon/SPARK-32452. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-07-27 14:21:15 +09:00
Kent Yao	d3596c04b0	[SPARK-32406][SQL] Make RESET syntax support single configuration reset ### What changes were proposed in this pull request? This PR extends the RESET command to support reset SQL configuration one by one. ### Why are the changes needed? Currently, the reset command only supports restore all of the runtime configurations to their defaults. In most cases, users do not want this, but just want to restore one or a small group of settings. The SET command can work as a workaround for this, but you have to keep the defaults in your mind or by temp variables, which turns out not very convenient to use. Hive supports this: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-BeelineExample reset <key> \| Resets the value of a particular configuration variable (key) to the default value.Note: If you misspell the variable name, Beeline will not show an error. -- \| -- PostgreSQL supports this too https://www.postgresql.org/docs/9.1/sql-reset.html ### Does this PR introduce _any_ user-facing change? yes, reset can restore one configuration now ### How was this patch tested? add new unit tests. Closes #29202 from yaooqinn/SPARK-32406. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-07-24 09:13:26 -07:00
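A rough model of the behaviour described above (resetting everything versus resetting one key) is sketched below. The `SessionConf` class and the `example.*` keys are hypothetical and only illustrate the semantics; they are not Spark's configuration classes.

```python
class SessionConf:
    """Toy runtime configuration: defaults plus per-session overrides."""

    def __init__(self, defaults):
        self._defaults = dict(defaults)
        self._overrides = {}

    def set(self, key, value):
        self._overrides[key] = value

    def get(self, key):
        if key in self._overrides:
            return self._overrides[key]
        return self._defaults[key]

    def reset(self, key=None):
        """Without a key, drop every override; with a key, drop only that one."""
        if key is None:
            self._overrides.clear()
        else:
            self._overrides.pop(key, None)


conf = SessionConf({"example.a": "1", "example.b": "x"})
conf.set("example.a", "2")
conf.set("example.b", "y")
conf.reset("example.a")  # only example.a returns to its default
print(conf.get("example.a"), conf.get("example.b"))
```

Because the defaults are stored separately from the overrides, the caller does not need to remember a default value in order to restore it, which is the inconvenience of the `SET` workaround mentioned in the commit.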
Wenchen Fan	aa54dcf193	[SPARK-32251][SQL][TESTS][FOLLOWUP] improve SQL keyword test ### What changes were proposed in this pull request? Improve the `SQLKeywordSuite` so that: 1. it checks keywords under default mode as well 2. it checks if there are typos in the doc (found one and fixed in this PR) ### Why are the changes needed? better test coverage ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? N/A Closes #29200 from cloud-fan/test. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-07-23 14:02:38 +00:00
ulysses	184074de22	[SPARK-31999][SQL] Add REFRESH FUNCTION command ### What changes were proposed in this pull request? In Hive mode, permanent functions are shared with the Hive metastore, so functions may be modified by other Hive clients. In a long-lived Spark application, it is hard to pick up such function changes. Here are 2 reasons: * Spark caches the function in memory using `FunctionRegistry`. * The user may not know the location or class name of the UDF when using `replace function`. Note that we use the v2 command code path to add the new command. ### Why are the changes needed? To give an easy way to keep the Spark function registry in sync with the Hive metastore. Then we can call ``` refresh function functionName ``` ### Does this PR introduce _any_ user-facing change? Yes, new command. ### How was this patch tested? New UT. Closes #28840 from ulysses-you/SPARK-31999. Authored-by: ulysses <youxiduo@weidian.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-07-22 19:05:50 +00:00
Brandon	1267d80db6	[MINOR][DOCS] add link for Debugging your Application in running-on-yarn.html#launching-spark-on-yarn ### What changes were proposed in this pull request? add link for Debugging your Application in `running-on-yarn.html#launching-spark-on-yarn` ### Why are the changes needed? Currently, the launching-spark-on-yarn section of the running-on-yarn.html page tells readers to refer to Debugging your Application without linking to it. It is better to add a direct link to save readers the time of finding the section ![image](https://user-images.githubusercontent.com/20021316/87867542-80cc5500-c9c0-11ea-8560-5ddcb5a308bc.png) ### Does this PR introduce _any_ user-facing change? Yes. Docs changes. 1. add link for Debugging your Application in `running-on-yarn.html#launching-spark-on-yarn` section Updated behavior: ![image](https://user-images.githubusercontent.com/20021316/87867534-6eeab200-c9c0-11ea-94ee-d3fa58157156.png) 2. update Spark Properties link to anchor link only ### How was this patch tested? manual test has been performed to test the updated Closes #29154 from brandonJY/patch-1. Authored-by: Brandon <brandonJY@users.noreply.github.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-07-21 13:42:19 +09:00
Gengliang Wang	c2afe1c0b9	[SPARK-32366][DOC] Fix doc link of datetime pattern in 3.0 migration guide ### What changes were proposed in this pull request? In http://spark.apache.org/docs/latest/sql-migration-guide.html#query-engine, there is an invalid link to the datetime pattern reference, "sql-ref-datetime-pattern.md". We should change the link to http://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html. ![image](https://user-images.githubusercontent.com/1097932/87916920-fff57380-ca28-11ea-9028-99b9f9ebdfa4.png) Also, it is nice to add a URL for [DateTimeFormatter](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html) ### Why are the changes needed? Fix migration guide doc ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Build the doc in local env and check it: ![image](https://user-images.githubusercontent.com/1097932/87919723-13a2d900-ca2d-11ea-9923-a29b4cefaf3c.png) Closes #29162 from gengliangwang/fixDoc. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-07-20 20:49:22 +09:00
Igor Dvorzhak	32a0451376	[MINOR][DOCS] Fix links to Cloud Storage connectors docs Closes #29155 from medb/patch-1. Authored-by: Igor Dvorzhak <idv@google.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-07-19 12:19:36 -07:00
Kent Yao	bdeb626c5a	[SPARK-32272][SQL] Add SQL standard command SET TIME ZONE ### What changes were proposed in this pull request? This PR adds the SQL standard command `SET TIME ZONE`, which sets the default time zone displacement for the current SQL session and is equivalent to the existing `set spark.sql.session.timeZone=xxx`. All in all, this PR adds the following syntax, ``` SET TIME ZONE LOCAL; SET TIME ZONE 'valid time zone'; -- zone offset or region SET TIME ZONE INTERVAL XXXX; -- XXXX must be in [-18, +18] hours; this range is wider than ANSI's [-14, +14] ``` ### Why are the changes needed? ANSI compliance and supply pure SQL users a way to retrieve all supported TimeZones ### Does this PR introduce _any_ user-facing change? yes, add new syntax. ### How was this patch tested? add unit tests. and locally verified reference doc ![image](https://user-images.githubusercontent.com/8326978/87510244-c8dc3680-c6a5-11ea-954c-b098be84afee.png) Closes #29064 from yaooqinn/SPARK-32272. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-07-16 13:01:53 +00:00
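The 18-hour bound on `SET TIME ZONE INTERVAL` mentioned in the commit above can be sketched as a small validation step. This is an illustrative model using Python's standard `datetime` module; the function name `zone_from_interval` is hypothetical and the code is not Spark's parser.

```python
from datetime import timedelta, timezone

# Bound taken from the commit description: offsets must lie in [-18, +18] hours.
MAX_OFFSET = timedelta(hours=18)


def zone_from_interval(offset):
    """Turn an interval into a fixed-offset time zone, rejecting large offsets."""
    if abs(offset) > MAX_OFFSET:
        raise ValueError(f"offset {offset} is outside [-18, +18] hours")
    return timezone(offset)


print(zone_from_interval(timedelta(hours=9)))
try:
    zone_from_interval(timedelta(hours=19))
except ValueError as error:
    print("rejected:", error)
```

Both ends of the range are accepted here, matching the closed interval written in the commit message.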
Warren Zhu	db47c6e340	[SPARK-32125][UI] Support get taskList by status in Web UI and SHS Rest API ### What changes were proposed in this pull request? Support fetching taskList by status as below: ``` /applications/[app-id]/stages/[stage-id]/[stage-attempt-id]/taskList?status=failed ``` ### Why are the changes needed? When there're large number of tasks in one stage, current api is hard to get taskList by status ### Does this PR introduce _any_ user-facing change? Yes. Updated monitoring doc. ### How was this patch tested? Added tests in `HistoryServerSuite` Closes #28942 from warrenzhu25/SPARK-32125. Authored-by: Warren Zhu <zhonzh@microsoft.com> Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>	2020-07-16 11:31:24 +08:00
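As a minimal sketch of how a client might build the endpoint shown above, the helper below assembles the path and the optional `status` query parameter. The helper name, the base URL `http://localhost:18080/api/v1`, and the application ID `app-0001` are assumptions made for illustration; only the path layout and `status=failed` come from the commit message.

```python
from urllib.parse import urlencode


def task_list_url(base_url, app_id, stage_id, stage_attempt_id, status=None):
    """Build the taskList URL, optionally filtered by task status."""
    path = (
        f"{base_url.rstrip('/')}/applications/{app_id}"
        f"/stages/{stage_id}/{stage_attempt_id}/taskList"
    )
    if status is None:
        return path
    return f"{path}?{urlencode({'status': status})}"


# Hypothetical server address and application ID.
url = task_list_url("http://localhost:18080/api/v1", "app-0001", 3, 0, status="failed")
print(url)
```

Filtering on the server side avoids downloading every task of a large stage just to find the failed ones, which is the motivation given in the commit.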
Erik Krogen	cf22d947fb	[SPARK-32036] Replace references to blacklist/whitelist language with more appropriate terminology, excluding the blacklisting feature ### What changes were proposed in this pull request? This PR will remove references to these "blacklist" and "whitelist" terms besides the blacklisting feature as a whole, which can be handled in a separate JIRA/PR. This touches quite a few files, but the changes are straightforward (variable/method/etc. name changes) and most quite self-contained. ### Why are the changes needed? As per discussion on the Spark dev list, it will be beneficial to remove references to problematic language that can alienate potential community members. One such reference is "blacklist" and "whitelist". While it seems to me that there is some valid debate as to whether these terms have racist origins, the cultural connotations are inescapable in today's world. ### Does this PR introduce _any_ user-facing change? In the test file `HiveQueryFileTest`, a developer has the ability to specify the system property `spark.hive.whitelist` to specify a list of Hive query files that should be tested. This system property has been renamed to `spark.hive.includelist`. The old property has been kept for compatibility, but will log a warning if used. I am open to feedback from others on whether keeping a deprecated property here is unnecessary given that this is just for developers running tests. ### How was this patch tested? Existing tests should be suitable since no behavior changes are expected as a result of this PR. Closes #28874 from xkrogen/xkrogen-SPARK-32036-rename-blacklists. Authored-by: Erik Krogen <ekrogen@linkedin.com> Signed-off-by: Thomas Graves <tgraves@apache.org>	2020-07-15 11:40:55 -05:00
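The compatibility behaviour described above (a renamed property whose old name still works but logs a warning) follows a common pattern, sketched here. The function name is hypothetical, and the precedence rule (the new name wins when both are set) is an assumption of this sketch, not something stated in the commit.

```python
import warnings

NEW_KEY = "spark.hive.includelist"
OLD_KEY = "spark.hive.whitelist"


def read_include_list(properties):
    """Return the raw property value, preferring the new name.

    Falls back to the deprecated name with a warning, and returns None
    when neither name is set.
    """
    if NEW_KEY in properties:
        return properties[NEW_KEY]
    if OLD_KEY in properties:
        warnings.warn(
            f"{OLD_KEY} is deprecated; use {NEW_KEY} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return properties[OLD_KEY]
    return None


print(read_include_list({"spark.hive.includelist": "query1"}))
```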
Baohe Zhang	90b0c26b22	[SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster ### What changes were proposed in this pull request? Add a new class HybridStore to make the history server faster when loading event files. When rebuilding the application state from event logs, HybridStore will write data to InMemoryStore at first and use a background thread to dump data to LevelDB once the writing to InMemoryStore is completed. HybridStore is to make content serving faster by using more memory. It's only safe to enable it when the cluster is not having a heavy load. ### Why are the changes needed? HybridStore can greatly reduce the event logs loading time, especially for large log files. In general, it has 4x - 6x UI loading speed improvement for large log files. The detailed result is shown in comments. ### Does this PR introduce any user-facing change? This PR adds new configs `spark.history.store.hybridStore.enabled` and `spark.history.store.hybridStore.maxMemoryUsage`. ### How was this patch tested? A test suite for HybridStore is added. I also manually tested it on 3.1.0 on mac os. This is a follow-up for the work done by Hieu Huynh in 2019. Closes #28412 from baohe-zhang/SPARK-31608. Authored-by: Baohe Zhang <baohe.zhang@verizonmedia.com> Signed-off-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>	2020-07-15 07:51:13 +09:00
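The write-to-memory-first, dump-in-the-background idea described above can be sketched as follows. This is a simplified model, not the `HybridStore` implementation: a plain `dict` stands in for LevelDB, and the class and method names are hypothetical.

```python
import threading


class HybridStoreSketch:
    """Serve reads from memory while a background thread copies data to disk."""

    def __init__(self, disk_store):
        self._memory = {}        # fast store used while rebuilding state
        self._disk = disk_store  # dict standing in for the on-disk store
        self._dump_thread = None

    def write(self, key, value):
        self._memory[key] = value

    def read(self, key):
        # Capture the reference once so a concurrent switch cannot break the read.
        memory = self._memory
        if memory is not None:
            return memory[key]
        return self._disk[key]

    def finish_writing(self):
        """Call after the last write; starts the background dump."""
        self._dump_thread = threading.Thread(target=self._dump, daemon=True)
        self._dump_thread.start()

    def _dump(self):
        for key, value in self._memory.items():
            self._disk[key] = value
        # Switch reads over to disk only after the copy is complete.
        self._memory = None

    def wait_for_dump(self):
        self._dump_thread.join()


disk = {}
store = HybridStoreSketch(disk)
store.write("job-1", {"status": "SUCCEEDED"})
store.finish_writing()
print(store.read("job-1"))  # served from memory or disk, whichever is current
store.wait_for_dump()
print(store.read("job-1"))  # now served from the disk stand-in
```

The trade-off named in the commit shows up directly: until the dump finishes, the data exists in memory, so the speedup is paid for with extra memory use.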
HyukjinKwon	4ad9bfd53b	[SPARK-32138] Drop Python 2.7, 3.4 and 3.5 ### What changes were proposed in this pull request? This PR aims to drop Python 2.7, 3.4 and 3.5. Roughly speaking, it removes all the widely known Python 2 compatibility workarounds such as `sys.version` comparison, `__future__`. Also, it removes the Python 2 dedicated codes such as `ArrayConstructor` in Spark. ### Why are the changes needed? 1. Unsupport EOL Python versions 2. Reduce maintenance overhead and remove a bit of legacy codes and hacks for Python 2. 3. PyPy2 has a critical bug that causes a flaky test, SPARK-28358 given my testing and investigation. 4. Users can use Python type hints with Pandas UDFs without thinking about Python version 5. Users can leverage one latest cloudpickle, https://github.com/apache/spark/pull/28950. With Python 3.8+ it can also leverage C pickle. ### Does this PR introduce _any_ user-facing change? Yes, users cannot use Python 2.7, 3.4 and 3.5 in the upcoming Spark version. ### How was this patch tested? Manually tested and also tested in Jenkins. Closes #28957 from HyukjinKwon/SPARK-32138. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-07-14 11:22:44 +09:00
Holden Karau	90ac9f975b	[SPARK-32004][ALL] Drop references to slave ### What changes were proposed in this pull request? This change replaces the word slave with alternatives matching the context. ### Why are the changes needed? There is no need to call things slave; we might as well use better, clearer names. ### Does this PR introduce _any_ user-facing change? Yes, the output JSON does change. To allow backwards compatibility this is an additive change. The shell scripts for starting & stopping workers are renamed, and for backwards compatibility old scripts are added to call through to the new ones while printing a deprecation message to stderr. ### How was this patch tested? Existing tests. Closes #28864 from holdenk/SPARK-32004-drop-references-to-slave. Lead-authored-by: Holden Karau <hkarau@apple.com> Co-authored-by: Holden Karau <holden@pigscanfly.ca> Signed-off-by: Holden Karau <hkarau@apple.com>	2020-07-13 14:05:33 -07:00
Chuliang Xiao	c56c84af47	[MINOR][DOCS] Fix typo in PySpark example in ml-datasource.md ### What changes were proposed in this pull request? This PR changes `true` to `True` in the python code. ### Why are the changes needed? The previous example is a syntax error. ### Does this PR introduce _any_ user-facing change? Yes, but this is doc-only typo fix. ### How was this patch tested? Manually run the example. Closes #29073 from ChuliangXiao/patch-1. Authored-by: Chuliang Xiao <ChuliangX@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-07-12 09:01:41 -07:00
Wenchen Fan	84db660ebe	[SPARK-32251][SQL][DOCS][TESTS] Fix SQL keyword document ### What changes were proposed in this pull request? This PR improves the test to make sure all the SQL keywords are documented correctly. It fixes several issues: 1. some keywords are not documented 2. some keywords are not ANSI SQL keywords but documented as reserved/non-reserved. ### Why are the changes needed? To make sure the implementation matches the doc. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? new test Closes #29055 from cloud-fan/keyword. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-07-10 15:10:28 -07:00
Kent Yao	4609f1fdab	[SPARK-32207][SQL] Support 'F'-suffixed Float Literals ### What changes were proposed in this pull request? In this PR, I propose that we support 'f'-suffixed float literals, e.g. `select 1.1f` ### Why are the changes needed? This is a very common feature across platforms; checked with pg, presto, hive, MySQL... ### Does this PR introduce _any_ user-facing change? yes, `select 1.1f` returns the float value 1.1 instead of throwing `AnalysisException`: `Can't extract value from 1: need struct type but got int;` ### How was this patch tested? add unit tests Closes #29022 from yaooqinn/SPARK-32207. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-07-09 19:45:16 -07:00
moovlin	9331a5c44b	[SPARK-32035][DOCS][EXAMPLES] Fixed typos involving AWS Access, Secret, & Sessions tokens ### What changes were proposed in this pull request? I resolved some of the inconsistencies of AWS env variables. They're fixed in the documentation as well as in the examples. I grep-ed through the repo to try & find any more instances but nothing popped up. ### Why are the changes needed? As previously mentioned, there is a JIRA request, SPARK-32035, which encapsulates all the issues. But, in summary, the naming of items was inconsistent. ### Does this PR introduce _any_ user-facing change? Correct names: AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN These are the same that AWS uses in their libraries. However, looking through the Spark documentation and comments, I see that these are not denoted correctly across the board: docs/cloud-integration.md 106:1. `spark-submit` reads the `AWS_ACCESS_KEY`, `AWS_SECRET_KEY` <-- both different 107:and `AWS_SESSION_TOKEN` environment variables and sets the associated authentication options docs/streaming-kinesis-integration.md 232:- Set up the environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_KEY` with your AWS credentials. 
<-- secret key different external/kinesis-asl/src/main/python/examples/streaming/kinesis_wordcount_asl.py 34: $ export AWS_ACCESS_KEY_ID=<your-access-key> 35: $ export AWS_SECRET_KEY=<your-secret-key> <-- different 48: Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_KEY <-- secret key different core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala 438: val keyId = System.getenv("AWS_ACCESS_KEY_ID") 439: val accessKey = System.getenv("AWS_SECRET_ACCESS_KEY") 448: val sessionToken = System.getenv("AWS_SESSION_TOKEN") external/kinesis-asl/src/main/scala/org/apache/spark/examples/streaming/KinesisWordCountASL.scala 53: * $ export AWS_ACCESS_KEY_ID=<your-access-key> 54: * $ export AWS_SECRET_KEY=<your-secret-key> <-- different 65: * Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_KEY <-- secret key different external/kinesis-asl/src/main/java/org/apache/spark/examples/streaming/JavaKinesisWordCountASL.java 59: * $ export AWS_ACCESS_KEY_ID=[your-access-key] 60: * $ export AWS_SECRET_KEY=<your-secret-key> <-- different 71: * Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_KEY <-- secret key different These were all fixed to match names listed under the "correct names" heading. ### How was this patch tested? I built the documentation using jekyll and verified that the changes were present & accurate. Closes #29058 from Moovlin/SPARK-32035. Authored-by: moovlin <richjoerger@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-07-09 10:35:21 -07:00
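The naming this commit standardizes on can be illustrated with a minimal sketch that picks up only the three corrected variable names from an environment mapping. The helper name `read_aws_credentials` is hypothetical and exists only for illustration; it is not part of Spark or of any AWS library.

```python
import os

# The three names listed under "Correct names" in the commit message above.
AWS_ENV_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")


def read_aws_credentials(env=None):
    """Return the AWS variables that are set in `env` (defaults to os.environ).

    Hypothetical helper for illustration only. Legacy spellings such as
    AWS_SECRET_KEY are deliberately ignored.
    """
    env = os.environ if env is None else env
    return {name: env[name] for name in AWS_ENV_VARS if name in env}
```

Passing a plain dictionary instead of `os.environ` makes it easy to check that an old spelling like `AWS_SECRET_KEY` is not picked up.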
GuoPhilipse	09cc6c51ea	[SPARK-32193][SQL][DOCS] Update regexp usage in SQL docs ### What changes were proposed in this pull request? update REGEXP usage and examples in sql-ref-syntx-qry-select-like.cmd ### Why are the changes needed? make the usage of REGEXP known to more users ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? No tests Closes #29009 from GuoPhilipse/update-migrate-guide. Lead-authored-by: GuoPhilipse <46367746+GuoPhilipse@users.noreply.github.com> Co-authored-by: GuoPhilipse <guofei_ok@126.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-07-09 16:14:33 +09:00
Max Gekk	1261fac674	[SPARK-31710][SQL][FOLLOWUP] Allow cast numeric to timestamp by default ### What changes were proposed in this pull request? 1. Set the SQL config `spark.sql.legacy.allowCastNumericToTimestamp` to `true` by default 2. Remove explicit sets of `spark.sql.legacy.allowCastNumericToTimestamp` to `true` in the cast suites. ### Why are the changes needed? To avoid breaking changes in minor versions (in the upcoming Spark 3.1.0) according to the semantic versioning guidelines (https://spark.apache.org/versioning-policy.html) ### Does this PR introduce _any_ user-facing change? Yes ### How was this patch tested? By `CastSuite`. Closes #29012 from MaxGekk/allow-cast-numeric-to-timestamp. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-07-07 14:09:40 -07:00
Huaxin Gao	492d5d174a	[SPARK-32171][SQL][DOCS] Change file locations for use db and refresh table ### What changes were proposed in this pull request? docs/sql-ref-syntax-qry-select-usedb.md -> docs/sql-ref-syntax-ddl-usedb.md docs/sql-ref-syntax-aux-refresh-table.md -> docs/sql-ref-syntax-aux-cache-refresh-table.md ### Why are the changes needed? usedb belongs to DDL. Its location should be consistent with other DDL commands file locations similar reason for refresh table ### Does this PR introduce _any_ user-facing change? before change, when clicking USE DATABASE, the side bar menu shows select commands <img width="1200" alt="Screen Shot 2020-07-04 at 9 05 35 AM" src="https://user-images.githubusercontent.com/13592258/86516696-b45f8a80-bdd7-11ea-8dba-3a5cca22aad3.png"> after change, when clicking USE DATABASE, the side bar menu shows DDL commands <img width="1120" alt="Screen Shot 2020-07-04 at 9 06 06 AM" src="https://user-images.githubusercontent.com/13592258/86516703-bf1a1f80-bdd7-11ea-8a90-ae7eaaafd44c.png"> before change, when clicking refresh table, the side bar menu shows Auxiliary statements <img width="1200" alt="Screen Shot 2020-07-04 at 9 30 40 AM" src="https://user-images.githubusercontent.com/13592258/86516877-3d2af600-bdd9-11ea-9568-0a6f156f57da.png"> after change, when clicking refresh table, the side bar menu shows Cache statements <img width="1199" alt="Screen Shot 2020-07-04 at 9 35 21 AM" src="https://user-images.githubusercontent.com/13592258/86516937-b4f92080-bdd9-11ea-8ad1-5f5a7f58d76b.png"> ### How was this patch tested? Manually build and check Closes #28995 from huaxingao/docs_fix. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Huaxin Gao <huaxing@us.ibm.com>	2020-07-04 19:01:07 -07:00
Max Gekk	bcf23307f4	[SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default ### What changes were proposed in this pull request? Set the JSON option `inferTimestamp` to `false` if a user doesn't pass it as a datasource option. ### Why are the changes needed? To prevent a perf regression while inferring schemas from JSON with potential timestamp fields. ### Does this PR introduce _any_ user-facing change? Yes ### How was this patch tested? - Modified existing tests in `JsonSuite` and `JsonInferSchemaSuite`. - Regenerated results of `JsonBenchmark` in the environment: \| Item \| Description \| \| ---- \| ----\| \| Region \| us-west-2 (Oregon) \| \| Instance \| r3.xlarge \| \| AMI \| ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 (ami-06f2f779464715dc5) \| \| Java \| OpenJDK 64-Bit Server VM 1.8.0_252 and OpenJDK 64-Bit Server VM 11.0.7+10 \| Closes #28966 from MaxGekk/json-inferTimestamps-disable-by-default. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-07-01 15:45:39 -07:00
Kousuke Saruta	5176707ac3	[MINOR][DOCS] Fix a typo for a configuration property of resources allocation ### What changes were proposed in this pull request? This PR fixes a typo for a configuration property in the `spark-standalone.md`. `spark.driver.resourcesfile` should be `spark.driver.resourcesFile`. I looked for similar typos, but this is the only one. ### Why are the changes needed? The property name is wrong. ### Does this PR introduce _any_ user-facing change? Yes. The property name is corrected. ### How was this patch tested? I confirmed that the spelling matches the property name defined in o.a.s.internal.config.package.scala. Closes #28958 from sarutak/fix-resource-typo. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-06-30 09:28:54 -07:00
Guy Khazma	44aecaa912	[SPARK-32099][DOCS] Remove broken link in cloud integration documentation ### What changes were proposed in this pull request? The 3rd link in `IBM Cloud Object Storage connector for Apache Spark` is broken. The PR removes this link. ### Why are the changes needed? broken link ### Does this PR introduce _any_ user-facing change? yes, the broken link is removed from the doc. ### How was this patch tested? doc generation passes successfully as before Closes #28927 from guykhazma/spark32099. Authored-by: Guy Khazma <guykhag@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-06-26 19:12:42 -07:00
gatorsmile	d06604f60a	[SPARK-32078][DOC] Add a redirect to sql-ref from sql-reference ### What changes were proposed in this pull request? This PR is to add a redirect to sql-ref.html. ### Why are the changes needed? Before Spark 3.0 release, we are using sql-reference.md, which was replaced by sql-ref.md instead. A number of Google searches I’ve done today have turned up https://spark.apache.org/docs/latest/sql-reference.html, which does not exist any more. Thus, we should add a redirect to sql-ref.html. ### Does this PR introduce _any_ user-facing change? https://spark.apache.org/docs/latest/sql-reference.html will be redirected to https://spark.apache.org/docs/latest/sql-ref.html ### How was this patch tested? Build it in my local environment. It works well. The sql-reference.html file was generated. The contents are like: ``` <!DOCTYPE html> <html lang="en-US"> <meta charset="utf-8"> <title>Redirecting…</title> <link rel="canonical" href="http://localhost:4000/sql-ref.html"> <script>location="http://localhost:4000/sql-ref.html"</script> <meta http-equiv="refresh" content="0; url=http://localhost:4000/sql-ref.html"> <meta name="robots" content="noindex"> <h1>Redirecting…</h1> <a href="http://localhost:4000/sql-ref.html">Click here if you are not redirected.</a> </html> ``` Closes #28914 from gatorsmile/addRedirectSQLRef. Authored-by: gatorsmile <gatorsmile@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-06-24 11:00:20 -07:00
sidedoorleftroad	986fa01747	[SPARK-32075][DOCS] Fix a few issues in parameters table ### What changes were proposed in this pull request? Fix a few issues in parameters table in structured-streaming-kafka-integration doc. ### Why are the changes needed? Make the title of the table consistent with the data. ### Does this PR introduce _any_ user-facing change? Yes. Before: ![image](https://user-images.githubusercontent.com/67275816/85414316-8475e300-b59e-11ea-84ec-fa78ecc980b3.png) After: ![image](https://user-images.githubusercontent.com/67275816/85414562-d61e6d80-b59e-11ea-9fe6-247e0ad4d9ee.png) Before: ![image](https://user-images.githubusercontent.com/67275816/85414467-b8510880-b59e-11ea-92a0-7205542fe28b.png) After: ![image](https://user-images.githubusercontent.com/67275816/85414589-de76a880-b59e-11ea-91f2-5073eaf3444b.png) Before: ![image](https://user-images.githubusercontent.com/67275816/85414502-c69f2480-b59e-11ea-837f-1201f10a56b6.png) After: ![image](https://user-images.githubusercontent.com/67275816/85414615-e9313d80-b59e-11ea-9b1a-fc11da0b6bc5.png) ### How was this patch tested? Manually build and check. Closes #28910 from sidedoorleftroad/SPARK-32075. Authored-by: sidedoorleftroad <sidedoorleftroad@163.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-06-24 13:39:55 +09:00
HyukjinKwon	b62e2536db	[SPARK-32073][R] Drop R < 3.5 support ### What changes were proposed in this pull request? Spark 3.0 accidentally dropped R < 3.5. It is built with R 3.6.3, which does not support R < 3.5: ``` Error in readRDS(pfile) : cannot read workspace version 3 written by R 3.6.3; need R 3.5.0 or newer version. ``` In fact, with SPARK-31918, we will have to drop R < 3.5 entirely to support R 4.0.0. This is unavoidable for releasing on CRAN because they require the tests to pass with the latest R. ### Why are the changes needed? To show the supported versions correctly, and support R 4.0.0 to unblock the releases. ### Does this PR introduce _any_ user-facing change? In fact, no because Spark 3.0.0 already does not work with R < 3.5. Compared to Spark 2.4, yes. R < 3.5 would not work. ### How was this patch tested? Jenkins should test it out. Closes #28908 from HyukjinKwon/SPARK-32073. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-06-24 11:05:27 +09:00
Gabor Somogyi	a9247c39d2	[SPARK-32033][SS][DSTEAMS] Use new poll API in Kafka connector executor side to avoid infinite wait ### What changes were proposed in this pull request? Spark uses an old and deprecated API named `KafkaConsumer.poll(long)` which never returns and stays in live lock if metadata is not updated (for instance when broker disappears at consumer creation). Please see [Kafka documentation](https://kafka.apache.org/25/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#poll-long-) and [standalone test application](https://github.com/gaborgsomogyi/kafka-get-assignment) for further details. In this PR I've applied the new `KafkaConsumer.poll(Duration)` API on executor side. Please note driver side still uses the old API which will be fixed in SPARK-32032. ### Why are the changes needed? Infinite wait in `KafkaConsumer.poll(long)`. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing unit tests. Closes #28871 from gaborgsomogyi/SPARK-32033. Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-06-19 14:46:26 -07:00
Yuanjian Li	8750363c8d	[MINOR][DOCS] Emphasize the Streaming tab is for DStream API ### What changes were proposed in this pull request? Emphasize the Streaming tab is for DStream API. ### Why are the changes needed? Some users reported that the Streaming tab and the Structured Streaming tab are easy to confuse. ### Does this PR introduce _any_ user-facing change? Document change. ### How was this patch tested? N/A Closes #28854 from xuanyuanking/minor-doc. Authored-by: Yuanjian Li <xyliyuanjian@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-06-19 12:17:40 +09:00
James Yu	ac98a9a07f	[MINOR][DOCS] Update running-on-kubernetes.md ### What changes were proposed in this pull request? Fix executor container name typo. `executor` should be `spark-kubernetes-executor`. ### Why are the changes needed? The executor pod container name that users actually get from their Kubernetes clusters is different from the one described in the documentation. For example, below is what a user gets from an executor pod. ``` Containers: spark-kubernetes-executor: Container ID: docker://aaaabbbbccccddddeeeeffff Image: <imagename> Image ID: docker-pullable://0000.dkr.ecr.us-east-0.amazonaws.com/spark Port: 7079/TCP Host Port: 0/TCP Args: executor State: Running Started: Thu, 28 May 2020 05:54:04 -0700 Ready: True Restart Count: 0 Limits: memory: 16Gi ``` ### Does this PR introduce _any_ user-facing change? Document change. ### How was this patch tested? N/A Closes #28862 from yuj/patch-1. Authored-by: James Yu <yuj@users.noreply.github.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-06-18 14:36:20 -07:00
DB Tsai	9b792518b2	[SPARK-31960][YARN][BUILD] Only populate Hadoop classpath for no-hadoop build ### What changes were proposed in this pull request? If a Spark distribution has a built-in hadoop runtime, Spark will not populate the hadoop classpath from `yarn.application.classpath` and `mapreduce.application.classpath` when a job is submitted to Yarn. Users can override this behavior by setting `spark.yarn.populateHadoopClasspath` to `true`. ### Why are the changes needed? Without this, Spark will populate the hadoop classpath from `yarn.application.classpath` and `mapreduce.application.classpath` even if the Spark distribution has built-in hadoop. This results in jar conflicts and many unexpected behaviors at runtime. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manually tested with two builds, with-hadoop and no-hadoop. Closes #28788 from dbtsai/yarn-classpath. Authored-by: DB Tsai <d_tsai@apple.com> Signed-off-by: DB Tsai <d_tsai@apple.com>	2020-06-18 06:08:40 +00:00
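The decision described in this commit can be sketched as a small predicate. This is only an illustration of the rule as stated in the message, not Spark's code: the function name and both parameters are hypothetical, and treating an explicit user setting as always winning (including an explicit `false`) is an assumption.

```python
def should_populate_hadoop_classpath(has_builtin_hadoop, populate_override=None):
    """Decide whether to add the YARN/MapReduce application classpath entries.

    `populate_override` stands in for an explicit user setting of
    spark.yarn.populateHadoopClasspath; None means "not set".
    Hypothetical helper; assumes an explicit setting always wins.
    """
    if populate_override is not None:
        return populate_override
    # Default: only a no-hadoop build needs the cluster's Hadoop jars.
    return not has_builtin_hadoop
```

Under these assumptions, a with-hadoop build skips the cluster classpath unless the user opts in, which is the jar-conflict case the commit is meant to avoid.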
GuoPhilipse	f0e6d0ec13	[SPARK-31710][SQL] Fail casting numeric to timestamp by default ## What changes were proposed in this pull request? We fail casting from numeric to timestamp by default. ## Why are the changes needed? Casting from numeric to timestamp is non-standard; meanwhile, it may generate different results between Spark and other systems, for example Hive. ## Does this PR introduce any user-facing change? Yes, users cannot cast numeric to timestamp directly; they have to use one of the following functions to achieve the same effect: TIMESTAMP_SECONDS/TIMESTAMP_MILLIS/TIMESTAMP_MICROS ## How was this patch tested? unit test added Closes #28593 from GuoPhilipse/31710-fix-compatibility. Lead-authored-by: GuoPhilipse <guofei_ok@126.com> Co-authored-by: GuoPhilipse <46367746+GuoPhilipse@users.noreply.github.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-06-16 08:35:35 +00:00
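The three replacement functions named above differ only in the unit they assume for the integer input. A rough Python analogue, assuming each value is an offset from the Unix epoch in UTC, is sketched below; the Python function names simply mirror the SQL ones and are not Spark APIs.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch in UTC; all three helpers measure from this instant.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_seconds(n):
    """Interpret n as seconds since the epoch."""
    return EPOCH + timedelta(seconds=n)


def timestamp_millis(n):
    """Interpret n as milliseconds since the epoch."""
    return EPOCH + timedelta(milliseconds=n)


def timestamp_micros(n):
    """Interpret n as microseconds since the epoch."""
    return EPOCH + timedelta(microseconds=n)
```

Making the unit explicit in the function name removes the ambiguity that a bare numeric-to-timestamp cast carries across systems.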
Jungtaek Lim (HeartSaVioR)	fe68e95a5a	[SPARK-24634][SS][FOLLOWUP] Rename the variable from "numLateInputs" to "numRowsDroppedByWatermark" ### What changes were proposed in this pull request? This PR renames the variable from "numLateInputs" to "numRowsDroppedByWatermark" so that it becomes self-explanatory. ### Why are the changes needed? This originated from post-review, see https://github.com/apache/spark/pull/28607#discussion_r439853232 ### Does this PR introduce _any_ user-facing change? No, as SPARK-24634 is not introduced in any release yet. ### How was this patch tested? Existing UTs. Closes #28828 from HeartSaVioR/SPARK-24634-v3-followup. Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-06-16 16:41:08 +09:00
Takeshi Yamamuro	3698a14204	[SPARK-26905][SQL] Follow the SQL:2016 reserved keywords ### What changes were proposed in this pull request? This PR intends to move keywords `ANTI`, `SEMI`, and `MINUS` from reserved to non-reserved. ### Why are the changes needed? To comply with the ANSI/SQL standard. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added tests. Closes #28807 from maropu/SPARK-26905-2. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-06-16 00:27:45 +09:00
yi.wu	54e702c0dd	[SPARK-31970][CORE] Make MDC configuration step be consistent between setLocalProperty and log4j.properties ### What changes were proposed in this pull request? This PR proposes to use "mdc.XXX" as the consistent key for both `sc.setLocalProperty` and `log4j.properties` when setting up configurations for MDC. ### Why are the changes needed? It's weird that we use "mdc.XXX" as key to set MDC value via `sc.setLocalProperty` while we use "XXX" as key to set MDC pattern in log4j.properties. It could also bring extra burden to the user. ### Does this PR introduce _any_ user-facing change? No, as MDC feature is added in version 3.1, which hasn't been released. ### How was this patch tested? Tested manually. Closes #28801 from Ngone51/consistent-mdc. Authored-by: yi.wu <yi.wu@databricks.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-06-14 14:26:11 -07:00
Jungtaek Lim (HeartSaVioR)	84815d0550	[SPARK-24634][SS] Add a new metric regarding number of inputs later than watermark plus allowed delay ### What changes were proposed in this pull request? Please refer to https://issues.apache.org/jira/browse/SPARK-24634 for the rationale of the issue. This patch adds a new metric to count the number of inputs that arrived later than watermark plus allowed delay. To make changes simpler, this patch doesn't count the exact number of input rows which are later than watermark plus allowed delay. Instead, this patch counts the inputs which are dropped in the logic of the operator. The difference between the two shows up in streaming aggregation: to optimize the calculation, streaming aggregation "pre-aggregates" the input rows, and later checks the lateness against "pre-aggregated" inputs, hence the number might be reduced. The new metric will be provided in two places: 1. On Spark UI: check the metrics in stateful operator nodes in query execution details page in SQL tab 2. On Streaming Query Listener: check "numLateInputs" in "stateOperators" in QueryProgressEvent. ### Why are the changes needed? Dropping late inputs means that end users might not get expected outputs. End users may be aware of this and tolerate the result (as that's what allowed lateness is for), but they should be able to observe whether the current value of allowed lateness drops inputs or not so that they can adjust the value. Also, if they happen to have multiple stateful operators in a single query and Spark drops late inputs "between" these operators, it becomes a "correctness" issue. Spark should disallow such a possibility, but given we already provided the flexibility, we should at least provide a way to observe the correctness issue so that users can decide whether to correct their query or not. ### Does this PR introduce _any_ user-facing change? Yes. End users will be able to retrieve the information of late inputs in two ways: 1. 
SQL tab in Spark UI 2. Streaming Query Listener ### How was this patch tested? New UTs added & existing UTs are modified to reflect the change. And ran manual test reproducing SPARK-28094. I've picked the specific case on "B outer C outer D" which is enough to represent the "intermediate late row" issue due to global watermark. https://gist.github.com/jammann/b58bfbe0f4374b89ecea63c1e32c8f17 Spark logs warning message on the query which means SPARK-28074 is working correctly, ``` 20/05/30 17:52:47 WARN UnsupportedOperationChecker: Detected pattern of possible 'correctness' issue due to global watermark. The query contains stateful operation which can emit rows older than the current watermark plus allowed late record delay, which are "late rows" in downstream stateful operations and these rows can be discarded. Please refer the programming guide doc for more details.; Join LeftOuter, ((D_FK#28 = D_ID#87) AND (B_LAST_MOD#26-T30000ms = D_LAST_MOD#88-T30000ms)) :- Join LeftOuter, ((C_FK#27 = C_ID#58) AND (B_LAST_MOD#26-T30000ms = C_LAST_MOD#59-T30000ms)) : :- EventTimeWatermark B_LAST_MOD#26: timestamp, 30 seconds : : +- Project [v#23.B_ID AS B_ID#25, v#23.B_LAST_MOD AS B_LAST_MOD#26, v#23.C_FK AS C_FK#27, v#23.D_FK AS D_FK#28] : : +- Project [from_json(StructField(B_ID,StringType,false), StructField(B_LAST_MOD,TimestampType,false), StructField(C_FK,StringType,true), StructField(D_FK,StringType,true), value#21, Some(UTC)) AS v#23] : : +- Project [cast(value#8 as string) AS value#21] : : +- StreamingRelationV2 org.apache.spark.sql.kafka010.KafkaSourceProvider3a7fd18c, kafka, org.apache.spark.sql.kafka010.KafkaSourceProvider$KafkaTable396d2958, org.apache.spark.sql.util.CaseInsensitiveStringMapa51ee61a, [key#7, value#8, topic#9, partition#10, offset#11L, timestamp#12, timestampType#13], StreamingRelation DataSource(org.apache.spark.sql.SparkSessiond221af8,kafka,List(),None,List(),None,Map(inferSchema -> true, startingOffsets -> earliest, subscribe -> B, 
kafka.bootstrap.servers -> localhost:9092),None), kafka, [key#0, value#1, topic#2, partition#3, offset#4L, timestamp#5, timestampType#6] : +- EventTimeWatermark C_LAST_MOD#59: timestamp, 30 seconds : +- Project [v#56.C_ID AS C_ID#58, v#56.C_LAST_MOD AS C_LAST_MOD#59] : +- Project [from_json(StructField(C_ID,StringType,false), StructField(C_LAST_MOD,TimestampType,false), value#54, Some(UTC)) AS v#56] : +- Project [cast(value#41 as string) AS value#54] : +- StreamingRelationV2 org.apache.spark.sql.kafka010.KafkaSourceProvider3f507373, kafka, org.apache.spark.sql.kafka010.KafkaSourceProvider$KafkaTable7b6736a4, org.apache.spark.sql.util.CaseInsensitiveStringMapa51ee61b, [key#40, value#41, topic#42, partition#43, offset#44L, timestamp#45, timestampType#46], StreamingRelation DataSource(org.apache.spark.sql.SparkSessiond221af8,kafka,List(),None,List(),None,Map(inferSchema -> true, startingOffsets -> earliest, subscribe -> C, kafka.bootstrap.servers -> localhost:9092),None), kafka, [key#33, value#34, topic#35, partition#36, offset#37L, timestamp#38, timestampType#39] +- EventTimeWatermark D_LAST_MOD#88: timestamp, 30 seconds +- Project [v#85.D_ID AS D_ID#87, v#85.D_LAST_MOD AS D_LAST_MOD#88] +- Project [from_json(StructField(D_ID,StringType,false), StructField(D_LAST_MOD,TimestampType,false), value#83, Some(UTC)) AS v#85] +- Project [cast(value#70 as string) AS value#83] +- StreamingRelationV2 org.apache.spark.sql.kafka010.KafkaSourceProvider2b90e779, kafka, org.apache.spark.sql.kafka010.KafkaSourceProvider$KafkaTable36f8cd29, org.apache.spark.sql.util.CaseInsensitiveStringMapa51ee620, [key#69, value#70, topic#71, partition#72, offset#73L, timestamp#74, timestampType#75], StreamingRelation DataSource(org.apache.spark.sql.SparkSessiond221af8,kafka,List(),None,List(),None,Map(inferSchema -> true, startingOffsets -> earliest, subscribe -> D, kafka.bootstrap.servers -> localhost:9092),None), kafka, [key#62, value#63, topic#64, partition#65, offset#66L, timestamp#67, 
timestampType#68] ``` and we can find the late inputs from the batch 4 as follows: ![Screen Shot 2020-05-30 at 18 02 53](https://user-images.githubusercontent.com/1317309/83324401-058fd200-a2a0-11ea-8bf6-89cf777e9326.png) which represents intermediate inputs are being lost, ended up with correctness issue. Closes #28607 from HeartSaVioR/SPARK-24634-v3. Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-06-14 14:37:38 +09:00
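The gap between "rows later than the watermark" and "rows dropped by the operator" described in this commit can be shown with a toy model. The sketch below is not Spark's implementation: the function names are hypothetical, rows are plain `(key, event_time)` tuples, a row counts as late when its event time is strictly older than the watermark, and pre-aggregation is modeled as collapsing identical rows.

```python
def count_late_rows(rows, watermark):
    """Count raw input rows whose event time is older than the watermark."""
    return sum(1 for _key, event_time in rows if event_time < watermark)


def count_late_after_preaggregation(rows, watermark):
    """Count late rows after collapsing duplicates, as a stand-in for pre-aggregation.

    Because several input rows can fold into one pre-aggregated row,
    this number can be smaller than the raw count.
    """
    pre_aggregated = set(rows)  # one entry per distinct (key, event_time)
    return sum(1 for _key, event_time in pre_aggregated if event_time < watermark)
```

With two identical late rows for one key, the raw count is 2 while the post-aggregation count is 1, which is the kind of undercount the commit message warns about.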
Kent Yao	6a424b93e5	[SPARK-31830][SQL] Consistent error handling for datetime formatting and parsing functions ### What changes were proposed in this pull request? Currently, `date_format` and `from_unixtime`, `unix_timestamp`,`to_unix_timestamp`, `to_timestamp`, `to_date` have different exception handling behavior for formatting datetime values. In this PR, we apply the exception handling behavior of `date_format` to `from_unixtime`, `unix_timestamp`,`to_unix_timestamp`, `to_timestamp` and `to_date`. In the phase of creating the datetime formatter or formatting, exceptions will be raised. e.g. ```java spark-sql> select date_format(make_timestamp(1, 1 ,1,1,1,1), 'yyyyyyyyyyy-MM-aaa'); 20/05/28 15:25:38 ERROR SparkSQLDriver: Failed in [select date_format(make_timestamp(1, 1 ,1,1,1,1), 'yyyyyyyyyyy-MM-aaa')] org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to recognize 'yyyyyyyyyyy-MM-aaa' pattern in the DateTimeFormatter. 1) You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0. 
2) You can form a valid datetime pattern with the guide from https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html ``` ```java spark-sql> select date_format(make_timestamp(1, 1 ,1,1,1,1), 'yyyyyyyyyyy-MM-AAA'); 20/05/28 15:26:10 ERROR SparkSQLDriver: Failed in [select date_format(make_timestamp(1, 1 ,1,1,1,1), 'yyyyyyyyyyy-MM-AAA')] java.lang.IllegalArgumentException: Illegal pattern character: A ``` ```java spark-sql> select date_format(make_timestamp(1,1,1,1,1,1), 'yyyyyyyyyyy-MM-dd'); 20/05/28 15:23:23 ERROR SparkSQLDriver: Failed in [select date_format(make_timestamp(1,1,1,1,1,1), 'yyyyyyyyyyy-MM-dd')] java.lang.ArrayIndexOutOfBoundsException: 11 at java.time.format.DateTimeFormatterBuilder$NumberPrinterParser.format(DateTimeFormatterBuilder.java:2568) ``` In the phase of parsing, `DateTimeParseException \| DateTimeException \| ParseException` will be suppressed, but `SparkUpgradeException` will still be raised e.g. ```java spark-sql> set spark.sql.legacy.timeParserPolicy=exception; spark.sql.legacy.timeParserPolicy exception spark-sql> select to_timestamp("2020-01-27T20:06:11.847-0800", "yyyy-MM-dd'T'HH:mm:ss.SSSz"); 20/05/28 15:31:15 ERROR SparkSQLDriver: Failed in [select to_timestamp("2020-01-27T20:06:11.847-0800", "yyyy-MM-dd'T'HH:mm:ss.SSSz")] org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to parse '2020-01-27T20:06:11.847-0800' in the new parser. You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string. 
``` ```java spark-sql> set spark.sql.legacy.timeParserPolicy=corrected; spark.sql.legacy.timeParserPolicy corrected spark-sql> select to_timestamp("2020-01-27T20:06:11.847-0800", "yyyy-MM-dd'T'HH:mm:ss.SSSz"); NULL spark-sql> set spark.sql.legacy.timeParserPolicy=legacy; spark.sql.legacy.timeParserPolicy legacy spark-sql> select to_timestamp("2020-01-27T20:06:11.847-0800", "yyyy-MM-dd'T'HH:mm:ss.SSSz"); 2020-01-28 12:06:11.847 ``` ### Why are the changes needed? Consistency ### Does this PR introduce _any_ user-facing change? Yes, invalid datetime patterns will fail `from_unixtime`, `unix_timestamp`,`to_unix_timestamp`, `to_timestamp` and `to_date` instead of resulting `NULL` ### How was this patch tested? add more tests Closes #28650 from yaooqinn/SPARK-31830. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-06-09 16:56:45 +00:00
Akshat Bordia	6befb2d8bd	[SPARK-31486][CORE] spark.submit.waitAppCompletion flag to control spark-submit exit in Standalone Cluster Mode ### What changes were proposed in this pull request? These changes implement an application wait mechanism which will allow spark-submit to wait until the application finishes in Standalone Spark Mode. This will delay the exit of spark-submit JVM until the job is completed. This implementation will keep monitoring the application until it is either finished, failed or killed. This will be controlled via a flag (spark.submit.waitForCompletion) which will be set to false by default. ### Why are the changes needed? Currently, Livy API for Standalone Cluster Mode doesn't know when the job has finished. If this flag is enabled, this can be used by Livy API (/batches/{batchId}/state) to find out when the application has finished/failed. This flag is similar to spark.yarn.submit.waitAppCompletion. ### Does this PR introduce any user-facing change? Yes, this PR introduces a new flag but it will be disabled by default. ### How was this patch tested? Couldn't implement unit tests since the pollAndReportStatus method has System.exit() calls. Please provide any suggestions. Tested spark-submit locally for the following scenarios: 1. With the flag enabled, spark-submit exits once the job is finished. 2. With the flag enabled and job failed, spark-submit exits when the job fails. 3. With the flag disabled, spark-submit exits after submitting the job (existing behavior). 4. Existing behavior is unchanged when the flag is not added explicitly. Closes #28258 from akshatb1/master. Lead-authored-by: Akshat Bordia <akshat.bordia31@gmail.com> Co-authored-by: Akshat Bordia <akshat.bordia@citrix.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-06-09 09:29:37 -05:00
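The wait mechanism described here amounts to polling the application state until it becomes terminal. A minimal sketch of that loop follows; it is not the code from the PR. The state strings, the `get_state` callback, and the injectable `sleep` parameter are all assumptions made so the loop can be exercised without a cluster.

```python
import time

# Hypothetical state names; the commit only says "finished, failed or killed".
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}


def wait_for_completion(get_state, poll_interval_s=1.0, sleep=time.sleep):
    """Poll `get_state()` until it reports a terminal state, then return it.

    `sleep` is injectable so the loop can run without real delays.
    """
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        sleep(poll_interval_s)
```

Returning the terminal state instead of calling an exit function inside the loop keeps the polling logic unit-testable, which is the difficulty the commit message mentions with `System.exit()`.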
Jungtaek Lim (HeartSaVioR)	8305b77796	[SPARK-28199][SPARK-28199][SS][FOLLOWUP] Mention the change of into the SS migration guide ### What changes were proposed in this pull request? SPARK-28199 (#24996) made the trigger related public API to be exposed only from static methods of Trigger class. This is backward incompatible change, so some users may experience compilation error after upgrading to Spark 3.0.0. While we plan to mention the change into release note, it's good to mention the change to the migration guide doc as well, since the purpose of the doc is to collect the major changes/incompatibilities between versions and end users would refer the doc. ### Why are the changes needed? SPARK-28199 is technically backward incompatible change and we should kindly guide the change. ### Does this PR introduce _any_ user-facing change? Doc change. ### How was this patch tested? N/A, as it's just a doc change. Closes #28763 from HeartSaVioR/SPARK-28199-FOLLOWUP-doc. Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-06-09 04:52:48 +00:00
Gabor Somogyi	04f66bfd4e	[MINOR][SS][DOCS] fileNameOnly parameter description re-unite ### What changes were proposed in this pull request? `fileNameOnly` parameter is split to 2 pieces in [this](`dbb8143501`) commit. This PR re-unites it. ### Why are the changes needed? Parameter description split in doc. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? ``` cd docs/ SKIP_API=1 jekyll build ``` Manual webpage check. Closes #28739 from gaborgsomogyi/datasettxtfix. Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-06-06 16:49:48 +09:00
Kent Yao	9d5b5d0a58	[SPARK-31879][SQL][TEST-JAVA11] Make week-based pattern invalid for formatting too # What changes were proposed in this pull request? After all these attempts https://github.com/apache/spark/pull/28692 and https://github.com/apache/spark/pull/28719 and https://github.com/apache/spark/pull/28727, they all have limitations as mentioned in their discussions. Maybe the only way is to forbid them all ### Why are the changes needed? These week-based fields need Locale to express their semantics, the first day of the week varies from country to country. From the Java doc of WeekFields ```java /** * Gets the first day-of-week. * <p> * The first day-of-week varies by culture. * For example, the US uses Sunday, while France and the ISO-8601 standard use Monday. * This method returns the first day using the standard {@code DayOfWeek} enum. * * @return the first day-of-week, not null */ public DayOfWeek getFirstDayOfWeek() { return firstDayOfWeek; } ``` But for the SimpleDateFormat, the day-of-week is not localized ``` u Day number of week (1 = Monday, ..., 7 = Sunday) Number 1 ``` Currently, the default locale we use is the US, so the result moved a day or a year or a week backward. e.g. For the date `2019-12-29(Sunday)`, in the Sunday Start system(e.g. en-US), it belongs to 2020 of week-based-year, in the Monday Start system(en-GB), it goes to 2019. 
the week-of-week-based-year(w) will be affected too ```sql spark-sql> SELECT to_csv(named_struct('time', to_timestamp('2019-12-29', 'yyyy-MM-dd')), map('timestampFormat', 'YYYY', 'locale', 'en-US')); 2020 spark-sql> SELECT to_csv(named_struct('time', to_timestamp('2019-12-29', 'yyyy-MM-dd')), map('timestampFormat', 'YYYY', 'locale', 'en-GB')); 2019 spark-sql> SELECT to_csv(named_struct('time', to_timestamp('2019-12-29', 'yyyy-MM-dd')), map('timestampFormat', 'YYYY-ww-uu', 'locale', 'en-US')); 2020-01-01 spark-sql> SELECT to_csv(named_struct('time', to_timestamp('2019-12-29', 'yyyy-MM-dd')), map('timestampFormat', 'YYYY-ww-uu', 'locale', 'en-GB')); 2019-52-07 spark-sql> SELECT to_csv(named_struct('time', to_timestamp('2020-01-05', 'yyyy-MM-dd')), map('timestampFormat', 'YYYY-ww-uu', 'locale', 'en-US')); 2020-02-01 spark-sql> SELECT to_csv(named_struct('time', to_timestamp('2020-01-05', 'yyyy-MM-dd')), map('timestampFormat', 'YYYY-ww-uu', 'locale', 'en-GB')); 2020-01-07 ``` For other countries, please refer to [First Day of the Week in Different Countries](http://chartsbin.com/view/41671) ### Does this PR introduce _any_ user-facing change? With this change, user can not use 'YwuW', but 'e' for 'u' instead. This can at least turn this not to be a silent data change. ### How was this patch tested? add unit tests Closes #28728 from yaooqinn/SPARK-31879-NEW2. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-06-05 08:14:01 +00:00
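The locale dependence described in SPARK-31879 can be reproduced outside Spark. The following is a minimal Python sketch, not Spark's or the JDK's implementation: the helper names `week1_start` and `week_based_fields` are hypothetical, and the two week definitions (Sunday start with a 1-day minimum first week for en-US, Monday start with a 4-day minimum for en-GB/ISO) are assumptions chosen to match the outputs quoted in the commit message.

```python
from datetime import date, timedelta

MONDAY, SUNDAY = 0, 6          # same numbering as date.weekday()
US = (SUNDAY, 1)               # assumed en-US rule: Sunday start, 1-day minimum first week
ISO = (MONDAY, 4)              # assumed en-GB / ISO-8601 rule: Monday start, 4-day minimum


def week1_start(year, first_dow, min_days):
    """Return the date on which week 1 of `year` begins under the given rule."""
    jan1 = date(year, 1, 1)
    offset = (jan1.weekday() - first_dow) % 7   # days from the week start to Jan 1
    start = jan1 - timedelta(days=offset)
    # The week containing Jan 1 is week 1 only if enough of its days fall in `year`.
    if 7 - offset < min_days:
        start += timedelta(days=7)
    return start


def week_based_fields(d, first_dow, min_days):
    """Return (week-based-year, week-of-week-based-year, localized day-of-week)."""
    for year in (d.year + 1, d.year, d.year - 1):
        start = week1_start(year, first_dow, min_days)
        if d >= start:
            week = (d - start).days // 7 + 1
            dow = (d.weekday() - first_dow) % 7 + 1
            return year, week, dow


if __name__ == "__main__":
    for d in (date(2019, 12, 29), date(2020, 1, 5)):
        print(d, "US:", week_based_fields(d, *US), "ISO:", week_based_fields(d, *ISO))
```

Under these assumptions, `2019-12-29` yields `(2020, 1, 1)` for the Sunday-start rule and `(2019, 52, 7)` for the Monday-start rule, matching the `YYYY-ww-uu` results `2020-01-01` and `2019-52-07` shown in the commit message.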
Enrico Minack	4bbe3c2bb4	[SPARK-31853][DOCS] Mention removal of params mixins setter in migration guide ### What changes were proposed in this pull request? The Pyspark Migration Guide needs to mention a breaking change of the Pyspark ML API. ### Why are the changes needed? In SPARK-29093, all setters have been removed from `Params` mixins in `pyspark.ml.param.shared`. Those setters had been part of the public pyspark ML API, hence this is a breaking change. ### Does this PR introduce _any_ user-facing change? Only documentation. ### How was this patch tested? Visually. Closes #28663 from EnricoMi/branch-pyspark-migration-guide-setters. Authored-by: Enrico Minack <github@enrico.minack.dev> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-06-03 18:06:13 -05:00
Kent Yao	afe95bd9ad	[SPARK-31892][SQL] Disable week-based date field for parsing ### What changes were proposed in this pull request? This PR disables week-based date fields for parsing. closes #28674 ### Why are the changes needed? 1. It's an un-fixable behavior change to fill the gap between SimpleDateFormat and DateTimeFormatter and to keep backward-compatibility across different JDKs. A lot of effort has been made to prove it at https://github.com/apache/spark/pull/28674 2. The existing behavior itself in 2.4 is confusing, e.g. ```sql spark-sql> select to_timestamp('1', 'w'); 1969-12-28 00:00:00 spark-sql> select to_timestamp('1', 'u'); 1970-01-05 00:00:00 ``` The 'u' here seems to go neither to the Monday of the first week in week-based form nor to the first day of the year in non-week-based form, but to the Monday of the second week in week-based form. And, e.g. ```sql spark-sql> select to_timestamp('2020 2020', 'YYYY yyyy'); 2020-01-01 00:00:00 spark-sql> select to_timestamp('2020 2020', 'yyyy YYYY'); 2019-12-29 00:00:00 spark-sql> select to_timestamp('2020 2020 1', 'YYYY yyyy w'); NULL spark-sql> select to_timestamp('2020 2020 1', 'yyyy YYYY w'); 2019-12-29 00:00:00 ``` I think we don't need to introduce all the weird behavior from Java. 3. The current test coverage for week-based date fields is almost 0%, which indicates that we've never imagined using it. 4. Avoiding JDK bugs https://issues.apache.org/jira/browse/SPARK-31880 ### Does this PR introduce _any_ user-facing change? Yes, the 'Y/W/w/u/F/E' patterns cannot be used in datetime parsing functions. ### How was this patch tested? more tests added Closes #28706 from yaooqinn/SPARK-31892. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-06-03 06:49:58 +00:00
Eren Avsarogullari	979593d708	[SPARK-31566][SQL][DOCS] Add SQL Rest API Documentation ### What changes were proposed in this pull request? SQL Rest API exposes query execution details and metrics as Public API. Its documentation will be useful for the end-users. ### Why are the changes needed? SQL Rest API does not exist under Spark Rest API. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Manually build and check Closes #28354 from erenavsarogullari/SPARK-31566. Lead-authored-by: Eren Avsarogullari <eren.avsarogullari@gmail.com> Co-authored-by: Gengliang Wang <gengliang.wang@databricks.com> Co-authored-by: Eren Avsarogullari <erenavsarogullari@gmail.com> Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>	2020-06-02 12:46:12 -07:00
lipzhu	d79a8a88b1	[SPARK-31834][SQL] Improve error message for incompatible data types ### What changes were proposed in this pull request? We should use dataType.catalogString to unify the data type mismatch messages. Before: ```sql spark-sql> create table SPARK_31834(a int) using parquet; spark-sql> insert into SPARK_31834 select '1'; Error in query: Cannot write incompatible data to table '`default`.`spark_31834`': - Cannot safely cast 'a': StringType to IntegerType; ``` After: ```sql spark-sql> create table SPARK_31834(a int) using parquet; spark-sql> insert into SPARK_31834 select '1'; Error in query: Cannot write incompatible data to table '`default`.`spark_31834`': - Cannot safely cast 'a': string to int; ``` ### How was this patch tested? UT. Closes #28654 from lipzhu/SPARK-31834. Authored-by: lipzhu <lipzhu@ebay.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-06-02 21:07:10 +09:00
Kent Yao	547c5bf552	[SPARK-31867][SQL] Disable year type datetime patterns which are longer than 10 ### What changes were proposed in this pull request? As mentioned in https://github.com/apache/spark/pull/28673 and suggested by cloud-fan at https://github.com/apache/spark/pull/28673#discussion_r432817075, in this PR we disable datetime patterns of the form `y..y` and `Y..Y` whose lengths are greater than 10, to avoid the JDK bug described below. The new datetime formatter introduces a silent data change like: ```sql spark-sql> select from_unixtime(1, 'yyyyyyyyyyy-MM-dd'); NULL spark-sql> set spark.sql.legacy.timeParserPolicy=legacy; spark.sql.legacy.timeParserPolicy legacy spark-sql> select from_unixtime(1, 'yyyyyyyyyyy-MM-dd'); 00000001970-01-01 spark-sql> ``` For patterns that support `SignStyle.EXCEEDS_PAD`, e.g. `y..y`(len >=4), when using the `NumberPrinterParser` to format it ```java switch (signStyle) { case EXCEEDS_PAD: if (minWidth < 19 && value >= EXCEED_POINTS[minWidth]) { buf.append(decimalStyle.getPositiveSign()); } break; .... ``` the `minWidth` == `len(y..y)` and the `EXCEED_POINTS` is ```java /** * Array of 10 to the power of n. */ static final long[] EXCEED_POINTS = new long[] { 0L, 10L, 100L, 1000L, 10000L, 100000L, 1000000L, 10000000L, 100000000L, 1000000000L, 10000000000L, }; ``` So when the `len(y..y)` is greater than 10, `ArrayIndexOutOfBoundsException` will be raised. On the caller side, for `from_unixtime`, the exception is suppressed and a silent data change occurs; for `date_format`, the `ArrayIndexOutOfBoundsException` propagates. ### Why are the changes needed? fix silent data change ### Does this PR introduce _any_ user-facing change? Yes, SparkUpgradeException will take the place of the `null` result when the pattern contains more than 10 consecutive 'y' or 'Y' ### How was this patch tested? new tests Closes #28684 from yaooqinn/SPARK-31867-2. 
Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-05-31 12:34:39 +00:00
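The out-of-bounds lookup described in SPARK-31867 can be illustrated with a minimal Python sketch. It mirrors only the `EXCEEDS_PAD` condition quoted in the commit message; the function name `needs_positive_sign` is hypothetical, and Python's `IndexError` stands in for Java's `ArrayIndexOutOfBoundsException`.

```python
# Same contents as the quoted Java table: 0, then 10**1 .. 10**10 (11 entries, indices 0..10).
EXCEED_POINTS = [0] + [10 ** n for n in range(1, 11)]


def needs_positive_sign(value, min_width):
    """Mirror of the quoted check: minWidth < 19 && value >= EXCEED_POINTS[minWidth]."""
    # For 11 <= min_width <= 18 the guard passes but the table has no such index.
    return min_width < 19 and value >= EXCEED_POINTS[min_width]


if __name__ == "__main__":
    print(needs_positive_sign(1970, 4))    # pattern 'yyyy': lookup at index 4 is valid
    print(needs_positive_sign(1970, 10))   # ten 'y' letters: index 10 is the last valid one
    try:
        needs_positive_sign(1970, 11)      # eleven 'y' letters: index 11 does not exist
    except IndexError as exc:
        print("lookup failed:", exc)
```

The guard `min_width < 19` admits widths 11 through 18 even though the table stops at index 10, which is why only patterns longer than 10 letters trigger the failure.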
Huaxin Gao	1b780f364b	[SPARK-31866][SQL][DOCS] Add COALESCE/REPARTITION/REPARTITION_BY_RANGE Hints to SQL Reference ### What changes were proposed in this pull request? Add Coalesce/Repartition/Repartition_By_Range Hints to SQL Reference ### Why are the changes needed? To make SQL reference complete ### Does this PR introduce _any_ user-facing change? <img width="1100" alt="Screen Shot 2020-05-29 at 6 46 38 PM" src="https://user-images.githubusercontent.com/13592258/83316782-d6fcf300-a1dc-11ea-87f6-e357b9c739fd.png"> <img width="1099" alt="Screen Shot 2020-05-29 at 6 43 30 PM" src="https://user-images.githubusercontent.com/13592258/83316784-d8c6b680-a1dc-11ea-95ea-10a1f75dcef9.png"> Only the above pages are changed. The following two pages are the same as before. <img width="1100" alt="Screen Shot 2020-05-28 at 10 05 27 PM" src="https://user-images.githubusercontent.com/13592258/83223474-bfb3fc00-a12f-11ea-807a-824a618afa0b.png"> <img width="1099" alt="Screen Shot 2020-05-28 at 10 05 08 PM" src="https://user-images.githubusercontent.com/13592258/83223478-c2165600-a12f-11ea-806e-a1e57dc35ef4.png"> ### How was this patch tested? Manually build and check Closes #28672 from huaxingao/coalesce_hint. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-05-30 14:51:45 -05:00
Wenchen Fan	1528fbced8	[SPARK-31827][SQL] fail datetime parsing/formatting if detect the Java 8 bug of stand-alone form ### What changes were proposed in this pull request? If `LLL`/`qqq` is used in the datetime pattern string, and the current JDK in use has a bug for the stand-alone form (see https://bugs.openjdk.java.net/browse/JDK-8114833), throw an exception with a clear error message. ### Why are the changes needed? to keep backward compatibility with Spark 2.4 ### Does this PR introduce _any_ user-facing change? Yes Spark 2.4 ``` scala> sql("select date_format('1990-1-1', 'LLL')").show +---------------------------------------------+ |date_format(CAST(1990-1-1 AS TIMESTAMP), LLL)| +---------------------------------------------+ | Jan| +---------------------------------------------+ ``` Spark 3.0 with Java 11 ``` scala> sql("select date_format('1990-1-1', 'LLL')").show +---------------------------------------------+ |date_format(CAST(1990-1-1 AS TIMESTAMP), LLL)| +---------------------------------------------+ | Jan| +---------------------------------------------+ ``` Spark 3.0 with Java 8 ``` // before this PR +---------------------------------------------+ |date_format(CAST(1990-1-1 AS TIMESTAMP), LLL)| +---------------------------------------------+ | 1| +---------------------------------------------+ // after this PR scala> sql("select date_format('1990-1-1', 'LLL')").show org.apache.spark.SparkUpgradeException ``` ### How was this patch tested? manual test with java 8 and 11 Closes #28646 from cloud-fan/format. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-05-27 18:53:19 +00:00
Xingcan Cui	8ba2b47737	[SPARK-31792][SS][DOCS] Introduce the structured streaming UI in the Web UI doc ### What changes were proposed in this pull request? This PR adds the structured streaming UI introduction to the Web UI doc. ![image](https://user-images.githubusercontent.com/1452518/82642209-92b99380-9bdb-11ea-9a0d-cbb26040b0ef.png) ### Why are the changes needed? The structured streaming web UI introduced before was missing from the Web UI documentation. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? N.A. Closes #28609 from xccui/ss-ui-doc. Authored-by: Xingcan Cui <xccui@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-05-26 14:27:42 +09:00
Kent Yao	695cb617d4	[SPARK-31771][SQL] Disable Narrow TextStyle for datetime pattern 'G/M/L/E/u/Q/q' ### What changes were proposed in this pull request? Five consecutive pattern characters of 'G/M/L/E/u/Q/q' mean Narrow-Text Style since we switched to `java.time.DateTimeFormatterBuilder` in 3.0.0, which outputs only the leading letter of the value, e.g. `December` would be `D`. In Spark 2.4 they mean Full-Text Style. In this PR, we explicitly disable Narrow-Text Style for these pattern characters. ### Why are the changes needed? Without this change, there will be a silent data change. ### Does this PR introduce _any_ user-facing change? Yes, queries with datetime operations using datetime patterns, e.g. `G/M/L/E/u`, will fail if the pattern length is 5. Other patterns, e.g. 'k', 'm', also accept only a certain number of letters. 1. Datetime patterns that are not supported by the new parser but are supported by the legacy one will get SparkUpgradeException, e.g. "GGGGG", "MMMMM", "LLLLL", "EEEEE", "uuuuu", "aa", "aaa". Two options are given to end-users: one is to use legacy mode, and the other is to follow the new online doc for correct datetime patterns. 2. Datetime patterns that are supported by neither the new parser nor the legacy one, e.g. "QQQQQ", "qqqqq", will get IllegalArgumentException, which is caught by Spark internally and results in NULL for end-users. ### How was this patch tested? add unit tests Closes #28592 from yaooqinn/SPARK-31771. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-05-25 15:07:41 +00:00
Huaxin Gao	ad9532a09c	[SPARK-31612][SQL][DOCS][FOLLOW-UP] Fix a few issues in SQL ref ### What changes were proposed in this pull request? Fix a few issues in SQL Reference ### Why are the changes needed? To make SQL Reference look better ### Does this PR introduce _any_ user-facing change? Yes. before: <img width="189" alt="Screen Shot 2020-05-21 at 11 41 34 PM" src="https://user-images.githubusercontent.com/13592258/82639052-d0f38a80-9bbc-11ea-81a4-22def4ca5cc0.png"> after: <img width="195" alt="Screen Shot 2020-05-21 at 11 41 17 PM" src="https://user-images.githubusercontent.com/13592258/82639063-d5b83e80-9bbc-11ea-84d1-8361e6bee949.png"> before: <img width="763" alt="Screen Shot 2020-05-21 at 11 45 22 PM" src="https://user-images.githubusercontent.com/13592258/82639252-3e9fb680-9bbd-11ea-863c-e6a6c2f83a06.png"> after: <img width="724" alt="Screen Shot 2020-05-21 at 11 45 02 PM" src="https://user-images.githubusercontent.com/13592258/82639265-42cbd400-9bbd-11ea-8df2-fc5c255b84d3.png"> before: <img width="437" alt="Screen Shot 2020-05-21 at 11 41 57 PM" src="https://user-images.githubusercontent.com/13592258/82639072-db158900-9bbc-11ea-9963-731881cda4fd.png"> after <img width="347" alt="Screen Shot 2020-05-21 at 11 42 26 PM" src="https://user-images.githubusercontent.com/13592258/82639082-dfda3d00-9bbc-11ea-9bd2-f922cc91f175.png"> ### How was this patch tested? Manually build and check Closes #28608 from huaxingao/doc_fix. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-05-23 08:43:16 +09:00
GuoPhilipse	892b600ce3	[SPARK-31790][DOCS] cast(long as timestamp) show different result between Hive and Spark ### What changes were proposed in this pull request? add docs for sql migration-guide ### Why are the changes needed? let user know more about the cast scenarios in which Hive and Spark generate different results ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? no need to test Closes #28605 from GuoPhilipse/spark-docs. Lead-authored-by: GuoPhilipse <guofei_ok@126.com> Co-authored-by: GuoPhilipse <46367746+GuoPhilipse@users.noreply.github.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-05-22 22:01:38 +09:00
Izek Greenfield	eaf7a2a4ed	[SPARK-8981][CORE][TEST-HADOOP3.2][TEST-JAVA11] Add MDC support in Executor ### What changes were proposed in this pull request? Added MDC support in all thread pools. ThreadUtils creates new pools that pass the MDC over. ### Why are the changes needed? In many cases, it is very hard to understand which actions the logs in the executor come from when you are doing multi-threaded work in the driver and sending actions in parallel. ### Does this PR introduce any user-facing change? No ### How was this patch tested? No test was added because no new functionality was added; it is a thread pool change and all current tests pass. Closes #26624 from igreenfield/master. Authored-by: Izek Greenfield <igreenfield@axiomsl.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-05-20 07:41:00 +00:00
Max Gekk	b3686a7622	[SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters ### What changes were proposed in this pull request? 1. Describe standard 'M' and stand-alone 'L' text forms 2. Add examples for all supported number of month letters <img width="1047" alt="Screenshot 2020-05-18 at 08 57 31" src="https://user-images.githubusercontent.com/1580697/82178856-b16f1000-98e5-11ea-87c0-456ef94dcd43.png"> ### Why are the changes needed? To improve docs and show how to use month patterns. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By building docs and checking by eyes. Closes #28558 from MaxGekk/describe-L-M-date-pattern. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-05-18 12:07:01 +00:00
Jungtaek Lim (HeartSaVioR)	d2bec5e265	[SPARK-31707][SQL] Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax ### What changes were proposed in this pull request? This patch effectively reverts SPARK-30098 via the changes below: * Removed the config * Removed the changes done in parser rule * Removed the usage of config in tests * Removed tests which depend on the config * Rolled back some tests affected by SPARK-30098 to their state before it * Reflected the change in docs (migration doc, create table syntax) ### Why are the changes needed? SPARK-30098 brought confusion and frustration when using the create table DDL query, and we agreed on the bad effect of the change. Please go through the [discussion thread](http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Resolve-ambiguous-parser-rule-between-two-quot-create-table-quot-s-td29051i20.html) to see the details. ### Does this PR introduce _any_ user-facing change? No, compared to Spark 2.4.x. End users who experimented with Spark 3.0.0 previews will see the behavior going back to Spark 2.4.x, but I believe we won't guarantee compatibility in preview releases. ### How was this patch tested? Existing UTs. Closes #28517 from HeartSaVioR/revert-SPARK-30098. Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-05-17 02:27:23 +00:00
Huaxin Gao	194ac3be8b	[SPARK-31708][ML][DOCS] Add docs and examples for ANOVASelector and FValueSelector ### What changes were proposed in this pull request? Add docs and examples for ANOVASelector and FValueSelector ### Why are the changes needed? Complete the implementation of ANOVASelector and FValueSelector ### Does this PR introduce _any_ user-facing change? Yes <img width="850" alt="Screen Shot 2020-05-13 at 5 17 44 PM" src="https://user-images.githubusercontent.com/13592258/81878703-b4f94480-953d-11ea-9166-da3c64852b90.png"> <img width="850" alt="Screen Shot 2020-05-13 at 5 05 15 PM" src="https://user-images.githubusercontent.com/13592258/81878600-6055c980-953d-11ea-8b24-09c31647139b.png"> <img width="850" alt="Screen Shot 2020-05-13 at 5 06 06 PM" src="https://user-images.githubusercontent.com/13592258/81878603-621f8d00-953d-11ea-9447-39913ccc067d.png"> <img width="850" alt="Screen Shot 2020-05-13 at 5 06 21 PM" src="https://user-images.githubusercontent.com/13592258/81878606-65b31400-953d-11ea-9d76-51859266d1a8.png"> <img width="850" alt="Screen Shot 2020-05-13 at 5 07 10 PM" src="https://user-images.githubusercontent.com/13592258/81878611-69df3180-953d-11ea-8618-23a2a6cfd730.png"> <img width="850" alt="Screen Shot 2020-05-13 at 5 07 33 PM" src="https://user-images.githubusercontent.com/13592258/81878620-6cda2200-953d-11ea-9c46-da763328364e.png"> <img width="850" alt="Screen Shot 2020-05-13 at 5 07 47 PM" src="https://user-images.githubusercontent.com/13592258/81878625-6f3c7c00-953d-11ea-9d11-2281b33a0bd8.png"> <img width="851" alt="Screen Shot 2020-05-13 at 5 19 35 PM" src="https://user-images.githubusercontent.com/13592258/81878882-13bebe00-953e-11ea-9776-288bac97d93f.png"> <img width="850" alt="Screen Shot 2020-05-13 at 5 08 42 PM" src="https://user-images.githubusercontent.com/13592258/81878637-76638a00-953d-11ea-94b0-dc9bc85ae2b7.png"> <img width="850" alt="Screen Shot 2020-05-13 at 5 09 01 PM" 
src="https://user-images.githubusercontent.com/13592258/81878640-79f71100-953d-11ea-9a66-b27f9482fbd3.png"> <img width="850" alt="Screen Shot 2020-05-13 at 5 09 50 PM" src="https://user-images.githubusercontent.com/13592258/81878644-7cf20180-953d-11ea-9142-9658c8e90986.png"> <img width="851" alt="Screen Shot 2020-05-13 at 5 10 06 PM" src="https://user-images.githubusercontent.com/13592258/81878653-81b6b580-953d-11ea-9dc2-8015095cf569.png"> <img width="850" alt="Screen Shot 2020-05-13 at 5 10 59 PM" src="https://user-images.githubusercontent.com/13592258/81878658-854a3c80-953d-11ea-8dc9-217aa749fd00.png"> <img width="850" alt="Screen Shot 2020-05-13 at 5 11 27 PM" src="https://user-images.githubusercontent.com/13592258/81878659-87ac9680-953d-11ea-8c6b-74ab76748e4a.png"> <img width="850" alt="Screen Shot 2020-05-13 at 5 14 54 PM" src="https://user-images.githubusercontent.com/13592258/81878664-8b401d80-953d-11ea-9ee1-05f6677e263c.png"> <img width="850" alt="Screen Shot 2020-05-13 at 5 15 17 PM" src="https://user-images.githubusercontent.com/13592258/81878669-8da27780-953d-11ea-8216-77eb8bb7e091.png"> ### How was this patch tested? Manually build and check Closes #28524 from huaxingao/examples. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-05-15 09:59:14 -05:00
Dongjoon Hyun	7ce3f76af6	[SPARK-31696][DOCS][FOLLOWUP] Update version in documentation ### What changes were proposed in this pull request? This PR is a follow-up to fix the version in the configuration document. ### Why are the changes needed? The original PR is backported to branch-3.0. ### Does this PR introduce _any_ user-facing change? Yes. ### How was this patch tested? Manual. Closes #28530 from dongjoon-hyun/SPARK-31696-2. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-14 10:25:22 -07:00
Dongjoon Hyun	c8f3bd861d	[SPARK-31696][K8S] Support driver service annotation in K8S ### What changes were proposed in this pull request? This PR aims to add `spark.kubernetes.driver.service.annotation` like `spark.kubernetes.driver.annotation`. ### Why are the changes needed? Annotations are used in many ways. One example is that the Prometheus monitoring system searches for metric endpoints via annotations. - https://github.com/helm/charts/tree/master/stable/prometheus#scraping-pod-metrics-via-annotations ### Does this PR introduce _any_ user-facing change? Yes. The documentation is added. ### How was this patch tested? Pass Jenkins with the updated unit tests. Closes #28518 from dongjoon-hyun/SPARK-31696. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-13 13:59:42 -07:00
HyukjinKwon	e1315cd656	[SPARK-31701][R][SQL] Bump up the minimum Arrow version as 0.15.1 in SparkR ### What changes were proposed in this pull request? This PR proposes to set the minimum Arrow version to 0.15.1 to be consistent with the PySpark side. ### Why are the changes needed? It will reduce the maintenance overhead of matching the Arrow versions, and minimize the supported range. SparkR Arrow optimization is still experimental. ### Does this PR introduce _any_ user-facing change? No, it's a change in unreleased branches only. ### How was this patch tested? 0.15.x was already tested at SPARK-29378, and we're testing the latest version of SparkR currently in AppVeyor. I already manually tested too. Closes #28520 from HyukjinKwon/SPARK-31701. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-13 10:03:12 -07:00
Antonin Delpeuch	59d90997a5	[MINOR][DOCS] Mention lack of RDD order preservation after deserialization ### What changes were proposed in this pull request? This changes the docs to make it clearer that order preservation is not guaranteed when saving a RDD to disk and reading it back ([SPARK-5300](https://issues.apache.org/jira/browse/SPARK-5300)). I added two sentences about this in the RDD Programming Guide. The issue was discussed on the dev mailing list: http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-order-guarantees-td10142.html ### Why are the changes needed? Because RDDs are order-aware collections, it is natural to expect that if I use `saveAsTextFile` and then load the resulting file with `sparkContext.textFile`, I obtain a RDD in the same order. This is unfortunately not the case at the moment and there is no agreed upon way to fix this in Spark itself (see PR #4204 which attempted to fix this). Users should be aware of this. ### Does this PR introduce _any_ user-facing change? Yes, two new sentences in the documentation. ### How was this patch tested? By checking that the documentation looks good. Closes #28465 from wetneb/SPARK-5300-docs. Authored-by: Antonin Delpeuch <antonin@delpeuch.eu> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-05-12 08:27:43 -05:00
Dongjoon Hyun	b80309bdb4	[SPARK-31674][CORE][DOCS] Make Prometheus metric endpoints experimental ### What changes were proposed in this pull request? This PR aims to mark the new Prometheus-format metric endpoints as experimental in Apache Spark 3.0.0. ### Why are the changes needed? Although the new metrics are disabled by default, we had better make them experimental explicitly in Apache Spark 3.0.0 since the output format is still not fixed. We can finalize it in Apache Spark 3.1.0. ### Does this PR introduce _any_ user-facing change? Only the doc change is visible to users. ### How was this patch tested? Manually checked the code since this is a documentation and class annotation change. Closes #28495 from dongjoon-hyun/SPARK-31674. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-10 22:32:26 -07:00
Huaxin Gao	a75dc80a76	[SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference ### What changes were proposed in this pull request? Remove the unneeded embedded inline HTML markup by using the basic markdown syntax. Please see #28414 ### Why are the changes needed? Make the doc cleaner and easily editable by MD editors. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manually build and check Closes #28451 from huaxingao/html_cleanup. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-05-10 12:57:25 -05:00
Huaxin Gao	08335b651a	[SPARK-31659][ML][DOCS] Add VarianceThresholdSelector examples and doc ### What changes were proposed in this pull request? Add VarianceThresholdSelector examples and doc ### Why are the changes needed? VarianceThresholdSelector is a new feature selector in 3.1.0. We need to add examples and doc ### Does this PR introduce _any_ user-facing change? Yes. add Scala, Python and Java examples for VarianceThresholdSelector. Also add doc <img width="860" alt="Screen Shot 2020-05-07 at 9 20 01 AM" src="https://user-images.githubusercontent.com/13592258/81321791-e3f84d80-9047-11ea-837b-e39c193bd437.png"> <img width="860" alt="Screen Shot 2020-05-07 at 9 20 44 AM" src="https://user-images.githubusercontent.com/13592258/81321806-e8246b00-9047-11ea-8f35-206e330a92ab.png"> <img width="860" alt="Screen Shot 2020-05-07 at 9 21 27 AM" src="https://user-images.githubusercontent.com/13592258/81321822-ea86c500-9047-11ea-8743-99adec7f502b.png"> <img width="860" alt="Screen Shot 2020-05-07 at 9 21 43 AM" src="https://user-images.githubusercontent.com/13592258/81321826-ec508880-9047-11ea-9e7a-22ee5e13f495.png"> ### How was this patch tested? Manually checked Closes #28478 from huaxingao/variance_doc. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: zhengruifeng <ruifengz@foxmail.com>	2020-05-08 10:57:35 +08:00
wang-zhun	f3891e377f	[SPARK-31235][YARN] Separates different categories of applications ### What changes were proposed in this pull request? This PR adds `spark.yarn.applicationType` to identify the application type ### Why are the changes needed? The current application defaults to the SPARK type. In fact, different types of applications have different characteristics and are suitable for different scenarios, for example SPARK-SQL and SPARK-STREAMING. I recommend distinguishing them by the parameter `spark.yarn.applicationType` so that we can more easily manage and maintain different types of applications. ### How was this patch tested? 1. Added UT 2. Tested by verifying the Yarn-UI `ApplicationType` in the following cases: - client and cluster mode Additional explanation: the value cannot exceed 20 characters, and it can be empty or whitespace. The reasons are as follows: ``` // org.apache.hadoop.yarn.server.resourcemanager.submitApplication. if (submissionContext.getApplicationType() == null) { submissionContext .setApplicationType(YarnConfiguration.DEFAULT_APPLICATION_TYPE); } else { // APPLICATION_TYPE_LENGTH = 20 if (submissionContext.getApplicationType().length() > YarnConfiguration.APPLICATION_TYPE_LENGTH) { submissionContext.setApplicationType(submissionContext .getApplicationType().substring(0, YarnConfiguration.APPLICATION_TYPE_LENGTH)); } } ``` Closes #28009 from wang-zhun/SPARK-31235. Authored-by: wang-zhun <wangzhun6103@gmail.com> Signed-off-by: Thomas Graves <tgraves@apache.org>	2020-05-05 08:40:57 -05:00
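The YARN-side handling quoted in SPARK-31235 amounts to a default-then-truncate rule. The following is a minimal Python sketch of that rule, not Hadoop's code: the function name `effective_application_type` is hypothetical, the length limit of 20 comes from the comment in the quoted snippet, and the default value `"YARN"` is an assumption about `YarnConfiguration.DEFAULT_APPLICATION_TYPE`.

```python
DEFAULT_APPLICATION_TYPE = "YARN"   # assumed value of YarnConfiguration.DEFAULT_APPLICATION_TYPE
APPLICATION_TYPE_LENGTH = 20        # limit stated in the quoted snippet


def effective_application_type(app_type):
    """Return the application type as the resource manager would record it."""
    if app_type is None:
        return DEFAULT_APPLICATION_TYPE
    if len(app_type) > APPLICATION_TYPE_LENGTH:
        # Longer values are silently cut to the first 20 characters.
        return app_type[:APPLICATION_TYPE_LENGTH]
    # Empty or whitespace-only values pass through unchanged.
    return app_type


if __name__ == "__main__":
    for value in (None, "", "SPARK-SQL", "SPARK-STRUCTURED-STREAMING"):
        print(repr(value), "->", repr(effective_application_type(value)))
```

Because truncation is silent, a value passed through `spark.yarn.applicationType` that is longer than 20 characters would appear shortened in the YARN UI rather than being rejected.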
Dilip Biswal	5052d9557d	[SPARK-31030][DOCS][FOLLOWUP] Replace HTML Table by Markdown Table ### What changes were proposed in this pull request? This PR is to clean up the markdown file in remaining pages in sql reference. The first one was done by gatorsmile in [28415](https://github.com/apache/spark/pull/28415) - Replace HTML table by MD table - sql-ref-ansi-compliance.md <img width="967" alt="Screen Shot 2020-05-01 at 4 36 35 PM" src="https://user-images.githubusercontent.com/14225158/80848981-1cbca080-8bca-11ea-8a5d-63174b31c800.png"> - sql-ref-datatypes.md (Scala) <img width="967" alt="Screen Shot 2020-05-01 at 4 37 30 PM" src="https://user-images.githubusercontent.com/14225158/80849057-6a390d80-8bca-11ea-8866-ab08bab31432.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 39 18 PM" src="https://user-images.githubusercontent.com/14225158/80849061-6c9b6780-8bca-11ea-834c-eb93d3ab47ae.png"> - sql-ref-datatypes.md (Java) <img width="967" alt="Screen Shot 2020-05-01 at 4 41 24 PM" src="https://user-images.githubusercontent.com/14225158/80849138-b3895d00-8bca-11ea-9d3b-555acad2086c.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 41 39 PM" src="https://user-images.githubusercontent.com/14225158/80849140-b6844d80-8bca-11ea-9ca9-1812b6a76c02.png"> - sql-ref-datatypes.md (Python) <img width="967" alt="Screen Shot 2020-05-01 at 4 43 36 PM" src="https://user-images.githubusercontent.com/14225158/80849202-0400ba80-8bcb-11ea-96a5-7caecbf9dbbf.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 43 54 PM" src="https://user-images.githubusercontent.com/14225158/80849205-06fbab00-8bcb-11ea-8f00-6df52b151684.png"> - sql-ref-datatypes.md (R) <img width="967" alt="Screen Shot 2020-05-01 at 4 45 16 PM" src="https://user-images.githubusercontent.com/14225158/80849288-5fcb4380-8bcb-11ea-8277-8589b5bb31bc.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 45 36 PM" 
src="https://user-images.githubusercontent.com/14225158/80849294-62c63400-8bcb-11ea-9438-b4f1193bc757.png"> - sql-ref-datatypes.md (SQL) <img width="967" alt="Screen Shot 2020-05-01 at 4 48 02 PM" src="https://user-images.githubusercontent.com/14225158/80849336-986b1d00-8bcb-11ea-9736-5fb40496b681.png"> - sql-ref-syntax-qry-select-tvf.md <img width="967" alt="Screen Shot 2020-05-01 at 4 49 32 PM" src="https://user-images.githubusercontent.com/14225158/80849399-d10af680-8bcb-11ea-8dc2-e3e750e21a59.png"> ### Why are the changes needed? Make the doc cleaner and easily editable by MD editors ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually using jekyll serve Closes #28433 from dilipbiswal/sql-doc-table-cleanup. Authored-by: Dilip Biswal <dkbiswal@gmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-05-05 15:21:14 +09:00
Max Gekk	372ccba063	[SPARK-31639] Revert SPARK-27528 Use Parquet logical type TIMESTAMP_MICROS by default ### What changes were proposed in this pull request? This reverts commit `43a73e387c`. It sets `INT96` as the timestamp type while saving timestamps to parquet files. ### Why are the changes needed? To be compatible with Hive and Presto that don't support the `TIMESTAMP_MICROS` type in current stable releases. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By existing test suites. Closes #28450 from MaxGekk/parquet-int96. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-04 17:27:02 -07:00
Kazuaki Ishizaki	35fcc8d5c5	[MINOR][DOCS] Fix typo in documents ### What changes were proposed in this pull request? Fixed typo in `docs` directory and in `project/MimaExcludes.scala` ### Why are the changes needed? Better readability of documents ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? No test needed Closes #28447 from kiszk/typo_20200504. Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-05-04 16:53:50 +09:00
Huaxin Gao	75da05038b	[MINOR][SQL][DOCS] Remove two leading spaces from sql tables ### What changes were proposed in this pull request? Remove two leading spaces from sql tables. ### Why are the changes needed? Follow the format of other references such as https://docs.snowflake.com/en/sql-reference/constructs/join.html, https://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_10002.htm, https://www.postgresql.org/docs/10/sql-select.html. ### Does this PR introduce any user-facing change? before ``` SELECT * FROM test; +-+ ... +-+ ``` after ``` SELECT * FROM test; +-+ ... +-+ ``` ### How was this patch tested? Manually build and check Closes #28348 from huaxingao/sql-format. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: gatorsmile <gatorsmile@gmail.com>	2020-05-01 10:11:43 -07:00
Xingbo Jiang	b7cde42b04	[SPARK-31619][CORE] Rename config "spark.dynamicAllocation.shuffleTimeout" to "spark.dynamicAllocation.shuffleTracking.timeout" ### What changes were proposed in this pull request? The "spark.dynamicAllocation.shuffleTimeout" configuration only takes effect if "spark.dynamicAllocation.shuffleTracking.enabled" is true, so we should re-namespace that configuration so that it's nested under the "shuffleTracking" one. ### How was this patch tested? Covered by current existing test cases. Closes #28426 from jiangxb1987/confName. Authored-by: Xingbo Jiang <xingbo.jiang@databricks.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-05-01 11:46:17 +09:00
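The rename described in the entry above can be illustrated with a small migration helper. This is only a sketch: `RENAMED_KEYS` and `migrate_conf` are hypothetical names that do not exist in Spark, and the settings are assumed to live in a plain dict of strings.

```python
# Hypothetical helper, not part of Spark: rewrites the old key from
# SPARK-31619 to its new name in a plain dict of settings.
RENAMED_KEYS = {
    "spark.dynamicAllocation.shuffleTimeout":
        "spark.dynamicAllocation.shuffleTracking.timeout",
}


def migrate_conf(conf):
    """Return a copy of conf with renamed keys replaced by their new names.

    If both the old and the new key are present, the new key wins.
    """
    migrated = {}
    for key, value in conf.items():
        if key in RENAMED_KEYS:
            new_key = RENAMED_KEYS[key]
            if new_key in conf:
                continue  # an explicit new-style setting takes precedence
            migrated[new_key] = value
        else:
            migrated[key] = value
    return migrated


old = {
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
    "spark.dynamicAllocation.shuffleTimeout": "30min",  # illustrative value
}
print(migrate_conf(old))
```

Letting the new key win when both are set is a design choice of this sketch, not something the commit specifies.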
Huaxin Gao	2410a45703	[SPARK-31612][SQL][DOCS] SQL Reference clean up ### What changes were proposed in this pull request? SQL Reference cleanup ### Why are the changes needed? To complete SQL Reference ### Does this PR introduce _any_ user-facing change? updated sql-ref-syntax-qry.html before <img width="1100" alt="Screen Shot 2020-04-29 at 11 08 25 PM" src="https://user-images.githubusercontent.com/13592258/80677799-70b27280-8a6e-11ea-8e3f-a768f29d0377.png"> after <img width="1100" alt="Screen Shot 2020-04-29 at 11 05 55 PM" src="https://user-images.githubusercontent.com/13592258/80677803-74de9000-8a6e-11ea-880c-aa05c53254a6.png"> ### How was this patch tested? Manually build and check Closes #28417 from huaxingao/cleanup. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-05-01 06:30:35 +09:00
Xiao Li	b5ecc41c73	[SPARK-28806][DOCS][FOLLOW-UP] Remove unneeded HTML from the MD file ### What changes were proposed in this pull request? This PR is to clean up the markdown file in SHOW COLUMNS page. - remove the unneeded embedded inline HTML markup by using the basic markdown syntax. - use the ``` sql for highlighting the SQL syntax. ### Why are the changes needed? Make the doc cleaner and easily editable by MD editors. ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? Before ![Screen Shot 2020-04-29 at 5 20 11 PM](https://user-images.githubusercontent.com/11567269/80661963-fa4d4a80-8a44-11ea-9dea-c43cda6de010.png) After ![Screen Shot 2020-04-29 at 6 03 50 PM](https://user-images.githubusercontent.com/11567269/80661940-f15c7900-8a44-11ea-9943-a83e8d8618fb.png) Closes #28414 from gatorsmile/cleanupShowColumns. Lead-authored-by: Xiao Li <gatorsmile@gmail.com> Co-authored-by: gatorsmile <gatorsmile@gmail.com> Signed-off-by: gatorsmile <gatorsmile@gmail.com>	2020-04-30 09:34:56 -07:00
Yuanjian Li	7195a18bf2	[SPARK-27340][SS][TESTS][FOLLOW-UP] Rephrase API comments and simplify tests ### What changes were proposed in this pull request? - Rephrase the API doc for `Column.as` - Simplify the UTs ### Why are the changes needed? Address comments in https://github.com/apache/spark/pull/28326 ### Does this PR introduce any user-facing change? No ### How was this patch tested? New UT added. Closes #28390 from xuanyuanking/SPARK-27340-follow. Authored-by: Yuanjian Li <xyliyuanjian@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-04-30 06:24:00 +00:00
gatorsmile	f56c6630fb	[SPARK-31030][DOCS][FOLLOWUP] Replace HTML Table by Markdown Table ### What changes were proposed in this pull request? This PR is to clean up the markdown file in datetime-pattern page. - Replace HTML table by MD table ### Why are the changes needed? Make the doc cleaner and easily editable by MD editors. ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? Before ![Screen Shot 2020-04-29 at 7 59 10 PM](https://user-images.githubusercontent.com/11567269/80668093-c9294600-8a55-11ea-9dca-d558203298f8.png) After ![Screen Shot 2020-04-29 at 8 13 38 PM](https://user-images.githubusercontent.com/11567269/80668146-f1b14000-8a55-11ea-8d47-8dc8a0378271.png) Closes #28415 from gatorsmile/cleanupUDFPage. Authored-by: gatorsmile <gatorsmile@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-04-30 05:47:42 +00:00
DB Tsai	ecfee82fda	[SPARK-31582][YARN] Being able to not populate Hadoop classpath ### What changes were proposed in this pull request? We are adding a new Spark YARN configuration, `spark.yarn.populateHadoopClasspath`, that allows Spark to skip populating the Hadoop classpath from `yarn.application.classpath` and `mapreduce.application.classpath`. ### Why are the changes needed? The Spark YARN client populates extra Hadoop classpath entries from `yarn.application.classpath` and `mapreduce.application.classpath` when a job is submitted to a YARN Hadoop cluster. However, for a `with-hadoop` Spark build that embeds the Hadoop runtime, this can cause jar conflicts because the Spark distribution can contain a different version of the Hadoop jars. One case we have seen: a user takes an Apache Spark distribution with its own embedded Hadoop and submits a job to a Cloudera or Hortonworks YARN cluster, and the job runs into errors because two incompatible sets of Hadoop jars are on the classpath. Not populating the Hadoop classpath from the cluster addresses this issue. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? A UT is added, but it is very hard to add a new integration test since this requires using different incompatible versions of Hadoop. We also manually tested this PR and were able to submit a Spark job, using a Spark distribution built with Apache Hadoop 2.10, to CDH 5.6 without populating the CDH classpath. Closes #28376 from dbtsai/yarn-classpath. Authored-by: DB Tsai <d_tsai@apple.com> Signed-off-by: DB Tsai <d_tsai@apple.com>	2020-04-29 21:10:40 +00:00
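As a rough illustration of the option introduced in the entry above, the following sketch assembles a `spark-submit` command line that passes `spark.yarn.populateHadoopClasspath` through `--conf`. The helper `build_submit_command`, the jar name, and the main class are hypothetical; the sketch also assumes that a value of `false` is what skips the cluster-provided classpath.

```python
import shlex


def build_submit_command(app_jar, main_class, populate_hadoop_classpath=False):
    """Build a spark-submit argument list for a YARN cluster.

    Hypothetical helper for illustration only. Assumes that setting
    spark.yarn.populateHadoopClasspath to "false" skips the classpath
    entries provided by the cluster.
    """
    conf_value = "true" if populate_hadoop_classpath else "false"
    return [
        "spark-submit",
        "--master", "yarn",
        "--class", main_class,
        "--conf", f"spark.yarn.populateHadoopClasspath={conf_value}",
        app_jar,
    ]


# app.jar and com.example.Main are placeholder names.
cmd = build_submit_command("app.jar", "com.example.Main")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Returning a list rather than a single string keeps each argument separate, so the command can be handed to `subprocess.run` without shell quoting issues.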
Terry Kim	36803031e8	[SPARK-30282][SQL][FOLLOWUP] SHOW TBLPROPERTIES should support views ### What changes were proposed in this pull request? This PR addresses two things: - `SHOW TBLPROPERTIES` should support views (a regression introduced by #26921) - `SHOW TBLPROPERTIES` on a temporary view should return an empty result (the 2.4 behavior) instead of throwing `AnalysisException`. ### Why are the changes needed? It's a bug. ### Does this PR introduce any user-facing change? Yes, now `SHOW TBLPROPERTIES` works on views: ``` scala> sql("CREATE VIEW view TBLPROPERTIES('p1'='v1', 'p2'='v2') AS SELECT 1 AS c1") scala> sql("SHOW TBLPROPERTIES view").show(truncate=false) +---------------------------------+-------------+ |key |value | +---------------------------------+-------------+ |view.catalogAndNamespace.numParts|2 | |view.query.out.col.0 |c1 | |view.query.out.numCols |1 | |p2 |v2 | |view.catalogAndNamespace.part.0 |spark_catalog| |p1 |v1 | |view.catalogAndNamespace.part.1 |default | +---------------------------------+-------------+ ``` And for a temporary view: ``` scala> sql("CREATE TEMPORARY VIEW tview TBLPROPERTIES('p1'='v1', 'p2'='v2') AS SELECT 1 AS c1") scala> sql("SHOW TBLPROPERTIES tview").show(truncate=false) +---+-----+ |key|value| +---+-----+ +---+-----+ ``` ### How was this patch tested? Added tests. Closes #28375 from imback82/show_tblproperties_followup. Authored-by: Terry Kim <yuminkim@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-04-29 07:06:45 +00:00
Kent Yao	295d866969	[SPARK-31596][SQL][DOCS] Generate SQL Configurations from hive module to configuration doc ### What changes were proposed in this pull request? This PR adds the `-Phive` profile to the pre-build phase to build the hive module into the dev classpath. It then reflects on the HiveUtils object to dump all configurations in the class. ### Why are the changes needed? To supply SQL configurations from the hive module to the doc. ### Does this PR introduce any user-facing change? NO ### How was this patch tested? Passing Jenkins and verified locally ![image](https://user-images.githubusercontent.com/8326978/80492333-6fae1200-8996-11ea-99fd-595ee18c67e5.png) Closes #28394 from yaooqinn/SPARK-31596. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-04-29 15:34:45 +09:00
Huaxin Gao	d34cb59fb3	[SPARK-31556][SQL][DOCS] Document LIKE clause in SQL Reference ### What changes were proposed in this pull request? Document LIKE clause in SQL Reference ### Why are the changes needed? To make SQL Reference complete ### Does this PR introduce any user-facing change? Yes <img width="1050" alt="Screen Shot 2020-04-25 at 5 49 57 PM" src="https://user-images.githubusercontent.com/13592258/80294346-5babab80-871d-11ea-8ac9-51bbab0aca88.png"> <img width="1050" alt="Screen Shot 2020-04-25 at 5 50 24 PM" src="https://user-images.githubusercontent.com/13592258/80294347-5ea69c00-871d-11ea-8c51-7a90ee20f7da.png"> <img width="1050" alt="Screen Shot 2020-04-25 at 5 50 42 PM" src="https://user-images.githubusercontent.com/13592258/80294351-61a18c80-871d-11ea-9e75-e3345d2f52f5.png"> ### How was this patch tested? Manually build and check Closes #28332 from huaxingao/where_clause. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-04-29 09:17:23 +09:00
Huaxin Gao	dcc09022f1	[SPARK-29458][SQL][DOCS] Add a paragraph for scalar function in sql getting started ### What changes were proposed in this pull request? Add a paragraph for scalar function in sql getting started ### Why are the changes needed? To make 3.0 doc complete. ### Does this PR introduce any user-facing change? before: <img width="870" alt="Screen Shot 2020-04-21 at 10 11 12 PM" src="https://user-images.githubusercontent.com/13592258/79943182-16d1fd00-841d-11ea-9744-9cdd58d83f81.png"> after: <img width="865" alt="Screen Shot 2020-04-22 at 11 49 59 PM" src="https://user-images.githubusercontent.com/13592258/80068256-26704500-84f4-11ea-9845-c835927c027e.png"> <img width="1033" alt="Screen Shot 2020-04-23 at 6 22 53 PM" src="https://user-images.githubusercontent.com/13592258/80165100-82d47280-858f-11ea-8c84-1ef702cc1bff.png"> ### How was this patch tested? Closes #28290 from huaxingao/scalar. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-04-28 11:17:45 -05:00
Huaxin Gao	7735db2a27	[SPARK-31569][SQL][DOCS] Add links to subsections in SQL Reference main page ### What changes were proposed in this pull request? Add links to subsections in SQL Reference main page ### Why are the changes needed? Make SQL Reference complete ### Does this PR introduce any user-facing change? Yes before: <img width="1050" alt="Screen Shot 2020-04-26 at 10 52 42 PM" src="https://user-images.githubusercontent.com/13592258/80338238-a9551080-8810-11ea-8ae8-d6707fde2cac.png"> after: <img width="1050" alt="Screen Shot 2020-04-26 at 10 51 58 PM" src="https://user-images.githubusercontent.com/13592258/80338241-ac500100-8810-11ea-8518-95c4f8c0a2eb.png"> ### How was this patch tested? Manually build and check. Closes #28360 from huaxingao/sql-ref. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-04-27 09:45:00 -05:00
Kent Yao	5ba467ca1d	[SPARK-31550][SQL][DOCS] Set nondeterministic configurations with general meanings in sql configuration doc ### What changes were proposed in this pull request? ```scala spark.sql.session.timeZone spark.sql.warehouse.dir ``` These 2 configs are nondeterministic and vary with environments. Besides, this PR updates the code in `gen-sql-config-docs.py` via https://github.com/apache/spark/pull/28274#discussion_r412893096 and `configuration.md` via https://github.com/apache/spark/pull/28274#discussion_r412894905 ### Why are the changes needed? Doc fix. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Verified locally ![image](https://user-images.githubusercontent.com/8326978/80179099-5e7da200-8632-11ea-803f-d47a93151869.png) Closes #28322 from yaooqinn/SPARK-31550. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-04-27 17:08:52 +09:00
HyukjinKwon	5dd581c88a	[SPARK-29664][PYTHON][SQL][FOLLOW-UP] Add deprecation warnings for getItem instead ### What changes were proposed in this pull request? This PR proposes to use a different approach instead of breaking it per Micheal's rubric added at https://spark.apache.org/versioning-policy.html. It deprecates the behaviour for now. It will be gradually removed in the future releases. After this change, ```python import warnings warnings.simplefilter("always") from pyspark.sql.functions import * df = spark.range(2) map_col = create_map(lit(0), lit(100), lit(1), lit(200)) df.withColumn("mapped", map_col.getItem(col('id'))).show() ``` ``` /.../python/pyspark/sql/column.py:311: DeprecationWarning: A column as 'key' in getItem is deprecated as of Spark 3.0, and will not be supported in the future release. Use `column[key]` or `column.key` syntax instead. DeprecationWarning) ... ``` ```python import warnings warnings.simplefilter("always") from pyspark.sql.functions import * df = spark.range(2) struct_col = struct(lit(0), lit(100), lit(1), lit(200)) df.withColumn("struct", struct_col.getField(lit("col1"))).show() ``` ``` /.../spark/python/pyspark/sql/column.py:336: DeprecationWarning: A column as 'name' in getField is deprecated as of Spark 3.0, and will not be supported in the future release. Use `column[name]` or `column.name` syntax instead. DeprecationWarning) ``` ### Why are the changes needed? To prevent the radical behaviour change after the amended versioning policy. ### Does this PR introduce any user-facing change? Yes, it will show the deprecated warning message. ### How was this patch tested? Manually tested. Closes #28327 from HyukjinKwon/SPARK-29664. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-04-27 14:49:22 +09:00
Wei Zhang	3e83ccc5d8	[SPARK-31516][DOC] Fix non-existed metric hiveClientCalls.count of CodeGenerator in DOC ### What changes were proposed in this pull request? This PR proposes to remove the documentation of the nonexistent `hiveClientCalls.count` metric of `CodeGenerator` from the Spark metrics system section of the monitoring guide. The `hiveClientCalls.count` metric is listed in both the `namespace=HiveExternalCatalog` and `namespace=CodeGenerator` bullet lists, but only one is defined, inside object `HiveCatalogMetrics`. Closes #28292 from wezhang/monitoringdoc. Authored-by: Wei Zhang <wezhang@outlook.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-04-24 21:52:50 -07:00
Huaxin Gao	054bef94ca	[SPARK-31491][SQL][DOCS] Re-arrange Data Types page to document Floating Point Special Values ### What changes were proposed in this pull request? Re-arrange Data Types page to document Floating Point Special Values ### Why are the changes needed? To complete SQL Reference ### Does this PR introduce any user-facing change? Yes - add Floating Point Special Values in Data Types page - move NaN Semantics to Data Types page <img width="1050" alt="Screen Shot 2020-04-24 at 9 14 57 AM" src="https://user-images.githubusercontent.com/13592258/80233996-3da25600-860c-11ea-8285-538efc16e431.png"> <img width="1050" alt="Screen Shot 2020-04-24 at 9 15 22 AM" src="https://user-images.githubusercontent.com/13592258/80234001-4004b000-860c-11ea-8954-72f63c92d50d.png"> <img width="1049" alt="Screen Shot 2020-04-24 at 9 15 44 AM" src="https://user-images.githubusercontent.com/13592258/80234006-41ce7380-860c-11ea-96bf-15e1aa2102ff.png"> ### How was this patch tested? Manually build and check Closes #28264 from huaxingao/datatypes. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-04-25 09:02:16 +09:00
yi.wu	463c54419b	[SPARK-31010][SQL][DOC][FOLLOW-UP] Improve deprecated warning message for untyped scala udf ### What changes were proposed in this pull request? Give users a friendlier warning message and migration guide for the deprecated untyped Scala UDF. ### Why are the changes needed? Users cannot distinguish the function signatures of typed and untyped Scala UDFs. Instead, we should tell users directly what to do. ### Does this PR introduce any user-facing change? No, it's newly added in Spark 3.0. ### How was this patch tested? Pass Jenkins. Closes #28311 from Ngone51/update_udf_doc. Authored-by: yi.wu <yi.wu@databricks.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-04-24 19:10:18 +09:00
Huaxin Gao	b14b980ab8	[SPARK-31502][SQL][DOCS] Document identifier in SQL Reference ### What changes were proposed in this pull request? Document identifier in SQL Reference ### Why are the changes needed? make SQL Reference complete ### Does this PR introduce any user-facing change? Yes <img width="1049" alt="Screen Shot 2020-04-23 at 11 14 10 PM" src="https://user-images.githubusercontent.com/13592258/80180695-2f2a4f00-85b8-11ea-819b-f96872956d05.png"> <img width="1050" alt="Screen Shot 2020-04-23 at 11 32 32 PM" src="https://user-images.githubusercontent.com/13592258/80182062-e6c06080-85ba-11ea-9502-1c38358c97c9.png"> ### How was this patch tested? Manually build and check Closes #28277 from huaxingao/identifier. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-04-24 08:05:27 +00:00
yi.wu	6c018b31e2	[SPARK-16775][DOC][FOLLOW-UP] Add migration guide for removed accumulator v1 APIs ### What changes were proposed in this pull request? Add migration guide for removed accumulator v1 APIs. ### Why are the changes needed? Provide better guidance for users' migration. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass Jenkins. Closes #28309 from Ngone51/SPARK-16775-migration-guide. Authored-by: yi.wu <yi.wu@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-04-23 10:59:35 +00:00
Huaxin Gao	f543d6a1ee	[SPARK-31465][SQL][DOCS][FOLLOW-UP] Document Literal in SQL Reference ### What changes were proposed in this pull request? Need to address a few more comments ### Why are the changes needed? Fix a few problems ### Does this PR introduce any user-facing change? Yes ### How was this patch tested? Manually build and check Closes #28306 from huaxingao/literal-folllowup. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-04-23 15:03:20 +09:00
Huaxin Gao	03fe9ee428	[SPARK-31465][SQL][DOCS] Document Literal in SQL Reference ### What changes were proposed in this pull request? Document Literal in SQL Reference ### Why are the changes needed? Make SQL Reference complete ### Does this PR introduce any user-facing change? Yes <img width="1049" alt="Screen Shot 2020-04-22 at 8 50 04 PM" src="https://user-images.githubusercontent.com/13592258/80057912-9ecb0c00-84dc-11ea-881e-1415108d674f.png"> <img width="1050" alt="Screen Shot 2020-04-22 at 8 50 29 PM" src="https://user-images.githubusercontent.com/13592258/80057917-a12d6600-84dc-11ea-8884-81f2a94644d5.png"> <img width="1050" alt="Screen Shot 2020-04-22 at 8 50 54 PM" src="https://user-images.githubusercontent.com/13592258/80057922-a4c0ed00-84dc-11ea-9857-75db50f7b054.png"> <img width="1050" alt="Screen Shot 2020-04-22 at 8 51 15 PM" src="https://user-images.githubusercontent.com/13592258/80057927-a7234700-84dc-11ea-9124-45ae1f6143fd.png"> <img width="1050" alt="Screen Shot 2020-04-22 at 8 51 44 PM" src="https://user-images.githubusercontent.com/13592258/80057932-ab4f6480-84dc-11ea-8393-cf005af13ce9.png"> <img width="1050" alt="Screen Shot 2020-04-22 at 8 52 03 PM" src="https://user-images.githubusercontent.com/13592258/80057936-ad192800-84dc-11ea-8d78-9f071a82f1df.png"> <img width="1050" alt="Screen Shot 2020-04-22 at 8 52 28 PM" src="https://user-images.githubusercontent.com/13592258/80057940-b0141880-84dc-11ea-97a7-f787cad0ee03.png"> <img width="1050" alt="Screen Shot 2020-04-22 at 8 53 14 PM" src="https://user-images.githubusercontent.com/13592258/80057945-b30f0900-84dc-11ea-985f-c070609e2329.png"> <img width="1050" alt="Screen Shot 2020-04-22 at 8 53 34 PM" src="https://user-images.githubusercontent.com/13592258/80057949-b5716300-84dc-11ea-9452-3f51137fe03d.png"> <img width="1050" alt="Screen Shot 2020-04-22 at 8 53 56 PM" src="https://user-images.githubusercontent.com/13592258/80057957-b904ea00-84dc-11ea-8b12-a6f00362aa55.png"> <img width="1049" 
alt="Screen Shot 2020-04-22 at 8 54 12 PM" src="https://user-images.githubusercontent.com/13592258/80057962-bacead80-84dc-11ea-94da-916b1d1c1756.png"> ### How was this patch tested? Manually build and check Closes #28237 from huaxingao/literal. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-04-23 14:12:10 +09:00
Kent Yao	2c2062ea7c	[SPARK-31498][SQL][DOCS] Dump public static sql configurations through doc generation ### What changes were proposed in this pull request? Currently, only the non-static public SQL configurations are dumped to the public doc; we should also add the static public ones, as the command `set -v` does. This PR forces a call to StaticSQLConf to buildStaticConf. ### Why are the changes needed? Fix missing SQL configurations in the doc. ### Does this PR introduce any user-facing change? NO ### How was this patch tested? Added a unit test and verified locally that the public static SQL confs are in `docs/sql-config.html` Closes #28274 from yaooqinn/SPARK-31498. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-04-22 10:16:39 +00:00
Takeshi Yamamuro	e42dbe7cd4	[SPARK-31429][SQL][DOC] Automatically generates a SQL document for built-in functions ### What changes were proposed in this pull request? This PR intends to add a Python script that generates a SQL document for built-in functions, and to add that document to the SQL references. ### Why are the changes needed? To make SQL references complete. ### Does this PR introduce any user-facing change? Yes; ![a](https://user-images.githubusercontent.com/692303/79406712-c39e1b80-7fd2-11ea-8b85-9f9cbb6efed3.png) ![b](https://user-images.githubusercontent.com/692303/79320526-eb46a280-7f44-11ea-8639-90b1fb2b8848.png) ![c](https://user-images.githubusercontent.com/692303/79320707-3365c500-7f45-11ea-9984-69ffe800fb87.png) ### How was this patch tested? Manually checked and added tests. Closes #28224 from maropu/SPARK-31429. Lead-authored-by: Takeshi Yamamuro <yamamuro@apache.org> Co-authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-04-21 10:55:13 +09:00
Yuming Wang	b11e42663b	[SPARK-31381][SPARK-29245][SQL] Upgrade built-in Hive 2.3.6 to 2.3.7 ### What changes were proposed in this pull request? Hive 2.3.7 fixed these issues: - HIVE-21508: ClassCastException when initializing HiveMetaStoreClient on JDK10 or newer - HIVE-21980:Parsing time can be high in case of deeply nested subqueries - HIVE-22249: Support Parquet through HCatalog ### Why are the changes needed? Fix CCE during creating HiveMetaStoreClient in JDK11 environment: [SPARK-29245](https://issues.apache.org/jira/browse/SPARK-29245). ### Does this PR introduce any user-facing change? No. ### How was this patch tested? - [x] Test Jenkins with Hadoop 2.7 (https://github.com/apache/spark/pull/28148#issuecomment-616757840) - [x] Test Jenkins with Hadoop 3.2 on JDK11 (https://github.com/apache/spark/pull/28148#issuecomment-616294353) - [x] Manual test with remote hive metastore. Hive side: ``` export JAVA_HOME=/usr/lib/jdk1.8.0_221 export PATH=$JAVA_HOME/bin:$PATH cd /usr/lib/hive-2.3.6 # Start Hive metastore with Hive 2.3.6 bin/schematool -dbType derby -initSchema --verbose bin/hive --service metastore ``` Spark side: ``` export JAVA_HOME=/usr/lib/jdk-11.0.3 export PATH=$JAVA_HOME/bin:$PATH build/sbt clean package -Phive -Phadoop-3.2 -Phive-thriftserver export SPARK_PREPEND_CLASSES=true bin/spark-sql --conf spark.hadoop.hive.metastore.uris=thrift://localhost:9083 ``` Closes #28148 from wangyum/SPARK-31381. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-04-20 13:38:24 -07:00
gatorsmile	6c792a79c1	[SPARK-31234][SQL][FOLLOW-UP] ResetCommand should not affect static SQL Configuration ### What changes were proposed in this pull request? This PR is the follow-up PR of https://github.com/apache/spark/pull/28003 - add a migration guide - add an end-to-end test case. ### Why are the changes needed? The original PR made the major behavior change in the user-facing RESET command. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Added a new end-to-end test Closes #28265 from gatorsmile/spark-31234followup. Authored-by: gatorsmile <gatorsmile@gmail.com> Signed-off-by: gatorsmile <gatorsmile@gmail.com>	2020-04-20 13:08:55 -07:00
Huaxin Gao	142f43629c	[SPARK-31390][SQL][DOCS] Document Window Function in SQL Syntax Section ### What changes were proposed in this pull request? Document Window Function in SQL syntax ### Why are the changes needed? Make SQL Reference complete ### Does this PR introduce any user-facing change? Yes <img width="1050" alt="Screen Shot 2020-04-16 at 9 13 34 PM" src="https://user-images.githubusercontent.com/13592258/79531509-7bf5af00-8027-11ea-8291-a91b2e97a1b5.png"> <img width="1050" alt="Screen Shot 2020-04-16 at 9 14 12 PM" src="https://user-images.githubusercontent.com/13592258/79531514-7e580900-8027-11ea-8761-4c5a888c476f.png"> <img width="1050" alt="Screen Shot 2020-04-16 at 9 14 45 PM" src="https://user-images.githubusercontent.com/13592258/79531518-82842680-8027-11ea-876f-6375aa5b5ead.png"> <img width="1050" alt="Screen Shot 2020-04-16 at 9 15 10 PM" src="https://user-images.githubusercontent.com/13592258/79531521-844dea00-8027-11ea-8948-712f054d42ee.png"> <img width="1050" alt="Screen Shot 2020-04-16 at 9 15 25 PM" src="https://user-images.githubusercontent.com/13592258/79531528-8748da80-8027-11ea-9dae-a465286982ac.png"> ### How was this patch tested? Manually build and check Closes #28220 from huaxingao/sql-win-fun. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-04-18 09:31:52 +09:00
Dongjoon Hyun	fde996be87	[SPARK-31394][DOC][FOLLOWUP] Add nfs volume type description ### What changes were proposed in this pull request? This adds newly supported `nfs` volume type description into the document for Apache Spark 3.1.0. ### Why are the changes needed? To complete the document. ### Does this PR introduce any user-facing change? Yes. (Doc) ![nfs_screen_shot](https://user-images.githubusercontent.com/9700541/79530887-8f077f80-8025-11ea-8cc1-e0b551802d5d.png) ### How was this patch tested? Manually generate doc and check it. ``` SKIP_API=1 jekyll build ``` Closes #28236 from dongjoon-hyun/SPARK-NFS-DOC. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-04-17 12:07:34 -07:00
Huaxin Gao	92c1b24617	[SPARK-31428][SQL][DOCS] Document Common Table Expression in SQL Reference ### What changes were proposed in this pull request? Document Common Table Expression in SQL Reference ### Why are the changes needed? Make SQL Reference complete ### Does this PR introduce any user-facing change? Yes <img width="1050" alt="Screen Shot 2020-04-13 at 12 06 35 AM" src="https://user-images.githubusercontent.com/13592258/79100257-f61def00-7d1a-11ea-8402-17017059232e.png"> <img width="1050" alt="Screen Shot 2020-04-13 at 12 07 09 AM" src="https://user-images.githubusercontent.com/13592258/79100260-f7e7b280-7d1a-11ea-9408-058c0851f0b6.png"> <img width="1050" alt="Screen Shot 2020-04-13 at 12 07 35 AM" src="https://user-images.githubusercontent.com/13592258/79100262-fa4a0c80-7d1a-11ea-8862-eb1d8960296b.png"> Also link to Select page <img width="1045" alt="Screen Shot 2020-04-12 at 4 14 30 PM" src="https://user-images.githubusercontent.com/13592258/79082246-217fea00-7cd9-11ea-8d96-1a69769d1e19.png"> ### How was this patch tested? Manually build and check Closes #28196 from huaxingao/cte. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-04-16 08:34:26 +09:00
yi.wu	0d4e4df061	[SPARK-31018][CORE][DOCS] Deprecate support of multiple workers on the same host in Standalone ### What changes were proposed in this pull request? Update the document and shell script to warn users about the deprecation of support for multiple workers on the same host. ### Why are the changes needed? This is a sub-task of [SPARK-30978](https://issues.apache.org/jira/browse/SPARK-30978), which plans to totally remove support of multiple workers in Spark 3.1. This PR takes the first step by deprecating it in Spark 3.0. ### Does this PR introduce any user-facing change? Yes, users see a warning when they run the start worker script. ### How was this patch tested? Tested manually. Closes #27768 from Ngone51/deprecate_spark_worker_instances. Authored-by: yi.wu <yi.wu@databricks.com> Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>	2020-04-15 11:29:55 -07:00
Huaxin Gao	46be1e01e9	[SPARK-31319][SQL][FOLLOW-UP] Add a SQL example for UDAF ### What changes were proposed in this pull request? Add a SQL example for UDAF ### Why are the changes needed? To make SQL Reference complete ### Does this PR introduce any user-facing change? Yes. Add the following page, also change ```Sql``` to ```SQL``` in the example tab for all the sql examples. <img width="1110" alt="Screen Shot 2020-04-13 at 6 09 24 PM" src="https://user-images.githubusercontent.com/13592258/79175240-06cd7400-7db2-11ea-8f3e-af71a591a64b.png"> ### How was this patch tested? Manually build and check Closes #28209 from huaxingao/udf_followup. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-04-14 13:29:44 +09:00
Takeshi Yamamuro	853c6c9909	[SPARK-31434][SQL][DOCS] Drop builtin function pages from SQL references ### What changes were proposed in this pull request? This PR intends to drop the built-in function pages from SQL references. We've already had a complete list of built-in functions in the API documents. See related discussions for more details: https://github.com/apache/spark/pull/28170#issuecomment-611917191 ### Why are the changes needed? For better SQL documents. ### Does this PR introduce any user-facing change? ![functions](https://user-images.githubusercontent.com/692303/79109009-793e5400-7db2-11ea-8cb7-4c3cf31ccb77.png) ### How was this patch tested? Manually checked. Closes #28203 from maropu/DropBuiltinFunctionDocs. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-04-14 10:22:46 +09:00
Takeshi Yamamuro	179289f0bf	[SPARK-31383][SQL][DOC] Clean up the SQL documents in docs/sql-ref* ### What changes were proposed in this pull request? This PR intends to clean up the SQL documents in `doc/sql-ref`. Main changes are as follows; - Fixes wrong syntaxes and capitalizes sub-titles - Adds some DDL queries in `Examples` so that users can run examples there - Makes query output in `Examples` follow the `Dataset.showString` (right-aligned) format - Adds/Removes spaces, Indents, or blank lines to follow the format below; ``` --- license... --- ### Description Writes what the syntax is. ### Syntax {% highlight sql %} SELECT... WHERE... // 4 indents after the second line ... {% endhighlight %} ### Parameters <dl> <dt><code><em>Param Name</em></code></dt> <dd> Param Description </dd> ... </dl> ### Examples {% highlight sql %} -- It is better that users are able to execute example queries here. -- So, we prepare test data in the first section if possible. CREATE TABLE t (key STRING, value DOUBLE); INSERT INTO t VALUES ('a', 1.0), ('a', 2.0), ('b', 3.0), ('c', 4.0); -- query output has 2 indents and it follows the `Dataset.showString` -- format (right-aligned). SELECT * FROM t; +---+-----+ \|key\|value\| +---+-----+ \| a\| 1.0\| \| a\| 2.0\| \| b\| 3.0\| \| c\| 4.0\| +---+-----+ -- Query statements after the second line have 4 indents. SELECT key, SUM(value) FROM t GROUP BY key; +---+----------+ \|key\|sum(value)\| +---+----------+ \| c\| 4.0\| \| b\| 3.0\| \| a\| 3.0\| +---+----------+ ... {% endhighlight %} ### Related Statements * [XXX](xxx.html) * ... ``` ### Why are the changes needed? Most of the changes in this PR are pretty minor, but I think consistent formats/rules for writing documents are important for long-term maintenance in our community ### Does this PR introduce any user-facing change? Yes. ### How was this patch tested? Manually checked. Closes #28151 from maropu/MakeRightAligned. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-04-12 23:40:36 -05:00
Huaxin Gao	310bef1ac7	[SPARK-31419][SQL][DOCS] Document Table-valued Function and Inline Table ### What changes were proposed in this pull request? Document Table-valued Function and Inline Table ### Why are the changes needed? To make SQL Reference complete ### Does this PR introduce any user-facing change? Yes <img width="1050" alt="Screen Shot 2020-04-11 at 5 34 25 PM" src="https://user-images.githubusercontent.com/13592258/79057852-cedff880-7c1a-11ea-9e1e-7882594ab573.png"> <img width="1050" alt="Screen Shot 2020-04-11 at 5 34 46 PM" src="https://user-images.githubusercontent.com/13592258/79057854-d4d5d980-7c1a-11ea-94cc-92ef1121fa43.png"> <img width="1050" alt="Screen Shot 2020-04-10 at 7 36 00 PM" src="https://user-images.githubusercontent.com/13592258/79033391-c2986480-7b62-11ea-9d0a-6c60de823256.png"> <img width="1051" alt="Screen Shot 2020-04-10 at 7 36 21 PM" src="https://user-images.githubusercontent.com/13592258/79033392-c5935500-7b62-11ea-88d4-e7d7812a7add.png"> <img width="1051" alt="Screen Shot 2020-04-11 at 5 09 48 PM" src="https://user-images.githubusercontent.com/13592258/79057555-6ba09700-7c17-11ea-9683-16bbde63a529.png"> Also, linked the newly added pages to select statement <img width="1050" alt="Screen Shot 2020-04-10 at 3 27 59 PM" src="https://user-images.githubusercontent.com/13592258/79027245-5147ba00-7b40-11ea-9b10-527fd9639958.png"> ### How was this patch tested? Manually build and check Closes #28185 from huaxingao/tvf. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-04-12 23:39:27 -05:00
Huaxin Gao	3bbd80dbc3	[SPARK-31319][SQL][DOCS] Document UDFs/UDAFs in SQL Reference ### What changes were proposed in this pull request? Document UDF in SQL Reference ### Why are the changes needed? To make SQL Reference complete. ### Does this PR introduce any user-facing change? Yes. Here are the new pages: <img width="1050" alt="Screen Shot 2020-04-09 at 5 06 42 PM" src="https://user-images.githubusercontent.com/13592258/78950977-585dc200-7a85-11ea-875c-ce14c3795e0f.png"> <img width="1049" alt="Screen Shot 2020-04-09 at 5 07 06 PM" src="https://user-images.githubusercontent.com/13592258/78950979-5b58b280-7a85-11ea-81f3-bd5d91bd07e3.png"> <img width="1049" alt="Screen Shot 2020-04-09 at 5 07 26 PM" src="https://user-images.githubusercontent.com/13592258/78950985-5e53a300-7a85-11ea-86be-f63152c1501b.png"> <img width="1051" alt="Screen Shot 2020-04-09 at 5 07 54 PM" src="https://user-images.githubusercontent.com/13592258/78950991-63185700-7a85-11ea-9379-8da46cfc434c.png"> <img width="1060" alt="Screen Shot 2020-04-09 at 5 08 17 PM" src="https://user-images.githubusercontent.com/13592258/78950994-657ab100-7a85-11ea-8b34-d2c87f94b03b.png"> <img width="1050" alt="Screen Shot 2020-04-09 at 5 09 27 PM" src="https://user-images.githubusercontent.com/13592258/78951001-6875a180-7a85-11ea-874e-8abd14a3d3d3.png"> <img width="1060" alt="Screen Shot 2020-04-09 at 5 10 00 PM" src="https://user-images.githubusercontent.com/13592258/78951005-6f041900-7a85-11ea-9e57-520eb8db59ec.png"> <img width="1049" alt="Screen Shot 2020-04-09 at 5 11 10 PM" src="https://user-images.githubusercontent.com/13592258/78951014-73303680-7a85-11ea-93ab-32d68d2e2d59.png"> <img width="1050" alt="Screen Shot 2020-04-09 at 5 11 41 PM" src="https://user-images.githubusercontent.com/13592258/78951019-75929080-7a85-11ea-9d3b-600e8e157c05.png"> <img width="1050" alt="Screen Shot 2020-04-09 at 5 16 22 PM" 
src="https://user-images.githubusercontent.com/13592258/78951137-dfab3580-7a85-11ea-8512-c6b660aa271e.png"> <img width="1050" alt="Screen Shot 2020-04-09 at 5 22 15 PM" src="https://user-images.githubusercontent.com/13592258/78951466-22214200-7a87-11ea-93dd-6e36492421f1.png"> <img width="1049" alt="Screen Shot 2020-04-09 at 5 22 46 PM" src="https://user-images.githubusercontent.com/13592258/78951469-24839c00-7a87-11ea-93a9-fe30d689adbd.png"> <img width="1050" alt="Screen Shot 2020-04-09 at 5 23 08 PM" src="https://user-images.githubusercontent.com/13592258/78951472-26e5f600-7a87-11ea-84db-087a3528aa53.png"> <img width="1050" alt="Screen Shot 2020-04-09 at 5 23 34 PM" src="https://user-images.githubusercontent.com/13592258/78951474-29e0e680-7a87-11ea-8be4-2a5be1bc3788.png"> <img width="1049" alt="Screen Shot 2020-04-09 at 5 23 57 PM" src="https://user-images.githubusercontent.com/13592258/78951481-2cdbd700-7a87-11ea-8894-0a39abf54a3b.png"> <img width="1050" alt="Screen Shot 2020-04-09 at 5 24 15 PM" src="https://user-images.githubusercontent.com/13592258/78951483-2f3e3100-7a87-11ea-8845-ffebf89d7898.png"> ### How was this patch tested? Manually build and check Closes #28087 from huaxingao/udf. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-04-12 23:38:17 -05:00
Huaxin Gao	fda910d4e2	[SPARK-31348][SQL][DOCS] Document Join in SQL Reference ### What changes were proposed in this pull request? Document join in SQL Reference. ### Why are the changes needed? To make SQL Reference complete. ### Does this PR introduce any user-facing change? Yes <img width="1050" alt="Screen Shot 2020-04-05 at 8 46 47 PM" src="https://user-images.githubusercontent.com/13592258/78521722-ab7efe80-777f-11ea-90f5-1fac09282721.png"> <img width="1049" alt="Screen Shot 2020-04-05 at 8 47 20 PM" src="https://user-images.githubusercontent.com/13592258/78521724-ade15880-777f-11ea-9238-183d999ed918.png"> <img width="1049" alt="Screen Shot 2020-04-05 at 8 47 41 PM" src="https://user-images.githubusercontent.com/13592258/78521726-b043b280-777f-11ea-996f-a8e86d453c01.png"> <img width="1049" alt="Screen Shot 2020-04-05 at 8 48 11 PM" src="https://user-images.githubusercontent.com/13592258/78521731-b3d73980-777f-11ea-85c8-c24798ef41ac.png"> <img width="1049" alt="Screen Shot 2020-04-05 at 8 48 33 PM" src="https://user-images.githubusercontent.com/13592258/78521734-b5a0fd00-777f-11ea-8b2c-96af30f3bf49.png"> ### How was this patch tested? Manually build and check. Closes #28121 from huaxingao/join. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-04-12 13:57:54 -05:00
Huaxin Gao	f69b0ef25d	[SPARK-31355][SQL][DOCS] Document TABLESAMPLE in SQL Reference ### What changes were proposed in this pull request? Document TABLESAMPLE in SQL Reference ### Why are the changes needed? To make SQL Reference complete ### Does this PR introduce any user-facing change? Yes <img width="1049" alt="Screen Shot 2020-04-06 at 10 23 52 PM" src="https://user-images.githubusercontent.com/13592258/78633123-96749f00-7855-11ea-9509-b7ee21da7fbd.png"> <img width="1050" alt="Screen Shot 2020-04-06 at 10 24 26 PM" src="https://user-images.githubusercontent.com/13592258/78633130-98d6f900-7855-11ea-8675-fd4b6163dfb6.png"> ### How was this patch tested? Manually build and check. Closes #28130 from huaxingao/sampling. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-04-09 19:39:34 -05:00
zero323	697fe911ac	[SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR ### What changes were proposed in this pull request? This pull request adds SparkR wrapper for `FMRegressor`: - Supporting ` org.apache.spark.ml.r.FMRegressorWrapper`. - `FMRegressionModel` S4 class. - Corresponding `spark.fmRegressor`, `predict`, `summary` and `write.ml` generics. - Corresponding docs and tests. ### Why are the changes needed? Feature parity. ### Does this PR introduce any user-facing change? No (new API). ### How was this patch tested? New unit tests. Closes #27571 from zero323/SPARK-30819. Authored-by: zero323 <mszymkiewicz@gmail.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-04-09 19:38:11 -05:00
Huaxin Gao	61f903fa7a	[SPARK-31331][SQL][DOCS] Document Spark integration with Hive UDFs/UDAFs/UDTFs ### What changes were proposed in this pull request? Document Spark integration with Hive UDFs/UDAFs/UDTFs ### Why are the changes needed? To make SQL Reference complete ### Does this PR introduce any user-facing change? Yes <img width="1031" alt="Screen Shot 2020-04-02 at 2 22 42 PM" src="https://user-images.githubusercontent.com/13592258/78301971-cc7cf080-74ee-11ea-93c8-7d4c75213b47.png"> ### How was this patch tested? Manually build and check Closes #28104 from huaxingao/hive-udfs. Lead-authored-by: Huaxin Gao <huaxing@us.ibm.com> Co-authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-04-09 13:28:01 -05:00
HyukjinKwon	c279e6b091	[SPARK-30722][DOCS][FOLLOW-UP] Explicitly mention the same entire input/output length restriction of Series Iterator UDF ### What changes were proposed in this pull request? This PR explicitly mentions the requirement of Iterator of Series to Iterator of Series and Iterator of Multiple Series to Iterator of Series (previously Scalar Iterator pandas UDF). The actual limitation of this UDF is the same length of the _entire input and output_, instead of each series's length. Namely you can do something as below: ```python from typing import Iterator, Tuple import pandas as pd from pyspark.sql.functions import pandas_udf @pandas_udf("long") def func( iterator: Iterator[pd.Series]) -> Iterator[pd.Series]: return iter([pd.concat(iterator)]) spark.range(100).select(func("id")).show() ``` This characteristic allows you to prefetch the data from the iterator to speed up, compared to the regular Scalar to Scalar (previously Scalar pandas UDF). ### Why are the changes needed? To document the correct restriction and characteristics of a feature. ### Does this PR introduce any user-facing change? Yes in the documentation but only in unreleased branches. ### How was this patch tested? GitHub Actions should test the documentation build Closes #28160 from HyukjinKwon/SPARK-30722-followup. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-04-09 16:46:27 +09:00
Gengliang Wang	d89fcc64db	[SPARK-31333][FOLLOWUP][DOC] Link Join Hints doc in SQL perf tuning guide ### What changes were proposed in this pull request? This is a follow-up of https://github.com/apache/spark/pull/28113. There is also a brief section about Join hints in SQL perf tuning guide: https://spark.apache.org/docs/latest/sql-performance-tuning.html . We should link the new Join hint doc in it. ### Why are the changes needed? So that users can read more examples. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Manually build the doc and check it: ![image](https://user-images.githubusercontent.com/1097932/78860030-f7cb7800-79e5-11ea-8573-c0587d43a7dc.png) Closes #28161 from gengliangwang/joinHintFollowUp. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-04-09 15:03:08 +09:00
zero323	0063462d55	[SPARK-30818][SPARKR][ML] Add SparkR LinearRegression wrapper ### What changes were proposed in this pull request? This pull request adds SparkR wrapper for `LinearRegression`: - Supporting `org.apache.spark.ml.r.LinearRegressionWrapper`. - `LinearRegressionModel` S4 class. - Corresponding `spark.lm`, `predict`, `summary` and `write.ml` generics. - Corresponding docs and tests. ### Why are the changes needed? Feature parity. ### Does this PR introduce any user-facing change? No (new API). ### How was this patch tested? New unit tests. Closes #27593 from zero323/SPARK-30818. Authored-by: zero323 <mszymkiewicz@gmail.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-04-08 22:29:44 -05:00
Huaxin Gao	5dc9b9c7c1	[SPARK-31362][SQL][DOCS] Document Set Operators in SQL Reference ### What changes were proposed in this pull request? Document Set Operators in SQL Reference ### Why are the changes needed? To make SQL Reference complete ### Does this PR introduce any user-facing change? Yes <img width="1050" alt="Screen Shot 2020-04-07 at 9 20 05 AM" src="https://user-images.githubusercontent.com/13592258/78694605-c6ea2680-78b1-11ea-8590-afb43dbe5933.png"> <img width="1050" alt="Screen Shot 2020-04-07 at 9 20 41 AM" src="https://user-images.githubusercontent.com/13592258/78694613-c8b3ea00-78b1-11ea-89b9-d6cd71ee86a0.png"> <img width="1050" alt="Screen Shot 2020-04-07 at 9 21 29 AM" src="https://user-images.githubusercontent.com/13592258/78694622-ca7dad80-78b1-11ea-9acf-7611ee57d4f2.png"> <img width="1050" alt="Screen Shot 2020-04-07 at 9 21 54 AM" src="https://user-images.githubusercontent.com/13592258/78694626-cc477100-78b1-11ea-82f8-4deaf0048de7.png"> ### How was this patch tested? Manually build and check Closes #28139 from huaxingao/set-operators. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-04-08 10:51:04 -05:00
gatorsmile	a3d83948b8	[SPARK-31351][DOC] Migration Guide Auditing for Spark 3.0 Release ### What changes were proposed in this pull request? This PR is to audit the migration guides in Spark 3.0 release: - correct the grammar errors - clean up some items - replace HTML table by markdown table ### Why are the changes needed? N/A ### Does this PR introduce any user-facing change? No ### How was this patch tested? Screenshot: ![screencapture-127-0-0-1-4000-sql-migration-guide-html-2020-04-04-21_36_29](https://user-images.githubusercontent.com/11567269/78467043-9477d800-76bd-11ea-8ab0-3d51ea5e9fa5.png) ![Screen Shot 2020-04-04 at 9 28 13 PM](https://user-images.githubusercontent.com/11567269/78467045-98a3f580-76bd-11ea-9e4b-927bf12e683a.png) ![Screen Shot 2020-04-04 at 9 28 02 PM](https://user-images.githubusercontent.com/11567269/78467046-98a3f580-76bd-11ea-8ea3-9f13cb8d200b.png) ![Screen Shot 2020-04-04 at 9 21 40 PM](https://user-images.githubusercontent.com/11567269/78467047-993c8c00-76bd-11ea-8c29-91afc68eb590.png) Closes #28125 from gatorsmile/updateMigrationGuide3.0. Authored-by: gatorsmile <gatorsmile@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-04-08 12:27:40 +09:00
beliefer	0fc859b4d5	[SPARK-31269][DOC][FOLLOWUP][MINOR] Add version head of GraphX table ### What changes were proposed in this pull request? HyukjinKwon has ported back all the PRs about version to branch-3.0. I double-checked and found that the GraphX table lost its version head. This PR fixes the issue. HyukjinKwon, please help me merge this PR to master and branch-3.0 ### Why are the changes needed? Add version head of GraphX table ### Does this PR introduce any user-facing change? 'No'. ### How was this patch tested? Jenkins test. Closes #28149 from beliefer/fix-head-of-graphx-table. Authored-by: beliefer <beliefer@163.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-04-08 12:25:06 +09:00
Eric Wu	a28ed86a38	[SPARK-31113][SQL] Add SHOW VIEWS command ### What changes were proposed in this pull request? Previously, users could issue `SHOW TABLES` to get info on both tables and views. This PR (SPARK-31113) implements a `SHOW VIEWS` SQL command, similar to Hive's, to get views only (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowViews). Hive -- Only show view names ``` hive> SHOW VIEWS; OK view_1 view_2 ... ``` Spark(Hive-Compatible) -- Only show view names, used in tests and `SparkSQLDriver` for CLI applications ``` SHOW VIEWS IN showdb; view_1 view_2 ... ``` Spark -- Show more information database/viewName/isTemporary ``` spark-sql> SHOW VIEWS; userdb view_1 false userdb view_2 false ... ``` ### Why are the changes needed? `SHOW VIEWS` command provides better granularity to only get information of views. ### Does this PR introduce any user-facing change? Add new `SHOW VIEWS` SQL command ### How was this patch tested? Add new test `show-views.sql` and pass existing tests Closes #27897 from Eric5553/ShowViews. Authored-by: Eric Wu <492960551@qq.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-04-07 09:25:01 -07:00
zero323	0d37f794ef	[SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR ### What changes were proposed in this pull request? This pull request adds SparkR wrapper for `FMClassifier`: - Supporting ` org.apache.spark.ml.r.FMClassifierWrapper`. - `FMClassificationModel` S4 class. - Corresponding `spark.fmClassifier`, `predict`, `summary` and `write.ml` generics. - Corresponding docs and tests. ### Why are the changes needed? Feature parity. ### Does this PR introduce any user-facing change? No (new API). ### How was this patch tested? New unit tests. Closes #27570 from zero323/SPARK-30820. Authored-by: zero323 <mszymkiewicz@gmail.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-04-07 09:01:45 -05:00
Kent Yao	3c94a7c8f5	[SPARK-29311][SQL][FOLLOWUP] Add migration guide for extracting second from datetimes ### What changes were proposed in this pull request? Add migration guide for extracting second from datetimes ### Why are the changes needed? doc the behavior change for extract expression ### Does this PR introduce any user-facing change? No ### How was this patch tested? N/A, just passing jenkins Closes #28140 from yaooqinn/SPARK-29311. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-04-07 07:09:45 +00:00
Huaxin Gao	44d37efba2	[SPARK-31333][SQL][DOCS] Document Join Hints ### What changes were proposed in this pull request? Document Join Hints ### Why are the changes needed? To make SQL Reference complete ### Does this PR introduce any user-facing change? Yes <img width="1049" alt="Screen Shot 2020-04-03 at 9 20 15 AM" src="https://user-images.githubusercontent.com/13592258/78382976-7c546b80-758c-11ea-9a8e-e46cfb7106f5.png"> <img width="1051" alt="Screen Shot 2020-04-03 at 10 39 55 AM" src="https://user-images.githubusercontent.com/13592258/78389778-356c7300-7598-11ea-8e6c-3742dadda11c.png"> ### How was this patch tested? Manually build and check Closes #28113 from huaxingao/join-hints. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-04-06 09:02:22 -05:00
Takeshi Yamamuro	e24f0dcd27	[SPARK-31358][SQL][DOC] Document FILTER clauses of aggregate functions in SQL references ### What changes were proposed in this pull request? This PR intends to improve the SQL document of `GROUP BY`; it added the description about FILTER clauses of aggregate functions. ### Why are the changes needed? To improve the SQL documents ### Does this PR introduce any user-facing change? Yes. <img src="https://user-images.githubusercontent.com/692303/78558612-e2234a80-784d-11ea-9353-b3feac4d57a7.png" width="500"> ### How was this patch tested? Manually checked. Closes #28134 from maropu/SPARK-31358. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-04-06 21:36:51 +09:00
Dongjoon Hyun	3886442332	[SPARK-27963][DOCS][FOLLOWUP] Update requirements for spark.dynamicAllocation.enabled ### What changes were proposed in this pull request? This PR fixes the outdated requirement for `spark.dynamicAllocation.enabled=true`. ### Why are the changes needed? This was found during 3.0.0 RC1 document review and testing. As described at `spark.dynamicAllocation.shuffleTracking.enabled` in the same table, we can enable Dynamic Allocation without external shuffle service. ### Does this PR introduce any user-facing change? Yes. (Doc.) ### How was this patch tested? Manually generate the doc by `SKIP_API=1 jekyll build` BEFORE ![Screen Shot 2020-04-05 at 2 31 23 PM](https://user-images.githubusercontent.com/9700541/78510472-29c0ae00-774a-11ea-9916-ba80015fae82.png) AFTER ![Screen Shot 2020-04-05 at 2 29 25 PM](https://user-images.githubusercontent.com/9700541/78510434-ea925d00-7749-11ea-8db8-018955507fd5.png) Closes #28132 from dongjoon-hyun/SPARK-DA-DOC. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-04-06 11:04:21 +09:00
Huaxin Gao	4e45c07f5d	[SPARK-31326][SQL][DOCS] Create Function docs structure for SQL Reference ### What changes were proposed in this pull request? Create Function docs structure for SQL Reference... ### Why are the changes needed? so the Function docs can be added later, also want to get a consensus about what to document for Functions in SQL Reference. ### Does this PR introduce any user-facing change? Yes <img width="1050" alt="Screen Shot 2020-04-02 at 12 09 20 AM" src="https://user-images.githubusercontent.com/13592258/78220451-68b6e100-7476-11ea-9a21-733b41652785.png"> <img width="1051" alt="Screen Shot 2020-04-02 at 12 09 44 AM" src="https://user-images.githubusercontent.com/13592258/78220460-6ce2fe80-7476-11ea-887c-defefd55c19d.png"> <img width="1051" alt="Screen Shot 2020-04-02 at 12 10 05 AM" src="https://user-images.githubusercontent.com/13592258/78220463-6f455880-7476-11ea-81fc-fd4137db7c3f.png"> ### How was this patch tested? Manually build and check Closes #28099 from huaxingao/function. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-04-03 14:36:03 +09:00
Takeshi Yamamuro	d98df7626b	[SPARK-31325][SQL][WEB UI] Control a plan explain mode in the events of SQL listeners via SQLConf ### What changes were proposed in this pull request? This PR intends to add a new SQL config for controlling a plan explain mode in the events of (e.g., `SparkListenerSQLExecutionStart` and `SparkListenerSQLAdaptiveExecutionUpdate`) SQL listeners. In the current master, the output of `QueryExecution.toString` (this is equivalent to the "extended" explain mode) is stored in these events. I think it is useful to control the content via `SQLConf`. For example, the query "Details" content (TPCDS q66 query) of a SQL tab in a Spark web UI will be changed as follows; Before this PR: ![q66-extended](https://user-images.githubusercontent.com/692303/78211668-950b4580-74e8-11ea-90c6-db52d437534b.png) After this PR: ![q66-formatted](https://user-images.githubusercontent.com/692303/78211674-9ccaea00-74e8-11ea-9d1d-43c7e2b0f314.png) ### Why are the changes needed? For better usability. ### Does this PR introduce any user-facing change? Yes; since Spark 3.1, SQL UI data adopts the `formatted` mode for the query plan explain results. To restore the behavior before Spark 3.0, you can set `spark.sql.ui.explainMode` to `extended`. ### How was this patch tested? Added unit tests. Closes #28097 from maropu/SPARK-31325. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>	2020-04-02 21:09:16 -07:00
Thomas Graves	55dea9be62	[SPARK-29153][CORE] Add ability to merge resource profiles within a stage with Stage Level Scheduling ### What changes were proposed in this pull request? For the stage level scheduling feature, add the ability to optionally merge resource profiles if they were specified on multiple RDDs within a stage. There is a config to enable this feature; it is off by default (spark.scheduler.resourceProfile.mergeConflicts). When the config is set to true, Spark will merge the profiles, selecting the max value of each resource (cores, memory, gpu, etc). Further documentation will be added with SPARK-30322. This also adds the ability to check if an equivalent resource profile already exists. This is so that if a user is running stages and combining the same profiles over and over again we don't get an explosion in the number of profiles. ### Why are the changes needed? To allow users to specify resources on multiple RDDs and not worry as much about whether they go into the same stage and fail. ### Does this PR introduce any user-facing change? Yes, when the config is turned on it now merges the profiles instead of erroring out. ### How was this patch tested? Unit tests Closes #28053 from tgravescs/SPARK-29153. Lead-authored-by: Thomas Graves <tgraves@apache.org> Co-authored-by: Thomas Graves <tgraves@nvidia.com> Signed-off-by: Thomas Graves <tgraves@apache.org>	2020-04-02 08:30:18 -05:00
beliefer	50e535c431	[SPARK-31295][DOC][FOLLOWUP] Supplement version for configuration appear in doc ### What changes were proposed in this pull request? This PR supplements version for configuration appear in docs. I sorted out some information show below. docs/sql-performance-tuning.md Item name \| Since version \| JIRA ID \| Commit ID \| Note -- \| -- \| -- \| -- \| -- spark.sql.inMemoryColumnarStorage.compressed \| 1.0.1 \| SPARK-2631 \| 86534d0f5255362618c05a07b0171ec35c915822#diff-41ef65b9ef5b518f77e2a03559893f4d \| spark.sql.inMemoryColumnarStorage.batchSize \| 1.1.1 \| SPARK-2650 \| 779d1eb26d0f031791e93c908d51a59c3b422a55#diff-41ef65b9ef5b518f77e2a03559893f4d \| spark.sql.files.maxPartitionBytes \| 2.0.0 \| SPARK-13664 \| 17eec0a71ba8713c559d641e3f43a1be726b037c#diff-32bb9518401c0948c5ea19377b5069ab \| spark.sql.files.openCostInBytes \| 2.0.0 \| SPARK-14259 \| 400b2f863ffaa01a34a8dae1541c61526fef908b#diff-32bb9518401c0948c5ea19377b5069ab \| spark.sql.broadcastTimeout \| 1.3.0 \| SPARK-4269 \| fa66ef6c97e87c9255b67b03836a4ba50598ebae#diff-41ef65b9ef5b518f77e2a03559893f4d \| spark.sql.autoBroadcastJoinThreshold \| 1.1.0 \| SPARK-2393 \| c7db274be79f448fda566208946cb50958ea9b1a#diff-41ef65b9ef5b518f77e2a03559893f4d \| spark.sql.shuffle.partitions \| 1.1.0 \| SPARK-1508 \| 08ed9ad81397b71206c4dc903bfb94b6105691ed#diff-41ef65b9ef5b518f77e2a03559893f4d \| spark.sql.adaptive.coalescePartitions.enabled \| 3.0.0 \| SPARK-31037 \| 46b7f1796bd0b96977ce9b473601033f397a3b18#diff-9a6b543db706f1a90f790783d6930a13 \| spark.sql.adaptive.coalescePartitions.minPartitionNum \| 3.0.0 \| SPARK-31037 \| 46b7f1796bd0b96977ce9b473601033f397a3b18#diff-9a6b543db706f1a90f790783d6930a13 \| spark.sql.adaptive.coalescePartitions.initialPartitionNum \| 3.0.0 \| SPARK-31037 \| 46b7f1796bd0b96977ce9b473601033f397a3b18#diff-9a6b543db706f1a90f790783d6930a13 \| spark.sql.adaptive.advisoryPartitionSizeInBytes \| 3.0.0 \| SPARK-31037 \| 
46b7f1796bd0b96977ce9b473601033f397a3b18#diff-9a6b543db706f1a90f790783d6930a13 \| spark.sql.adaptive.skewJoin.enabled \| 3.0.0 \| SPARK-31037 \| 46b7f1796bd0b96977ce9b473601033f397a3b18#diff-9a6b543db706f1a90f790783d6930a13 \| spark.sql.adaptive.skewJoin.skewedPartitionFactor \| 3.0.0 \| SPARK-31037 \| 46b7f1796bd0b96977ce9b473601033f397a3b18#diff-9a6b543db706f1a90f790783d6930a13 \| spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes \| 3.0.0 \| SPARK-31201 \| 8d0800a0803d3c47938bddefa15328d654739bc5#diff-9a6b543db706f1a90f790783d6930a13 \| docs/sql-ref-ansi-compliance.md Item name \| Since version \| JIRA ID \| Commit ID \| Note -- \| -- \| -- \| -- \| -- spark.sql.ansi.enabled \| 3.0.0 \| SPARK-30125 \| d9b30694122f8716d3acb448638ef1e2b96ebc7a#diff-9a6b543db706f1a90f790783d6930a13 \| spark.sql.storeAssignmentPolicy \| 3.0.0 \| SPARK-28730 \| 895c90b582cc2b2667241f66d5b733852aeef9eb#diff-9a6b543db706f1a90f790783d6930a13 \| ### Why are the changes needed? Supplemental configuration version information. ### Does this PR introduce any user-facing change? 'No'. ### How was this patch tested? Jenkins test Closes #28096 from beliefer/supplement-version-of-performance. Authored-by: beliefer <beliefer@163.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-04-02 16:01:54 +09:00
Kousuke Saruta	b9b1b549af	[SPARK-31073][DOC][FOLLOWUP] Add description for Shuffle Write Time metric in StagePage to web-ui.md ### What changes were proposed in this pull request? This PR adds description for `Shuffle Write Time` to `web-ui.md`. ### Why are the changes needed? #27837 added `Shuffle Write Time` metric to task metrics summary but it's not documented yet. ### Does this PR introduce any user-facing change? Yes. We can see the description for `Shuffle Write Time` in the new `web-ui.html`. <img width="956" alt="shuffle-write-time-description" src="https://user-images.githubusercontent.com/4736016/78175342-a9722280-7495-11ea-9cc6-62c6f3619aa3.png"> ### How was this patch tested? Built docs by `SKIP_API=1 jekyll build` in `doc` directory and then confirmed `web-ui.html`. Closes #28093 from sarutak/SPARK-31073-doc. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-04-01 12:03:41 -07:00
Huaxin Gao	fd0b228127	[SPARK-31290][R] Add back the deprecated R APIs ### What changes were proposed in this pull request? Add back the deprecated R APIs removed by https://github.com/apache/spark/pull/22843/ and https://github.com/apache/spark/pull/22815. These APIs are - `sparkR.init` - `sparkRSQL.init` - `sparkRHive.init` - `registerTempTable` - `createExternalTable` - `dropTempTable` No need to port the function such as ```r createExternalTable <- function(x, ...) { dispatchFunc("createExternalTable(tableName, path = NULL, source = NULL, ...)", x, ...) } ``` because this was for the backward compatibility when SQLContext exists before assuming from https://github.com/apache/spark/pull/9192, but seems we don't need it anymore since SparkR replaced SQLContext with Spark Session at https://github.com/apache/spark/pull/13635. ### Why are the changes needed? Amend Spark's Semantic Versioning Policy ### Does this PR introduce any user-facing change? Yes The removed R APIs are put back. ### How was this patch tested? Add back the removed tests Closes #28058 from huaxingao/r. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-04-01 10:38:03 +09:00
Huaxin Gao	1a7f9649b6	[SPARK-31305][SQL][DOCS] Add a page to list all commands in SQL Reference ### What changes were proposed in this pull request? Add a page to list all commands in SQL Reference... ### Why are the changes needed? so it's easier for user to find a specific command. ### Does this PR introduce any user-facing change? before: ![image](https://user-images.githubusercontent.com/13592258/77938658-ec03e700-726a-11ea-983c-7a559cc0aae2.png) after: ![image](https://user-images.githubusercontent.com/13592258/77937899-d3df9800-7269-11ea-85db-749a9521576a.png) ![image](https://user-images.githubusercontent.com/13592258/77937924-db9f3c80-7269-11ea-9441-7603feee421c.png) Also move ```use database``` from query category to ddl category. ### How was this patch tested? Manually build and check Closes #28074 from huaxingao/list-all. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-04-01 08:42:15 +09:00
HyukjinKwon	4d4c3e76f6	Revert "[SPARK-30879][DOCS] Refine workflow for building docs" This reverts commit `7892f88f84`.	2020-03-31 16:11:59 +09:00
beliefer	47c810f8ae	[SPARK-31279][SQL][DOC] Add version information to the configuration of Hive ### What changes were proposed in this pull request? Add version information to the configuration of `Hive`. I sorted out some information show below. Item name \| Since version \| JIRA ID \| Commit ID \| Note -- \| -- \| -- \| -- \| -- spark.sql.hive.metastore.version \| 1.4.0 \| SPARK-6908 \| 05454fd8aef75b129cbbd0288f5089c5259f4a15#diff-ff50aea397a607b79df9bec6f2a841db \| spark.sql.hive.version \| 1.1.1 \| SPARK-3971 \| 64945f868443fbc59cb34b34c16d782dda0fb63d#diff-12fa2178364a810b3262b30d8d48aa2d \| spark.sql.hive.metastore.jars \| 1.4.0 \| SPARK-6908 \| 05454fd8aef75b129cbbd0288f5089c5259f4a15#diff-ff50aea397a607b79df9bec6f2a841db \| spark.sql.hive.convertMetastoreParquet \| 1.1.1 \| SPARK-2406 \| cc4015d2fa3785b92e6ab079b3abcf17627f7c56#diff-ff50aea397a607b79df9bec6f2a841db \| spark.sql.hive.convertMetastoreParquet.mergeSchema \| 1.3.1 \| SPARK-6575 \| 778c87686af0c04df9dfe144b8f744f271a988ad#diff-ff50aea397a607b79df9bec6f2a841db \| spark.sql.hive.convertMetastoreOrc \| 2.0.0 \| SPARK-14070 \| 1e886159849e3918445d3fdc3c4cef86c6c1a236#diff-ff50aea397a607b79df9bec6f2a841db \| spark.sql.hive.convertInsertingPartitionedTable \| 3.0.0 \| SPARK-28573 \| d5688dc732890923c326f272b0c18c329a69459a#diff-842e3447fc453de26c706db1cac8f2c4 \| spark.sql.hive.convertMetastoreCtas \| 3.0.0 \| SPARK-25271 \| 5ad03607d1487e7ab3e3b6d00eef9c4028ed4975#diff-842e3447fc453de26c706db1cac8f2c4 \| spark.sql.hive.metastore.sharedPrefixes \| 1.4.0 \| SPARK-7491 \| a8556086d33cb993fab0ae2751e31455e6c664ab#diff-ff50aea397a607b79df9bec6f2a841db \| spark.sql.hive.metastore.barrierPrefixes \| 1.4.0 \| SPARK-7491 \| a8556086d33cb993fab0ae2751e31455e6c664ab#diff-ff50aea397a607b79df9bec6f2a841db \| spark.sql.hive.thriftServer.async \| 1.5.0 \| SPARK-6964 \| eb19d3f75cbd002f7e72ce02017a8de67f562792#diff-ff50aea397a607b79df9bec6f2a841db \| ### Why are the changes needed? 
Supplemental configuration version information. ### Does this PR introduce any user-facing change? 'No'. ### How was this patch tested? Existing UTs Closes #28042 from beliefer/add-version-to-hive-config. Authored-by: beliefer <beliefer@163.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-31 12:35:01 +09:00
beliefer	4fc8ee74fc	[SPARK-31295][DOC] Supplement version for configuration appear in doc ### What changes were proposed in this pull request? This PR supplements version for configuration appear in docs. I sorted out some information show below. docs/spark-standalone.md Item name \| Since version \| JIRA ID \| Commit ID \| Note -- \| -- \| -- \| -- \| -- spark.deploy.retainedApplications \| 0.8.0 \| None \| 46eecd110a4017ea0c86cbb1010d0ccd6a5eb2ef#diff-29dffdccd5a7f4c8b496c293e87c8668 \| spark.deploy.retainedDrivers \| 1.1.0 \| None \| 7446f5ff93142d2dd5c79c63fa947f47a1d4db8b#diff-29dffdccd5a7f4c8b496c293e87c8668 \| spark.deploy.spreadOut \| 0.6.1 \| None \| bb2b9ff37cd2503cc6ea82c5dd395187b0910af0#diff-0e7ae91819fc8f7b47b0f97be7116325 \| spark.deploy.defaultCores \| 0.9.0 \| None \| d8bcc8e9a095c1b20dd7a17b6535800d39bff80e#diff-29dffdccd5a7f4c8b496c293e87c8668 \| spark.deploy.maxExecutorRetries \| 1.6.3 \| SPARK-16956 \| ace458f0330f22463ecf7cbee7c0465e10fba8a8#diff-29dffdccd5a7f4c8b496c293e87c8668 \| spark.worker.resource.{resourceName}.amount \| 3.0.0 \| SPARK-27371 \| cbad616d4cb0c58993a88df14b5e30778c7f7e85#diff-d25032e4a3ae1b85a59e4ca9ccf189a8 \| spark.worker.resource.{resourceName}.discoveryScript \| 3.0.0 \| SPARK-27371 \| cbad616d4cb0c58993a88df14b5e30778c7f7e85#diff-d25032e4a3ae1b85a59e4ca9ccf189a8 \| spark.worker.resourcesFile \| 3.0.0 \| SPARK-27369 \| 7cbe01e8efc3f6cd3a0cac4bcfadea8fcc74a955#diff-b2fc8d6ab7ac5735085e2d6cfacb95da \| spark.shuffle.service.db.enabled \| 3.0.0 \| SPARK-26288 \| 8b0aa59218c209d39cbba5959302d8668b885cf6#diff-6bdad48cfc34314e89599655442ff210 \| spark.storage.cleanupFilesAfterExecutorExit \| 2.4.0 \| SPARK-24340 \| 8ef167a5f9ba8a79bb7ca98a9844fe9cfcfea060#diff-916ca56b663f178f302c265b7ef38499 \| spark.deploy.recoveryMode \| 0.8.1 \| None \| d66c01f2b6defb3db6c1be99523b734a4d960532#diff-29dffdccd5a7f4c8b496c293e87c8668 \| spark.deploy.recoveryDirectory \| 0.8.1 \| None \| 
d66c01f2b6defb3db6c1be99523b734a4d960532#diff-29dffdccd5a7f4c8b496c293e87c8668 \| docs/sql-data-sources-avro.md Item name \| Since version \| JIRA ID \| Commit ID \| Note -- \| -- \| -- \| -- \| -- spark.sql.legacy.replaceDatabricksSparkAvro.enabled \| 2.4.0 \| SPARK-25129 \| ac0174e55af2e935d41545721e9f430c942b3a0c#diff-9a6b543db706f1a90f790783d6930a13 \| spark.sql.avro.compression.codec \| 2.4.0 \| SPARK-24881 \| 0a0f68bae6c0a1bf30184b1e9ac6bf3805bd7511#diff-9a6b543db706f1a90f790783d6930a13 \| spark.sql.avro.deflate.level \| 2.4.0 \| SPARK-24881 \| 0a0f68bae6c0a1bf30184b1e9ac6bf3805bd7511#diff-9a6b543db706f1a90f790783d6930a13 \| docs/sql-data-sources-orc.md Item name \| Since version \| JIRA ID \| Commit ID \| Note -- \| -- \| -- \| -- \| -- spark.sql.orc.impl \| 2.3.0 \| SPARK-20728 \| 326f1d6728a7734c228d8bfaa69442a1c7b92e9b#diff-9a6b543db706f1a90f790783d6930a13 \| spark.sql.orc.enableVectorizedReader \| 2.3.0 \| SPARK-16060 \| 60f6b994505e3f82091a04eed2dc0a9e8bd523ce#diff-9a6b543db706f1a90f790783d6930a13 \| docs/sql-data-sources-parquet.md Item name \| Since version \| JIRA ID \| Commit ID \| Note -- \| -- \| -- \| -- \| -- spark.sql.parquet.binaryAsString \| 1.1.1 \| SPARK-2927 \| de501e169f24e4573747aec85b7651c98633c028#diff-41ef65b9ef5b518f77e2a03559893f4d \| spark.sql.parquet.int96AsTimestamp \| 1.3.0 \| SPARK-4987 \| 67d52207b5cf2df37ca70daff2a160117510f55e#diff-41ef65b9ef5b518f77e2a03559893f4d \| spark.sql.parquet.compression.codec \| 1.1.1 \| SPARK-3131 \| 3a9d874d7a46ab8b015631d91ba479d9a0ba827f#diff-41ef65b9ef5b518f77e2a03559893f4d \| spark.sql.parquet.filterPushdown \| 1.2.0 \| SPARK-4391 \| 576688aa2a19bd4ba239a2b93af7947f983e5124#diff-41ef65b9ef5b518f77e2a03559893f4d \| spark.sql.hive.convertMetastoreParquet \| 1.1.1 \| SPARK-2406 \| cc4015d2fa3785b92e6ab079b3abcf17627f7c56#diff-ff50aea397a607b79df9bec6f2a841db \| spark.sql.parquet.mergeSchema \| 1.5.0 \| SPARK-8690 \| 246265f2bb056d5e9011d3331b809471a24ff8d7#diff-41ef65b9ef5b518f77e2a03559893f4d 
\| spark.sql.parquet.writeLegacyFormat \| 1.6.0 \| SPARK-10400 \| 01cd688f5245cbb752863100b399b525b31c3510#diff-41ef65b9ef5b518f77e2a03559893f4d \| ### Why are the changes needed? Supplemental configuration version information. ### Does this PR introduce any user-facing change? 'No'. ### How was this patch tested? Jenkins test Closes #28064 from beliefer/supplement-doc-for-data-sources. Authored-by: beliefer <beliefer@163.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-31 12:33:46 +09:00
beliefer	fc5d67fe22	[SPARK-31282][DOC] Supplement version for configuration appear in security doc ### What changes were proposed in this pull request? This PR supplements version for configuration appear in security doc. I sorted out some information show below. Item name \| Since version \| JIRA ID \| Commit ID \| Note -- \| -- \| -- \| -- \| -- spark.network.crypto.keyLength \| 2.2.0 \| SPARK-19139 \| 8f3f73abc1fe62496722476460c174af0250e3fe#diff-0ac65da2bc6b083fb861fe410c7688c2 \| spark.network.crypto.keyFactoryAlgorithm \| 2.2.0 \| SPARK-19139 \| 8f3f73abc1fe62496722476460c174af0250e3fe#diff-0ac65da2bc6b083fb861fe410c7688c2 \| spark.network.crypto.config.* \| 2.2.0 \| SPARK-19139 \| 8f3f73abc1fe62496722476460c174af0250e3fe#diff-0ac65da2bc6b083fb861fe410c7688c2 \| spark.network.crypto.saslFallback \| 2.2.0 \| SPARK-19139 \| 8f3f73abc1fe62496722476460c174af0250e3fe#diff-0ac65da2bc6b083fb861fe410c7688c2 \| spark.authenticate.enableSaslEncryption \| 2.2.0 \| SPARK-19139 \| 8f3f73abc1fe62496722476460c174af0250e3fe#diff-0ac65da2bc6b083fb861fe410c7688c2 \| spark.network.sasl.serverAlwaysEncrypt \| 1.4.0 \| SPARK-6229 \| 38d4e9e446b425ca6a8fe8d8080f387b08683842#diff-d2ce9b38bdc38ca9d7119f9c2cf79907 \| spark.ui.filters \| 1.0.0 \| SPARK-1189 \| 7edbea41b43e0dc11a2de156be220db8b7952d01#diff-f79a5ead735b3d0b34b6b94486918e1c \| spark.acls.enable \| 1.1.0 \| SPARK-1890 and SPARK-1891 \| e3fe6571decfdc406ec6d505fd92f9f2b85a618c#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 \| spark.ui.view.acls \| 1.0.0 \| SPARK-1189 \| 7edbea41b43e0dc11a2de156be220db8b7952d01#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 \| spark.ui.view.acls.groups \| 2.0.0 \| SPARK-4224 \| ae79032dcf160796851ca29116cca146c4d86ada#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 \| spark.admin.acls \| 1.1.0 \| SPARK-1890 and SPARK-1891 \| e3fe6571decfdc406ec6d505fd92f9f2b85a618c#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 \| spark.admin.acls.groups \| 2.0.0 \| SPARK-4224 \| 
ae79032dcf160796851ca29116cca146c4d86ada#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 \| spark.modify.acls \| 1.1.0 \| SPARK-1890 and SPARK-1891 \| e3fe6571decfdc406ec6d505fd92f9f2b85a618c#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 \| spark.modify.acls.groups \| 2.0.0 \| SPARK-4224 \| ae79032dcf160796851ca29116cca146c4d86ada#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 \| spark.user.groups.mapping \| 2.0.0 \| SPARK-4224 \| ae79032dcf160796851ca29116cca146c4d86ada#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 \| spark.history.ui.acls.enable \| 1.0.1 \| Spark 1489 \| c8dd13221215275948b1a6913192d40e0c8cbadd#diff-b49b5b9c31ddb36a9061004b5b723058 \| spark.history.ui.admin.acls \| 2.1.1 \| SPARK-19033 \| 4ca1788805e4a0131ba8f0ccb7499ee0e0242837#diff-a7befb99e7bd7e3ab5c46c2568aa5b3e \| spark.history.ui.admin.acls.groups \| 2.1.1 \| SPARK-19033 \| 4ca1788805e4a0131ba8f0ccb7499ee0e0242837#diff-a7befb99e7bd7e3ab5c46c2568aa5b3e \| spark.ui.xXssProtection \| 2.3.0 \| SPARK-22188 \| 5a07aca4d464e96d75ea17bf6768e24b829872ec#diff-6bdad48cfc34314e89599655442ff210 \| spark.ui.xContentTypeOptions.enabled \| 2.3.0 \| SPARK-22188 \| 5a07aca4d464e96d75ea17bf6768e24b829872ec#diff-6bdad48cfc34314e89599655442ff210 \| spark.ui.strictTransportSecurity \| 2.3.0 \| SPARK-22188 \| 5a07aca4d464e96d75ea17bf6768e24b829872ec#diff-6bdad48cfc34314e89599655442ff210 \| spark.security.credentials.${service}.enabled \| 2.3.0 \| SPARK-20434 \| a18d637112b97d2caaca0a8324bdd99086664b24#diff-da6c1fd6d8b0c7538a3e77a09e06a083 \| spark.kerberos.access.hadoopFileSystems \| 3.0.0 \| SPARK-26766 \| d0443a74d185ec72b747fa39994fa9a40ce974cf#diff-6bdad48cfc34314e89599655442ff210 \| ### Why are the changes needed? Supplemental configuration version information. ### Does this PR introduce any user-facing change? 'No'. ### How was this patch tested? Jenkins test Closes #28044 from beliefer/supplement-version-to-security-doc. Authored-by: beliefer <beliefer@163.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-31 12:33:01 +09:00
beliefer	18b73a5b59	[SPARK-31269][DOC] Supplement version for configuration only appear in configuration doc ### What changes were proposed in this pull request? The `configuration.md` exists some config not organized by `ConfigEntry`. This PR supplements version for configuration only appear in configuration doc. I sorted out some information show below. Item name \| Since version \| JIRA ID \| Commit ID \| Note -- \| -- \| -- \| -- \| -- spark.app.name \| 0.9.0 \| None \| 994f080f8ae3372366e6004600ba791c8a372ff0#diff-529fc5c06b9731c1fbda6f3db60b16aa \| spark.driver.resource.{resourceName}.amount \| 3.0.0 \| SPARK-27760 \| d30284b5a51dd784f663eb4eea37087b35a54d00#diff-76e731333fb756df3bff5ddb3b731c46 \| spark.driver.resource.{resourceName}.discoveryScript \| 3.0.0 \| SPARK-27488 \| 74e5e41eebf9ed596b48e6db52a2a9c642e5cbc3#diff-76e731333fb756df3bff5ddb3b731c46 \| spark.driver.resource.{resourceName}.vendor \| 3.0.0 \| SPARK-27362 \| 1277f8fa92da85d9e39d9146e3099fcb75c71a8f#diff-76e731333fb756df3bff5ddb3b731c46 \| spark.executor.resource.{resourceName}.amount \| 3.0.0 \| SPARK-27760 \| d30284b5a51dd784f663eb4eea37087b35a54d00#diff-76e731333fb756df3bff5ddb3b731c46 \| spark.executor.resource.{resourceType}.discoveryScript \| 3.0.0 \| SPARK-27024 \| db2e3c43412e4a7fb4a46c58d73d9ab304a1e949#diff-76e731333fb756df3bff5ddb3b731c46 \| spark.executor.resource.{resourceName}.vendor \| 3.0.0 \| SPARK-27362 \| 1277f8fa92da85d9e39d9146e3099fcb75c71a8f#diff-76e731333fb756df3bff5ddb3b731c46 \| spark.local.dir \| 0.5.0 \| None \| 0e93891d3d7df849cff6442038c111ffd42a5243#diff-17fd275d280b667722664ed833c6402a \| spark.logConf \| 0.9.0 \| None \| d8bcc8e9a095c1b20dd7a17b6535800d39bff80e#diff-364713d7776956cb8b0a771e9b62f82d \| spark.master \| 0.9.0 \| SPARK-544 \| 2573add94cf920a88f74d80d8ea94218d812704d#diff-529fc5c06b9731c1fbda6f3db60b16aa \| spark.driver.defaultJavaOptions \| 3.0.0 \| SPARK-23472 \| f83000597f250868de9722d8285fed013abc5ecf#diff-a78ecfc6a89edfaf0b60a5eaa0381970 \| 
spark.executor.defaultJavaOptions \| 3.0.0 \| SPARK-23472 \| f83000597f250868de9722d8285fed013abc5ecf#diff-a78ecfc6a89edfaf0b60a5eaa0381970 \| spark.executorEnv.[EnvironmentVariableName] \| 0.9.0 \| None \| 642029e7f43322f84abe4f7f36bb0b1b95d8101d#diff-529fc5c06b9731c1fbda6f3db60b16aa \| spark.python.profile \| 1.2.0 \| SPARK-3478 \| 1aa549ba9839565274a12c52fa1075b424f138a6#diff-d6fe2792e44f6babc94aabfefc8b9bce \| spark.python.profile.dump \| 1.2.0 \| SPARK-3478 \| 1aa549ba9839565274a12c52fa1075b424f138a6#diff-d6fe2792e44f6babc94aabfefc8b9bce \| spark.python.worker.memory \| 1.1.0 \| SPARK-2538 \| 14174abd421318e71c16edd24224fd5094bdfed4#diff-d6fe2792e44f6babc94aabfefc8b9bce \| spark.jars.packages \| 1.5.0 \| SPARK-9263 \| 34335719a372c1951fdb4dd25b75b086faf1076f#diff-63a5d817d2d45ae24de577f6a1bd80f9 \| spark.jars.excludes \| 1.5.0 \| SPARK-9263 \| 34335719a372c1951fdb4dd25b75b086faf1076f#diff-63a5d817d2d45ae24de577f6a1bd80f9 \| spark.jars.ivy \| 1.3.0 \| SPARK-5341 \| 3b7acd22ab4a134c74746e3b9a803dbd34d43855#diff-63a5d817d2d45ae24de577f6a1bd80f9 \| spark.jars.ivySettings \| 2.2.0 \| SPARK-17568 \| 3bc2eff8880a3ba8d4318118715ea1a47048e3de#diff-4d2ab44195558d5a9d5f15b8803ef39d \| spark.jars.repositories \| 2.3.0 \| SPARK-21403 \| d8257b99ddae23f702f312640a5335ddb4554403#diff-4d2ab44195558d5a9d5f15b8803ef39d \| spark.shuffle.io.maxRetries \| 1.2.0 \| SPARK-4188 \| c1ea5c542f3267c0b23a7775887e3a6ece793fe3#diff-d2ce9b38bdc38ca9d7119f9c2cf79907 \| spark.shuffle.io.numConnectionsPerPeer \| 1.2.1 \| SPARK-4740 \| 441ec3451730c7ae3dbef8952e313071d6147ab6#diff-d2ce9b38bdc38ca9d7119f9c2cf79907 \| spark.shuffle.io.preferDirectBufs \| 1.2.0 \| SPARK-4188 \| c1ea5c542f3267c0b23a7775887e3a6ece793fe3#diff-d2ce9b38bdc38ca9d7119f9c2cf79907 \| spark.shuffle.io.retryWait \| 1.2.1 \| None \| 5e5d8f469a1bea9bbe606f772ccdcab7c184c651#diff-d2ce9b38bdc38ca9d7119f9c2cf79907 \| spark.shuffle.io.backLog \| 1.1.1 \| SPARK-2468 \| 
66b4c81db7e826c00f7fb449b8a8af810cf7dd9a#diff-bdee8e601924d41e93baa7287189e878 \| spark.shuffle.service.index.cache.size \| 2.3.0 \| SPARK-21501 \| 1662e93119d68498942386906de309d35f4a135f#diff-97d5edc927a83a678e013ae00343df94 \| spark.shuffle.maxChunksBeingTransferred \| 2.3.0 \| SPARK-21175 \| 799e13161e89f1ea96cb1bc7b507a05af2e89cd0#diff-0ac65da2bc6b083fb861fe410c7688c2 \| spark.sql.ui.retainedExecutions \| 1.5.0 \| SPARK-8861 and SPARK-8862 \| ebc3aad272b91cf58e2e1b4aa92b49b8a947a045#diff-81764e4d52817f83bdd5336ef1226bd9 \| spark.streaming.ui.retainedBatches \| 1.0.0 \| SPARK-1386 \| f36dc3fed0a0671b0712d664db859da28c0a98e2#diff-56b8d67d07284cfab165d5363bd3500e \| spark.default.parallelism \| 0.5.0 \| None \| e5c4cd8a5e188592f8786a265c0cd073c69ac886#diff-0544ebf7533fa70ff5103e0fe1f0b036 \| spark.files.fetchTimeout \| 1.0.0 \| None \| f6f9d02e85d17da2f742ed0062f1648a9293e73c#diff-d239aee594001f8391676e1047a0381e \| spark.files.useFetchCache \| 1.2.2 \| SPARK-6313 \| a2a94a154bdd00753b8d5e344d712664c7151050#diff-d239aee594001f8391676e1047a0381e \| spark.files.overwrite \| 1.0.0 \| None \| 84670f2715392859624df290c1b52eb4ed4a9cb1#diff-d239aee594001f8391676e1047a0381e \| Exists in branch-1.0, but the version of pom is 0.9.0-incubating-SNAPSHOT spark.hadoop.cloneConf \| 1.0.3 \| SPARK-2546 \| 6d8f1dd15afdc7432b5721c89f9b2b402460322b#diff-83eb37f7b0ebed3c14ccb7bff0d577c2 \| spark.hadoop.validateOutputSpecs \| 1.0.1 \| SPARK-1677 \| 8100cbdb7546e8438019443cfc00683017c81278#diff-f70e97c099b5eac05c75288cb215e080 \| spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version \| 2.2.0 \| SPARK-20107 \| edc87d76efea7b4d19d9d0c4ddba274a3ccb8752#diff-76e731333fb756df3bff5ddb3b731c46 \| spark.rpc.io.backLog \| 3.0.0 \| SPARK-27868 \| 09ed64d795d3199a94e175273fff6fcea6b52131#diff-76e731333fb756df3bff5ddb3b731c46 \| spark.network.io.preferDirectBufs \| 3.0.0 \| SPARK-24920 \| e103c4a5e72bab8862ff49d6d4c1e62e642fc412#diff-0ac65da2bc6b083fb861fe410c7688c2 \| 
spark.port.maxRetries \| 1.1.1 \| SPARK-3565 \| 32f2222e915f31422089139944a077e2cbd442f9#diff-d239aee594001f8391676e1047a0381e \| spark.core.connection.ack.wait.timeout \| 1.1.1 \| SPARK-2677 \| bd3ce2ffb8964abb4d59918ebb2c230fe4614aa2#diff-f748e95f2aa97ed715afa53ddeeac9de \| spark.scheduler.listenerbus.eventqueue.shared.capacity \| 3.0.0 \| SPARK-28574 \| c212c9d9ed7375cd1ea16c118733edd84037ec0d#diff-eb519ad78cc3cf0b95839cc37413b509 \| spark.scheduler.listenerbus.eventqueue.appStatus.capacity \| 3.0.0 \| SPARK-28574 \| c212c9d9ed7375cd1ea16c118733edd84037ec0d#diff-eb519ad78cc3cf0b95839cc37413b509 \| spark.scheduler.listenerbus.eventqueue.executorManagement.capacity \| 3.0.0 \| SPARK-28574 \| c212c9d9ed7375cd1ea16c118733edd84037ec0d#diff-eb519ad78cc3cf0b95839cc37413b509 \| spark.scheduler.listenerbus.eventqueue.eventLog.capacity \| 3.0.0 \| SPARK-28574 \| c212c9d9ed7375cd1ea16c118733edd84037ec0d#diff-eb519ad78cc3cf0b95839cc37413b509 \| spark.scheduler.listenerbus.eventqueue.streams.capacity \| 3.0.0 \| SPARK-28574 \| c212c9d9ed7375cd1ea16c118733edd84037ec0d#diff-eb519ad78cc3cf0b95839cc37413b509 \| spark.task.resource.{resourceName}.amount \| 3.0.0 \| SPARK-27760 \| d30284b5a51dd784f663eb4eea37087b35a54d00#diff-76e731333fb756df3bff5ddb3b731c46 \| spark.stage.maxConsecutiveAttempts \| 2.2.0 \| SPARK-13369 \| 7b5d873aef672aa0aee41e338bab7428101e1ad3#diff-6a9ff7fb74fd490a50462d45db2d5e11 \| spark.{driver\\|executor}.rpc.io.serverThreads \| 1.6.0 \| SPARK-10745 \| 7c5b641808740ba5eed05ba8204cdbaf3fc579f5#diff-d2ce9b38bdc38ca9d7119f9c2cf79907 \| spark.{driver\\|executor}.rpc.io.clientThreads \| 1.6.0 \| SPARK-10745 \| 7c5b641808740ba5eed05ba8204cdbaf3fc579f5#diff-d2ce9b38bdc38ca9d7119f9c2cf79907 \| spark.{driver\\|executor}.rpc.netty.dispatcher.numThreads \| 3.0.0 \| SPARK-29398 \| 2f0a38cb50e3e8b4b72219c7b2b8b15d51f6b931#diff-a68a21481fea5053848ca666dd3201d8 \| spark.r.driver.command \| 1.5.3 \| SPARK-10971 \| 
9695f452e86a88bef3bcbd1f3c0b00ad9e9ac6e1#diff-025470e1b7094d7cf4a78ea353fb3981 \| spark.r.shell.command \| 2.1.0 \| SPARK-17178 \| fa6347938fc1c72ddc03a5f3cd2e929b5694f0a6#diff-a78ecfc6a89edfaf0b60a5eaa0381970 \| spark.graphx.pregel.checkpointInterval \| 2.2.0 \| SPARK-5484 \| f971ce5dd0788fe7f5d2ca820b9ea3db72033ddc#diff-e399679417ffa6eeedf26a7630baca16 \| ### Why are the changes needed? Supplemental configuration version information. ### Does this PR introduce any user-facing change? 'No'. ### How was this patch tested? Jenkins test Closes #28035 from beliefer/supplement-configuration-version. Authored-by: beliefer <beliefer@163.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-31 12:32:04 +09:00
beliefer	bed21770af	[SPARK-31215][SQL][DOC] Add version information to the static configuration of SQL ### What changes were proposed in this pull request? Add version information to the static configuration of `SQL`. I sorted out some information show below. Item name \| Since version \| JIRA ID \| Commit ID \| Note -- \| -- \| -- \| -- \| -- spark.sql.warehouse.dir \| 2.0.0 \| SPARK-14994 \| 054f991c4350af1350af7a4109ee77f4a34822f0#diff-32bb9518401c0948c5ea19377b5069ab \| spark.sql.catalogImplementation \| 2.0.0 \| SPARK-14720 and SPARK-13643 \| 8fc267ab3322e46db81e725a5cb1adb5a71b2b4d#diff-6bdad48cfc34314e89599655442ff210 \| spark.sql.globalTempDatabase \| 2.1.0 \| SPARK-17338 \| 23ddff4b2b2744c3dc84d928e144c541ad5df376#diff-6bdad48cfc34314e89599655442ff210 \| spark.sql.sources.schemaStringLengthThreshold \| 1.3.1 \| SPARK-6024 \| 6200f0709c5c8440decae8bf700d7859f32ac9d5#diff-41ef65b9ef5b518f77e2a03559893f4d \| 1.3 spark.sql.filesourceTableRelationCacheSize \| 2.2.0 \| SPARK-19265 \| 9d9d67c7957f7cbbdbe889bdbc073568b2bfbb16#diff-32bb9518401c0948c5ea19377b5069ab \| spark.sql.codegen.cache.maxEntries \| 2.4.0 \| SPARK-24727 \| b2deef64f604ddd9502a31105ed47cb63470ec85#diff-5081b9388de3add800b6e4a6ddf55c01 \| spark.sql.codegen.comments \| 2.0.0 \| SPARK-15680 \| f0e8738c1ec0e4c5526aeada6f50cf76428f9afd#diff-8bcc5aea39c73d4bf38aef6f6951d42c \| spark.sql.debug \| 2.1.0 \| SPARK-17899 \| db8784feaa605adcbd37af4bc8b7146479b631f8#diff-32bb9518401c0948c5ea19377b5069ab \| spark.sql.hive.thriftServer.singleSession \| 1.6.0 \| SPARK-11089 \| 167ea61a6a604fd9c0b00122a94d1bc4b1de24ff#diff-ff50aea397a607b79df9bec6f2a841db \| spark.sql.extensions \| 2.2.0 \| SPARK-18127 \| f0de600797ff4883927d0c70732675fd8629e239#diff-5081b9388de3add800b6e4a6ddf55c01 \| spark.sql.queryExecutionListeners \| 2.3.0 \| SPARK-19558 \| bd4eb9ce57da7bacff69d9ed958c94f349b7e6fb#diff-5081b9388de3add800b6e4a6ddf55c01 \| spark.sql.streaming.streamingQueryListeners \| 2.4.0 \| SPARK-24479 \| 
7703b46d2843db99e28110c4c7ccf60934412504#diff-5081b9388de3add800b6e4a6ddf55c01 \| spark.sql.ui.retainedExecutions \| 1.5.0 \| SPARK-8861 and SPARK-8862 \| ebc3aad272b91cf58e2e1b4aa92b49b8a947a045#diff-81764e4d52817f83bdd5336ef1226bd9 \| spark.sql.broadcastExchange.maxThreadThreshold \| 3.0.0 \| SPARK-26601 \| 126310ca68f2f248ea8b312c4637eccaba2fdc2b#diff-5081b9388de3add800b6e4a6ddf55c01 \| spark.sql.subquery.maxThreadThreshold \| 2.4.6 \| SPARK-30556 \| 2fc562cafd71ec8f438f37a28b65118906ab2ad2#diff-5081b9388de3add800b6e4a6ddf55c01 \| spark.sql.event.truncate.length \| 3.0.0 \| SPARK-27045 \| e60d8fce0b0cf2a6d766ea2fc5f994546550570a#diff-5081b9388de3add800b6e4a6ddf55c01 \| spark.sql.legacy.sessionInitWithConfigDefaults \| 3.0.0 \| SPARK-27253 \| 83f628b57da39ad9732d1393aebac373634a2eb9#diff-5081b9388de3add800b6e4a6ddf55c01 \| spark.sql.defaultUrlStreamHandlerFactory.enabled \| 3.0.0 \| SPARK-25694 \| 8469614c0513fbed87977d4e741649db3fdd8add#diff-5081b9388de3add800b6e4a6ddf55c01 \| spark.sql.streaming.ui.enabled \| 3.0.0 \| SPARK-29543 \| f9b86370cb04b72a4f00cbd4d60873960aa2792c#diff-5081b9388de3add800b6e4a6ddf55c01 \| spark.sql.streaming.ui.retainedProgressUpdates \| 3.0.0 \| SPARK-29543 \| f9b86370cb04b72a4f00cbd4d60873960aa2792c#diff-5081b9388de3add800b6e4a6ddf55c01 \| spark.sql.streaming.ui.retainedQueries \| 3.0.0 \| SPARK-29543 \| f9b86370cb04b72a4f00cbd4d60873960aa2792c#diff-5081b9388de3add800b6e4a6ddf55c01 \| ### Why are the changes needed? Supplemental configuration version information. ### Does this PR introduce any user-facing change? 'No'. ### How was this patch tested? Exists UT Closes #27981 from beliefer/add-version-to-sql-static-config. Authored-by: beliefer <beliefer@163.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-31 12:31:25 +09:00
Luca Canali	aa98ac52db	[SPARK-30775][DOC] Improve the description of executor metrics in the monitoring documentation ### What changes were proposed in this pull request? This PR (SPARK-30775) aims to improve the description of the executor metrics in the monitoring documentation. ### Why are the changes needed? Improve and clarify monitoring documentation by: - adding a reference to the Prometheus endpoint, as implemented in [SPARK-29064] - extending the list and description of executor metrics, following up from [SPARK-27157] ### Does this PR introduce any user-facing change? Documentation update. ### How was this patch tested? n.a. Closes #27526 from LucaCanali/docPrometheusMetricsFollowupSpark29064. Authored-by: Luca Canali <luca.canali@cern.ch> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-03-30 18:00:54 -07:00
Kengo Seki	60dd1a690f	[SPARK-31293][DSTREAMS][KINESIS][DOC] Fix wrong examples and help messages for Kinesis integration ### What changes were proposed in this pull request? This PR (SPARK-31293) fixes wrong command examples, parameter descriptions and help message format for Amazon Kinesis integration with Spark Streaming. ### Why are the changes needed? To improve usability of those commands. ### Does this PR introduce any user-facing change? No ### How was this patch tested? I ran the fixed commands manually and confirmed they worked as expected. Closes #28063 from sekikn/SPARK-31293. Authored-by: Kengo Seki <sekikn@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-03-29 14:27:19 -07:00
Huaxin Gao	e656e99061	[SPARK-30363][SQL][DOCS][FOLLOWUP] Fix a broken link in SQL Reference ### What changes were proposed in this pull request? Fix a broken link and make the relevant docs refer to the new doc ### Why are the changes needed? ### Does this PR introduce any user-facing change? Yes, make CACHE TABLE, UNCACHE TABLE, CLEAR CACHE, REFRESH TABLE link to the new doc ### How was this patch tested? Manually build and check Closes #28065 from huaxingao/spark-30363-follow-up. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-03-29 11:19:24 -05:00
HyukjinKwon	34c7476cb5	[SPARK-30722][DOCS][FOLLOW-UP] Add Pandas Function API into the menu ### What changes were proposed in this pull request? This PR adds "Pandas Function API" into the menu. ### Why are the changes needed? To be consistent and to make it easier to navigate. ### Does this PR introduce any user-facing change? No, master only. ![Screen Shot 2020-03-27 at 11 40 29 PM](https://user-images.githubusercontent.com/6477701/77767405-60306600-7084-11ea-944a-93726259cd00.png) ### How was this patch tested? Manually verified by `SKIP_API=1 jekyll build`. Closes #28054 from HyukjinKwon/followup-spark-30722. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-03-28 18:36:34 -07:00
gatorsmile	b9eafcb526	[SPARK-31088][SQL] Add back HiveContext and createExternalTable ### What changes were proposed in this pull request? Based on the discussion in the mailing list [[Proposal] Modification to Spark's Semantic Versioning Policy](http://apache-spark-developers-list.1001551.n3.nabble.com/Proposal-Modification-to-Spark-s-Semantic-Versioning-Policy-td28938.html) , this PR adds back the following APIs, whose maintenance costs are relatively small. - HiveContext - createExternalTable APIs ### Why are the changes needed? Avoid breaking the APIs that are commonly used. ### Does this PR introduce any user-facing change? Adding back the APIs that were removed in the 3.0 branch does not introduce user-facing changes, because Spark 3.0 has not been released. ### How was this patch tested? Added a new test suite for the createExternalTable APIs. Closes #27815 from gatorsmile/addAPIsBack. Lead-authored-by: gatorsmile <gatorsmile@gmail.com> Co-authored-by: yi.wu <yi.wu@databricks.com> Signed-off-by: gatorsmile <gatorsmile@gmail.com>	2020-03-26 23:51:15 -07:00
Wenchen Fan	05498af72e	[SPARK-31201][SQL] Add an individual config for skewed partition threshold ### What changes were proposed in this pull request? Skew join handling comes with an overhead: we need to read some data repeatedly. We should treat a partition as skewed only if it is large enough for that to be beneficial. Currently the size threshold is the advisory partition size, which is 64 MB by default. This is not large enough for the skewed partition size threshold. This PR adds a new config for the threshold and sets its default value to 256 MB. ### Why are the changes needed? Avoid skew join handling that may introduce a perf regression. ### Does this PR introduce any user-facing change? no ### How was this patch tested? existing tests Closes #27967 from cloud-fan/aqe. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-26 22:57:01 +09:00
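The effect of the separate threshold in SPARK-31201 can be illustrated with a small size check. This is a minimal sketch in plain Python, not Spark's code: `is_skewed` and the constant names are hypothetical, only the 64 MB and 256 MB figures come from the commit message, and any further conditions Spark applies when detecting skew are not modelled.

```python
MB = 1024 * 1024

# Figures taken from the commit message; the names are illustrative only.
ADVISORY_PARTITION_SIZE = 64 * MB       # size previously reused as the skew threshold
SKEWED_PARTITION_THRESHOLD = 256 * MB   # default of the new, dedicated threshold


def is_skewed(partition_bytes, threshold_bytes=SKEWED_PARTITION_THRESHOLD):
    """Hypothetical check: a partition counts as skewed only above the threshold."""
    return partition_bytes > threshold_bytes


sizes = [32 * MB, 100 * MB, 300 * MB]
old = [is_skewed(s, ADVISORY_PARTITION_SIZE) for s in sizes]
new = [is_skewed(s) for s in sizes]
print(old)  # [False, True, True]
print(new)  # [False, False, True]
```

With the larger threshold, the 100 MB partition is no longer treated as skewed, so it would not pay the repeated-read overhead the commit message mentions.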
beliefer	35d286bafb	[SPARK-31228][DSTREAMS] Add version information to the configuration of Kafka ### What changes were proposed in this pull request? Add version information to the configuration of Kafka. I sorted out some information show below. Item name \| Since version \| JIRA ID \| Commit ID \| Note -- \| -- \| -- \| -- \| -- spark.streaming.kafka.consumer.cache.enabled \| 2.2.1 \| SPARK-19185 \| 02cf178bb2a7dc8b4c06eb040c44b6453e41ed15#diff-c465bbcc83b2ecc7530d1c0128e4432b \| spark.streaming.kafka.consumer.poll.ms \| 2.0.1 \| SPARK-12177 \| 3134f116a3565c3a299fa2e7094acd7304d64280#diff-4597d93a0e951f7199697dba7dd0dc32 \| spark.streaming.kafka.consumer.cache.initialCapacity \| 2.0.1 \| SPARK-12177 \| 3134f116a3565c3a299fa2e7094acd7304d64280#diff-4597d93a0e951f7199697dba7dd0dc32 \| spark.streaming.kafka.consumer.cache.maxCapacity \| 2.0.1 \| SPARK-12177 \| 3134f116a3565c3a299fa2e7094acd7304d64280#diff-4597d93a0e951f7199697dba7dd0dc32 \| spark.streaming.kafka.consumer.cache.loadFactor \| 2.0.1 \| SPARK-12177 \| 3134f116a3565c3a299fa2e7094acd7304d64280#diff-4597d93a0e951f7199697dba7dd0dc32 \| spark.streaming.kafka.maxRatePerPartition \| 1.3.0 \| SPARK-4964 \| a119cae48030520da9f26ee9a1270bed7f33031e#diff-26cb4369f86050dc2e75cd16291b2844 \| spark.streaming.kafka.minRatePerPartition \| 2.4.0 \| SPARK-25233 \| 135ff16a3510a4dfb3470904004dae9848005019#diff-815f6ec5caf9e4beb355f5f981171f1f \| spark.streaming.kafka.allowNonConsecutiveOffsets \| 2.3.1 \| SPARK-24067 \| 1d598b771de3b588a2f377ae7ccf8193156641f2#diff-4597d93a0e951f7199697dba7dd0dc32 \| spark.kafka.producer.cache.timeout \| 2.2.1 \| SPARK-19968 \| f6730a70cb47ebb3df7f42209df7b076aece1093#diff-ac8844e8d791a75aaee3d0d10bfc1f2a \| spark.kafka.producer.cache.evictorThreadRunInterval \| 3.0.0 \| SPARK-21869 \| 7bff2db9ed803e05a43c2d875c1dea819d81248a#diff-ea8349d528fe8d1b0a8ffa2840ff4bcd \| spark.kafka.consumer.cache.capacity \| 3.0.0 \| SPARK-27687 \| 
efa303581ac61d6f517aacd08883da2d01530bd2#diff-ea8349d528fe8d1b0a8ffa2840ff4bcd \| spark.kafka.consumer.cache.jmx.enable \| 3.0.0 \| SPARK-25151 \| 594c9c5a3ece0e913949c7160bb4925e5d289e44#diff-ea8349d528fe8d1b0a8ffa2840ff4bcd \| spark.kafka.consumer.cache.timeout \| 3.0.0 \| SPARK-25151 \| 594c9c5a3ece0e913949c7160bb4925e5d289e44#diff-ea8349d528fe8d1b0a8ffa2840ff4bcd \| spark.kafka.consumer.cache.evictorThreadRunInterval \| 3.0.0 \| SPARK-25151 \| 594c9c5a3ece0e913949c7160bb4925e5d289e44#diff-ea8349d528fe8d1b0a8ffa2840ff4bcd \| spark.kafka.consumer.fetchedData.cache.timeout \| 3.0.0 \| SPARK-25151 \| 594c9c5a3ece0e913949c7160bb4925e5d289e44#diff-ea8349d528fe8d1b0a8ffa2840ff4bcd \| spark.kafka.consumer.fetchedData.cache.evictorThreadRunInterval \| 3.0.0 \| SPARK-25151 \| 594c9c5a3ece0e913949c7160bb4925e5d289e44#diff-ea8349d528fe8d1b0a8ffa2840ff4bcd \| spark.kafka.clusters.${cluster}.auth.bootstrap.servers \| 3.0.0 \| SPARK-27294 \| 2f558094257c38d26650049f2ac93be6d65d6d85#diff-7df71bd47f5a3428ebdb05ced3c31f49 \| spark.kafka.clusters.${cluster}.target.bootstrap.servers.regex \| 3.0.0 \| SPARK-27294 \| 2f558094257c38d26650049f2ac93be6d65d6d85#diff-7df71bd47f5a3428ebdb05ced3c31f49 \| spark.kafka.clusters.${cluster}.security.protocol \| 3.0.0 \| SPARK-27294 \| 2f558094257c38d26650049f2ac93be6d65d6d85#diff-7df71bd47f5a3428ebdb05ced3c31f49 \| spark.kafka.clusters.${cluster}.sasl.kerberos.service.name \| 3.0.0 \| SPARK-27294 \| 2f558094257c38d26650049f2ac93be6d65d6d85#diff-7df71bd47f5a3428ebdb05ced3c31f49 \| spark.kafka.clusters.${cluster}.ssl.truststore.location \| 3.0.0 \| SPARK-27294 \| 2f558094257c38d26650049f2ac93be6d65d6d85#diff-7df71bd47f5a3428ebdb05ced3c31f49 \| spark.kafka.clusters.${cluster}.ssl.truststore.password \| 3.0.0 \| SPARK-27294 \| 2f558094257c38d26650049f2ac93be6d65d6d85#diff-7df71bd47f5a3428ebdb05ced3c31f49 \| spark.kafka.clusters.${cluster}.ssl.keystore.location \| 3.0.0 \| SPARK-27294 \| 
2f558094257c38d26650049f2ac93be6d65d6d85#diff-7df71bd47f5a3428ebdb05ced3c31f49 \| spark.kafka.clusters.${cluster}.ssl.keystore.password \| 3.0.0 \| SPARK-27294 \| 2f558094257c38d26650049f2ac93be6d65d6d85#diff-7df71bd47f5a3428ebdb05ced3c31f49 \| spark.kafka.clusters.${cluster}.ssl.key.password \| 3.0.0 \| SPARK-27294 \| 2f558094257c38d26650049f2ac93be6d65d6d85#diff-7df71bd47f5a3428ebdb05ced3c31f49 \| spark.kafka.clusters.${cluster}.sasl.token.mechanism \| 3.0.0 \| SPARK-27294 \| 2f558094257c38d26650049f2ac93be6d65d6d85#diff-7df71bd47f5a3428ebdb05ced3c31f49 \| ### Why are the changes needed? Supplemental configuration version information. ### Does this PR introduce any user-facing change? 'No'. ### How was this patch tested? Existing UTs Closes #27989 from beliefer/add-version-to-kafka-config. Authored-by: beliefer <beliefer@163.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-26 20:11:15 +09:00
Kent Yao	44bd36ad7b	[SPARK-31234][SQL] ResetCommand should reset config to sc.conf only ### What changes were proposed in this pull request? Currently, ResetCommand clears all configurations, including SQL configs, static SQL configs and SparkContext-level configs. For example: ```sql spark-sql> set xyz=abc; xyz abc spark-sql> set; spark.app.id local-1585055396930 spark.app.name SparkSQL::10.242.189.214 spark.driver.host 10.242.189.214 spark.driver.port 65094 spark.executor.id driver spark.jars spark.master local[*] spark.sql.catalogImplementation hive spark.sql.hive.version 1.2.1 spark.submit.deployMode client xyz abc spark-sql> reset; spark-sql> set; spark-sql> set spark.sql.hive.version; spark.sql.hive.version 1.2.1 spark-sql> set spark.app.id; spark.app.id <undefined> ``` In this PR, we restore the Spark confs to RuntimeConfig after it is cleared. ### Why are the changes needed? The reset command also clears static configs, which it should not. ### Does this PR introduce any user-facing change? Yes, ResetCommand no longer changes static configs. ### How was this patch tested? Added a UT. Closes #28003 from yaooqinn/SPARK-31234. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-03-26 15:03:16 +08:00
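The reset behaviour described in SPARK-31234 can be sketched as below. This is a minimal Python model, not Spark's code: the `RuntimeConf` class and its methods are hypothetical names, and it assumes the context-level settings are captured once and re-applied after every reset.

```python
class RuntimeConf:
    """Toy model of a session config layered over context-level settings."""

    def __init__(self, context_conf):
        # Settings fixed when the (hypothetical) context was created.
        self._context_conf = dict(context_conf)
        self._settings = dict(context_conf)

    def set(self, key, value):
        self._settings[key] = value

    def get(self, key, default="<undefined>"):
        return self._settings.get(key, default)

    def reset(self):
        # Drop everything set during the session, then restore the
        # context-level entries instead of leaving the config empty.
        self._settings.clear()
        self._settings.update(self._context_conf)


conf = RuntimeConf({"spark.app.id": "local-1585055396930"})
conf.set("xyz", "abc")
conf.reset()
```

After `reset()`, the session-level key `xyz` is gone while `spark.app.id` is still readable, which is the outcome the commit message asks for.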
Huaxin Gao	ee6f8991a7	[SPARK-30934][ML][FOLLOW-UP] Update ml-guide to include MulticlassClassificationEvaluator weight support in highlights ### What changes were proposed in this pull request? Update ml-guide to include ```MulticlassClassificationEvaluator``` weight support in highlights ### Why are the changes needed? ```MulticlassClassificationEvaluator``` weight support is very important, so should include it in highlights ### Does this PR introduce any user-facing change? Yes after: ![image](https://user-images.githubusercontent.com/13592258/77614952-6ccd8680-6eeb-11ea-9354-fa20004132df.png) ### How was this patch tested? manually build and check Closes #28031 from huaxingao/highlights-followup. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: zhengruifeng <ruifengz@foxmail.com>	2020-03-26 14:24:53 +08:00
Wenchen Fan	4f274a4de9	[SPARK-31147][SQL] Forbid CHAR type in non-Hive-Serde tables ### What changes were proposed in this pull request? Spark introduced the CHAR type for Hive compatibility, but it only works for Hive tables. The CHAR type was never documented and is treated as the STRING type for non-Hive tables. However, this leads to confusing behaviors. Apache Spark 3.0.0-preview2: ``` spark-sql> CREATE TABLE t(a CHAR(3)); spark-sql> INSERT INTO TABLE t SELECT 'a '; spark-sql> SELECT a, length(a) FROM t; a 2 ``` Apache Spark 2.4.5: ``` spark-sql> CREATE TABLE t(a CHAR(3)); spark-sql> INSERT INTO TABLE t SELECT 'a '; spark-sql> SELECT a, length(a) FROM t; a 3 ``` According to the SQL standard, `CHAR(3)` should guarantee that all values are of length 3. Since `CHAR(3)` is treated as STRING, Spark doesn't guarantee it. This PR forbids the CHAR type in non-Hive tables as it's not supported correctly. ### Why are the changes needed? To avoid confusing or wrong behavior. ### Does this PR introduce any user-facing change? Yes, users can no longer create or alter non-Hive tables with the CHAR type. ### How was this patch tested? New tests. Closes #27902 from cloud-fan/char. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-03-25 09:25:55 -07:00
Wenchen Fan	1d0f54951e	[SPARK-31205][SQL] support string literal as the second argument of date_add/date_sub functions ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/26412 introduced a behavior change: the `date_add`/`date_sub` functions no longer accept string and double values as the second parameter. This is reasonable, as it's error-prone to cast string/double to int at runtime. However, using string literals as function arguments is very common in SQL databases. To avoid breaking valid use cases where the string literal is indeed an integer, this PR proposes to add ansi_cast for string literals in date_add/date_sub functions. If the string value is not a valid integer, we fail at query compile time because of constant folding. ### Why are the changes needed? To avoid breaking changes. ### Does this PR introduce any user-facing change? Yes, now 3.0 can run `date_add('2011-11-11', '1')` like 2.4. ### How was this patch tested? New tests. Closes #27965 from cloud-fan/string. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-03-24 12:07:22 +08:00
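The rule in SPARK-31205 (accept a string literal only when it is a valid integer, otherwise fail) can be illustrated with a small Python model. This is a sketch under assumptions, not Spark's `ansi_cast`: `strict_int` and this `date_add` are hypothetical helpers, and the exact set of strings Spark accepts may differ from the regular expression used here.

```python
import re
from datetime import date, timedelta

# Assumed notion of "valid integer": optional sign followed by ASCII digits.
_INT_LITERAL = re.compile(r"[+-]?[0-9]+")


def strict_int(value):
    """Return value as an int, failing loudly instead of guessing."""
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    if isinstance(value, str) and _INT_LITERAL.fullmatch(value):
        return int(value)
    raise ValueError(f"not a valid integer: {value!r}")


def date_add(start, days):
    """Add `days` (an int or an integer-valued string) to an ISO date string."""
    return date.fromisoformat(start) + timedelta(days=strict_int(days))


result = date_add("2011-11-11", "1")
```

A call such as `date_add("2011-11-11", "1.5")` raises `ValueError` in this model, mirroring the early failure the commit message describes for non-integer strings.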
Wenchen Fan	d929c0dfe8	[SPARK-31133][SQL][DOC] fix sql ref doc for DML ### What changes were proposed in this pull request? `INSERT OVERWRITE DIRECTORY` can only use a file format (a class that implements `org.apache.spark.sql.execution.datasources.FileFormat`). This PR fixes the doc accordingly and makes other minor improvements. ### Why are the changes needed? ### Does this PR introduce any user-facing change? ### How was this patch tested? Closes #27891 from cloud-fan/doc. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-03-23 22:00:50 +08:00
beliefer	a0cf972985	[SPARK-31141][DSTREAMS][DOC] Add version information to the configuration of Dstreams ### What changes were proposed in this pull request? Add version information to the configuration of `Dstreams`. I sorted out some information show below. Item name \| Since version \| JIRA ID \| Commit ID \| Note -- \| -- \| -- \| -- \| -- spark.streaming.backpressure.enabled \| 1.5.0 \| SPARK-9967 and SPARK-10099 \| 392bd19d678567751cd3844d9d166a7491c5887e#diff-1b584c4ed88a9022abb11d594f760997 \| spark.streaming.backpressure.initialRate \| 2.0.0 \| SPARK-11627 \| 7218c0eba957e0a079a407b79c3a050cce9647b2#diff-c64d571ef32d2dbf76e965ecd04a9f52 \| spark.streaming.blockInterval \| 0.8.0 \| None \| 46eecd110a4017ea0c86cbb1010d0ccd6a5eb2ef#diff-54d85b29e4349628a0de525c119399b5 \| spark.streaming.receiver.maxRate \| 1.0.2 \| SPARK-1341 \| ca19cfbcd5cfac9ad731350dfeea14355aec87d6#diff-c64d571ef32d2dbf76e965ecd04a9f52 \| spark.streaming.receiver.writeAheadLog.enable \| 1.2.1 \| SPARK-4482 \| ce5ea0fd611ce560f6e1fac83562469bdb97091e#diff-0607b70e4e79cbbc1a128c45784cb813 \| spark.streaming.unpersist \| 0.9.0 \| None \| 08b9fec93d00ff0ebb49af4d9ac72d2806eded02#diff-bcf5f84f78d23ebde7d532bea756bc57 \| spark.streaming.stopGracefullyOnShutdown \| 1.4.0 \| SPARK-7776 \| a17a5cb302c5fa6a4d3e9e3e0fa2100c0b5436d6#diff-8a7f0e3f26c15ba484e6312c3caf033d \| spark.streaming.kafka.maxRetries \| 1.3.0 \| SPARK-4964 \| a119cae48030520da9f26ee9a1270bed7f33031e#diff-26cb4369f86050dc2e75cd16291b2844 \| spark.streaming.ui.retainedBatches \| 1.0.0 \| SPARK-1386 \| f36dc3fed0a0671b0712d664db859da28c0a98e2#diff-56b8d67d07284cfab165d5363bd3500e \| spark.streaming.driver.writeAheadLog.closeFileAfterWrite \| 1.6.0 \| SPARK-11324 \| 4f030b9e82172659d250281782ac573cbd1438fc#diff-a1b3ec72e8d7cc91433a1cc64fe6e91d \| spark.streaming.receiver.writeAheadLog.closeFileAfterWrite \| 1.6.0 \| SPARK-11324 \| 4f030b9e82172659d250281782ac573cbd1438fc#diff-a1b3ec72e8d7cc91433a1cc64fe6e91d \| 
spark.streaming.receiver.writeAheadLog.class \| 1.4.0 \| SPARK-7056 \| 1868bd40dcce23990b98748b0239bd00452b1ca5#diff-a1b3ec72e8d7cc91433a1cc64fe6e91d \| spark.streaming.receiver.writeAheadLog.rollingIntervalSecs \| 1.4.0 \| SPARK-7056 \| 1868bd40dcce23990b98748b0239bd00452b1ca5#diff-a1b3ec72e8d7cc91433a1cc64fe6e91d \| spark.streaming.receiver.writeAheadLog.maxFailures \| 1.2.0 \| SPARK-4028 \| 234de9232bcfa212317a8073c4a82c3863b36b14#diff-8cec1a581eebcad673dc8930b1a2801c \| spark.streaming.driver.writeAheadLog.class \| 1.4.0 \| SPARK-7056 \| 1868bd40dcce23990b98748b0239bd00452b1ca5#diff-a1b3ec72e8d7cc91433a1cc64fe6e91d \| spark.streaming.driver.writeAheadLog.rollingIntervalSecs \| 1.4.0 \| SPARK-7056 \| 1868bd40dcce23990b98748b0239bd00452b1ca5#diff-a1b3ec72e8d7cc91433a1cc64fe6e91d \| spark.streaming.driver.writeAheadLog.maxFailures \| 1.4.0 \| SPARK-7056 \| 1868bd40dcce23990b98748b0239bd00452b1ca5#diff-a1b3ec72e8d7cc91433a1cc64fe6e91d \| spark.streaming.driver.writeAheadLog.allowBatching \| 1.6.0 \| SPARK-11141 \| dccc4645df629f35c4788d50b2c0a6ab381db4b7#diff-a1b3ec72e8d7cc91433a1cc64fe6e91d \| spark.streaming.driver.writeAheadLog.batchingTimeout \| 1.6.0 \| SPARK-11141 \| dccc4645df629f35c4788d50b2c0a6ab381db4b7#diff-a1b3ec72e8d7cc91433a1cc64fe6e91d \| spark.streaming.sessionByKey.deltaChainThreshold \| 1.6.0 \| SPARK-11290 \| daa74be6f863061221bb0c2f94e70672e6fcbeaa#diff-e0a40541298f885606a2361ff9c5af6c \| spark.streaming.backpressure.rateEstimator \| 1.5.0 \| SPARK-8977 \| 819be46e5a73f2d19230354ebba30c58538590f5#diff-5dcaea3a4eca07f898fa88fe6d69e5c3 \| spark.streaming.backpressure.pid.proportional \| 1.5.0 \| SPARK-8979 \| 0a1d2ca42c8b31d6b0e70163795f0185d4622f87#diff-5dcaea3a4eca07f898fa88fe6d69e5c3 \| spark.streaming.backpressure.pid.integral \| 1.5.0 \| SPARK-8979 \| 0a1d2ca42c8b31d6b0e70163795f0185d4622f87#diff-5dcaea3a4eca07f898fa88fe6d69e5c3 \| spark.streaming.backpressure.pid.derived \| 1.5.0 \| SPARK-8979 \| 
0a1d2ca42c8b31d6b0e70163795f0185d4622f87#diff-5dcaea3a4eca07f898fa88fe6d69e5c3 \| spark.streaming.backpressure.pid.minRate \| 1.5.0 \| SPARK-9966 \| 612b4609bdd38763725ae07d77c2176aa6756e64#diff-5dcaea3a4eca07f898fa88fe6d69e5c3 \| spark.streaming.concurrentJobs \| 0.7.0 \| None \| c97ebf64377e853ab7c616a103869a4417f25954#diff-839f06302b2d648a85436486fc13c85d \| spark.streaming.internal.batchTime \| 1.4.0 \| SPARK-6862 \| 1b7106b867bc0aa4d64b669d79b646f862acaf47#diff-25124e4f06a1da237bf486eceb1f7967 \| It's not a configuration, it's a property spark.streaming.internal.outputOpId \| 1.4.0 \| SPARK-6862 \| 1b7106b867bc0aa4d64b669d79b646f862acaf47#diff-25124e4f06a1da237bf486eceb1f7967 \| It's not a configuration, it's a property spark.streaming.clock \| 0.7.0 \| None \| cae894ee7aefa4cf9b1952038a48be81e1d2a856#diff-839f06302b2d648a85436486fc13c85d \| spark.streaming.gracefulStopTimeout \| 1.0.0 \| SPARK-1332 \| 94cbe2329021296b660d88f3e8ef3734374020d2#diff-2f8c5c038fda47b9875e10785fdd2498 \| spark.streaming.manualClock.jump \| 0.7.0 \| None \| fc3d0b602a08fdd182c2138506d1cd9952631f95#diff-839f06302b2d648a85436486fc13c85d \| ### Why are the changes needed? Supplemental configuration version information. ### Does this PR introduce any user-facing change? 'No' ### How was this patch tested? Exists UT Closes #27898 from beliefer/add-version-to-dstream-config. Authored-by: beliefer <beliefer@163.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-23 13:01:44 +09:00
beliefer	ae0699d4b5	[SPARK-31002][CORE][DOC][FOLLOWUP] Add version information to the configuration of Core ### What changes were proposed in this pull request? This PR follows up #27847, #27852 and https://github.com/apache/spark/pull/27913. I sorted out some information show below. Item name \| Since version \| JIRA ID \| Commit ID \| Note -- \| -- \| -- \| -- \| -- spark.storage.localDiskByExecutors.cacheSize \| 3.0.0 \| SPARK-27651 \| fd2bf55abaab08798a428d4e47d4050ba2b82a95#diff-6bdad48cfc34314e89599655442ff210 \| spark.storage.memoryMapLimitForTests \| 2.3.0 \| SPARK-3151 \| b8ffb51055108fd606b86f034747006962cd2df3#diff-abd96f2ae793cd6ea6aab5b96a3c1d7a \| spark.barrier.sync.timeout \| 2.4.0 \| SPARK-24817 \| 388f5a0635a2812cd71b08352e3ddc20293ec189#diff-6bdad48cfc34314e89599655442ff210 \| spark.scheduler.blacklist.unschedulableTaskSetTimeout \| 2.4.1 \| SPARK-22148 \| 52e9711d01694158ecb3691f2ec25c0ebe4b0207#diff-6bdad48cfc34314e89599655442ff210 \| spark.scheduler.barrier.maxConcurrentTasksCheck.interval \| 2.4.0 \| SPARK-24819 \| bfb74394a5513134ea1da9fcf4a1783b77dd64e4#diff-6bdad48cfc34314e89599655442ff210 \| spark.scheduler.barrier.maxConcurrentTasksCheck.maxFailures \| 2.4.0 \| SPARK-24819 \| bfb74394a5513134ea1da9fcf4a1783b77dd64e4#diff-6bdad48cfc34314e89599655442ff210 \| spark.unsafe.exceptionOnMemoryLeak \| 1.4.0 \| SPARK-7076 and SPARK-7077 and SPARK-7080 \| f49284b5bf3a69ed91a5e3e6e0ed3be93a6ab9e4#diff-5a0de266c82b95adb47d9bca714e1f1b \| spark.unsafe.sorter.spill.read.ahead.enabled \| 2.3.0 \| SPARK-21113 \| 1e978b17d63d7ba20368057aa4e65f5ef6e87369#diff-93a086317cea72a113cf81056882c206 \| spark.unsafe.sorter.spill.reader.buffer.size \| 2.1.0 \| SPARK-16862 \| c1937dd19a23bd096a4707656c7ba19fb5c16966#diff-93a086317cea72a113cf81056882c206 \| spark.plugins \| 3.0.0 \| SPARK-29397 \| d51d228048d519a9a666f48dc532625de13e7587#diff-6bdad48cfc34314e89599655442ff210 \| spark.cleaner.periodicGC.interval \| 1.6.0 \| SPARK-8414 \| 
72da2a21f0940b97757ace5975535e559d627688#diff-75141521b1d55bc32d72b70032ad96c0 \| spark.cleaner.referenceTracking \| 1.0.0 \| SPARK-1103 \| 11eabbe125b2ee572fad359c33c93f5e6fdf0b2d#diff-364713d7776956cb8b0a771e9b62f82d \| spark.cleaner.referenceTracking.blocking \| 1.0.0 \| SPARK-1103 \| 11eabbe125b2ee572fad359c33c93f5e6fdf0b2d#diff-364713d7776956cb8b0a771e9b62f82d \| spark.cleaner.referenceTracking.blocking.shuffle \| 1.1.1 \| SPARK-3139 \| 5cf1e440137006eedd6846ac8fa57ccf9fd1958d#diff-75141521b1d55bc32d72b70032ad96c0 \| spark.cleaner.referenceTracking.cleanCheckpoints \| 1.4.0 \| SPARK-2033 \| 25998e4d73bcc95ac85d9af71adfdc726ec89568#diff-440e866c5df0b8386aff57f9f8bd8db1 \| spark.executor.logs.rolling.strategy \| 1.1.0 \| SPARK-1940 \| 4823bf470ec1b47a6f404834d4453e61d3dcbec9#diff-2b4575e096e4db7165e087f9429f2a02 \| spark.executor.logs.rolling.time.interval \| 1.1.0 \| SPARK-1940 \| 4823bf470ec1b47a6f404834d4453e61d3dcbec9#diff-2b4575e096e4db7165e087f9429f2a02 \| spark.executor.logs.rolling.maxSize \| 1.4.0 \| SPARK-5932 \| 2d222fb39dd978e5a33cde6ceb59307cbdf7b171#diff-529fc5c06b9731c1fbda6f3db60b16aa \| spark.executor.logs.rolling.maxRetainedFiles \| 1.1.0 \| SPARK-1940 \| 4823bf470ec1b47a6f404834d4453e61d3dcbec9#diff-2b4575e096e4db7165e087f9429f2a02 \| spark.executor.logs.rolling.enableCompression \| 2.0.2 \| SPARK-17711 \| 26e978a93f029e1a1b5c7524d0b52c8141b70997#diff-2b4575e096e4db7165e087f9429f2a02 \| spark.master.rest.enabled \| 1.3.0 \| SPARK-5388 \| 6ec0cdc14390d4dc45acf31040f21e1efc476fc0#diff-29dffdccd5a7f4c8b496c293e87c8668 \| spark.master.rest.port \| 1.3.0 \| SPARK-5388 \| 6ec0cdc14390d4dc45acf31040f21e1efc476fc0#diff-29dffdccd5a7f4c8b496c293e87c8668 \| spark.master.ui.port \| 1.1.0 \| SPARK-2857 \| 12f99cf5f88faf94d9dbfe85cb72d0010a3a25ac#diff-366c88f47e9b5cfa4d4305febeb8b026 \| spark.io.compression.snappy.blockSize \| 1.4.0 \| SPARK-5932 \| 2d222fb39dd978e5a33cde6ceb59307cbdf7b171#diff-529fc5c06b9731c1fbda6f3db60b16aa \| 
spark.io.compression.lz4.blockSize \| 1.4.0 \| SPARK-5932 \| 2d222fb39dd978e5a33cde6ceb59307cbdf7b171#diff-529fc5c06b9731c1fbda6f3db60b16aa \| spark.io.compression.codec \| 0.8.0 \| None \| 46eecd110a4017ea0c86cbb1010d0ccd6a5eb2ef#diff-df9e6118c481ceb27faa399114fac0a1 \| spark.io.compression.zstd.bufferSize \| 2.3.0 \| SPARK-19112 \| 444bce1c98c45147fe63e2132e9743a0c5e49598#diff-df9e6118c481ceb27faa399114fac0a1 \| spark.io.compression.zstd.level \| 2.3.0 \| SPARK-19112 \| 444bce1c98c45147fe63e2132e9743a0c5e49598#diff-df9e6118c481ceb27faa399114fac0a1 \| spark.io.warning.largeFileThreshold \| 3.0.0 \| SPARK-28366 \| 26d03b62e20d053943d03b5c5573dd349e49654c#diff-6bdad48cfc34314e89599655442ff210 \| spark.eventLog.compression.codec \| 3.0.0 \| SPARK-28118 \| 47f54b1ec717d0d744bf3ad46bb1ed3542b667c8#diff-6bdad48cfc34314e89599655442ff210 \| spark.buffer.size \| 0.5.0 \| None \| 4b1646a25f7581cecae108553da13833e842e68a#diff-eaf125f56ce786d64dcef99cf446a751 \| spark.locality.wait.process \| 0.8.0 \| None \| 46eecd110a4017ea0c86cbb1010d0ccd6a5eb2ef#diff-264da78fe625d594eae59d1adabc8ae9 \| spark.locality.wait.node \| 0.8.0 \| None \| 46eecd110a4017ea0c86cbb1010d0ccd6a5eb2ef#diff-264da78fe625d594eae59d1adabc8ae9 \| spark.locality.wait.rack \| 0.8.0 \| None \| 46eecd110a4017ea0c86cbb1010d0ccd6a5eb2ef#diff-264da78fe625d594eae59d1adabc8ae9 \| spark.reducer.maxSizeInFlight \| 1.4.0 \| SPARK-5932 \| 2d222fb39dd978e5a33cde6ceb59307cbdf7b171#diff-529fc5c06b9731c1fbda6f3db60b16aa \| spark.reducer.maxReqsInFlight \| 2.0.0 \| SPARK-6166 \| 894921d813a259f2f266fde7d86d2ecb5a0af24b#diff-eb30a71e0d04150b8e0b64929852e38b \| spark.broadcast.compress \| 0.6.0 \| None \| efc5423210d1aadeaea78273a4a8f10425753079#diff-76170a9c8f67b542bc58240a0a12fe08 \| spark.broadcast.blockSize \| 0.5.0 \| None \| b8ab7862b8bd168bca60bd930cd97c1099fbc8a8#diff-271d7958e14cdaa46cf3737cfcf51341 \| spark.broadcast.checksum \| 2.1.1 \| SPARK-18188 \| 
06a56df226aa0c03c21f23258630d8a96385c696#diff-4f43d14923008c6650a8eb7b40c07f74 \| spark.broadcast.UDFCompressionThreshold \| 3.0.0 \| SPARK-28355 \| 79e204770300dab4a669b9f8e2421ef905236e7b#diff-6bdad48cfc34314e89599655442ff210 \| spark.rdd.compress \| 0.6.0 \| None \| efc5423210d1aadeaea78273a4a8f10425753079#diff-76170a9c8f67b542bc58240a0a12fe08 \| spark.rdd.parallelListingThreshold \| 2.0.0 \| SPARK-9926 \| 80a4bfa4d1c86398b90b26c34d8dcbc2355f5a6a#diff-eaababfc87ea4949f97860e8b89b7586 \| spark.rdd.limit.scaleUpFactor \| 2.1.0 \| SPARK-16984 \| 806d8a8e980d8ba2f4261bceb393c40bafaa2f73#diff-1d55e54678eff2076263f2fe36150c17 \| spark.serializer \| 0.5.0 \| None \| fd1d255821bde844af28e897fabd59a715659038#diff-b920b65c23bf3a1b3326325b0d6a81b2 \| spark.serializer.objectStreamReset \| 1.0.0 \| SPARK-942 \| 40566e10aae4b21ffc71ea72702b8df118ac5c8e#diff-6a59dfc43d1b31dc1c3072ceafa829f5 \| spark.serializer.extraDebugInfo \| 1.3.0 \| SPARK-5307 \| 636408311deeebd77fb83d2249e0afad1a1ba149#diff-6a59dfc43d1b31dc1c3072ceafa829f5 \| spark.jars \| 0.9.0 \| None \| f1d206c6b4c0a5b2517b05af05fdda6049e2f7c2#diff-364713d7776956cb8b0a771e9b62f82d \| spark.files \| 1.0.0 \| None \| 29ee101c73bf066bf7f4f8141c475b8d1bd3cf1c#diff-364713d7776956cb8b0a771e9b62f82d \| spark.submit.deployMode \| 1.5.0 \| SPARK-6797 \| 7f487c8bde14dbdd244a3493ad11a129ef2bb327#diff-4d2ab44195558d5a9d5f15b8803ef39d \| spark.submit.pyFiles \| 1.0.1 \| SPARK-1549 \| d7ddb26e1fa02e773999cc4a97c48d2cd1723956#diff-4d2ab44195558d5a9d5f15b8803ef39d \| spark.scheduler.allocation.file \| 0.8.1 \| None \| 976fe60f7609d7b905a34f18743efabd966407f0#diff-9bc0105ee454005379abed710cd20ced \| spark.scheduler.minRegisteredResourcesRatio \| 1.1.1 \| SPARK-2635 \| 3311da2f9efc5ff2c7d01273ac08f719b067d11d#diff-7d99a7c7a051e5e851aaaefb275a44a1 \| spark.scheduler.maxRegisteredResourcesWaitingTime \| 1.1.1 \| SPARK-2635 \| 3311da2f9efc5ff2c7d01273ac08f719b067d11d#diff-7d99a7c7a051e5e851aaaefb275a44a1 \| spark.scheduler.mode \| 0.8.0 \| 
None \| 98fb69822cf780160bca51abeaab7c82e49fab54#diff-cb7a25b3c9a7341c6d99bcb8e9780c92 \| spark.scheduler.revive.interval \| 0.8.1 \| None \| d0c9d41a061969d409715b86a91937d8de4c29f7#diff-7d99a7c7a051e5e851aaaefb275a44a1 \| spark.speculation \| 0.6.0 \| None \| e72afdb817bcc8388aeb8b8d31628fd5fd67acf1#diff-4e188f32951dc989d97fa7577858bc7c \| spark.speculation.interval \| 0.6.0 \| None \| e72afdb817bcc8388aeb8b8d31628fd5fd67acf1#diff-4e188f32951dc989d97fa7577858bc7c \| spark.speculation.multiplier \| 0.6.0 \| None \| e72afdb817bcc8388aeb8b8d31628fd5fd67acf1#diff-fff59f72dfe6ca4ccb607ad12535da07 \| spark.speculation.quantile \| 0.6.0 \| None \| e72afdb817bcc8388aeb8b8d31628fd5fd67acf1#diff-fff59f72dfe6ca4ccb607ad12535da07 \| spark.speculation.task.duration.threshold \| 3.0.0 \| SPARK-29976 \| ad238a2238a9d0da89be4424574436cbfaee579d#diff-6bdad48cfc34314e89599655442ff210 \| spark.yarn.stagingDir \| 2.0.0 \| SPARK-13063 \| bc36df127d3b9f56b4edaeb5eca7697d4aef761a#diff-14b8ed2ef4e3da985300b8d796a38fa9 \| spark.buffer.pageSize \| 1.5.0 \| SPARK-9411 \| 1b0099fc62d02ff6216a76fbfe17a4ec5b2f3536#diff-1b22e54318c04824a6d53ed3f4d1bb35 \| ### Why are the changes needed? Supplemental configuration version information. ### Does this PR introduce any user-facing change? 'No'. ### How was this patch tested? Exists UT Closes #27931 from beliefer/add-version-to-core-config-part-four. Authored-by: beliefer <beliefer@163.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-23 11:07:43 +09:00
yan ma	fae981e5f3	[SPARK-30773][ML] Support NativeBlas for level-1 routines ### What changes were proposed in this pull request? Switch some level-1 BLAS routines (axpy, dot, scal(double, denseVector)) from the Java implementation to NativeBLAS when the vector size is greater than 256. ### Why are the changes needed? In the current ML BLAS.scala, all level-1 routines are fixed to use the Java implementation. But NativeBLAS (Intel MKL, OpenBLAS) can bring up to an 11X performance improvement, based on a performance test that calls these methods directly. We should provide a way for users to take advantage of NativeBLAS for level-1 routines. Here we do it by switching these methods from f2jBLAS to NativeBLAS. ### Does this PR introduce any user-facing change? Yes, the level-1 methods axpy, dot and scal will switch to NativeBLAS when the vector has more than nativeL1Threshold (fixed value 256) elements, and will fall back to f2jBLAS if native BLAS is not properly configured on the system. ### How was this patch tested? Perf test with direct calls to the level-1 routines. Closes #27546 from yma11/SPARK-30773. Lead-authored-by: yan ma <yan.ma@intel.com> Co-authored-by: Ma Yan <yan.ma@intel.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-03-20 10:32:58 -05:00
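The size-based dispatch described in SPARK-30773 can be sketched in Python as follows. This is an illustrative model only: `choose_backend`, `dot` and the `native_dot` parameter are hypothetical names, and only the threshold of 256 and the fallback rule come from the commit message.

```python
NATIVE_L1_THRESHOLD = 256  # fixed value quoted in the commit message


def _dot_fallback(x, y):
    """Plain-Python stand-in for the portable (f2j-style) implementation."""
    return sum(a * b for a, b in zip(x, y))


def choose_backend(size, native_available):
    """Pick "native" only for vectors with more than 256 elements."""
    if native_available and size > NATIVE_L1_THRESHOLD:
        return "native"
    return "f2j"


def dot(x, y, native_dot=None):
    """Dot product that uses native_dot (if supplied) for large vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if choose_backend(len(x), native_dot is not None) == "native":
        return native_dot(x, y)
    return _dot_fallback(x, y)


small = dot([1.0, 2.0], [3.0, 4.0])
```

Keeping the decision in one small function makes the boundary explicit: a vector of exactly 256 elements still takes the portable path, and a missing native library always does.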
Kent Yao	88ae6c4481	[SPARK-31189][SQL][DOCS] Fix errors and missing parts for datetime pattern document ### What changes were proposed in this pull request? Fix errors and missing parts for the datetime pattern document: 1. The pattern we use is similar to DateTimeFormatter and SimpleDateFormat but not identical, so we shouldn't reference either of them in the API docs but link to our own doc instead. 2. Some pattern letters are missing. 3. Some pattern letters are explicitly banned - Set('A', 'c', 'e', 'n', 'N'). 4. The second-fraction pattern uses different logic for parsing and formatting. ### Why are the changes needed? To fix and improve the doc. ### Does this PR introduce any user-facing change? Yes, new and updated doc. ### How was this patch tested? Passes Jenkins; viewed locally with `jekyll serve`. ![image](https://user-images.githubusercontent.com/8326978/77044447-6bd3bb00-69fa-11ea-8d6f-7084166c5dea.png) Closes #27956 from yaooqinn/SPARK-31189. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-03-20 21:59:26 +08:00
Wenchen Fan	8643e5d9c5	[SPARK-31171][SQL][FOLLOWUP] update document ### What changes were proposed in this pull request? A followup of https://github.com/apache/spark/pull/27936 to update document. ### Why are the changes needed? correct document ### Does this PR introduce any user-facing change? no ### How was this patch tested? N/A Closes #27950 from cloud-fan/null. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-03-19 07:29:31 +09:00
Huaxin Gao	d22c9f6c0d	[SPARK-30933][ML][DOCS] ML, GraphX 3.0 QA: Update user guide for new features & APIs ### What changes were proposed in this pull request? Change ml-tuning.html. ### Why are the changes needed? Add description for ```MultilabelClassificationEvaluator``` and ```RankingEvaluator```. ### Does this PR introduce any user-facing change? Yes before: ![image](https://user-images.githubusercontent.com/13592258/76437013-2c5ffb80-6376-11ea-8946-f5c2e7379b7c.png) after: ![image](https://user-images.githubusercontent.com/13592258/76437054-397cea80-6376-11ea-867f-fe8d8fa4e5b3.png) ### How was this patch tested? Closes #27880 from huaxingao/spark-30933. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-03-18 13:21:24 -05:00
Kent Yao	57fcc49306	[SPARK-31176][SQL] Remove support for 'e'/'c' as datetime pattern character ### What changes were proposed in this pull request? In SimpleDateFormat, 'u' meant the day number of the week; in DateTimeFormatter it was changed to mean the year. We currently keep the old meaning of 'u' by substituting 'u' with 'e' internally and using DateTimeFormatter to parse the pattern string. In DateTimeFormatter, 'e' and 'c' also represent day-of-week. e.g. ```sql select date_format(timestamp '2019-10-06', 'yyyy-MM-dd uuuu'); select date_format(timestamp '2019-10-06', 'yyyy-MM-dd uuee'); select date_format(timestamp '2019-10-06', 'yyyy-MM-dd eeee'); ``` Because of the substitution, they all silently become `.... eeee`. Users may be confused about their meanings, so we should mark them as illegal pattern characters to stay the same as before. This PR moves the method `convertIncompatiblePattern` from `DatetimeUtils` to the `DateTimeFormatterHelper` object, since it is quite specific to the `DateTimeFormatterHelper` class, and adds a check for the 'e' and 'c' characters in this method. Besides, `convertIncompatiblePattern` has a bug that drops the last `'` when the pattern ends with one; this PR fixes that too. e.g. ```sql spark-sql> select date_format(timestamp "2019-10-06", "yyyy-MM-dd'S'"); 20/03/18 11:19:45 ERROR SparkSQLDriver: Failed in [select date_format(timestamp "2019-10-06", "yyyy-MM-dd'S'")] java.lang.IllegalArgumentException: Pattern ends with an incomplete string literal: uuuu-MM-dd'S spark-sql> select to_timestamp("2019-10-06S", "yyyy-MM-dd'S'"); NULL ``` ### Why are the changes needed? To avoid ambiguity; bug fix. ### Does this PR introduce any user-facing change? No, these are not exposed yet. ### How was this patch tested? Added a UT. Closes #27939 from yaooqinn/SPARK-31176. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-03-18 20:19:50 +08:00
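The pattern handling described in SPARK-31176 can be sketched in Python as follows. This is a simplified model, not Spark's `convertIncompatiblePattern`: the function name `convert_pattern` is hypothetical, and it covers only the three points stated in the commit message (reject 'e' and 'c', substitute 'u' with 'e', keep a trailing quote).

```python
ILLEGAL_LETTERS = {"e", "c"}


def convert_pattern(pattern):
    """Rewrite unquoted 'u' to 'e' and reject unquoted 'e'/'c'."""
    # Splitting on the quote character alternates between unquoted text
    # (even indexes) and quoted literal text (odd indexes). Python's
    # str.split keeps trailing empty strings, so a pattern that ends with
    # a quote is rejoined with that quote intact.
    parts = pattern.split("'")
    converted = []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            for ch in part:
                if ch in ILLEGAL_LETTERS:
                    raise ValueError(f"Illegal pattern character: {ch}")
            part = part.replace("u", "e")
        converted.append(part)
    return "'".join(converted)


kept = convert_pattern("yyyy-MM-dd'S'")
```

Text inside quotes is passed through untouched, so a literal such as `'use'` is not rewritten and does not trigger the illegal-character check.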
jiake	21c02ee5d0	[SPARK-30864][SQL][DOC] add the user guide for Adaptive Query Execution ### What changes were proposed in this pull request? This PR adds the user guide for AQE and the detailed configurations for the three main features in AQE. ### Why are the changes needed? To add the detailed configurations. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Only adds docs; no UT needed. Closes #27616 from JkSelf/aqeuserguide. Authored-by: jiake <ke.a.jia@intel.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-03-16 23:33:56 +08:00
beliefer	f4cd7495f1	[SPARK-31002][CORE][DOC][FOLLOWUP] Add version information to the configuration of Core ### What changes were proposed in this pull request? This PR follows up #27847 and https://github.com/apache/spark/pull/27852. I sorted out some information show below. Item name \| Since version \| JIRA ID \| Commit ID \| Note -- \| -- \| -- \| -- \| -- spark.metrics.namespace \| 2.1.0 \| SPARK-5847 \| 70f846a313061e4db6174e0dc6c12c8c806ccf78#diff-6bdad48cfc34314e89599655442ff210 \| spark.metrics.conf \| 0.8.0 \| None \| 46eecd110a4017ea0c86cbb1010d0ccd6a5eb2ef#diff-7ea2624e832b166ca27cd4baca8691d9 \| spark.metrics.executorMetricsSource.enabled \| 3.0.0 \| SPARK-27189 \| 729f43f499f3dd2718c0b28d73f2ca29cc811eac#diff-6bdad48cfc34314e89599655442ff210 \| spark.metrics.staticSources.enabled \| 3.0.0 \| SPARK-30060 \| 60f20e5ea2000ab8f4a593b5e4217fd5637c5e22#diff-6bdad48cfc34314e89599655442ff210 \| spark.pyspark.driver.python \| 2.1.0 \| SPARK-13081 \| 7a9e25c38380e6c62080d62ad38a4830e44fe753#diff-6bdad48cfc34314e89599655442ff210 \| spark.pyspark.python \| 2.1.0 \| SPARK-13081 \| 7a9e25c38380e6c62080d62ad38a4830e44fe753#diff-6bdad48cfc34314e89599655442ff210 \| spark.history.ui.maxApplications \| 2.0.1 \| SPARK-17243 \| 021aa28f439443cda1bc7c5e3eee7c85b40c1a2d#diff-6bdad48cfc34314e89599655442ff210 \| spark.io.encryption.enabled \| 2.1.0 \| SPARK-5682 \| 4b4e329e49f8af28fa6301bd06c48d7097eaf9e6#diff-6bdad48cfc34314e89599655442ff210 \| spark.io.encryption.keygen.algorithm \| 2.1.0 \| SPARK-5682 \| 4b4e329e49f8af28fa6301bd06c48d7097eaf9e6#diff-6bdad48cfc34314e89599655442ff210 \| spark.io.encryption.keySizeBits \| 2.1.0 \| SPARK-5682 \| 4b4e329e49f8af28fa6301bd06c48d7097eaf9e6#diff-6bdad48cfc34314e89599655442ff210 \| spark.io.encryption.commons.config.* \| 2.1.0 \| SPARK-5682 \| `4b4e329e49` \| spark.io.crypto.cipher.transformation \| 2.1.0 \| SPARK-5682 \| 4b4e329e49f8af28fa6301bd06c48d7097eaf9e6#diff-6bdad48cfc34314e89599655442ff210 \| spark.driver.host \| 0.7.0 \| 
None \| 02a6761589c35f15f1a6e3b63a7964ba057d3ba6#diff-eaf125f56ce786d64dcef99cf446a751 \| spark.driver.port \| 0.7.0 \| None \| 02a6761589c35f15f1a6e3b63a7964ba057d3ba6#diff-eaf125f56ce786d64dcef99cf446a751 \| spark.driver.supervise \| 1.3.0 \| SPARK-5388 \| 6ec0cdc14390d4dc45acf31040f21e1efc476fc0#diff-4d2ab44195558d5a9d5f15b8803ef39d \| spark.driver.bindAddress \| 2.1.0 \| SPARK-4563 \| 2cd1bfa4f0c6625b0ab1dbeba2b9586b9a6a9f42#diff-6bdad48cfc34314e89599655442ff210 \| spark.blockManager.port \| 1.1.0 \| SPARK-2157 \| 31090e43ca91f687b0bc6e25c824dc25bd7027cd#diff-2b643ea78c1add0381754b1f47eec132 \| spark.driver.blockManager.port \| 2.1.0 \| SPARK-4563 \| 2cd1bfa4f0c6625b0ab1dbeba2b9586b9a6a9f42#diff-6bdad48cfc34314e89599655442ff210 \| spark.files.ignoreCorruptFiles \| 2.1.0 \| SPARK-17850 \| 47776e7c0c68590fe446cef910900b1aaead06f9#diff-6bdad48cfc34314e89599655442ff210 \| spark.files.ignoreMissingFiles \| 2.4.0 \| SPARK-22676 \| ed4101d29f50d54fd7846421e4c00e9ecd3599d0#diff-6bdad48cfc34314e89599655442ff210 \| spark.log.callerContext \| 2.2.0 \| SPARK-16759 \| 3af894511be6fcc17731e28b284dba432fe911f5#diff-6bdad48cfc34314e89599655442ff210 \| In branch-2.2 but pom.xml is 2.1.0-SNAPSHOT spark.files.maxPartitionBytes \| 2.1.0 \| SPARK-16575 \| c8879bf1ee2af9ccd5d5656571d931d2fc1da024#diff-6bdad48cfc34314e89599655442ff210 \| spark.files.openCostInBytes \| 2.1.0 \| SPARK-16575 \| c8879bf1ee2af9ccd5d5656571d931d2fc1da024#diff-6bdad48cfc34314e89599655442ff210 \| spark.hadoopRDD.ignoreEmptySplits \| 2.3.0 \| SPARK-22233 \| 0fa10666cf75e3c4929940af49c8a6f6ea874759#diff-6bdad48cfc34314e89599655442ff210 \| spark.redaction.regex \| 2.1.2 \| SPARK-18535 and SPARK-19720 \| 444cca14d7ac8c5ab5d7e9d080b11f4d6babe3bf#diff-6bdad48cfc34314e89599655442ff210 \| spark.redaction.string.regex \| 2.2.0 \| SPARK-20070 \| 91fa80fe8a2480d64c430bd10f97b3d44c007bcc#diff-6bdad48cfc34314e89599655442ff210 \| spark.authenticate.secret \| 1.0.0 \| SPARK-1189 \| 
7edbea41b43e0dc11a2de156be220db8b7952d01#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 \| spark.authenticate.secretBitLength \| 1.6.0 \| SPARK-11073 \| f8d93edec82eedab59d50aec06ca2de7e4cf14f6#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 \| spark.authenticate \| 1.0.0 \| SPARK-1189 \| 7edbea41b43e0dc11a2de156be220db8b7952d01#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 \| spark.authenticate.enableSaslEncryption \| 1.4.0 \| SPARK-6229 \| 38d4e9e446b425ca6a8fe8d8080f387b08683842#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 \| spark.authenticate.secret.file \| 3.0.0 \| SPARK-26239 \| 57d6fbfa8c803ce1791e7be36aba0219a1fcaa63#diff-6bdad48cfc34314e89599655442ff210 \| spark.authenticate.secret.driver.file \| 3.0.0 \| SPARK-26239 \| 57d6fbfa8c803ce1791e7be36aba0219a1fcaa63#diff-6bdad48cfc34314e89599655442ff210 \| spark.authenticate.secret.executor.file \| 3.0.0 \| SPARK-26239 \| 57d6fbfa8c803ce1791e7be36aba0219a1fcaa63#diff-6bdad48cfc34314e89599655442ff210 \| spark.buffer.write.chunkSize \| 2.3.0 \| SPARK-21527 \| 574ef6c987c636210828e96d2f797d8f10aff05e#diff-6bdad48cfc34314e89599655442ff210 \| spark.checkpoint.compress \| 2.2.0 \| SPARK-19525 \| 1405862382185e04b09f84af18f82f2f0295a755#diff-6bdad48cfc34314e89599655442ff210 \| spark.rdd.checkpoint.cachePreferredLocsExpireTime \| 3.0.0 \| SPARK-29182 \| 4ecbdbb6a7bd3908da32c82832e886b4f9f9e596#diff-6bdad48cfc34314e89599655442ff210 \| spark.shuffle.accurateBlockThreshold \| 2.2.1 \| SPARK-20801 \| 81f63c8923416014d5c6bc227dd3c4e2a62bac8e#diff-6bdad48cfc34314e89599655442ff210 \| spark.shuffle.registration.timeout \| 2.3.0 \| SPARK-20640 \| d107b3b910d8f434fb15b663a9db4c2dfe0a9f43#diff-6bdad48cfc34314e89599655442ff210 \| spark.shuffle.registration.maxAttempts \| 2.3.0 \| SPARK-20640 \| d107b3b910d8f434fb15b663a9db4c2dfe0a9f43#diff-6bdad48cfc34314e89599655442ff210 \| spark.reducer.maxBlocksInFlightPerAddress \| 2.2.1 \| SPARK-21243 \| 88dccda393bc79dc6032f71b6acf8eb2b4b152be#diff-6bdad48cfc34314e89599655442ff210 \| 
spark.network.maxRemoteBlockSizeFetchToMem \| 3.0.0 \| SPARK-26700 \| d8613571bc1847775dd5c1945757279234cb388c#diff-6bdad48cfc34314e89599655442ff210 \| spark.taskMetrics.trackUpdatedBlockStatuses \| 2.3.0 \| SPARK-20923 \| 5b5a69bea9de806e2c39b04b248ee82a7b664d7b#diff-6bdad48cfc34314e89599655442ff210 \| spark.shuffle.sort.io.plugin.class \| 3.0.0 \| SPARK-28209 \| abef84a868e9e15f346eea315bbab0ec8ac8e389#diff-6bdad48cfc34314e89599655442ff210 \| spark.shuffle.file.buffer \| 1.4.0 \| SPARK-7081 \| c53ebea9db418099df50f9adc1a18cee7849cd97#diff-ecdafc46b901740134261d2cab24ccd9 \| spark.shuffle.unsafe.file.output.buffer \| 2.3.0 \| SPARK-20950 \| 565e7a8d4ae7879ee704fb94ae9b3da31e202d7e#diff-6bdad48cfc34314e89599655442ff210 \| spark.shuffle.spill.diskWriteBufferSize \| 2.3.0 \| SPARK-20950 \| 565e7a8d4ae7879ee704fb94ae9b3da31e202d7e#diff-6bdad48cfc34314e89599655442ff210 \| spark.storage.unrollMemoryCheckPeriod \| 2.3.0 \| SPARK-21923 \| a11db942aaf4c470a85f8a1b180f034f7a584254#diff-6bdad48cfc34314e89599655442ff210 \| spark.storage.unrollMemoryGrowthFactor \| 2.3.0 \| SPARK-21923 \| a11db942aaf4c470a85f8a1b180f034f7a584254#diff-6bdad48cfc34314e89599655442ff210 \| spark.yarn.dist.forceDownloadSchemes \| 2.3.0 \| SPARK-21917 \| 8319432af60b8e1dc00f08d794f7d80591e24d0c#diff-6bdad48cfc34314e89599655442ff210 \| spark.extraListeners \| 1.3.0 \| SPARK-5411 \| 47e4d579eb4a9aab8e0dd9c1400394d80c8d0388#diff-364713d7776956cb8b0a771e9b62f82d \| spark.shuffle.spill.numElementsForceSpillThreshold \| 1.6.0 \| SPARK-10708 \| f6d06adf05afa9c5386dc2396c94e7a98730289f#diff-3eedc75de4787b842477138d8cc7f150 \| spark.shuffle.mapOutput.parallelAggregationThreshold \| 2.3.0 \| SPARK-22537 \| efd0036ec88bdc385f5a9ea568d2e2bbfcda2912#diff-6bdad48cfc34314e89599655442ff210 \| spark.driver.maxResultSize \| 1.2.0 \| SPARK-3466 \| 6181577e9935f46b646ba3925b873d031aa3d6ba#diff-d239aee594001f8391676e1047a0381e \| spark.security.credentials.renewalRatio \| 2.4.0 \| SPARK-23361 \| 
5fa438471110afbf4e2174df449ac79e292501f8#diff-6bdad48cfc34314e89599655442ff210 \| spark.security.credentials.retryWait \| 2.4.0 \| SPARK-23361 \| 5fa438471110afbf4e2174df449ac79e292501f8#diff-6bdad48cfc34314e89599655442ff210 \| spark.shuffle.sort.initialBufferSize \| 2.1.0 \| SPARK-15958 \| bf665a958631125a1670504ef5966ef1a0e14798#diff-a1d00506391c1c4b2209f9bbff590c5b \| On branch-2.1, but in pom.xml it is 2.0.0-SNAPSHOT spark.shuffle.compress \| 0.6.0 \| None \| efc5423210d1aadeaea78273a4a8f10425753079#diff-76170a9c8f67b542bc58240a0a12fe08 \| spark.shuffle.spill.compress \| 0.9.0 \| None \| c3816de5040e3c48e58ed4762d2f4eb606812938#diff-2b643ea78c1add0381754b1f47eec132 \| spark.shuffle.mapStatus.compression.codec \| 3.0.0 \| SPARK-29939 \| 456cfe6e4693efd26d64f089d53c4e01bf8150a2#diff-6bdad48cfc34314e89599655442ff210 \| spark.shuffle.spill.initialMemoryThreshold \| 1.1.1 \| SPARK-4480 \| 16bf5f3d17624db2a96c921fe8a1e153cdafb06c#diff-31417c461d8901d8e08167b0cbc344c1 \| spark.shuffle.spill.batchSize \| 0.9.0 \| None \| c3816de5040e3c48e58ed4762d2f4eb606812938#diff-a470b9812a5ac8c37d732da7d9fbe39a \| spark.shuffle.sort.bypassMergeThreshold \| 1.1.1 \| SPARK-2787 \| 0f2274f8ed6131ad17326e3fff7f7e093863b72d#diff-31417c461d8901d8e08167b0cbc344c1 \| spark.shuffle.manager \| 1.1.0 \| SPARK-2044 \| 508fd371d6dbb826fd8a00787d347235b549e189#diff-60df49b5d3c59f2c4540fa16a90033a1 \| spark.shuffle.reduceLocality.enabled \| 1.5.0 \| SPARK-2774 \| 96a7c888d806adfdb2c722025a1079ed7eaa2052#diff-6a9ff7fb74fd490a50462d45db2d5e11 \| spark.shuffle.mapOutput.minSizeForBroadcast \| 2.0.0 \| SPARK-1239 \| d98dd72e7baeb59eacec4fefd66397513a607b2f#diff-609c3f8c26150ca96a94cd27146a809b \| spark.shuffle.mapOutput.dispatcher.numThreads \| 2.0.0 \| SPARK-1239 \| d98dd72e7baeb59eacec4fefd66397513a607b2f#diff-609c3f8c26150ca96a94cd27146a809b \| spark.shuffle.detectCorrupt \| 2.2.0 \| SPARK-4105 \| cf33a86285629abe72c1acf235b8bfa6057220a8#diff-eb30a71e0d04150b8e0b64929852e38b \| 
spark.shuffle.detectCorrupt.useExtraMemory \| 3.0.0 \| SPARK-26089 \| 688b0c01fac0db80f6473181673a89f1ce1be65b#diff-6bdad48cfc34314e89599655442ff210 \| spark.shuffle.sync \| 0.8.0 \| None \| 31da065b1d08c1fad5283e4bcf8e0ed01818c03e#diff-ad46ed23fcc3fa87f30d05204917b917 \| spark.shuffle.unsafe.fastMergeEnabled \| 1.4.0 \| SPARK-7081 \| c53ebea9db418099df50f9adc1a18cee7849cd97#diff-642ce9f439435408382c3ac3b5c5e0a0 \| spark.shuffle.sort.useRadixSort \| 2.0.0 \| SPARK-14724 \| e2b5647ab92eb478b3f7b36a0ce6faf83e24c0e5#diff-3eedc75de4787b842477138d8cc7f150 \| spark.shuffle.minNumPartitionsToHighlyCompress \| 2.4.0 \| SPARK-24519 \| 39dfaf2fd167cafc84ec9cc637c114ed54a331e3#diff-6bdad48cfc34314e89599655442ff210 \| spark.shuffle.useOldFetchProtocol \| 3.0.0 \| SPARK-25341 \| f725d472f51fb80c6ce1882ec283ff69bafb0de4#diff-6bdad48cfc34314e89599655442ff210 \| spark.shuffle.readHostLocalDisk \| 3.0.0 \| SPARK-30812 \| 68d7edf9497bea2f73707d32ab55dd8e53088e7c#diff-6bdad48cfc34314e89599655442ff210 \| ### Why are the changes needed? Supplemental configuration version information. ### Does this PR introduce any user-facing change? 'No'. ### How was this patch tested? Exists UT Closes #27913 from beliefer/add-version-to-core-config-part-three. Authored-by: beliefer <beliefer@163.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-16 10:08:07 +09:00
gatorsmile	4d4c00c1b5	[SPARK-31151][SQL][DOC] Reorganize the migration guide of SQL ### What changes were proposed in this pull request? The current migration guide of SQL is too long for most readers to find the needed info. This PR groups the items in the migration guide of Spark SQL by their corresponding components. Note: this PR does not change the contents of the migration guide. The attached figure is a screenshot taken after the change. ![screencapture-127-0-0-1-4000-sql-migration-guide-html-2020-03-14-12_00_40](https://user-images.githubusercontent.com/11567269/76688626-d3010200-65eb-11ea-9ce7-265bc90ebb2c.png) ### Why are the changes needed? The current migration guide of SQL is too long for most readers to find the needed info. ### Does this PR introduce any user-facing change? No ### How was this patch tested? N/A Closes #27909 from gatorsmile/migrationGuideReorg. Authored-by: gatorsmile <gatorsmile@gmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-03-15 07:35:20 +09:00
HyukjinKwon	9628aca68b	[MINOR][DOCS] Fix [[...]] to `...` and <code>...</code> in documentation ### What changes were proposed in this pull request? Before: - ![Screen Shot 2020-03-13 at 1 19 12 PM](https://user-images.githubusercontent.com/6477701/76589452-7c34f300-652d-11ea-9da7-3754f8575796.png) - ![Screen Shot 2020-03-13 at 1 19 24 PM](https://user-images.githubusercontent.com/6477701/76589455-7d662000-652d-11ea-9dbe-f5fe10d1e7ad.png) - ![Screen Shot 2020-03-13 at 1 19 03 PM](https://user-images.githubusercontent.com/6477701/76589449-7b03c600-652d-11ea-8e99-dbe47f561f9c.png) After: - ![Screen Shot 2020-03-13 at 1 17 37 PM](https://user-images.githubusercontent.com/6477701/76589437-74754e80-652d-11ea-99f5-14fb4761f915.png) - ![Screen Shot 2020-03-13 at 1 17 46 PM](https://user-images.githubusercontent.com/6477701/76589442-76d7a880-652d-11ea-8c10-53e595421081.png) - ![Screen Shot 2020-03-13 at 1 18 15 PM](https://user-images.githubusercontent.com/6477701/76589443-7808d580-652d-11ea-9b1b-e5d11d638335.png) ### Why are the changes needed? To render the code block properly in the documentation ### Does this PR introduce any user-facing change? Yes, code rendering in documentation. ### How was this patch tested? Manually built the doc via `SKIP_API=1 jekyll build`. Closes #27899 from HyukjinKwon/minor-docss. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-03-13 16:44:23 -07:00
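The rewrite described in the commit above (Scaladoc-style `[[...]]` links replaced by backticks or `<code>` tags) can be illustrated with a small regular-expression sketch. This is only an illustration of the transformation; the helper name `scaladoc_links_to_markdown` and the `html` flag are hypothetical, and the commit message does not say how the actual change was made.

```python
import re

# Matches a Scaladoc-style link such as [[SparkSession]] and captures the name.
_LINK = re.compile(r"\[\[([^\[\]]+)\]\]")


def scaladoc_links_to_markdown(text: str, html: bool = False) -> str:
    """Rewrite [[Name]] as `Name`, or as <code>Name</code> when html=True.

    Hypothetical helper for illustration; not part of Spark.
    """
    template = r"<code>\1</code>" if html else r"`\1`"
    return _LINK.sub(template, text)


markdown_line = scaladoc_links_to_markdown("See [[SparkSession]] for details.")
html_cell = scaladoc_links_to_markdown("See [[SparkSession]] for details.", html=True)
```

The `html=True` form is the variant one would presumably want inside raw HTML tables, where backticks are not interpreted as Markdown.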
gatorsmile	1c8526dc87	[SPARK-28093][FOLLOW-UP] Remove migration guide of TRIM changes ### What changes were proposed in this pull request? Since we reverted the original change in https://github.com/apache/spark/pull/27540, this PR is to remove the corresponding migration guide made in the commit https://github.com/apache/spark/pull/24948 ### Why are the changes needed? N/A ### Does this PR introduce any user-facing change? N/A ### How was this patch tested? N/A Closes #27896 from gatorsmile/SPARK-28093Followup. Authored-by: gatorsmile <gatorsmile@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-13 11:45:59 +09:00
Gabor Somogyi	231e65092f	[SPARK-30874][SQL] Support Postgres Kerberos login in JDBC connector ### What changes were proposed in this pull request? When loading DataFrames from a JDBC datasource with Kerberos authentication, remote executors (yarn-client/cluster etc. modes) fail to establish a connection because they lack a Kerberos ticket or the ability to generate one. This is a real issue when trying to ingest data from kerberized data sources (SQL Server, Oracle) in enterprise environments where exposing simple authentication access is not an option due to IT policy issues. In this PR I've added Postgres support (other supported databases will come in later PRs). What this PR contains: * Added `keytab` and `principal` JDBC options * Added `ConnectionProvider` trait and its implementations: * `BasicConnectionProvider` => unsecured connection * `PostgresConnectionProvider` => Postgres secure connection * Added `ConnectionProvider` tests * Added `PostgresKrbIntegrationSuite` docker integration test * Created `SecurityUtils` to concentrate reusable security-related functionality * Documentation ### Why are the changes needed? Missing JDBC Kerberos support. ### Does this PR introduce any user-facing change? Yes, 2 additional JDBC options added: * keytab * principal If both are provided, Spark performs Kerberos authentication. ### How was this patch tested? To demonstrate the functionality with a standalone application I've created this repository: https://github.com/gaborgsomogyi/docker-kerberos * Additional + existing unit tests * Additional docker integration test * Test on cluster manually * `SKIP_API=1 jekyll build` Closes #27637 from gaborgsomogyi/SPARK-30874. Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com> Signed-off-by: Marcelo Vanzin <vanzin@apache.org>	2020-03-12 19:04:35 -07:00
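A minimal sketch of how the two new JDBC options from the commit above might be passed from PySpark. The helper `jdbc_options`, the host name, the keytab path, and the principal are all hypothetical; `url` and `dbtable` are the usual JDBC data source options, and `keytab` / `principal` are the options this commit adds. Rejecting a lone `keytab` or `principal` is a choice made for this sketch, since the commit only describes the case where both are provided.

```python
from typing import Dict, Optional


def jdbc_options(
    url: str,
    table: str,
    keytab: Optional[str] = None,
    principal: Optional[str] = None,
) -> Dict[str, str]:
    """Build the option map for a JDBC read, adding Kerberos settings when given.

    Hypothetical helper: it only assembles a dict and does not touch Spark.
    """
    # Sketch-level assumption: treat "only one of the two" as a caller error.
    if (keytab is None) != (principal is None):
        raise ValueError("keytab and principal must be provided together")
    options = {"url": url, "dbtable": table}
    if keytab is not None and principal is not None:
        options["keytab"] = keytab
        options["principal"] = principal
    return options


# All values below are placeholders.
opts = jdbc_options(
    "jdbc:postgresql://db.example.com:5432/sales",
    "public.orders",
    keytab="/etc/security/keytabs/spark.keytab",
    principal="spark/host.example.com@EXAMPLE.COM",
)

# With a SparkSession named `spark` available, the read would look like:
#   df = spark.read.format("jdbc").options(**opts).load()
```

Keeping the option map separate from the read call makes it easy to check that both Kerberos settings travel together before a job is submitted.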
Kent Yao	7b4b29e8d9	[SPARK-31131][SQL] Remove the unnecessary config spark.sql.legacy.timeParser.enabled ### What changes were proposed in this pull request? `spark.sql.legacy.timeParser.enabled` should be removed from SQLConf and the migration guide; `spark.sql.legacy.timeParserPolicy` is the correct config. ### Why are the changes needed? To fix the docs. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Passed Jenkins. Closes #27889 from yaooqinn/SPARK-31131. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-03-12 09:24:49 -07:00
beliefer	bd2b3f9132	[SPARK-30911][CORE][DOC] Add version information to the configuration of Status

### What changes were proposed in this pull request?

1. Add version information to the configuration of `Status`.
2. Update the docs of `Status`.
3. Add supplementary documentation for https://github.com/apache/spark/pull/27847.

I sorted out some information, shown below.

| Item name | Since version | JIRA ID | Commit ID | Note |
| -- | -- | -- | -- | -- |
| spark.appStateStore.asyncTracking.enable | 2.3.0 | SPARK-20653 | 772e4648d95bda3353723337723543c741ea8476#diff-9ab674b7af7b2097f7d28cb6f5fd1e8c | |
| spark.ui.liveUpdate.period | 2.3.0 | SPARK-20644 | c7f38e5adb88d43ef60662c5d6ff4e7a95bff580#diff-9ab674b7af7b2097f7d28cb6f5fd1e8c | |
| spark.ui.liveUpdate.minFlushPeriod | 2.4.2 | SPARK-27394 | a8a2ba11ac10051423e58920062b50f328b06421#diff-9ab674b7af7b2097f7d28cb6f5fd1e8c | |
| spark.ui.retainedJobs | 1.2.0 | SPARK-2321 | 9530316887612dca060a128fca34dd5a6ab2a9a9#diff-1f32bcb61f51133bd0959a4177a066a5 | |
| spark.ui.retainedStages | 0.9.0 | None | 112c0a1776bbc866a1026a9579c6f72f293414c4#diff-1f32bcb61f51133bd0959a4177a066a5 | 0.9.0-incubating-SNAPSHOT |
| spark.ui.retainedTasks | 2.0.1 | SPARK-15083 | 55db26245d69bb02b7d7d5f25029b1a1cd571644#diff-6bdad48cfc34314e89599655442ff210 | |
| spark.ui.retainedDeadExecutors | 2.0.0 | SPARK-7729 | 9f4263392e492b5bc0acecec2712438ff9a257b7#diff-a0ba36f9b1f9829bf3c4689b05ab6cf2 | |
| spark.ui.dagGraph.retainedRootRDDs | 2.1.0 | SPARK-17171 | cc87280fcd065b01667ca7a59a1a32c7ab757355#diff-3f492c527ea26679d4307041b28455b8 | |
| spark.metrics.appStatusSource.enabled | 3.0.0 | SPARK-30060 | 60f20e5ea2000ab8f4a593b5e4217fd5637c5e22#diff-9f796ae06b0272c1f0a012652a5b68d0 | |

### Why are the changes needed?
Supplemental configuration version information.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing UTs.

Closes #27848 from beliefer/add-version-to-status-config.
Lead-authored-by: beliefer <beliefer@163.com> Co-authored-by: Jiaan Geng <beliefer@163.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-12 11:03:47 +09:00
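One way to put the "Since version" column of the Status commit above to use is to check whether a configuration key exists in a given Spark release. The sketch below copies a few rows from that commit message into a dictionary. The names `SINCE`, `parse_version`, and `is_available` are hypothetical, and the comparison assumes plain numeric versions such as `2.4.2` (suffixes like `-SNAPSHOT` are not handled).

```python
from typing import Dict, Tuple

# Subset of the "Since version" data from the Status commit message above.
SINCE: Dict[str, str] = {
    "spark.appStateStore.asyncTracking.enable": "2.3.0",
    "spark.ui.liveUpdate.period": "2.3.0",
    "spark.ui.liveUpdate.minFlushPeriod": "2.4.2",
    "spark.ui.retainedJobs": "1.2.0",
    "spark.metrics.appStatusSource.enabled": "3.0.0",
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2.4.2' into (2, 4, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def is_available(key: str, spark_version: str) -> bool:
    """Return True if `key` was introduced at or before `spark_version`."""
    return parse_version(spark_version) >= parse_version(SINCE[key])


# A key introduced in 2.4.2 is missing from 2.4.1 but present afterwards.
in_241 = is_available("spark.ui.liveUpdate.minFlushPeriod", "2.4.1")
in_300 = is_available("spark.ui.liveUpdate.minFlushPeriod", "3.0.0")
```

Comparing integer tuples rather than raw strings avoids the usual pitfall where `"2.10.0" < "2.9.0"` lexicographically.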
beliefer	1cd80fa9fa	[SPARK-31109][MESOS][DOC] Add version information to the configuration of Mesos ### What changes were proposed in this pull request? Add version information to the configuration of `Mesos`. I sorted out some information show below. Item name \| Since version \| JIRA ID \| Commit ID \| Note -- \| -- \| -- \| -- \| -- spark.mesos.$taskType.secret.names \| 2.3.0 \| SPARK-22131 \| 5415963d2caaf95604211419ffc4e29fff38e1d7#diff-91e6e5f871160782dc50d4060d6faea3 \| spark.mesos.$taskType.secret.values \| 2.3.0 \| SPARK-22131 \| 5415963d2caaf95604211419ffc4e29fff38e1d7#diff-91e6e5f871160782dc50d4060d6faea3 \| spark.mesos.$taskType.secret.envkeys \| 2.3.0 \| SPARK-22131 \| 5415963d2caaf95604211419ffc4e29fff38e1d7#diff-91e6e5f871160782dc50d4060d6faea3 \| spark.mesos.$taskType.secret.filenames \| 2.3.0 \| SPARK-22131 \| 5415963d2caaf95604211419ffc4e29fff38e1d7#diff-91e6e5f871160782dc50d4060d6faea3 \| spark.mesos.principal \| 1.5.0 \| SPARK-6284 \| d86bbb4e286f16f77ba125452b07827684eafeed#diff-02a6d899f7a529eb7cfbb12182a110b0 \| spark.mesos.principal.file \| 2.4.0 \| SPARK-16501 \| 7f10cf83f311526737fc96d5bb8281d12e41932f#diff-daf48dabbe58afaeed8787751750b01d \| spark.mesos.secret \| 1.5.0 \| SPARK-6284 \| d86bbb4e286f16f77ba125452b07827684eafeed#diff-02a6d899f7a529eb7cfbb12182a110b0 \| spark.mesos.secret.file \| 2.4.0 \| SPARK-16501 \| 7f10cf83f311526737fc96d5bb8281d12e41932f#diff-daf48dabbe58afaeed8787751750b01d \| spark.shuffle.cleaner.interval \| 2.0.0 \| SPARK-12583 \| 310981d49a332bd329303f610b150bbe02cf5f87#diff-2fafefee94f2a2023ea9765536870258 \| spark.mesos.dispatcher.webui.url \| 2.0.0 \| SPARK-13492 \| a4a0addccffb7cd0ece7947d55ce2538afa54c97#diff-f541460c7a74cee87cbb460b3b01665e \| spark.mesos.dispatcher.historyServer.url \| 2.1.0 \| SPARK-16809 \| 62e62124419f3fa07b324f5e42feb2c5b4fde715#diff-3779e2035d9a09fa5f6af903925b9512 \| spark.mesos.driver.labels \| 2.3.0 \| SPARK-21000 \| 
8da3f7041aafa71d7596b531625edb899970fec2#diff-91e6e5f871160782dc50d4060d6faea3 \| spark.mesos.driver.webui.url \| 2.0.0 \| SPARK-13492 \| a4a0addccffb7cd0ece7947d55ce2538afa54c97#diff-e3a5e67b8de2069ce99801372e214b8e \| spark.mesos.driver.failoverTimeout \| 2.3.0 \| SPARK-21456 \| c42ef953343073a50ef04c5ce848b574ff7f2238#diff-91e6e5f871160782dc50d4060d6faea3 \| spark.mesos.network.name \| 2.1.0 \| SPARK-18232 \| d89bfc92302424406847ac7a9cfca714e6b742fc#diff-ab5bf34f1951a8f7ea83c9456a6c3ab7 \| spark.mesos.network.labels \| 2.3.0 \| SPARK-21694 \| ce0d3bb377766bdf4df7852272557ae846408877#diff-91e6e5f871160782dc50d4060d6faea3 \| spark.mesos.driver.constraints \| 2.2.1 \| SPARK-19606 \| f6ee3d90d5c299e67ae6e2d553c16c0d9759d4b5#diff-91e6e5f871160782dc50d4060d6faea3 \| spark.mesos.driver.frameworkId \| 2.1.0 \| SPARK-16809 \| 62e62124419f3fa07b324f5e42feb2c5b4fde715#diff-02a6d899f7a529eb7cfbb12182a110b0 \| spark.executor.uri \| 0.8.0 \| None \| 46eecd110a4017ea0c86cbb1010d0ccd6a5eb2ef#diff-a885e7df97790e9b59c21c63353e7476 \| spark.mesos.proxy.baseURL \| 2.3.0 \| SPARK-13041 \| 663f30d14a0c9219e07697af1ab56e11a714d9a6#diff-0b9b4e122eb666155aa189a4321a6ca8 \| spark.mesos.coarse \| 0.6.0 \| None \| 63051dd2bcc4bf09d413ff7cf89a37967edc33ba#diff-eaf125f56ce786d64dcef99cf446a751 \| spark.mesos.coarse.shutdownTimeout \| 2.0.0 \| SPARK-12330 \| c756bda477f458ba4aad7fdb2026263507e0ad9b#diff-d425d35aa23c47a62fbb538554f2f2cf \| spark.mesos.maxDrivers \| 1.4.0 \| SPARK-5338 \| 53befacced828bbac53c6e3a4976ec3f036bae9e#diff-b964c449b99c51f0a5fd77270b2951a4 \| spark.mesos.retainedDrivers \| 1.4.0 \| SPARK-5338 \| 53befacced828bbac53c6e3a4976ec3f036bae9e#diff-b964c449b99c51f0a5fd77270b2951a4 \| spark.mesos.cluster.retry.wait.max \| 1.4.0 \| SPARK-5338 \| 53befacced828bbac53c6e3a4976ec3f036bae9e#diff-b964c449b99c51f0a5fd77270b2951a4 \| spark.mesos.fetcherCache.enable \| 2.1.0 \| SPARK-15994 \| e34b4e12673fb76c92f661d7c03527410857a0f8#diff-772ea7311566edb25f11a4c4f882179a \| 
spark.mesos.appJar.local.resolution.mode \| 2.4.0 \| SPARK-24326 \| 22df953f6bb191858053eafbabaa5b3ebca29f56#diff-6e4d0a0445975f03f975fdc1e3d80e49 \| spark.mesos.rejectOfferDuration \| 2.2.0 \| SPARK-19702 \| 2e30c0b9bcaa6f7757bd85d1f1ec392d5f916f83#diff-daf48dabbe58afaeed8787751750b01d \| spark.mesos.rejectOfferDurationForUnmetConstraints \| 1.6.0 \| SPARK-10471 \| 74f50275e429e649212928a9f36552941b862edc#diff-02a6d899f7a529eb7cfbb12182a110b0 \| spark.mesos.rejectOfferDurationForReachedMaxCores \| 2.0.0 \| SPARK-13001 \| 1e7d9bfb5a41f5c2479ab3b4d4081f00bf00bd31#diff-02a6d899f7a529eb7cfbb12182a110b0 \| spark.mesos.uris \| 1.5.0 \| SPARK-8798 \| a2f805729b401c68b60bd690ad02533b8db57b58#diff-e3a5e67b8de2069ce99801372e214b8e \| spark.mesos.executor.home \| 1.1.1 \| SPARK-3264 \| 069ecfef02c4af69fc0d3755bd78be321b68b01d#diff-e3a5e67b8de2069ce99801372e214b8e \| spark.mesos.mesosExecutor.cores \| 1.4.0 \| SPARK-6350 \| 6fbeb82e13db7117d8f216e6148632490a4bc5be#diff-e3a5e67b8de2069ce99801372e214b8e \| spark.mesos.extra.cores \| 0.6.0 \| None \| 2d761e3353651049f6707c74bb5ffdd6e86f6f35#diff-37af8c6e3634f97410ade813a5172621 \| spark.mesos.executor.memoryOverhead \| 1.1.1 \| SPARK-3535 \| 6f150978477830bbc14ba983786dd2bce12d1fe2#diff-6b498f5407d10e848acac4a1b182457c \| spark.mesos.executor.docker.image \| 1.4.0 \| SPARK-2691 \| 8f50a07d2188ccc5315d979755188b1e5d5b5471#diff-e3a5e67b8de2069ce99801372e214b8e \| spark.mesos.executor.docker.forcePullImage \| 2.1.0 \| SPARK-15271 \| 978cd5f125eb5a410bad2e60bf8385b11cf1b978#diff-0dd025320c7ecda2ea310ed7172d7f5a \| spark.mesos.executor.docker.portmaps \| 1.4.0 \| SPARK-7373 \| 226033cfffa2f37ebaf8bc2c653f094e91ef0c9b#diff-b964c449b99c51f0a5fd77270b2951a4 \| spark.mesos.executor.docker.parameters \| 2.2.0 \| SPARK-19740 \| a888fed3099e84c2cf45e9419f684a3658ada19d#diff-4139e6605a8c7f242f65cde538770c99 \| spark.mesos.executor.docker.volumes \| 1.4.0 \| SPARK-7373 \| 
226033cfffa2f37ebaf8bc2c653f094e91ef0c9b#diff-b964c449b99c51f0a5fd77270b2951a4 \| spark.mesos.gpus.max \| 2.1.0 \| SPARK-14082 \| 29f186bfdf929b1e8ffd8e33ee37b76d5dc5af53#diff-d427ee890b913c5a7056be21eb4f39d7 \| spark.mesos.task.labels \| 2.2.0 \| SPARK-20085 \| c8fc1f3badf61bcfc4bd8eeeb61f73078ca068d1#diff-387c5d0c916278495fc28420571adf9e \| spark.mesos.constraints \| 1.5.0 \| SPARK-6707 \| 1165b17d24cdf1dbebb2faca14308dfe5c2a652c#diff-e3a5e67b8de2069ce99801372e214b8e \| spark.mesos.containerizer \| 2.1.0 \| SPARK-16637 \| 266b92faffb66af24d8ed2725beb80770a2d91f8#diff-0dd025320c7ecda2ea310ed7172d7f5a \| spark.mesos.role \| 1.5.0 \| SPARK-6284 \| d86bbb4e286f16f77ba125452b07827684eafeed#diff-02a6d899f7a529eb7cfbb12182a110b0 \| The following appears in the document \| \| \| \| spark.mesos.driverEnv.[EnvironmentVariableName] \| 2.1.0 \| SPARK-16194 \| 235cb256d06653bcde4c3ed6b081503a94996321#diff-b964c449b99c51f0a5fd77270b2951a4 \| spark.mesos.dispatcher.driverDefault.[PropertyName] \| 2.1.0 \| SPARK-16927 and SPARK-16923 \| eca58755fbbc11937b335ad953a3caff89b818e6#diff-b964c449b99c51f0a5fd77270b2951a4 \| ### Why are the changes needed? Supplemental configuration version information. ### Does this PR introduce any user-facing change? 'No'. ### How was this patch tested? Exists UT Closes #27863 from beliefer/add-version-to-mesos-config. Authored-by: beliefer <beliefer@163.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-12 11:02:29 +09:00
beliefer	1254c88034	[SPARK-31118][K8S][DOC] Add version information to the configuration of K8S ### What changes were proposed in this pull request? Add version information to the configuration of `K8S`. I sorted out some information show below. Item name \| Since version \| JIRA ID \| Commit ID \| Note -- \| -- \| -- \| -- \| -- spark.kubernetes.context \| 3.0.0 \| SPARK-25887 \| c542c247bbfe1214c0bf81076451718a9e8931dc#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.driver.master \| 3.0.0 \| SPARK-30371 \| f14061c6a4729ad419902193aa23575d8f17f597#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.namespace \| 2.3.0 \| SPARK-18278 \| e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.container.image \| 2.3.0 \| SPARK-22994 \| b94debd2b01b87ef1d2a34d48877e38ade0969e6#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.driver.container.image \| 2.3.0 \| SPARK-22807 \| fb3636b482be3d0940345b1528c1d5090bbc25e6#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.executor.container.image \| 2.3.0 \| SPARK-22807 \| fb3636b482be3d0940345b1528c1d5090bbc25e6#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.container.image.pullPolicy \| 2.3.0 \| SPARK-22807 \| fb3636b482be3d0940345b1528c1d5090bbc25e6#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.container.image.pullSecrets \| 2.4.0 \| SPARK-23668 \| cccaaa14ad775fb981e501452ba2cc06ff5c0f0a#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.submission.requestTimeout \| 3.0.0 \| SPARK-27023 \| e9e8bb33ef9ad785473ded168bc85867dad4ee70#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.submission.connectionTimeout \| 3.0.0 \| SPARK-27023 \| e9e8bb33ef9ad785473ded168bc85867dad4ee70#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.driver.requestTimeout \| 3.0.0 \| SPARK-27023 \| e9e8bb33ef9ad785473ded168bc85867dad4ee70#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.driver.connectionTimeout \| 3.0.0 \| 
SPARK-27023 \| e9e8bb33ef9ad785473ded168bc85867dad4ee70#diff-6e882d5561424e7e6651eb46f10104b8 \| KUBERNETES_AUTH_DRIVER_CONF_PREFIX.serviceAccountName \| 2.3.0 \| SPARK-18278 \| e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.authenticate.driver KUBERNETES_AUTH_EXECUTOR_CONF_PREFIX.serviceAccountName \| 3.1.0 \| SPARK-30122 \| f9f06eee9853ad4b6458ac9d31233e729a1ca226#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.authenticate.executor spark.kubernetes.driver.limit.cores \| 2.3.0 \| SPARK-22646 \| 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.driver.request.cores \| 3.0.0 \| SPARK-27754 \| 1a8c09334db87b0e938c38cd6b59d326bdcab3c3#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.submitInDriver \| 2.4.0 \| SPARK-22839 \| f15906da153f139b698e192ec6f82f078f896f1e#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.executor.limit.cores \| 2.3.0 \| SPARK-18278 \| e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.executor.scheduler.name \| 3.0.0 \| SPARK-29436 \| f800fa383131559c4e841bf062c9775d09190935#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.executor.request.cores \| 2.4.0 \| SPARK-23285 \| fe2b7a4568d65a62da6e6eb00fff05f248b4332c#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.driver.pod.name \| 2.3.0 \| SPARK-18278 \| e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.driver.resourceNamePrefix \| 3.0.0 \| SPARK-25876 \| 6be272b75b4ae3149869e19df193675cc4117763#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.executor.podNamePrefix \| 2.3.0 \| SPARK-18278 \| e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.allocation.batch.size \| 2.3.0 \| SPARK-18278 \| e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 \| 
spark.kubernetes.allocation.batch.delay \| 2.3.0 \| SPARK-18278 \| e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.executor.lostCheck.maxAttempts \| 2.3.0 \| SPARK-18278 \| e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.submission.waitAppCompletion \| 2.3.0 \| SPARK-22646 \| 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.report.interval \| 2.3.0 \| SPARK-22646 \| 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.executor.apiPollingInterval \| 2.4.0 \| SPARK-24248 \| 270a9a3cac25f3e799460320d0fc94ccd7ecfaea#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.executor.eventProcessingInterval \| 2.4.0 \| SPARK-24248 \| 270a9a3cac25f3e799460320d0fc94ccd7ecfaea#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.memoryOverheadFactor \| 2.4.0 \| SPARK-23984 \| 1a644afbac35c204f9ad55f86999319a9ab458c6#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.pyspark.pythonVersion \| 2.4.0 \| SPARK-23984 \| a791c29bd824adadfb2d85594bc8dad4424df936#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.kerberos.krb5.path \| 3.0.0 \| SPARK-23257 \| 6c9c84ffb9c8d98ee2ece7ba4b010856591d383d#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.kerberos.krb5.configMapName \| 3.0.0 \| SPARK-23257 \| 6c9c84ffb9c8d98ee2ece7ba4b010856591d383d#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.hadoop.configMapName \| 3.0.0 \| SPARK-23257 \| 6c9c84ffb9c8d98ee2ece7ba4b010856591d383d#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.kerberos.tokenSecret.name \| 3.0.0 \| SPARK-23257 \| 6c9c84ffb9c8d98ee2ece7ba4b010856591d383d#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.kerberos.tokenSecret.itemKey \| 3.0.0 \| SPARK-23257 \| 6c9c84ffb9c8d98ee2ece7ba4b010856591d383d#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.resource.type \| 
2.4.1 \| SPARK-25021 \| 9031c784847353051bc0978f63ef4146ae9095ff#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.local.dirs.tmpfs \| 3.0.0 \| SPARK-25262 \| da6fa3828bb824b65f50122a8a0a0d4741551257#diff-6e882d5561424e7e6651eb46f10104b8 \| It exists in branch-3.0, but in pom.xml it is 2.4.0-snapshot spark.kubernetes.driver.podTemplateFile \| 3.0.0 \| SPARK-24434 \| f6cc354d83c2c9a757f9b507aadd4dbdc5825cca#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.executor.podTemplateFile \| 3.0.0 \| SPARK-24434 \| f6cc354d83c2c9a757f9b507aadd4dbdc5825cca#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.driver.podTemplateContainerName \| 3.0.0 \| SPARK-24434 \| f6cc354d83c2c9a757f9b507aadd4dbdc5825cca#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.executor.podTemplateContainerName \| 3.0.0 \| SPARK-24434 \| f6cc354d83c2c9a757f9b507aadd4dbdc5825cca#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.executor.deleteOnTermination \| 3.0.0 \| SPARK-25515 \| 0c2935b01def8a5f631851999d9c2d57b63763e6#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.dynamicAllocation.deleteGracePeriod \| 3.0.0 \| SPARK-28487 \| 0343854f54b48b206ca434accec99355011560c2#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.appKillPodDeletionGracePeriod \| 3.0.0 \| SPARK-24793 \| 05168e725d2a17c4164ee5f9aa068801ec2454f4#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.file.upload.path \| 3.0.0 \| SPARK-23153 \| 5e74570c8f5e7dfc1ca1c53c177827c5cea57bf1#diff-6e882d5561424e7e6651eb46f10104b8 \| The following appears in the document \| \| \| \| spark.kubernetes.authenticate.submission.caCertFile \| 2.3.0 \| SPARK-22646 \| 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.authenticate.submission.clientKeyFile \| 2.3.0 \| SPARK-22646 \| 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.authenticate.submission.clientCertFile \| 2.3.0 \| SPARK-22646 
\| 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.authenticate.submission.oauthToken \| 2.3.0 \| SPARK-22646 \| 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.authenticate.submission.oauthTokenFile \| 2.3.0 \| SPARK-22646 \| 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.authenticate.driver.caCertFile \| 2.3.0 \| SPARK-18278 \| e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.authenticate.driver.clientKeyFile \| 2.3.0 \| SPARK-18278 \| e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.authenticate.driver.clientCertFile \| 2.3.0 \| SPARK-18278 \| e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.authenticate.driver.oauthToken \| 2.3.0 \| SPARK-18278 \| e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.authenticate.driver.oauthTokenFile \| 2.3.0 \| SPARK-18278 \| e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.authenticate.driver.mounted.caCertFile \| 2.3.0 \| SPARK-18278 \| e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.authenticate.driver.mounted.clientKeyFile \| 2.3.0 \| SPARK-18278 \| e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.authenticate.driver.mounted.clientCertFile \| 2.3.0 \| SPARK-18278 \| e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.authenticate.driver.mounted.oauthTokenFile \| 2.3.0 \| SPARK-18278 \| e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.authenticate.caCertFile \| 2.4.0 \| SPARK-23146 \| 
571a6f0574e50e53cea403624ec3795cd03aa204#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.authenticate.clientKeyFile \| 2.4.0 \| SPARK-23146 \| 571a6f0574e50e53cea403624ec3795cd03aa204#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.authenticate.clientCertFile \| 2.4.0 \| SPARK-23146 \| 571a6f0574e50e53cea403624ec3795cd03aa204#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.authenticate.oauthToken \| 2.4.0 \| SPARK-23146 \| 571a6f0574e50e53cea403624ec3795cd03aa204#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.authenticate.oauthTokenFile \| 2.4.0 \| SPARK-23146 \| 571a6f0574e50e53cea403624ec3795cd03aa204#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.driver.label.[LabelName] \| 2.3.0 \| SPARK-22646 \| 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.driver.annotation.[AnnotationName] \| 2.3.0 \| SPARK-22646 \| 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.executor.label.[LabelName] \| 2.3.0 \| SPARK-22646 \| 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.executor.annotation.[AnnotationName] \| 2.3.0 \| SPARK-22646 \| 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.node.selector.[labelKey] \| 2.3.0 \| SPARK-18278 \| e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.driverEnv.[EnvironmentVariableName] \| 2.3.0 \| SPARK-22646 \| 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.driver.secrets.[SecretName] \| 2.3.0 \| SPARK-22757 \| 171f6ddadc6185ffcc6ad82e5f48952fb49095b2#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.executor.secrets.[SecretName] \| 2.3.0 \| SPARK-22757 \| 171f6ddadc6185ffcc6ad82e5f48952fb49095b2#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.driver.secretKeyRef.[EnvName] \| 2.4.0 
\| SPARK-24232 \| 21e1fc7d4aed688d7b685be6ce93f76752159c98#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.executor.secretKeyRef.[EnvName] \| 2.4.0 \| SPARK-24232 \| 21e1fc7d4aed688d7b685be6ce93f76752159c98#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].mount.path \| 2.4.0 \| SPARK-23529 \| 5ff1b9ba1983d5601add62aef64a3e87d07050eb#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].mount.subPath \| 3.0.0 \| SPARK-25960 \| 3df307aa515b3564686e75d1b71754bbcaaf2dec#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].mount.readOnly \| 2.4.0 \| SPARK-23529 \| 5ff1b9ba1983d5601add62aef64a3e87d07050eb#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].options.[OptionName] \| 2.4.0 \| SPARK-23529 \| 5ff1b9ba1983d5601add62aef64a3e87d07050eb#diff-b5527f236b253e0d9f5db5164bdb43e9 \| spark.kubernetes.executor.volumes.[VolumeType].[VolumeName].mount.path \| 2.4.0 \| SPARK-23529 \| 5ff1b9ba1983d5601add62aef64a3e87d07050eb#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.executor.volumes.[VolumeType].[VolumeName].mount.subPath \| 3.0.0 \| SPARK-25960 \| 3df307aa515b3564686e75d1b71754bbcaaf2dec#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.executor.volumes.[VolumeType].[VolumeName].mount.readOnly \| 2.4.0 \| SPARK-23529 \| 5ff1b9ba1983d5601add62aef64a3e87d07050eb#diff-6e882d5561424e7e6651eb46f10104b8 \| spark.kubernetes.executor.volumes.[VolumeType].[VolumeName].options.[OptionName] \| 2.4.0 \| SPARK-23529 \| 5ff1b9ba1983d5601add62aef64a3e87d07050eb#diff-b5527f236b253e0d9f5db5164bdb43e9 \| ### Why are the changes needed? Supplemental configuration version information. ### Does this PR introduce any user-facing change? 'No' ### How was this patch tested? Exists UT Closes #27875 from beliefer/add-version-to-k8s-config. 
Authored-by: beliefer <beliefer@163.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-12 09:54:08 +09:00
beliefer	0722dc5fb8	[SPARK-31092][YARN][DOC] Add version information to the configuration of Yarn ### What changes were proposed in this pull request? Add version information to the configuration of `Yarn`. I sorted out some information show below. Item name \| Since version \| JIRA ID \| Commit ID \| Note -- \| -- \| -- \| -- \| -- spark.yarn.tags \| 1.5.0 \| SPARK-9782 \| 9b731fad2b43ca18f3c5274062d4c7bc2622ab72#diff-b050df3f55b82065803d6e83453b9706 \| spark.yarn.priority \| 3.0.0 \| SPARK-29603 \| 4615769736f4c052ae1a2de26e715e229154cd2f#diff-4804e0f83ca7f891183eb0db229b4b9a \| spark.yarn.am.attemptFailuresValidityInterval \| 1.6.0 \| SPARK-10739 \| f97e9323b526b3d0b0fee0ca03f4276f37bb5750#diff-b050df3f55b82065803d6e83453b9706 \| spark.yarn.executor.failuresValidityInterval \| 2.0.0 \| SPARK-6735 \| 8b44bd52fa40c0fc7d34798c3654e31533fd3008#diff-14b8ed2ef4e3da985300b8d796a38fa9 \| spark.yarn.maxAppAttempts \| 1.3.0 \| SPARK-2165 \| 8fdd48959c93b9cf809f03549e2ae6c4687d1fcd#diff-b050df3f55b82065803d6e83453b9706 \| spark.yarn.user.classpath.first \| 1.3.0 \| SPARK-5087 \| 8d45834debc6986e61831d0d6e982d5528dccc51#diff-b050df3f55b82065803d6e83453b9706 \| spark.yarn.config.gatewayPath \| 1.5.0 \| SPARK-8302 \| 37bf76a2de2143ec6348a3d43b782227849520cc#diff-b050df3f55b82065803d6e83453b9706 \| spark.yarn.config.replacementPath \| 1.5.0 \| SPARK-8302 \| 37bf76a2de2143ec6348a3d43b782227849520cc#diff-b050df3f55b82065803d6e83453b9706 \| spark.yarn.queue \| 1.0.0 \| SPARK-1126 \| 1617816090e7b20124a512a43860a21232ebf511#diff-ae6a41a938a767e5bb97b5d738371a5b \| spark.yarn.historyServer.address \| 1.0.0 \| SPARK-1408 \| 0058b5d2c74147d24b127a5432f89ebc7050dc18#diff-923ae58523a12397f74dd590744b8b41 \| spark.yarn.historyServer.allowTracking \| 2.2.0 \| SPARK-19554 \| 4661d30b988bf773ab45a15b143efb2908d33743#diff-4804e0f83ca7f891183eb0db229b4b9a \| spark.yarn.archive \| 2.0.0 \| SPARK-13577 \| 07f1c5447753a3d593cd6ececfcb03c11b1cf8ff#diff-14b8ed2ef4e3da985300b8d796a38fa9 \| 
spark.yarn.jars \| 2.0.0 \| SPARK-13577 \| 07f1c5447753a3d593cd6ececfcb03c11b1cf8ff#diff-14b8ed2ef4e3da985300b8d796a38fa9 \| spark.yarn.dist.archives \| 1.0.0 \| SPARK-1126 \| 1617816090e7b20124a512a43860a21232ebf511#diff-ae6a41a938a767e5bb97b5d738371a5b \| spark.yarn.dist.files \| 1.0.0 \| SPARK-1126 \| 1617816090e7b20124a512a43860a21232ebf511#diff-ae6a41a938a767e5bb97b5d738371a5b \| spark.yarn.dist.jars \| 2.0.0 \| SPARK-12343 \| 8ba2b7f28fee39c4839e5ea125bd25f5091a3a1e#diff-14b8ed2ef4e3da985300b8d796a38fa9 \| spark.yarn.preserve.staging.files \| 1.1.0 \| SPARK-2933 \| b92d823ad13f6fcc325eeb99563bea543871c6aa#diff-85a1f4b2810b3e11b8434dcefac5bb85 \| spark.yarn.submit.file.replication \| 0.8.1 \| None \| 4668fcb9ff8f9c176c4866480d52dde5d67c8522#diff-b050df3f55b82065803d6e83453b9706 \| spark.yarn.submit.waitAppCompletion \| 1.4.0 \| SPARK-3591 \| b65bad65c3500475b974ca0219f218eef296db2c#diff-b050df3f55b82065803d6e83453b9706 \| spark.yarn.report.interval \| 0.9.0 \| None \| ebdfa6bb9766209bc5a3c4241fa47141c5e9c5cb#diff-e0a7ae95b6d8e04a67ebca0945d27b65 \| spark.yarn.clientLaunchMonitorInterval \| 2.3.0 \| SPARK-16019 \| 1cad31f00644d899d8e74d58c6eb4e9f72065473#diff-4804e0f83ca7f891183eb0db229b4b9a \| spark.yarn.am.waitTime \| 1.3.0 \| SPARK-3779 \| 253b72b56fe908bbab5d621eae8a5f359c639dfd#diff-87125050a2e2eaf87ea83aac9c19b200 \| spark.yarn.metrics.namespace \| 2.4.0 \| SPARK-24594 \| d2436a85294a178398525c37833dae79d45c1452#diff-4804e0f83ca7f891183eb0db229b4b9a \| spark.yarn.am.nodeLabelExpression \| 1.6.0 \| SPARK-7173 \| 7db3610327d0725ec2ad378bc873b127a59bb87a#diff-b050df3f55b82065803d6e83453b9706 \| spark.yarn.containerLauncherMaxThreads \| 1.2.0 \| SPARK-1713 \| 1f4a648d4e30e837d6cf3ea8de1808e2254ad70b#diff-801a04f9e67321f3203399f7f59234c1 \| spark.yarn.max.executor.failures \| 1.0.0 \| SPARK-1183 \| 698373211ef3cdf841c82d48168cd5dbe00a57b4#diff-0c239e58b37779967e0841fb42f3415a \| spark.yarn.scheduler.reporterThread.maxFailures \| 1.2.0 \| SPARK-3304 \| 
11c10df825419372df61a8d23c51e8c3cc78047f#diff-85a1f4b2810b3e11b8434dcefac5bb85 \| spark.yarn.scheduler.heartbeat.interval-ms \| 0.8.1 \| None \| ee22be0e6c302fb2cdb24f83365c2b8a43a1baab#diff-87125050a2e2eaf87ea83aac9c19b200 \| spark.yarn.scheduler.initial-allocation.interval \| 1.4.0 \| SPARK-7533 \| 3ddf051ee7256f642f8a17768d161c7b5f55c7e1#diff-87125050a2e2eaf87ea83aac9c19b200 \| spark.yarn.am.finalMessageLimit \| 2.4.0 \| SPARK-25174 \| f8346d2fc01f1e881e4e3f9c4499bf5f9e3ceb3f#diff-4804e0f83ca7f891183eb0db229b4b9a \| spark.yarn.am.cores \| 1.3.0 \| SPARK-1507 \| 2be82b1e66cd188456bbf1e5abb13af04d1629d5#diff-746d34aa06bfa57adb9289011e725472 \| spark.yarn.am.extraJavaOptions \| 1.3.0 \| SPARK-5087 \| 8d45834debc6986e61831d0d6e982d5528dccc51#diff-b050df3f55b82065803d6e83453b9706 \| spark.yarn.am.extraLibraryPath \| 1.4.0 \| SPARK-7281 \| 7b5dd3e3c0030087eea5a8224789352c03717c1d#diff-b050df3f55b82065803d6e83453b9706 \| spark.yarn.am.memoryOverhead \| 1.3.0 \| SPARK-1953 \| e96645206006a009e5c1a23bbd177dcaf3ef9b83#diff-746d34aa06bfa57adb9289011e725472 \| spark.yarn.am.memory \| 1.3.0 \| SPARK-1953 \| e96645206006a009e5c1a23bbd177dcaf3ef9b83#diff-746d34aa06bfa57adb9289011e725472 \| spark.driver.appUIAddress \| 1.1.0 \| SPARK-1291 \| 72ea56da8e383c61c6f18eeefef03b9af00f5158#diff-2b4617e158e9c5999733759550440b96 \| spark.yarn.executor.nodeLabelExpression \| 1.4.0 \| SPARK-6470 \| 82fee9d9aad2c9ba2fb4bd658579fe99218cafac#diff-d4620cf162e045960d84c88b2e0aa428 \| spark.yarn.unmanagedAM.enabled \| 3.0.0 \| SPARK-22404 \| f06bc0cd1dee2a58e04ebf24bf719a2f7ef2dc4e#diff-4804e0f83ca7f891183eb0db229b4b9a \| spark.yarn.rolledLog.includePattern \| 2.0.0 \| SPARK-15990 \| 272a2f78f3ff801b94a81fa8fcc6633190eaa2f4#diff-14b8ed2ef4e3da985300b8d796a38fa9 \| spark.yarn.rolledLog.excludePattern \| 2.0.0 \| SPARK-15990 \| 272a2f78f3ff801b94a81fa8fcc6633190eaa2f4#diff-14b8ed2ef4e3da985300b8d796a38fa9 \| spark.yarn.user.jar \| 1.1.0 \| SPARK-1395 \| 
e380767de344fd6898429de43da592658fd86a39#diff-50e237ea17ce94c3ccfc44143518a5f7 \| spark.yarn.secondary.jars \| 0.9.2 \| SPARK-1870 \| 1d3aab96120c6770399e78a72b5692cf8f61a144#diff-50b743cff4885220c828b16c44eeecfd \| spark.yarn.cache.filenames \| 2.0.0 \| SPARK-14602 \| f47dbf27fa034629fab12d0f3c89ab75edb03f86#diff-14b8ed2ef4e3da985300b8d796a38fa9 \| spark.yarn.cache.sizes \| 2.0.0 \| SPARK-14602 \| f47dbf27fa034629fab12d0f3c89ab75edb03f86#diff-14b8ed2ef4e3da985300b8d796a38fa9 \| spark.yarn.cache.timestamps \| 2.0.0 \| SPARK-14602 \| f47dbf27fa034629fab12d0f3c89ab75edb03f86#diff-14b8ed2ef4e3da985300b8d796a38fa9 \| spark.yarn.cache.visibilities \| 2.0.0 \| SPARK-14602 \| f47dbf27fa034629fab12d0f3c89ab75edb03f86#diff-14b8ed2ef4e3da985300b8d796a38fa9 \| spark.yarn.cache.types \| 2.0.0 \| SPARK-14602 \| f47dbf27fa034629fab12d0f3c89ab75edb03f86#diff-14b8ed2ef4e3da985300b8d796a38fa9 \| spark.yarn.cache.confArchive \| 2.0.0 \| SPARK-14602 \| f47dbf27fa034629fab12d0f3c89ab75edb03f86#diff-14b8ed2ef4e3da985300b8d796a38fa9 \| spark.yarn.blacklist.executor.launch.blacklisting.enabled \| 2.4.0 \| SPARK-16630 \| b56e9c613fb345472da3db1a567ee129621f6bf3#diff-4804e0f83ca7f891183eb0db229b4b9a \| spark.yarn.exclude.nodes \| 3.0.0 \| SPARK-26688 \| caceaec93203edaea1d521b88e82ef67094cdea9#diff-4804e0f83ca7f891183eb0db229b4b9a \| The following appears in the document \| \| \| \| spark.yarn.am.resource.{resource-type}.amount \| 3.0.0 \| SPARK-20327 \| 3946de773498621f88009c309254b019848ed490#diff-4804e0f83ca7f891183eb0db229b4b9a \| spark.yarn.driver.resource.{resource-type}.amount \| 3.0.0 \| SPARK-20327 \| 3946de773498621f88009c309254b019848ed490#diff-4804e0f83ca7f891183eb0db229b4b9a \| spark.yarn.executor.resource.{resource-type}.amount \| 3.0.0 \| SPARK-20327 \| 3946de773498621f88009c309254b019848ed490#diff-4804e0f83ca7f891183eb0db229b4b9a \| spark.yarn.appMasterEnv.[EnvironmentVariableName] \| 1.1.0 \| SPARK-1680 \| 
7b798e10e214cd407d3399e2cab9e3789f9a929e#diff-50e237ea17ce94c3ccfc44143518a5f7 \| spark.yarn.kerberos.relogin.period \| 2.3.0 \| SPARK-22290 \| dc2714da50ecba1bf1fdf555a82a4314f763a76e#diff-4804e0f83ca7f891183eb0db229b4b9a \| ### Why are the changes needed? Supplemental configuration version information. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing UTs. Closes #27856 from beliefer/add-version-to-yarn-config. Authored-by: beliefer <beliefer@163.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-12 09:52:57 +09:00
beliefer	c1b2675f2e	[SPARK-31002][CORE][DOC][FOLLOWUP] Add version information to the configuration of Core ### What changes were proposed in this pull request? This PR follows up https://github.com/apache/spark/pull/27847. I sorted out some information show below. Item name \| Since version \| JIRA ID \| Commit ID \| Note -- \| -- \| -- \| -- \| -- spark.yarn.isPython \| 1.5.0 \| SPARK-5479 \| 38112905bc3b33f2ae75274afba1c30e116f6e46#diff-4d2ab44195558d5a9d5f15b8803ef39d \| spark.task.cpus \| 0.5.0 \| None \| e5c4cd8a5e188592f8786a265c0cd073c69ac886#diff-391214d132a0fb4478f4f9c2313d8966 \| spark.dynamicAllocation.enabled \| 1.2.0 \| SPARK-3795 \| 8d59b37b02eb36f37bcefafb952519d7dca744ad#diff-364713d7776956cb8b0a771e9b62f82d \| spark.dynamicAllocation.testing \| 1.2.0 \| SPARK-3795 \| 8d59b37b02eb36f37bcefafb952519d7dca744ad#diff-364713d7776956cb8b0a771e9b62f82d \| spark.dynamicAllocation.minExecutors \| 1.2.0 \| SPARK-3795 \| 8d59b37b02eb36f37bcefafb952519d7dca744ad#diff-364713d7776956cb8b0a771e9b62f82d \| spark.dynamicAllocation.initialExecutors \| 1.3.0 \| SPARK-4585 \| b2047b55c5fc85de6b63276d8ab9610d2496e08b#diff-b096353602813e47074ace09a3890d56 \| spark.dynamicAllocation.maxExecutors \| 1.2.0 \| SPARK-3795 \| 8d59b37b02eb36f37bcefafb952519d7dca744ad#diff-364713d7776956cb8b0a771e9b62f82d \| spark.dynamicAllocation.executorAllocationRatio \| 2.4.0 \| SPARK-22683 \| 55c4ca88a3b093ee197a8689631be8d1fac1f10f#diff-6bdad48cfc34314e89599655442ff210 \| spark.dynamicAllocation.cachedExecutorIdleTimeout \| 1.4.0 \| SPARK-7955 \| 6faaf15ba311bc3a79aae40a6c9c4befabb6889f#diff-b096353602813e47074ace09a3890d56 \| spark.dynamicAllocation.executorIdleTimeout \| 1.2.0 \| SPARK-3795 \| 8d59b37b02eb36f37bcefafb952519d7dca744ad#diff-364713d7776956cb8b0a771e9b62f82d \| spark.dynamicAllocation.shuffleTracking.enabled \| 3.0.0 \| SPARK-27963 \| 2ddeff97d7329942a98ef363991eeabc3fa71a76#diff-6bdad48cfc34314e89599655442ff210 \| spark.dynamicAllocation.shuffleTimeout \| 3.0.0 \| 
SPARK-27963 \| 2ddeff97d7329942a98ef363991eeabc3fa71a76#diff-6bdad48cfc34314e89599655442ff210 \| spark.dynamicAllocation.schedulerBacklogTimeout \| 1.2.0 \| SPARK-3795 \| 8d59b37b02eb36f37bcefafb952519d7dca744ad#diff-364713d7776956cb8b0a771e9b62f82d \| spark.dynamicAllocation.sustainedSchedulerBacklogTimeout \| 1.2.0 \| SPARK-3795 \| 8d59b37b02eb36f37bcefafb952519d7dca744ad#diff-364713d7776956cb8b0a771e9b62f82d \| spark.locality.wait \| 0.5.0 \| None \| e5c4cd8a5e188592f8786a265c0cd073c69ac886#diff-391214d132a0fb4478f4f9c2313d8966 \| spark.shuffle.service.enabled \| 1.2.0 \| SPARK-3796 \| f55218aeb1e9d638df6229b36a59a15ce5363482#diff-2b643ea78c1add0381754b1f47eec132 \| Constants.SHUFFLE_SERVICE_FETCH_RDD_ENABLED \| 3.0.0 \| SPARK-27677 \| e9f3f62b2c0f521f3cc23fef381fc6754853ad4f#diff-6bdad48cfc34314e89599655442ff210 \| spark.shuffle.service.fetch.rdd.enabled spark.shuffle.service.db.enabled \| 3.0.0 \| SPARK-26288 \| 8b0aa59218c209d39cbba5959302d8668b885cf6#diff-6bdad48cfc34314e89599655442ff210 \| spark.shuffle.service.port \| 1.2.0 \| SPARK-3796 \| f55218aeb1e9d638df6229b36a59a15ce5363482#diff-2b643ea78c1add0381754b1f47eec132 \| spark.kerberos.keytab \| 3.0.0 \| SPARK-25372 \| 51540c2fa677658be954c820bc18ba748e4c8583#diff-6bdad48cfc34314e89599655442ff210 \| spark.kerberos.principal \| 3.0.0 \| SPARK-25372 \| 51540c2fa677658be954c820bc18ba748e4c8583#diff-6bdad48cfc34314e89599655442ff210 \| spark.kerberos.relogin.period \| 3.0.0 \| SPARK-23781 \| 68dde3481ea458b0b8deeec2f99233c2d4c1e056#diff-6bdad48cfc34314e89599655442ff210 \| spark.kerberos.renewal.credentials \| 3.0.0 \| SPARK-26595 \| 2a67dbfbd341af166b1c85904875f26a6dea5ba8#diff-6bdad48cfc34314e89599655442ff210 \| spark.kerberos.access.hadoopFileSystems \| 3.0.0 \| SPARK-26766 \| d0443a74d185ec72b747fa39994fa9a40ce974cf#diff-6bdad48cfc34314e89599655442ff210 \| spark.executor.instances \| 1.0.0 \| SPARK-1126 \| 1617816090e7b20124a512a43860a21232ebf511#diff-4d2ab44195558d5a9d5f15b8803ef39d \| 
spark.yarn.dist.pyFiles \| 2.2.1 \| SPARK-21714 \| d10c9dc3f631a26dbbbd8f5c601ca2001a5d7c80#diff-6bdad48cfc34314e89599655442ff210 \| spark.task.maxDirectResultSize \| 2.0.0 \| SPARK-13830 \| 2ef4c5963bff3574fe17e669d703b25ddd064e5d#diff-5a0de266c82b95adb47d9bca714e1f1b \| spark.task.maxFailures \| 0.8.0 \| None \| 46eecd110a4017ea0c86cbb1010d0ccd6a5eb2ef#diff-264da78fe625d594eae59d1adabc8ae9 \| spark.task.reaper.enabled \| 2.0.3 \| SPARK-18761 \| 678d91c1d2283d9965a39656af9d383bad093ba8#diff-5a0de266c82b95adb47d9bca714e1f1b \| spark.task.reaper.killTimeout \| 2.0.3 \| SPARK-18761 \| 678d91c1d2283d9965a39656af9d383bad093ba8#diff-5a0de266c82b95adb47d9bca714e1f1b \| spark.task.reaper.pollingInterval \| 2.0.3 \| SPARK-18761 \| 678d91c1d2283d9965a39656af9d383bad093ba8#diff-5a0de266c82b95adb47d9bca714e1f1b \| spark.task.reaper.threadDump \| 2.0.3 \| SPARK-18761 \| 678d91c1d2283d9965a39656af9d383bad093ba8#diff-5a0de266c82b95adb47d9bca714e1f1b \| spark.blacklist.enabled \| 2.1.0 \| SPARK-17675 \| 9ce7d3e542e786c62f047c13f3001e178f76e06a#diff-6bdad48cfc34314e89599655442ff210 \| spark.blacklist.task.maxTaskAttemptsPerExecutor \| 2.1.0 \| SPARK-17675 \| 9ce7d3e542e786c62f047c13f3001e178f76e06a#diff-6bdad48cfc34314e89599655442ff210 \| spark.blacklist.task.maxTaskAttemptsPerNode \| 2.1.0 \| SPARK-17675 \| 9ce7d3e542e786c62f047c13f3001e178f76e06a#diff-6bdad48cfc34314e89599655442ff210 \| spark.blacklist.application.maxFailedTasksPerExecutor \| 2.2.0 \| SPARK-8425 \| 93cdb8a7d0f124b4db069fd8242207c82e263c52#diff-6bdad48cfc34314e89599655442ff210 \| spark.blacklist.stage.maxFailedTasksPerExecutor \| 2.1.0 \| SPARK-17675 \| 9ce7d3e542e786c62f047c13f3001e178f76e06a#diff-6bdad48cfc34314e89599655442ff210 \| spark.blacklist.application.maxFailedExecutorsPerNode \| 2.2.0 \| SPARK-8425 \| 93cdb8a7d0f124b4db069fd8242207c82e263c52#diff-6bdad48cfc34314e89599655442ff210 \| spark.blacklist.stage.maxFailedExecutorsPerNode \| 2.1.0 \| SPARK-17675 \| 
9ce7d3e542e786c62f047c13f3001e178f76e06a#diff-6bdad48cfc34314e89599655442ff210 \| spark.blacklist.timeout \| 2.1.0 \| SPARK-17675 \| 9ce7d3e542e786c62f047c13f3001e178f76e06a#diff-6bdad48cfc34314e89599655442ff210 \| spark.blacklist.killBlacklistedExecutors \| 2.2.0 \| SPARK-16554 \| 6287c94f08200d548df5cc0a401b73b84f9968c4#diff-6bdad48cfc34314e89599655442ff210 \| spark.scheduler.executorTaskBlacklistTime \| 1.0.0 \| None \| ab747d39ddc7c8a314ed2fb26548fc5652af0d74#diff-bad3987c83bd22d46416d3dd9d208e76 \| spark.blacklist.application.fetchFailure.enabled \| 2.3.0 \| SPARK-13669 and SPARK-20898 \| 9e50a1d37a4cf0c34e20a7c1a910ceaff41535a2#diff-6bdad48cfc34314e89599655442ff210 \| spark.files.fetchFailure.unRegisterOutputOnHost \| 2.3.0 \| SPARK-19753 \| dccc0aa3cf957c8eceac598ac81ac82f03b52105#diff-6bdad48cfc34314e89599655442ff210 \| spark.scheduler.listenerbus.eventqueue.capacity \| 2.3.0 \| SPARK-20887 \| 629f38e171409da614fd635bd8dd951b7fde17a4#diff-6bdad48cfc34314e89599655442ff210 \| spark.scheduler.listenerbus.metrics.maxListenerClassesTimed \| 2.3.0 \| SPARK-20863 \| 2a23cdd078a7409d0bb92cf27718995766c41b1d#diff-6bdad48cfc34314e89599655442ff210 \| spark.scheduler.listenerbus.logSlowEvent \| 3.0.0 \| SPARK-30812 \| 68d7edf9497bea2f73707d32ab55dd8e53088e7c#diff-6bdad48cfc34314e89599655442ff210 \| spark.scheduler.listenerbus.logSlowEvent.threshold \| 3.0.0 \| SPARK-29001 \| 0346afa8fc348aa1b3f5110df747a64e3b2da388#diff-6bdad48cfc34314e89599655442ff210 \| ### Why are the changes needed? Supplemental configuration version information. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Exists UT Closes #27852 from beliefer/add-version-to-core-config-part-two. Authored-by: beliefer <beliefer@163.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-12 09:52:20 +09:00
Wenchen Fan	0f0ccdadb1	[SPARK-31110][DOCS][SQL] refine sql doc for SELECT ### What changes were proposed in this pull request? A few improvements to the sql ref SELECT doc: 1. correct the syntax of SELECT query 2. correct the default of null sort order 3. correct the GROUP BY syntax 4. several minor fixes ### Why are the changes needed? refine document ### Does this PR introduce any user-facing change? N/A ### How was this patch tested? N/A Closes #27866 from cloud-fan/doc. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-03-11 16:52:40 -07:00
Wenchen Fan	8efb71013d	[SPARK-31091] Revert SPARK-24640 Return `NULL` from `size(NULL)` by default ### What changes were proposed in this pull request? This PR reverts https://github.com/apache/spark/pull/26051 and https://github.com/apache/spark/pull/26066 ### Why are the changes needed? There is no standard requiring that `size(null)` must return null, and returning -1 looks reasonable as well. This is largely a cosmetic change, and we should avoid it if it breaks existing queries. This is similar to reverting the TRIM function parameter order change. ### Does this PR introduce any user-facing change? Yes, it changes the behavior of `size(null)` back to what it was in 2.4. ### How was this patch tested? N/A Closes #27834 from cloud-fan/revert. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-03-11 09:55:24 -07:00
Yuanjian Li	3493162c78	[SPARK-31030][SQL] Backward Compatibility for Parsing and formatting Datetime ### What changes were proposed in this pull request? In Spark 2.4 and earlier, datetime parsing, formatting, and conversion use the hybrid calendar (Julian + Gregorian). Because the Proleptic Gregorian calendar is the de facto calendar worldwide and the one chosen by the ANSI SQL standard, Spark 3.0 switches to it by using the Java 8 API classes (the java.time packages, which are based on the ISO chronology). The switch was completed in SPARK-26651. After the switch, however, some patterns are incompatible between Java 8 and Java 7, so Spark needs its own definition of the patterns rather than depending on the Java API. This PR achieves that by documenting the patterns and shadowing the incompatible letters. See [SPARK-31030](https://issues.apache.org/jira/browse/SPARK-31030) for more details. ### Why are the changes needed? For backward compatibility. ### Does this PR introduce any user-facing change? No. Once Spark defines its own datetime parsing and formatting patterns, the behavior is the same as in older Spark versions. ### How was this patch tested? Existing and newly added UTs. Local documentation test: ![image](https://user-images.githubusercontent.com/4833765/76064100-f6acc280-5fc3-11ea-9ef7-82e7dc074205.png) Closes #27830 from xuanyuanking/SPARK-31030. Authored-by: Yuanjian Li <xyliyuanjian@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-03-11 14:11:13 +08:00
Qianyang Yu	0f54dc7c03	[SPARK-30962][SQL][DOC] Documentation for Alter table command phase 2 ### What changes were proposed in this pull request? ### Why are the changes needed? Based on [JIRA 30962](https://issues.apache.org/jira/browse/SPARK-30962), we want to add all the support `Alter Table` syntax for V1 table. ### Does this PR introduce any user-facing change? Yes ### How was this patch tested? Before: The documentation looks like [Alter Table](https://github.com/apache/spark/pull/25590) After: <img width="850" alt="Screen Shot 2020-03-03 at 2 02 23 PM" src="https://user-images.githubusercontent.com/7550280/75824837-168c7e00-5d59-11ea-9751-d1dab0f5a892.png"> <img width="977" alt="Screen Shot 2020-03-03 at 2 02 41 PM" src="https://user-images.githubusercontent.com/7550280/75824859-21dfa980-5d59-11ea-8b49-3adf6eb55fc6.png"> <img width="1028" alt="Screen Shot 2020-03-03 at 2 02 59 PM" src="https://user-images.githubusercontent.com/7550280/75824884-2e640200-5d59-11ea-81ef-d77d0a8efee2.png"> <img width="864" alt="Screen Shot 2020-03-03 at 2 03 14 PM" src="https://user-images.githubusercontent.com/7550280/75824910-39b72d80-5d59-11ea-84d0-bffa2499f086.png"> <img width="823" alt="Screen Shot 2020-03-03 at 2 03 28 PM" src="https://user-images.githubusercontent.com/7550280/75824937-45a2ef80-5d59-11ea-932c-314924856834.png"> <img width="811" alt="Screen Shot 2020-03-03 at 2 03 42 PM" src="https://user-images.githubusercontent.com/7550280/75824965-4cc9fd80-5d59-11ea-815b-8c1ebad310b1.png"> <img width="827" alt="Screen Shot 2020-03-03 at 2 03 53 PM" src="https://user-images.githubusercontent.com/7550280/75824978-518eb180-5d59-11ea-8a55-2fa26376b9c1.png"> <img width="783" alt="Screen Shot 2020-03-03 at 2 04 03 PM" src="https://user-images.githubusercontent.com/7550280/75825001-5bb0b000-5d59-11ea-8dd9-dcfbfa1b4330.png"> Notes: Those syntaxes are not supported by v1 Table. - `ALTER TABLE .. RENAME COLUMN` - `ALTER TABLE ... DROP (COLUMN \| COLUMNS)` - `ALTER TABLE ... 
(ALTER \| CHANGE) COLUMN? alterColumnAction` only supports changing comments, not other actions: `datatype, position, (SET \| DROP) NOT NULL` - `ALTER TABLE .. CHANGE COLUMN?` - `ALTER TABLE .... REPLACE COLUMNS` - `ALTER TABLE ... RECOVER PARTITIONS` Closes #27779 from kevinyu98/spark-30962-alterT. Authored-by: Qianyang Yu <qyu@us.ibm.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-03-11 08:47:30 +09:00
beliefer	bc490f383d	[SPARK-31002][CORE][DOC] Add version information to the configuration of Core ### What changes were proposed in this pull request? Add version information to the configuration of `Core`. Note: Because `Core` has a lot of configuration items, I split the items into four PR. Other PR will follows this PR. I sorted out some information show below. Item name \| Since version \| JIRA ID \| Commit ID \| Note -- \| -- \| -- \| -- \| -- spark.resources.discoveryPlugin \| 3.0.0 \| SPARK-30689 \| 742e35f1d48c2523dda2ce21d73b7ab5ade20582#diff-6bdad48cfc34314e89599655442ff210 \| spark.driver.resourcesFile \| 3.0.0 \| SPARK-27835 \| 6748b486a9afe8370786efb64a8c9f3470c62dcf#diff-6bdad48cfc34314e89599655442ff210 \| SparkLauncher.DRIVER_EXTRA_CLASSPATH \| 1.0.0 \| None \| 29ee101c73bf066bf7f4f8141c475b8d1bd3cf1c#diff-4d2ab44195558d5a9d5f15b8803ef39d \| spark.driver.extraClassPath SparkLauncher.DRIVER_EXTRA_JAVA_OPTIONS \| 1.0.0 \| None \| 29ee101c73bf066bf7f4f8141c475b8d1bd3cf1c#diff-4d2ab44195558d5a9d5f15b8803ef39d \| spark.driver.extraJavaOptions SparkLauncher.DRIVER_EXTRA_LIBRARY_PATH \| 1.0.0 \| None \| 29ee101c73bf066bf7f4f8141c475b8d1bd3cf1c#diff-4d2ab44195558d5a9d5f15b8803ef39d \| spark.driver.extraLibraryPath spark.driver.userClassPathFirst \| 1.3.0 \| SPARK-2996 \| 6a1e0f967286945db13d94aeb6ed19f0a347c236#diff-4d2ab44195558d5a9d5f15b8803ef39d \| spark.driver.cores \| 1.3.0 \| SPARK-1507 \| 2be82b1e66cd188456bbf1e5abb13af04d1629d5#diff-4d2ab44195558d5a9d5f15b8803ef39d \| SparkLauncher.DRIVER_MEMORY \| 1.1.1 \| SPARK-3243 \| c1ffa3e4cdfbd1f84b5c8d8de5d0fb958a19e211#diff-4d2ab44195558d5a9d5f15b8803ef39d \| spark.driver.memory spark.driver.memoryOverhead \| 2.3.0 \| SPARK-22646 \| 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6bdad48cfc34314e89599655442ff210 \| spark.driver.log.dfsDir \| 3.0.0 \| SPARK-25118 \| 5f11e8c4cb9a5db037ac239b8fcc97f3a746e772#diff-6bdad48cfc34314e89599655442ff210 \| spark.driver.log.layout \| 3.0.0 \| SPARK-25118 \| 
5f11e8c4cb9a5db037ac239b8fcc97f3a746e772#diff-6bdad48cfc34314e89599655442ff210 \| spark.driver.log.persistToDfs.enabled \| 3.0.0 \| SPARK-25118 \| 5f11e8c4cb9a5db037ac239b8fcc97f3a746e772#diff-6bdad48cfc34314e89599655442ff210 \| spark.driver.log.allowErasureCoding \| 3.0.0 \| SPARK-29105 \| 276aaaae8d404975f8701089e9f4dfecd16e0d9f#diff-6bdad48cfc34314e89599655442ff210 \| spark.eventLog.enabled \| 1.0.0 \| SPARK-1132 \| 79d07d66040f206708e14de393ab0b80020ed96a#diff-364713d7776956cb8b0a771e9b62f82d \| spark.eventLog.dir \| 1.0.0 \| SPARK-1132 \| 79d07d66040f206708e14de393ab0b80020ed96a#diff-364713d7776956cb8b0a771e9b62f82d \| spark.eventLog.compress \| 1.0.0 \| SPARK-1132 \| 79d07d66040f206708e14de393ab0b80020ed96a#diff-364713d7776956cb8b0a771e9b62f82d \| spark.eventLog.logBlockUpdates.enabled \| 2.3.0 \| SPARK-22050 \| 1437e344ec0c29a44a19f4513986f5f184c44695#diff-6bdad48cfc34314e89599655442ff210 \| spark.eventLog.erasureCoding.enabled \| 3.0.0 \| SPARK-25855 \| 35506dced739ef16136e9f3d5d48c638899d3cec#diff-6bdad48cfc34314e89599655442ff210 \| spark.eventLog.testing \| 1.0.1 \| None \| d4c8af87994acf3707027e6fab25363f51fd4615#diff-e4a5a68c15eed95d038acfed84b0b66a \| spark.eventLog.buffer.kb \| 1.0.0 \| SPARK-1132 \| 79d07d66040f206708e14de393ab0b80020ed96a#diff-364713d7776956cb8b0a771e9b62f82d \| spark.eventLog.logStageExecutorMetrics \| 3.0.0 \| SPARK-30812 \| 68d7edf9497bea2f73707d32ab55dd8e53088e7c#diff-6bdad48cfc34314e89599655442ff210 \| spark.eventLog.gcMetrics.youngGenerationGarbageCollectors \| 3.0.0 \| SPARK-25865 \| e5c502c596563dce8eb58f86e42c1aea2c51ed17#diff-6bdad48cfc34314e89599655442ff210 \| spark.eventLog.gcMetrics.oldGenerationGarbageCollectors \| 3.0.0 \| SPARK-25865 \| e5c502c596563dce8eb58f86e42c1aea2c51ed17#diff-6bdad48cfc34314e89599655442ff210 \| spark.eventLog.overwrite \| 1.0.0 \| SPARK-1132 \| 79d07d66040f206708e14de393ab0b80020ed96a#diff-364713d7776956cb8b0a771e9b62f82d \| spark.eventLog.longForm.enabled \| 2.4.0 \| SPARK-23820 \| 
71f70130f1b2b4ec70595627f0a02a88e2c0e27d#diff-6bdad48cfc34314e89599655442ff210 \| spark.eventLog.rolling.enabled \| 3.0.0 \| SPARK-28869 \| 100fc58da54e026cda87832a10e2d06eaeccdf87#diff-6bdad48cfc34314e89599655442ff210 \| spark.eventLog.rolling.maxFileSize \| 3.0.0 \| SPARK-28869 \| 100fc58da54e026cda87832a10e2d06eaeccdf87#diff-6bdad48cfc34314e89599655442ff210 \| spark.executor.id \| 1.2.0 \| SPARK-3377 \| 79e45c9323455a51f25ed9acd0edd8682b4bbb88#diff-364713d7776956cb8b0a771e9b62f82d \| SparkLauncher.EXECUTOR_EXTRA_CLASSPATH \| 1.0.0 \| None \| 29ee101c73bf066bf7f4f8141c475b8d1bd3cf1c#diff-4d2ab44195558d5a9d5f15b8803ef39d \| spark.executor.extraClassPath spark.executor.heartbeat.dropZeroAccumulatorUpdates \| 3.0.0 \| SPARK-25449 \| 9362c5cc273fdd09f9b3b512e2f6b64bcefc25ab#diff-6bdad48cfc34314e89599655442ff210 \| spark.executor.heartbeatInterval \| 1.1.0 \| SPARK-2099 \| 8d338f64c4eda45d22ae33f61ef7928011cc2846#diff-5a0de266c82b95adb47d9bca714e1f1b \| spark.executor.heartbeat.maxFailures \| 1.6.2 \| SPARK-13522 \| 86bf93e65481b8fe5d7532ca6d4cd29cafc9e9dd#diff-5a0de266c82b95adb47d9bca714e1f1b \| spark.executor.processTreeMetrics.enabled \| 3.0.0 \| SPARK-27324 \| 387ce89a0631f1a4c6668b90ff2a7bbcf11919cd#diff-6bdad48cfc34314e89599655442ff210 \| spark.executor.metrics.pollingInterval \| 3.0.0 \| SPARK-26329 \| 80ab19b9fd268adfc419457f12b99a5da7b6d1c7#diff-6bdad48cfc34314e89599655442ff210 \| SparkLauncher.EXECUTOR_EXTRA_JAVA_OPTIONS \| 1.0.0 \| None \| 29ee101c73bf066bf7f4f8141c475b8d1bd3cf1c#diff-4d2ab44195558d5a9d5f15b8803ef39d \| spark.executor.extraJavaOptions SparkLauncher.EXECUTOR_EXTRA_LIBRARY_PATH \| 1.0.0 \| None \| 29ee101c73bf066bf7f4f8141c475b8d1bd3cf1c#diff-4d2ab44195558d5a9d5f15b8803ef39d \| spark.executor.extraLibraryPath spark.executor.userClassPathFirst \| 1.3.0 \| SPARK-2996 \| 6a1e0f967286945db13d94aeb6ed19f0a347c236#diff-529fc5c06b9731c1fbda6f3db60b16aa \| SparkLauncher.EXECUTOR_CORES \| 1.0.0 \| SPARK-1126 \| 
1617816090e7b20124a512a43860a21232ebf511#diff-4d2ab44195558d5a9d5f15b8803ef39d \| spark.executor.cores SparkLauncher.EXECUTOR_MEMORY \| 0.7.0 \| None \| 696eec32c982ca516c506de33f383a173bcbd131#diff-4f50ad37deb6742ad45472636c9a870b \| spark.executor.memory spark.executor.memoryOverhead \| 2.3.0 \| SPARK-22646 \| 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6bdad48cfc34314e89599655442ff210 \| spark.cores.max \| 0.6.0 \| None \| 0a472840030e4e7e84fe748f7bfa49f1ece599c5#diff-b6cc54c092b861f645c3cd69ea0f91e2 \| spark.memory.offHeap.enabled \| 1.6.0 \| SPARK-12251 \| 9870e5c7af87190167ca3845ede918671b9420ca#diff-529fc5c06b9731c1fbda6f3db60b16aa \| spark.memory.offHeap.size \| 1.6.0 \| SPARK-12251 \| 9870e5c7af87190167ca3845ede918671b9420ca#diff-529fc5c06b9731c1fbda6f3db60b16aa \| spark.memory.storageFraction \| 1.6.0 \| SPARK-10983 \| b3ffac5178795f2d8e7908b3e77e8e89f50b5f6f#diff-529fc5c06b9731c1fbda6f3db60b16aa \| spark.memory.fraction \| 1.6.0 \| SPARK-10983 \| b3ffac5178795f2d8e7908b3e77e8e89f50b5f6f#diff-529fc5c06b9731c1fbda6f3db60b16aa \| spark.storage.safetyFraction \| 1.1.0 \| SPARK-1777 \| ecf30ee7e78ea59c462c54db0fde5328f997466c#diff-2b643ea78c1add0381754b1f47eec132 \| spark.storage.unrollMemoryThreshold \| 1.1.0 \| SPARK-1777 \| ecf30ee7e78ea59c462c54db0fde5328f997466c#diff-692a329b5a7fb4134c55d559457b94e4 \| spark.storage.replication.proactive \| 2.2.0 \| SPARK-15355 \| fa7c582e9442b985a0493fb1dd15b3fb9b6031b4#diff-186864190089a718680accb51de5f0d4 \| spark.storage.memoryMapThreshold \| 0.9.2 \| SPARK-1145 \| 76339495153dd895667ad609815c887b2c8960ea#diff-abd96f2ae793cd6ea6aab5b96a3c1d7a \| spark.storage.replication.policy \| 2.1.0 \| SPARK-15353 \| a26afd52198523dbd51dc94053424494638c7de5#diff-2b643ea78c1add0381754b1f47eec132 \| spark.storage.replication.topologyMapper \| 2.1.0 \| SPARK-15353 \| a26afd52198523dbd51dc94053424494638c7de5#diff-186864190089a718680accb51de5f0d4 \| spark.storage.cachedPeersTtl \| 1.1.1 \| SPARK-3495 and SPARK-3496 \| 
be0cc9952d6c8b4cfe9ff10a761e0677cba64489#diff-2b643ea78c1add0381754b1f47eec132 \| spark.storage.maxReplicationFailures \| 1.1.1 \| SPARK-3495 and SPARK-3496 \| be0cc9952d6c8b4cfe9ff10a761e0677cba64489#diff-2b643ea78c1add0381754b1f47eec132 \| spark.storage.replication.topologyFile \| 2.1.0 \| SPARK-15353 \| a26afd52198523dbd51dc94053424494638c7de5#diff-e550ce522c12a31d805a7d0f41e802af \| spark.storage.exceptionOnPinLeak \| 1.6.2 \| SPARK-13566 \| ab006523b840b1d2dbf3f5ff0a238558e7665a1e#diff-5a0de266c82b95adb47d9bca714e1f1b \| spark.storage.blockManagerTimeoutIntervalMs \| 0.7.3 \| None \| 9085ebf3750c7d9bb7c6b5f6b4bdc5b807af93c2#diff-76170a9c8f67b542bc58240a0a12fe08 \| spark.storage.blockManagerSlaveTimeoutMs \| 0.7.0 \| None \| 97434f49b8c029e9b78c91ec5f58557cd1b5c943#diff-2ce6374aac24d70c69182b067216e684 \| spark.storage.cleanupFilesAfterExecutorExit \| 2.4.0 \| SPARK-24340 \| 8ef167a5f9ba8a79bb7ca98a9844fe9cfcfea060#diff-916ca56b663f178f302c265b7ef38499 \| spark.diskStore.subDirectories \| 0.6.0 \| None \| 815d6bd69a0c1ba0e94fc0785f5c3619b37f19c5#diff-e8b73c5b81c403a5e5d581f97624c510 \| spark.block.failures.beforeLocationRefresh \| 2.0.0 \| SPARK-13328 \| ff776b2fc1cd4c571fd542dbf807e6fa3373cb34#diff-2b643ea78c1add0381754b1f47eec132 \| ### Why are the changes needed? Supplemental configuration version information. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Exists UT Closes #27847 from beliefer/add-version-to-core-config-part-one. Authored-by: beliefer <beliefer@163.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-08 12:31:57 +09:00
Huaxin Gao	513f76ac38	[SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release ### What changes were proposed in this pull request? Update ml-guide and ml-migration-guide for 3.0. ### Why are the changes needed? This is required for each release. ### Does this PR introduce any user-facing change? Yes. ![image](https://user-images.githubusercontent.com/13592258/75957386-c8699e80-5e6e-11ea-9dec-7295f8f0bf33.png) ![image](https://user-images.githubusercontent.com/13592258/75957406-cef81600-5e6e-11ea-921f-20509771b49b.png) ![image](https://user-images.githubusercontent.com/13592258/75957423-d4edf700-5e6e-11ea-8e75-d41c532c8ba9.png) ![image](https://user-images.githubusercontent.com/13592258/75957434-da4b4180-5e6e-11ea-899b-f4e080b318ff.png) ### How was this patch tested? Manually build and check. Closes #27785 from huaxingao/spark-30934. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-03-07 18:09:00 -06:00
Nicholas Chammas	7892f88f84	[SPARK-30879][DOCS] Refine workflow for building docs ### What changes were proposed in this pull request? This PR makes the following refinements to the workflow for building docs: * Install Python and Ruby consistently using pyenv and rbenv across both the docs README and the release Dockerfile. * Pin the Python and Ruby versions we use. * Pin all direct Python and Ruby dependency versions. * Eliminate any use of `sudo pip`, which the Python community discourages, or `sudo gem`. ### Why are the changes needed? This PR should increase the consistency and reproducibility of the doc-building process by managing Python and Ruby in a more consistent way, and by eliminating unused or outdated code. Here's a possible example of an issue building the docs that would be addressed by the changes in this PR: https://github.com/apache/spark/pull/27459#discussion_r376135719 ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manual tests: * I was able to build the Docker image successfully, minus the final part about `RUN useradd`. * I am unable to run `do-release-docker.sh` because I am not a committer and don't have the required GPG key. * I built the docs locally and viewed them in the browser. I think I need a committer to more fully test out these changes. Closes #27534 from nchammas/SPARK-30731-building-docs. Authored-by: Nicholas Chammas <nicholas.chammas@liveramp.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-03-07 11:43:32 -06:00
Huaxin Gao	4a64901ab7	[SPARK-31012][ML][PYSPARK][DOCS] Updating ML API docs for 3.0 changes ### What changes were proposed in this pull request? Updating ML docs for 3.0 changes ### Why are the changes needed? I am auditing 3.0 ML changes, found some docs are missing or not updated. Need to update these. ### Does this PR introduce any user-facing change? Yes, doc changes ### How was this patch tested? Manually build and check Closes #27762 from huaxingao/spark-doc. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-03-07 11:42:05 -06:00
Takeshi Yamamuro	71c73d58f6	[SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID ### What changes were proposed in this pull request? This PR intends to support 32 or more grouping attributes for GROUPING_ID. In the current master, an integer overflow can occur when computing grouping IDs; `e75d9afb2f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala (L613)` For example, the query below generates wrong grouping IDs in the master;
```
scala> val numCols = 32 // or, 31
scala> val cols = (0 until numCols).map { i => s"c$i" }
scala> sql(s"create table test_$numCols (${cols.map(c => s"$c int").mkString(",")}, v int) using parquet")
scala> val insertVals = (0 until numCols).map { _ => 1 }.mkString(",")
scala> sql(s"insert into test_$numCols values ($insertVals,3)")
scala> sql(s"select grouping_id(), sum(v) from test_$numCols group by grouping sets ((${cols.mkString(",")}), (${cols.init.mkString(",")}))").show(10, false)
scala> sql(s"drop table test_$numCols")

// numCols = 32
+-------------+------+
|grouping_id()|sum(v)|
+-------------+------+
|0            |3     |
|0            |3     | // Wrong Grouping ID
+-------------+------+

// numCols = 31
+-------------+------+
|grouping_id()|sum(v)|
+-------------+------+
|0            |3     |
|1            |3     |
+-------------+------+
```
To fix this issue, this PR changes the code to use long values for `GROUPING_ID` instead of int values. ### Why are the changes needed? To support more cases in `GROUPING_ID`. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Added unit tests. Closes #26918 from maropu/FixGroupingIdIssue. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-03-06 16:57:03 +09:00
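The overflow described in this commit can be reproduced without Spark. The following is a minimal sketch, not Spark's actual implementation: `GroupingIdSketch`, `groupingIdInt`, and `groupingIdLong` are hypothetical names, and the bit layout (first column in the highest bit, a bit set when the column is absent from the grouping set) is an assumption made for illustration. It shows why a 32-bit mask breaks at exactly 32 columns: on the JVM, `1 << 32` on an `Int` wraps around to `1`, so the mask `(1 << 32) - 1` becomes `0`.

```scala
object GroupingIdSketch {
  // Hypothetical Int-based computation: bit (numCols - 1 - i) is set when
  // column i is NOT part of the grouping set.
  def groupingIdInt(numCols: Int, present: Set[Int]): Int = {
    // For numCols = 32 the shift distance wraps: 1 << 32 == 1, so mask == 0.
    val mask = (1 << numCols) - 1
    val bits = (0 until numCols)
      .filterNot(present)
      .map(i => 1 << (numCols - 1 - i))
      .foldLeft(0)(_ | _)
    bits & mask
  }

  // Same logic with Long arithmetic, which has room for up to 63 columns.
  def groupingIdLong(numCols: Int, present: Set[Int]): Long = {
    val mask = (1L << numCols) - 1
    val bits = (0 until numCols)
      .filterNot(present)
      .map(i => 1L << (numCols - 1 - i))
      .foldLeft(0L)(_ | _)
    bits & mask
  }
}
```

With 32 columns and the last column dropped from the grouping set, the Int version yields 0 while the Long version yields 1, which mirrors the wrong and expected grouping IDs in the commit message. The Long version has the same limitation at 64 columns.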
beliefer	e36227e2d9	[SPARK-30914][CORE][DOC] Add version information to the configuration of UI ### What changes were proposed in this pull request? 1.Add version information to the configuration of `UI`. 2.Update the docs of `UI`. I sorted out some information show below. Item name \| Since version \| JIRA ID \| Commit ID \| Note -- \| -- \| -- \| -- \| -- spark.ui.showConsoleProgress \| 1.2.1 \| SPARK-4017 \| 04b1bdbae31c3039125100e703121daf7d9dabf5#diff-364713d7776956cb8b0a771e9b62f82d \| spark.ui.consoleProgress.update.interval \| 2.1.0 \| SPARK-16919 \| e076fb05ac83a3ed6995e29bb03ea07ea05e39db#diff-fbf4e388a66b6a37e984b91cd71a3e2c \| spark.ui.enabled \| 1.1.1 \| SPARK-3490 \| 937de93e80e6d299c4d08be426da2d5bc2d66f98#diff-364713d7776956cb8b0a771e9b62f82d \| spark.ui.port \| 0.7.0 \| None \| f03d9760fd8ac67fd0865cb355ba75d2eff507fe#diff-ed8dbcebe16fda5ecd6df1a981dc6fee \| spark.ui.filters \| 1.0.0 \| SPARK-1189 \| 7edbea41b43e0dc11a2de156be220db8b7952d01#diff-f79a5ead735b3d0b34b6b94486918e1c \| spark.ui.allowFramingFrom \| 1.6.0 \| SPARK-10589 \| 5dbaf3d3911bbfa003bc75459aaad66b4f6e0c67#diff-f79a5ead735b3d0b34b6b94486918e1c \| spark.ui.reverseProxy \| 2.1.0 \| SPARK-15487 \| 92ce8d4849a0341c4636e70821b7be57ad3055b1#diff-364713d7776956cb8b0a771e9b62f82d \| spark.ui.reverseProxyUrl \| 2.1.0 \| SPARK-15487 \| 92ce8d4849a0341c4636e70821b7be57ad3055b1#diff-364713d7776956cb8b0a771e9b62f82d \| spark.ui.killEnabled \| 1.0.0 \| SPARK-1202 \| 211f97447b5f078afcb1619a08d2e2349325f61a#diff-a40023c80383451b6e29ee7a6e0593e9 \| spark.ui.threadDumpsEnabled \| 1.2.0 \| SPARK-611 \| 866c7bbe56f9c7fd96d3f4afe8a76405dc877a6e#diff-5d18fb70c572369a0fff0b97de94f265 \| spark.ui.prometheus.enabled \| 3.0.0 \| SPARK-29064 \| bbfaadb280a80b511a98d18881641c6d9851dd51#diff-f70174ad0759db1fb4cb36a7ff9324a7 \| spark.ui.xXssProtection \| 2.3.0 \| SPARK-22188 \| 5a07aca4d464e96d75ea17bf6768e24b829872ec#diff-6bdad48cfc34314e89599655442ff210 \| spark.ui.xContentTypeOptions.enabled \| 2.3.0 \| 
SPARK-22188 \| 5a07aca4d464e96d75ea17bf6768e24b829872ec#diff-6bdad48cfc34314e89599655442ff210 \| spark.ui.strictTransportSecurity \| 2.3.0 \| SPARK-22188 \| 5a07aca4d464e96d75ea17bf6768e24b829872ec#diff-6bdad48cfc34314e89599655442ff210 \| spark.ui.requestHeaderSize \| 2.2.3 \| SPARK-26118 \| 9ceee6f188e6c3794d31ce15cc61d29f907bebf7#diff-6bdad48cfc34314e89599655442ff210 \| spark.ui.timeline.tasks.maximum \| 1.4.0 \| SPARK-7296 \| a5f7b3b9c7f05598a1cc8e582e5facee1029cd5e#diff-fa4cfb2cce1b925f55f41f2dfa8c8501 \| spark.acls.enable \| 1.1.0 \| SPARK-1890 and SPARK-1891 \| e3fe6571decfdc406ec6d505fd92f9f2b85a618c#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 \| spark.ui.view.acls \| 1.0.0 \| SPARK-1189 \| 7edbea41b43e0dc11a2de156be220db8b7952d01#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 \| spark.ui.view.acls.groups \| 2.0.0 \| SPARK-4224 \| ae79032dcf160796851ca29116cca146c4d86ada#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 \| spark.admin.acls \| 1.1.0 \| SPARK-1890 and SPARK-1891 \| e3fe6571decfdc406ec6d505fd92f9f2b85a618c#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 \| spark.admin.acls.groups \| 2.0.0 \| SPARK-4224 \| ae79032dcf160796851ca29116cca146c4d86ada#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 \| spark.modify.acls \| 1.1.0 \| SPARK-1890 and SPARK-1891 \| e3fe6571decfdc406ec6d505fd92f9f2b85a618c#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 \| spark.modify.acls.groups \| 2.0.0 \| SPARK-4224 \| ae79032dcf160796851ca29116cca146c4d86ada#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 \| spark.user.groups.mapping \| 2.0.0 \| SPARK-4224 \| ae79032dcf160796851ca29116cca146c4d86ada#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 \| spark.ui.proxyRedirectUri \| 3.0.0 \| SPARK-30240 \| a9fbd310300e57ed58818d7347f3c3172701c491#diff-f70174ad0759db1fb4cb36a7ff9324a7 \| spark.ui.custom.executor.log.url \| 3.0.0 \| SPARK-26792 \| d5bda2c9e8dde6afc075cc7f65b15fa9aa82231c#diff-f70174ad0759db1fb4cb36a7ff9324a7 \| ### Why are the changes needed? Supplemental configuration version information. 
### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing UTs. Closes #27806 from beliefer/add-version-to-UI-config. Authored-by: beliefer <beliefer@163.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-06 11:08:57 +09:00
Takeshi Yamamuro	ffec7a1964	[SQL][DOCS][MINOR] Fix typos and wrong phrases in docs ### What changes were proposed in this pull request? This PR intends to fix typos and phrases in the `/docs` directory. To find them, I ran the IntelliJ typo checker. ### Why are the changes needed? For better documents. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? N/A Closes #27819 from maropu/TypoFix-20200306. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>	2020-03-05 16:54:59 -08:00
Wenchen Fan	807ea413b4	[SPARK-31019][SQL] make it clear that people can deduplicate map keys ### What changes were proposed in this pull request? Rename the config and make it non-internal. ### Why are the changes needed? Now we fail the query if duplicated map keys are detected, and provide a legacy config to deduplicate them. However, we must provide a way to get users out of this situation, instead of just refusing to run the query. This exit strategy should always be there, while a legacy config indicates that it may be removed someday. ### Does this PR introduce any user-facing change? No, it just renames a config which was added in 3.0. ### How was this patch tested? Added more tests for the fail behavior. Closes #27772 from cloud-fan/map. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-05 20:43:52 +09:00
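The two behaviours this commit contrasts (fail on duplicated map keys, or deduplicate them) can be sketched in plain Scala. This is an illustration only and does not use Spark's classes or its config: `MapKeyDedupSketch`, `DedupPolicy`, `FailOnDuplicate`, `LastWin`, and `buildMap` are hypothetical names, and keeping the last value for a duplicated key is an assumption of this sketch rather than something the commit message states.

```scala
object MapKeyDedupSketch {
  // Hypothetical policy type for illustration; not a Spark class.
  sealed trait DedupPolicy
  case object FailOnDuplicate extends DedupPolicy
  case object LastWin extends DedupPolicy

  // Builds a map from key/value pairs, applying the chosen policy
  // whenever a key appears more than once.
  def buildMap[K, V](entries: Seq[(K, V)], policy: DedupPolicy): Map[K, V] =
    entries.foldLeft(Map.empty[K, V]) { case (acc, (k, v)) =>
      if (acc.contains(k) && policy == FailOnDuplicate)
        throw new IllegalArgumentException(s"Duplicate map key $k was found")
      else
        acc.updated(k, v) // under LastWin the later value overwrites the earlier one
    }
}
```

Under `FailOnDuplicate` the caller gets an error and has to choose a policy explicitly, which is the "exit strategy" the commit argues should remain available as a regular, non-legacy option.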
Kent Yao	3edab6cc1d	[MINOR][CORE] Expose the alias -c flag of --conf for spark-submit ### What changes were proposed in this pull request? `-c` is short for `--conf`; it has existed since v1.1.0 but was hidden from users until now. ### Why are the changes needed? ### Does this PR introduce any user-facing change? No; it only exposes a hidden feature. ### How was this patch tested? N/A Closes #27802 from yaooqinn/conf. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-03-04 20:37:51 -08:00
beliefer	ebcff675e0	[SPARK-30889][SPARK-30913][CORE][DOC] Add version information to the configuration of Tests.scala and Worker ### What changes were proposed in this pull request? 1.Add version information to the configuration of `Tests` and `Worker`. 2.Update the docs of `Worker`. I sorted out some information of `Tests` show below. Item name \| Since version \| JIRA ID \| Commit ID \| Note -- \| -- \| -- \| -- \| -- spark.testing.memory \| 1.6.0 \| SPARK-10983 \| b3ffac5178795f2d8e7908b3e77e8e89f50b5f6f#diff-395d07dcd46359cca610ce74357f0bb4 \| spark.testing.dynamicAllocation.scheduleInterval \| 2.3.0 \| SPARK-22864 \| 4e9e6aee44bb2ddb41b567d659358b22fd824222#diff-b096353602813e47074ace09a3890d56 \| spark.testing \| 1.0.1 \| SPARK-1606 \| ce57624b8232159fe3ec6db228afc622133df591#diff-d239aee594001f8391676e1047a0381e \| spark.test.noStageRetry \| 1.2.0 \| SPARK-3796 \| f55218aeb1e9d638df6229b36a59a15ce5363482#diff-6a9ff7fb74fd490a50462d45db2d5e11 \| spark.testing.reservedMemory \| 1.6.0 \| SPARK-12081 \| 84c44b500b5c90dffbe1a6b0aa86f01699b09b96#diff-395d07dcd46359cca610ce74357f0bb4 \| spark.testing.nHosts \| 3.0.0 \| SPARK-26491 \| 1a641525e60039cc6b10816e946cb6f44b3e2696#diff-8b4ea8f3b0cc1e7ce7e943de1abbb165 \| spark.testing.nExecutorsPerHost \| 3.0.0 \| SPARK-26491 \| 1a641525e60039cc6b10816e946cb6f44b3e2696#diff-8b4ea8f3b0cc1e7ce7e943de1abbb165 \| spark.testing.nCoresPerExecutor \| 3.0.0 \| SPARK-26491 \| 1a641525e60039cc6b10816e946cb6f44b3e2696#diff-8b4ea8f3b0cc1e7ce7e943de1abbb165 \| spark.resources.warnings.testing \| 3.1.0 \| SPARK-29148 \| 496f6ac86001d284cbfb7488a63dd3a168919c0f#diff-8b4ea8f3b0cc1e7ce7e943de1abbb165 \| spark.testing.resourceProfileManager \| 3.1.0 \| SPARK-29148 \| 496f6ac86001d284cbfb7488a63dd3a168919c0f#diff-8b4ea8f3b0cc1e7ce7e943de1abbb165 \| I sorted out some information of `Worker` show below. 
Item name \| Since version \| JIRA ID \| Commit ID \| Note -- \| -- \| -- \| -- \| -- spark.worker.resourcesFile \| 3.0.0 \| SPARK-27369 \| 7cbe01e8efc3f6cd3a0cac4bcfadea8fcc74a955#diff-b2fc8d6ab7ac5735085e2d6cfacb95da \| spark.worker.timeout \| 0.6.2 \| None \| e395aa295aeec6767df798bf1002b1f30983c1cd#diff-776a630ac2b2ec5fe85c07ca20a58fc0 \| spark.worker.driverTerminateTimeout \| 2.1.2 \| SPARK-20843 \| ebd72f453aa0b4f68760d28b3e93e6dd33856659#diff-829a8674171f92acd61007bedb1bfa4f \| spark.worker.cleanup.enabled \| 1.0.0 \| SPARK-1154 \| 1440154c27ca48b5a75103eccc9057286d3f6ca8#diff-916ca56b663f178f302c265b7ef38499 \| spark.worker.cleanup.interval \| 1.0.0 \| SPARK-1154 \| 1440154c27ca48b5a75103eccc9057286d3f6ca8#diff-916ca56b663f178f302c265b7ef38499 \| spark.worker.cleanup.appDataTtl \| 1.0.0 \| SPARK-1154 \| 1440154c27ca48b5a75103eccc9057286d3f6ca8#diff-916ca56b663f178f302c265b7ef38499 \| spark.worker.preferConfiguredMasterAddress \| 2.2.1 \| SPARK-20529 \| 75e5ea294c15ecfb7366ae15dce196aa92c87ca4#diff-916ca56b663f178f302c265b7ef38499 \| spark.worker.ui.port \| 1.1.0 \| SPARK-2857 \| 12f99cf5f88faf94d9dbfe85cb72d0010a3a25ac#diff-48ca297b6536cb92362bec1487581f05 \| spark.worker.ui.retainedExecutors \| 1.5.0 \| SPARK-9202 \| c0686668ae6a92b6bb4801a55c3b78aedbee816a#diff-916ca56b663f178f302c265b7ef38499 \| spark.worker.ui.retainedDrivers \| 1.5.0 \| SPARK-9202 \| c0686668ae6a92b6bb4801a55c3b78aedbee816a#diff-916ca56b663f178f302c265b7ef38499 \| spark.worker.ui.compressedLogFileLengthCacheSize \| 2.0.2 \| SPARK-17711 \| 26e978a93f029e1a1b5c7524d0b52c8141b70997#diff-d239aee594001f8391676e1047a0381e \| spark.worker.decommission.enabled \| 3.1.0 \| SPARK-20628 \| d273a2bb0fac452a97f5670edd69d3e452e3e57e#diff-b2fc8d6ab7ac5735085e2d6cfacb95da \| ### Why are the changes needed? Supplemental configuration version information. ### Does this PR introduce any user-facing change? No ### How was this patch tested? 
Existing UTs. Closes #27783 from beliefer/add-version-to-tests-config. Authored-by: beliefer <beliefer@163.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-05 11:58:21 +09:00
Yuanjian Li	f7f1948a8c	[SPARK-30289][FOLLOWUP][DOC] Update the migration guide for `spark.sql.legacy.ctePrecedencePolicy` ### What changes were proposed in this pull request? Fix the migration guide document for `spark.sql.legacy.ctePrecedence.enabled`, which is introduced in #27579. ### Why are the changes needed? The config value changed. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Document only. Closes #27782 from xuanyuanking/SPARK-30829-follow. Authored-by: Yuanjian Li <xyliyuanjian@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-04 13:56:02 +09:00
roland-ondeviceresearch	a4aaee01fa	[MINOR][DOCS] ForeachBatch java example fix ### What changes were proposed in this pull request? ForEachBatch Java example was incorrect ### Why are the changes needed? Example did not compile ### Does this PR introduce any user-facing change? Yes, to docs. ### How was this patch tested? In IDE. Closes #27740 from roland1982/foreachwriter_java_example_fix. Authored-by: roland-ondeviceresearch <roland@ondeviceresearch.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-03-03 09:24:33 -06:00
yi.wu	b517f991fe	[SPARK-30969][CORE] Remove resource coordination support from Standalone ### What changes were proposed in this pull request? Remove automatic resource coordination support from Standalone. ### Why are the changes needed? Resource coordination is mainly designed for the scenario where multiple workers are launched on the same host. However, that scenario effectively no longer exists in today's Spark, because Spark can now start multiple executors in a single Worker, whereas it allowed only one executor per Worker at the very beginning. So launching multiple workers on the same host no longer helps users. Thus, it is not worth carrying an overly complicated implementation and a potentially high maintenance cost for a scenario that no longer arises. ### Does this PR introduce any user-facing change? No, it's a Spark 3.0 feature. ### How was this patch tested? Pass Jenkins. Closes #27722 from Ngone51/abandon_coordination. Authored-by: yi.wu <yi.wu@databricks.com> Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>	2020-03-02 11:23:07 -08:00
beliefer	c63366a693	[SPARK-30891][CORE][DOC] Add version information to the configuration of History ### What changes were proposed in this pull request? 1.Add version information to the configuration of `History`. 2.Update the docs of `History`. I sorted out some information show below. Item name \| Since version \| JIRA ID \| Commit ID \| Note -- \| -- \| -- \| -- \| -- spark.history.fs.logDirectory \| 1.1.0 \| SPARK-1768 \| 21ddd7d1e9f8e2a726427f32422c31706a20ba3f#diff-a7befb99e7bd7e3ab5c46c2568aa5b3e \| spark.history.fs.safemodeCheck.interval \| 1.6.0 \| SPARK-11020 \| cf04fdfe71abc395163a625cc1f99ec5e54cc07e#diff-a7befb99e7bd7e3ab5c46c2568aa5b3e \| spark.history.fs.update.interval \| 1.4.0 \| SPARK-6046 \| 4527761bcd6501c362baf2780905a0018b9a74ba#diff-a7befb99e7bd7e3ab5c46c2568aa5b3e \| spark.history.fs.cleaner.enabled \| 1.3.0 \| SPARK-3562 \| 8942b522d8a3269a2a357e3a274ed4b3e66ebdde#diff-a7befb99e7bd7e3ab5c46c2568aa5b3e \| Branch branch-1.3 does not exist, exists in branch-1.4, but it is 1.3.0-SNAPSHOT in pom.xml spark.history.fs.cleaner.interval \| 1.4.0 \| SPARK-5933 \| 1991337336596f94698e79c2366f065c374128ab#diff-a7befb99e7bd7e3ab5c46c2568aa5b3e \| spark.history.fs.cleaner.maxAge \| 1.4.0 \| SPARK-5933 \| 1991337336596f94698e79c2366f065c374128ab#diff-a7befb99e7bd7e3ab5c46c2568aa5b3e \| spark.history.fs.cleaner.maxNum \| 3.0.0 \| SPARK-28294 \| bbc2be4f425c4c26450e1bf21db407e81046ce21#diff-6bddeb5e25239974fc13db66266b167b \| spark.history.store.path \| 2.3.0 \| SPARK-20642 \| 74daf622de4e534d5a5929b424a6e836850eefad#diff-19f35f981fdc5b0a46f070b879a9a9fc \| spark.history.store.maxDiskUsage \| 2.3.0 \| SPARK-20654 \| 8b497046c647a21bbed1bdfbdcb176745a1d5cd5#diff-19f35f981fdc5b0a46f070b879a9a9fc \| spark.history.ui.port \| 1.0.0 \| SPARK-1276 \| 9ae80bf9bd3e4da7443af97b41fe26aa5d35d70b#diff-b49b5b9c31ddb36a9061004b5b723058 \| spark.history.fs.inProgressOptimization.enabled \| 2.4.0 \| SPARK-6951 \| 
653fe02415a537299e15f92b56045569864b6183#diff-19f35f981fdc5b0a46f070b879a9a9fc \| spark.history.fs.endEventReparseChunkSize \| 2.4.0 \| SPARK-6951 \| 653fe02415a537299e15f92b56045569864b6183#diff-19f35f981fdc5b0a46f070b879a9a9fc \| spark.history.fs.eventLog.rolling.maxFilesToRetain \| 3.0.0 \| SPARK-30481 \| a2fe73b83c0e7c61d1c83b236565a71e3d005a71#diff-6bddeb5e25239974fc13db66266b167b \| spark.history.fs.eventLog.rolling.compaction.score.threshold \| 3.0.0 \| SPARK-30481 \| a2fe73b83c0e7c61d1c83b236565a71e3d005a71#diff-6bddeb5e25239974fc13db66266b167b \| spark.history.fs.driverlog.cleaner.enabled \| 3.0.0 \| SPARK-25118 \| 5f11e8c4cb9a5db037ac239b8fcc97f3a746e772#diff-6bddeb5e25239974fc13db66266b167b \| spark.history.fs.driverlog.cleaner.interval \| 3.0.0 \| SPARK-25118 \| 5f11e8c4cb9a5db037ac239b8fcc97f3a746e772#diff-6bddeb5e25239974fc13db66266b167b \| spark.history.fs.driverlog.cleaner.maxAge \| 3.0.0 \| SPARK-25118 \| 5f11e8c4cb9a5db037ac239b8fcc97f3a746e772#diff-6bddeb5e25239974fc13db66266b167b \| spark.history.ui.acls.enable \| 1.0.1 \| Spark 1489 \| c8dd13221215275948b1a6913192d40e0c8cbadd#diff-b49b5b9c31ddb36a9061004b5b723058 \| spark.history.ui.admin.acls \| 2.1.1 \| SPARK-19033 \| 4ca1788805e4a0131ba8f0ccb7499ee0e0242837#diff-a7befb99e7bd7e3ab5c46c2568aa5b3e \| spark.history.ui.admin.acls.groups \| 2.1.1 \| SPARK-19033 \| 4ca1788805e4a0131ba8f0ccb7499ee0e0242837#diff-a7befb99e7bd7e3ab5c46c2568aa5b3e \| spark.history.fs.numReplayThreads \| 2.0.0 \| SPARK-13988 \| 6fdd0e32a6c3fdce1f3f7e1f8d252af05c419f7b#diff-a7befb99e7bd7e3ab5c46c2568aa5b3e \| spark.history.retainedApplications \| 1.0.0 \| SPARK-1276 \| 9ae80bf9bd3e4da7443af97b41fe26aa5d35d70b#diff-b49b5b9c31ddb36a9061004b5b723058 \| spark.history.provider \| 1.1.0 \| SPARK-1768 \| 21ddd7d1e9f8e2a726427f32422c31706a20ba3f#diff-a7befb99e7bd7e3ab5c46c2568aa5b3e \| spark.history.kerberos.enabled \| 1.0.1 \| Spark-1490 \| 866b03ef4d27b2160563b58d577de29ba6eb4442#diff-b49b5b9c31ddb36a9061004b5b723058 \| 
spark.history.kerberos.principal \| 1.0.1 \| Spark-1490 \| 866b03ef4d27b2160563b58d577de29ba6eb4442#diff-b49b5b9c31ddb36a9061004b5b723058 \| spark.history.kerberos.keytab \| 1.0.1 \| Spark-1490 \| 866b03ef4d27b2160563b58d577de29ba6eb4442#diff-b49b5b9c31ddb36a9061004b5b723058 \| spark.history.custom.executor.log.url \| 3.0.0 \| SPARK-26311 \| ae5b2a6a92be4986ef5b8062d7fb59318cff6430#diff-6bddeb5e25239974fc13db66266b167b \| spark.history.custom.executor.log.url.applyIncompleteApplication \| 3.0.0 \| SPARK-26311 \| ae5b2a6a92be4986ef5b8062d7fb59318cff6430#diff-6bddeb5e25239974fc13db66266b167b \| ### Why are the changes needed? Supplemental configuration version information. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Exists UT Closes #27751 from beliefer/add-version-to-history-config. Authored-by: beliefer <beliefer@163.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-02 15:15:49 +09:00
beliefer	3beb4f875d	[SPARK-30908][CORE][DOC] Add version information to the configuration of Kryo ### What changes were proposed in this pull request? 1.Add version information to the configuration of `Kryo`. 2.Update the docs of `Kryo`. I sorted out some information show below. Item name \| Since version \| JIRA ID \| Commit ID \| Note -- \| -- \| -- \| -- \| -- spark.kryo.registrationRequired \| 1.1.0 \| SPARK-2102 \| efdaeb111917dd0314f1d00ee8524bed1e2e21ca#diff-1f81c62dad0e2dfc387a974bb08c497c \| spark.kryo.registrator \| 0.5.0 \| None \| 91c07a33d90ab0357e8713507134ecef5c14e28a#diff-792ed56b3398163fa14e8578549d0d98 \| This is not a release version, do we need to record it? spark.kryo.classesToRegister \| 1.2.0 \| SPARK-1813 \| 6bb56faea8d238ea22c2de33db93b1b39f492b3a#diff-529fc5c06b9731c1fbda6f3db60b16aa \| spark.kryo.unsafe \| 2.1.0 \| SPARK-928 \| bc167a2a53f5a795d089e8a884569b1b3e2cd439#diff-1f81c62dad0e2dfc387a974bb08c497c \| spark.kryo.pool \| 3.0.0 \| SPARK-26466 \| 38f030725c561979ca98b2a6cc7ca6c02a1f80ed#diff-a3c6b992784f9abeb9f3047d3dcf3ed9 \| spark.kryo.referenceTracking \| 0.8.0 \| None \| 0a8cc309211c62f8824d76618705c817edcf2424#diff-1f81c62dad0e2dfc387a974bb08c497c \| spark.kryoserializer.buffer \| 1.4.0 \| SPARK-5932 \| 2d222fb39dd978e5a33cde6ceb59307cbdf7b171#diff-1f81c62dad0e2dfc387a974bb08c497c \| spark.kryoserializer.buffer.max \| 1.4.0 \| SPARK-5932 \| 2d222fb39dd978e5a33cde6ceb59307cbdf7b171#diff-1f81c62dad0e2dfc387a974bb08c497c \| ### Why are the changes needed? Supplemental configuration version information. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Exists UT Closes #27734 from beliefer/add-version-to-kryo-config. Authored-by: beliefer <beliefer@163.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-02 15:14:47 +09:00
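As a rough illustration of what the "since version" metadata collected in these commits makes possible, the sketch below filters configuration keys by the release in which they first appeared. It is self-contained and does not use Spark's internal config classes: `ConfigVersionSketch`, `ConfEntry`, and `availableIn` are hypothetical names, and the entries are a few rows copied from the Kryo table in the commit message.

```scala
object ConfigVersionSketch {
  // Hypothetical, simplified stand-in for a config entry with version metadata.
  final case class ConfEntry(key: String, sinceVersion: String)

  // A few rows copied from the Kryo table in the commit message.
  val kryoEntries: Seq[ConfEntry] = Seq(
    ConfEntry("spark.kryo.registrationRequired", "1.1.0"),
    ConfEntry("spark.kryo.unsafe", "2.1.0"),
    ConfEntry("spark.kryo.pool", "3.0.0"),
    ConfEntry("spark.kryoserializer.buffer", "1.4.0")
  )

  // Split a dotted version such as "1.4.0" into its numeric parts.
  private def parse(version: String): Seq[Int] =
    version.split('.').toSeq.map(_.toInt)

  // True when version a is less than or equal to version b, compared part by part.
  private def lteq(a: Seq[Int], b: Seq[Int]): Boolean =
    a.zipAll(b, 0, 0).find { case (x, y) => x != y }.forall { case (x, y) => x < y }

  // Keys whose "since version" is not newer than the given release.
  def availableIn(release: String): Seq[String] =
    kryoEntries.filter(e => lteq(parse(e.sinceVersion), parse(release))).map(_.key)
}
```

For example, `ConfigVersionSketch.availableIn("2.0.0")` keeps only the entries introduced in 1.1.0 and 1.4.0.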


3267 commits