Commit graph

2907 commits

Author SHA1 Message Date
Yuming Wang b11e42663b [SPARK-31381][SPARK-29245][SQL] Upgrade built-in Hive 2.3.6 to 2.3.7
### What changes were proposed in this pull request?

**Hive 2.3.7** fixed these issues:
- HIVE-21508: ClassCastException when initializing HiveMetaStoreClient on JDK10 or newer
- HIVE-21980: Parsing time can be high in case of deeply nested subqueries
- HIVE-22249: Support Parquet through HCatalog

### Why are the changes needed?
Fix the ClassCastException (CCE) thrown when creating HiveMetaStoreClient in a JDK 11 environment: [SPARK-29245](https://issues.apache.org/jira/browse/SPARK-29245).

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?

- [x] Test Jenkins with Hadoop 2.7 (https://github.com/apache/spark/pull/28148#issuecomment-616757840)
- [x] Test Jenkins with Hadoop 3.2 on JDK11 (https://github.com/apache/spark/pull/28148#issuecomment-616294353)
- [x] Manual test with remote hive metastore.

Hive side:

```
export JAVA_HOME=/usr/lib/jdk1.8.0_221
export PATH=$JAVA_HOME/bin:$PATH
cd /usr/lib/hive-2.3.6 # Start Hive metastore with Hive 2.3.6
bin/schematool -dbType derby -initSchema --verbose
bin/hive --service metastore
```

Spark side:

```
export JAVA_HOME=/usr/lib/jdk-11.0.3
export PATH=$JAVA_HOME/bin:$PATH
build/sbt clean package -Phive -Phadoop-3.2 -Phive-thriftserver
export SPARK_PREPEND_CLASSES=true
bin/spark-sql --conf spark.hadoop.hive.metastore.uris=thrift://localhost:9083
```

Closes #28148 from wangyum/SPARK-31381.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-04-20 13:38:24 -07:00
gatorsmile 6c792a79c1 [SPARK-31234][SQL][FOLLOW-UP] ResetCommand should not affect static SQL Configuration
### What changes were proposed in this pull request?
This PR is the follow-up PR of https://github.com/apache/spark/pull/28003

- add a migration guide
- add an end-to-end test case.

### Why are the changes needed?
The original PR made a major behavior change to the user-facing RESET command.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Added a new end-to-end test

Closes #28265 from gatorsmile/spark-31234followup.

Authored-by: gatorsmile <gatorsmile@gmail.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
2020-04-20 13:08:55 -07:00
Huaxin Gao 142f43629c [SPARK-31390][SQL][DOCS] Document Window Function in SQL Syntax Section
### What changes were proposed in this pull request?
Document Window Function in SQL syntax

### Why are the changes needed?
Make SQL Reference complete

### Does this PR introduce any user-facing change?
Yes

<img width="1050" alt="Screen Shot 2020-04-16 at 9 13 34 PM" src="https://user-images.githubusercontent.com/13592258/79531509-7bf5af00-8027-11ea-8291-a91b2e97a1b5.png">

<img width="1050" alt="Screen Shot 2020-04-16 at 9 14 12 PM" src="https://user-images.githubusercontent.com/13592258/79531514-7e580900-8027-11ea-8761-4c5a888c476f.png">

<img width="1050" alt="Screen Shot 2020-04-16 at 9 14 45 PM" src="https://user-images.githubusercontent.com/13592258/79531518-82842680-8027-11ea-876f-6375aa5b5ead.png">

<img width="1050" alt="Screen Shot 2020-04-16 at 9 15 10 PM" src="https://user-images.githubusercontent.com/13592258/79531521-844dea00-8027-11ea-8948-712f054d42ee.png">

<img width="1050" alt="Screen Shot 2020-04-16 at 9 15 25 PM" src="https://user-images.githubusercontent.com/13592258/79531528-8748da80-8027-11ea-9dae-a465286982ac.png">

### How was this patch tested?
Manually build and check

Closes #28220 from huaxingao/sql-win-fun.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-04-18 09:31:52 +09:00
Dongjoon Hyun fde996be87 [SPARK-31394][DOC][FOLLOWUP] Add nfs volume type description
### What changes were proposed in this pull request?

This adds a description of the newly supported `nfs` volume type to the documentation for Apache Spark 3.1.0.

### Why are the changes needed?

To complete the document.

### Does this PR introduce any user-facing change?

Yes. (Doc)

![nfs_screen_shot](https://user-images.githubusercontent.com/9700541/79530887-8f077f80-8025-11ea-8cc1-e0b551802d5d.png)

### How was this patch tested?

Manually generate doc and check it.
```
SKIP_API=1 jekyll build
```

Closes #28236 from dongjoon-hyun/SPARK-NFS-DOC.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-04-17 12:07:34 -07:00
Huaxin Gao 92c1b24617 [SPARK-31428][SQL][DOCS] Document Common Table Expression in SQL Reference
### What changes were proposed in this pull request?
Document Common Table Expression in SQL Reference

### Why are the changes needed?
Make SQL Reference complete

### Does this PR introduce any user-facing change?
Yes
<img width="1050" alt="Screen Shot 2020-04-13 at 12 06 35 AM" src="https://user-images.githubusercontent.com/13592258/79100257-f61def00-7d1a-11ea-8402-17017059232e.png">

<img width="1050" alt="Screen Shot 2020-04-13 at 12 07 09 AM" src="https://user-images.githubusercontent.com/13592258/79100260-f7e7b280-7d1a-11ea-9408-058c0851f0b6.png">

<img width="1050" alt="Screen Shot 2020-04-13 at 12 07 35 AM" src="https://user-images.githubusercontent.com/13592258/79100262-fa4a0c80-7d1a-11ea-8862-eb1d8960296b.png">

Also link to Select page

<img width="1045" alt="Screen Shot 2020-04-12 at 4 14 30 PM" src="https://user-images.githubusercontent.com/13592258/79082246-217fea00-7cd9-11ea-8d96-1a69769d1e19.png">

### How was this patch tested?
Manually build and check

Closes #28196 from huaxingao/cte.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-04-16 08:34:26 +09:00
yi.wu 0d4e4df061 [SPARK-31018][CORE][DOCS] Deprecate support of multiple workers on the same host in Standalone
### What changes were proposed in this pull request?

Update the documentation and shell script to warn users that support for multiple workers on the same host is deprecated.

### Why are the changes needed?

This is a sub-task of [SPARK-30978](https://issues.apache.org/jira/browse/SPARK-30978), which plans to remove support for multiple workers entirely in Spark 3.1. This PR takes the first step by deprecating it in Spark 3.0.

### Does this PR introduce any user-facing change?

Yes, users see a warning when they run the script that starts a worker.

### How was this patch tested?

Tested manually.

Closes #27768 from Ngone51/deprecate_spark_worker_instances.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>
2020-04-15 11:29:55 -07:00
Huaxin Gao 46be1e01e9 [SPARK-31319][SQL][FOLLOW-UP] Add a SQL example for UDAF
### What changes were proposed in this pull request?
Add a SQL example for UDAF

### Why are the changes needed?
To make SQL Reference complete

### Does this PR introduce any user-facing change?
Yes.
Adds the following page and changes `Sql` to `SQL` in the example tab for all the SQL examples.
<img width="1110" alt="Screen Shot 2020-04-13 at 6 09 24 PM" src="https://user-images.githubusercontent.com/13592258/79175240-06cd7400-7db2-11ea-8f3e-af71a591a64b.png">

### How was this patch tested?
Manually build and check

Closes #28209 from huaxingao/udf_followup.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-04-14 13:29:44 +09:00
Takeshi Yamamuro 853c6c9909 [SPARK-31434][SQL][DOCS] Drop builtin function pages from SQL references
### What changes were proposed in this pull request?

This PR intends to drop the built-in function pages from SQL references. We already have a complete list of built-in functions in the API documents.

See related discussions for more details:
https://github.com/apache/spark/pull/28170#issuecomment-611917191

### Why are the changes needed?

For better SQL documents.

### Does this PR introduce any user-facing change?

![functions](https://user-images.githubusercontent.com/692303/79109009-793e5400-7db2-11ea-8cb7-4c3cf31ccb77.png)

### How was this patch tested?

Manually checked.

Closes #28203 from maropu/DropBuiltinFunctionDocs.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-04-14 10:22:46 +09:00
Takeshi Yamamuro 179289f0bf [SPARK-31383][SQL][DOC] Clean up the SQL documents in docs/sql-ref*
### What changes were proposed in this pull request?

This PR intends to clean up the SQL documents in `docs/sql-ref*`.
The main changes are as follows:

 - Fixes wrong syntax and capitalizes sub-titles
 - Adds some DDL queries in `Examples` so that users can run the examples there
 - Makes query output in `Examples` follow the `Dataset.showString` (right-aligned) format
 - Adds/removes spaces, indents, or blank lines to follow the format below:

```
---
license...
---

### Description

Writes what the syntax is.

### Syntax

{% highlight sql %}
SELECT...
    WHERE... // 4 indents after the second line
    ...
{% endhighlight %}

### Parameters

<dl>

  <dt><code><em>Param Name</em></code></dt>
  <dd>
    Param Description
  </dd>
  ...
</dl>

### Examples

{% highlight sql %}
-- It is better that users are able to execute example queries here.
-- So, we prepare test data in the first section if possible.
CREATE TABLE t (key STRING, value DOUBLE);
INSERT INTO t VALUES
    ('a', 1.0), ('a', 2.0), ('b', 3.0), ('c', 4.0);

-- query output has 2 indents and it follows the `Dataset.showString`
-- format (right-aligned).
SELECT * FROM t;
  +---+-----+
  |key|value|
  +---+-----+
  |  a|  1.0|
  |  a|  2.0|
  |  b|  3.0|
  |  c|  4.0|
  +---+-----+

-- Query statements after the second line have 4 indents.
SELECT key, SUM(value)
    FROM t
    GROUP BY key;
  +---+----------+
  |key|sum(value)|
  +---+----------+
  |  c|       4.0|
  |  b|       3.0|
  |  a|       3.0|
  +---+----------+
...
{% endhighlight %}

### Related Statements

 * [XXX](xxx.html)
 * ...
```
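The right-aligned layout described above can be sketched in a few lines of Python. This is only an illustration of the target format, not Spark's `Dataset.showString` implementation: the `show_string` helper and its `indent` parameter are hypothetical names, and truncation and other details of the real method are ignored.

```python
from typing import List, Sequence


def show_string(columns: Sequence[str],
                rows: Sequence[Sequence[object]],
                indent: int = 2) -> str:
    """Render rows as a right-aligned text table (hypothetical helper)."""
    # Convert every cell to text so that widths can be measured.
    cells: List[List[str]] = [[str(value) for value in row] for row in rows]
    # Each column is as wide as its widest cell or its header.
    widths = [
        max([len(col)] + [len(row[i]) for row in cells])
        for i, col in enumerate(columns)
    ]
    pad = " " * indent  # query output is indented by two spaces in the docs
    border = pad + "+" + "+".join("-" * w for w in widths) + "+"

    def fmt(values: Sequence[str]) -> str:
        # rjust gives the right-aligned cells used in the doc examples.
        return pad + "|" + "|".join(v.rjust(w) for v, w in zip(values, widths)) + "|"

    lines = [border, fmt(list(columns)), border]
    lines += [fmt(row) for row in cells]
    lines.append(border)
    return "\n".join(lines)


sample_rows = [("a", 1.0), ("a", 2.0), ("b", 3.0), ("c", 4.0)]
print(show_string(["key", "value"], sample_rows))
```

With the sample rows from the template, the printed table should match the first query output shown in the format above.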

### Why are the changes needed?

Most of the changes in this PR are pretty minor, but I think consistent formats/rules for writing documents are important for long-term maintenance in our community.

### Does this PR introduce any user-facing change?

Yes.

### How was this patch tested?

Manually checked.

Closes #28151 from maropu/MakeRightAligned.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-04-12 23:40:36 -05:00
Huaxin Gao 310bef1ac7 [SPARK-31419][SQL][DOCS] Document Table-valued Function and Inline Table
### What changes were proposed in this pull request?
Document Table-valued Function and Inline Table

### Why are the changes needed?
To make SQL Reference complete

### Does this PR introduce any user-facing change?
Yes

<img width="1050" alt="Screen Shot 2020-04-11 at 5 34 25 PM" src="https://user-images.githubusercontent.com/13592258/79057852-cedff880-7c1a-11ea-9e1e-7882594ab573.png">

<img width="1050" alt="Screen Shot 2020-04-11 at 5 34 46 PM" src="https://user-images.githubusercontent.com/13592258/79057854-d4d5d980-7c1a-11ea-94cc-92ef1121fa43.png">

<img width="1050" alt="Screen Shot 2020-04-10 at 7 36 00 PM" src="https://user-images.githubusercontent.com/13592258/79033391-c2986480-7b62-11ea-9d0a-6c60de823256.png">

<img width="1051" alt="Screen Shot 2020-04-10 at 7 36 21 PM" src="https://user-images.githubusercontent.com/13592258/79033392-c5935500-7b62-11ea-88d4-e7d7812a7add.png">

<img width="1051" alt="Screen Shot 2020-04-11 at 5 09 48 PM" src="https://user-images.githubusercontent.com/13592258/79057555-6ba09700-7c17-11ea-9683-16bbde63a529.png">

Also, linked the newly added pages to select statement

<img width="1050" alt="Screen Shot 2020-04-10 at 3 27 59 PM" src="https://user-images.githubusercontent.com/13592258/79027245-5147ba00-7b40-11ea-9b10-527fd9639958.png">

### How was this patch tested?
Manually build and check

Closes #28185 from huaxingao/tvf.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-04-12 23:39:27 -05:00
Huaxin Gao 3bbd80dbc3 [SPARK-31319][SQL][DOCS] Document UDFs/UDAFs in SQL Reference
### What changes were proposed in this pull request?
Document UDF in SQL Reference

### Why are the changes needed?
To make SQL Reference complete.

### Does this PR introduce any user-facing change?
Yes. Here are the new pages:
<img width="1050" alt="Screen Shot 2020-04-09 at 5 06 42 PM" src="https://user-images.githubusercontent.com/13592258/78950977-585dc200-7a85-11ea-875c-ce14c3795e0f.png">

<img width="1049" alt="Screen Shot 2020-04-09 at 5 07 06 PM" src="https://user-images.githubusercontent.com/13592258/78950979-5b58b280-7a85-11ea-81f3-bd5d91bd07e3.png">

<img width="1049" alt="Screen Shot 2020-04-09 at 5 07 26 PM" src="https://user-images.githubusercontent.com/13592258/78950985-5e53a300-7a85-11ea-86be-f63152c1501b.png">

<img width="1051" alt="Screen Shot 2020-04-09 at 5 07 54 PM" src="https://user-images.githubusercontent.com/13592258/78950991-63185700-7a85-11ea-9379-8da46cfc434c.png">

<img width="1060" alt="Screen Shot 2020-04-09 at 5 08 17 PM" src="https://user-images.githubusercontent.com/13592258/78950994-657ab100-7a85-11ea-8b34-d2c87f94b03b.png">

<img width="1050" alt="Screen Shot 2020-04-09 at 5 09 27 PM" src="https://user-images.githubusercontent.com/13592258/78951001-6875a180-7a85-11ea-874e-8abd14a3d3d3.png">

<img width="1060" alt="Screen Shot 2020-04-09 at 5 10 00 PM" src="https://user-images.githubusercontent.com/13592258/78951005-6f041900-7a85-11ea-9e57-520eb8db59ec.png">

<img width="1049" alt="Screen Shot 2020-04-09 at 5 11 10 PM" src="https://user-images.githubusercontent.com/13592258/78951014-73303680-7a85-11ea-93ab-32d68d2e2d59.png">

<img width="1050" alt="Screen Shot 2020-04-09 at 5 11 41 PM" src="https://user-images.githubusercontent.com/13592258/78951019-75929080-7a85-11ea-9d3b-600e8e157c05.png">

<img width="1050" alt="Screen Shot 2020-04-09 at 5 16 22 PM" src="https://user-images.githubusercontent.com/13592258/78951137-dfab3580-7a85-11ea-8512-c6b660aa271e.png">

<img width="1050" alt="Screen Shot 2020-04-09 at 5 22 15 PM" src="https://user-images.githubusercontent.com/13592258/78951466-22214200-7a87-11ea-93dd-6e36492421f1.png">

<img width="1049" alt="Screen Shot 2020-04-09 at 5 22 46 PM" src="https://user-images.githubusercontent.com/13592258/78951469-24839c00-7a87-11ea-93a9-fe30d689adbd.png">

<img width="1050" alt="Screen Shot 2020-04-09 at 5 23 08 PM" src="https://user-images.githubusercontent.com/13592258/78951472-26e5f600-7a87-11ea-84db-087a3528aa53.png">

<img width="1050" alt="Screen Shot 2020-04-09 at 5 23 34 PM" src="https://user-images.githubusercontent.com/13592258/78951474-29e0e680-7a87-11ea-8be4-2a5be1bc3788.png">

<img width="1049" alt="Screen Shot 2020-04-09 at 5 23 57 PM" src="https://user-images.githubusercontent.com/13592258/78951481-2cdbd700-7a87-11ea-8894-0a39abf54a3b.png">

<img width="1050" alt="Screen Shot 2020-04-09 at 5 24 15 PM" src="https://user-images.githubusercontent.com/13592258/78951483-2f3e3100-7a87-11ea-8845-ffebf89d7898.png">

### How was this patch tested?
Manually build and check

Closes #28087 from huaxingao/udf.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-04-12 23:38:17 -05:00
Huaxin Gao fda910d4e2 [SPARK-31348][SQL][DOCS] Document Join in SQL Reference
### What changes were proposed in this pull request?
Document join in SQL Reference.

### Why are the changes needed?
To make SQL Reference complete.

### Does this PR introduce any user-facing change?
Yes
<img width="1050" alt="Screen Shot 2020-04-05 at 8 46 47 PM" src="https://user-images.githubusercontent.com/13592258/78521722-ab7efe80-777f-11ea-90f5-1fac09282721.png">

<img width="1049" alt="Screen Shot 2020-04-05 at 8 47 20 PM" src="https://user-images.githubusercontent.com/13592258/78521724-ade15880-777f-11ea-9238-183d999ed918.png">

<img width="1049" alt="Screen Shot 2020-04-05 at 8 47 41 PM" src="https://user-images.githubusercontent.com/13592258/78521726-b043b280-777f-11ea-996f-a8e86d453c01.png">

<img width="1049" alt="Screen Shot 2020-04-05 at 8 48 11 PM" src="https://user-images.githubusercontent.com/13592258/78521731-b3d73980-777f-11ea-85c8-c24798ef41ac.png">

<img width="1049" alt="Screen Shot 2020-04-05 at 8 48 33 PM" src="https://user-images.githubusercontent.com/13592258/78521734-b5a0fd00-777f-11ea-8b2c-96af30f3bf49.png">

### How was this patch tested?
Manually build and check.

Closes #28121 from huaxingao/join.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-04-12 13:57:54 -05:00
Huaxin Gao f69b0ef25d [SPARK-31355][SQL][DOCS] Document TABLESAMPLE in SQL Reference
### What changes were proposed in this pull request?
Document TABLESAMPLE in SQL Reference

### Why are the changes needed?
To make SQL Reference complete

### Does this PR introduce any user-facing change?
Yes

<img width="1049" alt="Screen Shot 2020-04-06 at 10 23 52 PM" src="https://user-images.githubusercontent.com/13592258/78633123-96749f00-7855-11ea-9509-b7ee21da7fbd.png">

<img width="1050" alt="Screen Shot 2020-04-06 at 10 24 26 PM" src="https://user-images.githubusercontent.com/13592258/78633130-98d6f900-7855-11ea-8675-fd4b6163dfb6.png">

### How was this patch tested?
Manually build and check.

Closes #28130 from huaxingao/sampling.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-04-09 19:39:34 -05:00
zero323 697fe911ac [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR
### What changes were proposed in this pull request?

This pull request adds a SparkR wrapper for `FMRegressor`:

- Supporting `org.apache.spark.ml.r.FMRegressorWrapper`.
- `FMRegressionModel` S4 class.
- Corresponding `spark.fmRegressor`, `predict`, `summary` and `write.ml` generics.
- Corresponding docs and tests.

### Why are the changes needed?

Feature parity.

### Does this PR introduce any user-facing change?

No (new API).

### How was this patch tested?

New unit tests.

Closes #27571 from zero323/SPARK-30819.

Authored-by: zero323 <mszymkiewicz@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-04-09 19:38:11 -05:00
Huaxin Gao 61f903fa7a [SPARK-31331][SQL][DOCS] Document Spark integration with Hive UDFs/UDAFs/UDTFs
### What changes were proposed in this pull request?
Document Spark integration with Hive UDFs/UDAFs/UDTFs

### Why are the changes needed?
To make SQL Reference complete

### Does this PR introduce any user-facing change?
Yes
<img width="1031" alt="Screen Shot 2020-04-02 at 2 22 42 PM" src="https://user-images.githubusercontent.com/13592258/78301971-cc7cf080-74ee-11ea-93c8-7d4c75213b47.png">

### How was this patch tested?
Manually build and check

Closes #28104 from huaxingao/hive-udfs.

Lead-authored-by: Huaxin Gao <huaxing@us.ibm.com>
Co-authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-04-09 13:28:01 -05:00
HyukjinKwon c279e6b091 [SPARK-30722][DOCS][FOLLOW-UP] Explicitly mention the same entire input/output length restriction of Series Iterator UDF
### What changes were proposed in this pull request?

This PR explicitly mentions the requirement of Iterator of Series to Iterator of Series and Iterator of Multiple Series to Iterator of Series (previously Scalar Iterator pandas UDF).

The actual limitation of this UDF is that the _entire input and output_ must have the same length, rather than each individual series. Namely, you can do something like the following:

```python
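from typing import Iterator, Tuple
import pandas as pd
from pyspark.sql.functions import pandas_udf

# Register `func` as an Iterator of Series to Iterator of Series pandas UDF.
@pandas_udf("long")
def func(
        iterator: Iterator[pd.Series]) -> Iterator[pd.Series]:
    # Concatenate all input batches into a single output batch.
    return iter([pd.concat(iterator)])

# Assumes an active SparkSession bound to `spark` (as in the PySpark shell).
spark.range(100).select(func("id")).show()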
```

This characteristic allows you to prefetch data from the iterator to speed up processing, compared to the regular Scalar to Scalar (previously Scalar pandas UDF).
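The length rule can also be illustrated without Spark or pandas. The sketch below uses plain Python lists as stand-ins for `pd.Series` batches; `concat_batches` and `total_length` are hypothetical names introduced only for this illustration. The number of output batches differs from the number of input batches, but the total number of elements is preserved, which is the restriction described above.

```python
from typing import Iterable, Iterator, List


def concat_batches(batches: Iterator[List[int]]) -> Iterator[List[int]]:
    """Consume every input batch and emit one combined batch."""
    merged: List[int] = []
    for batch in batches:
        merged.extend(batch)
    return iter([merged])


def total_length(batches: Iterable[List[int]]) -> int:
    """Count elements across all batches."""
    return sum(len(batch) for batch in batches)


# Three input batches (sizes 40, 30, 30) stand in for the batches Spark feeds the UDF.
input_batches = [list(range(0, 40)), list(range(40, 70)), list(range(70, 100))]
output_batches = list(concat_batches(iter(input_batches)))

# Per-batch lengths differ, but the entire input and output lengths agree.
print(len(input_batches), len(output_batches))
print(total_length(input_batches), total_length(output_batches))
```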

### Why are the changes needed?

To document the correct restriction and characteristics of a feature.

### Does this PR introduce any user-facing change?

Yes in the documentation but only in unreleased branches.

### How was this patch tested?

GitHub Actions should test the documentation build.

Closes #28160 from HyukjinKwon/SPARK-30722-followup.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-04-09 16:46:27 +09:00
Gengliang Wang d89fcc64db [SPARK-31333][FOLLOWUP][DOC] Link Join Hints doc in SQL perf tuning guide
### What changes were proposed in this pull request?

This is a follow-up of https://github.com/apache/spark/pull/28113.
The SQL performance tuning guide (https://spark.apache.org/docs/latest/sql-performance-tuning.html) also has a brief section about join hints. We should link the new Join Hints doc from it.

### Why are the changes needed?

So that users can read more examples.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Manually build the doc and check it:
![image](https://user-images.githubusercontent.com/1097932/78860030-f7cb7800-79e5-11ea-8573-c0587d43a7dc.png)

Closes #28161 from gengliangwang/joinHintFollowUp.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-04-09 15:03:08 +09:00
zero323 0063462d55 [SPARK-30818][SPARKR][ML] Add SparkR LinearRegression wrapper
### What changes were proposed in this pull request?

This pull request adds a SparkR wrapper for `LinearRegression`:

- Supporting `org.apache.spark.ml.r.LinearRegressionWrapper`.
- `LinearRegressionModel` S4 class.
- Corresponding `spark.lm`, `predict`, `summary` and `write.ml` generics.
- Corresponding docs and tests.

### Why are the changes needed?

Feature parity.

### Does this PR introduce any user-facing change?

No (new API).

### How was this patch tested?

New unit tests.

Closes #27593 from zero323/SPARK-30818.

Authored-by: zero323 <mszymkiewicz@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-04-08 22:29:44 -05:00
Huaxin Gao 5dc9b9c7c1 [SPARK-31362][SQL][DOCS] Document Set Operators in SQL Reference
### What changes were proposed in this pull request?
Document Set Operators in SQL Reference

### Why are the changes needed?
To make SQL Reference complete

### Does this PR introduce any user-facing change?
Yes

<img width="1050" alt="Screen Shot 2020-04-07 at 9 20 05 AM" src="https://user-images.githubusercontent.com/13592258/78694605-c6ea2680-78b1-11ea-8590-afb43dbe5933.png">

<img width="1050" alt="Screen Shot 2020-04-07 at 9 20 41 AM" src="https://user-images.githubusercontent.com/13592258/78694613-c8b3ea00-78b1-11ea-89b9-d6cd71ee86a0.png">

<img width="1050" alt="Screen Shot 2020-04-07 at 9 21 29 AM" src="https://user-images.githubusercontent.com/13592258/78694622-ca7dad80-78b1-11ea-9acf-7611ee57d4f2.png">

<img width="1050" alt="Screen Shot 2020-04-07 at 9 21 54 AM" src="https://user-images.githubusercontent.com/13592258/78694626-cc477100-78b1-11ea-82f8-4deaf0048de7.png">

### How was this patch tested?
Manually build and check

Closes #28139 from huaxingao/set-operators.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-04-08 10:51:04 -05:00
gatorsmile a3d83948b8 [SPARK-31351][DOC] Migration Guide Auditing for Spark 3.0 Release
### What changes were proposed in this pull request?
This PR audits the migration guides for the Spark 3.0 release:

- correct grammar errors
- clean up some items
- replace HTML tables with markdown tables

### Why are the changes needed?
N/A

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Screenshot:

![screencapture-127-0-0-1-4000-sql-migration-guide-html-2020-04-04-21_36_29](https://user-images.githubusercontent.com/11567269/78467043-9477d800-76bd-11ea-8ab0-3d51ea5e9fa5.png)
![Screen Shot 2020-04-04 at 9 28 13 PM](https://user-images.githubusercontent.com/11567269/78467045-98a3f580-76bd-11ea-9e4b-927bf12e683a.png)
![Screen Shot 2020-04-04 at 9 28 02 PM](https://user-images.githubusercontent.com/11567269/78467046-98a3f580-76bd-11ea-8ea3-9f13cb8d200b.png)
![Screen Shot 2020-04-04 at 9 21 40 PM](https://user-images.githubusercontent.com/11567269/78467047-993c8c00-76bd-11ea-8c29-91afc68eb590.png)

Closes #28125 from gatorsmile/updateMigrationGuide3.0.

Authored-by: gatorsmile <gatorsmile@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-04-08 12:27:40 +09:00
beliefer 0fc859b4d5 [SPARK-31269][DOC][FOLLOWUP][MINOR] Add version head of GraphX table
### What changes were proposed in this pull request?
HyukjinKwon has ported all the version-related PRs back to branch-3.0.
I double-checked and found that the GraphX table lost its version header.
This PR fixes the issue.
HyukjinKwon, please help me merge this PR to master and branch-3.0.

### Why are the changes needed?
Add version head of GraphX table

### Does this PR introduce any user-facing change?
'No'.

### How was this patch tested?
Jenkins test.

Closes #28149 from beliefer/fix-head-of-graphx-table.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-04-08 12:25:06 +09:00
Eric Wu a28ed86a38 [SPARK-31113][SQL] Add SHOW VIEWS command
### What changes were proposed in this pull request?
Previously, users could issue `SHOW TABLES` to get information about both tables and views.
This PR (SPARK-31113) implements the `SHOW VIEWS` SQL command, similar to Hive, to get views only. (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowViews)

**Hive** -- Only show view names
```
hive> SHOW VIEWS;
OK
view_1
view_2
...
```

**Spark(Hive-Compatible)** -- Only show view names, used in tests and `SparkSQLDriver` for CLI applications
```
SHOW VIEWS IN showdb;
view_1
view_2
...
```

**Spark** -- Show more information: database/viewName/isTemporary
```
spark-sql> SHOW VIEWS;
userdb	view_1	false
userdb	view_2	false
...
```
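The difference between the name-only output and the three-column output can be sketched as follows. This is a plain-Python illustration under stated assumptions, not Spark's implementation: `CatalogEntry`, `show_views`, and the `names_only` flag are hypothetical names, and the tab-separated row layout simply mirrors the `spark-sql` output above.

```python
from typing import List, NamedTuple


class CatalogEntry(NamedTuple):
    """Hypothetical stand-in for a catalog record."""
    database: str
    name: str
    is_temporary: bool
    is_view: bool


def show_views(entries: List[CatalogEntry], names_only: bool = False) -> List[str]:
    """Return one output line per view, skipping plain tables."""
    views = [entry for entry in entries if entry.is_view]
    if names_only:
        # Hive-compatible form: view names only.
        return [entry.name for entry in views]
    # Spark form: database, viewName, isTemporary separated by tabs.
    return [
        "\t".join([entry.database, entry.name, str(entry.is_temporary).lower()])
        for entry in views
    ]


catalog = [
    CatalogEntry("userdb", "table_1", False, False),
    CatalogEntry("userdb", "view_1", False, True),
    CatalogEntry("userdb", "view_2", False, True),
]

for line in show_views(catalog):
    print(line)
```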

### Why are the changes needed?
The `SHOW VIEWS` command provides better granularity by returning information about views only.

### Does this PR introduce any user-facing change?
Yes, this adds a new `SHOW VIEWS` SQL command.

### How was this patch tested?
Add new test `show-views.sql` and pass existing tests

Closes #27897 from Eric5553/ShowViews.

Authored-by: Eric Wu <492960551@qq.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-04-07 09:25:01 -07:00
zero323 0d37f794ef [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR
### What changes were proposed in this pull request?

This pull request adds a SparkR wrapper for `FMClassifier`:

- Supporting `org.apache.spark.ml.r.FMClassifierWrapper`.
- `FMClassificationModel` S4 class.
- Corresponding `spark.fmClassifier`, `predict`, `summary` and `write.ml` generics.
- Corresponding docs and tests.

### Why are the changes needed?

Feature parity.

### Does this PR introduce any user-facing change?

No (new API).

### How was this patch tested?

New unit tests.

Closes #27570 from zero323/SPARK-30820.

Authored-by: zero323 <mszymkiewicz@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-04-07 09:01:45 -05:00
Kent Yao 3c94a7c8f5 [SPARK-29311][SQL][FOLLOWUP] Add migration guide for extracting second from datetimes
### What changes were proposed in this pull request?

Add migration guide for extracting second from datetimes

### Why are the changes needed?

To document the behavior change for the extract expression.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

N/A, just passing jenkins

Closes #28140 from yaooqinn/SPARK-29311.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-04-07 07:09:45 +00:00
Huaxin Gao 44d37efba2 [SPARK-31333][SQL][DOCS] Document Join Hints
### What changes were proposed in this pull request?
Document Join Hints

### Why are the changes needed?
To make SQL Reference complete

### Does this PR introduce any user-facing change?
Yes

<img width="1049" alt="Screen Shot 2020-04-03 at 9 20 15 AM" src="https://user-images.githubusercontent.com/13592258/78382976-7c546b80-758c-11ea-9a8e-e46cfb7106f5.png">

<img width="1051" alt="Screen Shot 2020-04-03 at 10 39 55 AM" src="https://user-images.githubusercontent.com/13592258/78389778-356c7300-7598-11ea-8e6c-3742dadda11c.png">

### How was this patch tested?
Manually build and check

Closes #28113 from huaxingao/join-hints.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-04-06 09:02:22 -05:00
Takeshi Yamamuro e24f0dcd27 [SPARK-31358][SQL][DOC] Document FILTER clauses of aggregate functions in SQL references
### What changes were proposed in this pull request?

This PR improves the SQL document for `GROUP BY` by adding a description of the FILTER clauses of aggregate functions.

### Why are the changes needed?

To improve the SQL documents

### Does this PR introduce any user-facing change?

Yes.

<img src="https://user-images.githubusercontent.com/692303/78558612-e2234a80-784d-11ea-9353-b3feac4d57a7.png" width="500">

### How was this patch tested?

Manually checked.

Closes #28134 from maropu/SPARK-31358.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-04-06 21:36:51 +09:00
Dongjoon Hyun 3886442332 [SPARK-27963][DOCS][FOLLOWUP] Update requirements for spark.dynamicAllocation.enabled
### What changes were proposed in this pull request?

This PR fixes the outdated requirement for `spark.dynamicAllocation.enabled=true`.

### Why are the changes needed?

This was found during the 3.0.0 RC1 document review and testing. As described for `spark.dynamicAllocation.shuffleTracking.enabled` in the same table, we can enable Dynamic Allocation without an external shuffle service.

### Does this PR introduce any user-facing change?

Yes. (Doc.)

### How was this patch tested?

Manually generate the doc by `SKIP_API=1 jekyll build`

**BEFORE**
![Screen Shot 2020-04-05 at 2 31 23 PM](https://user-images.githubusercontent.com/9700541/78510472-29c0ae00-774a-11ea-9916-ba80015fae82.png)

**AFTER**
![Screen Shot 2020-04-05 at 2 29 25 PM](https://user-images.githubusercontent.com/9700541/78510434-ea925d00-7749-11ea-8db8-018955507fd5.png)

Closes #28132 from dongjoon-hyun/SPARK-DA-DOC.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-04-06 11:04:21 +09:00
Huaxin Gao 4e45c07f5d [SPARK-31326][SQL][DOCS] Create Function docs structure for SQL Reference
### What changes were proposed in this pull request?
Create Function docs structure for SQL Reference...

### Why are the changes needed?
So that the Function docs can be added later. This PR also aims to reach a consensus about what to document for Functions in the SQL Reference.

### Does this PR introduce any user-facing change?
Yes
<img width="1050" alt="Screen Shot 2020-04-02 at 12 09 20 AM" src="https://user-images.githubusercontent.com/13592258/78220451-68b6e100-7476-11ea-9a21-733b41652785.png">

<img width="1051" alt="Screen Shot 2020-04-02 at 12 09 44 AM" src="https://user-images.githubusercontent.com/13592258/78220460-6ce2fe80-7476-11ea-887c-defefd55c19d.png">

<img width="1051" alt="Screen Shot 2020-04-02 at 12 10 05 AM" src="https://user-images.githubusercontent.com/13592258/78220463-6f455880-7476-11ea-81fc-fd4137db7c3f.png">

### How was this patch tested?
Manually build and check

Closes #28099 from huaxingao/function.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-04-03 14:36:03 +09:00
Takeshi Yamamuro d98df7626b [SPARK-31325][SQL][WEB UI] Control a plan explain mode in the events of SQL listeners via SQLConf
### What changes were proposed in this pull request?

This PR intends to add a new SQL config for controlling the plan explain mode in the events of SQL listeners (e.g., `SparkListenerSQLExecutionStart` and `SparkListenerSQLAdaptiveExecutionUpdate`). In the current master, the output of `QueryExecution.toString` (equivalent to the "extended" explain mode) is stored in these events. I think it is useful to control the content via `SQLConf`. For example, the query "Details" content (TPCDS q66 query) of the SQL tab in the Spark web UI will change as follows:

Before this PR:
![q66-extended](https://user-images.githubusercontent.com/692303/78211668-950b4580-74e8-11ea-90c6-db52d437534b.png)

After this PR:
![q66-formatted](https://user-images.githubusercontent.com/692303/78211674-9ccaea00-74e8-11ea-9d1d-43c7e2b0f314.png)

### Why are the changes needed?

For better usability.

### Does this PR introduce any user-facing change?

Yes; since Spark 3.1, SQL UI data adopts the `formatted` mode for the query plan explain results. To restore the behavior before Spark 3.1, you can set `spark.sql.ui.explainMode` to `extended`.
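
One way to apply this setting is on the command line via `--conf`, the same flag used with `bin/spark-sql` elsewhere in this log. The following is a minimal sketch only: `conf_flags` is a hypothetical helper (not part of Spark) that turns a dictionary of settings into `--conf key=value` arguments.

```python
from typing import Dict, List


def conf_flags(settings: Dict[str, str]) -> List[str]:
    """Hypothetical helper: render settings as repeated `--conf key=value` arguments."""
    flags: List[str] = []
    for key, value in settings.items():
        flags += ["--conf", f"{key}={value}"]
    return flags


# Build a spark-sql command line that restores the "extended" explain mode in the SQL UI.
command = ["bin/spark-sql"] + conf_flags({"spark.sql.ui.explainMode": "extended"})
print(" ".join(command))
# prints: bin/spark-sql --conf spark.sql.ui.explainMode=extended
```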

### How was this patch tested?

Added unit tests.

Closes #28097 from maropu/SPARK-31325.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
2020-04-02 21:09:16 -07:00
Thomas Graves 55dea9be62 [SPARK-29153][CORE] Add ability to merge resource profiles within a stage with Stage Level Scheduling
### What changes were proposed in this pull request?

For the stage level scheduling feature, add the ability to optionally merge resource profiles if they were specified on multiple RDDs within a stage. A config enables this feature (`spark.scheduler.resourceProfile.mergeConflicts`); it is off by default. When the config is set to true, Spark merges the profiles by selecting the max value of each resource (cores, memory, gpu, etc.). Further documentation will be added with SPARK-30322.

This also adds the ability to check whether an equivalent resource profile already exists, so that if a user runs stages that combine the same profiles over and over again, we don't get an explosion in the number of profiles.
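
The merge rule and the equivalence check described above can be illustrated with a small standalone sketch. This is not Spark's implementation: `Profile`, `merge`, and `ProfileRegistry` are hypothetical names, and a profile is modeled simply as a mapping from resource name to amount.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Profile:
    """Hypothetical stand-in for a resource profile (not Spark's class)."""
    resources: Tuple[Tuple[str, float], ...]  # sorted (name, amount) pairs, so it is hashable

    @staticmethod
    def of(**amounts: float) -> "Profile":
        return Profile(tuple(sorted(amounts.items())))


def merge(a: Profile, b: Profile) -> Profile:
    """Merge two profiles by taking the max amount of each resource."""
    merged: Dict[str, float] = dict(a.resources)
    for name, amount in b.resources:
        merged[name] = max(merged.get(name, amount), amount)
    return Profile(tuple(sorted(merged.items())))


class ProfileRegistry:
    """Reuses the id of an equivalent profile instead of registering a new one."""

    def __init__(self) -> None:
        self._ids: Dict[Profile, int] = {}

    def get_or_add(self, profile: Profile) -> int:
        # setdefault keeps the existing id when an equal profile is already registered.
        return self._ids.setdefault(profile, len(self._ids))


rdd1_profile = Profile.of(cores=2, memory=4096)
rdd2_profile = Profile.of(cores=4, gpu=1)

registry = ProfileRegistry()
first = registry.get_or_add(merge(rdd1_profile, rdd2_profile))
second = registry.get_or_add(merge(rdd2_profile, rdd1_profile))
print(first == second)  # the same merged profile is reused
```

Storing the resources as a sorted tuple makes equivalent profiles compare equal regardless of the order in which they were combined, which is what keeps the registry from growing when the same merge repeats.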

### Why are the changes needed?

To allow users to specify resources on multiple RDDs without worrying as much about whether they end up in the same stage and fail.

### Does this PR introduce any user-facing change?

Yes, when the config is turned on, Spark now merges the profiles instead of erroring out.

### How was this patch tested?

Unit tests

Closes #28053 from tgravescs/SPARK-29153.

Lead-authored-by: Thomas Graves <tgraves@apache.org>
Co-authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2020-04-02 08:30:18 -05:00
beliefer 50e535c431 [SPARK-31295][DOC][FOLLOWUP] Supplement version for configuration appear in doc
### What changes were proposed in this pull request?
This PR adds version information for the configurations that appear in the docs.
The information I collected is shown below.

**docs/sql-performance-tuning.md**
Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.sql.inMemoryColumnarStorage.compressed | 1.0.1 | SPARK-2631 | 86534d0f5255362618c05a07b0171ec35c915822#diff-41ef65b9ef5b518f77e2a03559893f4d |  
spark.sql.inMemoryColumnarStorage.batchSize | 1.1.1 | SPARK-2650 | 779d1eb26d0f031791e93c908d51a59c3b422a55#diff-41ef65b9ef5b518f77e2a03559893f4d |  
spark.sql.files.maxPartitionBytes | 2.0.0 | SPARK-13664 | 17eec0a71ba8713c559d641e3f43a1be726b037c#diff-32bb9518401c0948c5ea19377b5069ab |  
spark.sql.files.openCostInBytes | 2.0.0 | SPARK-14259 | 400b2f863ffaa01a34a8dae1541c61526fef908b#diff-32bb9518401c0948c5ea19377b5069ab |  
spark.sql.broadcastTimeout | 1.3.0 | SPARK-4269 | fa66ef6c97e87c9255b67b03836a4ba50598ebae#diff-41ef65b9ef5b518f77e2a03559893f4d |  
spark.sql.autoBroadcastJoinThreshold | 1.1.0 | SPARK-2393 | c7db274be79f448fda566208946cb50958ea9b1a#diff-41ef65b9ef5b518f77e2a03559893f4d |  
spark.sql.shuffle.partitions | 1.1.0 | SPARK-1508 | 08ed9ad81397b71206c4dc903bfb94b6105691ed#diff-41ef65b9ef5b518f77e2a03559893f4d |  
spark.sql.adaptive.coalescePartitions.enabled | 3.0.0 | SPARK-31037 | 46b7f1796bd0b96977ce9b473601033f397a3b18#diff-9a6b543db706f1a90f790783d6930a13 |  
spark.sql.adaptive.coalescePartitions.minPartitionNum | 3.0.0 | SPARK-31037 | 46b7f1796bd0b96977ce9b473601033f397a3b18#diff-9a6b543db706f1a90f790783d6930a13 |  
spark.sql.adaptive.coalescePartitions.initialPartitionNum | 3.0.0 | SPARK-31037 | 46b7f1796bd0b96977ce9b473601033f397a3b18#diff-9a6b543db706f1a90f790783d6930a13 |  
spark.sql.adaptive.advisoryPartitionSizeInBytes | 3.0.0 | SPARK-31037 | 46b7f1796bd0b96977ce9b473601033f397a3b18#diff-9a6b543db706f1a90f790783d6930a13 |  
spark.sql.adaptive.skewJoin.enabled | 3.0.0 | SPARK-31037 | 46b7f1796bd0b96977ce9b473601033f397a3b18#diff-9a6b543db706f1a90f790783d6930a13 |  
spark.sql.adaptive.skewJoin.skewedPartitionFactor | 3.0.0 | SPARK-31037 | 46b7f1796bd0b96977ce9b473601033f397a3b18#diff-9a6b543db706f1a90f790783d6930a13 |  
spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes | 3.0.0 | SPARK-31201 | 8d0800a0803d3c47938bddefa15328d654739bc5#diff-9a6b543db706f1a90f790783d6930a13 |  

**docs/sql-ref-ansi-compliance.md**
Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.sql.ansi.enabled | 3.0.0 | SPARK-30125 | d9b30694122f8716d3acb448638ef1e2b96ebc7a#diff-9a6b543db706f1a90f790783d6930a13 |  
spark.sql.storeAssignmentPolicy | 3.0.0 | SPARK-28730 | 895c90b582cc2b2667241f66d5b733852aeef9eb#diff-9a6b543db706f1a90f790783d6930a13 |

### Why are the changes needed?
Supplemental configuration version information.

### Does this PR introduce any user-facing change?
'No'.

### How was this patch tested?
Jenkins test

Closes #28096 from beliefer/supplement-version-of-performance.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-04-02 16:01:54 +09:00
Kousuke Saruta b9b1b549af
[SPARK-31073][DOC][FOLLOWUP] Add description for Shuffle Write Time metric in StagePage to web-ui.md
### What changes were proposed in this pull request?

This PR adds a description for `Shuffle Write Time` to `web-ui.md`.

### Why are the changes needed?

#27837 added `Shuffle Write Time` metric to task metrics summary but it's not documented yet.

### Does this PR introduce any user-facing change?

Yes.
We can see the description for `Shuffle Write Time` in the new `web-ui.html`.
<img width="956" alt="shuffle-write-time-description" src="https://user-images.githubusercontent.com/4736016/78175342-a9722280-7495-11ea-9cc6-62c6f3619aa3.png">

### How was this patch tested?

Built the docs with `SKIP_API=1 jekyll build` in the `doc` directory and then checked `web-ui.html`.

Closes #28093 from sarutak/SPARK-31073-doc.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-04-01 12:03:41 -07:00
Huaxin Gao fd0b228127 [SPARK-31290][R] Add back the deprecated R APIs
### What changes were proposed in this pull request?
Add back the deprecated R APIs removed by https://github.com/apache/spark/pull/22843/ and https://github.com/apache/spark/pull/22815.

These APIs are

- `sparkR.init`
- `sparkRSQL.init`
- `sparkRHive.init`
- `registerTempTable`
- `createExternalTable`
- `dropTempTable`

There is no need to port functions such as
```r
createExternalTable <- function(x, ...) {
  dispatchFunc("createExternalTable(tableName, path = NULL, source = NULL, ...)", x, ...)
}
```
because this existed for backward compatibility with SQLContext (judging from https://github.com/apache/spark/pull/9192), and it no longer seems necessary since SparkR replaced SQLContext with SparkSession in https://github.com/apache/spark/pull/13635.

### Why are the changes needed?
Amend Spark's Semantic Versioning Policy

### Does this PR introduce any user-facing change?
Yes
The removed R APIs are put back.

### How was this patch tested?
Add back the removed tests

Closes #28058 from huaxingao/r.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-04-01 10:38:03 +09:00
Huaxin Gao 1a7f9649b6 [SPARK-31305][SQL][DOCS] Add a page to list all commands in SQL Reference
### What changes were proposed in this pull request?
Add a page to list all commands in SQL Reference...

### Why are the changes needed?
So it's easier for users to find a specific command.

### Does this PR introduce any user-facing change?
before:
![image](https://user-images.githubusercontent.com/13592258/77938658-ec03e700-726a-11ea-983c-7a559cc0aae2.png)

after:
![image](https://user-images.githubusercontent.com/13592258/77937899-d3df9800-7269-11ea-85db-749a9521576a.png)

![image](https://user-images.githubusercontent.com/13592258/77937924-db9f3c80-7269-11ea-9441-7603feee421c.png)

Also move ```use database``` from the query category to the DDL category.

### How was this patch tested?
Manually build and check

Closes #28074 from huaxingao/list-all.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-04-01 08:42:15 +09:00
HyukjinKwon 4d4c3e76f6 Revert "[SPARK-30879][DOCS] Refine workflow for building docs"
This reverts commit 7892f88f84.
2020-03-31 16:11:59 +09:00
beliefer 47c810f8ae [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive
### What changes were proposed in this pull request?
Add version information to the configuration of `Hive`.

The information I collected is shown below.

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.sql.hive.metastore.version | 1.4.0 | SPARK-6908 | 05454fd8aef75b129cbbd0288f5089c5259f4a15#diff-ff50aea397a607b79df9bec6f2a841db |  
spark.sql.hive.version | 1.1.1 | SPARK-3971 | 64945f868443fbc59cb34b34c16d782dda0fb63d#diff-12fa2178364a810b3262b30d8d48aa2d |  
spark.sql.hive.metastore.jars | 1.4.0 | SPARK-6908 | 05454fd8aef75b129cbbd0288f5089c5259f4a15#diff-ff50aea397a607b79df9bec6f2a841db |  
spark.sql.hive.convertMetastoreParquet | 1.1.1 | SPARK-2406 | cc4015d2fa3785b92e6ab079b3abcf17627f7c56#diff-ff50aea397a607b79df9bec6f2a841db |  
spark.sql.hive.convertMetastoreParquet.mergeSchema | 1.3.1 | SPARK-6575 | 778c87686af0c04df9dfe144b8f744f271a988ad#diff-ff50aea397a607b79df9bec6f2a841db |  
spark.sql.hive.convertMetastoreOrc | 2.0.0 | SPARK-14070 | 1e886159849e3918445d3fdc3c4cef86c6c1a236#diff-ff50aea397a607b79df9bec6f2a841db |  
spark.sql.hive.convertInsertingPartitionedTable | 3.0.0 | SPARK-28573 | d5688dc732890923c326f272b0c18c329a69459a#diff-842e3447fc453de26c706db1cac8f2c4 |  
spark.sql.hive.convertMetastoreCtas | 3.0.0 | SPARK-25271 | 5ad03607d1487e7ab3e3b6d00eef9c4028ed4975#diff-842e3447fc453de26c706db1cac8f2c4 |  
spark.sql.hive.metastore.sharedPrefixes | 1.4.0 | SPARK-7491 | a8556086d33cb993fab0ae2751e31455e6c664ab#diff-ff50aea397a607b79df9bec6f2a841db |  
spark.sql.hive.metastore.barrierPrefixes | 1.4.0 | SPARK-7491 | a8556086d33cb993fab0ae2751e31455e6c664ab#diff-ff50aea397a607b79df9bec6f2a841db |  
spark.sql.hive.thriftServer.async | 1.5.0 | SPARK-6964 | eb19d3f75cbd002f7e72ce02017a8de67f562792#diff-ff50aea397a607b79df9bec6f2a841db |  

### Why are the changes needed?
Supplemental configuration version information.

### Does this PR introduce any user-facing change?
'No'.

### How was this patch tested?
Existing UTs

Closes #28042 from beliefer/add-version-to-hive-config.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-31 12:35:01 +09:00
beliefer 4fc8ee74fc [SPARK-31295][DOC] Supplement version for configuration appear in doc
### What changes were proposed in this pull request?
This PR adds version information for the configurations that appear in the docs.
The information I collected is shown below.

**docs/spark-standalone.md**
Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.deploy.retainedApplications | 0.8.0 | None | 46eecd110a4017ea0c86cbb1010d0ccd6a5eb2ef#diff-29dffdccd5a7f4c8b496c293e87c8668 |  
spark.deploy.retainedDrivers | 1.1.0 | None | 7446f5ff93142d2dd5c79c63fa947f47a1d4db8b#diff-29dffdccd5a7f4c8b496c293e87c8668 |  
spark.deploy.spreadOut | 0.6.1 | None | bb2b9ff37cd2503cc6ea82c5dd395187b0910af0#diff-0e7ae91819fc8f7b47b0f97be7116325 |  
spark.deploy.defaultCores | 0.9.0 | None | d8bcc8e9a095c1b20dd7a17b6535800d39bff80e#diff-29dffdccd5a7f4c8b496c293e87c8668 |  
spark.deploy.maxExecutorRetries | 1.6.3 | SPARK-16956 | ace458f0330f22463ecf7cbee7c0465e10fba8a8#diff-29dffdccd5a7f4c8b496c293e87c8668 |  
spark.worker.resource.{resourceName}.amount | 3.0.0 | SPARK-27371 | cbad616d4cb0c58993a88df14b5e30778c7f7e85#diff-d25032e4a3ae1b85a59e4ca9ccf189a8 |  
spark.worker.resource.{resourceName}.discoveryScript | 3.0.0 | SPARK-27371 | cbad616d4cb0c58993a88df14b5e30778c7f7e85#diff-d25032e4a3ae1b85a59e4ca9ccf189a8 |  
spark.worker.resourcesFile | 3.0.0 | SPARK-27369 | 7cbe01e8efc3f6cd3a0cac4bcfadea8fcc74a955#diff-b2fc8d6ab7ac5735085e2d6cfacb95da |  
spark.shuffle.service.db.enabled | 3.0.0 | SPARK-26288 | 8b0aa59218c209d39cbba5959302d8668b885cf6#diff-6bdad48cfc34314e89599655442ff210 |  
spark.storage.cleanupFilesAfterExecutorExit | 2.4.0 | SPARK-24340 | 8ef167a5f9ba8a79bb7ca98a9844fe9cfcfea060#diff-916ca56b663f178f302c265b7ef38499 |  
spark.deploy.recoveryMode | 0.8.1 | None | d66c01f2b6defb3db6c1be99523b734a4d960532#diff-29dffdccd5a7f4c8b496c293e87c8668 |  
spark.deploy.recoveryDirectory | 0.8.1 | None | d66c01f2b6defb3db6c1be99523b734a4d960532#diff-29dffdccd5a7f4c8b496c293e87c8668 |  

**docs/sql-data-sources-avro.md**
Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.sql.legacy.replaceDatabricksSparkAvro.enabled | 2.4.0 | SPARK-25129 | ac0174e55af2e935d41545721e9f430c942b3a0c#diff-9a6b543db706f1a90f790783d6930a13 |  
spark.sql.avro.compression.codec | 2.4.0 | SPARK-24881 | 0a0f68bae6c0a1bf30184b1e9ac6bf3805bd7511#diff-9a6b543db706f1a90f790783d6930a13 |  
spark.sql.avro.deflate.level | 2.4.0 | SPARK-24881 | 0a0f68bae6c0a1bf30184b1e9ac6bf3805bd7511#diff-9a6b543db706f1a90f790783d6930a13 |  

**docs/sql-data-sources-orc.md**
Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.sql.orc.impl | 2.3.0 | SPARK-20728 | 326f1d6728a7734c228d8bfaa69442a1c7b92e9b#diff-9a6b543db706f1a90f790783d6930a13 |  
spark.sql.orc.enableVectorizedReader | 2.3.0 | SPARK-16060 | 60f6b994505e3f82091a04eed2dc0a9e8bd523ce#diff-9a6b543db706f1a90f790783d6930a13 |  

**docs/sql-data-sources-parquet.md**
Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.sql.parquet.binaryAsString | 1.1.1 | SPARK-2927 | de501e169f24e4573747aec85b7651c98633c028#diff-41ef65b9ef5b518f77e2a03559893f4d |  
spark.sql.parquet.int96AsTimestamp | 1.3.0 | SPARK-4987 | 67d52207b5cf2df37ca70daff2a160117510f55e#diff-41ef65b9ef5b518f77e2a03559893f4d |  
spark.sql.parquet.compression.codec | 1.1.1 | SPARK-3131 | 3a9d874d7a46ab8b015631d91ba479d9a0ba827f#diff-41ef65b9ef5b518f77e2a03559893f4d |  
spark.sql.parquet.filterPushdown | 1.2.0 | SPARK-4391 | 576688aa2a19bd4ba239a2b93af7947f983e5124#diff-41ef65b9ef5b518f77e2a03559893f4d |  
spark.sql.hive.convertMetastoreParquet | 1.1.1 | SPARK-2406 | cc4015d2fa3785b92e6ab079b3abcf17627f7c56#diff-ff50aea397a607b79df9bec6f2a841db |  
spark.sql.parquet.mergeSchema | 1.5.0 | SPARK-8690 | 246265f2bb056d5e9011d3331b809471a24ff8d7#diff-41ef65b9ef5b518f77e2a03559893f4d |  
spark.sql.parquet.writeLegacyFormat | 1.6.0 | SPARK-10400 | 01cd688f5245cbb752863100b399b525b31c3510#diff-41ef65b9ef5b518f77e2a03559893f4d |  

### Why are the changes needed?
Supplemental configuration version information.

### Does this PR introduce any user-facing change?
'No'.

### How was this patch tested?
Jenkins test

Closes #28064 from beliefer/supplement-doc-for-data-sources.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-31 12:33:46 +09:00
beliefer fc5d67fe22 [SPARK-31282][DOC] Supplement version for configuration appear in security doc
### What changes were proposed in this pull request?
This PR adds version information for the configurations that appear in the security doc.
The information I collected is shown below.

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.network.crypto.keyLength | 2.2.0 | SPARK-19139 | 8f3f73abc1fe62496722476460c174af0250e3fe#diff-0ac65da2bc6b083fb861fe410c7688c2 |  
spark.network.crypto.keyFactoryAlgorithm | 2.2.0 | SPARK-19139 | 8f3f73abc1fe62496722476460c174af0250e3fe#diff-0ac65da2bc6b083fb861fe410c7688c2 |  
spark.network.crypto.config.* | 2.2.0 | SPARK-19139 | 8f3f73abc1fe62496722476460c174af0250e3fe#diff-0ac65da2bc6b083fb861fe410c7688c2 |  
spark.network.crypto.saslFallback | 2.2.0 | SPARK-19139 | 8f3f73abc1fe62496722476460c174af0250e3fe#diff-0ac65da2bc6b083fb861fe410c7688c2 |  
spark.authenticate.enableSaslEncryption | 2.2.0 | SPARK-19139 | 8f3f73abc1fe62496722476460c174af0250e3fe#diff-0ac65da2bc6b083fb861fe410c7688c2 |  
spark.network.sasl.serverAlwaysEncrypt | 1.4.0 | SPARK-6229 | 38d4e9e446b425ca6a8fe8d8080f387b08683842#diff-d2ce9b38bdc38ca9d7119f9c2cf79907 |  
spark.ui.filters | 1.0.0 | SPARK-1189 | 7edbea41b43e0dc11a2de156be220db8b7952d01#diff-f79a5ead735b3d0b34b6b94486918e1c |  
spark.acls.enable | 1.1.0 | SPARK-1890 and SPARK-1891 | e3fe6571decfdc406ec6d505fd92f9f2b85a618c#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 |  
spark.ui.view.acls | 1.0.0 | SPARK-1189 | 7edbea41b43e0dc11a2de156be220db8b7952d01#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 |  
spark.ui.view.acls.groups | 2.0.0 | SPARK-4224 | ae79032dcf160796851ca29116cca146c4d86ada#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 |  
spark.admin.acls | 1.1.0 | SPARK-1890 and SPARK-1891 | e3fe6571decfdc406ec6d505fd92f9f2b85a618c#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 |  
spark.admin.acls.groups | 2.0.0 | SPARK-4224 | ae79032dcf160796851ca29116cca146c4d86ada#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 |  
spark.modify.acls | 1.1.0 | SPARK-1890 and SPARK-1891 | e3fe6571decfdc406ec6d505fd92f9f2b85a618c#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 |  
spark.modify.acls.groups | 2.0.0 | SPARK-4224 | ae79032dcf160796851ca29116cca146c4d86ada#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 |  
spark.user.groups.mapping | 2.0.0 | SPARK-4224 | ae79032dcf160796851ca29116cca146c4d86ada#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 |  
spark.history.ui.acls.enable | 1.0.1 | SPARK-1489 | c8dd13221215275948b1a6913192d40e0c8cbadd#diff-b49b5b9c31ddb36a9061004b5b723058 |
spark.history.ui.admin.acls | 2.1.1 | SPARK-19033 | 4ca1788805e4a0131ba8f0ccb7499ee0e0242837#diff-a7befb99e7bd7e3ab5c46c2568aa5b3e |  
spark.history.ui.admin.acls.groups | 2.1.1 | SPARK-19033 | 4ca1788805e4a0131ba8f0ccb7499ee0e0242837#diff-a7befb99e7bd7e3ab5c46c2568aa5b3e |  
spark.ui.xXssProtection | 2.3.0 | SPARK-22188 | 5a07aca4d464e96d75ea17bf6768e24b829872ec#diff-6bdad48cfc34314e89599655442ff210 |  
spark.ui.xContentTypeOptions.enabled | 2.3.0 | SPARK-22188 | 5a07aca4d464e96d75ea17bf6768e24b829872ec#diff-6bdad48cfc34314e89599655442ff210 |  
spark.ui.strictTransportSecurity | 2.3.0 | SPARK-22188 | 5a07aca4d464e96d75ea17bf6768e24b829872ec#diff-6bdad48cfc34314e89599655442ff210 |  
spark.security.credentials.${service}.enabled | 2.3.0 | SPARK-20434 | a18d637112b97d2caaca0a8324bdd99086664b24#diff-da6c1fd6d8b0c7538a3e77a09e06a083 |  
spark.kerberos.access.hadoopFileSystems | 3.0.0 | SPARK-26766 | d0443a74d185ec72b747fa39994fa9a40ce974cf#diff-6bdad48cfc34314e89599655442ff210 |  

### Why are the changes needed?
Supplemental configuration version information.

### Does this PR introduce any user-facing change?
'No'.

### How was this patch tested?
Jenkins test

Closes #28044 from beliefer/supplement-version-to-security-doc.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-31 12:33:01 +09:00
beliefer 18b73a5b59 [SPARK-31269][DOC] Supplement version for configuration only appear in configuration doc
### What changes were proposed in this pull request?
`configuration.md` contains some configs that are not organized by `ConfigEntry`.
This PR adds version information for the configurations that appear only in the configuration doc.
The information I collected is shown below.

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.app.name | 0.9.0 | None | 994f080f8ae3372366e6004600ba791c8a372ff0#diff-529fc5c06b9731c1fbda6f3db60b16aa |  
spark.driver.resource.{resourceName}.amount | 3.0.0 | SPARK-27760 | d30284b5a51dd784f663eb4eea37087b35a54d00#diff-76e731333fb756df3bff5ddb3b731c46 |  
spark.driver.resource.{resourceName}.discoveryScript | 3.0.0 | SPARK-27488 | 74e5e41eebf9ed596b48e6db52a2a9c642e5cbc3#diff-76e731333fb756df3bff5ddb3b731c46 |  
spark.driver.resource.{resourceName}.vendor | 3.0.0 | SPARK-27362 | 1277f8fa92da85d9e39d9146e3099fcb75c71a8f#diff-76e731333fb756df3bff5ddb3b731c46 |  
spark.executor.resource.{resourceName}.amount | 3.0.0 | SPARK-27760 | d30284b5a51dd784f663eb4eea37087b35a54d00#diff-76e731333fb756df3bff5ddb3b731c46 |  
spark.executor.resource.{resourceType}.discoveryScript | 3.0.0 | SPARK-27024 | db2e3c43412e4a7fb4a46c58d73d9ab304a1e949#diff-76e731333fb756df3bff5ddb3b731c46 |  
spark.executor.resource.{resourceName}.vendor | 3.0.0 | SPARK-27362 | 1277f8fa92da85d9e39d9146e3099fcb75c71a8f#diff-76e731333fb756df3bff5ddb3b731c46 |  
spark.local.dir | 0.5.0 | None | 0e93891d3d7df849cff6442038c111ffd42a5243#diff-17fd275d280b667722664ed833c6402a |  
spark.logConf | 0.9.0 | None | d8bcc8e9a095c1b20dd7a17b6535800d39bff80e#diff-364713d7776956cb8b0a771e9b62f82d |  
spark.master | 0.9.0 | SPARK-544 | 2573add94cf920a88f74d80d8ea94218d812704d#diff-529fc5c06b9731c1fbda6f3db60b16aa |  
spark.driver.defaultJavaOptions | 3.0.0 | SPARK-23472 | f83000597f250868de9722d8285fed013abc5ecf#diff-a78ecfc6a89edfaf0b60a5eaa0381970 |  
spark.executor.defaultJavaOptions | 3.0.0 | SPARK-23472 | f83000597f250868de9722d8285fed013abc5ecf#diff-a78ecfc6a89edfaf0b60a5eaa0381970 |  
spark.executorEnv.[EnvironmentVariableName] | 0.9.0 | None | 642029e7f43322f84abe4f7f36bb0b1b95d8101d#diff-529fc5c06b9731c1fbda6f3db60b16aa |  
spark.python.profile | 1.2.0 | SPARK-3478 | 1aa549ba9839565274a12c52fa1075b424f138a6#diff-d6fe2792e44f6babc94aabfefc8b9bce |  
spark.python.profile.dump | 1.2.0 | SPARK-3478 | 1aa549ba9839565274a12c52fa1075b424f138a6#diff-d6fe2792e44f6babc94aabfefc8b9bce |  
spark.python.worker.memory | 1.1.0 | SPARK-2538 | 14174abd421318e71c16edd24224fd5094bdfed4#diff-d6fe2792e44f6babc94aabfefc8b9bce |  
spark.jars.packages | 1.5.0 | SPARK-9263 | 34335719a372c1951fdb4dd25b75b086faf1076f#diff-63a5d817d2d45ae24de577f6a1bd80f9 |  
spark.jars.excludes | 1.5.0 | SPARK-9263 | 34335719a372c1951fdb4dd25b75b086faf1076f#diff-63a5d817d2d45ae24de577f6a1bd80f9 |  
spark.jars.ivy | 1.3.0 | SPARK-5341 | 3b7acd22ab4a134c74746e3b9a803dbd34d43855#diff-63a5d817d2d45ae24de577f6a1bd80f9 |  
spark.jars.ivySettings | 2.2.0 | SPARK-17568 | 3bc2eff8880a3ba8d4318118715ea1a47048e3de#diff-4d2ab44195558d5a9d5f15b8803ef39d |  
spark.jars.repositories | 2.3.0 | SPARK-21403 | d8257b99ddae23f702f312640a5335ddb4554403#diff-4d2ab44195558d5a9d5f15b8803ef39d |
spark.shuffle.io.maxRetries | 1.2.0 | SPARK-4188 | c1ea5c542f3267c0b23a7775887e3a6ece793fe3#diff-d2ce9b38bdc38ca9d7119f9c2cf79907 |  
spark.shuffle.io.numConnectionsPerPeer | 1.2.1 | SPARK-4740 | 441ec3451730c7ae3dbef8952e313071d6147ab6#diff-d2ce9b38bdc38ca9d7119f9c2cf79907 |  
spark.shuffle.io.preferDirectBufs | 1.2.0 | SPARK-4188 | c1ea5c542f3267c0b23a7775887e3a6ece793fe3#diff-d2ce9b38bdc38ca9d7119f9c2cf79907 |  
spark.shuffle.io.retryWait | 1.2.1 | None | 5e5d8f469a1bea9bbe606f772ccdcab7c184c651#diff-d2ce9b38bdc38ca9d7119f9c2cf79907 |  
spark.shuffle.io.backLog | 1.1.1 | SPARK-2468 | 66b4c81db7e826c00f7fb449b8a8af810cf7dd9a#diff-bdee8e601924d41e93baa7287189e878 |  
spark.shuffle.service.index.cache.size | 2.3.0 | SPARK-21501 | 1662e93119d68498942386906de309d35f4a135f#diff-97d5edc927a83a678e013ae00343df94 |
spark.shuffle.maxChunksBeingTransferred | 2.3.0 | SPARK-21175 | 799e13161e89f1ea96cb1bc7b507a05af2e89cd0#diff-0ac65da2bc6b083fb861fe410c7688c2 |  
spark.sql.ui.retainedExecutions | 1.5.0 | SPARK-8861 and SPARK-8862 | ebc3aad272b91cf58e2e1b4aa92b49b8a947a045#diff-81764e4d52817f83bdd5336ef1226bd9 |  
spark.streaming.ui.retainedBatches | 1.0.0 | SPARK-1386 | f36dc3fed0a0671b0712d664db859da28c0a98e2#diff-56b8d67d07284cfab165d5363bd3500e |
spark.default.parallelism | 0.5.0 | None | e5c4cd8a5e188592f8786a265c0cd073c69ac886#diff-0544ebf7533fa70ff5103e0fe1f0b036 |  
spark.files.fetchTimeout | 1.0.0 | None | f6f9d02e85d17da2f742ed0062f1648a9293e73c#diff-d239aee594001f8391676e1047a0381e |  
spark.files.useFetchCache | 1.2.2 | SPARK-6313 | a2a94a154bdd00753b8d5e344d712664c7151050#diff-d239aee594001f8391676e1047a0381e |
spark.files.overwrite | 1.0.0 | None | 84670f2715392859624df290c1b52eb4ed4a9cb1#diff-d239aee594001f8391676e1047a0381e | Exists in branch-1.0, but the version of pom is 0.9.0-incubating-SNAPSHOT
spark.hadoop.cloneConf | 1.0.3 | SPARK-2546 | 6d8f1dd15afdc7432b5721c89f9b2b402460322b#diff-83eb37f7b0ebed3c14ccb7bff0d577c2 |  
spark.hadoop.validateOutputSpecs | 1.0.1 | SPARK-1677 | 8100cbdb7546e8438019443cfc00683017c81278#diff-f70e97c099b5eac05c75288cb215e080 |
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version | 2.2.0 | SPARK-20107 | edc87d76efea7b4d19d9d0c4ddba274a3ccb8752#diff-76e731333fb756df3bff5ddb3b731c46 |  
spark.rpc.io.backLog | 3.0.0 | SPARK-27868 | 09ed64d795d3199a94e175273fff6fcea6b52131#diff-76e731333fb756df3bff5ddb3b731c46 |  
spark.network.io.preferDirectBufs | 3.0.0 | SPARK-24920 | e103c4a5e72bab8862ff49d6d4c1e62e642fc412#diff-0ac65da2bc6b083fb861fe410c7688c2 |  
spark.port.maxRetries | 1.1.1 | SPARK-3565 | 32f2222e915f31422089139944a077e2cbd442f9#diff-d239aee594001f8391676e1047a0381e |  
spark.core.connection.ack.wait.timeout | 1.1.1 | SPARK-2677 | bd3ce2ffb8964abb4d59918ebb2c230fe4614aa2#diff-f748e95f2aa97ed715afa53ddeeac9de |  
spark.scheduler.listenerbus.eventqueue.shared.capacity | 3.0.0 | SPARK-28574 | c212c9d9ed7375cd1ea16c118733edd84037ec0d#diff-eb519ad78cc3cf0b95839cc37413b509 |  
spark.scheduler.listenerbus.eventqueue.appStatus.capacity | 3.0.0 | SPARK-28574 | c212c9d9ed7375cd1ea16c118733edd84037ec0d#diff-eb519ad78cc3cf0b95839cc37413b509 |  
spark.scheduler.listenerbus.eventqueue.executorManagement.capacity | 3.0.0 | SPARK-28574 | c212c9d9ed7375cd1ea16c118733edd84037ec0d#diff-eb519ad78cc3cf0b95839cc37413b509 |  
spark.scheduler.listenerbus.eventqueue.eventLog.capacity | 3.0.0 | SPARK-28574 | c212c9d9ed7375cd1ea16c118733edd84037ec0d#diff-eb519ad78cc3cf0b95839cc37413b509 |  
spark.scheduler.listenerbus.eventqueue.streams.capacity | 3.0.0 | SPARK-28574 | c212c9d9ed7375cd1ea16c118733edd84037ec0d#diff-eb519ad78cc3cf0b95839cc37413b509 |  
spark.task.resource.{resourceName}.amount | 3.0.0 | SPARK-27760 | d30284b5a51dd784f663eb4eea37087b35a54d00#diff-76e731333fb756df3bff5ddb3b731c46 |  
spark.stage.maxConsecutiveAttempts | 2.2.0 | SPARK-13369 | 7b5d873aef672aa0aee41e338bab7428101e1ad3#diff-6a9ff7fb74fd490a50462d45db2d5e11 |  
spark.{driver\|executor}.rpc.io.serverThreads | 1.6.0 | SPARK-10745 | 7c5b641808740ba5eed05ba8204cdbaf3fc579f5#diff-d2ce9b38bdc38ca9d7119f9c2cf79907 |  
spark.{driver\|executor}.rpc.io.clientThreads | 1.6.0 | SPARK-10745 | 7c5b641808740ba5eed05ba8204cdbaf3fc579f5#diff-d2ce9b38bdc38ca9d7119f9c2cf79907 |  
spark.{driver\|executor}.rpc.netty.dispatcher.numThreads | 3.0.0 | SPARK-29398 | 2f0a38cb50e3e8b4b72219c7b2b8b15d51f6b931#diff-a68a21481fea5053848ca666dd3201d8 |  
spark.r.driver.command | 1.5.3 | SPARK-10971 | 9695f452e86a88bef3bcbd1f3c0b00ad9e9ac6e1#diff-025470e1b7094d7cf4a78ea353fb3981 |  
spark.r.shell.command | 2.1.0 | SPARK-17178 | fa6347938fc1c72ddc03a5f3cd2e929b5694f0a6#diff-a78ecfc6a89edfaf0b60a5eaa0381970 |  
spark.graphx.pregel.checkpointInterval | 2.2.0 | SPARK-5484 | f971ce5dd0788fe7f5d2ca820b9ea3db72033ddc#diff-e399679417ffa6eeedf26a7630baca16 |  

### Why are the changes needed?
Supplemental configuration version information.

### Does this PR introduce any user-facing change?
'No'.

### How was this patch tested?
Jenkins test

Closes #28035 from beliefer/supplement-configuration-version.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-31 12:32:04 +09:00
beliefer bed21770af [SPARK-31215][SQL][DOC] Add version information to the static configuration of SQL
### What changes were proposed in this pull request?
Add version information to the static configuration of `SQL`.

The information I collected is shown below.

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.sql.warehouse.dir | 2.0.0 | SPARK-14994 | 054f991c4350af1350af7a4109ee77f4a34822f0#diff-32bb9518401c0948c5ea19377b5069ab |  
spark.sql.catalogImplementation | 2.0.0 | SPARK-14720 and SPARK-13643 | 8fc267ab3322e46db81e725a5cb1adb5a71b2b4d#diff-6bdad48cfc34314e89599655442ff210 |  
spark.sql.globalTempDatabase | 2.1.0 | SPARK-17338 | 23ddff4b2b2744c3dc84d928e144c541ad5df376#diff-6bdad48cfc34314e89599655442ff210 |  
spark.sql.sources.schemaStringLengthThreshold | 1.3.1 | SPARK-6024 | 6200f0709c5c8440decae8bf700d7859f32ac9d5#diff-41ef65b9ef5b518f77e2a03559893f4d | 1.3
spark.sql.filesourceTableRelationCacheSize | 2.2.0 | SPARK-19265 | 9d9d67c7957f7cbbdbe889bdbc073568b2bfbb16#diff-32bb9518401c0948c5ea19377b5069ab |
spark.sql.codegen.cache.maxEntries | 2.4.0 | SPARK-24727 | b2deef64f604ddd9502a31105ed47cb63470ec85#diff-5081b9388de3add800b6e4a6ddf55c01 |
spark.sql.codegen.comments | 2.0.0 | SPARK-15680 | f0e8738c1ec0e4c5526aeada6f50cf76428f9afd#diff-8bcc5aea39c73d4bf38aef6f6951d42c |  
spark.sql.debug | 2.1.0 | SPARK-17899 | db8784feaa605adcbd37af4bc8b7146479b631f8#diff-32bb9518401c0948c5ea19377b5069ab |  
spark.sql.hive.thriftServer.singleSession | 1.6.0 | SPARK-11089 | 167ea61a6a604fd9c0b00122a94d1bc4b1de24ff#diff-ff50aea397a607b79df9bec6f2a841db |  
spark.sql.extensions | 2.2.0 | SPARK-18127 | f0de600797ff4883927d0c70732675fd8629e239#diff-5081b9388de3add800b6e4a6ddf55c01 |  
spark.sql.queryExecutionListeners | 2.3.0 | SPARK-19558 | bd4eb9ce57da7bacff69d9ed958c94f349b7e6fb#diff-5081b9388de3add800b6e4a6ddf55c01 |  
spark.sql.streaming.streamingQueryListeners | 2.4.0 | SPARK-24479 | 7703b46d2843db99e28110c4c7ccf60934412504#diff-5081b9388de3add800b6e4a6ddf55c01 |  
spark.sql.ui.retainedExecutions | 1.5.0 | SPARK-8861 and SPARK-8862 | ebc3aad272b91cf58e2e1b4aa92b49b8a947a045#diff-81764e4d52817f83bdd5336ef1226bd9 |  
spark.sql.broadcastExchange.maxThreadThreshold | 3.0.0 | SPARK-26601 | 126310ca68f2f248ea8b312c4637eccaba2fdc2b#diff-5081b9388de3add800b6e4a6ddf55c01 |  
spark.sql.subquery.maxThreadThreshold | 2.4.6 | SPARK-30556 | 2fc562cafd71ec8f438f37a28b65118906ab2ad2#diff-5081b9388de3add800b6e4a6ddf55c01 |  
spark.sql.event.truncate.length | 3.0.0 | SPARK-27045 | e60d8fce0b0cf2a6d766ea2fc5f994546550570a#diff-5081b9388de3add800b6e4a6ddf55c01 |
spark.sql.legacy.sessionInitWithConfigDefaults | 3.0.0 | SPARK-27253 | 83f628b57da39ad9732d1393aebac373634a2eb9#diff-5081b9388de3add800b6e4a6ddf55c01 |
spark.sql.defaultUrlStreamHandlerFactory.enabled | 3.0.0 | SPARK-25694 | 8469614c0513fbed87977d4e741649db3fdd8add#diff-5081b9388de3add800b6e4a6ddf55c01 |
spark.sql.streaming.ui.enabled | 3.0.0 | SPARK-29543 | f9b86370cb04b72a4f00cbd4d60873960aa2792c#diff-5081b9388de3add800b6e4a6ddf55c01 |  
spark.sql.streaming.ui.retainedProgressUpdates | 3.0.0 | SPARK-29543 | f9b86370cb04b72a4f00cbd4d60873960aa2792c#diff-5081b9388de3add800b6e4a6ddf55c01 |  
spark.sql.streaming.ui.retainedQueries | 3.0.0 | SPARK-29543 | f9b86370cb04b72a4f00cbd4d60873960aa2792c#diff-5081b9388de3add800b6e4a6ddf55c01 |  

### Why are the changes needed?
Supplemental configuration version information.

### Does this PR introduce any user-facing change?
'No'.

### How was this patch tested?
Existing UTs

Closes #27981 from beliefer/add-version-to-sql-static-config.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-31 12:31:25 +09:00
Luca Canali aa98ac52db
[SPARK-30775][DOC] Improve the description of executor metrics in the monitoring documentation
### What changes were proposed in this pull request?
This PR (SPARK-30775) aims to improve the description of the executor metrics in the monitoring documentation.

### Why are the changes needed?
Improve and clarify monitoring documentation by:
- adding a reference to the Prometheus endpoint, as implemented in [SPARK-29064]
- extending the list and description of executor metrics, following up from [SPARK-27157]

### Does this PR introduce any user-facing change?
Documentation update.

### How was this patch tested?
n.a.

Closes #27526 from LucaCanali/docPrometheusMetricsFollowupSpark29064.

Authored-by: Luca Canali <luca.canali@cern.ch>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-03-30 18:00:54 -07:00
Kengo Seki 60dd1a690f
[SPARK-31293][DSTREAMS][KINESIS][DOC] Fix wrong examples and help messages for Kinesis integration
### What changes were proposed in this pull request?

This PR (SPARK-31293) fixes wrong command examples, parameter descriptions and help message format for Amazon Kinesis integration with Spark Streaming.

### Why are the changes needed?

To improve usability of those commands.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

I ran the fixed commands manually and confirmed they worked as expected.

Closes #28063 from sekikn/SPARK-31293.

Authored-by: Kengo Seki <sekikn@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-03-29 14:27:19 -07:00
Huaxin Gao e656e99061 [SPARK-30363][SQL][DOCS][FOLLOWUP] Fix a broken link in SQL Reference
### What changes were proposed in this pull request?
Fix a broken link and make the relevant docs refer to the new doc.

### Why are the changes needed?

### Does this PR introduce any user-facing change?
Yes, make CACHE TABLE, UNCACHE TABLE, CLEAR CACHE, REFRESH TABLE link to the new doc

### How was this patch tested?
Manually build and check

Closes #28065 from huaxingao/spark-30363-follow-up.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-03-29 11:19:24 -05:00
HyukjinKwon 34c7476cb5
[SPARK-30722][DOCS][FOLLOW-UP] Add Pandas Function API into the menu
### What changes were proposed in this pull request?

This PR adds "Pandas Function API" into the menu.

### Why are the changes needed?

To be consistent and to make it easier to navigate.

### Does this PR introduce any user-facing change?

No, master only.

![Screen Shot 2020-03-27 at 11 40 29 PM](https://user-images.githubusercontent.com/6477701/77767405-60306600-7084-11ea-944a-93726259cd00.png)

### How was this patch tested?

Manually verified by `SKIP_API=1 jekyll build`.

Closes #28054 from HyukjinKwon/followup-spark-30722.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-03-28 18:36:34 -07:00
gatorsmile b9eafcb526 [SPARK-31088][SQL] Add back HiveContext and createExternalTable
### What changes were proposed in this pull request?
Based on the discussion in the mailing list [[Proposal] Modification to Spark's Semantic Versioning Policy](http://apache-spark-developers-list.1001551.n3.nabble.com/Proposal-Modification-to-Spark-s-Semantic-Versioning-Policy-td28938.html), this PR adds back the following APIs, whose maintenance cost is relatively small.

- HiveContext
- createExternalTable APIs

### Why are the changes needed?

Avoid breaking the APIs that are commonly used.

### Does this PR introduce any user-facing change?
Adding back the APIs that were removed in the 3.0 branch does not introduce user-facing changes, because Spark 3.0 has not been released.

### How was this patch tested?

add a new test suite for createExternalTable APIs.

Closes #27815 from gatorsmile/addAPIsBack.

Lead-authored-by: gatorsmile <gatorsmile@gmail.com>
Co-authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
2020-03-26 23:51:15 -07:00
Wenchen Fan 05498af72e [SPARK-31201][SQL] Add an individual config for skewed partition threshold
### What changes were proposed in this pull request?

Skew join handling comes with an overhead: we need to read some data repeatedly. We should treat a partition as skewed if it's large enough so that it's beneficial to do so.

Currently the size threshold is the advisory partition size, which is 64 MB by default. This is not large enough for the skewed partition size threshold.

This PR adds a new config for the threshold and sets the default value to 256 MB.
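As a rough illustration of the size check described above, here is a minimal Python sketch. It is not Spark's implementation: the function name `is_skewed` and the sample partition sizes are hypothetical, and only the size comparison is modelled (any other criteria Spark applies are left out). The 64 MB and 256 MB figures are the defaults quoted in this description.

```python
# Hypothetical sketch of the size check only; not Spark's actual code.
MB = 1024 * 1024
ADVISORY_PARTITION_SIZE = 64 * MB       # previous threshold (advisory partition size)
SKEWED_PARTITION_THRESHOLD = 256 * MB   # default of the new, dedicated config


def is_skewed(size_bytes, threshold=SKEWED_PARTITION_THRESHOLD):
    """Treat a partition as skewed only when it is larger than the threshold."""
    return size_bytes > threshold


# Made-up partition sizes, for illustration.
partition_sizes = [40 * MB, 100 * MB, 300 * MB]

# With the old 64 MB threshold, the 100 MB partition would also be split.
old_flags = [is_skewed(s, ADVISORY_PARTITION_SIZE) for s in partition_sizes]
# With the new 256 MB threshold, only the 300 MB partition qualifies.
new_flags = [is_skewed(s) for s in partition_sizes]
```

With the larger threshold, mid-sized partitions are left alone, so the repeated reads that skew handling requires are paid only where the partition is large enough to make them worthwhile.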

### Why are the changes needed?

Avoid skew join handling that may introduce a perf regression.

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

existing tests

Closes #27967 from cloud-fan/aqe.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-26 22:57:01 +09:00
beliefer 35d286bafb [SPARK-31228][DSTREAMS] Add version information to the configuration of Kafka
### What changes were proposed in this pull request?
Add version information to the configuration of Kafka.

I sorted out some information, shown below.

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.streaming.kafka.consumer.cache.enabled | 2.2.1 | SPARK-19185 | 02cf178bb2a7dc8b4c06eb040c44b6453e41ed15#diff-c465bbcc83b2ecc7530d1c0128e4432b |  
spark.streaming.kafka.consumer.poll.ms | 2.0.1 | SPARK-12177 | 3134f116a3565c3a299fa2e7094acd7304d64280#diff-4597d93a0e951f7199697dba7dd0dc32 |  
spark.streaming.kafka.consumer.cache.initialCapacity | 2.0.1 | SPARK-12177 | 3134f116a3565c3a299fa2e7094acd7304d64280#diff-4597d93a0e951f7199697dba7dd0dc32 |  
spark.streaming.kafka.consumer.cache.maxCapacity | 2.0.1 | SPARK-12177 | 3134f116a3565c3a299fa2e7094acd7304d64280#diff-4597d93a0e951f7199697dba7dd0dc32 |  
spark.streaming.kafka.consumer.cache.loadFactor | 2.0.1 | SPARK-12177 | 3134f116a3565c3a299fa2e7094acd7304d64280#diff-4597d93a0e951f7199697dba7dd0dc32 |  
spark.streaming.kafka.maxRatePerPartition | 1.3.0 | SPARK-4964 | a119cae48030520da9f26ee9a1270bed7f33031e#diff-26cb4369f86050dc2e75cd16291b2844 |  
spark.streaming.kafka.minRatePerPartition | 2.4.0 | SPARK-25233 | 135ff16a3510a4dfb3470904004dae9848005019#diff-815f6ec5caf9e4beb355f5f981171f1f |  
spark.streaming.kafka.allowNonConsecutiveOffsets | 2.3.1 | SPARK-24067 | 1d598b771de3b588a2f377ae7ccf8193156641f2#diff-4597d93a0e951f7199697dba7dd0dc32 |  
spark.kafka.producer.cache.timeout | 2.2.1 | SPARK-19968 | f6730a70cb47ebb3df7f42209df7b076aece1093#diff-ac8844e8d791a75aaee3d0d10bfc1f2a |  
spark.kafka.producer.cache.evictorThreadRunInterval | 3.0.0 | SPARK-21869 | 7bff2db9ed803e05a43c2d875c1dea819d81248a#diff-ea8349d528fe8d1b0a8ffa2840ff4bcd |  
spark.kafka.consumer.cache.capacity | 3.0.0 | SPARK-27687 | efa303581ac61d6f517aacd08883da2d01530bd2#diff-ea8349d528fe8d1b0a8ffa2840ff4bcd |  
spark.kafka.consumer.cache.jmx.enable | 3.0.0 | SPARK-25151 | 594c9c5a3ece0e913949c7160bb4925e5d289e44#diff-ea8349d528fe8d1b0a8ffa2840ff4bcd |  
spark.kafka.consumer.cache.timeout | 3.0.0 | SPARK-25151 | 594c9c5a3ece0e913949c7160bb4925e5d289e44#diff-ea8349d528fe8d1b0a8ffa2840ff4bcd |  
spark.kafka.consumer.cache.evictorThreadRunInterval | 3.0.0 | SPARK-25151 | 594c9c5a3ece0e913949c7160bb4925e5d289e44#diff-ea8349d528fe8d1b0a8ffa2840ff4bcd |  
spark.kafka.consumer.fetchedData.cache.timeout | 3.0.0 | SPARK-25151 | 594c9c5a3ece0e913949c7160bb4925e5d289e44#diff-ea8349d528fe8d1b0a8ffa2840ff4bcd |  
spark.kafka.consumer.fetchedData.cache.evictorThreadRunInterval | 3.0.0 | SPARK-25151 | 594c9c5a3ece0e913949c7160bb4925e5d289e44#diff-ea8349d528fe8d1b0a8ffa2840ff4bcd |  
spark.kafka.clusters.${cluster}.auth.bootstrap.servers | 3.0.0 | SPARK-27294 | 2f558094257c38d26650049f2ac93be6d65d6d85#diff-7df71bd47f5a3428ebdb05ced3c31f49 |  
spark.kafka.clusters.${cluster}.target.bootstrap.servers.regex | 3.0.0 | SPARK-27294 | 2f558094257c38d26650049f2ac93be6d65d6d85#diff-7df71bd47f5a3428ebdb05ced3c31f49 |  
spark.kafka.clusters.${cluster}.security.protocol | 3.0.0 | SPARK-27294 | 2f558094257c38d26650049f2ac93be6d65d6d85#diff-7df71bd47f5a3428ebdb05ced3c31f49 |  
spark.kafka.clusters.${cluster}.sasl.kerberos.service.name | 3.0.0 | SPARK-27294 | 2f558094257c38d26650049f2ac93be6d65d6d85#diff-7df71bd47f5a3428ebdb05ced3c31f49 |  
spark.kafka.clusters.${cluster}.ssl.truststore.location | 3.0.0 | SPARK-27294 | 2f558094257c38d26650049f2ac93be6d65d6d85#diff-7df71bd47f5a3428ebdb05ced3c31f49 |  
spark.kafka.clusters.${cluster}.ssl.truststore.password | 3.0.0 | SPARK-27294 | 2f558094257c38d26650049f2ac93be6d65d6d85#diff-7df71bd47f5a3428ebdb05ced3c31f49 |  
spark.kafka.clusters.${cluster}.ssl.keystore.location | 3.0.0 | SPARK-27294 | 2f558094257c38d26650049f2ac93be6d65d6d85#diff-7df71bd47f5a3428ebdb05ced3c31f49 |  
spark.kafka.clusters.${cluster}.ssl.keystore.password | 3.0.0 | SPARK-27294 | 2f558094257c38d26650049f2ac93be6d65d6d85#diff-7df71bd47f5a3428ebdb05ced3c31f49 |  
spark.kafka.clusters.${cluster}.ssl.key.password | 3.0.0 | SPARK-27294 | 2f558094257c38d26650049f2ac93be6d65d6d85#diff-7df71bd47f5a3428ebdb05ced3c31f49 |  
spark.kafka.clusters.${cluster}.sasl.token.mechanism | 3.0.0 | SPARK-27294 | 2f558094257c38d26650049f2ac93be6d65d6d85#diff-7df71bd47f5a3428ebdb05ced3c31f49 |  

### Why are the changes needed?
Supplemental configuration version information.

### Does this PR introduce any user-facing change?
'No'.

### How was this patch tested?
Existing UT

Closes #27989 from beliefer/add-version-to-kafka-config.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-26 20:11:15 +09:00
Kent Yao 44bd36ad7b [SPARK-31234][SQL] ResetCommand should reset config to sc.conf only
### What changes were proposed in this pull request?
Currently, ResetCommand clears all configurations, including SQL configs, static SQL configs and Spark context level configs.
For example:
```sql
spark-sql> set xyz=abc;
xyz abc
spark-sql> set;
spark.app.id local-1585055396930
spark.app.name SparkSQL::10.242.189.214
spark.driver.host 10.242.189.214
spark.driver.port 65094
spark.executor.id driver
spark.jars
spark.master local[*]
spark.sql.catalogImplementation hive
spark.sql.hive.version 1.2.1
spark.submit.deployMode client
xyz abc
spark-sql> reset;
spark-sql> set;
spark-sql> set spark.sql.hive.version;
spark.sql.hive.version 1.2.1
spark-sql> set spark.app.id;
spark.app.id <undefined>
```
In this PR, we restore the Spark confs to RuntimeConfig after it is cleared.
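The intended semantics can be sketched with plain dictionaries. This is only an illustration under assumptions: `reset_old` and `reset_new` are hypothetical names, the dictionaries stand in for RuntimeConfig and the Spark context conf, and the keys are taken from the session output above.

```python
# Hypothetical model: plain dicts stand in for RuntimeConfig and sc.conf.
def reset_old(runtime_conf, context_conf):
    """Behaviour described above: RESET wipes every entry."""
    runtime_conf.clear()
    return runtime_conf


def reset_new(runtime_conf, context_conf):
    """Behaviour after this PR: clear, then restore the context-level confs."""
    runtime_conf.clear()
    runtime_conf.update(context_conf)
    return runtime_conf


context_conf = {"spark.app.id": "local-1585055396930", "spark.master": "local[*]"}

# The session starts from the context confs, then the user runs `set xyz=abc`.
before = reset_old({**context_conf, "xyz": "abc"}, context_conf)
after = reset_new({**context_conf, "xyz": "abc"}, context_conf)
```

In this model the session-level `xyz` is dropped by both versions, but only `reset_new` still reports `spark.app.id` afterwards.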

### Why are the changes needed?
The reset command goes too far: it also clears configs that are static.

### Does this PR introduce any user-facing change?

Yes, ResetCommand no longer changes static configs.

### How was this patch tested?

add ut

Closes #28003 from yaooqinn/SPARK-31234.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-03-26 15:03:16 +08:00
Huaxin Gao ee6f8991a7 [SPARK-30934][ML][FOLLOW-UP] Update ml-guide to include MulticlassClassificationEvaluator weight support in highlights
### What changes were proposed in this pull request?
Update ml-guide to include ```MulticlassClassificationEvaluator``` weight support in highlights

### Why are the changes needed?
```MulticlassClassificationEvaluator``` weight support is very important, so we should include it in the highlights.

### Does this PR introduce any user-facing change?
Yes

after:
![image](https://user-images.githubusercontent.com/13592258/77614952-6ccd8680-6eeb-11ea-9354-fa20004132df.png)

### How was this patch tested?
manually build and check

Closes #28031 from huaxingao/highlights-followup.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: zhengruifeng <ruifengz@foxmail.com>
2020-03-26 14:24:53 +08:00
Wenchen Fan 4f274a4de9
[SPARK-31147][SQL] Forbid CHAR type in non-Hive-Serde tables
### What changes were proposed in this pull request?

Spark introduced the CHAR type for Hive compatibility, but it only works for Hive tables. The CHAR type was never documented and is treated as STRING for non-Hive tables.

However, this leads to confusing behaviors:

**Apache Spark 3.0.0-preview2**
```
spark-sql> CREATE TABLE t(a CHAR(3));

spark-sql> INSERT INTO TABLE t SELECT 'a ';

spark-sql> SELECT a, length(a) FROM t;
a 	2
```

**Apache Spark 2.4.5**
```
spark-sql> CREATE TABLE t(a CHAR(3));

spark-sql> INSERT INTO TABLE t SELECT 'a ';

spark-sql> SELECT a, length(a) FROM t;
a  	3
```

According to the SQL standard, `CHAR(3)` should guarantee that all values are of length 3. Since `CHAR(3)` is treated as STRING, Spark doesn't guarantee this.

This PR forbids CHAR type in non-Hive tables as it's not supported correctly.
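To make the length guarantee concrete, here is a small Python sketch of fixed-length padding next to plain string storage. It is an illustration only: `as_char` is a hypothetical helper, and rejecting over-long values with an error is a simplifying assumption rather than a statement about any particular engine.

```python
def as_char(value: str, n: int) -> str:
    """Hypothetical CHAR(n) model: pad with trailing spaces to exactly n characters."""
    if len(value) > n:
        # Simplifying assumption: reject values that do not fit.
        raise ValueError(f"value {value!r} is longer than CHAR({n})")
    return value.ljust(n)


inserted = "a "

stored_as_string = inserted            # STRING keeps the value as-is
stored_as_char = as_char(inserted, 3)  # CHAR(3) always yields length 3

lengths = (len(stored_as_string), len(stored_as_char))
```

The two lengths differ (2 versus 3), which is the same discrepancy the two `SELECT a, length(a)` outputs above show.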

### Why are the changes needed?

avoid confusing/wrong behavior

### Does this PR introduce any user-facing change?

yes, now users can't create/alter non-Hive tables with CHAR type.

### How was this patch tested?

new tests

Closes #27902 from cloud-fan/char.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-03-25 09:25:55 -07:00
Wenchen Fan 1d0f54951e [SPARK-31205][SQL] support string literal as the second argument of date_add/date_sub functions
### What changes were proposed in this pull request?

https://github.com/apache/spark/pull/26412 introduced a behavior change that `date_add`/`date_sub` functions can't accept string and double values in the second parameter. This is reasonable as it's error-prone to cast string/double to int at runtime.

However, using string literals as function arguments is very common in SQL databases. To avoid breaking valid use cases in which the string literal is indeed an integer, this PR proposes to add ansi_cast for string literals in date_add/date_sub functions. If the string value is not a valid integer, we fail at query compilation time because of constant folding.
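The idea can be sketched in Python as follows. This is a loose analogy, not Spark code: `strict_int` is a hypothetical stand-in for the strict cast, the regular expression is an assumption about what counts as a valid integer literal, and the error is raised immediately rather than during query compilation.

```python
import re
from datetime import date, timedelta


def strict_int(literal: str) -> int:
    """Hypothetical strict cast: accept only an optionally signed run of digits."""
    if re.fullmatch(r"[+-]?[0-9]+", literal) is None:
        raise ValueError(f"invalid integer literal: {literal!r}")
    return int(literal)


def date_add(start: str, days) -> date:
    """Add `days` to an ISO date; a string argument must be an integer literal."""
    if isinstance(days, str):
        days = strict_int(days)
    return date.fromisoformat(start) + timedelta(days=days)


result = date_add("2011-11-11", "1")  # date(2011, 11, 12)
```

A call such as `date_add("2011-11-11", "1.5")` raises `ValueError` in this sketch instead of being silently truncated, which mirrors the fail-early intent described above.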

### Why are the changes needed?

avoid breaking changes

### Does this PR introduce any user-facing change?

Yes, now 3.0 can run `date_add('2011-11-11', '1')` like 2.4

### How was this patch tested?

new tests.

Closes #27965 from cloud-fan/string.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-03-24 12:07:22 +08:00
Wenchen Fan d929c0dfe8 [SPARK-31133][SQL][DOC] fix sql ref doc for DML
### What changes were proposed in this pull request?

`INSERT OVERWRITE DIRECTORY` can only use a file format (a class that implements `org.apache.spark.sql.execution.datasources.FileFormat`). This PR fixes the doc accordingly and makes other minor improvements.

### Why are the changes needed?

### Does this PR introduce any user-facing change?

### How was this patch tested?

Closes #27891 from cloud-fan/doc.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-03-23 22:00:50 +08:00
beliefer a0cf972985 [SPARK-31141][DSTREAMS][DOC] Add version information to the configuration of Dstreams
### What changes were proposed in this pull request?
Add version information to the configuration of `Dstreams`.

I sorted out some information, shown below.

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.streaming.backpressure.enabled | 1.5.0 | SPARK-9967 and SPARK-10099 | 392bd19d678567751cd3844d9d166a7491c5887e#diff-1b584c4ed88a9022abb11d594f760997 |  
spark.streaming.backpressure.initialRate | 2.0.0 | SPARK-11627 | 7218c0eba957e0a079a407b79c3a050cce9647b2#diff-c64d571ef32d2dbf76e965ecd04a9f52 |  
spark.streaming.blockInterval | 0.8.0 | None | 46eecd110a4017ea0c86cbb1010d0ccd6a5eb2ef#diff-54d85b29e4349628a0de525c119399b5 |  
spark.streaming.receiver.maxRate | 1.0.2 | SPARK-1341 | ca19cfbcd5cfac9ad731350dfeea14355aec87d6#diff-c64d571ef32d2dbf76e965ecd04a9f52 |  
spark.streaming.receiver.writeAheadLog.enable | 1.2.1 | SPARK-4482 | ce5ea0fd611ce560f6e1fac83562469bdb97091e#diff-0607b70e4e79cbbc1a128c45784cb813 |  
spark.streaming.unpersist | 0.9.0 | None | 08b9fec93d00ff0ebb49af4d9ac72d2806eded02#diff-bcf5f84f78d23ebde7d532bea756bc57 |  
spark.streaming.stopGracefullyOnShutdown | 1.4.0 | SPARK-7776 | a17a5cb302c5fa6a4d3e9e3e0fa2100c0b5436d6#diff-8a7f0e3f26c15ba484e6312c3caf033d |  
spark.streaming.kafka.maxRetries | 1.3.0 | SPARK-4964 | a119cae48030520da9f26ee9a1270bed7f33031e#diff-26cb4369f86050dc2e75cd16291b2844 |  
spark.streaming.ui.retainedBatches | 1.0.0 | SPARK-1386 | f36dc3fed0a0671b0712d664db859da28c0a98e2#diff-56b8d67d07284cfab165d5363bd3500e |
spark.streaming.driver.writeAheadLog.closeFileAfterWrite | 1.6.0 | SPARK-11324 | 4f030b9e82172659d250281782ac573cbd1438fc#diff-a1b3ec72e8d7cc91433a1cc64fe6e91d |  
spark.streaming.receiver.writeAheadLog.closeFileAfterWrite | 1.6.0 | SPARK-11324 | 4f030b9e82172659d250281782ac573cbd1438fc#diff-a1b3ec72e8d7cc91433a1cc64fe6e91d |  
spark.streaming.receiver.writeAheadLog.class | 1.4.0 | SPARK-7056 | 1868bd40dcce23990b98748b0239bd00452b1ca5#diff-a1b3ec72e8d7cc91433a1cc64fe6e91d |  
spark.streaming.receiver.writeAheadLog.rollingIntervalSecs | 1.4.0 | SPARK-7056 | 1868bd40dcce23990b98748b0239bd00452b1ca5#diff-a1b3ec72e8d7cc91433a1cc64fe6e91d |  
spark.streaming.receiver.writeAheadLog.maxFailures | 1.2.0 | SPARK-4028 | 234de9232bcfa212317a8073c4a82c3863b36b14#diff-8cec1a581eebcad673dc8930b1a2801c |  
spark.streaming.driver.writeAheadLog.class | 1.4.0 | SPARK-7056 | 1868bd40dcce23990b98748b0239bd00452b1ca5#diff-a1b3ec72e8d7cc91433a1cc64fe6e91d |  
spark.streaming.driver.writeAheadLog.rollingIntervalSecs | 1.4.0 | SPARK-7056 | 1868bd40dcce23990b98748b0239bd00452b1ca5#diff-a1b3ec72e8d7cc91433a1cc64fe6e91d |  
spark.streaming.driver.writeAheadLog.maxFailures | 1.4.0 | SPARK-7056 | 1868bd40dcce23990b98748b0239bd00452b1ca5#diff-a1b3ec72e8d7cc91433a1cc64fe6e91d |  
spark.streaming.driver.writeAheadLog.allowBatching | 1.6.0 | SPARK-11141 | dccc4645df629f35c4788d50b2c0a6ab381db4b7#diff-a1b3ec72e8d7cc91433a1cc64fe6e91d |  
spark.streaming.driver.writeAheadLog.batchingTimeout | 1.6.0 | SPARK-11141 | dccc4645df629f35c4788d50b2c0a6ab381db4b7#diff-a1b3ec72e8d7cc91433a1cc64fe6e91d |  
spark.streaming.sessionByKey.deltaChainThreshold | 1.6.0 | SPARK-11290 | daa74be6f863061221bb0c2f94e70672e6fcbeaa#diff-e0a40541298f885606a2361ff9c5af6c |  
spark.streaming.backpressure.rateEstimator | 1.5.0 | SPARK-8977 | 819be46e5a73f2d19230354ebba30c58538590f5#diff-5dcaea3a4eca07f898fa88fe6d69e5c3 |  
spark.streaming.backpressure.pid.proportional | 1.5.0 | SPARK-8979 | 0a1d2ca42c8b31d6b0e70163795f0185d4622f87#diff-5dcaea3a4eca07f898fa88fe6d69e5c3 |  
spark.streaming.backpressure.pid.integral | 1.5.0 | SPARK-8979 | 0a1d2ca42c8b31d6b0e70163795f0185d4622f87#diff-5dcaea3a4eca07f898fa88fe6d69e5c3 |  
spark.streaming.backpressure.pid.derived | 1.5.0 | SPARK-8979 | 0a1d2ca42c8b31d6b0e70163795f0185d4622f87#diff-5dcaea3a4eca07f898fa88fe6d69e5c3 |  
spark.streaming.backpressure.pid.minRate | 1.5.0 | SPARK-9966 | 612b4609bdd38763725ae07d77c2176aa6756e64#diff-5dcaea3a4eca07f898fa88fe6d69e5c3 |  
spark.streaming.concurrentJobs | 0.7.0 | None | c97ebf64377e853ab7c616a103869a4417f25954#diff-839f06302b2d648a85436486fc13c85d |  
spark.streaming.internal.batchTime | 1.4.0 | SPARK-6862 | 1b7106b867bc0aa4d64b669d79b646f862acaf47#diff-25124e4f06a1da237bf486eceb1f7967 | It's not a configuration, it's a property
spark.streaming.internal.outputOpId | 1.4.0 | SPARK-6862 | 1b7106b867bc0aa4d64b669d79b646f862acaf47#diff-25124e4f06a1da237bf486eceb1f7967 | It's not a configuration, it's a property
spark.streaming.clock | 0.7.0 | None | cae894ee7aefa4cf9b1952038a48be81e1d2a856#diff-839f06302b2d648a85436486fc13c85d |  
spark.streaming.gracefulStopTimeout | 1.0.0 | SPARK-1332 | 94cbe2329021296b660d88f3e8ef3734374020d2#diff-2f8c5c038fda47b9875e10785fdd2498 |  
spark.streaming.manualClock.jump | 0.7.0 | None | fc3d0b602a08fdd182c2138506d1cd9952631f95#diff-839f06302b2d648a85436486fc13c85d |  

### Why are the changes needed?
Supplemental configuration version information.

### Does this PR introduce any user-facing change?
'No'

### How was this patch tested?
Existing UT

Closes #27898 from beliefer/add-version-to-dstream-config.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-23 13:01:44 +09:00
beliefer ae0699d4b5 [SPARK-31002][CORE][DOC][FOLLOWUP] Add version information to the configuration of Core
### What changes were proposed in this pull request?
This PR follows up #27847, #27852 and https://github.com/apache/spark/pull/27913.

I sorted out some information, shown below.

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.storage.localDiskByExecutors.cacheSize | 3.0.0 | SPARK-27651 | fd2bf55abaab08798a428d4e47d4050ba2b82a95#diff-6bdad48cfc34314e89599655442ff210 |
spark.storage.memoryMapLimitForTests | 2.3.0 | SPARK-3151 | b8ffb51055108fd606b86f034747006962cd2df3#diff-abd96f2ae793cd6ea6aab5b96a3c1d7a |  
spark.barrier.sync.timeout | 2.4.0 | SPARK-24817 | 388f5a0635a2812cd71b08352e3ddc20293ec189#diff-6bdad48cfc34314e89599655442ff210 |
spark.scheduler.blacklist.unschedulableTaskSetTimeout | 2.4.1 | SPARK-22148 | 52e9711d01694158ecb3691f2ec25c0ebe4b0207#diff-6bdad48cfc34314e89599655442ff210 |  
spark.scheduler.barrier.maxConcurrentTasksCheck.interval | 2.4.0 | SPARK-24819 | bfb74394a5513134ea1da9fcf4a1783b77dd64e4#diff-6bdad48cfc34314e89599655442ff210 |  
spark.scheduler.barrier.maxConcurrentTasksCheck.maxFailures | 2.4.0 | SPARK-24819 | bfb74394a5513134ea1da9fcf4a1783b77dd64e4#diff-6bdad48cfc34314e89599655442ff210 |  
spark.unsafe.exceptionOnMemoryLeak | 1.4.0 | SPARK-7076 and SPARK-7077 and SPARK-7080 | f49284b5bf3a69ed91a5e3e6e0ed3be93a6ab9e4#diff-5a0de266c82b95adb47d9bca714e1f1b |  
spark.unsafe.sorter.spill.read.ahead.enabled | 2.3.0 | SPARK-21113 | 1e978b17d63d7ba20368057aa4e65f5ef6e87369#diff-93a086317cea72a113cf81056882c206 |  
spark.unsafe.sorter.spill.reader.buffer.size | 2.1.0 | SPARK-16862 | c1937dd19a23bd096a4707656c7ba19fb5c16966#diff-93a086317cea72a113cf81056882c206 |  
spark.plugins | 3.0.0 | SPARK-29397 | d51d228048d519a9a666f48dc532625de13e7587#diff-6bdad48cfc34314e89599655442ff210 |  
spark.cleaner.periodicGC.interval | 1.6.0 | SPARK-8414 | 72da2a21f0940b97757ace5975535e559d627688#diff-75141521b1d55bc32d72b70032ad96c0 |
spark.cleaner.referenceTracking | 1.0.0 | SPARK-1103 | 11eabbe125b2ee572fad359c33c93f5e6fdf0b2d#diff-364713d7776956cb8b0a771e9b62f82d |  
spark.cleaner.referenceTracking.blocking | 1.0.0 | SPARK-1103 | 11eabbe125b2ee572fad359c33c93f5e6fdf0b2d#diff-364713d7776956cb8b0a771e9b62f82d |  
spark.cleaner.referenceTracking.blocking.shuffle | 1.1.1 | SPARK-3139 | 5cf1e440137006eedd6846ac8fa57ccf9fd1958d#diff-75141521b1d55bc32d72b70032ad96c0 |  
spark.cleaner.referenceTracking.cleanCheckpoints | 1.4.0 | SPARK-2033 | 25998e4d73bcc95ac85d9af71adfdc726ec89568#diff-440e866c5df0b8386aff57f9f8bd8db1 |  
spark.executor.logs.rolling.strategy | 1.1.0 | SPARK-1940 | 4823bf470ec1b47a6f404834d4453e61d3dcbec9#diff-2b4575e096e4db7165e087f9429f2a02 |
spark.executor.logs.rolling.time.interval | 1.1.0 | SPARK-1940 | 4823bf470ec1b47a6f404834d4453e61d3dcbec9#diff-2b4575e096e4db7165e087f9429f2a02 |
spark.executor.logs.rolling.maxSize | 1.4.0 | SPARK-5932 | 2d222fb39dd978e5a33cde6ceb59307cbdf7b171#diff-529fc5c06b9731c1fbda6f3db60b16aa |  
spark.executor.logs.rolling.maxRetainedFiles | 1.1.0 | SPARK-1940 | 4823bf470ec1b47a6f404834d4453e61d3dcbec9#diff-2b4575e096e4db7165e087f9429f2a02 |
spark.executor.logs.rolling.enableCompression | 2.0.2 | SPARK-17711 | 26e978a93f029e1a1b5c7524d0b52c8141b70997#diff-2b4575e096e4db7165e087f9429f2a02 |  
spark.master.rest.enabled | 1.3.0 | SPARK-5388 | 6ec0cdc14390d4dc45acf31040f21e1efc476fc0#diff-29dffdccd5a7f4c8b496c293e87c8668 |  
spark.master.rest.port | 1.3.0 | SPARK-5388 | 6ec0cdc14390d4dc45acf31040f21e1efc476fc0#diff-29dffdccd5a7f4c8b496c293e87c8668 |  
spark.master.ui.port | 1.1.0 | SPARK-2857 | 12f99cf5f88faf94d9dbfe85cb72d0010a3a25ac#diff-366c88f47e9b5cfa4d4305febeb8b026 |  
spark.io.compression.snappy.blockSize | 1.4.0 | SPARK-5932 | 2d222fb39dd978e5a33cde6ceb59307cbdf7b171#diff-529fc5c06b9731c1fbda6f3db60b16aa |  
spark.io.compression.lz4.blockSize | 1.4.0 | SPARK-5932 | 2d222fb39dd978e5a33cde6ceb59307cbdf7b171#diff-529fc5c06b9731c1fbda6f3db60b16aa |  
spark.io.compression.codec | 0.8.0 | None | 46eecd110a4017ea0c86cbb1010d0ccd6a5eb2ef#diff-df9e6118c481ceb27faa399114fac0a1 |  
spark.io.compression.zstd.bufferSize | 2.3.0 | SPARK-19112 | 444bce1c98c45147fe63e2132e9743a0c5e49598#diff-df9e6118c481ceb27faa399114fac0a1 |  
spark.io.compression.zstd.level | 2.3.0 | SPARK-19112 | 444bce1c98c45147fe63e2132e9743a0c5e49598#diff-df9e6118c481ceb27faa399114fac0a1 |  
spark.io.warning.largeFileThreshold | 3.0.0 | SPARK-28366 | 26d03b62e20d053943d03b5c5573dd349e49654c#diff-6bdad48cfc34314e89599655442ff210 |  
spark.eventLog.compression.codec | 3.0.0 | SPARK-28118 | 47f54b1ec717d0d744bf3ad46bb1ed3542b667c8#diff-6bdad48cfc34314e89599655442ff210 |  
spark.buffer.size | 0.5.0 | None | 4b1646a25f7581cecae108553da13833e842e68a#diff-eaf125f56ce786d64dcef99cf446a751 |  
spark.locality.wait.process | 0.8.0 | None | 46eecd110a4017ea0c86cbb1010d0ccd6a5eb2ef#diff-264da78fe625d594eae59d1adabc8ae9 |  
spark.locality.wait.node | 0.8.0 | None | 46eecd110a4017ea0c86cbb1010d0ccd6a5eb2ef#diff-264da78fe625d594eae59d1adabc8ae9 |  
spark.locality.wait.rack | 0.8.0 | None | 46eecd110a4017ea0c86cbb1010d0ccd6a5eb2ef#diff-264da78fe625d594eae59d1adabc8ae9 |  
spark.reducer.maxSizeInFlight | 1.4.0 | SPARK-5932 | 2d222fb39dd978e5a33cde6ceb59307cbdf7b171#diff-529fc5c06b9731c1fbda6f3db60b16aa |  
spark.reducer.maxReqsInFlight | 2.0.0 | SPARK-6166 | 894921d813a259f2f266fde7d86d2ecb5a0af24b#diff-eb30a71e0d04150b8e0b64929852e38b |  
spark.broadcast.compress | 0.6.0 | None | efc5423210d1aadeaea78273a4a8f10425753079#diff-76170a9c8f67b542bc58240a0a12fe08 |  
spark.broadcast.blockSize | 0.5.0 | None | b8ab7862b8bd168bca60bd930cd97c1099fbc8a8#diff-271d7958e14cdaa46cf3737cfcf51341 |  
spark.broadcast.checksum | 2.1.1 | SPARK-18188 | 06a56df226aa0c03c21f23258630d8a96385c696#diff-4f43d14923008c6650a8eb7b40c07f74 |
spark.broadcast.UDFCompressionThreshold | 3.0.0 | SPARK-28355 | 79e204770300dab4a669b9f8e2421ef905236e7b#diff-6bdad48cfc34314e89599655442ff210 |
spark.rdd.compress | 0.6.0 | None | efc5423210d1aadeaea78273a4a8f10425753079#diff-76170a9c8f67b542bc58240a0a12fe08 |  
spark.rdd.parallelListingThreshold | 2.0.0 | SPARK-9926 | 80a4bfa4d1c86398b90b26c34d8dcbc2355f5a6a#diff-eaababfc87ea4949f97860e8b89b7586 |
spark.rdd.limit.scaleUpFactor | 2.1.0 | SPARK-16984 | 806d8a8e980d8ba2f4261bceb393c40bafaa2f73#diff-1d55e54678eff2076263f2fe36150c17 |  
spark.serializer | 0.5.0 | None | fd1d255821bde844af28e897fabd59a715659038#diff-b920b65c23bf3a1b3326325b0d6a81b2 |  
spark.serializer.objectStreamReset | 1.0.0 | SPARK-942 | 40566e10aae4b21ffc71ea72702b8df118ac5c8e#diff-6a59dfc43d1b31dc1c3072ceafa829f5 |  
spark.serializer.extraDebugInfo | 1.3.0 | SPARK-5307 | 636408311deeebd77fb83d2249e0afad1a1ba149#diff-6a59dfc43d1b31dc1c3072ceafa829f5 |  
spark.jars | 0.9.0 | None | f1d206c6b4c0a5b2517b05af05fdda6049e2f7c2#diff-364713d7776956cb8b0a771e9b62f82d |  
spark.files | 1.0.0 | None | 29ee101c73bf066bf7f4f8141c475b8d1bd3cf1c#diff-364713d7776956cb8b0a771e9b62f82d |  
spark.submit.deployMode | 1.5.0 | SPARK-6797 | 7f487c8bde14dbdd244a3493ad11a129ef2bb327#diff-4d2ab44195558d5a9d5f15b8803ef39d |  
spark.submit.pyFiles | 1.0.1 | SPARK-1549 | d7ddb26e1fa02e773999cc4a97c48d2cd1723956#diff-4d2ab44195558d5a9d5f15b8803ef39d |
spark.scheduler.allocation.file | 0.8.1 | None | 976fe60f7609d7b905a34f18743efabd966407f0#diff-9bc0105ee454005379abed710cd20ced |  
spark.scheduler.minRegisteredResourcesRatio | 1.1.1 | SPARK-2635 | 3311da2f9efc5ff2c7d01273ac08f719b067d11d#diff-7d99a7c7a051e5e851aaaefb275a44a1 |  
spark.scheduler.maxRegisteredResourcesWaitingTime | 1.1.1 | SPARK-2635 | 3311da2f9efc5ff2c7d01273ac08f719b067d11d#diff-7d99a7c7a051e5e851aaaefb275a44a1 |  
spark.scheduler.mode | 0.8.0 | None | 98fb69822cf780160bca51abeaab7c82e49fab54#diff-cb7a25b3c9a7341c6d99bcb8e9780c92 |  
spark.scheduler.revive.interval | 0.8.1 | None | d0c9d41a061969d409715b86a91937d8de4c29f7#diff-7d99a7c7a051e5e851aaaefb275a44a1 |  
spark.speculation | 0.6.0 | None | e72afdb817bcc8388aeb8b8d31628fd5fd67acf1#diff-4e188f32951dc989d97fa7577858bc7c |  
spark.speculation.interval | 0.6.0 | None | e72afdb817bcc8388aeb8b8d31628fd5fd67acf1#diff-4e188f32951dc989d97fa7577858bc7c |  
spark.speculation.multiplier | 0.6.0 | None | e72afdb817bcc8388aeb8b8d31628fd5fd67acf1#diff-fff59f72dfe6ca4ccb607ad12535da07 |  
spark.speculation.quantile | 0.6.0 | None | e72afdb817bcc8388aeb8b8d31628fd5fd67acf1#diff-fff59f72dfe6ca4ccb607ad12535da07 |  
spark.speculation.task.duration.threshold | 3.0.0 | SPARK-29976 | ad238a2238a9d0da89be4424574436cbfaee579d#diff-6bdad48cfc34314e89599655442ff210 |
spark.yarn.stagingDir | 2.0.0 | SPARK-13063 | bc36df127d3b9f56b4edaeb5eca7697d4aef761a#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.buffer.pageSize | 1.5.0 | SPARK-9411 | 1b0099fc62d02ff6216a76fbfe17a4ec5b2f3536#diff-1b22e54318c04824a6d53ed3f4d1bb35 |  

### Why are the changes needed?
Supplemental configuration version information.

### Does this PR introduce any user-facing change?
'No'.

### How was this patch tested?
Existing UT

Closes #27931 from beliefer/add-version-to-core-config-part-four.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-23 11:07:43 +09:00
yan ma fae981e5f3 [SPARK-30773][ML] Support NativeBlas for level-1 routines
### What changes were proposed in this pull request?
Change BLAS for some level-1 routines (axpy, dot, scal(double, denseVector)) from the Java implementation to NativeBLAS when the vector size is greater than 256.

### Why are the changes needed?
In the current ML BLAS.scala, all level-1 routines are fixed to use the Java implementation. But NativeBLAS (Intel MKL, OpenBLAS) can bring up to an 11x performance improvement, based on a performance test that calls these methods directly. We should provide a way for users to take advantage of NativeBLAS for level-1 routines. Here we do it by switching these methods from f2jBLAS to NativeBLAS.

### Does this PR introduce any user-facing change?
Yes, the level-1 methods axpy, dot and scal will switch to NativeBLAS when the vector has more than nativeL1Threshold (fixed value 256) elements, and will fall back to f2jBLAS if native BLAS is not properly configured on the system.
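The dispatch rule can be sketched in Python like this. It is an illustration under assumptions, not the Scala code from the PR: `f2j_dot`, `dot` and `fake_native` are hypothetical names, and passing `native_dot=None` is just one way to model a system where native BLAS is unavailable.

```python
NATIVE_L1_THRESHOLD = 256  # fixed threshold quoted in the description


def f2j_dot(x, y):
    """Pure-Python stand-in for the Java (f2j) dot product."""
    return sum(a * b for a, b in zip(x, y))


def dot(x, y, native_dot=None):
    """Use the native routine only for vectors with more than 256 elements."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if native_dot is not None and len(x) > NATIVE_L1_THRESHOLD:
        return native_dot(x, y)
    return f2j_dot(x, y)  # small vector, or native BLAS not available


native_calls = []


def fake_native(x, y):
    """Records each call so the chosen code path is visible."""
    native_calls.append(len(x))
    return f2j_dot(x, y)


small = dot([1.0] * 256, [2.0] * 256, fake_native)   # stays on the f2j path
large = dot([1.0] * 257, [2.0] * 257, fake_native)   # goes to the native path
fallback = dot([1.0] * 300, [2.0] * 300)             # no native BLAS: f2j path
```

Only the 257-element call reaches `fake_native`, since the rule is "more than" the threshold, not "at least".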

### How was this patch tested?
Perf test with direct calls to level-1 routines

Closes #27546 from yma11/SPARK-30773.

Lead-authored-by: yan ma <yan.ma@intel.com>
Co-authored-by: Ma Yan <yan.ma@intel.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-03-20 10:32:58 -05:00
Kent Yao 88ae6c4481 [SPARK-31189][SQL][DOCS] Fix errors and missing parts for datetime pattern document
### What changes were proposed in this pull request?

Fix errors and missing parts in the datetime pattern document:
1. The pattern we use is similar to DateTimeFormatter and SimpleDateFormat but not identical, so we shouldn't reference either of them in the API docs; we should link to our own doc instead.
2. Some pattern letters are missing.
3. Some pattern letters are explicitly banned - Set('A', 'c', 'e', 'n', 'N').
4. The second-fraction pattern uses different logic for parsing and formatting.

### Why are the changes needed?

Fix and improve the doc.

### Does this PR introduce any user-facing change?

Yes, new and updated doc.

### How was this patch tested?

pass Jenkins
viewed locally with `jekyll serve`
![image](https://user-images.githubusercontent.com/8326978/77044447-6bd3bb00-69fa-11ea-8d6f-7084166c5dea.png)

Closes #27956 from yaooqinn/SPARK-31189.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-03-20 21:59:26 +08:00
Wenchen Fan 8643e5d9c5 [SPARK-31171][SQL][FOLLOWUP] update document
### What changes were proposed in this pull request?

A followup of https://github.com/apache/spark/pull/27936 to update document.

### Why are the changes needed?

correct document

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

N/A

Closes #27950 from cloud-fan/null.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-03-19 07:29:31 +09:00
Huaxin Gao d22c9f6c0d [SPARK-30933][ML][DOCS] ML, GraphX 3.0 QA: Update user guide for new features & APIs
### What changes were proposed in this pull request?
Change ml-tuning.html.

### Why are the changes needed?
Add description for ```MultilabelClassificationEvaluator``` and ```RankingEvaluator```.

### Does this PR introduce any user-facing change?
Yes

before:
![image](https://user-images.githubusercontent.com/13592258/76437013-2c5ffb80-6376-11ea-8946-f5c2e7379b7c.png)

after:
![image](https://user-images.githubusercontent.com/13592258/76437054-397cea80-6376-11ea-867f-fe8d8fa4e5b3.png)

### How was this patch tested?

Closes #27880 from huaxingao/spark-30933.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-03-18 13:21:24 -05:00
Kent Yao 57fcc49306 [SPARK-31176][SQL] Remove support for 'e'/'c' as datetime pattern character
### What changes were proposed in this pull request?

In SimpleDateFormat, 'u' meant the day number of the week; in DateTimeFormatter it means year. We now keep the old meaning of 'u' by substituting 'u' with 'e' internally and using DateTimeFormatter to parse the pattern string. In DateTimeFormatter, 'e' and 'c' also represent day-of-week, e.g.

```sql
select date_format(timestamp '2019-10-06', 'yyyy-MM-dd uuuu');
select date_format(timestamp '2019-10-06', 'yyyy-MM-dd uuee');
select date_format(timestamp '2019-10-06', 'yyyy-MM-dd eeee');
```
Because of the substitution, they all silently become `.... eeee`. Users may be confused about what these letters mean, so we should mark 'e' and 'c' as illegal pattern characters to keep the behavior the same as before.

This PR moves the method `convertIncompatiblePattern` from `DatetimeUtils` to the `DateTimeFormatterHelper` object, since it is quite specific to the `DateTimeFormatterHelper` class, and adds the 'e' and 'c' character check to this method.

In addition, `convertIncompatiblePattern` has a bug that drops the last `'` when the pattern ends with one; this PR fixes that too, e.g.

```sql
spark-sql> select date_format(timestamp "2019-10-06", "yyyy-MM-dd'S'");
20/03/18 11:19:45 ERROR SparkSQLDriver: Failed in [select date_format(timestamp "2019-10-06", "yyyy-MM-dd'S'")]
java.lang.IllegalArgumentException: Pattern ends with an incomplete string literal: uuuu-MM-dd'S

spark-sql> select to_timestamp("2019-10-06S", "yyyy-MM-dd'S'");
NULL
```
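
The conversion described above can be sketched in a few lines. This is a minimal Python illustration, not Spark's Scala code: the function name `convert_incompatible_pattern` only mirrors the method named above, and the error type and the choice to handle just the 'u' to 'e' substitution are assumptions of this sketch. It shows the two points of this PR: 'e' and 'c' are rejected outside quoted literals, and a trailing `'` is preserved.

```python
# Letters this PR marks as illegal in user-supplied patterns.
BANNED = {"e", "c"}


def convert_incompatible_pattern(pattern: str) -> str:
    """Replace 'u' with 'e' outside quoted literals; reject 'e' and 'c'.

    Hypothetical helper for illustration only.
    """
    out = []
    in_literal = False
    for ch in pattern:
        if ch == "'":
            # Toggle literal mode and always keep the quote itself,
            # including a quote that is the last character of the pattern.
            in_literal = not in_literal
            out.append(ch)
        elif in_literal:
            # Text inside quotes is copied through untouched.
            out.append(ch)
        elif ch in BANNED:
            raise ValueError(f"Illegal pattern character: {ch}")
        elif ch == "u":
            out.append("e")
        else:
            out.append(ch)
    return "".join(out)


print(convert_incompatible_pattern("yyyy-MM-dd'S'"))    # yyyy-MM-dd'S'
print(convert_incompatible_pattern("yyyy-MM-dd uuuu"))  # yyyy-MM-dd eeee
```

Because a doubled quote toggles the literal state twice, an escaped `''` passes through unchanged without special handling.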
### Why are the changes needed?

avoid vagueness
bug fix

### Does this PR introduce any user-facing change?

no, these are not exposed yet

### How was this patch tested?

add ut

Closes #27939 from yaooqinn/SPARK-31176.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-03-18 20:19:50 +08:00
jiake 21c02ee5d0 [SPARK-30864][SQL][DOC] add the user guide for Adaptive Query Execution
### What changes were proposed in this pull request?
This PR adds the user guide for AQE and the detailed configurations for its three main features.

### Why are the changes needed?
Add the detailed configurations.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Docs only; no UT needed.

Closes #27616 from JkSelf/aqeuserguide.

Authored-by: jiake <ke.a.jia@intel.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-03-16 23:33:56 +08:00
beliefer f4cd7495f1 [SPARK-31002][CORE][DOC][FOLLOWUP] Add version information to the configuration of Core
### What changes were proposed in this pull request?
This PR follows up #27847 and https://github.com/apache/spark/pull/27852.

I sorted out some information, shown below.

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.metrics.namespace | 2.1.0 | SPARK-5847 | 70f846a313061e4db6174e0dc6c12c8c806ccf78#diff-6bdad48cfc34314e89599655442ff210 |
spark.metrics.conf | 0.8.0 | None | 46eecd110a4017ea0c86cbb1010d0ccd6a5eb2ef#diff-7ea2624e832b166ca27cd4baca8691d9 |  
spark.metrics.executorMetricsSource.enabled | 3.0.0 | SPARK-27189 | 729f43f499f3dd2718c0b28d73f2ca29cc811eac#diff-6bdad48cfc34314e89599655442ff210 |  
spark.metrics.staticSources.enabled | 3.0.0 | SPARK-30060 | 60f20e5ea2000ab8f4a593b5e4217fd5637c5e22#diff-6bdad48cfc34314e89599655442ff210 |  
spark.pyspark.driver.python | 2.1.0 | SPARK-13081 | 7a9e25c38380e6c62080d62ad38a4830e44fe753#diff-6bdad48cfc34314e89599655442ff210 |  
spark.pyspark.python | 2.1.0 | SPARK-13081 | 7a9e25c38380e6c62080d62ad38a4830e44fe753#diff-6bdad48cfc34314e89599655442ff210 |  
spark.history.ui.maxApplications | 2.0.1 | SPARK-17243 | 021aa28f439443cda1bc7c5e3eee7c85b40c1a2d#diff-6bdad48cfc34314e89599655442ff210 |  
spark.io.encryption.enabled | 2.1.0 | SPARK-5682 | 4b4e329e49f8af28fa6301bd06c48d7097eaf9e6#diff-6bdad48cfc34314e89599655442ff210 |  
spark.io.encryption.keygen.algorithm | 2.1.0 | SPARK-5682 | 4b4e329e49f8af28fa6301bd06c48d7097eaf9e6#diff-6bdad48cfc34314e89599655442ff210 |  
spark.io.encryption.keySizeBits | 2.1.0 | SPARK-5682 | 4b4e329e49f8af28fa6301bd06c48d7097eaf9e6#diff-6bdad48cfc34314e89599655442ff210 |  
spark.io.encryption.commons.config.* | 2.1.0 | SPARK-5682 | 4b4e329e49 |  
spark.io.crypto.cipher.transformation | 2.1.0 | SPARK-5682 | 4b4e329e49f8af28fa6301bd06c48d7097eaf9e6#diff-6bdad48cfc34314e89599655442ff210 |  
spark.driver.host | 0.7.0 | None | 02a6761589c35f15f1a6e3b63a7964ba057d3ba6#diff-eaf125f56ce786d64dcef99cf446a751 |  
spark.driver.port | 0.7.0 | None | 02a6761589c35f15f1a6e3b63a7964ba057d3ba6#diff-eaf125f56ce786d64dcef99cf446a751 |  
spark.driver.supervise | 1.3.0 | SPARK-5388 | 6ec0cdc14390d4dc45acf31040f21e1efc476fc0#diff-4d2ab44195558d5a9d5f15b8803ef39d |  
spark.driver.bindAddress | 2.1.0 | SPARK-4563 | 2cd1bfa4f0c6625b0ab1dbeba2b9586b9a6a9f42#diff-6bdad48cfc34314e89599655442ff210 |  
spark.blockManager.port | 1.1.0 | SPARK-2157 | 31090e43ca91f687b0bc6e25c824dc25bd7027cd#diff-2b643ea78c1add0381754b1f47eec132 |  
spark.driver.blockManager.port | 2.1.0 | SPARK-4563 | 2cd1bfa4f0c6625b0ab1dbeba2b9586b9a6a9f42#diff-6bdad48cfc34314e89599655442ff210 |  
spark.files.ignoreCorruptFiles | 2.1.0 | SPARK-17850 | 47776e7c0c68590fe446cef910900b1aaead06f9#diff-6bdad48cfc34314e89599655442ff210 |  
spark.files.ignoreMissingFiles | 2.4.0 | SPARK-22676 | ed4101d29f50d54fd7846421e4c00e9ecd3599d0#diff-6bdad48cfc34314e89599655442ff210 |  
spark.log.callerContext | 2.2.0 | SPARK-16759 | 3af894511be6fcc17731e28b284dba432fe911f5#diff-6bdad48cfc34314e89599655442ff210 | In branch-2.2 but pom.xml is 2.1.0-SNAPSHOT
spark.files.maxPartitionBytes | 2.1.0 | SPARK-16575 | c8879bf1ee2af9ccd5d5656571d931d2fc1da024#diff-6bdad48cfc34314e89599655442ff210 |  
spark.files.openCostInBytes | 2.1.0 | SPARK-16575 | c8879bf1ee2af9ccd5d5656571d931d2fc1da024#diff-6bdad48cfc34314e89599655442ff210 |  
spark.hadoopRDD.ignoreEmptySplits | 2.3.0 | SPARK-22233 | 0fa10666cf75e3c4929940af49c8a6f6ea874759#diff-6bdad48cfc34314e89599655442ff210 |  
spark.redaction.regex | 2.1.2 | SPARK-18535 and SPARK-19720 | 444cca14d7ac8c5ab5d7e9d080b11f4d6babe3bf#diff-6bdad48cfc34314e89599655442ff210 |  
spark.redaction.string.regex | 2.2.0 | SPARK-20070 | 91fa80fe8a2480d64c430bd10f97b3d44c007bcc#diff-6bdad48cfc34314e89599655442ff210 |  
spark.authenticate.secret | 1.0.0 | SPARK-1189 | 7edbea41b43e0dc11a2de156be220db8b7952d01#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 |  
spark.authenticate.secretBitLength | 1.6.0 | SPARK-11073 | f8d93edec82eedab59d50aec06ca2de7e4cf14f6#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 |  
spark.authenticate | 1.0.0 | SPARK-1189 | 7edbea41b43e0dc11a2de156be220db8b7952d01#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 |  
spark.authenticate.enableSaslEncryption | 1.4.0 | SPARK-6229 | 38d4e9e446b425ca6a8fe8d8080f387b08683842#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 |
spark.authenticate.secret.file | 3.0.0 | SPARK-26239 | 57d6fbfa8c803ce1791e7be36aba0219a1fcaa63#diff-6bdad48cfc34314e89599655442ff210 |  
spark.authenticate.secret.driver.file | 3.0.0 | SPARK-26239 | 57d6fbfa8c803ce1791e7be36aba0219a1fcaa63#diff-6bdad48cfc34314e89599655442ff210 |  
spark.authenticate.secret.executor.file | 3.0.0 | SPARK-26239 | 57d6fbfa8c803ce1791e7be36aba0219a1fcaa63#diff-6bdad48cfc34314e89599655442ff210 |  
spark.buffer.write.chunkSize | 2.3.0 | SPARK-21527 | 574ef6c987c636210828e96d2f797d8f10aff05e#diff-6bdad48cfc34314e89599655442ff210 |  
spark.checkpoint.compress | 2.2.0 | SPARK-19525 | 1405862382185e04b09f84af18f82f2f0295a755#diff-6bdad48cfc34314e89599655442ff210 |  
spark.rdd.checkpoint.cachePreferredLocsExpireTime | 3.0.0 | SPARK-29182 | 4ecbdbb6a7bd3908da32c82832e886b4f9f9e596#diff-6bdad48cfc34314e89599655442ff210 |
spark.shuffle.accurateBlockThreshold | 2.2.1 | SPARK-20801 | 81f63c8923416014d5c6bc227dd3c4e2a62bac8e#diff-6bdad48cfc34314e89599655442ff210 |  
spark.shuffle.registration.timeout | 2.3.0 | SPARK-20640 | d107b3b910d8f434fb15b663a9db4c2dfe0a9f43#diff-6bdad48cfc34314e89599655442ff210 |  
spark.shuffle.registration.maxAttempts | 2.3.0 | SPARK-20640 | d107b3b910d8f434fb15b663a9db4c2dfe0a9f43#diff-6bdad48cfc34314e89599655442ff210 |  
spark.reducer.maxBlocksInFlightPerAddress | 2.2.1 | SPARK-21243 | 88dccda393bc79dc6032f71b6acf8eb2b4b152be#diff-6bdad48cfc34314e89599655442ff210 |  
spark.network.maxRemoteBlockSizeFetchToMem | 3.0.0 | SPARK-26700 | d8613571bc1847775dd5c1945757279234cb388c#diff-6bdad48cfc34314e89599655442ff210 |
spark.taskMetrics.trackUpdatedBlockStatuses | 2.3.0 | SPARK-20923 | 5b5a69bea9de806e2c39b04b248ee82a7b664d7b#diff-6bdad48cfc34314e89599655442ff210 |  
spark.shuffle.sort.io.plugin.class | 3.0.0 | SPARK-28209 | abef84a868e9e15f346eea315bbab0ec8ac8e389#diff-6bdad48cfc34314e89599655442ff210 |  
spark.shuffle.file.buffer | 1.4.0 | SPARK-7081 | c53ebea9db418099df50f9adc1a18cee7849cd97#diff-ecdafc46b901740134261d2cab24ccd9 |  
spark.shuffle.unsafe.file.output.buffer | 2.3.0 | SPARK-20950 | 565e7a8d4ae7879ee704fb94ae9b3da31e202d7e#diff-6bdad48cfc34314e89599655442ff210 |  
spark.shuffle.spill.diskWriteBufferSize | 2.3.0 | SPARK-20950 | 565e7a8d4ae7879ee704fb94ae9b3da31e202d7e#diff-6bdad48cfc34314e89599655442ff210 |  
spark.storage.unrollMemoryCheckPeriod | 2.3.0 | SPARK-21923 | a11db942aaf4c470a85f8a1b180f034f7a584254#diff-6bdad48cfc34314e89599655442ff210 |  
spark.storage.unrollMemoryGrowthFactor | 2.3.0 | SPARK-21923 | a11db942aaf4c470a85f8a1b180f034f7a584254#diff-6bdad48cfc34314e89599655442ff210 |  
spark.yarn.dist.forceDownloadSchemes | 2.3.0 | SPARK-21917 | 8319432af60b8e1dc00f08d794f7d80591e24d0c#diff-6bdad48cfc34314e89599655442ff210 |  
spark.extraListeners | 1.3.0 | SPARK-5411 | 47e4d579eb4a9aab8e0dd9c1400394d80c8d0388#diff-364713d7776956cb8b0a771e9b62f82d |  
spark.shuffle.spill.numElementsForceSpillThreshold | 1.6.0 | SPARK-10708 | f6d06adf05afa9c5386dc2396c94e7a98730289f#diff-3eedc75de4787b842477138d8cc7f150 |  
spark.shuffle.mapOutput.parallelAggregationThreshold | 2.3.0 | SPARK-22537 | efd0036ec88bdc385f5a9ea568d2e2bbfcda2912#diff-6bdad48cfc34314e89599655442ff210 |  
spark.driver.maxResultSize | 1.2.0 | SPARK-3466 | 6181577e9935f46b646ba3925b873d031aa3d6ba#diff-d239aee594001f8391676e1047a0381e |
spark.security.credentials.renewalRatio | 2.4.0 | SPARK-23361 | 5fa438471110afbf4e2174df449ac79e292501f8#diff-6bdad48cfc34314e89599655442ff210 |  
spark.security.credentials.retryWait | 2.4.0 | SPARK-23361 | 5fa438471110afbf4e2174df449ac79e292501f8#diff-6bdad48cfc34314e89599655442ff210 |  
spark.shuffle.sort.initialBufferSize | 2.1.0 | SPARK-15958 | bf665a958631125a1670504ef5966ef1a0e14798#diff-a1d00506391c1c4b2209f9bbff590c5b | On branch-2.1, but in pom.xml it is 2.0.0-SNAPSHOT
spark.shuffle.compress | 0.6.0 | None | efc5423210d1aadeaea78273a4a8f10425753079#diff-76170a9c8f67b542bc58240a0a12fe08 |  
spark.shuffle.spill.compress | 0.9.0 | None | c3816de5040e3c48e58ed4762d2f4eb606812938#diff-2b643ea78c1add0381754b1f47eec132 |  
spark.shuffle.mapStatus.compression.codec | 3.0.0 | SPARK-29939 | 456cfe6e4693efd26d64f089d53c4e01bf8150a2#diff-6bdad48cfc34314e89599655442ff210 |  
spark.shuffle.spill.initialMemoryThreshold | 1.1.1 | SPARK-4480 | 16bf5f3d17624db2a96c921fe8a1e153cdafb06c#diff-31417c461d8901d8e08167b0cbc344c1 |  
spark.shuffle.spill.batchSize | 0.9.0 | None | c3816de5040e3c48e58ed4762d2f4eb606812938#diff-a470b9812a5ac8c37d732da7d9fbe39a |
spark.shuffle.sort.bypassMergeThreshold | 1.1.1 | SPARK-2787 | 0f2274f8ed6131ad17326e3fff7f7e093863b72d#diff-31417c461d8901d8e08167b0cbc344c1 |  
spark.shuffle.manager | 1.1.0 | SPARK-2044 | 508fd371d6dbb826fd8a00787d347235b549e189#diff-60df49b5d3c59f2c4540fa16a90033a1 |  
spark.shuffle.reduceLocality.enabled | 1.5.0 | SPARK-2774 | 96a7c888d806adfdb2c722025a1079ed7eaa2052#diff-6a9ff7fb74fd490a50462d45db2d5e11 |  
spark.shuffle.mapOutput.minSizeForBroadcast | 2.0.0 | SPARK-1239 | d98dd72e7baeb59eacec4fefd66397513a607b2f#diff-609c3f8c26150ca96a94cd27146a809b |  
spark.shuffle.mapOutput.dispatcher.numThreads | 2.0.0 | SPARK-1239 | d98dd72e7baeb59eacec4fefd66397513a607b2f#diff-609c3f8c26150ca96a94cd27146a809b |  
spark.shuffle.detectCorrupt | 2.2.0 | SPARK-4105 | cf33a86285629abe72c1acf235b8bfa6057220a8#diff-eb30a71e0d04150b8e0b64929852e38b |
spark.shuffle.detectCorrupt.useExtraMemory | 3.0.0 | SPARK-26089 | 688b0c01fac0db80f6473181673a89f1ce1be65b#diff-6bdad48cfc34314e89599655442ff210 |  
spark.shuffle.sync | 0.8.0 | None | 31da065b1d08c1fad5283e4bcf8e0ed01818c03e#diff-ad46ed23fcc3fa87f30d05204917b917 |  
spark.shuffle.unsafe.fastMergeEnabled | 1.4.0 | SPARK-7081 | c53ebea9db418099df50f9adc1a18cee7849cd97#diff-642ce9f439435408382c3ac3b5c5e0a0 |  
spark.shuffle.sort.useRadixSort | 2.0.0 | SPARK-14724 | e2b5647ab92eb478b3f7b36a0ce6faf83e24c0e5#diff-3eedc75de4787b842477138d8cc7f150 |  
spark.shuffle.minNumPartitionsToHighlyCompress | 2.4.0 | SPARK-24519 | 39dfaf2fd167cafc84ec9cc637c114ed54a331e3#diff-6bdad48cfc34314e89599655442ff210 |  
spark.shuffle.useOldFetchProtocol | 3.0.0 | SPARK-25341 | f725d472f51fb80c6ce1882ec283ff69bafb0de4#diff-6bdad48cfc34314e89599655442ff210 |  
spark.shuffle.readHostLocalDisk | 3.0.0 | SPARK-30812 | 68d7edf9497bea2f73707d32ab55dd8e53088e7c#diff-6bdad48cfc34314e89599655442ff210 |  

### Why are the changes needed?
Supplemental configuration version information.
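
For readers who want to query the "Since version" column of a table like the one above, here is a minimal Python sketch. The helper `parse_since_versions` is hypothetical and not part of Spark; the two sample rows are copied from the table in this commit message.

```python
from typing import Dict


def parse_since_versions(table: str) -> Dict[str, str]:
    """Map each config key to its 'Since version' cell (illustrative helper)."""
    versions = {}
    for line in table.strip().splitlines():
        cells = [c.strip() for c in line.split("|")]
        # Skip the header row, the '--' separator row, and malformed rows.
        if len(cells) < 2 or cells[0] in ("", "Item name") or set(cells[0]) == {"-"}:
            continue
        versions[cells[0]] = cells[1]
    return versions


sample = """
Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.metrics.namespace | 2.1.0 | SPARK-5847 | 70f846a313061e4db6174e0dc6c12c8c806ccf78#diff-6bdad48cfc34314e89599655442ff210 |
spark.driver.host | 0.7.0 | None | 02a6761589c35f15f1a6e3b63a7964ba057d3ba6#diff-eaf125f56ce786d64dcef99cf446a751 |
"""

since = parse_since_versions(sample)
print(since["spark.driver.host"])  # 0.7.0
```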

### Does this PR introduce any user-facing change?
'No'.

### How was this patch tested?
Existing UT

Closes #27913 from beliefer/add-version-to-core-config-part-three.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-16 10:08:07 +09:00
gatorsmile 4d4c00c1b5 [SPARK-31151][SQL][DOC] Reorganize the migration guide of SQL
### What changes were proposed in this pull request?
The current migration guide of SQL is too long for most readers to find the needed info. This PR is to group the items in the migration guide of Spark SQL based on the corresponding components.

Note. This PR does not change the contents of the migration guides. Attached figure is the screenshot after the change.

![screencapture-127-0-0-1-4000-sql-migration-guide-html-2020-03-14-12_00_40](https://user-images.githubusercontent.com/11567269/76688626-d3010200-65eb-11ea-9ce7-265bc90ebb2c.png)

### Why are the changes needed?
The current migration guide of SQL is too long for most readers to find the needed info.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
N/A

Closes #27909 from gatorsmile/migrationGuideReorg.

Authored-by: gatorsmile <gatorsmile@gmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-03-15 07:35:20 +09:00
HyukjinKwon 9628aca68b [MINOR][DOCS] Fix [[...]] to ... and <code>...</code> in documentation
### What changes were proposed in this pull request?

Before:

- ![Screen Shot 2020-03-13 at 1 19 12 PM](https://user-images.githubusercontent.com/6477701/76589452-7c34f300-652d-11ea-9da7-3754f8575796.png)
- ![Screen Shot 2020-03-13 at 1 19 24 PM](https://user-images.githubusercontent.com/6477701/76589455-7d662000-652d-11ea-9dbe-f5fe10d1e7ad.png)
- ![Screen Shot 2020-03-13 at 1 19 03 PM](https://user-images.githubusercontent.com/6477701/76589449-7b03c600-652d-11ea-8e99-dbe47f561f9c.png)

After:

- ![Screen Shot 2020-03-13 at 1 17 37 PM](https://user-images.githubusercontent.com/6477701/76589437-74754e80-652d-11ea-99f5-14fb4761f915.png)
- ![Screen Shot 2020-03-13 at 1 17 46 PM](https://user-images.githubusercontent.com/6477701/76589442-76d7a880-652d-11ea-8c10-53e595421081.png)
- ![Screen Shot 2020-03-13 at 1 18 15 PM](https://user-images.githubusercontent.com/6477701/76589443-7808d580-652d-11ea-9b1b-e5d11d638335.png)

### Why are the changes needed?
To render the code block properly in the documentation

### Does this PR introduce any user-facing change?
Yes, code rendering in documentation.

### How was this patch tested?

Manually built the doc via `SKIP_API=1 jekyll build`.

Closes #27899 from HyukjinKwon/minor-docss.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-03-13 16:44:23 -07:00
gatorsmile 1c8526dc87 [SPARK-28093][FOLLOW-UP] Remove migration guide of TRIM changes
### What changes were proposed in this pull request?
Since we reverted the original change in https://github.com/apache/spark/pull/27540, this PR is to remove the corresponding migration guide made in the commit https://github.com/apache/spark/pull/24948

### Why are the changes needed?
N/A

### Does this PR introduce any user-facing change?
N/A

### How was this patch tested?
N/A

Closes #27896 from gatorsmile/SPARK-28093Followup.

Authored-by: gatorsmile <gatorsmile@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-13 11:45:59 +09:00
Gabor Somogyi 231e65092f [SPARK-30874][SQL] Support Postgres Kerberos login in JDBC connector
### What changes were proposed in this pull request?
When loading DataFrames from JDBC datasource with Kerberos authentication, remote executors (yarn-client/cluster etc. modes) fail to establish a connection due to lack of Kerberos ticket or ability to generate it.

This is a real issue when trying to ingest data from kerberized data sources (SQL Server, Oracle) in enterprise environment where exposing simple authentication access is not an option due to IT policy issues.

In this PR I've added Postgres support (other supported databases will come in later PRs).

What this PR contains:
* Added `keytab` and `principal` JDBC options
* Added `ConnectionProvider` trait and its implementations:
  * `BasicConnectionProvider` => unsecure connection
  * `PostgresConnectionProvider` => postgres secure connection
* Added `ConnectionProvider` tests
* Added `PostgresKrbIntegrationSuite` docker integration test
* Created `SecurityUtils` to concentrate re-usable security related functionalities
* Documentation

### Why are the changes needed?
Missing JDBC kerberos support.

### Does this PR introduce any user-facing change?
Yes, 2 additional JDBC options added:
* keytab
* principal

If both are provided, Spark performs Kerberos authentication.
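
A minimal Python sketch of how a caller might assemble these options. The option names `keytab` and `principal` come from this PR; the helper `jdbc_options`, the sample URL, table, path, and principal, and the rule of rejecting a lone `keytab` or `principal` are assumptions made for illustration, not documented Spark behavior.

```python
from typing import Dict, Optional


def jdbc_options(url: str, dbtable: str,
                 keytab: Optional[str] = None,
                 principal: Optional[str] = None) -> Dict[str, str]:
    """Build an options dict for a JDBC read (hypothetical helper)."""
    options = {"url": url, "dbtable": dbtable}
    if (keytab is None) != (principal is None):
        # Assumption of this sketch: a half-specified pair is treated as a mistake.
        raise ValueError("keytab and principal must be provided together")
    if keytab is not None:
        options["keytab"] = keytab
        options["principal"] = principal
    return options


opts = jdbc_options(
    "jdbc:postgresql://db.example.com:5432/sales",  # placeholder URL
    "public.orders",                                 # placeholder table
    keytab="/path/to/user.keytab",                   # placeholder path
    principal="user@EXAMPLE.COM",                    # placeholder principal
)
# With PySpark available, the dict could be passed as:
#   spark.read.format("jdbc").options(**opts).load()
```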

### How was this patch tested?
To demonstrate the functionality with a standalone application I've created this repository: https://github.com/gaborgsomogyi/docker-kerberos

* Additional + existing unit tests
* Additional docker integration test
* Test on cluster manually
* `SKIP_API=1 jekyll build`

Closes #27637 from gaborgsomogyi/SPARK-30874.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@apache.org>
2020-03-12 19:04:35 -07:00
Kent Yao 7b4b29e8d9 [SPARK-31131][SQL] Remove the unnecessary config spark.sql.legacy.timeParser.enabled
### What changes were proposed in this pull request?

`spark.sql.legacy.timeParser.enabled` should be removed from SQLConf and the migration guide; `spark.sql.legacy.timeParserPolicy` is the right one.

### Why are the changes needed?

fix doc

### Does this PR introduce any user-facing change?

no
### How was this patch tested?

Pass Jenkins

Closes #27889 from yaooqinn/SPARK-31131.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-03-12 09:24:49 -07:00
beliefer bd2b3f9132 [SPARK-30911][CORE][DOC] Add version information to the configuration of Status
### What changes were proposed in this pull request?
1. Add version information to the configuration of `Status`.
2. Update the docs of `Status`.
3. Also add supplementary documentation for https://github.com/apache/spark/pull/27847

I sorted out some information, shown below.

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.appStateStore.asyncTracking.enable | 2.3.0 | SPARK-20653 | 772e4648d95bda3353723337723543c741ea8476#diff-9ab674b7af7b2097f7d28cb6f5fd1e8c |  
spark.ui.liveUpdate.period | 2.3.0 | SPARK-20644 | c7f38e5adb88d43ef60662c5d6ff4e7a95bff580#diff-9ab674b7af7b2097f7d28cb6f5fd1e8c |  
spark.ui.liveUpdate.minFlushPeriod | 2.4.2 | SPARK-27394 | a8a2ba11ac10051423e58920062b50f328b06421#diff-9ab674b7af7b2097f7d28cb6f5fd1e8c |  
spark.ui.retainedJobs | 1.2.0 | SPARK-2321 | 9530316887612dca060a128fca34dd5a6ab2a9a9#diff-1f32bcb61f51133bd0959a4177a066a5 |  
spark.ui.retainedStages | 0.9.0 | None | 112c0a1776bbc866a1026a9579c6f72f293414c4#diff-1f32bcb61f51133bd0959a4177a066a5 | 0.9.0-incubating-SNAPSHOT
spark.ui.retainedTasks | 2.0.1 | SPARK-15083 | 55db26245d69bb02b7d7d5f25029b1a1cd571644#diff-6bdad48cfc34314e89599655442ff210 |  
spark.ui.retainedDeadExecutors | 2.0.0 | SPARK-7729 | 9f4263392e492b5bc0acecec2712438ff9a257b7#diff-a0ba36f9b1f9829bf3c4689b05ab6cf2 |  
spark.ui.dagGraph.retainedRootRDDs | 2.1.0 | SPARK-17171 | cc87280fcd065b01667ca7a59a1a32c7ab757355#diff-3f492c527ea26679d4307041b28455b8 |  
spark.metrics.appStatusSource.enabled | 3.0.0 | SPARK-30060 | 60f20e5ea2000ab8f4a593b5e4217fd5637c5e22#diff-9f796ae06b0272c1f0a012652a5b68d0 |  

### Why are the changes needed?
Supplemental configuration version information.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing UT

Closes #27848 from beliefer/add-version-to-status-config.

Lead-authored-by: beliefer <beliefer@163.com>
Co-authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-12 11:03:47 +09:00
beliefer 1cd80fa9fa [SPARK-31109][MESOS][DOC] Add version information to the configuration of Mesos
### What changes were proposed in this pull request?
Add version information to the configuration of `Mesos`.

I sorted out some information, shown below.

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.mesos.$taskType.secret.names | 2.3.0 | SPARK-22131 | 5415963d2caaf95604211419ffc4e29fff38e1d7#diff-91e6e5f871160782dc50d4060d6faea3 |  
spark.mesos.$taskType.secret.values | 2.3.0 | SPARK-22131 | 5415963d2caaf95604211419ffc4e29fff38e1d7#diff-91e6e5f871160782dc50d4060d6faea3 |  
spark.mesos.$taskType.secret.envkeys | 2.3.0 | SPARK-22131 | 5415963d2caaf95604211419ffc4e29fff38e1d7#diff-91e6e5f871160782dc50d4060d6faea3 |  
spark.mesos.$taskType.secret.filenames | 2.3.0 | SPARK-22131 | 5415963d2caaf95604211419ffc4e29fff38e1d7#diff-91e6e5f871160782dc50d4060d6faea3 |  
spark.mesos.principal | 1.5.0 | SPARK-6284 | d86bbb4e286f16f77ba125452b07827684eafeed#diff-02a6d899f7a529eb7cfbb12182a110b0 |  
spark.mesos.principal.file | 2.4.0 | SPARK-16501 | 7f10cf83f311526737fc96d5bb8281d12e41932f#diff-daf48dabbe58afaeed8787751750b01d |  
spark.mesos.secret | 1.5.0 | SPARK-6284 | d86bbb4e286f16f77ba125452b07827684eafeed#diff-02a6d899f7a529eb7cfbb12182a110b0 |  
spark.mesos.secret.file | 2.4.0 | SPARK-16501 | 7f10cf83f311526737fc96d5bb8281d12e41932f#diff-daf48dabbe58afaeed8787751750b01d |  
spark.shuffle.cleaner.interval | 2.0.0 | SPARK-12583 | 310981d49a332bd329303f610b150bbe02cf5f87#diff-2fafefee94f2a2023ea9765536870258 |  
spark.mesos.dispatcher.webui.url | 2.0.0 | SPARK-13492 | a4a0addccffb7cd0ece7947d55ce2538afa54c97#diff-f541460c7a74cee87cbb460b3b01665e |  
spark.mesos.dispatcher.historyServer.url | 2.1.0 | SPARK-16809 | 62e62124419f3fa07b324f5e42feb2c5b4fde715#diff-3779e2035d9a09fa5f6af903925b9512 |  
spark.mesos.driver.labels | 2.3.0 | SPARK-21000 | 8da3f7041aafa71d7596b531625edb899970fec2#diff-91e6e5f871160782dc50d4060d6faea3 |  
spark.mesos.driver.webui.url | 2.0.0 | SPARK-13492 | a4a0addccffb7cd0ece7947d55ce2538afa54c97#diff-e3a5e67b8de2069ce99801372e214b8e |  
spark.mesos.driver.failoverTimeout | 2.3.0 | SPARK-21456 | c42ef953343073a50ef04c5ce848b574ff7f2238#diff-91e6e5f871160782dc50d4060d6faea3 |  
spark.mesos.network.name | 2.1.0 | SPARK-18232 | d89bfc92302424406847ac7a9cfca714e6b742fc#diff-ab5bf34f1951a8f7ea83c9456a6c3ab7 |  
spark.mesos.network.labels | 2.3.0 | SPARK-21694 | ce0d3bb377766bdf4df7852272557ae846408877#diff-91e6e5f871160782dc50d4060d6faea3 |  
spark.mesos.driver.constraints | 2.2.1 | SPARK-19606 | f6ee3d90d5c299e67ae6e2d553c16c0d9759d4b5#diff-91e6e5f871160782dc50d4060d6faea3 |  
spark.mesos.driver.frameworkId | 2.1.0 | SPARK-16809 | 62e62124419f3fa07b324f5e42feb2c5b4fde715#diff-02a6d899f7a529eb7cfbb12182a110b0 |  
spark.executor.uri | 0.8.0 | None | 46eecd110a4017ea0c86cbb1010d0ccd6a5eb2ef#diff-a885e7df97790e9b59c21c63353e7476 |  
spark.mesos.proxy.baseURL | 2.3.0 | SPARK-13041 | 663f30d14a0c9219e07697af1ab56e11a714d9a6#diff-0b9b4e122eb666155aa189a4321a6ca8 |  
spark.mesos.coarse | 0.6.0 | None | 63051dd2bcc4bf09d413ff7cf89a37967edc33ba#diff-eaf125f56ce786d64dcef99cf446a751 |  
spark.mesos.coarse.shutdownTimeout | 2.0.0 | SPARK-12330 | c756bda477f458ba4aad7fdb2026263507e0ad9b#diff-d425d35aa23c47a62fbb538554f2f2cf |  
spark.mesos.maxDrivers | 1.4.0 | SPARK-5338 | 53befacced828bbac53c6e3a4976ec3f036bae9e#diff-b964c449b99c51f0a5fd77270b2951a4 |  
spark.mesos.retainedDrivers | 1.4.0 | SPARK-5338 | 53befacced828bbac53c6e3a4976ec3f036bae9e#diff-b964c449b99c51f0a5fd77270b2951a4 |  
spark.mesos.cluster.retry.wait.max | 1.4.0 | SPARK-5338 | 53befacced828bbac53c6e3a4976ec3f036bae9e#diff-b964c449b99c51f0a5fd77270b2951a4 |  
spark.mesos.fetcherCache.enable | 2.1.0 | SPARK-15994 | e34b4e12673fb76c92f661d7c03527410857a0f8#diff-772ea7311566edb25f11a4c4f882179a |  
spark.mesos.appJar.local.resolution.mode | 2.4.0 | SPARK-24326 | 22df953f6bb191858053eafbabaa5b3ebca29f56#diff-6e4d0a0445975f03f975fdc1e3d80e49 |  
spark.mesos.rejectOfferDuration | 2.2.0 | SPARK-19702 | 2e30c0b9bcaa6f7757bd85d1f1ec392d5f916f83#diff-daf48dabbe58afaeed8787751750b01d |  
spark.mesos.rejectOfferDurationForUnmetConstraints | 1.6.0 | SPARK-10471 | 74f50275e429e649212928a9f36552941b862edc#diff-02a6d899f7a529eb7cfbb12182a110b0 |  
spark.mesos.rejectOfferDurationForReachedMaxCores | 2.0.0 | SPARK-13001 | 1e7d9bfb5a41f5c2479ab3b4d4081f00bf00bd31#diff-02a6d899f7a529eb7cfbb12182a110b0 |  
spark.mesos.uris | 1.5.0 | SPARK-8798 | a2f805729b401c68b60bd690ad02533b8db57b58#diff-e3a5e67b8de2069ce99801372e214b8e |  
spark.mesos.executor.home | 1.1.1 | SPARK-3264 | 069ecfef02c4af69fc0d3755bd78be321b68b01d#diff-e3a5e67b8de2069ce99801372e214b8e |  
spark.mesos.mesosExecutor.cores | 1.4.0 | SPARK-6350 | 6fbeb82e13db7117d8f216e6148632490a4bc5be#diff-e3a5e67b8de2069ce99801372e214b8e |  
spark.mesos.extra.cores | 0.6.0 | None | 2d761e3353651049f6707c74bb5ffdd6e86f6f35#diff-37af8c6e3634f97410ade813a5172621 |  
spark.mesos.executor.memoryOverhead | 1.1.1 | SPARK-3535 | 6f150978477830bbc14ba983786dd2bce12d1fe2#diff-6b498f5407d10e848acac4a1b182457c |  
spark.mesos.executor.docker.image | 1.4.0 | SPARK-2691 | 8f50a07d2188ccc5315d979755188b1e5d5b5471#diff-e3a5e67b8de2069ce99801372e214b8e |  
spark.mesos.executor.docker.forcePullImage | 2.1.0 | SPARK-15271 | 978cd5f125eb5a410bad2e60bf8385b11cf1b978#diff-0dd025320c7ecda2ea310ed7172d7f5a |  
spark.mesos.executor.docker.portmaps | 1.4.0 | SPARK-7373 | 226033cfffa2f37ebaf8bc2c653f094e91ef0c9b#diff-b964c449b99c51f0a5fd77270b2951a4 |  
spark.mesos.executor.docker.parameters | 2.2.0 | SPARK-19740 | a888fed3099e84c2cf45e9419f684a3658ada19d#diff-4139e6605a8c7f242f65cde538770c99 |  
spark.mesos.executor.docker.volumes | 1.4.0 | SPARK-7373 | 226033cfffa2f37ebaf8bc2c653f094e91ef0c9b#diff-b964c449b99c51f0a5fd77270b2951a4 |  
spark.mesos.gpus.max | 2.1.0 | SPARK-14082 | 29f186bfdf929b1e8ffd8e33ee37b76d5dc5af53#diff-d427ee890b913c5a7056be21eb4f39d7 |  
spark.mesos.task.labels | 2.2.0 | SPARK-20085 | c8fc1f3badf61bcfc4bd8eeeb61f73078ca068d1#diff-387c5d0c916278495fc28420571adf9e |  
spark.mesos.constraints | 1.5.0 | SPARK-6707 | 1165b17d24cdf1dbebb2faca14308dfe5c2a652c#diff-e3a5e67b8de2069ce99801372e214b8e |  
spark.mesos.containerizer | 2.1.0 | SPARK-16637 | 266b92faffb66af24d8ed2725beb80770a2d91f8#diff-0dd025320c7ecda2ea310ed7172d7f5a |  
spark.mesos.role | 1.5.0 | SPARK-6284 | d86bbb4e286f16f77ba125452b07827684eafeed#diff-02a6d899f7a529eb7cfbb12182a110b0 |  
The following appears in the document |   |   |   |  
spark.mesos.driverEnv.[EnvironmentVariableName] | 2.1.0 | SPARK-16194 | 235cb256d06653bcde4c3ed6b081503a94996321#diff-b964c449b99c51f0a5fd77270b2951a4 |  
spark.mesos.dispatcher.driverDefault.[PropertyName] | 2.1.0 | SPARK-16927 and SPARK-16923 | eca58755fbbc11937b335ad953a3caff89b818e6#diff-b964c449b99c51f0a5fd77270b2951a4 |  

### Why are the changes needed?
Supplemental configuration version information.

### Does this PR introduce any user-facing change?
'No'.

### How was this patch tested?
Existing UT

Closes #27863 from beliefer/add-version-to-mesos-config.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-12 11:02:29 +09:00
beliefer 1254c88034 [SPARK-31118][K8S][DOC] Add version information to the configuration of K8S
### What changes were proposed in this pull request?
Add version information to the configuration of `K8S`.

I sorted out some information, shown below.

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.kubernetes.context | 3.0.0 | SPARK-25887 | c542c247bbfe1214c0bf81076451718a9e8931dc#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.master | 3.0.0 | SPARK-30371 | f14061c6a4729ad419902193aa23575d8f17f597#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.namespace | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.container.image | 2.3.0 | SPARK-22994 | b94debd2b01b87ef1d2a34d48877e38ade0969e6#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.container.image | 2.3.0 | SPARK-22807 | fb3636b482be3d0940345b1528c1d5090bbc25e6#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.container.image | 2.3.0 | SPARK-22807 | fb3636b482be3d0940345b1528c1d5090bbc25e6#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.container.image.pullPolicy | 2.3.0 | SPARK-22807 | fb3636b482be3d0940345b1528c1d5090bbc25e6#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.container.image.pullSecrets | 2.4.0 | SPARK-23668 | cccaaa14ad775fb981e501452ba2cc06ff5c0f0a#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.submission.requestTimeout | 3.0.0 | SPARK-27023 | e9e8bb33ef9ad785473ded168bc85867dad4ee70#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.submission.connectionTimeout | 3.0.0 | SPARK-27023 | e9e8bb33ef9ad785473ded168bc85867dad4ee70#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.requestTimeout | 3.0.0 | SPARK-27023 | e9e8bb33ef9ad785473ded168bc85867dad4ee70#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.connectionTimeout | 3.0.0 | SPARK-27023 | e9e8bb33ef9ad785473ded168bc85867dad4ee70#diff-6e882d5561424e7e6651eb46f10104b8 |  
KUBERNETES_AUTH_DRIVER_CONF_PREFIX.serviceAccountName | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 | spark.kubernetes.authenticate.driver
KUBERNETES_AUTH_EXECUTOR_CONF_PREFIX.serviceAccountName | 3.1.0 | SPARK-30122 | f9f06eee9853ad4b6458ac9d31233e729a1ca226#diff-6e882d5561424e7e6651eb46f10104b8 | spark.kubernetes.authenticate.executor
spark.kubernetes.driver.limit.cores | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.request.cores | 3.0.0 | SPARK-27754 | 1a8c09334db87b0e938c38cd6b59d326bdcab3c3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.submitInDriver | 2.4.0 | SPARK-22839 | f15906da153f139b698e192ec6f82f078f896f1e#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.limit.cores | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.scheduler.name | 3.0.0 | SPARK-29436 | f800fa383131559c4e841bf062c9775d09190935#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.request.cores | 2.4.0 | SPARK-23285 | fe2b7a4568d65a62da6e6eb00fff05f248b4332c#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.pod.name | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.resourceNamePrefix | 3.0.0 | SPARK-25876 | 6be272b75b4ae3149869e19df193675cc4117763#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.podNamePrefix | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.allocation.batch.size | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.allocation.batch.delay | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.lostCheck.maxAttempts | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.submission.waitAppCompletion | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.report.interval | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.apiPollingInterval | 2.4.0 | SPARK-24248 | 270a9a3cac25f3e799460320d0fc94ccd7ecfaea#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.eventProcessingInterval | 2.4.0 | SPARK-24248 | 270a9a3cac25f3e799460320d0fc94ccd7ecfaea#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.memoryOverheadFactor | 2.4.0 | SPARK-23984 | 1a644afbac35c204f9ad55f86999319a9ab458c6#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.pyspark.pythonVersion | 2.4.0 | SPARK-23984 | a791c29bd824adadfb2d85594bc8dad4424df936#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.kerberos.krb5.path | 3.0.0 | SPARK-23257 | 6c9c84ffb9c8d98ee2ece7ba4b010856591d383d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.kerberos.krb5.configMapName | 3.0.0 | SPARK-23257 | 6c9c84ffb9c8d98ee2ece7ba4b010856591d383d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.hadoop.configMapName | 3.0.0 | SPARK-23257 | 6c9c84ffb9c8d98ee2ece7ba4b010856591d383d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.kerberos.tokenSecret.name | 3.0.0 | SPARK-23257 | 6c9c84ffb9c8d98ee2ece7ba4b010856591d383d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.kerberos.tokenSecret.itemKey | 3.0.0 | SPARK-23257 | 6c9c84ffb9c8d98ee2ece7ba4b010856591d383d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.resource.type | 2.4.1 | SPARK-25021 | 9031c784847353051bc0978f63ef4146ae9095ff#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.local.dirs.tmpfs | 3.0.0 | SPARK-25262 | da6fa3828bb824b65f50122a8a0a0d4741551257#diff-6e882d5561424e7e6651eb46f10104b8 | It exists in branch-3.0, but in pom.xml it is 2.4.0-snapshot
spark.kubernetes.driver.podTemplateFile | 3.0.0 | SPARK-24434 | f6cc354d83c2c9a757f9b507aadd4dbdc5825cca#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.podTemplateFile | 3.0.0 | SPARK-24434 | f6cc354d83c2c9a757f9b507aadd4dbdc5825cca#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.podTemplateContainerName | 3.0.0 | SPARK-24434 | f6cc354d83c2c9a757f9b507aadd4dbdc5825cca#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.podTemplateContainerName | 3.0.0 | SPARK-24434 | f6cc354d83c2c9a757f9b507aadd4dbdc5825cca#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.deleteOnTermination | 3.0.0 | SPARK-25515 | 0c2935b01def8a5f631851999d9c2d57b63763e6#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.dynamicAllocation.deleteGracePeriod | 3.0.0 | SPARK-28487 | 0343854f54b48b206ca434accec99355011560c2#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.appKillPodDeletionGracePeriod | 3.0.0 | SPARK-24793 | 05168e725d2a17c4164ee5f9aa068801ec2454f4#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.file.upload.path | 3.0.0 | SPARK-23153 | 5e74570c8f5e7dfc1ca1c53c177827c5cea57bf1#diff-6e882d5561424e7e6651eb46f10104b8 |  
The following appears in the document |   |   |   |  
spark.kubernetes.authenticate.submission.caCertFile | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.submission.clientKeyFile | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.submission.clientCertFile | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.submission.oauthToken | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.submission.oauthTokenFile | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.caCertFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.clientKeyFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.clientCertFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.oauthToken | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.oauthTokenFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.mounted.caCertFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.mounted.clientKeyFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.mounted.clientCertFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.mounted.oauthTokenFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.caCertFile | 2.4.0 | SPARK-23146 | 571a6f0574e50e53cea403624ec3795cd03aa204#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.clientKeyFile | 2.4.0 | SPARK-23146 | 571a6f0574e50e53cea403624ec3795cd03aa204#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.clientCertFile | 2.4.0 | SPARK-23146 | 571a6f0574e50e53cea403624ec3795cd03aa204#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.oauthToken | 2.4.0 | SPARK-23146 | 571a6f0574e50e53cea403624ec3795cd03aa204#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.oauthTokenFile | 2.4.0 | SPARK-23146 | 571a6f0574e50e53cea403624ec3795cd03aa204#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.label.[LabelName] | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.annotation.[AnnotationName] | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.label.[LabelName] | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.annotation.[AnnotationName] | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.node.selector.[labelKey] | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driverEnv.[EnvironmentVariableName] | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.secrets.[SecretName] | 2.3.0 | SPARK-22757 | 171f6ddadc6185ffcc6ad82e5f48952fb49095b2#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.secrets.[SecretName] | 2.3.0 | SPARK-22757 | 171f6ddadc6185ffcc6ad82e5f48952fb49095b2#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.secretKeyRef.[EnvName] | 2.4.0 | SPARK-24232 | 21e1fc7d4aed688d7b685be6ce93f76752159c98#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.secretKeyRef.[EnvName] | 2.4.0 | SPARK-24232 | 21e1fc7d4aed688d7b685be6ce93f76752159c98#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].mount.path | 2.4.0 | SPARK-23529 | 5ff1b9ba1983d5601add62aef64a3e87d07050eb#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].mount.subPath | 3.0.0 | SPARK-25960 | 3df307aa515b3564686e75d1b71754bbcaaf2dec#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].mount.readOnly | 2.4.0 | SPARK-23529 | 5ff1b9ba1983d5601add62aef64a3e87d07050eb#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].options.[OptionName] | 2.4.0 | SPARK-23529 | 5ff1b9ba1983d5601add62aef64a3e87d07050eb#diff-b5527f236b253e0d9f5db5164bdb43e9 |  
spark.kubernetes.executor.volumes.[VolumeType].[VolumeName].mount.path | 2.4.0 | SPARK-23529 | 5ff1b9ba1983d5601add62aef64a3e87d07050eb#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.volumes.[VolumeType].[VolumeName].mount.subPath | 3.0.0 | SPARK-25960 | 3df307aa515b3564686e75d1b71754bbcaaf2dec#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.volumes.[VolumeType].[VolumeName].mount.readOnly | 2.4.0 | SPARK-23529 | 5ff1b9ba1983d5601add62aef64a3e87d07050eb#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.volumes.[VolumeType].[VolumeName].options.[OptionName] | 2.4.0 | SPARK-23529 | 5ff1b9ba1983d5601add62aef64a3e87d07050eb#diff-b5527f236b253e0d9f5db5164bdb43e9 |  
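
The "Since version" column can be used to check which items exist in a given release. The following is a minimal sketch, not part of the PR: the four sample rows are copied from the table above, while `parse_version` and `available_in` are hypothetical helper names chosen for illustration.

```python
# Sample rows copied from the table above: (item name, since version).
ROWS = [
    ("spark.kubernetes.driver.limit.cores", "2.3.0"),
    ("spark.kubernetes.submitInDriver", "2.4.0"),
    ("spark.kubernetes.resource.type", "2.4.1"),
    ("spark.kubernetes.driver.request.cores", "3.0.0"),
]


def parse_version(text):
    """Turn '2.4.1' into (2, 4, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def available_in(rows, spark_version):
    """Return the item names whose since version is <= spark_version."""
    target = parse_version(spark_version)
    return [name for name, since in rows if parse_version(since) <= target]


# Hypothetical helper usage: items that already exist in Spark 2.4.0.
print(available_in(ROWS, "2.4.0"))
```

Comparing tuples of integers rather than raw strings keeps the ordering numeric, so a version such as `2.10.0` would sort after `2.4.1`.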

### Why are the changes needed?
To supplement the configuration with version information.

### Does this PR introduce any user-facing change?
'No'

### How was this patch tested?
Existing UT

Closes #27875 from beliefer/add-version-to-k8s-config.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-12 09:54:08 +09:00
beliefer 0722dc5fb8 [SPARK-31092][YARN][DOC] Add version information to the configuration of Yarn
### What changes were proposed in this pull request?
Add version information to the configuration of `Yarn`.

I sorted out the information shown below.

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.yarn.tags | 1.5.0 | SPARK-9782 | 9b731fad2b43ca18f3c5274062d4c7bc2622ab72#diff-b050df3f55b82065803d6e83453b9706 |  
spark.yarn.priority | 3.0.0 | SPARK-29603 | 4615769736f4c052ae1a2de26e715e229154cd2f#diff-4804e0f83ca7f891183eb0db229b4b9a |  
spark.yarn.am.attemptFailuresValidityInterval | 1.6.0 | SPARK-10739 | f97e9323b526b3d0b0fee0ca03f4276f37bb5750#diff-b050df3f55b82065803d6e83453b9706 |
spark.yarn.executor.failuresValidityInterval | 2.0.0 | SPARK-6735 | 8b44bd52fa40c0fc7d34798c3654e31533fd3008#diff-14b8ed2ef4e3da985300b8d796a38fa9 |
spark.yarn.maxAppAttempts | 1.3.0 | SPARK-2165 | 8fdd48959c93b9cf809f03549e2ae6c4687d1fcd#diff-b050df3f55b82065803d6e83453b9706 |
spark.yarn.user.classpath.first | 1.3.0 | SPARK-5087 | 8d45834debc6986e61831d0d6e982d5528dccc51#diff-b050df3f55b82065803d6e83453b9706 |  
spark.yarn.config.gatewayPath | 1.5.0 | SPARK-8302 | 37bf76a2de2143ec6348a3d43b782227849520cc#diff-b050df3f55b82065803d6e83453b9706 |  
spark.yarn.config.replacementPath | 1.5.0 | SPARK-8302 | 37bf76a2de2143ec6348a3d43b782227849520cc#diff-b050df3f55b82065803d6e83453b9706 |  
spark.yarn.queue | 1.0.0 | SPARK-1126 | 1617816090e7b20124a512a43860a21232ebf511#diff-ae6a41a938a767e5bb97b5d738371a5b |  
spark.yarn.historyServer.address | 1.0.0 | SPARK-1408 | 0058b5d2c74147d24b127a5432f89ebc7050dc18#diff-923ae58523a12397f74dd590744b8b41 |  
spark.yarn.historyServer.allowTracking | 2.2.0 | SPARK-19554 | 4661d30b988bf773ab45a15b143efb2908d33743#diff-4804e0f83ca7f891183eb0db229b4b9a |
spark.yarn.archive | 2.0.0 | SPARK-13577 | 07f1c5447753a3d593cd6ececfcb03c11b1cf8ff#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.jars | 2.0.0 | SPARK-13577 | 07f1c5447753a3d593cd6ececfcb03c11b1cf8ff#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.dist.archives | 1.0.0 | SPARK-1126 | 1617816090e7b20124a512a43860a21232ebf511#diff-ae6a41a938a767e5bb97b5d738371a5b |  
spark.yarn.dist.files | 1.0.0 | SPARK-1126 | 1617816090e7b20124a512a43860a21232ebf511#diff-ae6a41a938a767e5bb97b5d738371a5b |  
spark.yarn.dist.jars | 2.0.0 | SPARK-12343 | 8ba2b7f28fee39c4839e5ea125bd25f5091a3a1e#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.preserve.staging.files | 1.1.0 | SPARK-2933 | b92d823ad13f6fcc325eeb99563bea543871c6aa#diff-85a1f4b2810b3e11b8434dcefac5bb85 |  
spark.yarn.submit.file.replication | 0.8.1 | None | 4668fcb9ff8f9c176c4866480d52dde5d67c8522#diff-b050df3f55b82065803d6e83453b9706 |
spark.yarn.submit.waitAppCompletion | 1.4.0 | SPARK-3591 | b65bad65c3500475b974ca0219f218eef296db2c#diff-b050df3f55b82065803d6e83453b9706 |
spark.yarn.report.interval | 0.9.0 | None | ebdfa6bb9766209bc5a3c4241fa47141c5e9c5cb#diff-e0a7ae95b6d8e04a67ebca0945d27b65 |  
spark.yarn.clientLaunchMonitorInterval | 2.3.0 | SPARK-16019 | 1cad31f00644d899d8e74d58c6eb4e9f72065473#diff-4804e0f83ca7f891183eb0db229b4b9a |
spark.yarn.am.waitTime | 1.3.0 | SPARK-3779 | 253b72b56fe908bbab5d621eae8a5f359c639dfd#diff-87125050a2e2eaf87ea83aac9c19b200 |  
spark.yarn.metrics.namespace | 2.4.0 | SPARK-24594 | d2436a85294a178398525c37833dae79d45c1452#diff-4804e0f83ca7f891183eb0db229b4b9a |
spark.yarn.am.nodeLabelExpression | 1.6.0 | SPARK-7173 | 7db3610327d0725ec2ad378bc873b127a59bb87a#diff-b050df3f55b82065803d6e83453b9706 |
spark.yarn.containerLauncherMaxThreads | 1.2.0 | SPARK-1713 | 1f4a648d4e30e837d6cf3ea8de1808e2254ad70b#diff-801a04f9e67321f3203399f7f59234c1 |  
spark.yarn.max.executor.failures | 1.0.0 | SPARK-1183 | 698373211ef3cdf841c82d48168cd5dbe00a57b4#diff-0c239e58b37779967e0841fb42f3415a |  
spark.yarn.scheduler.reporterThread.maxFailures | 1.2.0 | SPARK-3304 | 11c10df825419372df61a8d23c51e8c3cc78047f#diff-85a1f4b2810b3e11b8434dcefac5bb85 |  
spark.yarn.scheduler.heartbeat.interval-ms | 0.8.1 | None | ee22be0e6c302fb2cdb24f83365c2b8a43a1baab#diff-87125050a2e2eaf87ea83aac9c19b200 |  
spark.yarn.scheduler.initial-allocation.interval | 1.4.0 | SPARK-7533 | 3ddf051ee7256f642f8a17768d161c7b5f55c7e1#diff-87125050a2e2eaf87ea83aac9c19b200 |  
spark.yarn.am.finalMessageLimit | 2.4.0 | SPARK-25174 | f8346d2fc01f1e881e4e3f9c4499bf5f9e3ceb3f#diff-4804e0f83ca7f891183eb0db229b4b9a |  
spark.yarn.am.cores | 1.3.0 | SPARK-1507 | 2be82b1e66cd188456bbf1e5abb13af04d1629d5#diff-746d34aa06bfa57adb9289011e725472 |  
spark.yarn.am.extraJavaOptions | 1.3.0 | SPARK-5087 | 8d45834debc6986e61831d0d6e982d5528dccc51#diff-b050df3f55b82065803d6e83453b9706 |  
spark.yarn.am.extraLibraryPath | 1.4.0 | SPARK-7281 | 7b5dd3e3c0030087eea5a8224789352c03717c1d#diff-b050df3f55b82065803d6e83453b9706 |  
spark.yarn.am.memoryOverhead | 1.3.0 | SPARK-1953 | e96645206006a009e5c1a23bbd177dcaf3ef9b83#diff-746d34aa06bfa57adb9289011e725472 |  
spark.yarn.am.memory | 1.3.0 | SPARK-1953 | e96645206006a009e5c1a23bbd177dcaf3ef9b83#diff-746d34aa06bfa57adb9289011e725472 |  
spark.driver.appUIAddress | 1.1.0 | SPARK-1291 | 72ea56da8e383c61c6f18eeefef03b9af00f5158#diff-2b4617e158e9c5999733759550440b96 |  
spark.yarn.executor.nodeLabelExpression | 1.4.0 | SPARK-6470 | 82fee9d9aad2c9ba2fb4bd658579fe99218cafac#diff-d4620cf162e045960d84c88b2e0aa428 |  
spark.yarn.unmanagedAM.enabled | 3.0.0 | SPARK-22404 | f06bc0cd1dee2a58e04ebf24bf719a2f7ef2dc4e#diff-4804e0f83ca7f891183eb0db229b4b9a |  
spark.yarn.rolledLog.includePattern | 2.0.0 | SPARK-15990 | 272a2f78f3ff801b94a81fa8fcc6633190eaa2f4#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.rolledLog.excludePattern | 2.0.0 | SPARK-15990 | 272a2f78f3ff801b94a81fa8fcc6633190eaa2f4#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.user.jar | 1.1.0 | SPARK-1395 | e380767de344fd6898429de43da592658fd86a39#diff-50e237ea17ce94c3ccfc44143518a5f7 |  
spark.yarn.secondary.jars | 0.9.2 | SPARK-1870 | 1d3aab96120c6770399e78a72b5692cf8f61a144#diff-50b743cff4885220c828b16c44eeecfd |  
spark.yarn.cache.filenames | 2.0.0 | SPARK-14602 | f47dbf27fa034629fab12d0f3c89ab75edb03f86#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.cache.sizes | 2.0.0 | SPARK-14602 | f47dbf27fa034629fab12d0f3c89ab75edb03f86#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.cache.timestamps | 2.0.0 | SPARK-14602 | f47dbf27fa034629fab12d0f3c89ab75edb03f86#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.cache.visibilities | 2.0.0 | SPARK-14602 | f47dbf27fa034629fab12d0f3c89ab75edb03f86#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.cache.types | 2.0.0 | SPARK-14602 | f47dbf27fa034629fab12d0f3c89ab75edb03f86#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.cache.confArchive | 2.0.0 | SPARK-14602 | f47dbf27fa034629fab12d0f3c89ab75edb03f86#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.blacklist.executor.launch.blacklisting.enabled | 2.4.0 | SPARK-16630 | b56e9c613fb345472da3db1a567ee129621f6bf3#diff-4804e0f83ca7f891183eb0db229b4b9a |  
spark.yarn.exclude.nodes | 3.0.0 | SPARK-26688 | caceaec93203edaea1d521b88e82ef67094cdea9#diff-4804e0f83ca7f891183eb0db229b4b9a |  
The following appears in the document |   |   |   |  
spark.yarn.am.resource.{resource-type}.amount | 3.0.0 | SPARK-20327 | 3946de773498621f88009c309254b019848ed490#diff-4804e0f83ca7f891183eb0db229b4b9a |  
spark.yarn.driver.resource.{resource-type}.amount | 3.0.0 | SPARK-20327 | 3946de773498621f88009c309254b019848ed490#diff-4804e0f83ca7f891183eb0db229b4b9a |  
spark.yarn.executor.resource.{resource-type}.amount | 3.0.0 | SPARK-20327 | 3946de773498621f88009c309254b019848ed490#diff-4804e0f83ca7f891183eb0db229b4b9a |  
spark.yarn.appMasterEnv.[EnvironmentVariableName] | 1.1.0 | SPARK-1680 | 7b798e10e214cd407d3399e2cab9e3789f9a929e#diff-50e237ea17ce94c3ccfc44143518a5f7 |  
spark.yarn.kerberos.relogin.period | 2.3.0 | SPARK-22290 | dc2714da50ecba1bf1fdf555a82a4314f763a76e#diff-4804e0f83ca7f891183eb0db229b4b9a |  
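
Each row above follows the `Item name | Since version | JIRA ID | Commit ID | Note` layout, and the Commit ID cell joins a commit SHA and a diff anchor with `#`. The sketch below shows one way such a row could be split into fields; `ConfigRow` and `parse_row` are hypothetical names introduced for illustration only.

```python
from typing import NamedTuple


class ConfigRow(NamedTuple):
    """One table row (hypothetical container, not part of Spark)."""
    item: str
    since: str
    jira: str
    commit: str
    diff_anchor: str
    note: str


def parse_row(line):
    """Split one pipe-delimited row into its five cells."""
    cells = [cell.strip() for cell in line.split("|")]
    # Pad defensively in case a row has fewer than five cells.
    cells += [""] * (5 - len(cells))
    item, since, jira, commit_id, note = cells[:5]
    # The Commit ID cell has the form '<sha>#<diff anchor>'.
    commit, _, diff_anchor = commit_id.partition("#")
    return ConfigRow(item, since, jira, commit, diff_anchor, note)


row = parse_row(
    "spark.yarn.tags | 1.5.0 | SPARK-9782 | "
    "9b731fad2b43ca18f3c5274062d4c7bc2622ab72"
    "#diff-b050df3f55b82065803d6e83453b9706 |  "
)
print(row.item, row.since, row.jira, row.commit[:10])
```

Rows whose JIRA ID is `None` in the table come through as the literal string `"None"`; a caller would need to map that to a missing value explicitly.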

### Why are the changes needed?
To supplement the configuration with version information.

### Does this PR introduce any user-facing change?
'No'.

### How was this patch tested?
Existing UT

Closes #27856 from beliefer/add-version-to-yarn-config.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-12 09:52:57 +09:00
beliefer c1b2675f2e [SPARK-31002][CORE][DOC][FOLLOWUP] Add version information to the configuration of Core
### What changes were proposed in this pull request?
This PR follows up https://github.com/apache/spark/pull/27847.
I sorted out the information shown below.

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.yarn.isPython | 1.5.0 | SPARK-5479 | 38112905bc3b33f2ae75274afba1c30e116f6e46#diff-4d2ab44195558d5a9d5f15b8803ef39d |
spark.task.cpus | 0.5.0 | None | e5c4cd8a5e188592f8786a265c0cd073c69ac886#diff-391214d132a0fb4478f4f9c2313d8966 |  
spark.dynamicAllocation.enabled | 1.2.0 | SPARK-3795 | 8d59b37b02eb36f37bcefafb952519d7dca744ad#diff-364713d7776956cb8b0a771e9b62f82d |  
spark.dynamicAllocation.testing | 1.2.0 | SPARK-3795 | 8d59b37b02eb36f37bcefafb952519d7dca744ad#diff-364713d7776956cb8b0a771e9b62f82d |  
spark.dynamicAllocation.minExecutors | 1.2.0 | SPARK-3795 | 8d59b37b02eb36f37bcefafb952519d7dca744ad#diff-364713d7776956cb8b0a771e9b62f82d |  
spark.dynamicAllocation.initialExecutors | 1.3.0 | SPARK-4585 | b2047b55c5fc85de6b63276d8ab9610d2496e08b#diff-b096353602813e47074ace09a3890d56 |  
spark.dynamicAllocation.maxExecutors | 1.2.0 | SPARK-3795 | 8d59b37b02eb36f37bcefafb952519d7dca744ad#diff-364713d7776956cb8b0a771e9b62f82d |  
spark.dynamicAllocation.executorAllocationRatio | 2.4.0 | SPARK-22683 | 55c4ca88a3b093ee197a8689631be8d1fac1f10f#diff-6bdad48cfc34314e89599655442ff210 |  
spark.dynamicAllocation.cachedExecutorIdleTimeout | 1.4.0 | SPARK-7955 | 6faaf15ba311bc3a79aae40a6c9c4befabb6889f#diff-b096353602813e47074ace09a3890d56 |  
spark.dynamicAllocation.executorIdleTimeout | 1.2.0 | SPARK-3795 | 8d59b37b02eb36f37bcefafb952519d7dca744ad#diff-364713d7776956cb8b0a771e9b62f82d |  
spark.dynamicAllocation.shuffleTracking.enabled | 3.0.0 | SPARK-27963 | 2ddeff97d7329942a98ef363991eeabc3fa71a76#diff-6bdad48cfc34314e89599655442ff210 |  
spark.dynamicAllocation.shuffleTimeout | 3.0.0 | SPARK-27963 | 2ddeff97d7329942a98ef363991eeabc3fa71a76#diff-6bdad48cfc34314e89599655442ff210 |  
spark.dynamicAllocation.schedulerBacklogTimeout | 1.2.0 | SPARK-3795 | 8d59b37b02eb36f37bcefafb952519d7dca744ad#diff-364713d7776956cb8b0a771e9b62f82d |  
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout | 1.2.0 | SPARK-3795 | 8d59b37b02eb36f37bcefafb952519d7dca744ad#diff-364713d7776956cb8b0a771e9b62f82d |  
spark.locality.wait | 0.5.0 | None | e5c4cd8a5e188592f8786a265c0cd073c69ac886#diff-391214d132a0fb4478f4f9c2313d8966 |  
spark.shuffle.service.enabled | 1.2.0 | SPARK-3796 | f55218aeb1e9d638df6229b36a59a15ce5363482#diff-2b643ea78c1add0381754b1f47eec132 |  
Constants.SHUFFLE_SERVICE_FETCH_RDD_ENABLED | 3.0.0 | SPARK-27677 | e9f3f62b2c0f521f3cc23fef381fc6754853ad4f#diff-6bdad48cfc34314e89599655442ff210 | spark.shuffle.service.fetch.rdd.enabled
spark.shuffle.service.db.enabled | 3.0.0 | SPARK-26288 | 8b0aa59218c209d39cbba5959302d8668b885cf6#diff-6bdad48cfc34314e89599655442ff210 |  
spark.shuffle.service.port | 1.2.0 | SPARK-3796 | f55218aeb1e9d638df6229b36a59a15ce5363482#diff-2b643ea78c1add0381754b1f47eec132 |  
spark.kerberos.keytab | 3.0.0 | SPARK-25372 | 51540c2fa677658be954c820bc18ba748e4c8583#diff-6bdad48cfc34314e89599655442ff210 |
spark.kerberos.principal | 3.0.0 | SPARK-25372 | 51540c2fa677658be954c820bc18ba748e4c8583#diff-6bdad48cfc34314e89599655442ff210 |
spark.kerberos.relogin.period | 3.0.0 | SPARK-23781 | 68dde3481ea458b0b8deeec2f99233c2d4c1e056#diff-6bdad48cfc34314e89599655442ff210 |
spark.kerberos.renewal.credentials | 3.0.0 | SPARK-26595 | 2a67dbfbd341af166b1c85904875f26a6dea5ba8#diff-6bdad48cfc34314e89599655442ff210 |  
spark.kerberos.access.hadoopFileSystems | 3.0.0 | SPARK-26766 | d0443a74d185ec72b747fa39994fa9a40ce974cf#diff-6bdad48cfc34314e89599655442ff210 |  
spark.executor.instances | 1.0.0 | SPARK-1126 | 1617816090e7b20124a512a43860a21232ebf511#diff-4d2ab44195558d5a9d5f15b8803ef39d |  
spark.yarn.dist.pyFiles | 2.2.1 | SPARK-21714 | d10c9dc3f631a26dbbbd8f5c601ca2001a5d7c80#diff-6bdad48cfc34314e89599655442ff210 |  
spark.task.maxDirectResultSize | 2.0.0 | SPARK-13830 | 2ef4c5963bff3574fe17e669d703b25ddd064e5d#diff-5a0de266c82b95adb47d9bca714e1f1b |  
spark.task.maxFailures | 0.8.0 | None | 46eecd110a4017ea0c86cbb1010d0ccd6a5eb2ef#diff-264da78fe625d594eae59d1adabc8ae9 |  
spark.task.reaper.enabled | 2.0.3 | SPARK-18761 | 678d91c1d2283d9965a39656af9d383bad093ba8#diff-5a0de266c82b95adb47d9bca714e1f1b |
spark.task.reaper.killTimeout | 2.0.3 | SPARK-18761 | 678d91c1d2283d9965a39656af9d383bad093ba8#diff-5a0de266c82b95adb47d9bca714e1f1b |
spark.task.reaper.pollingInterval | 2.0.3 | SPARK-18761 | 678d91c1d2283d9965a39656af9d383bad093ba8#diff-5a0de266c82b95adb47d9bca714e1f1b |
spark.task.reaper.threadDump | 2.0.3 | SPARK-18761 | 678d91c1d2283d9965a39656af9d383bad093ba8#diff-5a0de266c82b95adb47d9bca714e1f1b |
spark.blacklist.enabled | 2.1.0 | SPARK-17675 | 9ce7d3e542e786c62f047c13f3001e178f76e06a#diff-6bdad48cfc34314e89599655442ff210 |  
spark.blacklist.task.maxTaskAttemptsPerExecutor | 2.1.0 | SPARK-17675 | 9ce7d3e542e786c62f047c13f3001e178f76e06a#diff-6bdad48cfc34314e89599655442ff210 |  
spark.blacklist.task.maxTaskAttemptsPerNode | 2.1.0 | SPARK-17675 | 9ce7d3e542e786c62f047c13f3001e178f76e06a#diff-6bdad48cfc34314e89599655442ff210 |  
spark.blacklist.application.maxFailedTasksPerExecutor | 2.2.0 | SPARK-8425 | 93cdb8a7d0f124b4db069fd8242207c82e263c52#diff-6bdad48cfc34314e89599655442ff210 |  
spark.blacklist.stage.maxFailedTasksPerExecutor | 2.1.0 | SPARK-17675 | 9ce7d3e542e786c62f047c13f3001e178f76e06a#diff-6bdad48cfc34314e89599655442ff210 |  
spark.blacklist.application.maxFailedExecutorsPerNode | 2.2.0 | SPARK-8425 | 93cdb8a7d0f124b4db069fd8242207c82e263c52#diff-6bdad48cfc34314e89599655442ff210 |  
spark.blacklist.stage.maxFailedExecutorsPerNode | 2.1.0 | SPARK-17675 | 9ce7d3e542e786c62f047c13f3001e178f76e06a#diff-6bdad48cfc34314e89599655442ff210 |  
spark.blacklist.timeout | 2.1.0 | SPARK-17675 | 9ce7d3e542e786c62f047c13f3001e178f76e06a#diff-6bdad48cfc34314e89599655442ff210 |  
spark.blacklist.killBlacklistedExecutors | 2.2.0 | SPARK-16554 | 6287c94f08200d548df5cc0a401b73b84f9968c4#diff-6bdad48cfc34314e89599655442ff210 |  
spark.scheduler.executorTaskBlacklistTime | 1.0.0 | None | ab747d39ddc7c8a314ed2fb26548fc5652af0d74#diff-bad3987c83bd22d46416d3dd9d208e76 |
spark.blacklist.application.fetchFailure.enabled | 2.3.0 | SPARK-13669 and SPARK-20898 | 9e50a1d37a4cf0c34e20a7c1a910ceaff41535a2#diff-6bdad48cfc34314e89599655442ff210 |  
spark.files.fetchFailure.unRegisterOutputOnHost | 2.3.0 | SPARK-19753 | dccc0aa3cf957c8eceac598ac81ac82f03b52105#diff-6bdad48cfc34314e89599655442ff210 |  
spark.scheduler.listenerbus.eventqueue.capacity | 2.3.0 | SPARK-20887 | 629f38e171409da614fd635bd8dd951b7fde17a4#diff-6bdad48cfc34314e89599655442ff210 |  
spark.scheduler.listenerbus.metrics.maxListenerClassesTimed | 2.3.0 | SPARK-20863 | 2a23cdd078a7409d0bb92cf27718995766c41b1d#diff-6bdad48cfc34314e89599655442ff210 |  
spark.scheduler.listenerbus.logSlowEvent | 3.0.0 | SPARK-30812 | 68d7edf9497bea2f73707d32ab55dd8e53088e7c#diff-6bdad48cfc34314e89599655442ff210 |  
spark.scheduler.listenerbus.logSlowEvent.threshold | 3.0.0 | SPARK-29001 | 0346afa8fc348aa1b3f5110df747a64e3b2da388#diff-6bdad48cfc34314e89599655442ff210 |  

### Why are the changes needed?
To supplement the configuration with version information.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing UT

Closes #27852 from beliefer/add-version-to-core-config-part-two.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-12 09:52:20 +09:00
Wenchen Fan 0f0ccdadb1
[SPARK-31110][DOCS][SQL] refine sql doc for SELECT
### What changes were proposed in this pull request?

A few improvements to the sql ref SELECT doc:
1. correct the syntax of SELECT query
2. correct the default of null sort order
3. correct the GROUP BY syntax
4. several minor fixes

### Why are the changes needed?

refine document

### Does this PR introduce any user-facing change?

N/A

### How was this patch tested?

N/A

Closes #27866 from cloud-fan/doc.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-03-11 16:52:40 -07:00
Wenchen Fan 8efb71013d
[SPARK-31091] Revert SPARK-24640 Return NULL from size(NULL) by default
### What changes were proposed in this pull request?

This PR reverts https://github.com/apache/spark/pull/26051 and https://github.com/apache/spark/pull/26066

### Why are the changes needed?

There is no standard requiring that `size(null)` must return null, and returning -1 looks reasonable as well. This is largely a cosmetic change, and we should avoid it if it breaks existing queries. This is similar to reverting the TRIM function parameter order change.

### Does this PR introduce any user-facing change?

Yes, this changes the behavior of `size(null)` back to be the same as in 2.4.
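
To make the two behaviors concrete, here is a small plain-Python model of the semantics discussed above. It is only an illustration and not Spark code: `size_of` and its `legacy_size_of_null` flag are hypothetical stand-ins, with `None` playing the role of SQL NULL.

```python
def size_of(collection, legacy_size_of_null=True):
    """Model of size(): length of a collection, with a NULL special case.

    legacy_size_of_null=True models the 2.4 behavior restored by this
    revert (NULL input gives -1); False models the reverted behavior
    (NULL input gives NULL). Both names are assumptions for illustration.
    """
    if collection is None:
        return -1 if legacy_size_of_null else None
    return len(collection)


print(size_of([1, 2, 3]))                        # 3
print(size_of(None))                             # -1
print(size_of(None, legacy_size_of_null=False))  # None
```

A query that filters on `size(col) < 1` treats NULL inputs differently under the two behaviors, which is the kind of breakage the revert avoids.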

### How was this patch tested?

N/A

Closes #27834 from cloud-fan/revert.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-03-11 09:55:24 -07:00
Yuanjian Li 3493162c78 [SPARK-31030][SQL] Backward Compatibility for Parsing and formatting Datetime
### What changes were proposed in this pull request?
In Spark 2.4 and earlier, datetime parsing, formatting, and conversion are performed using the hybrid calendar (Julian + Gregorian).
Since the Proleptic Gregorian calendar is the de facto calendar worldwide, as well as the one chosen by the ANSI SQL standard, Spark 3.0 switches to it by using the Java 8 API classes (the java.time packages, which are based on ISO chronology). The switch was completed in SPARK-26651.
After the switch, however, some patterns are incompatible between Java 8 and Java 7, so Spark needs its own definition of the patterns rather than depending on the Java API.
In this PR, we achieve this by documenting the patterns and shadowing the incompatible letters. See more details in [SPARK-31030](https://issues.apache.org/jira/browse/SPARK-31030).
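
As background on why the calendar switch matters, the sketch below re-labels a Julian-calendar date in the proleptic Gregorian calendar through the Julian Day Number. It is an illustration only and not Spark's implementation. It relies on Python's `datetime.date`, which uses the proleptic Gregorian calendar, and assumes the hybrid calendar uses Julian dates before 1582-10-15. The function names are hypothetical.

```python
from datetime import date

# Python's date ordinal 1 (0001-01-01, proleptic Gregorian) corresponds to
# Julian Day Number 1721426, so ordinal = JDN - 1721425.
JDN_TO_ORDINAL_OFFSET = 1721425


def julian_to_jdn(year, month, day):
    """Julian Day Number of a date given in the Julian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def julian_to_proleptic_gregorian(year, month, day):
    """Re-label a Julian-calendar date in the proleptic Gregorian calendar."""
    jdn = julian_to_jdn(year, month, day)
    return date.fromordinal(jdn - JDN_TO_ORDINAL_OFFSET)


# The day after Julian 1582-10-04 is Gregorian 1582-10-15.
print(julian_to_proleptic_gregorian(1582, 10, 5))  # 1582-10-15
# Around the year 1000 the two calendars differ by five days.
print(julian_to_proleptic_gregorian(1000, 1, 1))   # 1000-01-06
```

The same physical day therefore carries different labels in the two calendars for old dates, so the same date string can refer to different days depending on the calendar in use.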

### Why are the changes needed?
For backward compatibility.

### Does this PR introduce any user-facing change?
No.
After we define our own datetime parsing and formatting patterns, the behavior is the same as in older Spark versions.

### How was this patch tested?
Existing and newly added UT.
Local documentation test:
![image](https://user-images.githubusercontent.com/4833765/76064100-f6acc280-5fc3-11ea-9ef7-82e7dc074205.png)

Closes #27830 from xuanyuanking/SPARK-31030.

Authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-03-11 14:11:13 +08:00
Qianyang Yu 0f54dc7c03 [SPARK-30962][SQL][DOC] Documentation for Alter table command phase 2
### What changes were proposed in this pull request?

### Why are the changes needed?

Based on [JIRA 30962](https://issues.apache.org/jira/browse/SPARK-30962), we want to add all the supported `Alter Table` syntax for V1 tables.

### Does this PR introduce any user-facing change?

Yes

### How was this patch tested?

Before:
The documentation looks like
 [Alter Table](https://github.com/apache/spark/pull/25590)

After:
<img width="850" alt="Screen Shot 2020-03-03 at 2 02 23 PM" src="https://user-images.githubusercontent.com/7550280/75824837-168c7e00-5d59-11ea-9751-d1dab0f5a892.png">
<img width="977" alt="Screen Shot 2020-03-03 at 2 02 41 PM" src="https://user-images.githubusercontent.com/7550280/75824859-21dfa980-5d59-11ea-8b49-3adf6eb55fc6.png">
<img width="1028" alt="Screen Shot 2020-03-03 at 2 02 59 PM" src="https://user-images.githubusercontent.com/7550280/75824884-2e640200-5d59-11ea-81ef-d77d0a8efee2.png">
<img width="864" alt="Screen Shot 2020-03-03 at 2 03 14 PM" src="https://user-images.githubusercontent.com/7550280/75824910-39b72d80-5d59-11ea-84d0-bffa2499f086.png">
<img width="823" alt="Screen Shot 2020-03-03 at 2 03 28 PM" src="https://user-images.githubusercontent.com/7550280/75824937-45a2ef80-5d59-11ea-932c-314924856834.png">
<img width="811" alt="Screen Shot 2020-03-03 at 2 03 42 PM" src="https://user-images.githubusercontent.com/7550280/75824965-4cc9fd80-5d59-11ea-815b-8c1ebad310b1.png">
<img width="827" alt="Screen Shot 2020-03-03 at 2 03 53 PM" src="https://user-images.githubusercontent.com/7550280/75824978-518eb180-5d59-11ea-8a55-2fa26376b9c1.png">

<img width="783" alt="Screen Shot 2020-03-03 at 2 04 03 PM" src="https://user-images.githubusercontent.com/7550280/75825001-5bb0b000-5d59-11ea-8dd9-dcfbfa1b4330.png">

Notes:
The following syntaxes are not supported by V1 tables:

- `ALTER TABLE ... RENAME COLUMN`
- `ALTER TABLE ... DROP (COLUMN | COLUMNS)`
- `ALTER TABLE ... (ALTER | CHANGE) COLUMN? alterColumnAction` only supports changing comments, not other actions: `datatype, position, (SET | DROP) NOT NULL`
- `ALTER TABLE ... CHANGE COLUMN?`
- `ALTER TABLE ... REPLACE COLUMNS`
- `ALTER TABLE ... RECOVER PARTITIONS`

Closes #27779 from kevinyu98/spark-30962-alterT.

Authored-by: Qianyang Yu <qyu@us.ibm.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-03-11 08:47:30 +09:00
beliefer bc490f383d [SPARK-31002][CORE][DOC] Add version information to the configuration of Core
### What changes were proposed in this pull request?
Add version information to the configuration of `Core`.
Note: Because `Core` has a lot of configuration items, I split the items into four PRs. Other PRs will follow this one.

I sorted out some information, shown below.

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.resources.discoveryPlugin | 3.0.0 | SPARK-30689 | 742e35f1d48c2523dda2ce21d73b7ab5ade20582#diff-6bdad48cfc34314e89599655442ff210 |  
spark.driver.resourcesFile | 3.0.0 | SPARK-27835 | 6748b486a9afe8370786efb64a8c9f3470c62dcf#diff-6bdad48cfc34314e89599655442ff210 |  
SparkLauncher.DRIVER_EXTRA_CLASSPATH | 1.0.0 | None | 29ee101c73bf066bf7f4f8141c475b8d1bd3cf1c#diff-4d2ab44195558d5a9d5f15b8803ef39d | spark.driver.extraClassPath
SparkLauncher.DRIVER_EXTRA_JAVA_OPTIONS | 1.0.0 | None | 29ee101c73bf066bf7f4f8141c475b8d1bd3cf1c#diff-4d2ab44195558d5a9d5f15b8803ef39d | spark.driver.extraJavaOptions
SparkLauncher.DRIVER_EXTRA_LIBRARY_PATH | 1.0.0 | None | 29ee101c73bf066bf7f4f8141c475b8d1bd3cf1c#diff-4d2ab44195558d5a9d5f15b8803ef39d | spark.driver.extraLibraryPath
spark.driver.userClassPathFirst | 1.3.0 | SPARK-2996 | 6a1e0f967286945db13d94aeb6ed19f0a347c236#diff-4d2ab44195558d5a9d5f15b8803ef39d |  
spark.driver.cores | 1.3.0 | SPARK-1507 | 2be82b1e66cd188456bbf1e5abb13af04d1629d5#diff-4d2ab44195558d5a9d5f15b8803ef39d |  
SparkLauncher.DRIVER_MEMORY | 1.1.1 | SPARK-3243 | c1ffa3e4cdfbd1f84b5c8d8de5d0fb958a19e211#diff-4d2ab44195558d5a9d5f15b8803ef39d | spark.driver.memory
spark.driver.memoryOverhead | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6bdad48cfc34314e89599655442ff210 |  
spark.driver.log.dfsDir | 3.0.0 | SPARK-25118 | 5f11e8c4cb9a5db037ac239b8fcc97f3a746e772#diff-6bdad48cfc34314e89599655442ff210 |  
spark.driver.log.layout | 3.0.0 | SPARK-25118 | 5f11e8c4cb9a5db037ac239b8fcc97f3a746e772#diff-6bdad48cfc34314e89599655442ff210 |  
spark.driver.log.persistToDfs.enabled | 3.0.0 | SPARK-25118 | 5f11e8c4cb9a5db037ac239b8fcc97f3a746e772#diff-6bdad48cfc34314e89599655442ff210 |  
spark.driver.log.allowErasureCoding | 3.0.0 | SPARK-29105 | 276aaaae8d404975f8701089e9f4dfecd16e0d9f#diff-6bdad48cfc34314e89599655442ff210 |  
spark.eventLog.enabled | 1.0.0 | SPARK-1132 | 79d07d66040f206708e14de393ab0b80020ed96a#diff-364713d7776956cb8b0a771e9b62f82d |  
spark.eventLog.dir | 1.0.0 | SPARK-1132 | 79d07d66040f206708e14de393ab0b80020ed96a#diff-364713d7776956cb8b0a771e9b62f82d |  
spark.eventLog.compress | 1.0.0 | SPARK-1132 | 79d07d66040f206708e14de393ab0b80020ed96a#diff-364713d7776956cb8b0a771e9b62f82d |  
spark.eventLog.logBlockUpdates.enabled | 2.3.0 | SPARK-22050 | 1437e344ec0c29a44a19f4513986f5f184c44695#diff-6bdad48cfc34314e89599655442ff210 |  
spark.eventLog.erasureCoding.enabled | 3.0.0 | SPARK-25855 | 35506dced739ef16136e9f3d5d48c638899d3cec#diff-6bdad48cfc34314e89599655442ff210 |  
spark.eventLog.testing | 1.0.1 | None | d4c8af87994acf3707027e6fab25363f51fd4615#diff-e4a5a68c15eed95d038acfed84b0b66a |  
spark.eventLog.buffer.kb | 1.0.0 | SPARK-1132 | 79d07d66040f206708e14de393ab0b80020ed96a#diff-364713d7776956cb8b0a771e9b62f82d |  
spark.eventLog.logStageExecutorMetrics | 3.0.0 | SPARK-30812 | 68d7edf9497bea2f73707d32ab55dd8e53088e7c#diff-6bdad48cfc34314e89599655442ff210 |  
spark.eventLog.gcMetrics.youngGenerationGarbageCollectors | 3.0.0 | SPARK-25865 | e5c502c596563dce8eb58f86e42c1aea2c51ed17#diff-6bdad48cfc34314e89599655442ff210 |  
spark.eventLog.gcMetrics.oldGenerationGarbageCollectors | 3.0.0 | SPARK-25865 | e5c502c596563dce8eb58f86e42c1aea2c51ed17#diff-6bdad48cfc34314e89599655442ff210 |  
spark.eventLog.overwrite | 1.0.0 | SPARK-1132 | 79d07d66040f206708e14de393ab0b80020ed96a#diff-364713d7776956cb8b0a771e9b62f82d |  
spark.eventLog.longForm.enabled | 2.4.0 | SPARK-23820 | 71f70130f1b2b4ec70595627f0a02a88e2c0e27d#diff-6bdad48cfc34314e89599655442ff210 |  
spark.eventLog.rolling.enabled | 3.0.0 | SPARK-28869 | 100fc58da54e026cda87832a10e2d06eaeccdf87#diff-6bdad48cfc34314e89599655442ff210 |  
spark.eventLog.rolling.maxFileSize | 3.0.0 | SPARK-28869 | 100fc58da54e026cda87832a10e2d06eaeccdf87#diff-6bdad48cfc34314e89599655442ff210 |  
spark.executor.id | 1.2.0 | SPARK-3377 | 79e45c9323455a51f25ed9acd0edd8682b4bbb88#diff-364713d7776956cb8b0a771e9b62f82d |  
SparkLauncher.EXECUTOR_EXTRA_CLASSPATH | 1.0.0 | None | 29ee101c73bf066bf7f4f8141c475b8d1bd3cf1c#diff-4d2ab44195558d5a9d5f15b8803ef39d | spark.executor.extraClassPath
spark.executor.heartbeat.dropZeroAccumulatorUpdates | 3.0.0 | SPARK-25449 | 9362c5cc273fdd09f9b3b512e2f6b64bcefc25ab#diff-6bdad48cfc34314e89599655442ff210 |  
spark.executor.heartbeatInterval | 1.1.0 | SPARK-2099 | 8d338f64c4eda45d22ae33f61ef7928011cc2846#diff-5a0de266c82b95adb47d9bca714e1f1b |  
spark.executor.heartbeat.maxFailures | 1.6.2 | SPARK-13522 | 86bf93e65481b8fe5d7532ca6d4cd29cafc9e9dd#diff-5a0de266c82b95adb47d9bca714e1f1b |  
spark.executor.processTreeMetrics.enabled | 3.0.0 | SPARK-27324 | 387ce89a0631f1a4c6668b90ff2a7bbcf11919cd#diff-6bdad48cfc34314e89599655442ff210 |  
spark.executor.metrics.pollingInterval | 3.0.0 | SPARK-26329 | 80ab19b9fd268adfc419457f12b99a5da7b6d1c7#diff-6bdad48cfc34314e89599655442ff210 |  
SparkLauncher.EXECUTOR_EXTRA_JAVA_OPTIONS | 1.0.0 | None | 29ee101c73bf066bf7f4f8141c475b8d1bd3cf1c#diff-4d2ab44195558d5a9d5f15b8803ef39d | spark.executor.extraJavaOptions
SparkLauncher.EXECUTOR_EXTRA_LIBRARY_PATH | 1.0.0 | None | 29ee101c73bf066bf7f4f8141c475b8d1bd3cf1c#diff-4d2ab44195558d5a9d5f15b8803ef39d | spark.executor.extraLibraryPath
spark.executor.userClassPathFirst | 1.3.0 | SPARK-2996 | 6a1e0f967286945db13d94aeb6ed19f0a347c236#diff-529fc5c06b9731c1fbda6f3db60b16aa |  
SparkLauncher.EXECUTOR_CORES | 1.0.0 | SPARK-1126 | 1617816090e7b20124a512a43860a21232ebf511#diff-4d2ab44195558d5a9d5f15b8803ef39d | spark.executor.cores
SparkLauncher.EXECUTOR_MEMORY | 0.7.0 | None | 696eec32c982ca516c506de33f383a173bcbd131#diff-4f50ad37deb6742ad45472636c9a870b | spark.executor.memory
spark.executor.memoryOverhead | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6bdad48cfc34314e89599655442ff210 |  
spark.cores.max | 0.6.0 | None | 0a472840030e4e7e84fe748f7bfa49f1ece599c5#diff-b6cc54c092b861f645c3cd69ea0f91e2 |  
spark.memory.offHeap.enabled | 1.6.0 | SPARK-12251 | 9870e5c7af87190167ca3845ede918671b9420ca#diff-529fc5c06b9731c1fbda6f3db60b16aa |  
spark.memory.offHeap.size | 1.6.0 | SPARK-12251 | 9870e5c7af87190167ca3845ede918671b9420ca#diff-529fc5c06b9731c1fbda6f3db60b16aa |  
spark.memory.storageFraction | 1.6.0 | SPARK-10983 | b3ffac5178795f2d8e7908b3e77e8e89f50b5f6f#diff-529fc5c06b9731c1fbda6f3db60b16aa |  
spark.memory.fraction | 1.6.0 | SPARK-10983 | b3ffac5178795f2d8e7908b3e77e8e89f50b5f6f#diff-529fc5c06b9731c1fbda6f3db60b16aa |  
spark.storage.safetyFraction | 1.1.0 | SPARK-1777 | ecf30ee7e78ea59c462c54db0fde5328f997466c#diff-2b643ea78c1add0381754b1f47eec132 |  
spark.storage.unrollMemoryThreshold | 1.1.0 | SPARK-1777 | ecf30ee7e78ea59c462c54db0fde5328f997466c#diff-692a329b5a7fb4134c55d559457b94e4 |
spark.storage.replication.proactive | 2.2.0 | SPARK-15355 | fa7c582e9442b985a0493fb1dd15b3fb9b6031b4#diff-186864190089a718680accb51de5f0d4 |  
spark.storage.memoryMapThreshold | 0.9.2 | SPARK-1145 | 76339495153dd895667ad609815c887b2c8960ea#diff-abd96f2ae793cd6ea6aab5b96a3c1d7a |
spark.storage.replication.policy | 2.1.0 | SPARK-15353 | a26afd52198523dbd51dc94053424494638c7de5#diff-2b643ea78c1add0381754b1f47eec132 |  
spark.storage.replication.topologyMapper | 2.1.0 | SPARK-15353 | a26afd52198523dbd51dc94053424494638c7de5#diff-186864190089a718680accb51de5f0d4 |
spark.storage.cachedPeersTtl | 1.1.1 | SPARK-3495 and SPARK-3496 | be0cc9952d6c8b4cfe9ff10a761e0677cba64489#diff-2b643ea78c1add0381754b1f47eec132 |  
spark.storage.maxReplicationFailures | 1.1.1 | SPARK-3495 and SPARK-3496 | be0cc9952d6c8b4cfe9ff10a761e0677cba64489#diff-2b643ea78c1add0381754b1f47eec132 |  
spark.storage.replication.topologyFile | 2.1.0 | SPARK-15353 | a26afd52198523dbd51dc94053424494638c7de5#diff-e550ce522c12a31d805a7d0f41e802af |  
spark.storage.exceptionOnPinLeak | 1.6.2 | SPARK-13566 | ab006523b840b1d2dbf3f5ff0a238558e7665a1e#diff-5a0de266c82b95adb47d9bca714e1f1b |  
spark.storage.blockManagerTimeoutIntervalMs | 0.7.3 | None | 9085ebf3750c7d9bb7c6b5f6b4bdc5b807af93c2#diff-76170a9c8f67b542bc58240a0a12fe08 |  
spark.storage.blockManagerSlaveTimeoutMs | 0.7.0 | None | 97434f49b8c029e9b78c91ec5f58557cd1b5c943#diff-2ce6374aac24d70c69182b067216e684 |
spark.storage.cleanupFilesAfterExecutorExit | 2.4.0 | SPARK-24340 | 8ef167a5f9ba8a79bb7ca98a9844fe9cfcfea060#diff-916ca56b663f178f302c265b7ef38499 |  
spark.diskStore.subDirectories | 0.6.0 | None | 815d6bd69a0c1ba0e94fc0785f5c3619b37f19c5#diff-e8b73c5b81c403a5e5d581f97624c510 |  
spark.block.failures.beforeLocationRefresh | 2.0.0 | SPARK-13328 | ff776b2fc1cd4c571fd542dbf807e6fa3373cb34#diff-2b643ea78c1add0381754b1f47eec132 |  

### Why are the changes needed?
Supplemental configuration version information.
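
As a rough illustration of why a per-item "since version" is useful, the sketch below models a few rows of the table above and asks which keys already existed in a given release. The `ConfigEntry` case class and the `availableIn` helper are hypothetical names made up for this example; they are not Spark's internal configuration API.

```scala
// Hypothetical, simplified model of a config entry (not Spark's internal API).
final case class ConfigEntry(key: String, sinceVersion: String, jira: Option[String])

object CoreConfigVersions {
  // A few rows copied from the table in this PR description.
  val entries: Seq[ConfigEntry] = Seq(
    ConfigEntry("spark.driver.cores", "1.3.0", Some("SPARK-1507")),
    ConfigEntry("spark.eventLog.enabled", "1.0.0", Some("SPARK-1132")),
    ConfigEntry("spark.cores.max", "0.6.0", None),
    ConfigEntry("spark.driver.log.dfsDir", "3.0.0", Some("SPARK-25118"))
  )

  // Parse "major.minor.patch" into a tuple that can be compared.
  def parse(version: String): (Int, Int, Int) = {
    val Array(major, minor, patch) = version.split('.').map(_.toInt)
    (major, minor, patch)
  }

  // Keys whose since-version is not newer than the given release.
  def availableIn(release: String): Seq[String] = {
    val target = parse(release)
    val ord = Ordering[(Int, Int, Int)]
    entries.filter(e => ord.lteq(parse(e.sinceVersion), target)).map(_.key)
  }
}
```

For example, `CoreConfigVersions.availableIn("1.3.0")` keeps the three older keys and drops `spark.driver.log.dfsDir`, which the table lists as added in 3.0.0.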

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing UTs.

Closes #27847 from beliefer/add-version-to-core-config-part-one.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-08 12:31:57 +09:00
Huaxin Gao 513f76ac38 [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
### What changes were proposed in this pull request?
Update ml-guide and ml-migration-guide for 3.0.

### Why are the changes needed?
This is required for each release.

### Does this PR introduce any user-facing change?
Yes.
![image](https://user-images.githubusercontent.com/13592258/75957386-c8699e80-5e6e-11ea-9dec-7295f8f0bf33.png)

![image](https://user-images.githubusercontent.com/13592258/75957406-cef81600-5e6e-11ea-921f-20509771b49b.png)

![image](https://user-images.githubusercontent.com/13592258/75957423-d4edf700-5e6e-11ea-8e75-d41c532c8ba9.png)

![image](https://user-images.githubusercontent.com/13592258/75957434-da4b4180-5e6e-11ea-899b-f4e080b318ff.png)

### How was this patch tested?
Manually build and check.

Closes #27785 from huaxingao/spark-30934.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-03-07 18:09:00 -06:00
Nicholas Chammas 7892f88f84 [SPARK-30879][DOCS] Refine workflow for building docs
### What changes were proposed in this pull request?

This PR makes the following refinements to the workflow for building docs:
* Install Python and Ruby consistently using pyenv and rbenv across both the docs README and the release Dockerfile.
* Pin the Python and Ruby versions we use.
* Pin all direct Python and Ruby dependency versions.
* Eliminate any use of `sudo pip`, which the Python community discourages, or `sudo gem`.

### Why are the changes needed?

This PR should increase the consistency and reproducibility of the doc-building process by managing Python and Ruby in a more consistent way, and by eliminating unused or outdated code.

Here's a possible example of an issue building the docs that would be addressed by the changes in this PR: https://github.com/apache/spark/pull/27459#discussion_r376135719

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Manual tests:
* I was able to build the Docker image successfully, minus the final part about `RUN useradd`.
    * I am unable to run `do-release-docker.sh` because I am not a committer and don't have the required GPG key.
* I built the docs locally and viewed them in the browser.

I think I need a committer to more fully test out these changes.

Closes #27534 from nchammas/SPARK-30731-building-docs.

Authored-by: Nicholas Chammas <nicholas.chammas@liveramp.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-03-07 11:43:32 -06:00
Huaxin Gao 4a64901ab7 [SPARK-31012][ML][PYSPARK][DOCS] Updating ML API docs for 3.0 changes
### What changes were proposed in this pull request?
Updating ML docs for 3.0 changes

### Why are the changes needed?
While auditing the 3.0 ML changes, I found that some docs are missing or outdated. These need to be updated.

### Does this PR introduce any user-facing change?
Yes, doc changes

### How was this patch tested?
Manually build and check

Closes #27762 from huaxingao/spark-doc.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-03-07 11:42:05 -06:00
Takeshi Yamamuro 71c73d58f6 [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID
### What changes were proposed in this pull request?

This PR adds support for 32 or more grouping attributes in GROUPING_ID. On the current master, an integer overflow can occur when computing grouping IDs:
e75d9afb2f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala (L613)

For example, the query below generates wrong grouping IDs on master:
```

scala> val numCols = 32 // or, 31
scala> val cols = (0 until numCols).map { i => s"c$i" }
scala> sql(s"create table test_$numCols (${cols.map(c => s"$c int").mkString(",")}, v int) using parquet")
scala> val insertVals = (0 until numCols).map { _ => 1 }.mkString(",")
scala> sql(s"insert into test_$numCols values ($insertVals,3)")
scala> sql(s"select grouping_id(), sum(v) from test_$numCols group by grouping sets ((${cols.mkString(",")}), (${cols.init.mkString(",")}))").show(10, false)
scala> sql(s"drop table test_$numCols")

// numCols = 32
+-------------+------+
|grouping_id()|sum(v)|
+-------------+------+
|0            |3     |
|0            |3     | // Wrong Grouping ID
+-------------+------+

// numCols = 31
+-------------+------+
|grouping_id()|sum(v)|
+-------------+------+
|0            |3     |
|1            |3     |
+-------------+------+
```
To fix this issue, this PR changes the code to use long values for `GROUPING_ID` instead of int values.
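
The overflow can be reproduced outside Spark. The following is a simplified sketch, not the actual Spark source: it assumes the grouping ID is built from a bitmask of `(1 << n) - 1` over `n` grouping attributes, with one bit cleared per attribute present in the grouping set. The object and method names are hypothetical.

```scala
object GroupingIdOverflow {
  // 32-bit version: on the JVM, an Int shift uses only the low 5 bits of the
  // shift count, so (1 << 32) == 1 and the initial mask collapses to 0.
  def groupingIdInt(n: Int, present: Seq[Int]): Int =
    present.foldLeft((1 << n) - 1) { (acc, i) => acc & ~(1 << (n - 1 - i)) }

  // 64-bit version: the same computation with Long arithmetic.
  def groupingIdLong(n: Int, present: Seq[Int]): Long =
    present.foldLeft((1L << n) - 1L) { (acc, i) => acc & ~(1L << (n - 1 - i)) }

  def main(args: Array[String]): Unit = {
    // Grouping set that omits only the last of n columns; the expected ID is 1.
    println(groupingIdInt(31, 0 until 30))  // 1
    println(groupingIdInt(32, 0 until 31))  // 0 (wrong)
    println(groupingIdLong(32, 0 until 31)) // 1
  }
}
```

Under these assumptions, the Int version matches the outputs shown above for 31 and 32 columns, and the Long version yields the expected ID for 32 columns.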
### Why are the changes needed?

To support more cases in `GROUPING_ID`.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Added unit tests.

Closes #26918 from maropu/FixGroupingIdIssue.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-03-06 16:57:03 +09:00
beliefer e36227e2d9 [SPARK-30914][CORE][DOC] Add version information to the configuration of UI
### What changes were proposed in this pull request?
1. Add version information to the configuration of `UI`.
2. Update the docs of `UI`.

I sorted out some information, shown below.

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.ui.showConsoleProgress | 1.2.1 | SPARK-4017 | 04b1bdbae31c3039125100e703121daf7d9dabf5#diff-364713d7776956cb8b0a771e9b62f82d |  
spark.ui.consoleProgress.update.interval | 2.1.0 | SPARK-16919 | e076fb05ac83a3ed6995e29bb03ea07ea05e39db#diff-fbf4e388a66b6a37e984b91cd71a3e2c |  
spark.ui.enabled | 1.1.1 | SPARK-3490 | 937de93e80e6d299c4d08be426da2d5bc2d66f98#diff-364713d7776956cb8b0a771e9b62f82d |  
spark.ui.port | 0.7.0 | None | f03d9760fd8ac67fd0865cb355ba75d2eff507fe#diff-ed8dbcebe16fda5ecd6df1a981dc6fee |  
spark.ui.filters | 1.0.0 | SPARK-1189 | 7edbea41b43e0dc11a2de156be220db8b7952d01#diff-f79a5ead735b3d0b34b6b94486918e1c |  
spark.ui.allowFramingFrom | 1.6.0 | SPARK-10589 | 5dbaf3d3911bbfa003bc75459aaad66b4f6e0c67#diff-f79a5ead735b3d0b34b6b94486918e1c |  
spark.ui.reverseProxy | 2.1.0 | SPARK-15487 | 92ce8d4849a0341c4636e70821b7be57ad3055b1#diff-364713d7776956cb8b0a771e9b62f82d |
spark.ui.reverseProxyUrl | 2.1.0 | SPARK-15487 | 92ce8d4849a0341c4636e70821b7be57ad3055b1#diff-364713d7776956cb8b0a771e9b62f82d |
spark.ui.killEnabled | 1.0.0 | SPARK-1202 | 211f97447b5f078afcb1619a08d2e2349325f61a#diff-a40023c80383451b6e29ee7a6e0593e9 |
spark.ui.threadDumpsEnabled | 1.2.0 | SPARK-611 | 866c7bbe56f9c7fd96d3f4afe8a76405dc877a6e#diff-5d18fb70c572369a0fff0b97de94f265 |  
spark.ui.prometheus.enabled | 3.0.0 | SPARK-29064 | bbfaadb280a80b511a98d18881641c6d9851dd51#diff-f70174ad0759db1fb4cb36a7ff9324a7 |  
spark.ui.xXssProtection | 2.3.0 | SPARK-22188 | 5a07aca4d464e96d75ea17bf6768e24b829872ec#diff-6bdad48cfc34314e89599655442ff210 |  
spark.ui.xContentTypeOptions.enabled | 2.3.0 | SPARK-22188 | 5a07aca4d464e96d75ea17bf6768e24b829872ec#diff-6bdad48cfc34314e89599655442ff210 |  
spark.ui.strictTransportSecurity | 2.3.0 | SPARK-22188 | 5a07aca4d464e96d75ea17bf6768e24b829872ec#diff-6bdad48cfc34314e89599655442ff210 |  
spark.ui.requestHeaderSize | 2.2.3 | SPARK-26118 | 9ceee6f188e6c3794d31ce15cc61d29f907bebf7#diff-6bdad48cfc34314e89599655442ff210 |  
spark.ui.timeline.tasks.maximum | 1.4.0 | SPARK-7296 | a5f7b3b9c7f05598a1cc8e582e5facee1029cd5e#diff-fa4cfb2cce1b925f55f41f2dfa8c8501 |  
spark.acls.enable | 1.1.0 | SPARK-1890 and SPARK-1891 | e3fe6571decfdc406ec6d505fd92f9f2b85a618c#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 |  
spark.ui.view.acls | 1.0.0 | SPARK-1189 | 7edbea41b43e0dc11a2de156be220db8b7952d01#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 |  
spark.ui.view.acls.groups | 2.0.0 | SPARK-4224 | ae79032dcf160796851ca29116cca146c4d86ada#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 |  
spark.admin.acls | 1.1.0 | SPARK-1890 and SPARK-1891 | e3fe6571decfdc406ec6d505fd92f9f2b85a618c#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 |  
spark.admin.acls.groups | 2.0.0 | SPARK-4224 | ae79032dcf160796851ca29116cca146c4d86ada#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 |  
spark.modify.acls | 1.1.0 | SPARK-1890 and SPARK-1891 | e3fe6571decfdc406ec6d505fd92f9f2b85a618c#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 |  
spark.modify.acls.groups | 2.0.0 | SPARK-4224 | ae79032dcf160796851ca29116cca146c4d86ada#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 |  
spark.user.groups.mapping | 2.0.0 | SPARK-4224 | ae79032dcf160796851ca29116cca146c4d86ada#diff-afd88f677ec5ff8b5e96a5cbbe00cd98 |  
spark.ui.proxyRedirectUri | 3.0.0 | SPARK-30240 | a9fbd310300e57ed58818d7347f3c3172701c491#diff-f70174ad0759db1fb4cb36a7ff9324a7 |  
spark.ui.custom.executor.log.url | 3.0.0 | SPARK-26792 | d5bda2c9e8dde6afc075cc7f65b15fa9aa82231c#diff-f70174ad0759db1fb4cb36a7ff9324a7 |  

### Why are the changes needed?
Supplemental configuration version information.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing UTs.

Closes #27806 from beliefer/add-version-to-UI-config.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-06 11:08:57 +09:00
Takeshi Yamamuro ffec7a1964 [SQL][DOCS][MINOR] Fix typos and wrong phrases in docs
### What changes were proposed in this pull request?

This PR fixes typos and wrong phrases in the `/docs` directory. To find them, I ran the IntelliJ typo checker.

### Why are the changes needed?

For better documents.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

N/A

Closes #27819 from maropu/TypoFix-20200306.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
2020-03-05 16:54:59 -08:00
Wenchen Fan 807ea413b4 [SPARK-31019][SQL] make it clear that people can deduplicate map keys
### What changes were proposed in this pull request?

Rename the config and make it non-internal.

### Why are the changes needed?

Now we fail the query if duplicated map keys are detected, and provide a legacy config to deduplicate them. However, we must give users a way out of this situation, instead of just refusing to run the query. This exit strategy should always be there, while a legacy config indicates that it may be removed someday.
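
The two behaviors described here can be sketched in plain Scala. This is only an illustration of "fail on duplicates" versus "deduplicate, last value wins"; the policy names and the `buildMap` helper are hypothetical and do not reflect the actual config name or values in Spark.

```scala
object MapKeyDedup {
  // Hypothetical policy names, used only in this sketch.
  sealed trait Policy
  case object FailOnDuplicate extends Policy
  case object LastWins extends Policy

  // Build a map from key/value pairs, handling duplicate keys per the policy.
  def buildMap[K, V](pairs: Seq[(K, V)], policy: Policy): Map[K, V] =
    pairs.foldLeft(Map.empty[K, V]) { case (acc, (k, v)) =>
      if (acc.contains(k) && policy == FailOnDuplicate)
        throw new IllegalArgumentException(s"Duplicate map key: $k")
      acc.updated(k, v) // a later value overwrites an earlier one
    }
}
```

With `LastWins`, `buildMap(Seq(1 -> "a", 1 -> "b"), LastWins)` keeps only `1 -> "b"`; with `FailOnDuplicate`, the same input throws.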

### Does this PR introduce any user-facing change?

No, this only renames a config that was added in 3.0.

### How was this patch tested?

Added more tests for the failure behavior.

Closes #27772 from cloud-fan/map.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-05 20:43:52 +09:00
Kent Yao 3edab6cc1d [MINOR][CORE] Expose the alias -c flag of --conf for spark-submit
### What changes were proposed in this pull request?

`-c` is short for `--conf`. It was introduced in v1.1.0 but has been hidden from users until now.

### Why are the changes needed?

### Does this PR introduce any user-facing change?

no

expose hidden feature

### How was this patch tested?

Nah

Closes #27802 from yaooqinn/conf.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-03-04 20:37:51 -08:00
beliefer ebcff675e0 [SPARK-30889][SPARK-30913][CORE][DOC] Add version information to the configuration of Tests.scala and Worker
### What changes were proposed in this pull request?
1. Add version information to the configuration of `Tests` and `Worker`.
2. Update the docs of `Worker`.

I sorted out some information about `Tests`, shown below.

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.testing.memory | 1.6.0 | SPARK-10983 | b3ffac5178795f2d8e7908b3e77e8e89f50b5f6f#diff-395d07dcd46359cca610ce74357f0bb4 |  
spark.testing.dynamicAllocation.scheduleInterval | 2.3.0 | SPARK-22864 | 4e9e6aee44bb2ddb41b567d659358b22fd824222#diff-b096353602813e47074ace09a3890d56 |  
spark.testing | 1.0.1 | SPARK-1606 | ce57624b8232159fe3ec6db228afc622133df591#diff-d239aee594001f8391676e1047a0381e |  
spark.test.noStageRetry | 1.2.0 | SPARK-3796 | f55218aeb1e9d638df6229b36a59a15ce5363482#diff-6a9ff7fb74fd490a50462d45db2d5e11 |  
spark.testing.reservedMemory | 1.6.0 | SPARK-12081 | 84c44b500b5c90dffbe1a6b0aa86f01699b09b96#diff-395d07dcd46359cca610ce74357f0bb4 |
spark.testing.nHosts | 3.0.0 | SPARK-26491 | 1a641525e60039cc6b10816e946cb6f44b3e2696#diff-8b4ea8f3b0cc1e7ce7e943de1abbb165 |  
spark.testing.nExecutorsPerHost | 3.0.0 | SPARK-26491 | 1a641525e60039cc6b10816e946cb6f44b3e2696#diff-8b4ea8f3b0cc1e7ce7e943de1abbb165 |  
spark.testing.nCoresPerExecutor | 3.0.0 | SPARK-26491 | 1a641525e60039cc6b10816e946cb6f44b3e2696#diff-8b4ea8f3b0cc1e7ce7e943de1abbb165 |  
spark.resources.warnings.testing | 3.1.0 | SPARK-29148 | 496f6ac86001d284cbfb7488a63dd3a168919c0f#diff-8b4ea8f3b0cc1e7ce7e943de1abbb165 |  
spark.testing.resourceProfileManager | 3.1.0 | SPARK-29148 | 496f6ac86001d284cbfb7488a63dd3a168919c0f#diff-8b4ea8f3b0cc1e7ce7e943de1abbb165 |  

I sorted out some information about `Worker`, shown below.

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.worker.resourcesFile | 3.0.0 | SPARK-27369 | 7cbe01e8efc3f6cd3a0cac4bcfadea8fcc74a955#diff-b2fc8d6ab7ac5735085e2d6cfacb95da |  
spark.worker.timeout | 0.6.2 | None | e395aa295aeec6767df798bf1002b1f30983c1cd#diff-776a630ac2b2ec5fe85c07ca20a58fc0 |  
spark.worker.driverTerminateTimeout | 2.1.2 | SPARK-20843 | ebd72f453aa0b4f68760d28b3e93e6dd33856659#diff-829a8674171f92acd61007bedb1bfa4f |  
spark.worker.cleanup.enabled | 1.0.0 | SPARK-1154 | 1440154c27ca48b5a75103eccc9057286d3f6ca8#diff-916ca56b663f178f302c265b7ef38499 |  
spark.worker.cleanup.interval | 1.0.0 | SPARK-1154 | 1440154c27ca48b5a75103eccc9057286d3f6ca8#diff-916ca56b663f178f302c265b7ef38499 |  
spark.worker.cleanup.appDataTtl | 1.0.0 | SPARK-1154 | 1440154c27ca48b5a75103eccc9057286d3f6ca8#diff-916ca56b663f178f302c265b7ef38499 |  
spark.worker.preferConfiguredMasterAddress | 2.2.1 | SPARK-20529 | 75e5ea294c15ecfb7366ae15dce196aa92c87ca4#diff-916ca56b663f178f302c265b7ef38499 |  
spark.worker.ui.port | 1.1.0 | SPARK-2857 | 12f99cf5f88faf94d9dbfe85cb72d0010a3a25ac#diff-48ca297b6536cb92362bec1487581f05 |  
spark.worker.ui.retainedExecutors | 1.5.0 | SPARK-9202 | c0686668ae6a92b6bb4801a55c3b78aedbee816a#diff-916ca56b663f178f302c265b7ef38499 |
spark.worker.ui.retainedDrivers | 1.5.0 | SPARK-9202 | c0686668ae6a92b6bb4801a55c3b78aedbee816a#diff-916ca56b663f178f302c265b7ef38499 |
spark.worker.ui.compressedLogFileLengthCacheSize | 2.0.2 | SPARK-17711 | 26e978a93f029e1a1b5c7524d0b52c8141b70997#diff-d239aee594001f8391676e1047a0381e |  
spark.worker.decommission.enabled | 3.1.0 | SPARK-20628 | d273a2bb0fac452a97f5670edd69d3e452e3e57e#diff-b2fc8d6ab7ac5735085e2d6cfacb95da |  

### Why are the changes needed?
Supplemental configuration version information.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing UTs.

Closes #27783 from beliefer/add-version-to-tests-config.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-05 11:58:21 +09:00
Yuanjian Li f7f1948a8c [SPARK-30289][FOLLOWUP][DOC] Update the migration guide for spark.sql.legacy.ctePrecedencePolicy
### What changes were proposed in this pull request?
Fix the migration guide document for `spark.sql.legacy.ctePrecedence.enabled`, which is introduced in #27579.

### Why are the changes needed?
The config value changed.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Document only.

Closes #27782 from xuanyuanking/SPARK-30829-follow.

Authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-04 13:56:02 +09:00
roland-ondeviceresearch a4aaee01fa [MINOR][DOCS] ForeachBatch java example fix
### What changes were proposed in this pull request?
ForEachBatch Java example was incorrect

### Why are the changes needed?
Example did not compile

### Does this PR introduce any user-facing change?
Yes, to docs.

### How was this patch tested?
In IDE.

Closes #27740 from roland1982/foreachwriter_java_example_fix.

Authored-by: roland-ondeviceresearch <roland@ondeviceresearch.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-03-03 09:24:33 -06:00
yi.wu b517f991fe [SPARK-30969][CORE] Remove resource coordination support from Standalone
### What changes were proposed in this pull request?

Remove automatic resource coordination support from Standalone.

### Why are the changes needed?

Resource coordination is mainly designed for the scenario where multiple workers are launched on the same host. However, this scenario effectively no longer exists in today's Spark, because Spark can now start multiple executors in a single Worker, whereas it allowed only one executor per Worker at the very beginning. So launching multiple workers on the same host no longer helps users. Thus, it is not worth carrying a complicated implementation and a potentially high maintenance cost for such an unlikely scenario.

### Does this PR introduce any user-facing change?

No, it's a Spark 3.0 feature.

### How was this patch tested?

Pass Jenkins.

Closes #27722 from Ngone51/abandon_coordination.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>
2020-03-02 11:23:07 -08:00
beliefer c63366a693 [SPARK-30891][CORE][DOC] Add version information to the configuration of History
### What changes were proposed in this pull request?
1. Add version information to the configuration of `History`.
2. Update the docs of `History`.

I sorted out some information, shown below.

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.history.fs.logDirectory | 1.1.0 | SPARK-1768 | 21ddd7d1e9f8e2a726427f32422c31706a20ba3f#diff-a7befb99e7bd7e3ab5c46c2568aa5b3e |  
spark.history.fs.safemodeCheck.interval | 1.6.0 | SPARK-11020 | cf04fdfe71abc395163a625cc1f99ec5e54cc07e#diff-a7befb99e7bd7e3ab5c46c2568aa5b3e |  
spark.history.fs.update.interval | 1.4.0 | SPARK-6046 | 4527761bcd6501c362baf2780905a0018b9a74ba#diff-a7befb99e7bd7e3ab5c46c2568aa5b3e |  
spark.history.fs.cleaner.enabled | 1.3.0 | SPARK-3562 | 8942b522d8a3269a2a357e3a274ed4b3e66ebdde#diff-a7befb99e7bd7e3ab5c46c2568aa5b3e | Branch branch-1.3 does not exist, exists in branch-1.4, but it is 1.3.0-SNAPSHOT in pom.xml
spark.history.fs.cleaner.interval | 1.4.0 | SPARK-5933 | 1991337336596f94698e79c2366f065c374128ab#diff-a7befb99e7bd7e3ab5c46c2568aa5b3e |
spark.history.fs.cleaner.maxAge | 1.4.0 | SPARK-5933 | 1991337336596f94698e79c2366f065c374128ab#diff-a7befb99e7bd7e3ab5c46c2568aa5b3e |
spark.history.fs.cleaner.maxNum | 3.0.0 | SPARK-28294 | bbc2be4f425c4c26450e1bf21db407e81046ce21#diff-6bddeb5e25239974fc13db66266b167b |  
spark.history.store.path | 2.3.0 | SPARK-20642 | 74daf622de4e534d5a5929b424a6e836850eefad#diff-19f35f981fdc5b0a46f070b879a9a9fc |  
spark.history.store.maxDiskUsage | 2.3.0 | SPARK-20654 | 8b497046c647a21bbed1bdfbdcb176745a1d5cd5#diff-19f35f981fdc5b0a46f070b879a9a9fc |  
spark.history.ui.port | 1.0.0 | SPARK-1276 | 9ae80bf9bd3e4da7443af97b41fe26aa5d35d70b#diff-b49b5b9c31ddb36a9061004b5b723058 |  
spark.history.fs.inProgressOptimization.enabled | 2.4.0 | SPARK-6951 | 653fe02415a537299e15f92b56045569864b6183#diff-19f35f981fdc5b0a46f070b879a9a9fc |  
spark.history.fs.endEventReparseChunkSize | 2.4.0 | SPARK-6951 | 653fe02415a537299e15f92b56045569864b6183#diff-19f35f981fdc5b0a46f070b879a9a9fc |  
spark.history.fs.eventLog.rolling.maxFilesToRetain | 3.0.0 | SPARK-30481 | a2fe73b83c0e7c61d1c83b236565a71e3d005a71#diff-6bddeb5e25239974fc13db66266b167b |  
spark.history.fs.eventLog.rolling.compaction.score.threshold | 3.0.0 | SPARK-30481 | a2fe73b83c0e7c61d1c83b236565a71e3d005a71#diff-6bddeb5e25239974fc13db66266b167b |  
spark.history.fs.driverlog.cleaner.enabled | 3.0.0 | SPARK-25118 | 5f11e8c4cb9a5db037ac239b8fcc97f3a746e772#diff-6bddeb5e25239974fc13db66266b167b |  
spark.history.fs.driverlog.cleaner.interval | 3.0.0 | SPARK-25118 | 5f11e8c4cb9a5db037ac239b8fcc97f3a746e772#diff-6bddeb5e25239974fc13db66266b167b |  
spark.history.fs.driverlog.cleaner.maxAge | 3.0.0 | SPARK-25118 | 5f11e8c4cb9a5db037ac239b8fcc97f3a746e772#diff-6bddeb5e25239974fc13db66266b167b |  
spark.history.ui.acls.enable | 1.0.1 | SPARK-1489 | c8dd13221215275948b1a6913192d40e0c8cbadd#diff-b49b5b9c31ddb36a9061004b5b723058 |  
spark.history.ui.admin.acls | 2.1.1 | SPARK-19033 | 4ca1788805e4a0131ba8f0ccb7499ee0e0242837#diff-a7befb99e7bd7e3ab5c46c2568aa5b3e |  
spark.history.ui.admin.acls.groups | 2.1.1 | SPARK-19033 | 4ca1788805e4a0131ba8f0ccb7499ee0e0242837#diff-a7befb99e7bd7e3ab5c46c2568aa5b3e |  
spark.history.fs.numReplayThreads | 2.0.0 | SPARK-13988 | 6fdd0e32a6c3fdce1f3f7e1f8d252af05c419f7b#diff-a7befb99e7bd7e3ab5c46c2568aa5b3e |  
spark.history.retainedApplications | 1.0.0 | SPARK-1276 | 9ae80bf9bd3e4da7443af97b41fe26aa5d35d70b#diff-b49b5b9c31ddb36a9061004b5b723058 |
spark.history.provider | 1.1.0 | SPARK-1768 | 21ddd7d1e9f8e2a726427f32422c31706a20ba3f#diff-a7befb99e7bd7e3ab5c46c2568aa5b3e |  
spark.history.kerberos.enabled | 1.0.1 | SPARK-1490 | 866b03ef4d27b2160563b58d577de29ba6eb4442#diff-b49b5b9c31ddb36a9061004b5b723058 |  
spark.history.kerberos.principal | 1.0.1 | SPARK-1490 | 866b03ef4d27b2160563b58d577de29ba6eb4442#diff-b49b5b9c31ddb36a9061004b5b723058 |  
spark.history.kerberos.keytab | 1.0.1 | SPARK-1490 | 866b03ef4d27b2160563b58d577de29ba6eb4442#diff-b49b5b9c31ddb36a9061004b5b723058 |  
spark.history.custom.executor.log.url | 3.0.0 | SPARK-26311 | ae5b2a6a92be4986ef5b8062d7fb59318cff6430#diff-6bddeb5e25239974fc13db66266b167b |  
spark.history.custom.executor.log.url.applyIncompleteApplication | 3.0.0 | SPARK-26311 | ae5b2a6a92be4986ef5b8062d7fb59318cff6430#diff-6bddeb5e25239974fc13db66266b167b |  

### Why are the changes needed?
Supplemental configuration version information.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing UTs.

Closes #27751 from beliefer/add-version-to-history-config.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-02 15:15:49 +09:00
beliefer 3beb4f875d [SPARK-30908][CORE][DOC] Add version information to the configuration of Kryo
### What changes were proposed in this pull request?
1. Add version information to the configuration of `Kryo`.
2. Update the docs of `Kryo`.

I sorted out some information, shown below.

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.kryo.registrationRequired | 1.1.0 | SPARK-2102 | efdaeb111917dd0314f1d00ee8524bed1e2e21ca#diff-1f81c62dad0e2dfc387a974bb08c497c |  
spark.kryo.registrator | 0.5.0 | None | 91c07a33d90ab0357e8713507134ecef5c14e28a#diff-792ed56b3398163fa14e8578549d0d98 | This is not a release version, do we need to record it?
spark.kryo.classesToRegister | 1.2.0 | SPARK-1813 | 6bb56faea8d238ea22c2de33db93b1b39f492b3a#diff-529fc5c06b9731c1fbda6f3db60b16aa |  
spark.kryo.unsafe | 2.1.0 | SPARK-928 | bc167a2a53f5a795d089e8a884569b1b3e2cd439#diff-1f81c62dad0e2dfc387a974bb08c497c |  
spark.kryo.pool | 3.0.0 | SPARK-26466 | 38f030725c561979ca98b2a6cc7ca6c02a1f80ed#diff-a3c6b992784f9abeb9f3047d3dcf3ed9 |  
spark.kryo.referenceTracking | 0.8.0 | None | 0a8cc309211c62f8824d76618705c817edcf2424#diff-1f81c62dad0e2dfc387a974bb08c497c |  
spark.kryoserializer.buffer | 1.4.0 | SPARK-5932 | 2d222fb39dd978e5a33cde6ceb59307cbdf7b171#diff-1f81c62dad0e2dfc387a974bb08c497c |  
spark.kryoserializer.buffer.max | 1.4.0 | SPARK-5932 | 2d222fb39dd978e5a33cde6ceb59307cbdf7b171#diff-1f81c62dad0e2dfc387a974bb08c497c |  

### Why are the changes needed?
Supplemental configuration version information.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing UTs.

Closes #27734 from beliefer/add-version-to-kryo-config.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-02 15:14:47 +09:00
iRakson 92a5ae2ae4 [SPARK-30234][SQL][FOLLOWUP] Rename spark.sql.legacy.addDirectory.recursive.enabled to spark.sql.legacy.addSingleFileInAddFile
### What changes were proposed in this pull request?
Rename `spark.sql.legacy.addDirectory.recursive.enabled` to `spark.sql.legacy.addSingleFileInAddFile`

### Why are the changes needed?
To follow the naming convention

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing UTs.

Closes #27725 from iRakson/SPARK-30234_CONFIG.

Authored-by: iRakson <raksonrakesh@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-01 10:55:41 +09:00
iRakson a40a2f8338 [SPARK-27619][SQL][FOLLOWUP] Rename 'spark.sql.legacy.useHashOnMapType' to 'spark.sql.legacy.allowHashOnMapType'
### What changes were proposed in this pull request?
Renamed configuration from `spark.sql.legacy.useHashOnMapType` to `spark.sql.legacy.allowHashOnMapType`.

### Why are the changes needed?
Better readability of configuration.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing UTs.

Closes #27719 from iRakson/SPARK-27619_FOLLOWUP.

Authored-by: iRakson <raksonrakesh@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-02-28 22:57:50 +08:00
yi.wu 22dfd15a45 [SPARK-30937][DOC] Group Hive upgrade guides together
### What changes were proposed in this pull request?

This PR groups all Hive-upgrade-related migration guides for Spark 3.0 together.

It also adds another behavior change of `ScriptTransform` to the new Hive section.

### Why are the changes needed?

Make the doc clearer to users.

### Does this PR introduce any user-facing change?

No, new doc for Spark 3.0.

### How was this patch tested?

N/A.

Closes #27670 from Ngone51/hive_migration.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-02-27 21:29:42 +08:00
beliefer 325bf56e73 [SPARK-30888][CORE][DOC] Add version information to the configuration of Network
### What changes were proposed in this pull request?
1. Add version information to the configuration of `Network`.
2. Update the docs of `Network`.

I sorted out some information, shown below.

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.network.crypto.saslFallback | 2.2.0 | SPARK-19139 | 8f3f73abc1fe62496722476460c174af0250e3fe#diff-0ac65da2bc6b083fb861fe410c7688c2 |  
spark.network.crypto.enabled | 2.2.0 | SPARK-19139 | 8f3f73abc1fe62496722476460c174af0250e3fe#diff-6bdad48cfc34314e89599655442ff210 |  
spark.network.remoteReadNioBufferConversion | 2.4.0 | SPARK-24307 | 2c82745686f4456c4d5c84040a431dcb5b6cb60b#diff-2b643ea78c1add0381754b1f47eec132 |  
spark.network.timeout | 1.3.0 | SPARK-4688 | d3f07fd23cc26a70f44c52e24445974d4885d58a#diff-1df6b5af3d8f9f16255ff8c7a06f402f |  
spark.network.timeoutInterval | 1.3.2 | SPARK-5529 | ec196ab1c7569d7ab0a50c9d7338c2835f2c84d5#diff-47779b72f095f7e7f926898fa1a425ee |  
spark.rpc.askTimeout | 1.4.0 | SPARK-6490 | 8136810dfad12008ac300116df7bc8448740f1ae#diff-529fc5c06b9731c1fbda6f3db60b16aa |  
spark.rpc.connect.threads | 1.6.0 | SPARK-6028 | 084e4e126211d74a79e8dbd2d0e604dd3c650822#diff-0c89b4a60c30a7cd2224bb64d93da942 |  
spark.rpc.io.numConnectionsPerPeer | 1.6.0 | SPARK-10745 | 34a77679877bc40b58a10ec539a8da00fed7db39#diff-0c89b4a60c30a7cd2224bb64d93da942 |  
spark.rpc.io.threads | 1.6.0 | SPARK-6028 | 084e4e126211d74a79e8dbd2d0e604dd3c650822#diff-0c89b4a60c30a7cd2224bb64d93da942 |  
spark.rpc.lookupTimeout | 1.4.0 | SPARK-6490 | 8136810dfad12008ac300116df7bc8448740f1ae#diff-529fc5c06b9731c1fbda6f3db60b16aa |  
spark.rpc.message.maxSize | 2.0.0 | SPARK-7997 | bc1babd63da4ee56e6d371eb24805a5d714e8295#diff-529fc5c06b9731c1fbda6f3db60b16aa |  
spark.rpc.netty.dispatcher.numThreads | 1.6.0 | SPARK-11079 | 1797055dbf1d2fd7714d7c65c8d2efde2f15efc1#diff-05133dfc4bfdb6a27aa092d86ce24866 |  
spark.rpc.numRetries | 1.4.0 | SPARK-6490 | 8136810dfad12008ac300116df7bc8448740f1ae#diff-529fc5c06b9731c1fbda6f3db60b16aa |  
spark.rpc.retry.wait | 1.4.0 | SPARK-6490 | 8136810dfad12008ac300116df7bc8448740f1ae#diff-529fc5c06b9731c1fbda6f3db60b16aa |  

### Why are the changes needed?
Supplemental configuration version information.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing UTs.

Closes #27674 from beliefer/add-version-to-network-config.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-02-27 11:05:11 +09:00
beliefer c2857501d5 [SPARK-30909][CORE][DOC] Add version information to the configuration of Python
### What changes were proposed in this pull request?
1. Add version information to the configuration of `Python`.
2. Update the docs of `Python`.

I sorted out some information, shown below.

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.python.worker.reuse | 1.2.0 | SPARK-3030 | 2aea0da84c58a179917311290083456dfa043db7#diff-0a67bc4d171abe4df8eb305b0f4123a2 |  
spark.python.task.killTimeout | 2.2.2 | SPARK-22535 | be68f86e11d64209d9e325ce807025318f383bea#diff-0a67bc4d171abe4df8eb305b0f4123a2 |  
spark.python.use.daemon | 2.3.0 | SPARK-22554 | 57c5514de9dba1c14e296f85fb13fef23ce8c73f#diff-9008ad45db34a7eee2e265a50626841b |  
spark.python.daemon.module | 2.4.0 | SPARK-22959 | afae8f2bc82597593595af68d1aa2d802210ea8b#diff-9008ad45db34a7eee2e265a50626841b |  
spark.python.worker.module | 2.4.0 | SPARK-22959 | afae8f2bc82597593595af68d1aa2d802210ea8b#diff-9008ad45db34a7eee2e265a50626841b |  
spark.executor.pyspark.memory | 2.4.0 | SPARK-25004 | 7ad18ee9f26e75dbe038c6034700f9cd4c0e2baa#diff-6bdad48cfc34314e89599655442ff210 |  

### Why are the changes needed?
Supplemental configuration version information.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing UTs.

Closes #27704 from beliefer/add-version-to-python-config.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-02-27 10:57:34 +09:00
beliefer 776e21af40 [SPARK-30910][CORE][DOC] Add version information to the configuration of R
### What changes were proposed in this pull request?
1. Add version information to the configuration of `R`.
2. Update the docs of `R`.

I sorted out some information, shown below.

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.r.backendConnectionTimeout | 2.1.0 | SPARK-17919 | 2881a2d1d1a650a91df2c6a01275eba14a43b42a#diff-025470e1b7094d7cf4a78ea353fb3981 |  
spark.r.numRBackendThreads | 1.4.0 | SPARK-8282 | 28e8a6ea65fd08ab9cefc4d179d5c66ffefd3eb4#diff-697f7f2fc89808e0113efc71ed235db2 |  
spark.r.heartBeatInterval | 2.1.0 | SPARK-17919 | 2881a2d1d1a650a91df2c6a01275eba14a43b42a#diff-fe903bf14db371aa320b7cc516f2463c |  
spark.sparkr.r.command | 1.5.3 | SPARK-10971 | 9695f452e86a88bef3bcbd1f3c0b00ad9e9ac6e1#diff-025470e1b7094d7cf4a78ea353fb3981 |  
spark.r.command | 1.5.3 | SPARK-10971 | 9695f452e86a88bef3bcbd1f3c0b00ad9e9ac6e1#diff-025470e1b7094d7cf4a78ea353fb3981 |  

### Why are the changes needed?
Supplemental configuration version information.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing UTs.

Closes #27708 from beliefer/add-version-to-R-config.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-02-27 10:56:38 +09:00
iRakson c913b9d8b5 [SPARK-27619][SQL] MapType should be prohibited in hash expressions
### What changes were proposed in this pull request?
`hash()` and `xxhash64()` can no longer be used on elements of `MapType`. A new configuration `spark.sql.legacy.useHashOnMapType` is introduced to allow users to restore the previous behaviour.

When `spark.sql.legacy.useHashOnMapType` is set to false:

```
scala> spark.sql("select hash(map())");
org.apache.spark.sql.AnalysisException: cannot resolve 'hash(map())' due to data type mismatch: input to function hash cannot contain elements of MapType; line 1 pos 7;
'Project [unresolvedalias(hash(map(), 42), None)]
+- OneRowRelation
```

When `spark.sql.legacy.useHashOnMapType` is set to true:

```
scala> spark.sql("set spark.sql.legacy.useHashOnMapType=true");
res3: org.apache.spark.sql.DataFrame = [key: string, value: string]

scala> spark.sql("select hash(map())").first()
res4: org.apache.spark.sql.Row = [42]

```

### Why are the changes needed?

As discussed in JIRA, Spark SQL's map hash codes depend on the insertion order of the map entries. This is inconsistent with normal Scala behaviour and might confuse users.
Code snippet from JIRA:
```
val a = spark.createDataset(Map(1->1, 2->2) :: Nil)
val b = spark.createDataset(Map(2->2, 1->1) :: Nil)

// Demonstration of how Scala Map equality is unaffected by insertion order:
assert(Map(1->1, 2->2).hashCode() == Map(2->2, 1->1).hashCode())
assert(Map(1->1, 2->2) == Map(2->2, 1->1))
assert(a.first() == b.first())

// In contrast, this will print two different hashcodes:
println(Seq(a, b).map(_.selectExpr("hash(*)").first()))
```

Also `MapType` is prohibited for aggregation / joins / equality comparisons #7819 and set operations #17236.
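
The order dependence discussed above can be illustrated without Spark. The following is a minimal sketch in plain Scala, assuming a hypothetical helper `orderSensitiveHash` that folds over the entries in sequence order. It is not Spark's actual hash implementation, and the seed of 42 is only borrowed from the plan output shown earlier for familiarity.

```scala
object MapHashSketch {
  // Hypothetical order-sensitive hash: each entry is mixed into the running
  // value in the order it is visited, so reordering entries changes the result.
  // This is an illustration only, not the algorithm Spark uses.
  def orderSensitiveHash(entries: Seq[(Int, Int)], seed: Int = 42): Int =
    entries.foldLeft(seed) { case (h, (k, v)) => 31 * (31 * h + k) + v }

  def main(args: Array[String]): Unit = {
    val ab = Seq(1 -> 1, 2 -> 2)
    val ba = Seq(2 -> 2, 1 -> 1)

    // Scala's immutable Map ignores insertion order for equality and hashCode.
    println(ab.toMap == ba.toMap)
    println(ab.toMap.hashCode() == ba.toMap.hashCode())

    // Folding over the entries in sequence order does not ignore it.
    println(orderSensitiveHash(ab) == orderSensitiveHash(ba))
  }
}
```

Any hash that walks a map's physical entry order behaves like the fold here, which is why two logically equal maps can hash differently.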

### Does this PR introduce any user-facing change?
Yes. Users can no longer use hash functions on elements of `MapType`. To restore the previous behaviour, set `spark.sql.legacy.useHashOnMapType` to true.

### How was this patch tested?
UT added.

Closes #27580 from iRakson/SPARK-27619.

Authored-by: iRakson <raksonrakesh@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-02-27 01:48:12 +08:00
gatorsmile 28b8713036 [SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT
### What changes were proposed in this pull request?
This patch is to bump the master branch version to 3.1.0-SNAPSHOT.

### Why are the changes needed?
N/A

### Does this PR introduce any user-facing change?
N/A

### How was this patch tested?
N/A

Closes #27698 from gatorsmile/updateVersion.

Authored-by: gatorsmile <gatorsmile@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-25 19:44:31 -08:00
yi.wu e9fd52282e [SPARK-30689][CORE][FOLLOW-UP] Rename config name of discovery plugin
### What changes were proposed in this pull request?

Rename config `spark.resources.discovery.plugin` to `spark.resources.discoveryPlugin`.

Also, as a minor side change, `ResourceDiscoveryScriptPlugin` is labeled as `DeveloperApi` since it is not meant for end users.

### Why are the changes needed?

Discovery plugin doesn't need to reserve the "discovery" namespace here and it's more consistent with the interface name `ResourceDiscoveryPlugin` if we use `discoveryPlugin` instead.

### Does this PR introduce any user-facing change?

No, it's newly added in Spark 3.0.

### How was this patch tested?

Pass Jenkins.

Closes #27689 from Ngone51/spark_30689_followup.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-02-26 11:55:05 +09:00
Jungtaek Lim (HeartSaVioR) 02f8165343 [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
### What changes were proposed in this pull request?

This is a FOLLOW-UP PR for review comment on #27208 : https://github.com/apache/spark/pull/27208#pullrequestreview-347451714

This PR documents a new feature `Eventlog Compaction` in a new section of `monitoring.md`, as it only has one configuration on the SHS side and it's hard to explain everything in the description of a single configuration.

### Why are the changes needed?

Event log compaction lacks the documentation for what it is and how it helps. This PR will explain it.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Built docs via jekyll.

> change on the new section

<img width="951" alt="Screen Shot 2020-02-16 at 2 23 18 PM" src="https://user-images.githubusercontent.com/1317309/74599587-eb9efa80-50c7-11ea-942c-f7744268e40b.png">

> change on the table

<img width="1126" alt="Screen Shot 2020-01-30 at 5 08 12 PM" src="https://user-images.githubusercontent.com/1317309/73431190-2e9c6680-4383-11ea-8ce0-815f10917ddd.png">

Closes #27398 from HeartSaVioR/SPARK-30481-FOLLOWUP-document-new-feature.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-25 15:17:16 -08:00
beliefer 7911de9d10 [SPARK-30887][CORE][DOC] Add version information to the configuration of Deploy
### What changes were proposed in this pull request?
1. Add version information to the configuration of `Deploy`.
2. Update the docs of `Deploy`.

I sorted out some information, shown below.

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.deploy.recoveryMode | 0.8.1 | None | d66c01f2b6defb3db6c1be99523b734a4d960532#diff-29dffdccd5a7f4c8b496c293e87c8668 |
spark.deploy.recoveryMode.factory | 1.2.0 | SPARK-1830 | deefd9d7377a8091a1d184b99066febd0e9f6afd#diff-29dffdccd5a7f4c8b496c293e87c8668 | This configuration appears in branch-1.3, but the version number in the pom.xml file corresponding to the commit is 1.2.0-SNAPSHOT
spark.deploy.recoveryDirectory | 0.8.1 | None | d66c01f2b6defb3db6c1be99523b734a4d960532#diff-29dffdccd5a7f4c8b496c293e87c8668 |
spark.deploy.zookeeper.url | 0.8.1 | None | d66c01f2b6defb3db6c1be99523b734a4d960532#diff-4457313ca662a1cd60197122d924585c |
spark.deploy.zookeeper.dir | 0.8.1 | None | d66c01f2b6defb3db6c1be99523b734a4d960532#diff-a84228cb45c7d5bd93305a1f5bf720b6 |
spark.deploy.retainedApplications | 0.8.0 | None | 46eecd110a4017ea0c86cbb1010d0ccd6a5eb2ef#diff-29dffdccd5a7f4c8b496c293e87c8668 |
spark.deploy.retainedDrivers | 1.1.0 | None | 7446f5ff93142d2dd5c79c63fa947f47a1d4db8b#diff-29dffdccd5a7f4c8b496c293e87c8668 |
spark.dead.worker.persistence | 0.8.0 | None | 46eecd110a4017ea0c86cbb1010d0ccd6a5eb2ef#diff-29dffdccd5a7f4c8b496c293e87c8668 |
spark.deploy.maxExecutorRetries | 1.6.3 | SPARK-16956 | ace458f0330f22463ecf7cbee7c0465e10fba8a8#diff-29dffdccd5a7f4c8b496c293e87c8668 |
spark.deploy.spreadOut | 0.6.1 | None | bb2b9ff37cd2503cc6ea82c5dd395187b0910af0#diff-0e7ae91819fc8f7b47b0f97be7116325 |
spark.deploy.defaultCores | 0.9.0 | None | d8bcc8e9a095c1b20dd7a17b6535800d39bff80e#diff-29dffdccd5a7f4c8b496c293e87c8668 |

### Why are the changes needed?
Supplemental configuration version information.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing UTs.

Closes #27668 from beliefer/add-version-to-deploy-config.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-02-25 11:39:11 +09:00
XU Duo 10fa71321f [SPARK-30901][DOCS] Fix doc exemple with deprecated codes
### What changes were proposed in this pull request?

The previous example given for spark-streaming-kinesis was valid for Apache Spark < 2.3.0. After that, the method used in the example was deprecated:
`@deprecated("use initialPosition(initialPosition: KinesisInitialPosition)", "2.3.0")`
`def initialPositionInStream(initialPosition: InitialPositionInStream)`

This PR updates the doc by rewriting the example in Scala/Java (Python remains unchanged) to match Apache Spark 2.4.0+ releases.

### Why are the changes needed?

The outdated example confuses developers who try to test their spark-streaming-kinesis code.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

In my opinion, the change only affects documentation, so I did not add any special tests.

Closes #27652 from supaggregator/SPARK-30901.

Authored-by: XU Duo <Duo.XU@canal-plus.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-02-24 20:16:00 -06:00
roland-ondeviceresearch 9a2cec9b1e [MINOR][DOCS] Fix ForEachWriter Java example
### What changes were proposed in this pull request?
Structured streaming documentation example fix

### Why are the changes needed?
Currently the Java example uses incorrect syntax.

### Does this PR introduce any user-facing change?
Yes

### How was this patch tested?
In IDE

Closes #27671 from roland1982/foreachwriter_java_example_fix.

Authored-by: roland-ondeviceresearch <roland@ondeviceresearch.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-02-22 09:52:45 +09:00
yi.wu 82ce4753aa [SPARK-26580][SQL][ML][FOLLOW-UP] Throw exception when use untyped UDF by default
### What changes were proposed in this pull request?

This PR proposes to throw an exception by default when a user uses an untyped UDF (a.k.a. `org.apache.spark.sql.functions.udf(AnyRef, DataType)`).

Users can still use it by setting `spark.sql.legacy.useUnTypedUdf.enabled` to `true`.

### Why are the changes needed?

According to #23498, since Spark 3.0 an untyped UDF returns the default value of the Java type if the input value is null. For example, with `val f = udf((x: Int) => x, IntegerType)`, `f($"x")` returns 0 in Spark 3.0 but null in Spark 2.4 when column `x` is null. This behavior change arises because Spark 3.0 is built with Scala 2.12 by default.

As a result, this might change data silently and cause correctness issues if users still expect `null` in some cases. Thus, we should encourage users to use typed UDFs to avoid this problem.
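
The default-value behavior described above can be reproduced in plain Scala, without Spark. This is a minimal sketch using only the standard library. `NullUnboxSketch`, `unboxToPrimitive`, and `nullSafe` are hypothetical names, and the code illustrates the language-level mechanism (a null reference cast to a primitive) rather than Spark's actual UDF code path.

```scala
object NullUnboxSketch {
  // Casting a null reference to a primitive Int yields the type's default
  // value, so the information that the input was null is silently lost.
  def unboxToPrimitive(x: Any): Int = x.asInstanceOf[Int]

  // A null-aware alternative: accept the boxed type and wrap it in Option,
  // so a missing input stays distinguishable from a real zero.
  def nullSafe(x: java.lang.Integer): Option[Int] = Option(x).map(_.intValue())

  def main(args: Array[String]): Unit = {
    println(unboxToPrimitive(null))       // the default value of Int
    println(nullSafe(null))               // the missing input is preserved
    println(nullSafe(Integer.valueOf(7))) // a present input is passed through
  }
}
```

The same trade-off is why a function over a primitive parameter cannot, on its own, tell a null input apart from a zero.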

### Does this PR introduce any user-facing change?

Yes. Users will now hit an exception when using an untyped UDF.

### How was this patch tested?

Added test and updated some tests.

Closes #27488 from Ngone51/spark_26580_followup.

Lead-authored-by: yi.wu <yi.wu@databricks.com>
Co-authored-by: wuyi <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-02-21 14:46:54 +08:00
Gengliang Wang 2a695e6d15 [SPARK-30907][DOCS] Revise the doc of spark.ui.retainedTasks
### What changes were proposed in this pull request?

Revise the documentation of `spark.ui.retainedTasks` to make it clear that the configuration is for one stage.

### Why are the changes needed?

Several configurations limit the amount of UI data retained.
`spark.ui.retainedJobs`, `spark.ui.retainedStages` and `spark.worker.ui.retainedExecutors` are the total maximum numbers for one application, while `spark.ui.retainedTasks` is the maximum number for one stage.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

None, just doc.

Closes #27660 from gengliangwang/reviseRetainTask.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-02-21 10:06:45 +09:00
Wenchen Fan 704d249a56 [SPARK-26071][FOLLOWUP] Improve migration guide of disallowing map type map key
### What changes were proposed in this pull request?

Mention the workaround for users who do want to use a map type as a map key, and add a test to demonstrate it.

### Why are the changes needed?

it's better to provide an alternative when we ban something.

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

N/A

Closes #27621 from cloud-fan/map.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-02-20 22:10:04 +08:00
Wenchen Fan ef90f1422f [SPARK-30878][SQL][DOC] Improve the CREATE TABLE document
### What changes were proposed in this pull request?

Improve the CREATE TABLE document:
1. Mention that some clauses can come in any order.
2. Refine the description for some parameters.
3. Mention how a data source table interacts with the data source.
4. Make the examples consistent between data source and Hive serde tables.

### Why are the changes needed?

improve doc

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

N/A

Closes #27638 from cloud-fan/doc.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-02-20 13:07:52 +08:00
Kent Yao 46019b6e6c [MINOR][DOCS] Fix fabric8 version in documentation
### What changes were proposed in this pull request?

fix kubernetes-client version doc

### Why are the changes needed?

correct doc

### Does this PR introduce any user-facing change?

nah

### How was this patch tested?

nah

Closes #27605 from yaooqinn/k8s-version-update.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-02-19 10:47:59 -06:00
Wenchen Fan c7bece3541 [SPARK-27528][FOLLOWUP] improve migration guide
### What changes were proposed in this pull request?

mention that `INT96` timestamp is still useful for interoperability.

### Why are the changes needed?

Give users more context of the behavior changes.

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

N/A

Closes #27622 from cloud-fan/parquet.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-02-19 22:26:56 +08:00
yi.wu 68d7edf949 [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
### What changes were proposed in this pull request?

Revise below config names to comply with [new config naming policy](http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-naming-policy-of-Spark-configs-td28875.html):

SQL:
* spark.sql.execution.subquery.reuse.enabled / [SPARK-27083](https://issues.apache.org/jira/browse/SPARK-27083)
* spark.sql.legacy.allowNegativeScaleOfDecimal.enabled / [SPARK-30252](https://issues.apache.org/jira/browse/SPARK-30252)
* spark.sql.adaptive.optimizeSkewedJoin.enabled / [SPARK-29544](https://issues.apache.org/jira/browse/SPARK-29544)
* spark.sql.legacy.property.nonReserved / [SPARK-30183](https://issues.apache.org/jira/browse/SPARK-30183)
* spark.sql.streaming.forceDeleteTempCheckpointLocation.enabled / [SPARK-26389](https://issues.apache.org/jira/browse/SPARK-26389)
* spark.sql.analyzer.failAmbiguousSelfJoin.enabled / [SPARK-28344](https://issues.apache.org/jira/browse/SPARK-28344)
* spark.sql.adaptive.shuffle.reducePostShufflePartitions.enabled / [SPARK-30074](https://issues.apache.org/jira/browse/SPARK-30074)
* spark.sql.execution.pandas.arrowSafeTypeConversion / [SPARK-25811](https://issues.apache.org/jira/browse/SPARK-25811)
* spark.sql.legacy.looseUpcast / [SPARK-24586](https://issues.apache.org/jira/browse/SPARK-24586)
* spark.sql.legacy.arrayExistsFollowsThreeValuedLogic / [SPARK-28052](https://issues.apache.org/jira/browse/SPARK-28052)
* spark.sql.sources.ignoreDataLocality.enabled / [SPARK-29189](https://issues.apache.org/jira/browse/SPARK-29189)
* spark.sql.adaptive.shuffle.fetchShuffleBlocksInBatch.enabled / [SPARK-9853](https://issues.apache.org/jira/browse/SPARK-9853)

CORE:
* spark.eventLog.erasureCoding.enabled / [SPARK-25855](https://issues.apache.org/jira/browse/SPARK-25855)
* spark.shuffle.readHostLocalDisk.enabled / [SPARK-30235](https://issues.apache.org/jira/browse/SPARK-30235)
* spark.scheduler.listenerbus.logSlowEvent.enabled / [SPARK-29001](https://issues.apache.org/jira/browse/SPARK-29001)
* spark.resources.coordinate.enable / [SPARK-27371](https://issues.apache.org/jira/browse/SPARK-27371)
* spark.eventLog.logStageExecutorMetrics.enabled / [SPARK-23429](https://issues.apache.org/jira/browse/SPARK-23429)

### Why are the changes needed?

To comply with the config naming policy.

### Does this PR introduce any user-facing change?

No. Configurations listed above are all newly added in Spark 3.0.

### How was this patch tested?

Pass Jenkins.

Closes #27563 from Ngone51/revise_boolean_conf_name.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-02-18 20:39:50 +08:00
Yuming Wang 76ddb6d835 [SPARK-30755][SQL] Update migration guide and add actionable exception for HIVE-15167
### What changes were proposed in this pull request?
[HIVE-15167](https://issues.apache.org/jira/browse/HIVE-15167) removed the `SerDe` interface. This may break custom `SerDe` builds for Hive 1.2. This PR updates the migration guide for this change.

### Why are the changes needed?

Otherwise:
```
2020-01-27 05:11:20.446 - stderr> 20/01/27 05:11:20 INFO DAGScheduler: ResultStage 2 (main at NativeMethodAccessorImpl.java:0) failed in 1.000 s due to Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 13, 10.110.21.210, executor 1): java.lang.NoClassDefFoundError: org/apache/hadoop/hive/serde2/SerDe
  2020-01-27 05:11:20.446 - stderr>  at java.lang.ClassLoader.defineClass1(Native Method)
  2020-01-27 05:11:20.446 - stderr>  at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
  2020-01-27 05:11:20.446 - stderr>  at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
  2020-01-27 05:11:20.446 - stderr>  at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
  2020-01-27 05:11:20.446 - stderr>  at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
  2020-01-27 05:11:20.446 - stderr>  at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
  2020-01-27 05:11:20.446 - stderr>  at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
  2020-01-27 05:11:20.446 - stderr>  at java.security.AccessController.doPrivileged(Native Method)
  2020-01-27 05:11:20.446 - stderr>  at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
  2020-01-27 05:11:20.446 - stderr>  at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
  2020-01-27 05:11:20.446 - stderr>  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
  2020-01-27 05:11:20.446 - stderr>  at java.lang.ClassLoader.loadClass(ClassLoader.java:405)
  2020-01-27 05:11:20.446 - stderr>  at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
  2020-01-27 05:11:20.446 - stderr>  at java.lang.Class.forName0(Native Method)
  2020-01-27 05:11:20.446 - stderr>  at java.lang.Class.forName(Class.java:348)
  2020-01-27 05:11:20.446 - stderr>  at org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:76)
.....
```

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Manual test

Closes #27492 from wangyum/SPARK-30755.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-17 09:26:56 -08:00
Yuanjian Li ab186e3659 [SPARK-25829][SQL] Add config spark.sql.legacy.allowDuplicatedMapKeys and change the default behavior
### What changes were proposed in this pull request?
This is a follow-up for #23124. It adds a new config `spark.sql.legacy.allowDuplicatedMapKeys` to control the behavior of removing duplicated map keys in built-in functions. With the default value `false`, Spark will throw a RuntimeException when duplicated keys are found.
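
The two behaviors can be sketched in plain Scala. This is a minimal illustration, not Spark's implementation: `DuplicateKeySketch`, `buildMap`, and the exception message are hypothetical, and the "last value wins" outcome in the permissive branch is an assumption about the legacy behavior rather than something stated in this description.

```scala
object DuplicateKeySketch {
  // Builds a map from entries. When allowDuplicatedKeys is false (mirroring
  // the new default), a repeated key raises a RuntimeException. When true,
  // the later value replaces the earlier one (assumed policy).
  def buildMap[K, V](entries: Seq[(K, V)], allowDuplicatedKeys: Boolean): Map[K, V] =
    entries.foldLeft(Map.empty[K, V]) { case (acc, (k, v)) =>
      if (acc.contains(k) && !allowDuplicatedKeys) {
        throw new RuntimeException(s"Duplicate map key $k was found")
      }
      acc.updated(k, v)
    }

  def main(args: Array[String]): Unit = {
    val entries = Seq(1 -> "a", 1 -> "b")
    // Permissive mode keeps a single entry for the repeated key.
    println(buildMap(entries, allowDuplicatedKeys = true))
    // Strict mode fails instead of silently dropping an entry.
    println(scala.util.Try(buildMap(entries, allowDuplicatedKeys = false)).isFailure)
  }
}
```

Failing fast in the strict branch is what prevents the silent behavior change that this PR is concerned with.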

### Why are the changes needed?
Prevent silent behavior changes.

### Does this PR introduce any user-facing change?
Yes, a new config is added and the default behavior for duplicated map keys changes to throwing a RuntimeException.

### How was this patch tested?
Modify existing UT.

Closes #27478 from xuanyuanking/SPARK-25892-follow.

Authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-02-17 22:06:58 +08:00
Jungtaek Lim (HeartSaVioR) 5445fe9288 [SPARK-30827][DOCS] Document direct relationship among configurations in "spark.history.*" namespace
### What changes were proposed in this pull request?

This patch documents the direct relationships among configurations under the "spark.history" namespace.

### Why are the changes needed?

Refer to the discussion thread: https://lists.apache.org/thread.html/r43c4e57cace116aca1f0f099e8a577cf202859e3671a04077867b84a%40%3Cdev.spark.apache.org%3E

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Locally ran jekyll and confirmed. Screenshots for the modified spots:

<img width="1159" alt="Screen Shot 2020-02-15 at 8 20 14 PM" src="https://user-images.githubusercontent.com/1317309/74587003-d5922b00-5030-11ea-954b-ee37fc08470a.png">
<img width="1158" alt="Screen Shot 2020-02-15 at 8 20 44 PM" src="https://user-images.githubusercontent.com/1317309/74587005-d62ac180-5030-11ea-98fc-98b1c9d83ff4.png">
<img width="1149" alt="Screen Shot 2020-02-15 at 8 19 56 PM" src="https://user-images.githubusercontent.com/1317309/74587002-d1660d80-5030-11ea-84b5-dec3d7f5c97c.png">

Closes #27575 from HeartSaVioR/SPARK-30827.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-02-17 20:45:24 +09:00
Jungtaek Lim (HeartSaVioR) 446b2d2653 [SPARK-28869][DOCS][FOLLOWUP] Add direct relationship between configs for rolling event log
### What changes were proposed in this pull request?

This patch addresses the post-hoc review comment linked here - https://github.com/apache/spark/pull/25670#discussion_r373304076

### Why are the changes needed?

We would like to explicitly document the direct relationship before we finish up structuring of configurations.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

N/A

Closes #27576 from HeartSaVioR/SPARK-28869-FOLLOWUP-doc.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-02-17 20:41:56 +09:00
Kent Yao 0353cbf092 [MINOR][DOC] Fix 2 style issues in running-on-kubernetes doc
### What changes were proposed in this pull request?

Fix style issues in the k8s document. To see the errors in context, go to http://spark.apache.org/docs/3.0.0-preview2/running-on-kubernetes.html and search for the keyword `spark.kubernetes.file.upload.path`.

### Why are the changes needed?

doc correctness

### Does this PR introduce any user-facing change?

Nah

### How was this patch tested?

Nah

Closes #27582 from yaooqinn/k8s-doc.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-02-17 12:06:25 +09:00
Bryan Cutler be3cb71e9c [SPARK-30834][DOCS][PYTHON] Add note for recommended pandas and pyarrow versions
### What changes were proposed in this pull request?

Add doc for recommended pandas and pyarrow versions.

### Why are the changes needed?

The recommended versions are those that have been thoroughly tested by Spark CI. Other versions may be used at the discretion of the user.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

NA

Closes #27587 from BryanCutler/python-doc-rec-pandas-pyarrow-SPARK-30834-3.0.

Lead-authored-by: Bryan Cutler <cutlerb@gmail.com>
Co-authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-02-17 11:06:51 +09:00
Gengliang Wang da2ca85cee [SPARK-30703][SQL][DOCS][FOLLOWUP] Declare the ANSI SQL compliance options as experimental
### What changes were proposed in this pull request?

This is a follow-up of https://github.com/apache/spark/pull/27489.
It declares the ANSI SQL compliance options as experimental in the documentation.

### Why are the changes needed?

The options are experimental. There can be new features/behaviors in future releases.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Generating doc

Closes #27590 from gengliangwang/ExperimentalAnsi.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-02-17 09:54:00 +09:00
Yuanjian Li 01cc852982 [SPARK-30803][DOCS] Fix the home page link for Scala API document
### What changes were proposed in this pull request?
Change the link to the Scala API document.

```
$ git grep "#org.apache.spark.package"
docs/_layouts/global.html:                                <li><a href="api/scala/index.html#org.apache.spark.package">Scala</a></li>
docs/index.md:* [Spark Scala API (Scaladoc)](api/scala/index.html#org.apache.spark.package)
docs/rdd-programming-guide.md:[Scala](api/scala/#org.apache.spark.package), [Java](api/java/), [Python](api/python/) and [R](api/R/).
```

### Why are the changes needed?
The home page link for the Scala API document is incorrect after the upgrade to 3.0.

### Does this PR introduce any user-facing change?
Document UI change only.

### How was this patch tested?
Tested locally; screenshots are attached below:
Before:
![image](https://user-images.githubusercontent.com/4833765/74335713-c2385300-4dd7-11ea-95d8-f5a3639d2578.png)
After:
![image](https://user-images.githubusercontent.com/4833765/74335727-cbc1bb00-4dd7-11ea-89d9-4dcc1310e679.png)

Closes #27549 from xuanyuanking/scala-doc.

Authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-02-16 09:55:03 -06:00
Huaxin Gao 0a03e7e679 [SPARK-30691][SQL][DOC][FOLLOW-UP] Make link names exactly the same as the side bar names
### What changes were proposed in this pull request?
Make link names exactly the same as the side bar names

### Why are the changes needed?
Make doc look better

### Does this PR introduce any user-facing change?
before:

![image](https://user-images.githubusercontent.com/13592258/74578603-ad300100-4f4a-11ea-8430-11fccf31eab4.png)

after:

![image](https://user-images.githubusercontent.com/13592258/74578670-eff1d900-4f4a-11ea-97d8-5908c0e50e95.png)

### How was this patch tested?
Manually build and check the docs

Closes #27591 from huaxingao/spark-doc-followup.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-02-16 09:53:12 -06:00
HyukjinKwon b343757b1b
[SPARK-29748][DOCS][FOLLOW-UP] Add a note that the legacy environment variable to set in both executor and driver
### What changes were proposed in this pull request?

This PR addresses the comment at https://github.com/apache/spark/pull/26496#discussion_r379194091 and improves the migration guide to explicitly note that the legacy environment variable must be set in both the executors and the driver.

### Why are the changes needed?

To clarify that this environment variable should be set in both the driver and the executors.

### Does this PR introduce any user-facing change?

Nope.

### How was this patch tested?

I checked it via md editor.

Closes #27573 from HyukjinKwon/SPARK-29748.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>
2020-02-14 10:18:08 -08:00
Takeshi Yamamuro 3c4044ea77 [SPARK-30703][SQL][DOCS] Add a document for the ANSI mode
### What changes were proposed in this pull request?

This PR adds a document for the ANSI mode:

<img width="600" alt="Screen Shot 2020-02-13 at 8 08 52" src="https://user-images.githubusercontent.com/692303/74386041-5934f780-4e38-11ea-8162-26e524e11c65.png">
<img width="600" alt="Screen Shot 2020-02-13 at 8 09 13" src="https://user-images.githubusercontent.com/692303/74386040-589c6100-4e38-11ea-8a64-899788eaf55f.png">
<img width="600" alt="Screen Shot 2020-02-13 at 8 09 26" src="https://user-images.githubusercontent.com/692303/74386039-5803ca80-4e38-11ea-949f-049208d2203d.png">
<img width="600" alt="Screen Shot 2020-02-13 at 8 09 38" src="https://user-images.githubusercontent.com/692303/74386036-563a0700-4e38-11ea-9ec3-87a8f6771cf0.png">

### Why are the changes needed?

For better document coverage and usability.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

N/A

Closes #27489 from maropu/SPARK-30703.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
2020-02-13 10:53:55 -08:00
iRakson 926e3a1efe [SPARK-30790] The dataType of map() should be map<null,null>
### What changes were proposed in this pull request?

`spark.sql("select map()")` returns {}.

After these changes it will return map<null,null>

### Why are the changes needed?
After changes introduced due to #27521, it is important to maintain consistency while using map().

### Does this PR introduce any user-facing change?
Yes. Now map() will give map<null,null> instead of {}.

### How was this patch tested?
UT added. Migration guide updated as well

Closes #27542 from iRakson/SPARK-30790.

Authored-by: iRakson <raksonrakesh@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-02-13 12:23:40 +08:00
turbofei 8b1839728a [SPARK-29542][FOLLOW-UP] Keep the description of spark.sql.files.* in tuning guide be consistent with that in SQLConf
### What changes were proposed in this pull request?
This pr is a follow up of https://github.com/apache/spark/pull/26200.

In this PR, I modify the description of spark.sql.files.* in sql-performance-tuning.md to keep it consistent with the description in SQLConf.

### Why are the changes needed?

To keep consistent with the description in SQLConf.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Existing UTs.

Closes #27545 from turboFei/SPARK-29542-follow-up.

Authored-by: turbofei <fwang12@ebay.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-02-12 20:21:52 +09:00
HyukjinKwon aa6a60530e [SPARK-30722][PYTHON][DOCS] Update documentation for Pandas UDF with Python type hints
### What changes were proposed in this pull request?

This PR documents the Pandas UDF redesign with type hints introduced in SPARK-28264.
It is mostly self-describing; however, there are a few things to note for reviewers.

1. This PR replaces the existing documentation of pandas UDFs with the newer redesign to promote the Python type hints. I added a note that Spark 3.0 still keeps compatibility, though.

2. This PR proposes to name non-pandas UDFs "Pandas Function API".

3. SCALAR_ITER becomes two separate sections to reduce confusion:
  - `Iterator[pd.Series]` -> `Iterator[pd.Series]`
  - `Iterator[Tuple[pd.Series, ...]]` -> `Iterator[pd.Series]`

4. I removed some examples that look like overkill to me.

5. I also removed some information in the doc that seems duplicated or excessive.

### Why are the changes needed?

To document the redesign of pandas UDFs.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Existing tests should cover this.

Closes #27466 from HyukjinKwon/SPARK-30722.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-02-12 10:49:46 +09:00
root1 b20754d9ee [SPARK-27545][SQL][DOC] Update the Documentation for CACHE TABLE and UNCACHE TABLE
### What changes were proposed in this pull request?
Document updated for `CACHE TABLE` & `UNCACHE TABLE`

### Why are the changes needed?
Cache table creates a temp view while caching data using `CACHE TABLE name AS query`. `UNCACHE TABLE` does not remove this temp view.

These things were not mentioned in the existing doc for `CACHE TABLE` & `UNCACHE TABLE`.

### Does this PR introduce any user-facing change?
Document updated for `CACHE TABLE` & `UNCACHE TABLE` command.

### How was this patch tested?
Manually

Closes #27090 from iRakson/SPARK-27545.

Lead-authored-by: root1 <raksonrakesh@gmail.com>
Co-authored-by: iRakson <raksonrakesh@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-02-11 20:42:02 +08:00
HyukjinKwon 0045be766b [SPARK-29462][SQL] The data type of "array()" should be array<null>
### What changes were proposed in this pull request?

This brings https://github.com/apache/spark/pull/26324 back. It was reverted mainly because of Hive compatibility and the lack of investigation into other DBMSes and ANSI.

- PostgreSQL seems to coerce a NULL literal to the TEXT type.
- Presto seems to coerce `array() + array(1)` to an array of int.
- Hive seems to coerce `array() + array(1)` to an array of strings.

Given that, different systems have made different design choices. If we have to pick one of the two, coercing to an array of int makes much more sense.

Another investigation was made offline. ANSI SQL 2011, section 6.5 "<contextually typed value specification>", seems to state:

> If ES is specified, then let ET be the element type determined by the context in which ES appears. The declared type DT of ES is Case:
>
> a) If ES simply contains ARRAY, then ET ARRAY[0].
>
> b) If ES simply contains MULTISET, then ET MULTISET.
>
> ES is effectively replaced by CAST ( ES AS DT )

From reading other related context, the element type should be `NullType`. Given this investigation, choosing `null` seems correct, and we now have Presto as a reference. Therefore, this PR proposes to bring it back.

### Why are the changes needed?
When an empty array is created, it should be declared as array<null>.

### Does this PR introduce any user-facing change?
Yes, `array()` creates `array<null>`. Now `array(1) + array()` can correctly create `array(1)` instead of `array("1")`.
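
The effect of typing an empty array as `array<null>` can be illustrated with a toy model. This is only a sketch under stated assumptions, not Spark's type coercion code: the helper names `array_type` and `common_element_type` are hypothetical, and the only rule modeled is that a `null` element type widens to whatever type it is combined with.

```python
from typing import Optional

NULL = "null"  # stands in for the element type of an empty array


def array_type(*elements) -> str:
    """Toy element type of array(...): "null" when there are no elements."""
    if not elements:
        return NULL
    names = {int: "int", str: "string"}
    return names[type(elements[0])]


def common_element_type(left: str, right: str) -> Optional[str]:
    """Widen two element types; the null type yields to the other side."""
    if left == right:
        return left
    if left == NULL:
        return right
    if right == NULL:
        return left
    return None  # this toy model defines no other coercions


merged = common_element_type(array_type(1), array_type())
print(f"array<{merged}>")  # array<int>
```

With a `string` element type for the empty array instead of `null`, the same combination would have no common `int` type in this model, which mirrors the `array("1")` result mentioned above.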

### How was this patch tested?
Tested manually

Closes #27521 from HyukjinKwon/SPARK-29462.

Lead-authored-by: HyukjinKwon <gurwls223@apache.org>
Co-authored-by: Aman Omer <amanomer1996@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-02-11 17:22:08 +09:00
Liang-Chi Hsieh acfdb46a60 [SPARK-27946][SQL][FOLLOW-UP] Change doc and error message for SHOW CREATE TABLE
### What changes were proposed in this pull request?

This is a follow-up for #24938 to tweak the error message and the migration doc.

### Why are the changes needed?

To let users know the workaround when SHOW CREATE TABLE doesn't work for some Hive tables.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Existing unit tests.

Closes #27505 from viirya/SPARK-27946-followup.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Liang-Chi Hsieh <liangchi@uber.com>
2020-02-10 10:45:00 -08:00
Nicholas Chammas 339c0f9a62 [SPARK-30510][SQL][DOCS] Publicly document Spark SQL configuration options
### What changes were proposed in this pull request?

This PR adds a doc builder for Spark SQL's configuration options.

Here's what the new Spark SQL config docs look like ([configuration.html.zip](https://github.com/apache/spark/files/4172109/configuration.html.zip)):

![Screen Shot 2020-02-07 at 12 13 23 PM](https://user-images.githubusercontent.com/1039369/74050007-425b5480-49a3-11ea-818c-42700c54d1fb.png)

Compare this to the [current docs](http://spark.apache.org/docs/3.0.0-preview2/configuration.html#spark-sql):

![Screen Shot 2020-02-04 at 4 55 10 PM](https://user-images.githubusercontent.com/1039369/73790828-24a5a980-476f-11ea-998c-12cd613883e8.png)

### Why are the changes needed?

There is no visibility into the various Spark SQL configs on [the config docs page](http://spark.apache.org/docs/3.0.0-preview2/configuration.html#spark-sql).

### Does this PR introduce any user-facing change?

No, apart from new documentation.

### How was this patch tested?

I tested this manually by building the docs and reviewing them in my browser.

Closes #27459 from nchammas/SPARK-30510-spark-sql-options.

Authored-by: Nicholas Chammas <nicholas.chammas@liveramp.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-02-09 19:20:47 +09:00
Yuanjian Li e1cd4d9dc2 [SPARK-29587][DOC][FOLLOWUP] Add SQL tab in the Data Types page
### What changes were proposed in this pull request?
Add the new tab `SQL` in the `Data Types` page.

### Why are the changes needed?
New type added in SPARK-29587.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Tested locally with Jekyll.
![image](https://user-images.githubusercontent.com/4833765/73908593-2e511d80-48e5-11ea-85a7-6ee451e6b727.png)

Closes #27447 from xuanyuanking/SPARK-29587-follow.

Authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-08 14:28:15 -08:00
Yuanjian Li 3db3e39f11 [SPARK-28228][SQL] Change the default behavior for name conflict in nested WITH clause
### What changes were proposed in this pull request?
This is a follow-up for #25029. In this PR, we throw an AnalysisException when a name conflict is detected in a nested WITH clause. This way, the config `spark.sql.legacy.ctePrecedence.enabled` should be set explicitly to get the expected behavior.

### Why are the changes needed?
The original change might be risky for end users because it changes behavior silently.

### Does this PR introduce any user-facing change?
Yes, the config `spark.sql.legacy.ctePrecedence.enabled` becomes optional.

### How was this patch tested?
New UT.

Closes #27454 from xuanyuanking/SPARK-28228-follow.

Authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-08 14:10:28 -08:00
Yuanjian Li 4804445327 [MINOR][DOC] Fix document UI left menu broken
### What changes were proposed in this pull request?
Fix the broken left menu introduced in #25459.

### Why are the changes needed?
The `left-menu-wrapper` CSS is reused for both ml-guide and sql-programming-guide, so the previous changes break the UI.

Before:
![image](https://user-images.githubusercontent.com/4833765/73952563-1061d800-493a-11ea-8a75-d802a1534a44.png)
![image](https://user-images.githubusercontent.com/4833765/73952584-18217c80-493a-11ea-85a3-ce5f9875545f.png)
![image](https://user-images.githubusercontent.com/4833765/73952605-21124e00-493a-11ea-8d79-24f4dfec73d9.png)

After:
![image](https://user-images.githubusercontent.com/4833765/73952630-2a031f80-493a-11ea-80ff-4630801cfaf4.png)
![image](https://user-images.githubusercontent.com/4833765/73952652-30919700-493a-11ea-9db1-8bb4a3f913b4.png)
![image](https://user-images.githubusercontent.com/4833765/73952671-35eee180-493a-11ea-801b-d50c4397adf2.png)

### Does this PR introduce any user-facing change?
Document UI change only.

### How was this patch tested?
Tested locally; screenshots are attached above.

Closes #27479 from xuanyuanking/doc-ui.

Authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-06 14:58:53 -08:00
Yuanjian Li d8613571bc [SPARK-26700][CORE][FOLLOWUP] Add config spark.network.maxRemoteBlockSizeFetchToMem
### What changes were proposed in this pull request?
Add a new config `spark.network.maxRemoteBlockSizeFetchToMem` that falls back to the old config `spark.maxRemoteBlockSizeFetchToMem`.
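
A fallback between a new and an old config key can be sketched as follows. This is a minimal illustration, not Spark's config code: `resolve_with_fallback` is a hypothetical helper, the values `200m` and `512m` are made up, and it assumes the new key wins when both keys are set.

```python
from typing import Mapping, Optional

NEW_KEY = "spark.network.maxRemoteBlockSizeFetchToMem"
OLD_KEY = "spark.maxRemoteBlockSizeFetchToMem"


def resolve_with_fallback(
    conf: Mapping[str, str],
    key: str,
    fallback_key: str,
    default: Optional[str] = None,
) -> Optional[str]:
    """Look up `key`; if it is unset, use `fallback_key`, then `default`."""
    if key in conf:
        return conf[key]
    if fallback_key in conf:
        return conf[fallback_key]
    return default


# Only the old key is set: the new key resolves to the old value.
legacy_only = {OLD_KEY: "200m"}
print(resolve_with_fallback(legacy_only, NEW_KEY, OLD_KEY))  # 200m

# Both keys are set: the new key takes precedence (an assumption here).
both = {OLD_KEY: "200m", NEW_KEY: "512m"}
print(resolve_with_fallback(both, NEW_KEY, OLD_KEY))  # 512m
```

Existing deployments that still set the old key keep working under this scheme, while new deployments can use the consistently named key.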

### Why are the changes needed?
For naming consistency.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Existing tests.

Closes #27463 from xuanyuanking/SPARK-26700-follow.

Authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-02-06 20:53:44 +08:00
WeichenXu ec70e0708f [MINOR][DOC] Add migration note for removing org.apache.spark.ml.image.ImageSchema.readImages
### What changes were proposed in this pull request?

Add migration note for removing `org.apache.spark.ml.image.ImageSchema.readImages`

### Why are the changes needed?

### Does this PR introduce any user-facing change?

### How was this patch tested?

Closes #27467 from WeichenXu123/SC-26286.

Authored-by: WeichenXu <weichen.xu@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-05 07:54:16 -08:00
Maxim Gekk 459e757ed4 [SPARK-30668][SQL] Support SimpleDateFormat patterns in parsing timestamps/dates strings
### What changes were proposed in this pull request?
In the PR, I propose to partially revert the commit 51a6ba0181 and provide a legacy parser based on `FastDateFormat`, which is compatible with `SimpleDateFormat`.

To enable the legacy parser, set `spark.sql.legacy.timeParser.enabled` to `true`.

### Why are the changes needed?
To allow users to restore the old behavior in parsing timestamps/dates using `SimpleDateFormat` patterns. The main reason for restoring it is that `DateTimeFormatter`'s patterns are not fully compatible with `SimpleDateFormat` patterns, see https://issues.apache.org/jira/browse/SPARK-30668

### Does this PR introduce any user-facing change?
Yes

### How was this patch tested?
- Added new test to `DateFunctionsSuite`
- Restored additional test cases in `JsonInferSchemaSuite`.

Closes #27441 from MaxGekk/support-simpledateformat.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-02-05 18:48:45 +08:00
yi.wu 5983ad9cc4 [SPARK-30506][SQL][DOC] Document for generic file source options/configs
### What changes were proposed in this pull request?

Add a new document page named *Generic File Source Options* to the *Data Sources* menu, with the following sub-items:

* spark.sql.files.ignoreCorruptFiles
* spark.sql.files.ignoreMissingFiles
* pathGlobFilter
* recursiveFileLookup

Here are screenshots of the generated document:
<img width="1080" alt="doc-1" src="https://user-images.githubusercontent.com/16397174/73816825-87a54800-4824-11ea-97da-e5c40c59a7d4.png">
<img width="1081" alt="doc-2" src="https://user-images.githubusercontent.com/16397174/73816827-8a07a200-4824-11ea-99ec-9c8b0286625e.png">
<img width="1080" alt="doc-3" src="https://user-images.githubusercontent.com/16397174/73816831-8c69fc00-4824-11ea-84f0-6c9e94c2f0e2.png">
<img width="1081" alt="doc-4" src="https://user-images.githubusercontent.com/16397174/73816834-8f64ec80-4824-11ea-9355-76ad45476634.png">

### Why are the changes needed?

Better guidance for end users.

### Does this PR introduce any user-facing change?

No, added in Spark 3.0.

### How was this patch tested?

Pass Jenkins.

Closes #27302 from Ngone51/doc-generic-file-source-option.

Lead-authored-by: yi.wu <yi.wu@databricks.com>
Co-authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-02-05 17:16:38 +08:00
Dongjoon Hyun 898716980d Revert "[SPARK-28310][SQL] Support (FIRST_VALUE|LAST_VALUE)(expr[ (IGNORE|RESPECT) NULLS]?) syntax"
### What changes were proposed in this pull request?

This reverts commit b89c3de1a4.

### Why are the changes needed?

`FIRST_VALUE` is used only for window expression. Please see the discussion on https://github.com/apache/spark/pull/25082 .

### Does this PR introduce any user-facing change?

Yes.

### How was this patch tested?

Pass the Jenkins.

Closes #27458 from dongjoon-hyun/SPARK-28310.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-04 17:26:46 -08:00
Liang-Chi Hsieh 7631275f97 [SPARK-25040][SQL][FOLLOWUP] Add legacy config for allowing empty strings for certain types in json parser
### What changes were proposed in this pull request?

This is a follow-up for #22787. In #22787 we disallowed empty strings for json parser except for string and binary types. This follow-up adds a legacy config for restoring previous behavior of allowing empty string.
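
The rule can be sketched with a toy field converter. This is not the Spark JSON parser: `convert_field` and its `allow_empty_string` flag are hypothetical stand-ins for the parser and the legacy config, and the `None` result in legacy mode is an assumption about the restored behavior rather than something stated in this PR.

```python
from typing import Union


def convert_field(
    raw: str, data_type: str, allow_empty_string: bool = False
) -> Union[str, bytes, int, None]:
    """Toy conversion of one JSON string value to the requested type."""
    if raw == "":
        # Empty strings stay valid for string and binary types.
        if data_type == "string":
            return ""
        if data_type == "binary":
            return b""
        if allow_empty_string:
            # Assumption: the legacy mode maps the empty string to null.
            return None
        raise ValueError(f"empty string is not allowed for type {data_type}")
    if data_type == "int":
        return int(raw)
    if data_type == "binary":
        return raw.encode("utf-8")
    return raw


print(convert_field("", "string"))                        # prints an empty line
print(convert_field("", "int", allow_empty_string=True))  # None
try:
    convert_field("", "int")
except ValueError as err:
    print(err)  # empty string is not allowed for type int
```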

### Why are the changes needed?

Adding a legacy config to make migration easy for Spark users.

### Does this PR introduce any user-facing change?

Yes. By setting this legacy config to true, users can restore the behavior prior to Spark 3.0.0.

### How was this patch tested?

Unit test.

Closes #27456 from viirya/SPARK-25040-followup.

Lead-authored-by: Liang-Chi Hsieh <liangchi@uber.com>
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-04 17:22:23 -08:00
Maxim Gekk 0202b675af [SPARK-26618][SQL][FOLLOWUP] Describe the behavior change of typed TIMESTAMP/DATE literals
### What changes were proposed in this pull request?
In the PR, I propose to update the SQL migration guide, and clarify behavior change of typed `TIMESTAMP` and `DATE` literals for input strings without time zone information - local timestamp and date strings.

### Why are the changes needed?
To inform users that the typed literals may change their behavior in Spark 3.0 because of different sources of the default time zone - JVM system time zone in Spark 2.4 and earlier, and `spark.sql.session.timeZone` in Spark 3.0.
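
Why the source of the default time zone matters can be shown with plain Python. The sketch below is not Spark code: the two fixed offsets are hypothetical stand-ins for a JVM system time zone and a `spark.sql.session.timeZone` value, and `to_utc` is an illustrative helper.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_text: str, default_zone: timezone) -> datetime:
    """Interpret a zone-less timestamp string in `default_zone`, return UTC."""
    naive = datetime.strptime(local_text, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=default_zone).astimezone(timezone.utc)


# Hypothetical zones: one for the JVM default, one for the SQL session.
jvm_zone = timezone(timedelta(hours=-8))
session_zone = timezone(timedelta(hours=9))

literal = "2020-02-04 16:33:34"  # a local timestamp string without a zone
with_jvm_zone = to_utc(literal, jvm_zone)
with_session_zone = to_utc(literal, session_zone)

print(with_jvm_zone.isoformat())          # 2020-02-05T00:33:34+00:00
print(with_session_zone.isoformat())      # 2020-02-04T07:33:34+00:00
print(with_jvm_zone - with_session_zone)  # 17:00:00
```

The same literal text denotes two different instants, so results can shift after an upgrade whenever the two zones differ.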

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
N/A

Closes #27435 from MaxGekk/timestamp-lit-migration-guide.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-02-04 16:33:34 +09:00
Yuming Wang cd5f03a3ba [SPARK-27686][DOC][SQL] Update migration guide for make Hive 2.3 dependency by default
### What changes were proposed in this pull request?

We have upgraded the built-in Hive from 1.2 to 2.3. Users may need to set `spark.sql.hive.metastore.version` and `spark.sql.hive.metastore.jars` according to the version of their Hive metastore. Example:
```
--conf spark.sql.hive.metastore.version=1.2.1 --conf spark.sql.hive.metastore.jars=/root/hive-1.2.1-lib/*
```
Otherwise:
```
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table spark_27686. Invalid method name: 'get_table_req';
  at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:110)
  at org.apache.spark.sql.hive.HiveExternalCatalog.tableExists(HiveExternalCatalog.scala:841)
  at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.tableExists(ExternalCatalogWithListener.scala:146)
  at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:431)
  at org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:52)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
  at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:226)
  at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3487)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$4(SQLExecution.scala:100)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3485)
  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:226)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:607)
  ... 47 elided
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table spark_27686. Invalid method name: 'get_table_req'
  at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1282)
  at org.apache.spark.sql.hive.client.HiveClientImpl.getRawTableOption(HiveClientImpl.scala:422)
  at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$tableExists$1(HiveClientImpl.scala:436)
  at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
  at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:322)
  at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:256)
  at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:255)
  at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:305)
  at org.apache.spark.sql.hive.client.HiveClientImpl.tableExists(HiveClientImpl.scala:436)
  at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$tableExists$1(HiveExternalCatalog.scala:841)
  at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
  at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:100)
  ... 63 more
Caused by: org.apache.thrift.TApplicationException: Invalid method name: 'get_table_req'
  at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table_req(ThriftHiveMetastore.java:1567)
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table_req(ThriftHiveMetastore.java:1554)
  at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1350)
  at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getTable(SessionHiveMetaStoreClient.java:127)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
  at com.sun.proxy.$Proxy38.getTable(Unknown Source)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2336)
  at com.sun.proxy.$Proxy38.getTable(Unknown Source)
  at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1274)
  ... 74 more
```

### Why are the changes needed?

Improve documentation.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?
```SKIP_API=1 jekyll build```:
![image](https://user-images.githubusercontent.com/5399861/73531432-67a50b80-4455-11ea-9401-5cad12fd3d14.png)

Closes #27161 from wangyum/SPARK-27686.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-01 20:50:47 -08:00
Thomas Graves 878094f972 [SPARK-30689][CORE][YARN] Add resource discovery plugin api to support YARN versions with resource scheduling
### What changes were proposed in this pull request?

This change makes resource discovery for custom resource scheduling (GPUs, FPGAs, etc.) more flexible. Users are asking for it to work with Hadoop 2.x versions that do not support resource scheduling in YARN, and they may also not run in an isolated environment.
This change creates a plugin API that lets users write their own resource discovery class, which allows a lot more flexibility. The user can chain plugins for different resource types. The user-specified plugins execute in the order specified and fall back to the discovery script plugin if they don't return information for a particular resource.

I had to make a few of the classes public, change them so they are no longer case classes, and mark them as developer API so that the plugin can get the information it needs.

I also relaxed the YARN side so that if YARN isn't configured for resource scheduling we just warn and go on. This helps users who have YARN 3.1 but haven't configured resource scheduling on their cluster yet, or who aren't running in an isolated environment.

The user would configure this like:
--conf spark.resources.discovery.plugin="org.apache.spark.resource.ResourceDiscoveryFPGAPlugin, org.apache.spark.resource.ResourceDiscoveryGPUPlugin"

Note the executor side had to be wrapped with a classloader to make sure we include the user classpath for jars they specified on submission.

Note that this is more flexible because the discovery script has limitations, such as being spawned in a separate process. This means that if you try to allocate resources in that process, they might be released when the script returns. In addition, a plugin class makes it easier to integrate with existing systems and solutions for assigning resources.
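
The plugin chaining described in this PR (run the plugins in order, then fall back to the discovery script) can be sketched in a few lines. This is an illustration only, not the Spark plugin API: the `Plugin` type, the three example plugins, and `discover` are hypothetical, and `None` is used here to mean "no information for this resource".

```python
from typing import Callable, List, Optional, Sequence

# A toy plugin maps a resource name to addresses, or None if it has nothing.
Plugin = Callable[[str], Optional[List[str]]]


def fpga_plugin(resource: str) -> Optional[List[str]]:
    return ["f0"] if resource == "fpga" else None


def gpu_plugin(resource: str) -> Optional[List[str]]:
    return ["0", "1"] if resource == "gpu" else None


def script_fallback(resource: str) -> Optional[List[str]]:
    # Stands in for the discovery script plugin used as the last resort.
    return [f"{resource}-from-script"]


def discover(resource: str, plugins: Sequence[Plugin], fallback: Plugin) -> List[str]:
    """Ask each plugin in the given order; use the fallback if none answers."""
    for plugin in plugins:
        found = plugin(resource)
        if found is not None:
            return found
    result = fallback(resource)
    if result is None:
        raise LookupError(f"no plugin could discover resource {resource!r}")
    return result


chain = [fpga_plugin, gpu_plugin]
print(discover("gpu", chain, script_fallback))  # ['0', '1']
print(discover("tpu", chain, script_fallback))  # ['tpu-from-script']
```

Because each plugin only answers for the resource types it knows, plugins for different resources can be combined without coordinating with each other.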

### Why are the changes needed?

To more easily use Spark resource scheduling with older versions of Hadoop or in non-isolated environments.

### Does this PR introduce any user-facing change?

Yes, a plugin API.

### How was this patch tested?

Unit tests added and manual testing done on yarn and standalone modes.

Closes #27410 from tgravescs/hadoop27spark3.

Lead-authored-by: Thomas Graves <tgraves@nvidia.com>
Co-authored-by: Thomas Graves <tgraves@apache.org>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2020-01-31 22:20:28 -06:00
Liang-Chi Hsieh 8eecc20b11 [SPARK-27946][SQL] Hive DDL to Spark DDL conversion USING "show create table"
## What changes were proposed in this pull request?

This patch adds a DDL command `SHOW CREATE TABLE AS SERDE`. It is used to generate Hive DDL for a Hive table.

The original `SHOW CREATE TABLE` now always shows Spark DDL. If given a Hive table, it tries to generate Spark DDL.

For Hive serde to data source conversion, this uses the existing mapping inside `HiveSerDe`. If no mapping is found there, it throws an analysis exception on the unsupported serde configuration.

Arguably, some Hive file format + row serde combinations could be mapped to a Spark data source, e.g., CSV. This is not included in this PR. To be conservative, it may not be supported.

For now, Hive serde properties are not saved to the Spark DDL, because it may not be useful to keep Hive serde properties in a Spark table.

## How was this patch tested?

Added test.

Closes #24938 from viirya/SPARK-27946.

Lead-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Co-authored-by: Liang-Chi Hsieh <liangchi@uber.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2020-01-31 19:55:25 -08:00
Wing Yew Poon 387ce89a06 [SPARK-27324][DOC][CORE] Document configurations related to executor metrics and modify a configuration
### What changes were proposed in this pull request?

Add a section to the Configuration page to document configurations for executor metrics.
At the same time, rename spark.eventLog.logStageExecutorProcessTreeMetrics.enabled to spark.executor.processTreeMetrics.enabled and make it independent of spark.eventLog.logStageExecutorMetrics.enabled.

### Why are the changes needed?

Executor metrics are new in Spark 3.0. They lack documentation.
Memory metrics as a whole are always collected, but the ones obtained from the process tree have to be optionally enabled. Making this depend on a single configuration makes for more intuitive behavior. Given this, the configuration property is renamed to better reflect its meaning.

### Does this PR introduce any user-facing change?

Yes, only in that the configurations are all new to 3.0.

### How was this patch tested?

Not necessary.

Closes #27329 from wypoon/SPARK-27324.

Authored-by: Wing Yew Poon <wypoon@cloudera.com>
Signed-off-by: Imran Rashid <irashid@cloudera.com>
2020-01-31 14:28:02 -06:00
Huaxin Gao 5eac2dcbcd [SPARK-30691][SQL][DOC] Add a few main pages to SQL Reference
### What changes were proposed in this pull request?
Add  a few main pages

### Why are the changes needed?
To make SQL Reference complete.

### Does this PR introduce any user-facing change?
Yes

![image](https://user-images.githubusercontent.com/13592258/73563358-f859f800-4411-11ea-8bd9-27d4db784957.png)

![image](https://user-images.githubusercontent.com/13592258/73530590-a55e5180-43cd-11ea-81b9-0192ff990b96.png)

![image](https://user-images.githubusercontent.com/13592258/73530629-b909b800-43cd-11ea-91a9-cfc71e213c7a.png)

![image](https://user-images.githubusercontent.com/13592258/73530812-0be36f80-43ce-11ea-9151-efa4ab7f2105.png)

![image](https://user-images.githubusercontent.com/13592258/73530908-3e8d6800-43ce-11ea-9943-10f2bd2bb408.png)

![image](https://user-images.githubusercontent.com/13592258/73530916-451bdf80-43ce-11ea-83c2-c7a9b063add7.png)

![image](https://user-images.githubusercontent.com/13592258/73530927-4baa5700-43ce-11ea-963c-951c8820ff54.png)

![image](https://user-images.githubusercontent.com/13592258/73530963-5cf36380-43ce-11ea-8cb1-6064ba2992f3.png)

### How was this patch tested?
Manually built the docs and checked the pages.

Closes #27416 from huaxingao/spark-doc.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-01-31 12:52:22 -06:00
Shixiong Zhu f56ba37d8b
[SPARK-30656][SS] Support the "minPartitions" option in Kafka batch source and streaming source v1
### What changes were proposed in this pull request?

- Add `minPartitions` support for Kafka Streaming V1 source.
- Add `minPartitions` support for Kafka batch V1  and V2 source.
- There is a lot of refactoring (moving code into KafkaOffsetReader) to reuse code.

### Why are the changes needed?

Right now, the "minPartitions" option only works in Kafka streaming source v2. It would be great to support it in batch and streaming source v1 as well (v1 is the fallback mode when a user hits a regression in v2).

### Does this PR introduce any user-facing change?

Yep. The `minPartitions` option is supported in Kafka batch and streaming queries for both data source V1 and V2.
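
A minimal sketch of passing the option to a batch query from PySpark, for illustration only. The helper names, the broker address, and the topic are placeholders; `read_kafka_batch` needs a live SparkSession with the Kafka connector available, so it is defined here but not called.

```python
def kafka_batch_options(bootstrap_servers, topic, min_partitions):
    """Build the reader options for a Kafka batch query with minPartitions set."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # Option values are passed as strings.
        "minPartitions": str(min_partitions),
    }


def read_kafka_batch(spark, options):
    """Load a batch DataFrame from Kafka; spark is an existing SparkSession."""
    return spark.read.format("kafka").options(**options).load()


if __name__ == "__main__":
    # Placeholder broker and topic, printed only to show the resulting options.
    print(kafka_batch_options("localhost:9092", "events", 8))
```

The same options dictionary should also work with `spark.readStream` for a streaming query, under the same assumptions.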

### How was this patch tested?

New unit tests are added to test "minPartitions".

Closes #27388 from zsxwing/kafka-min-partitions.

Authored-by: Shixiong Zhu <zsxwing@gmail.com>
Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>
2020-01-30 18:14:50 -08:00
Nicholas Chammas bda0669110 [SPARK-30665][DOCS][BUILD][PYTHON] Eliminate pypandoc dependency
### What changes were proposed in this pull request?

This PR removes any dependencies on pypandoc. It also makes related tweaks to the docs README to clarify the dependency on pandoc (not pypandoc).

### Why are the changes needed?

We are using pypandoc to convert the Spark README from Markdown to ReST for PyPI. PyPI now natively supports Markdown, so we don't need pypandoc anymore. The dependency on pypandoc also sometimes causes issues when installing Python packages that depend on PySpark, as described in #18981.
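
For illustration, a packaging script can hand the Markdown README to PyPI directly, roughly as sketched below. This is not the actual Spark `setup.py`; the helper name `long_description_kwargs` is hypothetical, while `long_description` and `long_description_content_type` are the standard setuptools keywords.

```python
import tempfile
from pathlib import Path


def long_description_kwargs(readme_path):
    """Build the setup() keyword arguments that ship a Markdown README unconverted."""
    text = Path(readme_path).read_text(encoding="utf-8")
    return {
        "long_description": text,
        # Tells PyPI to render the description as Markdown, so no ReST conversion is needed.
        "long_description_content_type": "text/markdown",
    }


if __name__ == "__main__":
    # Demo with a throwaway README; a real script would pass these kwargs to setup().
    with tempfile.TemporaryDirectory() as tmp:
        readme = Path(tmp) / "README.md"
        readme.write_text("# Demo\n", encoding="utf-8")
        print(long_description_kwargs(readme)["long_description_content_type"])
```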

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Manually:

```sh
python -m venv venv
source venv/bin/activate
pip install -U pip

cd python/
python setup.py sdist
pip install dist/pyspark-3.0.0.dev0.tar.gz
pyspark --version
```

I also built the PySpark and R API docs with `jekyll` and reviewed them locally.

It would be good if a maintainer could also test this by creating a PySpark distribution and uploading it to [Test PyPI](https://test.pypi.org) to confirm the README looks as it should.

Closes #27376 from nchammas/SPARK-30665-pypandoc.

Authored-by: Nicholas Chammas <nicholas.chammas@liveramp.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-01-30 16:40:38 +09:00
angerszhu 246c398d59 [SPARK-30435][DOC] Update doc of Supported Hive Features
### What changes were proposed in this pull request?

Add the supported Hive features to the doc.

### Why are the changes needed?
To bring the doc up to date.

### Does this PR introduce any user-facing change?
Before change UI info:

![image](https://user-images.githubusercontent.com/46485123/72592726-29302c80-393e-11ea-8f4d-76432d4cb658.png)

After this pr:
![image](https://user-images.githubusercontent.com/46485123/72593569-42d27380-3940-11ea-91c7-f2998d476364.png)

![image](https://user-images.githubusercontent.com/46485123/72962218-afd98380-3dee-11ea-82a1-0bf533ebfd9f.png)

### How was this patch tested?
For a PR that changes the Spark doc web UI, we need to show the UI before and after the PR.
We can run a local web server for the Spark docs by following `$SPARK_PROJECT/docs/README.md`.

You need Python and Ruby installed in your environment, plus the plugins below:
```sh
$ sudo gem install jekyll jekyll-redirect-from rouge
# Following is needed only for generating API docs
$ sudo pip install sphinx pypandoc mkdocs
$ sudo Rscript -e 'install.packages(c("knitr", "devtools", "rmarkdown"), repos="https://cloud.r-project.org/")'
$ sudo Rscript -e 'devtools::install_version("roxygen2", version = "5.0.1", repos="https://cloud.r-project.org/")'
$ sudo Rscript -e 'devtools::install_version("testthat", version = "1.0.2", repos="https://cloud.r-project.org/")'
```

Then we call `jekyll serve --watch`; after the build we see the message below:
```
~/Documents/project/AngersZhu/spark/sql
Moving back into docs dir.
Making directory api/sql
cp -r ../sql/site/. api/sql
            Source: /Users/angerszhu/Documents/project/AngersZhu/spark/docs
       Destination: /Users/angerszhu/Documents/project/AngersZhu/spark/docs/_site
 Incremental build: disabled. Enable with --incremental
      Generating...
                    done in 24.717 seconds.
 Auto-regeneration: enabled for '/Users/angerszhu/Documents/project/AngersZhu/spark/docs'
    Server address: http://127.0.0.1:4000
  Server running... press ctrl-c to stop.
```

Visit http://127.0.0.1:4000 to see your latest changes in the doc site.

Closes #27106 from AngersZhuuuu/SPARK-30435.

Authored-by: angerszhu <angers.zhu@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-01-29 20:55:29 -08:00
Nicholas Chammas c228810edc [SPARK-30672][BUILD] Add numpy to API docs readme
### What changes were proposed in this pull request?

This PR adds `numpy` to the list of things that need to be installed in order to build the API docs. It doesn't add a new dependency; it just documents an existing dependency.

### Why are the changes needed?

You cannot build the PySpark API docs without numpy installed. Otherwise you get this series of errors:

```
$ SKIP_SCALADOC=1 SKIP_RDOC=1 SKIP_SQLDOC=1 jekyll serve
Configuration file: .../spark/docs/_config.yml
Moving to python/docs directory and building sphinx.
sphinx-build -b html -d _build/doctrees   . _build/html
Running Sphinx v2.3.1
loading pickled environment... done
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 0 source files that are out of date
updating environment: 0 added, 2 changed, 0 removed
reading sources... [100%] pyspark.mllib
WARNING: autodoc: failed to import module 'ml' from module 'pyspark'; the following exception was raised:
No module named 'numpy'
WARNING: autodoc: failed to import module 'ml.param' from module 'pyspark'; the following exception was raised:
No module named 'numpy'
...
```
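
A quick pre-flight check along the following lines can surface a missing module before the Sphinx build starts. It is only a sketch: the helper `missing_modules` is hypothetical and the module list is illustrative; the authoritative list is the one in the docs README.

```python
import importlib.util

# Illustrative list only; the authoritative list lives in the docs README.
DOC_BUILD_MODULES = ["sphinx", "numpy"]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(DOC_BUILD_MODULES)
    if missing:
        print("Install before building the API docs: " + ", ".join(missing))
    else:
        print("All listed modules are importable.")
```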

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Manually, by building the API docs with and without numpy.

Closes #27390 from nchammas/SPARK-30672-numpy-pyspark-docs.

Authored-by: Nicholas Chammas <nicholas.chammas@liveramp.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-01-30 13:04:53 +09:00
Dilip Biswal 3e203c985c [SPARK-28801][DOC][FOLLOW-UP] Setup links and address other review comments
### What changes were proposed in this pull request?

- Set up links between related sections.
- Add "Related sections" for each section.
- Change the left-hand side menu to reflect the current status of the doc.
- Other minor cleanups.

### Why are the changes needed?
Currently Spark lacks documentation on the supported SQL constructs, causing
confusion among users who sometimes have to look at the code to understand the
usage. This PR aims to address this issue.

### Does this PR introduce any user-facing change?
Yes.

### How was this patch tested?
Tested using jekyll build --serve

Closes #27371 from dilipbiswal/select_finalization.

Authored-by: Dilip Biswal <dkbiswal@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-01-29 08:41:40 -06:00
Takeshi Yamamuro ec1fb6b4e1 [SPARK-30234][SQL][FOLLOWUP] Add .enabled in the suffix of the ADD FILE legacy option
### What changes were proposed in this pull request?

This pr intends to rename `spark.sql.legacy.addDirectory.recursive` into `spark.sql.legacy.addDirectory.recursive.enabled`.

### Why are the changes needed?

For consistent option names.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

N/A

Closes #27372 from maropu/SPARK-30234-FOLLOWUP.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-01-29 12:23:59 +09:00
zero323 298d0a5102 [SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0
### What changes were proposed in this pull request?

- Update `testthat` to >= 2.0.0
- Replace `testthat:::run_tests` with `testthat:::test_package_dir`
- Add trivial assertions for tests, without any expectations, to avoid skipping.
- Update related docs.

### Why are the changes needed?

`testthat` version has been frozen by [SPARK-22817](https://issues.apache.org/jira/browse/SPARK-22817) / https://github.com/apache/spark/pull/20003, but 1.0.2 is pretty old, and we shouldn't keep things in this state forever.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

- Existing CI pipeline:
     - Windows build on AppVeyor, R 3.6.2, testthat 2.3.1
     - Linux build on Jenkins, R 3.1.x, testthat 1.0.2

- Additional builds with testthat 2.3.1 using [sparkr-build-sandbox](https://github.com/zero323/sparkr-build-sandbox) on c7ed64af9e697b3619779857dd820832176b3be3

   R 3.4.4  (image digest ec9032f8cf98)
   ```
   docker pull zero323/sparkr-build-sandbox:3.4.4
   docker run zero323/sparkr-build-sandbox:3.4.4 zero323 --branch SPARK-23435 --commit c7ed64af9e697b3619779857dd820832176b3be3 --public-key https://keybase.io/zero323/pgp_keys.asc
    ```
    3.5.3 (image digest 0b1759ee4d1d)

    ```
    docker pull zero323/sparkr-build-sandbox:3.5.3
    docker run zero323/sparkr-build-sandbox:3.5.3 zero323 --branch SPARK-23435 --commit c7ed64af9e697b3619779857dd820832176b3be3 --public-key https://keybase.io/zero323/pgp_keys.asc
    ```

   and 3.6.2 (image digest 6594c8ceb72f)
    ```
   docker pull zero323/sparkr-build-sandbox:3.6.2
   docker run zero323/sparkr-build-sandbox:3.6.2 zero323 --branch SPARK-23435 --commit c7ed64af9e697b3619779857dd820832176b3be3 --public-key https://keybase.io/zero323/pgp_keys.asc
   ```

   The corresponding [asciicasts](https://asciinema.org/) are available as 10.5281/zenodo.3629431

     [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3629431.svg)](https://doi.org/10.5281/zenodo.3629431)

   (a bit too large to burden asciinema.org, but can run locally via `asciinema play`).

----------------------------

Continued from #27328

Closes #27359 from zero323/SPARK-23435.

Authored-by: zero323 <mszymkiewicz@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-01-29 10:37:08 +09:00