### What changes were proposed in this pull request?
These changes implement an application wait mechanism that allows spark-submit to wait until the application finishes in Spark Standalone mode. This delays the exit of the spark-submit JVM until the job is completed. The implementation keeps monitoring the application until it is finished, failed, or killed. This is controlled via a flag (`spark.submit.waitForCompletion`), which is set to false by default.
### Why are the changes needed?
Currently, the Livy API for Standalone cluster mode doesn't know when the job has finished. If this flag is enabled, it can be used by the Livy API (/batches/{batchId}/state) to find out when the application has finished or failed. This flag is similar to `spark.yarn.submit.waitAppCompletion`.
### Does this PR introduce any user-facing change?
Yes, this PR introduces a new flag but it will be disabled by default.
### How was this patch tested?
Couldn't implement unit tests since the `pollAndReportStatus` method calls `System.exit()`. Please provide any suggestions.
Tested spark-submit locally for the following scenarios:
1. With the flag enabled, spark-submit exits once the job is finished.
2. With the flag enabled and job failed, spark-submit exits when the job fails.
3. With the flag disabled, spark-submit exits after submitting the job (existing behavior).
4. Existing behavior is unchanged when the flag is not added explicitly.
Closes#28258 from akshatb1/master.
Lead-authored-by: Akshat Bordia <akshat.bordia31@gmail.com>
Co-authored-by: Akshat Bordia <akshat.bordia@citrix.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
SPARK-28199 (#24996) made the trigger-related public API exposed only from static methods of the Trigger class. This is a backward-incompatible change, so some users may experience compilation errors after upgrading to Spark 3.0.0.
While we plan to mention the change in the release note, it's good to mention it in the migration guide doc as well, since the purpose of the doc is to collect the major changes/incompatibilities between versions, and end users will refer to the doc.
### Why are the changes needed?
SPARK-28199 is technically a backward-incompatible change, so we should guide users through it.
### Does this PR introduce _any_ user-facing change?
Doc change.
### How was this patch tested?
N/A, as it's just a doc change.
Closes#28763 from HeartSaVioR/SPARK-28199-FOLLOWUP-doc.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
The `fileNameOnly` parameter description was split into 2 pieces in [this](dbb8143501) commit. This PR reunites it.
### Why are the changes needed?
The parameter description is split in the doc.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
```
cd docs/
SKIP_API=1 jekyll build
```
Manual webpage check.
Closes#28739 from gaborgsomogyi/datasettxtfix.
Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
After all these attempts, https://github.com/apache/spark/pull/28692 and https://github.com/apache/spark/pull/28719 and https://github.com/apache/spark/pull/28727,
which all have limitations as mentioned in their discussions,
maybe the only way is to forbid them all.
### Why are the changes needed?
These week-based fields need a Locale to express their semantics; the first day of the week varies from country to country.
From the Javadoc of `WeekFields`:
```java
/**
 * Gets the first day-of-week.
 * <p>
 * The first day-of-week varies by culture.
 * For example, the US uses Sunday, while France and the ISO-8601 standard use Monday.
 * This method returns the first day using the standard {@code DayOfWeek} enum.
 *
 * @return the first day-of-week, not null
 */
public DayOfWeek getFirstDayOfWeek() {
    return firstDayOfWeek;
}
```
But for `SimpleDateFormat`, the day-of-week is not localized:
```
u Day number of week (1 = Monday, ..., 7 = Sunday) Number 1
```
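A standalone sketch (not part of this patch) shows that `SimpleDateFormat`'s 'u' always numbers days with Monday = 1, regardless of locale:

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

public class SimpleDateFormatU {
    public static void main(String[] args) {
        // 2019-12-29 is a Sunday
        Date sunday = new GregorianCalendar(2019, Calendar.DECEMBER, 29).getTime();
        // 'u' is locale-independent in SimpleDateFormat: 1 = Monday, ..., 7 = Sunday
        System.out.println(new SimpleDateFormat("u", Locale.US).format(sunday)); // 7
        System.out.println(new SimpleDateFormat("u", Locale.UK).format(sunday)); // 7
    }
}
```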
Currently, the default locale we use is en-US, so the result can move backward by a day, a week, or even a year.
For example, the date `2019-12-29` (a Sunday) belongs to week-based-year 2020 in a Sunday-start system (e.g. en-US), but to 2019 in a Monday-start system (en-GB); the week-of-week-based-year (`w`) is affected too:
```sql
spark-sql> SELECT to_csv(named_struct('time', to_timestamp('2019-12-29', 'yyyy-MM-dd')), map('timestampFormat', 'YYYY', 'locale', 'en-US'));
2020
spark-sql> SELECT to_csv(named_struct('time', to_timestamp('2019-12-29', 'yyyy-MM-dd')), map('timestampFormat', 'YYYY', 'locale', 'en-GB'));
2019
spark-sql> SELECT to_csv(named_struct('time', to_timestamp('2019-12-29', 'yyyy-MM-dd')), map('timestampFormat', 'YYYY-ww-uu', 'locale', 'en-US'));
2020-01-01
spark-sql> SELECT to_csv(named_struct('time', to_timestamp('2019-12-29', 'yyyy-MM-dd')), map('timestampFormat', 'YYYY-ww-uu', 'locale', 'en-GB'));
2019-52-07
spark-sql> SELECT to_csv(named_struct('time', to_timestamp('2020-01-05', 'yyyy-MM-dd')), map('timestampFormat', 'YYYY-ww-uu', 'locale', 'en-US'));
2020-02-01
spark-sql> SELECT to_csv(named_struct('time', to_timestamp('2020-01-05', 'yyyy-MM-dd')), map('timestampFormat', 'YYYY-ww-uu', 'locale', 'en-GB'));
2020-01-07
```
For other countries, please refer to [First Day of the Week in Different Countries](http://chartsbin.com/view/41671)
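The locale dependence above can be reproduced directly with plain `java.time` (an illustrative sketch, not part of this patch):

```java
import java.time.LocalDate;
import java.time.temporal.WeekFields;
import java.util.Locale;

public class WeekBasedYearDemo {
    public static void main(String[] args) {
        LocalDate d = LocalDate.of(2019, 12, 29); // a Sunday

        // en-US weeks start on Sunday: 2019-12-29 opens the first week of 2020
        System.out.println(d.get(WeekFields.of(Locale.US).weekBasedYear()));        // 2020
        // en-GB weeks start on Monday (ISO): the date stays in week 52 of 2019
        System.out.println(d.get(WeekFields.of(Locale.UK).weekBasedYear()));        // 2019
        System.out.println(d.get(WeekFields.of(Locale.UK).weekOfWeekBasedYear()));  // 52
    }
}
```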
### Does this PR introduce _any_ user-facing change?
With this change, users can no longer use the 'Y', 'w', 'u', and 'W' pattern letters ('e' can be used instead of 'u'). This at least turns a silent data change into an explicit error.
### How was this patch tested?
add unit tests
Closes#28728 from yaooqinn/SPARK-31879-NEW2.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
The Pyspark Migration Guide needs to mention a breaking change of the Pyspark ML API.
### Why are the changes needed?
In SPARK-29093, all setters have been removed from `Params` mixins in `pyspark.ml.param.shared`. Those setters had been part of the public pyspark ML API, hence this is a breaking change.
### Does this PR introduce _any_ user-facing change?
Only documentation.
### How was this patch tested?
Visually.
Closes#28663 from EnricoMi/branch-pyspark-migration-guide-setters.
Authored-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
This PR disables week-based date fields for parsing.
Closes #28674.
### Why are the changes needed?
1. It's an unfixable behavior change: filling the gap between SimpleDateFormat and DateTimeFormatter while keeping backward compatibility across different JDKs is not feasible. A lot of effort has been made to prove it at https://github.com/apache/spark/pull/28674
2. The existing behavior itself in 2.4 is confusing, e.g.
```sql
spark-sql> select to_timestamp('1', 'w');
1969-12-28 00:00:00
spark-sql> select to_timestamp('1', 'u');
1970-01-05 00:00:00
```
The 'u' here does not go to the Monday of the first week in week-based form, nor to the first day of the year in non-week-based form, but to the Monday of the second week in week-based form.
And, e.g.
```sql
spark-sql> select to_timestamp('2020 2020', 'YYYY yyyy');
2020-01-01 00:00:00
spark-sql> select to_timestamp('2020 2020', 'yyyy YYYY');
2019-12-29 00:00:00
spark-sql> select to_timestamp('2020 2020 1', 'YYYY yyyy w');
NULL
spark-sql> select to_timestamp('2020 2020 1', 'yyyy YYYY w');
2019-12-29 00:00:00
```
I think we don't need to introduce all this weird behavior from Java.
3. The current test coverage for week-based date fields is almost 0%, which indicates that we've never expected them to be used.
4. Avoiding JDK bugs
https://issues.apache.org/jira/browse/SPARK-31880
### Does this PR introduce _any_ user-facing change?
Yes, the 'Y/W/w/u/F/E' patterns cannot be used in datetime parsing functions.
### How was this patch tested?
more tests added
Closes#28706 from yaooqinn/SPARK-31892.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
The SQL REST API exposes query execution details and metrics as a public API. Its documentation will be useful for end users.
### Why are the changes needed?
The SQL REST API documentation does not exist under the Spark REST API documentation.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Manually build and check
Closes#28354 from erenavsarogullari/SPARK-31566.
Lead-authored-by: Eren Avsarogullari <eren.avsarogullari@gmail.com>
Co-authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Co-authored-by: Eren Avsarogullari <erenavsarogullari@gmail.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
### What changes were proposed in this pull request?
We should use dataType.catalogString to unify the data type mismatch message.
Before:
```sql
spark-sql> create table SPARK_31834(a int) using parquet;
spark-sql> insert into SPARK_31834 select '1';
Error in query: Cannot write incompatible data to table '`default`.`spark_31834`':
- Cannot safely cast 'a': StringType to IntegerType;
```
After:
```sql
spark-sql> create table SPARK_31834(a int) using parquet;
spark-sql> insert into SPARK_31834 select '1';
Error in query: Cannot write incompatible data to table '`default`.`spark_31834`':
- Cannot safely cast 'a': string to int;
```
### How was this patch tested?
UT.
Closes#28654 from lipzhu/SPARK-31834.
Authored-by: lipzhu <lipzhu@ebay.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
As mentioned in https://github.com/apache/spark/pull/28673 and suggested by cloud-fan at https://github.com/apache/spark/pull/28673#discussion_r432817075:
In this PR, we disable datetime patterns in the form of `y..y` and `Y..Y` whose lengths are greater than 10, to avoid a JDK bug as described below.
The new datetime formatter introduces a silent data change like:
```sql
spark-sql> select from_unixtime(1, 'yyyyyyyyyyy-MM-dd');
NULL
spark-sql> set spark.sql.legacy.timeParserPolicy=legacy;
spark.sql.legacy.timeParserPolicy legacy
spark-sql> select from_unixtime(1, 'yyyyyyyyyyy-MM-dd');
00000001970-01-01
spark-sql>
```
For patterns that support `SignStyle.EXCEEDS_PAD`, e.g. `y..y` (len >= 4), when using the `NumberPrinterParser` to format them:
```java
switch (signStyle) {
case EXCEEDS_PAD:
if (minWidth < 19 && value >= EXCEED_POINTS[minWidth]) {
buf.append(decimalStyle.getPositiveSign());
}
break;
....
```
Here `minWidth` == `len(y..y)`, and `EXCEED_POINTS` is
```java
/**
* Array of 10 to the power of n.
*/
static final long[] EXCEED_POINTS = new long[] {
0L,
10L,
100L,
1000L,
10000L,
100000L,
1000000L,
10000000L,
100000000L,
1000000000L,
10000000000L,
};
```
So when `len(y..y)` is greater than 10, an `ArrayIndexOutOfBoundsException` will be raised.
And at the caller side: for `from_unixtime`, the exception is suppressed and a silent data change occurs; for `date_format`, the `ArrayIndexOutOfBoundsException` propagates to the user.
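The overflow can be seen with a standalone replica of the check (illustrative only; `EXCEED_POINTS` is copied from the JDK source quoted above):

```java
public class ExceedPointsDemo {
    // Copied from java.time.format.DateTimeFormatterBuilder.NumberPrinterParser:
    // 11 entries, valid indices 0..10
    static final long[] EXCEED_POINTS = new long[] {
        0L, 10L, 100L, 1000L, 10000L, 100000L,
        1000000L, 10000000L, 100000000L, 1000000000L, 10000000000L,
    };

    public static void main(String[] args) {
        int minWidth = "yyyyyyyyyyy".length(); // 11 'y's
        long value = 1970;
        try {
            // mirrors: if (minWidth < 19 && value >= EXCEED_POINTS[minWidth])
            if (minWidth < 19 && value >= EXCEED_POINTS[minWidth]) {
                System.out.println("would pad with a positive sign");
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            // minWidth == 11 is out of bounds for an 11-element array
            System.out.println("ArrayIndexOutOfBoundsException for minWidth = " + minWidth);
        }
    }
}
```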
### Why are the changes needed?
fix silent data change
### Does this PR introduce _any_ user-facing change?
Yes, a `SparkUpgradeException` is thrown in place of the `null` result when the pattern contains more than 10 consecutive 'y' or 'Y' letters.
### How was this patch tested?
new tests
Closes#28684 from yaooqinn/SPARK-31867-2.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
If `LLL`/`qqq` is used in the datetime pattern string, and the current JDK in use has a bug for the stand-alone form (see https://bugs.openjdk.java.net/browse/JDK-8114833), throw an exception with a clear error message.
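A minimal standalone reproduction (not part of this patch; the printed value depends on the JDK's locale data):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class StandaloneMonthDemo {
    public static void main(String[] args) {
        DateTimeFormatter standalone = DateTimeFormatter.ofPattern("LLL", Locale.US);
        // Java 8 (COMPAT locale data, affected by JDK-8114833) prints "1";
        // Java 9+ (CLDR locale data) prints "Jan"
        System.out.println(standalone.format(LocalDate.of(1990, 1, 1)));
    }
}
```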
### Why are the changes needed?
to keep backward compatibility with Spark 2.4
### Does this PR introduce _any_ user-facing change?
Yes
Spark 2.4
```
scala> sql("select date_format('1990-1-1', 'LLL')").show
+---------------------------------------------+
|date_format(CAST(1990-1-1 AS TIMESTAMP), LLL)|
+---------------------------------------------+
| Jan|
+---------------------------------------------+
```
Spark 3.0 with Java 11
```
scala> sql("select date_format('1990-1-1', 'LLL')").show
+---------------------------------------------+
|date_format(CAST(1990-1-1 AS TIMESTAMP), LLL)|
+---------------------------------------------+
| Jan|
+---------------------------------------------+
```
Spark 3.0 with Java 8
```
// before this PR
+---------------------------------------------+
|date_format(CAST(1990-1-1 AS TIMESTAMP), LLL)|
+---------------------------------------------+
| 1|
+---------------------------------------------+
// after this PR
scala> sql("select date_format('1990-1-1', 'LLL')").show
org.apache.spark.SparkUpgradeException
```
### How was this patch tested?
manual test with java 8 and 11
Closes#28646 from cloud-fan/format.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
This PR adds the structured streaming UI introduction to the Web UI doc.
![image](https://user-images.githubusercontent.com/1452518/82642209-92b99380-9bdb-11ea-9a0d-cbb26040b0ef.png)
### Why are the changes needed?
The structured streaming web UI introduced before was missing from the Web UI documentation.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
N.A.
Closes#28609 from xccui/ss-ui-doc.
Authored-by: Xingcan Cui <xccui@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Five consecutive pattern characters of 'G/M/L/E/u/Q/q' mean Narrow-Text Style since we switched to `java.time.DateTimeFormatterBuilder` in 3.0.0, which outputs only the leading single letter of the value, e.g. `December` becomes `D`. In Spark 2.4 they mean Full-Text Style.
In this PR, we explicitly disable Narrow-Text Style for these pattern characters.
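The narrow style can be reproduced with plain `java.time` (an illustrative sketch, not part of the patch):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class NarrowTextDemo {
    public static void main(String[] args) {
        LocalDate december = LocalDate.of(2019, 12, 1);
        // Four letters: Full-Text Style (what Spark 2.4's SimpleDateFormat produced)
        System.out.println(DateTimeFormatter.ofPattern("MMMM", Locale.US).format(december));  // December
        // Five letters: Narrow-Text Style -- only the leading letter survives
        System.out.println(DateTimeFormatter.ofPattern("MMMMM", Locale.US).format(december)); // D
    }
}
```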
### Why are the changes needed?
Without this change, there will be a silent data change.
### Does this PR introduce _any_ user-facing change?
Yes, queries with datetime operations using datetime patterns, e.g. `G/M/L/E/u`, will fail if the pattern length is 5; other patterns, e.g. 'k', 'm', also only accept a certain number of letters.
1. Datetime patterns that are not supported by the new parser but are supported by the legacy one will get a `SparkUpgradeException`, e.g. "GGGGG", "MMMMM", "LLLLL", "EEEEE", "uuuuu", "aa", "aaa". Two options are given to end users: one is to use legacy mode, and the other is to follow the new online doc for correct datetime patterns.
2. Datetime patterns that are supported by neither the new parser nor the legacy one, e.g. "QQQQQ", "qqqqq", will get an `IllegalArgumentException`, which is captured by Spark internally and results in NULL for end users.
### How was this patch tested?
add unit tests
Closes#28592 from yaooqinn/SPARK-31771.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
add docs for sql migration-guide
### Why are the changes needed?
Let users know more about the cast scenarios in which Hive and Spark generate different results.
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
no need to test
Closes#28605 from GuoPhilipse/spark-docs.
Lead-authored-by: GuoPhilipse <guofei_ok@126.com>
Co-authored-by: GuoPhilipse <46367746+GuoPhilipse@users.noreply.github.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Added MDC support in all thread pools.
`ThreadUtils` creates new pools that pass the MDC over.
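A minimal sketch of the idea (hypothetical; a plain `ThreadLocal` map stands in for the real slf4j MDC): capture the submitting thread's context and re-install it inside the task:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MdcPropagationSketch {
    // Stand-in for org.slf4j.MDC: a per-thread map of diagnostic context
    static final ThreadLocal<Map<String, String>> MDC =
        ThreadLocal.withInitial(HashMap::new);

    // Wrap a task so it runs with the submitter's context, then restores the old one
    static Runnable withCallerMdc(Runnable task) {
        Map<String, String> captured = new HashMap<>(MDC.get());
        return () -> {
            Map<String, String> previous = MDC.get();
            MDC.set(captured);
            try {
                task.run();
            } finally {
                MDC.set(previous);
            }
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        MDC.get().put("jobGroup", "etl-42");
        // The worker thread sees the submitter's context, not its own empty map
        pool.submit(withCallerMdc(() ->
            System.out.println("jobGroup in worker = " + MDC.get().get("jobGroup")))).get();
        pool.shutdown();
    }
}
```

A pool that wraps every submitted task this way makes executor-side logs attributable to the driver-side action that triggered them.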
### Why are the changes needed?
In many cases it is very hard to understand which actions the logs in the executor come from,
especially when you do multi-threaded work in the driver and send actions in parallel.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
No test added because no new functionality is added; it is a thread pool change, and all current tests pass.
Closes#26624 from igreenfield/master.
Authored-by: Izek Greenfield <igreenfield@axiomsl.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
1. Describe standard 'M' and stand-alone 'L' text forms
2. Add examples for all supported number of month letters
<img width="1047" alt="Screenshot 2020-05-18 at 08 57 31" src="https://user-images.githubusercontent.com/1580697/82178856-b16f1000-98e5-11ea-87c0-456ef94dcd43.png">
### Why are the changes needed?
To improve docs and show how to use month patterns.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By building docs and checking by eyes.
Closes#28558 from MaxGekk/describe-L-M-date-pattern.
Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
This patch effectively reverts SPARK-30098 via below changes:
* Removed the config
* Removed the changes done in parser rule
* Removed the usage of config in tests
* Removed tests which depend on the config
* Rolled back some tests to before SPARK-30098 which were affected by SPARK-30098
* Reflect the change into docs (migration doc, create table syntax)
### Why are the changes needed?
SPARK-30098 brought confusion and frustration on using create table DDL query, and we agreed about the bad effect on the change.
Please go through the [discussion thread](http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Resolve-ambiguous-parser-rule-between-two-quot-create-table-quot-s-td29051i20.html) to see the details.
### Does this PR introduce _any_ user-facing change?
No, compared to Spark 2.4.x. End users who experimented with Spark 3.0.0 previews will see the behavior going back to Spark 2.4.x, but I believe we don't guarantee compatibility in preview releases.
### How was this patch tested?
Existing UTs.
Closes#28517 from HeartSaVioR/revert-SPARK-30098.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
This PR is a follow-up to fix a version of configuration document.
### Why are the changes needed?
The original PR is backported to branch-3.0.
### Does this PR introduce _any_ user-facing change?
Yes.
### How was this patch tested?
Manual.
Closes#28530 from dongjoon-hyun/SPARK-31696-2.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to add `spark.kubernetes.driver.service.annotation`, similar to the existing `spark.kubernetes.driver.annotation`.
### Why are the changes needed?
Annotations are used in many ways. One example is that the Prometheus monitoring system discovers metric endpoints via annotations.
- https://github.com/helm/charts/tree/master/stable/prometheus#scraping-pod-metrics-via-annotations
### Does this PR introduce _any_ user-facing change?
Yes. The documentation is added.
### How was this patch tested?
Pass Jenkins with the updated unit tests.
Closes#28518 from dongjoon-hyun/SPARK-31696.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR proposes to set the minimum Arrow version to 0.15.1 to be consistent with the PySpark side.
### Why are the changes needed?
It will reduce the maintenance overhead to match the Arrow versions, and minimize the supported range. SparkR Arrow optimization is experimental yet.
### Does this PR introduce _any_ user-facing change?
No, it's the change in unreleased branches only.
### How was this patch tested?
0.15.x was already tested at SPARK-29378, and we're testing the latest version of SparkR currently in AppVeyor. I already manually tested too.
Closes#28520 from HyukjinKwon/SPARK-31701.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This changes the docs to make it clearer that order preservation is not guaranteed when saving a RDD to disk and reading it back ([SPARK-5300](https://issues.apache.org/jira/browse/SPARK-5300)).
I added two sentences about this in the RDD Programming Guide.
The issue was discussed on the dev mailing list:
http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-order-guarantees-td10142.html
### Why are the changes needed?
Because RDDs are order-aware collections, it is natural to expect that if I use `saveAsTextFile` and then load the resulting file with `sparkContext.textFile`, I obtain an RDD in the same order.
This is unfortunately not the case at the moment, and there is no agreed-upon way to fix this in Spark itself (see PR #4204, which attempted to fix it). Users should be aware of this.
### Does this PR introduce _any_ user-facing change?
Yes, two new sentences in the documentation.
### How was this patch tested?
By checking that the documentation looks good.
Closes#28465 from wetneb/SPARK-5300-docs.
Authored-by: Antonin Delpeuch <antonin@delpeuch.eu>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
This PR aims to mark the new Prometheus-format metric endpoints as experimental in Apache Spark 3.0.0.
### Why are the changes needed?
Although the new metrics are disabled by default, we had better explicitly mark them as experimental in Apache Spark 3.0.0 since the output format is not fixed yet. We can finalize it in Apache Spark 3.1.0.
### Does this PR introduce _any_ user-facing change?
Only doc-change is visible to the users.
### How was this patch tested?
Manually check the code since this is a documentation and class annotation change.
Closes#28495 from dongjoon-hyun/SPARK-31674.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
Remove the unneeded embedded inline HTML markup by using the basic markdown syntax.
Please see #28414
### Why are the changes needed?
Make the doc cleaner and easily editable by MD editors.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Manually build and check
Closes#28451 from huaxingao/html_cleanup.
Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
This PR adds `spark.yarn.applicationType` to identify the application type
### Why are the changes needed?
Currently the application type defaults to SPARK.
In fact, different types of applications have different characteristics and are suitable for different scenarios, for example SPARK-SQL and SPARK-STREAMING.
I recommend distinguishing them via the parameter `spark.yarn.applicationType` so that we can more easily manage and maintain different types of applications.
### How was this patch tested?
1. Add UT.
2. Tested by verifying the Yarn UI `ApplicationType` in the following cases:
- client and cluster mode
Need additional explanation:
the value cannot exceed 20 characters (YARN truncates longer values), and it can be empty or whitespace.
The reasons are as follows:
```
// org.apache.hadoop.yarn.server.resourcemanager.submitApplication.
if (submissionContext.getApplicationType() == null) {
  submissionContext
      .setApplicationType(YarnConfiguration.DEFAULT_APPLICATION_TYPE);
} else {
  // APPLICATION_TYPE_LENGTH = 20
  if (submissionContext.getApplicationType().length() >
      YarnConfiguration.APPLICATION_TYPE_LENGTH) {
    submissionContext.setApplicationType(submissionContext
        .getApplicationType().substring(0,
            YarnConfiguration.APPLICATION_TYPE_LENGTH));
  }
}
```
Closes#28009 from wang-zhun/SPARK-31235.
Authored-by: wang-zhun <wangzhun6103@gmail.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
### What changes were proposed in this pull request?
This reverts commit 43a73e387c. It sets `INT96` as the timestamp type while saving timestamps to parquet files.
### Why are the changes needed?
To be compatible with Hive and Presto that don't support the `TIMESTAMP_MICROS` type in current stable releases.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By existing test suites.
Closes#28450 from MaxGekk/parquet-int96.
Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
Fixed typo in `docs` directory and in `project/MimaExcludes.scala`
### Why are the changes needed?
Better readability of documents
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
No test needed
Closes#28447 from kiszk/typo_20200504.
Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
The "spark.dynamicAllocation.shuffleTimeout" configuration only takes effect if "spark.dynamicAllocation.shuffleTracking.enabled" is true, so we should re-namespace that configuration so that it's nested under the "shuffleTracking" one.
### How was this patch tested?
Covered by current existing test cases.
Closes#28426 from jiangxb1987/confName.
Authored-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR is to clean up the markdown file in SHOW COLUMNS page.
- remove the unneeded embedded inline HTML markup by using the basic markdown syntax.
- use `sql` code fences for highlighting the SQL syntax.
### Why are the changes needed?
Make the doc cleaner and easily editable by MD editors.
### Does this PR introduce _any_ user-facing change?
NO
### How was this patch tested?
**Before**
![Screen Shot 2020-04-29 at 5 20 11 PM](https://user-images.githubusercontent.com/11567269/80661963-fa4d4a80-8a44-11ea-9dea-c43cda6de010.png)
**After**
![Screen Shot 2020-04-29 at 6 03 50 PM](https://user-images.githubusercontent.com/11567269/80661940-f15c7900-8a44-11ea-9943-a83e8d8618fb.png)
Closes#28414 from gatorsmile/cleanupShowColumns.
Lead-authored-by: Xiao Li <gatorsmile@gmail.com>
Co-authored-by: gatorsmile <gatorsmile@gmail.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
### What changes were proposed in this pull request?
- Rephrase the API doc for `Column.as`
- Simplify the UTs
### Why are the changes needed?
Address comments in https://github.com/apache/spark/pull/28326
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
New UT added.
Closes#28390 from xuanyuanking/SPARK-27340-follow.
Authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
We are adding a new Spark Yarn configuration, `spark.yarn.populateHadoopClasspath` to not populate Hadoop classpath from `yarn.application.classpath` and `mapreduce.application.classpath`.
### Why are the changes needed?
Spark Yarn client populates extra Hadoop classpath from `yarn.application.classpath` and `mapreduce.application.classpath` when a job is submitted to a Yarn Hadoop cluster.
However, for `with-hadoop` Spark build that embeds Hadoop runtime, it can cause jar conflicts because Spark distribution can contain different version of Hadoop jars.
One case we have is when a user uses an Apache Spark distribution with its own embedded Hadoop and submits a job to Cloudera or Hortonworks Yarn clusters; because of two different incompatible sets of Hadoop jars in the classpath, it runs into errors.
Not populating the Hadoop classpath from the cluster addresses this issue.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
A UT is added, but it is very hard to add a new integration test since this requires using different incompatible versions of Hadoop.
We also manually tested this PR, and we are able to submit a Spark job using Spark distribution built with Apache Hadoop 2.10 to CDH 5.6 without populating CDH classpath.
Closes#28376 from dbtsai/yarn-classpath.
Authored-by: DB Tsai <d_tsai@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
### What changes were proposed in this pull request?
This PR addresses two things:
- `SHOW TBLPROPERTIES` should support views (a regression introduced by #26921)
- `SHOW TBLPROPERTIES` on a temporary view should return an empty result (2.4 behavior) instead of throwing an `AnalysisException`.
### Why are the changes needed?
It's a bug.
### Does this PR introduce any user-facing change?
Yes, now `SHOW TBLPROPERTIES` works on views:
```
scala> sql("CREATE VIEW view TBLPROPERTIES('p1'='v1', 'p2'='v2') AS SELECT 1 AS c1")
scala> sql("SHOW TBLPROPERTIES view").show(truncate=false)
+---------------------------------+-------------+
|key |value |
+---------------------------------+-------------+
|view.catalogAndNamespace.numParts|2 |
|view.query.out.col.0 |c1 |
|view.query.out.numCols |1 |
|p2 |v2 |
|view.catalogAndNamespace.part.0 |spark_catalog|
|p1 |v1 |
|view.catalogAndNamespace.part.1 |default |
+---------------------------------+-------------+
```
And for a temporary view:
```
scala> sql("CREATE TEMPORARY VIEW tview TBLPROPERTIES('p1'='v1', 'p2'='v2') AS SELECT 1 AS c1")
scala> sql("SHOW TBLPROPERTIES tview").show(truncate=false)
+---+-----+
|key|value|
+---+-----+
+---+-----+
```
### How was this patch tested?
Added tests.
Closes#28375 from imback82/show_tblproperties_followup.
Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
This PR adds `-Phive` profile to the pre-build phase to build the hive module to dev classpath.
Then reflect the HiveUtils object to dump all configurations in the class.
### Why are the changes needed?
supply SQL configurations from hive module to doc
### Does this PR introduce any user-facing change?
NO
### How was this patch tested?
Passing Jenkins, and verified locally:
![image](https://user-images.githubusercontent.com/8326978/80492333-6fae1200-8996-11ea-99fd-595ee18c67e5.png)
Closes#28394 from yaooqinn/SPARK-31596.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
### What changes were proposed in this pull request?
This PR proposes to use a different approach instead of breaking it, per Michael's rubric added at https://spark.apache.org/versioning-policy.html. It deprecates the behaviour for now; it will be gradually removed in future releases.
After this change,
```python
import warnings
warnings.simplefilter("always")
from pyspark.sql.functions import *
df = spark.range(2)
map_col = create_map(lit(0), lit(100), lit(1), lit(200))
df.withColumn("mapped", map_col.getItem(col('id'))).show()
```
```
/.../python/pyspark/sql/column.py:311: DeprecationWarning: A column as 'key' in getItem is
deprecated as of Spark 3.0, and will not be supported in the future release. Use `column[key]`
or `column.key` syntax instead.
DeprecationWarning)
...
```
```python
import warnings
warnings.simplefilter("always")
from pyspark.sql.functions import *
df = spark.range(2)
struct_col = struct(lit(0), lit(100), lit(1), lit(200))
df.withColumn("struct", struct_col.getField(lit("col1"))).show()
```
```
/.../spark/python/pyspark/sql/column.py:336: DeprecationWarning: A column as 'name'
in getField is deprecated as of Spark 3.0, and will not be supported in the future release. Use
`column[name]` or `column.name` syntax instead.
DeprecationWarning)
```
### Why are the changes needed?
To prevent the radical behaviour change after the amended versioning policy.
### Does this PR introduce any user-facing change?
Yes, it will show the deprecated warning message.
### How was this patch tested?
Manually tested.
Closes#28327 from HyukjinKwon/SPARK-29664.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR proposes to remove the non-existent `hiveClientCalls.count` metric documentation under `CodeGenerator` of the Spark metrics system in the monitoring guide.
There is a duplicated `hiveClientCalls.count` metric in both the `namespace=HiveExternalCatalog` and `namespace=CodeGenerator` bullet lists, but only one is defined inside object `HiveCatalogMetrics`.
Closes#28292 from wezhang/monitoringdoc.
Authored-by: Wei Zhang <wezhang@outlook.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
Give more friendly warning message/migration guide of deprecated scala udf to users.
### Why are the changes needed?
Users cannot distinguish the function signatures of typed and untyped Scala UDFs. Instead, we should tell users what to do directly.
### Does this PR introduce any user-facing change?
No, it's newly added in Spark 3.0.
### How was this patch tested?
Pass Jenkins.
Closes#28311 from Ngone51/update_udf_doc.
Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Add migration guide for removed accumulator v1 APIs.
### Why are the changes needed?
Provide better guidance for users' migration.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Pass Jenkins.
Closes#28309 from Ngone51/SPARK-16775-migration-guide.
Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
Need to address a few more comments
### Why are the changes needed?
Fix a few problems
### Does this PR introduce any user-facing change?
Yes
### How was this patch tested?
Manually build and check
Closes#28306 from huaxingao/literal-folllowup.
Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>