Commit graph

25642 commits

Author SHA1 Message Date
Wenchen Fan 1f3863c856 [SPARK-29759][SQL] LocalShuffleReaderExec.outputPartitioning should use the corrected attributes
### What changes were proposed in this pull request?

Update `LocalShuffleReaderExec.outputPartitioning` to use attributes from `ReusedQueryStage`.

This also removes the override `doCanonicalize` in local/coalesced shuffle reader, as these 2 operators change the output partitioning. It's not safe to strip them in the canonicalized query plan.

### Why are the changes needed?

We will have an invalid output partitioning if we don fix it.

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

existing tests

Closes #26400 from cloud-fan/aqe.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-11-06 14:33:52 -08:00
Jungtaek Lim (HeartSaVioR) 782992c7ed [SPARK-29642][SS] Change the element type of underlying array to UnsafeRow for ContinuousRecordEndpoint
### What changes were proposed in this pull request?

This patch fixes the bug that `ContinuousMemoryStream[String]` throws error regarding ClassCastException - cast String to UTFString. This is because ContinuousMemoryStream and ContinuousRecordEndpoint uses origin input as it is for underlying data structure of Row, and encoding is missing here.

To force encoding, this patch changes the element type of underlying array to UnsafeRow instead of Any for ContinuousRecordEndpoint - ContinuousMemoryStream and TextSocketContinuousStream are modified to reflect the change.

### Why are the changes needed?

Above section describes the bug.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Add new UT to check for availability on couple of types.

Closes #26300 from HeartSaVioR/SPARK-29642.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-11-06 10:37:00 -08:00
Wenchen Fan 411015300e [SPARK-29752][SQL][TEST] make AdaptiveQueryExecSuite more robust
### What changes were proposed in this pull request?

instead of checking the exact number of local shuffle readers, we should check whether the number of shuffles is equal to the number of local readers.

### Why are the changes needed?

AQE is known to have randomness. We may pick different build side for broadcast join depending on which query stage finishes first. The decision to build side may add/remove shuffles downstream, so it's flaky to check the exact number of local shuffle readers.

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

test only PR.

Closes #26394 from cloud-fan/test.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2019-11-06 10:27:39 -08:00
Kent Yao 4615769736 [SPARK-29603][YARN] Support application priority for YARN priority scheduling
### What changes were proposed in this pull request?

Priority for YARN to define pending applications ordering policy, those with higher priority have a better opportunity to be activated. YARN CapacityScheduler only.

### Why are the changes needed?

Ordering pending spark apps
### Does this PR introduce any user-facing change?

add a conf
### How was this patch tested?

add ut

Closes #26255 from yaooqinn/SPARK-29603.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-11-06 10:12:27 -08:00
Yuming Wang e5c176a243 [MINOR][INFRA] Change the Github Actions build command to mvn install
### What changes were proposed in this pull request?

This PR change the Github Actions build command from `mvn package` to `mvn install` to build Scaladoc jars.

### Why are the changes needed?

Sometimes `mvn install` build failure with error: `not found: type ClassName...`.
More details:  https://github.com/apache/spark/pull/24628#issuecomment-495655747

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
N/A

Closes #26414 from wangyum/github-action-install.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-06 09:16:50 -08:00
zhengruifeng 854f30ffa8 [SPARK-29751][ML] Scalers use Summarizer instead of MultivariateOnlineSummarizer
### What changes were proposed in this pull request?
use `ml.Summarizer` instead of `mllib.MultivariateOnlineSummarizer`

### Why are the changes needed?
1, I found that using `ml.Summarizer` is faster than current impl;
2, `mllib.MultivariateOnlineSummarizer` maintain all arrays, while `ml.Summarizer` only maintain necessary arrays
3, using `ml.Summarizer` will avoid vector conversions to `mlllib.Vector`

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
existing testsuites

Closes #26393 from zhengruifeng/maxabs_opt.

Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-11-06 08:57:21 -06:00
Huaxin Gao 8353000b47 [SPARK-29746][ML] Implement validateInputType in Normalizer/ElementwiseProduct/PolynomialExpansion
### What changes were proposed in this pull request?
This PR implements ```validateInput``` in ```ElementwiseProduct```, ```Normalizer``` and ```PolynomialExpansion```.

### Why are the changes needed?
```UnaryTransformer``` has abstract method ```validateInputType``` and call it in ```transformSchema```, but this method is not implemented in ```ElementwiseProduct```, ```Normalizer``` and ```PolynomialExpansion```.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing tests

Closes #26388 from huaxingao/spark-29746.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-11-06 07:21:36 -06:00
shahid 90df858a26 [SPARK-29725][SQL][TESTS] Add ThriftServerPageSuite
### What changes were proposed in this pull request?
Added UT for the classes `ThriftServerPage.scala` and `ThriftServerSessionPage.scala`

### Why are the changes needed?

Currently, there are no UTs for testing Thriftserver UI page
### Does this PR introduce any user-facing change?

No

### How was this patch tested?

UT

Closes #26403 from shahidki31/ut.

Authored-by: shahid <shahidki31@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-11-06 20:59:45 +09:00
zhengruifeng 5853e8b330 [SPARK-29754][ML] LoR/AFT/LiR/SVC use Summarizer instead of MultivariateOnlineSummarizer
### What changes were proposed in this pull request?
1, change the scope of `ml.SummarizerBuffer` and add a method `createSummarizerBuffer` for it, so it can be used as an aggregator like `MultivariateOnlineSummarizer`;
2, In LoR/AFT/LiR/SVC, use Summarizer instead of MultivariateOnlineSummarizer

### Why are the changes needed?
The computation of summary before learning iterations is a bottleneck in high-dimension cases, since `MultivariateOnlineSummarizer` compute much more than needed.
In the [ticket](https://issues.apache.org/jira/browse/SPARK-29754) is an example, with `--driver-memory=4G` LoR will always fail on KDDA dataset. If we swith to `ml.Summarizer`, then `--driver-memory=3G` is enough to train a model.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
existing testsuites & manual test in REPL

Closes #26396 from zhengruifeng/using_SummarizerBuffer.

Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: zhengruifeng <ruifengz@foxmail.com>
2019-11-06 18:19:39 +08:00
Aman Omer 0dcd739534 [SPARK-29462] The data type of "array()" should be array<null>
### What changes were proposed in this pull request?
During creation of array, if CreateArray does not gets any children to set data type for array, it will create an array of null type .

### Why are the changes needed?
When empty array is created, it should be declared as array<null>.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Tested manually

Closes #26324 from amanomer/29462.

Authored-by: Aman Omer <amanomer1996@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-11-06 18:39:46 +09:00
Liang-Chi Hsieh 6233958ab6 [SPARK-29680][SQL] Remove ALTER TABLE CHANGE COLUMN syntax
### What changes were proposed in this pull request?

This patch removes v1 ALTER TABLE CHANGE COLUMN syntax.

### Why are the changes needed?

Since in v2 we have ALTER TABLE CHANGE COLUMN and ALTER TABLE RENAME COLUMN, this old syntax is not necessary now and can be confusing.

The v2 ALTER TABLE CHANGE COLUMN should fallback to v1 AlterTableChangeColumnCommand (#26354).

### Does this PR introduce any user-facing change?

Yes, the old v1 ALTER TABLE CHANGE COLUMN syntax is removed.

### How was this patch tested?

Unit tests.

Closes #26338 from viirya/SPARK-29680.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-06 10:42:44 +08:00
zhengruifeng ed12b61784 [SPARK-29656][ML][PYSPARK] ML algs expose aggregationDepth
### What changes were proposed in this pull request?
expose expert param `aggregationDepth` in algs: GMM/GLR

### Why are the changes needed?
SVC/LoR/LiR/AFT had exposed expert param aggregationDepth to end users. It should be nice to expose it in similar algs.

### Does this PR introduce any user-facing change?
yes, expose new param

### How was this patch tested?
added pytext tests

Closes #26322 from zhengruifeng/agg_opt.

Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: zhengruifeng <ruifengz@foxmail.com>
2019-11-06 10:34:53 +08:00
Takeshi Yamamuro 20b9d8259b [SPARK-29714][SQL][TESTS] Port insert.sql
### What changes were proposed in this pull request?

This PR ports insert.sql from PostgreSQL regression tests https://github.com/postgres/postgres/blob/REL_12_STABLE/src/test/regress/sql/insert.sql

The expected results can be found in the link: https://github.com/postgres/postgres/blob/REL_12_STABLE/src/test/regress/expected/insert.out

### Why are the changes needed?

To check behaviour differences between Spark and PostgreSQL

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Pass the Jenkins. And, Comparison with PgSQL results

Closes #26360 from maropu/InsertTest.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-05 16:44:54 -08:00
Thomas Graves 075cd557f1 [SPARK-29763] Fix Stage UI Page not showing all accumulators in Task Table
### What changes were proposed in this pull request?
Fix the task table UI to show all accumulators.

Below example was creating 2 accumulators
scala> val accum = sc.longAccumulator("My Accumulator")
scala> val accum2 = sc.longAccumulator("My Accumulator")
scala> sc.parallelize(Array(1, 2, 3, 4)).foreach(x => {
     accum2.add(x)
     accum.add(x)
     })

Before this change, only shows a single on in task table:
![beforefixtaskui](https://user-images.githubusercontent.com/4563792/68225858-b0fcd080-ffb6-11e9-8561-3dc25a81a106.png)

After this change you can see all of them:
![tasktablegood](https://user-images.githubusercontent.com/4563792/68225911-c5d96400-ffb6-11e9-952a-18d3738711d1.png)

### Why are the changes needed?

Its not showing all accumulators now.

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

Manual testing the UI.

Closes #26402 from tgravescs/SPARK-29763.

Authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-11-05 14:15:14 -08:00
Maxim Gekk 4c53ac1822 [SPARK-29387][SQL] Support * and / operators for intervals
### What changes were proposed in this pull request?
Added new expressions `MultiplyInterval` and `DivideInterval` to multiply/divide an interval by a numeric. Updated `TypeCoercion.DateTimeOperations` to turn the `Multiply`/`Divide` expressions of `CalendarIntervalType` and `NumericType` to `MultiplyInterval`/`DivideInterval`.

To support new operations, added new methods `multiply()` and `divide()` to `CalendarInterval`.

### Why are the changes needed?
- To maintain feature parity with PostgreSQL which supports multiplication and division of intervals by doubles:
```sql
# select interval '1 hour' / double precision '1.5';
 ?column?
----------
 00:40:00
```
- To conform the SQL standard which defines those operations: `numeric * interval`, `interval * numeric` and `interval / numeric`. See [4.5.3  Operations involving datetimes and intervals](http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt).
- Improve Spark SQL UX and allow users to adjust interval columns. For example:
```sql
spark-sql> select (timestamp'now' - timestamp'yesterday') * 1.3;
interval 2 days 10 hours 39 minutes 38 seconds 568 milliseconds 900 microseconds
```

### Does this PR introduce any user-facing change?
Yes, previously the following query fails with the error:
```sql
spark-sql> select interval 1 hour 30 minutes * 1.5;
Error in query: cannot resolve '(interval 1 hours 30 minutes * 1.5BD)' due to data type mismatch: differing types in '(interval 1 hours 30 minutes * 1.5BD)' (interval and decimal(2,1)).; line 1 pos 7;
```
After:
```sql
spark-sql> select interval 1 hour 30 minutes * 1.5;
interval 2 hours 15 minutes
```

### How was this patch tested?
- Added tests for the `multiply()` and `divide()` methods to `CalendarIntervalSuite.java`
- New test suite `IntervalExpressionsSuite`
- by tests for `Multiply` -> `MultiplyInterval` and `Divide` -> `DivideInterval` in `TypeCoercionSuite`
- updated `datetime.sql`

Closes #26132 from MaxGekk/interval-mul-div.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-06 00:37:43 +08:00
Alessandro Bellina 3cb18d90c4 [SPARK-29151][CORE] Support fractional resources for task resource scheduling
### What changes were proposed in this pull request?
This PR adds the ability for tasks to request fractional resources, in order to be able to execute more than 1 task per resource. For example, if you have 1 GPU in the executor, and the task configuration is 0.5 GPU/task, the executor can schedule two tasks to run on that 1 GPU.

### Why are the changes needed?
Currently there is no good way to share a resource such that multiple tasks can run on a single unit. This allows multiple tasks to share an executor resource.

### Does this PR introduce any user-facing change?
Yes: There is a configuration change where `spark.task.resource.[resource type].amount` can now be fractional.

### How was this patch tested?
Unit tests and manually on standalone mode, and yarn.

Closes #26078 from abellina/SPARK-29151.

Authored-by: Alessandro Bellina <abellina@nvidia.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2019-11-05 08:57:43 -06:00
Takeshi Yamamuro 41be5125a1 [SPARK-29648][SQL][TESTS] Port limit.sql
### What changes were proposed in this pull request?

This PR ports limit.sql from PostgreSQL regression tests https://github.com/postgres/postgres/blob/REL_12_STABLE/src/test/regress/sql/limit.sql

The expected results can be found in the link: https://github.com/postgres/postgres/blob/REL_12_STABLE/src/test/regress/expected/limit.out

### Why are the changes needed?

To check behaviour differences between Spark and PostgreSQL

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Pass the Jenkins. And, Comparison with PgSQL results

Closes #26311 from maropu/SPARK-29648.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-04 22:12:27 -08:00
Huaxin Gao 02eecfec99 [SPARK-29695][SQL] ALTER TABLE (SerDe properties) should look up catalog/table like v2 commands
### What changes were proposed in this pull request?
Add AlterTableSerDePropertiesStatement and make ALTER TABLE ... SET SERDE/SERDEPROPERTIES go through the same catalog/table resolution framework of v2 commands.

### Why are the changes needed?
It's important to make all the commands have the same table resolution behavior, to avoid confusing end-users. e.g.
```
USE my_catalog
DESC t // success and describe the table t from my_catalog
ALTER TABLE t SET SERDE 'org.apache.class' // report table not found as there is no table t in the session catalog
```

### Does this PR introduce any user-facing change?
Yes. When running ALTER TABLE ... SET SERDE/SERDEPROPERTIES, Spark fails the command if the current catalog is set to a v2 catalog, or the table name specified a v2 catalog.

### How was this patch tested?
Unit tests.

Closes #26374 from huaxingao/spark_29695.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-04 21:42:39 -08:00
Terry Kim 66619b84d8 [SPARK-29630][SQL] Disallow creating a permanent view that references a temporary view in an expression
### What changes were proposed in this pull request?

Disallow creating a permanent view that references a temporary view in **expressions**.

### Why are the changes needed?

Creating a permanent view that references a temporary view is currently disallowed. For example,
```SQL
# The following throws org.apache.spark.sql.AnalysisException
# Not allowed to create a permanent view `per_view` by referencing a temporary view `tmp`;
CREATE VIEW per_view AS SELECT t1.a, t2.b FROM base_table t1, (SELECT * FROM tmp) t2"
```
However, the following is allowed.
```SQL

CREATE VIEW per_view AS SELECT * FROM base_table WHERE EXISTS (SELECT * FROM tmp);
```
This PR fixes the bug where temporary views used inside expressions are not checked.

### Does this PR introduce any user-facing change?

Yes. Now the following SQL query throws an exception as expected:
```SQL
# The following throws org.apache.spark.sql.AnalysisException
# Not allowed to create a permanent view `per_view` by referencing a temporary view `tmp`;
CREATE VIEW per_view AS SELECT * FROM base_table WHERE EXISTS (SELECT * FROM tmp);
```

### How was this patch tested?

Added new unit tests.

Closes #26361 from imback82/spark-29630.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-05 13:19:46 +08:00
Takeshi Yamamuro 942a057934 [SPARK-29696][SQL][TESTS] Port groupingsets.sql
### What changes were proposed in this pull request?

This PR ports groupingsets.sql from PostgreSQL regression tests https://github.com/postgres/postgres/blob/REL_12_STABLE/src/test/regress/sql/groupingsets.sql

The expected results can be found in the link: https://github.com/postgres/postgres/blob/REL_12_STABLE/src/test/regress/expected/groupingsets.out

### Why are the changes needed?

To check behaviour differences between Spark and PostgreSQL

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Pass the Jenkins. And, Comparison with PgSQL results

Closes #26352 from maropu/GgroupingSets.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-04 19:06:28 -08:00
Terry Kim bc65c54f6b [SPARK-29734][SQL] Datasource V2: Support SHOW CURRENT NAMESPACE
### What changes were proposed in this pull request?

This PR introduces a new SQL command: `SHOW CURRENT NAMESPACE`.

### Why are the changes needed?

Datasource V2 supports multiple catalogs/namespaces and having `SHOW CURRENT NAMESPACE` to retrieve the current catalog/namespace info would be useful.

### Does this PR introduce any user-facing change?

Yes, the user can perform the following:
```
scala> spark.sql("SHOW CURRENT NAMESPACE").show
+-------------+---------+
|      catalog|namespace|
+-------------+---------+
|spark_catalog|  default|
+-------------+---------+

scala> spark.sql("USE testcat.ns1.ns2").show
scala> spark.sql("SHOW CURRENT NAMESPACE").show
+-------+---------+
|catalog|namespace|
+-------+---------+
|testcat|  ns1.ns2|
+-------+---------+
```

### How was this patch tested?

Added unit tests.

Closes #26379 from imback82/show_current_catalog.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-04 18:05:10 -08:00
Liang-Chi Hsieh ef1abf2e2c [SPARK-29747][BUILD] Bump joda-time version to 2.10.5
### What changes were proposed in this pull request?

This upgrades joda-time from 2.9 to 2.10.5.

### Why are the changes needed?

Joda 2.9 is almost 4 yrs ago and there are bugs fix and tz database updates.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Existing tests.

Closes #26389 from viirya/upgrade-joda.

Authored-by: Liang-Chi Hsieh <liangchi@uber.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-11-05 10:08:19 +09:00
Jungtaek Lim (HeartSaVioR) ba2bc4b0e0 [SPARK-20568][SS] Provide option to clean up completed files in streaming query
## What changes were proposed in this pull request?

This patch adds the option to clean up files which are completed in previous batch.

`cleanSource` -> "archive" / "delete" / "off"

The default value is "off", which Spark will do nothing.

If "delete" is specified, Spark will simply delete input files. If "archive" is specified, Spark will require additional config `sourceArchiveDir` which will be used to move input files to there. When archiving (via move) the path of input files are retained to the archived paths as sub-path.

Note that it is only applied to "micro-batch", since for batch all input files must be kept to get same result across multiple query executions.

## How was this patch tested?

Added UT. Manual test against local disk as well as HDFS.

Closes #22952 from HeartSaVioR/SPARK-20568.

Lead-authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Co-authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Co-authored-by: Jungtaek Lim <kabhwan@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-11-04 15:16:10 -08:00
yong.tian1 04536b21db [SPARK-28552][SQL] Case-insensitive database URLs in JdbcDialect
## What changes were proposed in this pull request?
This pr proposes to be case insensitive when matching dialects via jdbc url prefix.

When I use jdbc url such as: ```jdbc: MySQL://localhost/db``` to query data through sparksql, the result is wrong, but MySQL supports such url writing.

because sparksql matches MySQLDialect by prefix ```jdbc:mysql```, so ```jdbc: MySQL``` is not matched with the correct dialect. Therefore, it should be case insensitive when identifying the corresponding dialect through jdbc url

https://issues.apache.org/jira/browse/SPARK-28552
## How was this patch tested?
UT.

Closes #25287 from teeyog/sql_dialect.

Lead-authored-by: yong.tian1 <yong.tian1@dmall.com>
Co-authored-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Co-authored-by: Chris Martin <chris@cmartinit.co.uk>
Co-authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
Co-authored-by: Kent Yao <yaooqinn@hotmail.com>
Co-authored-by: teeyog <teeyog@gmail.com>
Co-authored-by: Maxim Gekk <max.gekk@gmail.com>
Co-authored-by: Ryan Blue <blue@apache.org>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2019-11-05 08:15:29 +09:00
Marcelo Vanzin d51d228048 [SPARK-29397][CORE] Extend plugin interface to include the driver
Spark 2.4 added the ability for executor plugins to be loaded into
Spark (see SPARK-24918). That feature intentionally skipped the
driver to keep changes small, and also because it is possible to
load code into the Spark driver using listeners + configuration.

But that is a bit awkward, because the listener interface does not
provide hooks into a lot of Spark functionality. This change reworks
the executor plugin interface to also extend to the driver.

- there's a "SparkPlugin" main interface that provides APIs to
  load driver and executor components.
- custom metric support (added in SPARK-28091) can be used by
  plugins to register metrics both in the driver process and in
  executors.
- a communication channel now exists that allows the plugin's
  executor components to send messages to the plugin's driver
  component easily, using the existing Spark RPC system.

The latter was a feature intentionally left out of the original
plugin design (also because it didn't include a driver component).

To avoid polluting the "org.apache.spark" namespace, I added the new
interfaces to the "org.apache.spark.api" package, which seems like
a better place in any case. The actual implementation is kept in
an internal package.

The change includes unit tests for the new interface and features,
but I've also been running a custom plugin that extends the new
API in real applications.

Closes #26170 from vanzin/SPARK-29397.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-11-04 14:33:17 -08:00
Maxim Gekk 441d4c953e [SPARK-29723][SQL] Get date and time parts of an interval as java classes
### What changes were proposed in this pull request?
I propose 2 new methods for `CalendarInterval`:
- `extractAsPeriod()` returns the date part of an interval as an instance of `java.time.Period`
- `extractAsDuration()` returns the time part of an interval as an instance of `java.time.Duration`

For example:
```scala
scala> import org.apache.spark.unsafe.types.CalendarInterval
scala> import java.time._
scala> val i = spark.sql("select interval 1 year 3 months 4 days 10 hours 30 seconds").collect()(0).getAs[CalendarInterval](0)
scala> LocalDate.of(2019, 11, 1).plus(i.period())
res8: java.time.LocalDate = 2021-02-05
scala> ZonedDateTime.parse("2019-11-01T12:13:14Z").plus(i.extractAsPeriod()).plus(i.extractAsDuration())
res9: java.time.ZonedDateTime = 2021-02-05T22:13:44Z
```

### Why are the changes needed?
Taking into account that `CalendarInterval` has been already partially exposed to users via the collect operation, and probably it will be fully exposed in the future, it could be convenient for users to get the date and time parts of intervals as java classes:
- to avoid unnecessary dependency from Spark's classes in user code
- to easily use external libraries that accept standard Java classes.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
By new test in `CalendarIntervalSuite`.

Closes #26368 from MaxGekk/interval-java-period-duration.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-04 11:07:54 -08:00
Wenchen Fan 326b789340 [SPARK-29743][SQL] sample should set needCopyResult to true if its child is
### What changes were proposed in this pull request?

`SampleExec` has a bug that it sets `needCopyResult` to false as long as the `withReplacement` parameter is false. This causes problems if its child needs to copy the result, e.g. a join.

### Why are the changes needed?

to fix a correctness issue

### Does this PR introduce any user-facing change?

Yes, the result will be corrected.

### How was this patch tested?

a new test

Closes #26387 from cloud-fan/sample-bug.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-04 10:56:37 -08:00
shivusondur eee45f83c6 [SPARK-28809][DOC][SQL] Document SHOW TABLE in SQL Reference
### What changes were proposed in this pull request?
Added the document reference for SHOW TABLE EXTENDED sql command

### Why are the changes needed?
For User reference

### Does this PR introduce any user-facing change?
yes, it provides document reference for SHOW TABLE EXTENDED sql command

### How was this patch tested?
verified in snap
<details>
<summary> Attached the Snap</summary>

![image](https://user-images.githubusercontent.com/7912929/68142029-b4f80680-ff54-11e9-99a0-f39f2dac09e4.png)
![image](https://user-images.githubusercontent.com/7912929/64019738-95f08900-cb4d-11e9-9769-ee2be926fdc1.png)
![image](https://user-images.githubusercontent.com/7912929/64019775-ab65b300-cb4d-11e9-9e7e-140616af7790.png)
![image](https://user-images.githubusercontent.com/7912929/67963910-65000380-fc25-11e9-9cd0-8ee43bf206b1.png)
</details>

Closes #25632 from shivusondur/jiraSHOWTABLE.

Authored-by: shivusondur <shivusondur@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-11-04 11:58:41 -06:00
shivusondur f29a979e42 [SPARK-28798][DOC][SQL] Document DROP TABLE/VIEW statement in SQL Reference
### What changes were proposed in this pull request?
Added doc for DROP TABLE and DROP VIEW sql command

### Why are the changes needed?
For reference DROP TABLE  or DROP VIEW in spark-sql

### Does this PR introduce any user-facing change?
It updates DROP TABLE  or DROP VIEW reference doc

### How was this patch tested?
<details>
<summary> Attached the Snap</summary>

DROP TABLE

![image](https://user-images.githubusercontent.com/7912929/67884038-2443b400-fb6b-11e9-9773-b21dae398789.png)
![image](https://user-images.githubusercontent.com/7912929/67797387-aa96c200-faa7-11e9-90d4-fa8b7c6a4ec7.png)

DROP VIEW
![image](https://user-images.githubusercontent.com/7912929/67797463-c306dc80-faa7-11e9-96ec-e2f2e89d0db8.png)
![image](https://user-images.githubusercontent.com/7912929/67797648-1ed16580-faa8-11e9-9d32-19106326e3d9.png)

</details>

Closes #25533 from shivusondur/jiraUSEDB.

Authored-by: shivusondur <shivusondur@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-11-04 11:52:19 -06:00
angerszhu e524a3a223 [SPARK-29742][BUILD] Update checkstyle plugin's check dir scope
### What changes were proposed in this pull request?
Current checkstyle checking folder can't cover all folder.
Since for support multi version hive, we have some divided hive folder.
We should check it too.

### Why are the changes needed?
Fix build bug

### Does this PR introduce any user-facing change?
NO

### How was this patch tested?
NO

Closes #26385 from AngersZhuuuu/SPARK-29742.

Authored-by: angerszhu <angers.zhu@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-04 09:08:47 -08:00
Kent Yao 44b8fbcc58 [SPARK-29663][SQL] Support sum with interval type values
### What changes were proposed in this pull request?

sum support interval values

### Why are the changes needed?

Part of SPARK-27764 Feature Parity between PostgreSQL and Spark

### Does this PR introduce any user-facing change?

yes, sum can evaluate intervals
### How was this patch tested?

add ut

Closes #26325 from yaooqinn/SPARK-29663.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-05 01:05:07 +08:00
Terry Kim d4ea211187 [SPARK-29678][SQL] ALTER TABLE (ADD PARTITION) should look up catalog/table like v2 commands
### What changes were proposed in this pull request?

Add AlterTableAddPartitionStatement and make ALTER TABLE ... ADD PARTITION go through the same catalog/table resolution framework of v2 commands.

### Why are the changes needed?

It's important to make all the commands have the same table resolution behavior, to avoid confusing end-users. e.g.
```
USE my_catalog
DESC t // success and describe the table t from my_catalog
ALTER TABLE t ADD PARTITION (id=1) // report table not found as there is no table t in the session catalog
```

### Does this PR introduce any user-facing change?

Yes. When running ALTER TABLE ... ADD PARTITION, Spark fails the command if the current catalog is set to a v2 catalog, or the table name specified a v2 catalog.

### How was this patch tested?

Unit tests

Closes #26369 from imback82/spark-29678.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-04 23:56:47 +08:00
shahid 9023c69db8 [SPARK-29590][WEBUI] JDBC/ODBC tab in the spark UI support hide tables, to make it consistent with other tabs
### What changes were proposed in this pull request?

Currently, JDBC/ODBC tab in the WEBUI doesn't support hiding table. Other tabs in the web ui like, Jobs, stages, SQL etc supports hiding table (refer https://github.com/apache/spark/pull/22592).
In this PR, added the support for hide table in the jdbc/odbc tab also.

### Why are the changes needed?
Spark ui about the contents of the form need to have hidden and show features, when the table records very much. Because sometimes you do not care about the record of the table, you just want to see the contents of the next table, but you have to scroll the scroll bar for a long time to see the contents of the next table.

### Does this PR introduce any user-facing change?
No, except support of hide table

### How was this patch tested?
Manually tested
 ![Screenshot 2019-11-01 at 12 10 05 PM](https://user-images.githubusercontent.com/23054875/68007364-61aa5d80-fca1-11e9-841e-c5a7382871fa.png)
![Screenshot 2019-11-01 at 12 10 43 PM](https://user-images.githubusercontent.com/23054875/68007355-5a834f80-fca1-11e9-844a-f4ba1a333db7.png)

Closes #26353 from shahidki31/hideTable.

Authored-by: shahid <shahidki31@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-11-04 09:44:10 -06:00
Kent Yao 8cf76f8d61 [SPARK-29285][SHUFFLE] Temporary shuffle files should be able to handle disk failures
### What changes were proposed in this pull request?

The `getFile` method in `DiskBlockManager` may return a file with an existing subdirectory. But when a disk failure occurs on that subdirectory. this file is inaccessible.
Then the FileNotFoundException like the following usually tear down the entire task, which is a bit heavy.
```
java.io.FileNotFoundException: /mnt/dfs/4/yarn/local/usercache/da_haitao/appcache/application_1568691584183_1953115/blockmgr-cc4689f5-eddd-4b99-8af4-4166a86ec30b/10/temp_shuffle_79be5049-d1d5-4a81-8e67-4ef236d3834f (No such file or directory)
	at java.io.FileOutputStream.open0(Native Method)
	at java.io.FileOutputStream.open(FileOutputStream.java:270)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
	at org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:103)
	at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:116)
	at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:249)
	at org.apache.spark.shuffle.sort.ShuffleExternalSorter.writeSortedFile(ShuffleExternalSorter.java:209)
	at org.apache.spark.shuffle.sort.ShuffleExternalSorter.closeAndGetSpills(ShuffleExternalSorter.java:416)
	at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.closeAndWriteOutput(UnsafeShuffleWriter.java:230)
	at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:190)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:109)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
```
This change pre-touch the temporary file to check whether the parent directory is available or not. If NOT, we may try another possibly heathy disk util we reach the max attempts.
### Why are the changes needed?

Re-running the whole task is much heavier than pick another heathy disk to output the temporary results.

### Does this PR introduce any user-facing change?

NO

### How was this patch tested?

ADD UT

Closes #25962 from yaooqinn/SPARK-29285.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-04 18:21:57 +08:00
Maxim Gekk 50538600ec [SPARK-29736][TESTS] Improve stability of tests for special datetime values
### What changes were proposed in this pull request?
- Retry the tests for special date-time values on failure. The tests can potentially fail when reference values were taken before midnight and test code resolves special values after midnight. The retry can guarantees that the tests run during the same day.
- Simplify getting of the current timestamp via `Instant.now()`. This should avoid any issues of converting current local datetime to an instance. For example, the same local time can be mapped to 2 instants when clocks are turned backward 1 hour on daylight saving date.
- Extract common code to SQLHelper
- Set the tested zoneId to the session time zone in `DateTimeUtilsSuite`.

### Why are the changes needed?
To make the tests more stable.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
By existing test suites `Date`/`TimestampFormatterSuite` and `DateTimeUtilsSuite`.

Closes #26380 from MaxGekk/retry-on-fail.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-04 16:59:32 +08:00
Dongjoon Hyun c55265cd2d [SPARK-29739][PYSPARK][TESTS] Use java instead of cc in test_pipe_functions
### What changes were proposed in this pull request?

This PR aims to replace `cc` with `java` in `test_pipe_functions` of `test_rdd.py`.

### Why are the changes needed?

Currently, `test_rdd.py` assumes `cc` installation during `rdd.pipe` tests.
This requires us to install `gcc` for python testing. If we use `java`, we can have the same test coverage and we don't need to install it because it's already installed in `PySpark` test environment.

This will be helpful when we build a dockerized parallel testing environment.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the existing PySpark tests.

Closes #26383 from dongjoon-hyun/SPARK-29739.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-03 23:03:38 -08:00
Liang-Chi Hsieh afb055ba19 [SPARK-29353][SQL] Fallback AlterTableAlterColumnStatement to v1 AlterTableChangeColumnCommand
### What changes were proposed in this pull request?

If the resolved table is v1 table, AlterTableAlterColumnStatement fallbacks to v1 AlterTableChangeColumnCommand.

### Why are the changes needed?

To make the catalog/table lookup logic consistent.

### Does this PR introduce any user-facing change?

Yes, a ALTER TABLE ALTER COLUMN command previously fails on v1 tables. After this, it falls back to v1 AlterTableChangeColumnCommand.

### How was this patch tested?

Unit test.

Closes #26354 from viirya/SPARK-29353.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-04 15:02:27 +08:00
Maxim Gekk fb60c2a170 [SPARK-29671][SQL] Simplify string representation of intervals
### What changes were proposed in this pull request?
In the PR, I propose to changed `CalendarInterval.toString`:
- to skip the `week` unit
- to convert `milliseconds` and `microseconds` as the fractional part of the `seconds` unit.

### Why are the changes needed?
To improve readability.

### Does this PR introduce any user-facing change?
Yes

### How was this patch tested?
- By `CalendarIntervalSuite` and `IntervalUtilsSuite`
- `literals.sql`, `datetime.sql` and `interval.sql`

Closes #26367 from MaxGekk/interval-to-string-format.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-03 22:56:59 -08:00
wangguangxin.cn 83c39d15e1 [SPARK-29343][SQL] Eliminate sorts without limit in the subquery of Join/Aggregation
### What changes were proposed in this pull request?
This is somewhat a complement of https://github.com/apache/spark/pull/21853.
The `Sort` without `Limit` operator in `Join` subquery is useless, it's the same case in `GroupBy` when the aggregation function is order irrelevant, such as `count`, `sum`.
This PR try to remove this kind of `Sort` operator in `SQL Optimizer`.

### Why are the changes needed?
For example,  `select count(1) from (select a from test1 order by a)` is equal to `select count(1) from (select a from test1)`.
'select * from (select a from test1 order by a) t1 join (select b from test2) t2 on t1.a = t2.b' is equal to `select * from (select a from test1) t1 join (select b from test2) t2 on t1.a = t2.b`.

Remove useless `Sort` operator can improve performance.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Adding new UT `RemoveSortInSubquerySuite.scala`

Closes #26011 from WangGuangxin/remove_sorts.

Authored-by: wangguangxin.cn <wangguangxin.cn@bytedance.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-04 14:52:19 +08:00
Kent Yao 5ba17d09ac [SPARK-29722][SQL] Non reversed keywords should be able to be used in high order functions
### What changes were proposed in this pull request?

Support non-reversed keywords to be used in high order functions.

### Why are the changes needed?

the keywords are non-reversed.

### Does this PR introduce any user-facing change?

yes, all non-reversed keywords can be used in high order function correctly

### How was this patch tested?

add uts

Closes #26366 from yaooqinn/SPARK-29722.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-11-04 14:52:14 +09:00
Liang-Chi Hsieh e7263242bd Revert "[SPARK-24152][R][TESTS] Disable check-cran from run-tests.sh"
### What changes were proposed in this pull request?

This reverts commit 91d990162f.

### Why are the changes needed?

CRAN check is pretty important for R package, we should enable it.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Unit tests.

Closes #26381 from viirya/revert-SPARK-24152.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-03 15:14:58 -08:00
Sean Owen 19b8c71436 [SPARK-29674][CORE] Update dropwizard metrics to 4.1.x for JDK 9+
### What changes were proposed in this pull request?

Update the version of dropwizard metrics that Spark uses for metrics to 4.1.x, from 3.2.x.

### Why are the changes needed?

This helps JDK 9+ support, per for example https://github.com/dropwizard/metrics/pull/1236

### Does this PR introduce any user-facing change?

No, although downstream users with custom metrics may be affected.

### How was this patch tested?

Existing tests.

Closes #26332 from srowen/SPARK-29674.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-03 15:13:06 -08:00
Maxim Gekk 80a89873b2 [SPARK-29733][TESTS] Fix wrong order of parameters passed to assertEquals
### What changes were proposed in this pull request?
The `assertEquals` method of JUnit Assert requires the first parameter to be the expected value. In this PR, I propose to change the order of parameters when the expected value is passed as the second parameter.

### Why are the changes needed?
Wrong order of assert parameters confuses when the assert fails and the parameters have special string representation. For example:
```java
assertEquals(input1.add(input2), new CalendarInterval(5, 5, 367200000000L));
```
```
java.lang.AssertionError:
Expected :interval 5 months 5 days 101 hours
Actual   :interval 5 months 5 days 102 hours
```

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
By existing tests.

Closes #26377 from MaxGekk/fix-order-in-assert-equals.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-03 11:21:28 -08:00
Dongjoon Hyun 4bcfe5033c [SPARK-29731][INFRA] Use public JIRA REST API to read-only access
### What changes were proposed in this pull request?

This PR replaces `jira_client` API call for read-only access with public Apache JIRA REST API invocation.

### Why are the changes needed?

This will reduce the number of authenticated API invocations. I hope this will reduce the chance of CAPCHAR from Apache JIRA site.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Manual.
```
$ echo 26375 > .github-jira-max
$ dev/github_jira_sync.py
Read largest PR number previously seen: 26375
Retrieved 100 JIRA PR's from Github
1 PR's remain after excluding visted ones
Checking issue SPARK-29731
Writing largest PR number seen: 26376
Build PR dictionary
SPARK-29731
26376
Set 26376 with labels "PROJECT INFRA"
```

Closes #26376 from dongjoon-hyun/SPARK-29731.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-03 11:17:53 -08:00
Dongjoon Hyun 1ac6bd9f79 [SPARK-29729][BUILD] Upgrade ASM to 7.2
### What changes were proposed in this pull request?

This PR aims to upgrade ASM to 7.2.
- https://issues.apache.org/jira/browse/XBEAN-322 (Upgrade to ASM 7.2)
- https://asm.ow2.io/versions.html

### Why are the changes needed?

This will bring the following patches.
- 317875: Infinite loop when parsing invalid method descriptor
- 317873: Add support for RET instruction in AdviceAdapter
- 317872: Throw an exception if visitFrame used incorrectly
- add support for Java 14

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with the existing UTs.

Closes #26373 from dongjoon-hyun/SPARK-29729.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-03 10:42:38 -08:00
Dongjoon Hyun 91d990162f [SPARK-24152][R][TESTS] Disable check-cran from run-tests.sh
### What changes were proposed in this pull request?

This PR aims to remove `check-cran` from `run-tests.sh`.
We had better add an independent Jenkins job to run `check-cran`.

### Why are the changes needed?

CRAN instability has been a blocker for our daily dev process.
The following simple check causes consecutive failures in 4 of 9 Jenkins
jobs + PR builder.

```
* checking CRAN incoming feasibility ...Error in
.check_package_CRAN_incoming(pkgdir) :
  dims [product 24] do not match the length of object [0]
```

- spark-branch-2.4-test-sbt-hadoop-2.6
- spark-branch-2.4-test-sbt-hadoop-2.7
- spark-master-test-sbt-hadoop-2.7
- spark-master-test-sbt-hadoop-3.2
- PRBuilder

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Currently, PR builder is failing due to the above issue. This PR should pass the Jenkins.

Closes #26375 from dongjoon-hyun/SPARK-24152.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-02 21:37:40 -07:00
Eric Meisel be022d9aee [SPARK-29677][DSTREAMS] amazon-kinesis-client 1.12.0
### What changes were proposed in this pull request?
Upgrading the amazon-kinesis-client dependency to 1.12.0.

### Why are the changes needed?
The current amazon-kinesis-client version is 1.8.10. This version depends on the use of `describeStream`, which has a hard limit on an AWS account (10 reqs / second). Versions 1.9.0 and up leverage `listShards`, which has no such limit. For large customers, this can be a major problem.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing tests

Closes #26333 from etspaceman/kclUpgrade.

Authored-by: Eric Meisel <eric.steven.meisel@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-11-02 16:42:49 -05:00
Wenchen Fan 31ae446e9c [SPARK-29623][SQL] do not allow multiple unit TO unit statements in interval literal syntax
### What changes were proposed in this pull request?

re-arrange the parser rules to make it clear that multiple unit TO unit statement like `SELECT INTERVAL '1-1' YEAR TO MONTH '2-2' YEAR TO MONTH` is not allowed.

### Why are the changes needed?

This is definitely an accident that we support such a weird syntax in the past. It's not supported by any other DBs and I can't think of any use case of it. Also no test covers this syntax in the current codebase.

### Does this PR introduce any user-facing change?

Yes, and a migration guide item is added.

### How was this patch tested?

new tests.

Closes #26285 from cloud-fan/syntax.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-02 21:35:56 +08:00
dengziming 28ccd31aee [SPARK-29611][WEBUI] Sort Kafka metadata by the number of messages
### What changes were proposed in this pull request?

Sort metadata by the number of messages in each Kafka partition

### Why are the changes needed?

help to find the data skewness problem.

### Does this PR introduce any user-facing change?

Yes, add a column count to the metadata and sort by count
![image](https://user-images.githubusercontent.com/26023240/67617886-63e06800-f81a-11e9-8718-be3a0100952e.png)

If you set `minPartitions` configurations with structure structured-streaming which doesn't have the Streaming page, my code changes in `DirectKafkaInputDStream` won't affect the WEB UI page just as it shows in the follow image

![image](https://user-images.githubusercontent.com/26023240/68020762-79520800-fcda-11e9-96cd-f0c64a36f505.png)

### How was this patch tested?

Manual test

Closes #26266 from dengziming/feature_ui_optimize.

Lead-authored-by: dengziming <dengziming@growingio.com>
Co-authored-by: dengziming <swzmdeng@163.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-01 22:46:34 -07:00
Matt Stillwell 1e1b7302f4 [MINOR][PYSPARK][DOCS] Fix typo in example documentation
### What changes were proposed in this pull request?

I propose that we change the example code documentation to call the proper function .
For example, under the `foreachBatch` function, the example code was calling the `foreach()` function by mistake.

### Why are the changes needed?

I suppose it could confuse some people, and it is a typo

### Does this PR introduce any user-facing change?

No, there is no "meaningful" code being change, simply the documentation

### How was this patch tested?

I made the change on a fork and it still worked

Closes #26299 from mstill3/patch-1.

Authored-by: Matt Stillwell <18670089+mstill3@users.noreply.github.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-01 11:55:29 -07:00