spark-instrumented-optimizer/sql/core
Wenchen Fan 05498af72e [SPARK-31201][SQL] Add an individual config for skewed partition threshold
### What changes were proposed in this pull request?

Skew join handling comes with an overhead: we need to read some data repeatedly. We should treat a partition as skewed only if it is large enough that handling the skew is beneficial.

Currently the size threshold is the advisory partition size, which is 64 MB by default. This is too small to serve as the skewed partition size threshold.

This PR adds a new config for the threshold and sets its default value to 256 MB.
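
As a rough illustration of why the larger threshold matters, here is a minimal, self-contained Python sketch. It is not the Spark implementation. The function names, the median-based check, and the factor of 5 are assumptions made for the example; only the 64 MB and 256 MB values come from the description above.

```python
from statistics import median

MB = 1024 * 1024

# Values taken from the PR description.
ADVISORY_PARTITION_SIZE = 64 * MB       # previously reused as the skew threshold
SKEWED_PARTITION_THRESHOLD = 256 * MB   # new dedicated default

# Assumption for this sketch only: a partition must also be several times
# larger than the median partition to count as skewed.
HYPOTHETICAL_SKEW_FACTOR = 5


def is_skewed(size_bytes, median_bytes, threshold_bytes):
    """Return True if the partition exceeds both the absolute threshold
    and the (hypothetical) factor times the median partition size."""
    return (size_bytes > threshold_bytes
            and size_bytes > HYPOTHETICAL_SKEW_FACTOR * median_bytes)


def skewed_partitions(sizes, threshold_bytes=SKEWED_PARTITION_THRESHOLD):
    """Return the indices of partitions that would be treated as skewed."""
    med = median(sizes)
    return [i for i, size in enumerate(sizes)
            if is_skewed(size, med, threshold_bytes)]


if __name__ == "__main__":
    sizes = [10 * MB, 12 * MB, 14 * MB, 100 * MB, 300 * MB]
    # With the old 64 MB threshold, the 100 MB partition is also flagged.
    print(skewed_partitions(sizes, ADVISORY_PARTITION_SIZE))
    # With the 256 MB threshold, only the 300 MB partition is flagged.
    print(skewed_partitions(sizes))
```

In this sketch the 100 MB partition is split under the old threshold but left alone under the new one, which avoids paying the repeated-read overhead for a partition that is only moderately large.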

### Why are the changes needed?

Avoid skew join handling in cases where it may introduce a perf regression.

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

existing tests

Closes #27967 from cloud-fan/aqe.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-26 22:57:01 +09:00
benchmarks [SPARK-31119][SQL] Add interval value support for extract expression as extract source 2020-03-18 12:29:39 +08:00
src [SPARK-31201][SQL] Add an individual config for skewed partition threshold 2020-03-26 22:57:01 +09:00
v1.2/src [SPARK-31076][SQL] Convert Catalyst's DATE/TIMESTAMP to Java Date/Timestamp via local date-time 2020-03-11 20:53:56 +08:00
v2.3/src [SPARK-31076][SQL] Convert Catalyst's DATE/TIMESTAMP to Java Date/Timestamp via local date-time 2020-03-11 20:53:56 +08:00
pom.xml [SPARK-30984][SS] Add UI test for Structured Streaming UI 2020-03-04 13:55:34 +08:00