spark-instrumented-optimizer/sql/catalyst
Wenchen Fan 9a539d5846 [SPARK-36430][SQL] Adaptively calculate the target size when coalescing shuffle partitions in AQE
### What changes were proposed in this pull request?

This PR fixes a performance regression introduced in https://github.com/apache/spark/pull/33172

Before #33172 , the target size is adaptively calculated based on the default parallelism of the spark cluster. Sometimes it's very small and #33172 sets a min partition size to fix perf issues. Sometimes the calculated size is reasonable, such as dozens of MBs.

After #33172 , we no longer calculate the target size adaptively, and by default always coalesce the partitions into 1 MB. This can cause perf regression if the adaptively calculated size is reasonable.

This PR brings back the code that adaptively calculate the target size based on the default parallelism of the spark cluster.

### Why are the changes needed?

fix perf regression

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests

Closes #33655 from cloud-fan/minor.

Lead-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-08-09 17:25:55 +08:00
..
benchmarks [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
src [SPARK-36430][SQL] Adaptively calculate the target size when coalescing shuffle partitions in AQE 2021-08-09 17:25:55 +08:00
pom.xml [SPARK-34309][BUILD][CORE][SQL][K8S] Use Caffeine instead of Guava Cache 2021-08-04 12:01:44 -07:00