spark-instrumented-optimizer/R/pkg/tests/fulltests
Wenchen Fan c1d8178817 [SPARK-35968][SQL] Make sure partitions are not too small in AQE partition coalescing
### What changes were proposed in this pull request?

By default, AQE sets `COALESCE_PARTITIONS_MIN_PARTITION_NUM` to the Spark default parallelism, which is usually quite large. This keeps parallelism on par with non-AQE execution and avoids performance regressions.

However, this usually leads to many small or empty partitions, which hurts performance (although it is no worse than non-AQE). Users often blindly set `COALESCE_PARTITIONS_MIN_PARTITION_NUM` to 1, which makes this config largely useless.

This PR adds a new config that sets the minimum partition size, to avoid partitions that are too small after coalescing. By default, Spark does not respect the target size and only respects this minimum partition size, to maximize parallelism and avoid performance regressions in AQE. This PR also adds a boolean config to respect the target size when coalescing partitions; enabling it is recommended for better overall performance. Finally, this PR deprecates the `COALESCE_PARTITIONS_MIN_PARTITION_NUM` config.
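To make the two coalescing modes concrete, here is a minimal, simplified sketch in Python. It is not Spark's implementation; the function `coalesce_partitions` and its parameters (`advisory_size`, `min_partition_size`, `respect_target_size`, `default_parallelism`) are hypothetical. It assumes that when the target size is not respected, the working target is derived from the default parallelism (total size divided by parallelism), that contiguous partitions are merged greedily with a split only once a group has reached the minimum size, and that a too-small trailing group is folded into its predecessor.

```python
import math
from typing import List, Tuple


def coalesce_partitions(
    sizes: List[int],
    advisory_size: int,
    min_partition_size: int,
    respect_target_size: bool,
    default_parallelism: int,
) -> List[List[int]]:
    """Group contiguous shuffle partitions; returns lists of partition indices.

    Simplified, hypothetical model of AQE coalescing, not Spark's actual code.
    """
    total = sum(sizes)
    if respect_target_size:
        # Honor the user-configured advisory target size.
        target = advisory_size
    else:
        # Parallelism first (assumed default): aim for roughly
        # `default_parallelism` partitions, ignoring the advisory size.
        target = math.ceil(total / default_parallelism)
    # Aiming below the minimum partition size is pointless.
    target = max(target, min_partition_size)

    groups: List[Tuple[List[int], int]] = []
    current: List[int] = []
    current_size = 0
    for i, size in enumerate(sizes):
        # Start a new group only if the current one has reached the minimum
        # size and adding this partition would overshoot the target.
        if current and current_size >= min_partition_size and current_size + size > target:
            groups.append((current, current_size))
            current, current_size = [], 0
        current.append(i)
        current_size += size

    if current:
        if groups and current_size < min_partition_size:
            # Fold a too-small trailing group into its predecessor.
            prev, prev_size = groups.pop()
            groups.append((prev + current, prev_size + current_size))
        else:
            groups.append((current, current_size))
    return [indices for indices, _ in groups]


if __name__ == "__main__":
    sizes = [8] * 8  # eight 8 MB partitions, 64 MB total
    # No minimum size: prints [[0], [1], [2], [3], [4], [5], [6], [7]]
    print(coalesce_partitions(sizes, 32, 0, False, 8))
    # 10 MB minimum: prints [[0, 1], [2, 3], [4, 5], [6, 7]]
    print(coalesce_partitions(sizes, 32, 10, False, 8))
    # Respecting a 32 MB target: prints [[0, 1, 2, 3], [4, 5, 6, 7]]
    print(coalesce_partitions(sizes, 32, 10, True, 8))
```

In this model, the parallelism-first mode keeps many small partitions unless a minimum size is set, while respecting the target size yields fewer, larger partitions, mirroring the trade-off described above.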

### Why are the changes needed?

AQE is now enabled by default, so we should improve performance in the default case.

### Does this PR introduce _any_ user-facing change?

Yes. New configs are added, and `COALESCE_PARTITIONS_MIN_PARTITION_NUM` is deprecated.

### How was this patch tested?

New tests.

Closes #33172 from cloud-fan/aqe2.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 0c9c8ff569)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-07-02 16:07:46 +08:00
| Name | Last commit | Date |
|------|-------------|------|
| data | [SPARK-30645][SPARKR][TESTS][WINDOWS] Move Unicode test data to external file | 2020-01-26 12:59:53 +09:00 |
| jarTest.R | [SPARK-20877][SPARKR] refactor tests to basic tests only for CRAN | 2017-06-11 00:00:33 -07:00 |
| packageInAJarTest.R | [SPARK-20877][SPARKR] refactor tests to basic tests only for CRAN | 2017-06-11 00:00:33 -07:00 |
| test_binary_function.R | [SPARK-22063][R] Fixes lint check failures in R by latest commit sha1 ID of lint-r | 2017-10-01 18:42:45 +09:00 |
| test_binaryFile.R | [SPARK-20877][SPARKR][FOLLOWUP] clean up after test move | 2017-06-11 03:00:44 -07:00 |
| test_broadcast.R | [SPARK-20877][SPARKR][FOLLOWUP] clean up after test move | 2017-06-11 03:00:44 -07:00 |
| test_client.R | [SPARK-26132][BUILD][CORE] Remove support for Scala 2.11 in Spark 3.0.0 | 2019-03-25 10:46:42 -05:00 |
| test_context.R | [SPARK-32036] Replace references to blacklist/whitelist language with more appropriate terminology, excluding the blacklisting feature | 2020-07-15 11:40:55 -05:00 |
| test_includePackage.R | [SPARK-30733][R][HOTFIX] Fix SparkR tests per testthat and R version upgrade, and disable CRAN | 2020-02-05 16:45:54 +09:00 |
| test_jvm_api.R | Spelling r common dev mlib external project streaming resource managers python | 2020-11-27 10:22:45 -06:00 |
| test_mllib_classification.R | [SPARK-35024][ML] Refactor LinearSVC - support virtual centering | 2021-04-25 13:16:46 +08:00 |
| test_mllib_clustering.R | [SPARK-31918][R] Ignore S4 generic methods under SparkR namespace in closure cleaning to support R 4.0.0+ | 2020-06-24 11:03:05 +09:00 |
| test_mllib_fpm.R | [SPARK-19939][ML] Add support for association rules in ML | 2020-06-26 12:55:38 -05:00 |
| test_mllib_recommendation.R | [SPARK-30188][SQL] Resolve the failed unit tests when enable AQE | 2020-01-13 22:55:19 +08:00 |
| test_mllib_regression.R | [SPARK-31918][R] Ignore S4 generic methods under SparkR namespace in closure cleaning to support R 4.0.0+ | 2020-06-24 11:03:05 +09:00 |
| test_mllib_stat.R | [SPARK-20877][SPARKR] refactor tests to basic tests only for CRAN | 2017-06-11 00:00:33 -07:00 |
| test_mllib_tree.R | [SPARK-30543][ML][PYSPARK][R] RandomForest add Param bootstrap to control sampling method | 2020-01-23 16:44:13 +08:00 |
| test_parallelize_collect.R | [SPARK-20877][SPARKR][FOLLOWUP] clean up after test move | 2017-06-11 03:00:44 -07:00 |
| test_rdd.R | [SPARK-22063][R] Fixes lint check failures in R by latest commit sha1 ID of lint-r | 2017-10-01 18:42:45 +09:00 |
| test_Serde.R | Spelling r common dev mlib external project streaming resource managers python | 2020-11-27 10:22:45 -06:00 |
| test_shuffle.R | [SPARK-20877][SPARKR][FOLLOWUP] clean up after test move | 2017-06-11 03:00:44 -07:00 |
| test_sparkR.R | [SPARK-28980][CORE][SQL][STREAMING][MLLIB] Remove most items deprecated in Spark 2.2.0 or earlier, for Spark 3 | 2019-09-09 10:19:40 -05:00 |
| test_sparkSQL.R | [SPARK-35968][SQL] Make sure partitions are not too small in AQE partition coalescing | 2021-07-02 16:07:46 +08:00 |
| test_sparkSQL_arrow.R | [SPARK-35573][R][TESTS] Make SparkR tests pass with R 4.1+ | 2021-06-01 10:35:52 +09:00 |
| test_sparkSQL_eager.R | [SPARKR] found some extra whitespace in the R tests | 2018-10-31 10:32:26 +08:00 |
| test_streaming.R | [SPARK-26120][TESTS][SS][SPARKR] Fix a streaming query leak in Structured Streaming R tests | 2018-11-21 09:31:12 +08:00 |
| test_take.R | [SPARK-20877][SPARKR][FOLLOWUP] clean up after test move | 2017-06-11 03:00:44 -07:00 |
| test_textFile.R | [SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0 | 2020-01-29 10:37:08 +09:00 |
| test_utils.R | [SPARK-35573][R][TESTS] Make SparkR tests pass with R 4.1+ | 2021-06-01 10:35:52 +09:00 |
| test_Windows.R | [SPARK-21616][SPARKR][DOCS] update R migration guide and vignettes | 2018-01-02 07:00:31 +09:00 |