spark-instrumented-optimizer/sql/core
Yuming Wang 1e0c006748 [SPARK-33617][SQL] Add default parallelism configuration for Spark SQL queries
### What changes were proposed in this pull request?

This pr add default parallelism configuration(`spark.sql.default.parallelism`) for Spark SQL and make it effective for `LocalTableScan`.

### Why are the changes needed?

Avoid generating small files for INSERT INTO TABLE from VALUES, for example:
```sql
CREATE TABLE t1(id int) USING parquet;
INSERT INTO TABLE t1 VALUES (1), (2), (3), (4), (5), (6), (7), (8);
```

Before this pr:
```
-rw-r--r-- 1 root root 421 Dec  1 01:54 part-00000-4d5a3a89-2995-4328-b2ae-908febbbaf4a-c000.snappy.parquet
-rw-r--r-- 1 root root 421 Dec  1 01:54 part-00001-4d5a3a89-2995-4328-b2ae-908febbbaf4a-c000.snappy.parquet
-rw-r--r-- 1 root root 421 Dec  1 01:54 part-00002-4d5a3a89-2995-4328-b2ae-908febbbaf4a-c000.snappy.parquet
-rw-r--r-- 1 root root 421 Dec  1 01:54 part-00003-4d5a3a89-2995-4328-b2ae-908febbbaf4a-c000.snappy.parquet
-rw-r--r-- 1 root root 421 Dec  1 01:54 part-00004-4d5a3a89-2995-4328-b2ae-908febbbaf4a-c000.snappy.parquet
-rw-r--r-- 1 root root 421 Dec  1 01:54 part-00005-4d5a3a89-2995-4328-b2ae-908febbbaf4a-c000.snappy.parquet
-rw-r--r-- 1 root root 421 Dec  1 01:54 part-00006-4d5a3a89-2995-4328-b2ae-908febbbaf4a-c000.snappy.parquet
-rw-r--r-- 1 root root 421 Dec  1 01:54 part-00007-4d5a3a89-2995-4328-b2ae-908febbbaf4a-c000.snappy.parquet
-rw-r--r-- 1 root root   0 Dec  1 01:54 _SUCCESS
```

After this pr and set `spark.sql.files.minPartitionNum` to 1:
```
-rw-r--r-- 1 root root 452 Dec  1 01:59 part-00000-6de50c79-e305-4f8d-b6ae-39f46b2619c6-c000.snappy.parquet
-rw-r--r-- 1 root root   0 Dec  1 01:59 _SUCCESS
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #30559 from wangyum/SPARK-33617.

Lead-authored-by: Yuming Wang <yumwang@ebay.com>
Co-authored-by: Yuming Wang <yumwang@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-12-07 21:36:52 +09:00
..
benchmarks [SPARK-33523][SQL][TEST][FOLLOWUP] Fix benchmark case name in SubExprEliminationBenchmark 2020-11-25 15:22:47 -08:00
src [SPARK-33617][SQL] Add default parallelism configuration for Spark SQL queries 2020-12-07 21:36:52 +09:00
pom.xml [SPARK-33662][BUILD] Setting version to 3.2.0-SNAPSHOT 2020-12-04 14:10:42 -08:00