1e0c006748
### What changes were proposed in this pull request?

This PR adds a default parallelism configuration (`spark.sql.default.parallelism`) for Spark SQL and makes it effective for `LocalTableScan`.

### Why are the changes needed?

To avoid generating many small files when running `INSERT INTO TABLE` from `VALUES`. For example:

```sql
CREATE TABLE t1(id int) USING parquet;
INSERT INTO TABLE t1 VALUES (1), (2), (3), (4), (5), (6), (7), (8);
```

Before this PR:

```
-rw-r--r-- 1 root root 421 Dec 1 01:54 part-00000-4d5a3a89-2995-4328-b2ae-908febbbaf4a-c000.snappy.parquet
-rw-r--r-- 1 root root 421 Dec 1 01:54 part-00001-4d5a3a89-2995-4328-b2ae-908febbbaf4a-c000.snappy.parquet
-rw-r--r-- 1 root root 421 Dec 1 01:54 part-00002-4d5a3a89-2995-4328-b2ae-908febbbaf4a-c000.snappy.parquet
-rw-r--r-- 1 root root 421 Dec 1 01:54 part-00003-4d5a3a89-2995-4328-b2ae-908febbbaf4a-c000.snappy.parquet
-rw-r--r-- 1 root root 421 Dec 1 01:54 part-00004-4d5a3a89-2995-4328-b2ae-908febbbaf4a-c000.snappy.parquet
-rw-r--r-- 1 root root 421 Dec 1 01:54 part-00005-4d5a3a89-2995-4328-b2ae-908febbbaf4a-c000.snappy.parquet
-rw-r--r-- 1 root root 421 Dec 1 01:54 part-00006-4d5a3a89-2995-4328-b2ae-908febbbaf4a-c000.snappy.parquet
-rw-r--r-- 1 root root 421 Dec 1 01:54 part-00007-4d5a3a89-2995-4328-b2ae-908febbbaf4a-c000.snappy.parquet
-rw-r--r-- 1 root root 0 Dec 1 01:54 _SUCCESS
```

After this PR, with `spark.sql.files.minPartitionNum` set to 1:

```
-rw-r--r-- 1 root root 452 Dec 1 01:59 part-00000-6de50c79-e305-4f8d-b6ae-39f46b2619c6-c000.snappy.parquet
-rw-r--r-- 1 root root 0 Dec 1 01:59 _SUCCESS
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #30559 from wangyum/SPARK-33617.

Lead-authored-by: Yuming Wang <yumwang@ebay.com>
Co-authored-by: Yuming Wang <yumwang@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
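
The file counts in the listings above can be illustrated with a small model. This is only a sketch, not Spark's code: it assumes the rows of a local table are split into contiguous, near-equal slices, that the number of slices is capped by both the row count and the configured parallelism, and that each non-empty slice is written as one output file. The names `slice_rows` and `files_written` are hypothetical, and the "before" parallelism of 8 is an assumed value chosen to match the eight files shown.

```python
def slice_rows(rows, num_slices):
    """Split rows into num_slices contiguous chunks of near-equal size."""
    n = len(rows)
    return [rows[i * n // num_slices:(i + 1) * n // num_slices]
            for i in range(num_slices)]


def files_written(rows, parallelism):
    """Return the row groups that would each become one output file."""
    # Assumption: never more partitions than rows, and at least one partition.
    num_partitions = min(max(len(rows), 1), parallelism)
    partitions = slice_rows(rows, num_partitions)
    # Assumption: only non-empty partitions produce a file.
    return [p for p in partitions if p]


rows = list(range(1, 9))                     # VALUES (1), (2), ..., (8)
before = files_written(rows, parallelism=8)  # assumed parallelism before the change
after = files_written(rows, parallelism=1)   # parallelism limited to 1

print(len(before), len(after))
```

Under these assumptions, eight rows at a parallelism of 8 land in eight single-row partitions, while a parallelism of 1 keeps all rows in one partition and therefore one file.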
- benchmarks
- src
- pom.xml