spark-instrumented-optimizer/sql/core/benchmarks/InsertTableWithDynamicPartitionsBenchmark-results.txt
yangjie01 b38f3a5557 [SPARK-32978][SQL] Make sure the number of dynamic part metric is correct
### What changes were proposed in this pull request?

The purpose of this pr is to resolve SPARK-32978.

The main reason of bad case describe in SPARK-32978 is the `BasicWriteTaskStatsTracker` directly reports the new added partition number of each task, which makes it impossible to remove duplicate data in driver side.

The main of this pr is change to report partitionValues to driver and remove duplicate data at driver side to make sure the number of dynamic part metric is correct.

### Why are the changes needed?
The the number of dynamic part metric we display on the UI should be correct.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Add a new test case refer to described in SPARK-32978

Closes #30026 from LuciferYang/SPARK-32978.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-10-22 14:01:07 +00:00

9 lines
757 B
Plaintext

OpenJDK 64-Bit Server VM 1.8.0_232-b18 on Mac OS X 10.15.7
Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz
dynamic insert table benchmark, totalRows = 200000: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
----------------------------------------------------------------------------------------------------------------------------------
one partition column, 100 partitions 23370 23588 309 0.0 116848.3 1.0X
two partition columns, 500 partitions 37686 38079 555 0.0 188432.2 0.6X
three partition columns, 2000 partitions 112489 113049 792 0.0 562446.1 0.2X