spark-instrumented-optimizer/sql/core/benchmarks/RangeBenchmark-results.txt
Wenchen Fan 34f229bc21 [SPARK-25710][SQL] range should report metrics correctly
## What changes were proposed in this pull request?

Currently `Range` reports metrics at batch granularity. This is acceptable, but it's better if we can make it row granularity without a performance penalty.

Before this PR, the metrics were updated when preparing the batch, which is before the data is actually consumed. With this PR, the metrics are updated after the data is consumed. There are 2 different cases:
1. The data processing loop has a stop check: the metrics are updated when we need to stop.
2. The loop has no stop check: the metrics are updated after the loop finishes.
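The two cases above can be sketched in plain Java (a minimal, hypothetical illustration of the metric-update placement, not the actual generated code; `numOutputRows` here stands in for Spark's `SQLMetric`):

```java
public class MetricsSketch {
    // Hypothetical counter standing in for the SQLMetric "number of output rows".
    static long numOutputRows = 0;

    // Case 1: the consume loop has a stop check (e.g. a LIMIT downstream).
    // The metric is updated with the rows consumed so far at the stop point,
    // instead of being bumped by the whole batch size up front.
    static long consumeWithStopCheck(long batchSize, long stopAfter) {
        long consumed = 0;
        for (long i = 0; i < batchSize; i++) {
            consumed++;
            if (consumed >= stopAfter) {       // stop check
                numOutputRows += consumed;     // metric updated when stopping
                return consumed;
            }
        }
        numOutputRows += consumed;             // batch exhausted before the stop
        return consumed;
    }

    // Case 2: no stop check — the whole batch is always consumed,
    // so the metric is updated once, after the loop.
    static long consumeAll(long batchSize) {
        long consumed = 0;
        for (long i = 0; i < batchSize; i++) {
            consumed++;
        }
        numOutputRows += consumed;             // metric updated after the loop
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println(consumeWithStopCheck(1000, 10)); // stops early: 10
        System.out.println(consumeAll(1000));               // full batch: 1000
        System.out.println(numOutputRows);                  // 1010
    }
}
```

Either way the metric reflects rows actually consumed, and the update is a single addition per batch (or per stop), so there is no per-row overhead.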

## How was this patch tested?

Existing tests and a new benchmark.

Closes #22698 from cloud-fan/range.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2018-10-13 13:55:28 +08:00


================================================================================================
range
================================================================================================
Java HotSpot(TM) 64-Bit Server VM 1.8.0_161-b12 on Mac OS X 10.13.6
Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
range:                                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
full scan                                    12674 / 12840         41.4          24.2       1.0X
limit after range                               33 /    37      15900.2           0.1     384.4X
filter after range                             969 /   985        541.0           1.8      13.1X
count after range                               42 /    42      12510.5           0.1     302.4X
count after limit after range                   32 /    33      16337.0           0.1     394.9X