spark-instrumented-optimizer/sql/core/benchmarks/WideSchemaBenchmark-results.txt
Maxim Gekk f5118f81e3 [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks
### What changes were proposed in this pull request?
In this PR, I propose to replace `.collect()`, `.count()` and `.foreach(_ => ())` in SQL benchmarks with the `NoOp` datasource. I added an implicit class to `SqlBasedBenchmark` with a `.noop()` method, so a benchmark can simply call `ds.noop()`. The call expands to `ds.write.format("noop").mode(Overwrite).save()`, as sketched below.
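
A minimal, self-contained sketch of this kind of implicit syntax (the real helper lives in `SqlBasedBenchmark`; the object and class names here are made up for illustration):
```
import org.apache.spark.sql.{Dataset, SaveMode}

object NoopSyntax {
  // Hypothetical stand-in for the helper added to SqlBasedBenchmark.
  implicit class DatasetToNoopWriter[T](ds: Dataset[T]) {
    // Forces full evaluation of the Dataset by writing it to the NoOp
    // datasource, which consumes rows on the executors and discards them.
    def noop(): Unit = ds.write.format("noop").mode(SaveMode.Overwrite).save()
  }
}
```
With `import NoopSyntax._` in scope, a benchmark case can end with `ds.noop()` instead of `ds.collect()`.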

### Why are the changes needed?
To avoid the extra overhead that `collect()` and other actions add. For example, `.collect()` has to convert every value to its external type and pull all the data to the driver. That overhead can hide actual performance regressions or improvements in the benchmarked operations, as illustrated by the sketch below.
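
For illustration, a standalone sketch (not code from this PR; the session setup and row count are arbitrary) contrasting the two ways of forcing evaluation of a query:
```
import org.apache.spark.sql.{SaveMode, SparkSession}

object NoopVsCollect {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("noop-vs-collect").getOrCreate()
    val ds = spark.range(10L * 1000 * 1000).selectExpr("id % 10 AS key", "id AS value")

    // Materializing with collect(): every row is converted to an external object
    // and shipped to the driver, so that cost becomes part of the measurement.
    ds.collect()

    // Materializing with the NoOp datasource: the query is fully evaluated on the
    // executors and the rows are discarded, keeping the measurement on the query itself.
    ds.write.format("noop").mode(SaveMode.Overwrite).save()

    spark.stop()
  }
}
```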

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Re-ran all modified benchmarks on Amazon EC2:

| Item | Description |
| ---- | ---- |
| Region | us-west-2 (Oregon) |
| Instance | r3.xlarge (spot instance) |
| AMI | ami-06f2f779464715dc5 (ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1) |
| Java | OpenJDK8/10 |

- Ran `TPCDSQueryBenchmark` using the instructions from PR #26049:
```
# `spark-tpcds-datagen` needs this. (JDK8)
$ git clone https://github.com/apache/spark.git -b branch-2.4 --depth 1 spark-2.4
$ cd spark-2.4
$ export SPARK_HOME=$PWD
$ ./build/mvn clean package -DskipTests

# Generate data. (JDK8)
$ git clone git@github.com:maropu/spark-tpcds-datagen.git
$ cd spark-tpcds-datagen/
$ build/mvn clean package
$ mkdir -p /data/tpcds
$ ./bin/dsdgen --output-location /data/tpcds/s1  # This needs Spark 2.4
```
- Other benchmarks were run by the following script:
```
#!/usr/bin/env python3

import os
from sparktestsupport.shellutils import run_cmd
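
# Note: sparktestsupport lives in the dev/ directory of the Spark repo, so this
# script is assumed to be placed there (or to have dev/ on PYTHONPATH) and to be
# launched from the repo root so that build/sbt resolves correctly.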

benchmarks = [
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.AggregateBenchmark'],
    ['avro/test', 'org.apache.spark.sql.execution.benchmark.AvroReadBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.BloomFilterBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.DataSourceReadBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.DateTimeBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.ExtractBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.FilterPushdownBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.InExpressionBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.IntervalBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.JoinBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.MakeDateTimeBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.MiscBenchmark'],
    ['hive/test', 'org.apache.spark.sql.execution.benchmark.ObjectHashAggregateExecBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.OrcNestedSchemaPruningBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.OrcV2NestedSchemaPruningBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.ParquetNestedSchemaPruningBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.RangeBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.UDFBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.WideSchemaBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.WideTableBenchmark'],
    ['hive/test', 'org.apache.spark.sql.hive.orc.OrcReadBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.datasources.csv.CSVBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.datasources.json.JsonBenchmark']
]

print('Set SPARK_GENERATE_BENCHMARK_FILES=1')
os.environ['SPARK_GENERATE_BENCHMARK_FILES'] = '1'

for b in benchmarks:
    print("Run benchmark: %s" % b[1])
    run_cmd(['build/sbt', '%s:runMain %s' % (b[0], b[1])])
```

Closes #27078 from MaxGekk/noop-in-benchmarks.

Lead-authored-by: Maxim Gekk <max.gekk@gmail.com>
Co-authored-by: Maxim Gekk <maxim.gekk@databricks.com>
Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-01-12 13:18:19 -08:00

================================================================================================
parsing large select expressions
================================================================================================
OpenJDK 64-Bit Server VM 1.8.0_232-8u232-b09-0ubuntu1~18.04.1-b09 on Linux 4.15.0-1044-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
parsing large select:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
1 select expressions                                  5             13           8          0.0     5370143.0       1.0X
100 select expressions                               12             16           6          0.0    11995425.0       0.4X
2500 select expressions                             211            214           4          0.0   210927791.0       0.0X

================================================================================================
many column field read and write
================================================================================================
OpenJDK 64-Bit Server VM 1.8.0_232-8u232-b09-0ubuntu1~18.04.1-b09 on Linux 4.15.0-1044-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
many column field r/w:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
1 cols x 100000 rows (read in-mem)                   44             53           6          2.3         440.3       1.0X
1 cols x 100000 rows (exec in-mem)                   44             54           9          2.3         437.0       1.0X
1 cols x 100000 rows (read parquet)                  53             61          10          1.9         532.4       0.8X
1 cols x 100000 rows (write parquet)                129            142          36          0.8        1291.6       0.3X
100 cols x 1000 rows (read in-mem)                   49             55           7          2.0         494.9       0.9X
100 cols x 1000 rows (exec in-mem)                   69             73           5          1.4         693.2       0.6X
100 cols x 1000 rows (read parquet)                  60             67           8          1.7         596.3       0.7X
100 cols x 1000 rows (write parquet)                142            156          31          0.7        1417.8       0.3X
2500 cols x 40 rows (read in-mem)                   391            399          13          0.3        3912.6       0.1X
2500 cols x 40 rows (exec in-mem)                   743            749           8          0.1        7432.5       0.1X
2500 cols x 40 rows (read parquet)                  297            310          10          0.3        2972.8       0.1X
2500 cols x 40 rows (write parquet)                 485            492          16          0.2        4848.1       0.1X

================================================================================================
wide shallowly nested struct field read and write
================================================================================================
OpenJDK 64-Bit Server VM 1.8.0_232-8u232-b09-0ubuntu1~18.04.1-b09 on Linux 4.15.0-1044-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
wide shallowly nested struct field r/w:   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
1 wide x 100000 rows (read in-mem)                   43             48           6          2.3         427.0       1.0X
1 wide x 100000 rows (exec in-mem)                   56             63           8          1.8         557.8       0.8X
1 wide x 100000 rows (read parquet)                  82             88          10          1.2         818.6       0.5X
1 wide x 100000 rows (write parquet)                134            145          21          0.7        1344.6       0.3X
100 wide x 1000 rows (read in-mem)                   55             61          16          1.8         553.1       0.8X
100 wide x 1000 rows (exec in-mem)                   94            101          17          1.1         941.4       0.5X
100 wide x 1000 rows (read parquet)                 151            179          29          0.7        1511.7       0.3X
100 wide x 1000 rows (write parquet)                147            157           9          0.7        1470.0       0.3X
2500 wide x 40 rows (read in-mem)                    66             69           9          1.5         658.9       0.6X
2500 wide x 40 rows (exec in-mem)                   853            871          30          0.1        8525.7       0.1X
2500 wide x 40 rows (read parquet)                 1158           1296         195          0.1       11577.8       0.0X
2500 wide x 40 rows (write parquet)                 157            173          23          0.6        1569.6       0.3X

================================================================================================
deeply nested struct field read and write
================================================================================================
OpenJDK 64-Bit Server VM 1.8.0_232-8u232-b09-0ubuntu1~18.04.1-b09 on Linux 4.15.0-1044-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
deeply nested struct field r/w:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
1 deep x 100000 rows (read in-mem)                   37             41           6          2.7         374.5       1.0X
1 deep x 100000 rows (exec in-mem)                   47             50           6          2.1         466.9       0.8X
1 deep x 100000 rows (read parquet)                  58             61           7          1.7         577.7       0.6X
1 deep x 100000 rows (write parquet)                128            134          18          0.8        1282.2       0.3X
100 deep x 1000 rows (read in-mem)                  345            350           5          0.3        3447.8       0.1X
100 deep x 1000 rows (exec in-mem)                 1283           1283           0          0.1       12830.5       0.0X
100 deep x 1000 rows (read parquet)                1201           1205           7          0.1       12005.2       0.0X
100 deep x 1000 rows (write parquet)                436            443           9          0.2        4361.4       0.1X
250 deep x 400 rows (read in-mem)                  1882           1883           1          0.1       18819.9       0.0X
250 deep x 400 rows (exec in-mem)                  7705           7709           5          0.0       77054.4       0.0X
250 deep x 400 rows (read parquet)                 7052           7087          50          0.0       70517.1       0.0X
250 deep x 400 rows (write parquet)                1978           1979           1          0.1       19780.3       0.0X

================================================================================================
bushy struct field read and write
================================================================================================
OpenJDK 64-Bit Server VM 1.8.0_232-8u232-b09-0ubuntu1~18.04.1-b09 on Linux 4.15.0-1044-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
bushy struct field r/w:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
1 x 1 deep x 100000 rows (read in-mem)                34             39           7          2.9         341.5       1.0X
1 x 1 deep x 100000 rows (exec in-mem)                42             45           5          2.4         423.4       0.8X
1 x 1 deep x 100000 rows (read parquet)               42             45           6          2.4         423.8       0.8X
1 x 1 deep x 100000 rows (write parquet)             124            132          19          0.8        1240.4       0.3X
128 x 8 deep x 1000 rows (read in-mem)                39             42           6          2.6         387.3       0.9X
128 x 8 deep x 1000 rows (exec in-mem)               134            138           6          0.7        1342.5       0.3X
128 x 8 deep x 1000 rows (read parquet)              147            164          27          0.7        1468.2       0.2X
128 x 8 deep x 1000 rows (write parquet)             130            142          34          0.8        1297.7       0.3X
1024 x 11 deep x 100 rows (read in-mem)               64             68          11          1.6         639.3       0.5X
1024 x 11 deep x 100 rows (exec in-mem)              642            652          14          0.2        6416.9       0.1X
1024 x 11 deep x 100 rows (read parquet)             527            531           5          0.2        5268.1       0.1X
1024 x 11 deep x 100 rows (write parquet)            155            166          28          0.6        1545.0       0.2X

================================================================================================
wide array field read and write
================================================================================================
OpenJDK 64-Bit Server VM 1.8.0_232-8u232-b09-0ubuntu1~18.04.1-b09 on Linux 4.15.0-1044-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
wide array field r/w:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
1 wide x 100000 rows (read in-mem)                   36             39           5          2.7         364.2       1.0X
1 wide x 100000 rows (exec in-mem)                   46             50           7          2.2         460.4       0.8X
1 wide x 100000 rows (read parquet)                  75             78           8          1.3         749.8       0.5X
1 wide x 100000 rows (write parquet)                127            133          19          0.8        1266.0       0.3X
100 wide x 1000 rows (read in-mem)                   31             33           4          3.2         309.9       1.2X
100 wide x 1000 rows (exec in-mem)                   40             42           4          2.5         397.3       0.9X
100 wide x 1000 rows (read parquet)                  49             52           7          2.0         488.6       0.7X
100 wide x 1000 rows (write parquet)                122            135          23          0.8        1216.2       0.3X
2500 wide x 40 rows (read in-mem)                    31             32           3          3.3         305.7       1.2X
2500 wide x 40 rows (exec in-mem)                    39             42           5          2.6         391.9       0.9X
2500 wide x 40 rows (read parquet)                   48             51           7          2.1         482.9       0.8X
2500 wide x 40 rows (write parquet)                 120            130          22          0.8        1203.6       0.3X

================================================================================================
wide map field read and write
================================================================================================
OpenJDK 64-Bit Server VM 1.8.0_232-8u232-b09-0ubuntu1~18.04.1-b09 on Linux 4.15.0-1044-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
wide map field r/w:                       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
1 wide x 100000 rows (read in-mem)                   35             40           8          2.9         348.8       1.0X
1 wide x 100000 rows (exec in-mem)                   46             47           2          2.2         461.8       0.8X
1 wide x 100000 rows (read parquet)                 124            127           7          0.8        1236.1       0.3X
1 wide x 100000 rows (write parquet)                125            138          26          0.8        1245.4       0.3X
100 wide x 1000 rows (read in-mem)                   26             35           8          3.8         263.1       1.3X
100 wide x 1000 rows (exec in-mem)                   35             41          10          2.8         351.8       1.0X
100 wide x 1000 rows (read parquet)                  59             62           8          1.7         586.7       0.6X
100 wide x 1000 rows (write parquet)                116            125          32          0.9        1158.2       0.3X
2500 wide x 40 rows (read in-mem)                    27             30           5          3.7         270.2       1.3X
2500 wide x 40 rows (exec in-mem)                    37             38           3          2.7         366.4       1.0X
2500 wide x 40 rows (read parquet)                   58             62           8          1.7         584.3       0.6X
2500 wide x 40 rows (write parquet)                 118            126          24          0.9        1176.1       0.3X