spark-instrumented-optimizer/sql/core/benchmarks/WideSchemaBenchmark-jdk11-results.txt
Maxim Gekk f5118f81e3 [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks
### What changes were proposed in this pull request?
In this PR, I propose to replace `.collect()`, `.count()` and `.foreach(_ => ())` in SQL benchmarks with writes to the `NoOp` datasource. I added an implicit class to `SqlBasedBenchmark` with a `.noop()` method, so a benchmark can simply call `ds.noop()`. That call expands to `ds.write.format("noop").mode(Overwrite).save()`.
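
As a rough sketch of the helper (illustrative names; the actual implicit class is the one added to `SqlBasedBenchmark`, and the built-in `noop` datasource is assumed to be available, i.e. Spark 3.0+):

```
import org.apache.spark.sql.{Dataset, SaveMode}

// Sketch only: an implicit extension that materializes a Dataset by writing it
// to the "noop" datasource, which executes the whole plan but discards all rows.
object NoopBenchmarkHelper {
  implicit class DatasetToNoop[T](val ds: Dataset[T]) extends AnyVal {
    def noop(): Unit = ds.write.format("noop").mode(SaveMode.Overwrite).save()
  }
}
```

With this in scope, a benchmark case ends in `ds.noop()` instead of `ds.collect()`: the query is still fully executed, but no rows are converted to external types or shipped to the driver.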

### Why are the changes needed?
To avoid the additional overhead that `collect()` and other actions introduce. For example, `.collect()` has to convert values to external types and pull the data to the driver. This overhead can hide actual performance regressions or improvements in the benchmarked operations.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Re-ran all modified benchmarks on Amazon EC2.

| Item | Description |
| ---- | ----|
| Region | us-west-2 (Oregon) |
| Instance | r3.xlarge (spot instance) |
| AMI | ami-06f2f779464715dc5 (ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1) |
| Java | OpenJDK 8/11 |

- Ran `TPCDSQueryBenchmark` using the instructions from PR #26049:
```
# `spark-tpcds-datagen` needs this. (JDK8)
$ git clone https://github.com/apache/spark.git -b branch-2.4 --depth 1 spark-2.4
$ cd spark-2.4
$ export SPARK_HOME=$PWD
$ ./build/mvn clean package -DskipTests
$ cd ..

# Generate data. (JDK8)
$ git clone git@github.com:maropu/spark-tpcds-datagen.git
$ cd spark-tpcds-datagen/
$ build/mvn clean package
$ mkdir -p /data/tpcds
$ ./bin/dsdgen --output-location /data/tpcds/s1  # This needs Spark 2.4
```
- The other benchmarks were run by the following script:
```
#!/usr/bin/env python3

import os
from sparktestsupport.shellutils import run_cmd

benchmarks = [
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.AggregateBenchmark'],
    ['avro/test', 'org.apache.spark.sql.execution.benchmark.AvroReadBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.BloomFilterBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.DataSourceReadBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.DateTimeBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.ExtractBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.FilterPushdownBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.InExpressionBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.IntervalBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.JoinBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.MakeDateTimeBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.MiscBenchmark'],
    ['hive/test', 'org.apache.spark.sql.execution.benchmark.ObjectHashAggregateExecBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.OrcNestedSchemaPruningBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.OrcV2NestedSchemaPruningBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.ParquetNestedSchemaPruningBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.RangeBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.UDFBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.WideSchemaBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.benchmark.WideTableBenchmark'],
    ['hive/test', 'org.apache.spark.sql.hive.orc.OrcReadBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.datasources.csv.CSVBenchmark'],
    ['sql/test', 'org.apache.spark.sql.execution.datasources.json.JsonBenchmark']
]

print('Set SPARK_GENERATE_BENCHMARK_FILES=1')
os.environ['SPARK_GENERATE_BENCHMARK_FILES'] = '1'

for b in benchmarks:
    print("Run benchmark: %s" % b[1])
    run_cmd(['build/sbt', '%s:runMain %s' % (b[0], b[1])])
```

Closes #27078 from MaxGekk/noop-in-benchmarks.

Lead-authored-by: Maxim Gekk <max.gekk@gmail.com>
Co-authored-by: Maxim Gekk <maxim.gekk@databricks.com>
Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-01-12 13:18:19 -08:00


================================================================================================
parsing large select expressions
================================================================================================
OpenJDK 64-Bit Server VM 11.0.5+10-post-Ubuntu-0ubuntu1.118.04 on Linux 4.15.0-1044-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
parsing large select:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
1 select expressions                                  8             15           8          0.0     8028037.0       1.0X
100 select expressions                               15             18           3          0.0    14899892.0       0.5X
2500 select expressions                             237            243           8          0.0   237252523.0       0.0X
================================================================================================
many column field read and write
================================================================================================
OpenJDK 64-Bit Server VM 11.0.5+10-post-Ubuntu-0ubuntu1.118.04 on Linux 4.15.0-1044-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
many column field r/w:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
1 cols x 100000 rows (read in-mem)                   59             72           8          1.7         591.0       1.0X
1 cols x 100000 rows (exec in-mem)                   57             81          15          1.8         566.0       1.0X
1 cols x 100000 rows (read parquet)                  61             78          13          1.6         614.8       1.0X
1 cols x 100000 rows (write parquet)                147            158          10          0.7        1468.5       0.4X
100 cols x 1000 rows (read in-mem)                   57             62           6          1.8         565.8       1.0X
100 cols x 1000 rows (exec in-mem)                   76             83          10          1.3         758.7       0.8X
100 cols x 1000 rows (read parquet)                  70             79          10          1.4         700.8       0.8X
100 cols x 1000 rows (write parquet)                150            162          11          0.7        1498.8       0.4X
2500 cols x 40 rows (read in-mem)                   413            424          15          0.2        4134.4       0.1X
2500 cols x 40 rows (exec in-mem)                   753            772          23          0.1        7528.2       0.1X
2500 cols x 40 rows (read parquet)                  304            312           8          0.3        3044.6       0.2X
2500 cols x 40 rows (write parquet)                 507            520          11          0.2        5069.3       0.1X
================================================================================================
wide shallowly nested struct field read and write
================================================================================================
OpenJDK 64-Bit Server VM 11.0.5+10-post-Ubuntu-0ubuntu1.118.04 on Linux 4.15.0-1044-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
wide shallowly nested struct field r/w:   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
1 wide x 100000 rows (read in-mem)                   54             63           8          1.8         540.7       1.0X
1 wide x 100000 rows (exec in-mem)                   67             77          11          1.5         671.8       0.8X
1 wide x 100000 rows (read parquet)                  90             97           6          1.1         901.2       0.6X
1 wide x 100000 rows (write parquet)                150            163          11          0.7        1503.9       0.4X
100 wide x 1000 rows (read in-mem)                   69             75           8          1.4         689.8       0.8X
100 wide x 1000 rows (exec in-mem)                  111            148          96          0.9        1111.5       0.5X
100 wide x 1000 rows (read parquet)                 181            241          35          0.6        1808.7       0.3X
100 wide x 1000 rows (write parquet)                164            180          27          0.6        1636.1       0.3X
2500 wide x 40 rows (read in-mem)                    78            101          84          1.3         781.0       0.7X
2500 wide x 40 rows (exec in-mem)                   943            966          37          0.1        9430.9       0.1X
2500 wide x 40 rows (read parquet)                 1385           1453          95          0.1       13853.3       0.0X
2500 wide x 40 rows (write parquet)                 175            190          19          0.6        1745.5       0.3X
================================================================================================
deeply nested struct field read and write
================================================================================================
OpenJDK 64-Bit Server VM 11.0.5+10-post-Ubuntu-0ubuntu1.118.04 on Linux 4.15.0-1044-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
deeply nested struct field r/w:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
1 deep x 100000 rows (read in-mem)                   44             49           6          2.3         441.1       1.0X
1 deep x 100000 rows (exec in-mem)                   54             59           6          1.9         536.4       0.8X
1 deep x 100000 rows (read parquet)                  65             68           6          1.5         646.1       0.7X
1 deep x 100000 rows (write parquet)                141            147           9          0.7        1413.9       0.3X
100 deep x 1000 rows (read in-mem)                  459            470          11          0.2        4592.9       0.1X
100 deep x 1000 rows (exec in-mem)                 1736           1740           6          0.1       17355.1       0.0X
100 deep x 1000 rows (read parquet)                1638           1643           6          0.1       16382.2       0.0X
100 deep x 1000 rows (write parquet)                555            567          12          0.2        5548.4       0.1X
250 deep x 400 rows (read in-mem)                  2556           2556           1          0.0       25558.5       0.0X
250 deep x 400 rows (exec in-mem)                 10410          10416           8          0.0      104102.6       0.0X
250 deep x 400 rows (read parquet)                 9670           9688          26          0.0       96699.1       0.0X
250 deep x 400 rows (write parquet)                2638           2642           5          0.0       26379.7       0.0X
================================================================================================
bushy struct field read and write
================================================================================================
OpenJDK 64-Bit Server VM 11.0.5+10-post-Ubuntu-0ubuntu1.118.04 on Linux 4.15.0-1044-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
bushy struct field r/w:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
1 x 1 deep x 100000 rows (read in-mem)                39             44           6          2.6         388.2       1.0X
1 x 1 deep x 100000 rows (exec in-mem)                48             50           4          2.1         477.4       0.8X
1 x 1 deep x 100000 rows (read parquet)               47             54           9          2.1         466.1       0.8X
1 x 1 deep x 100000 rows (write parquet)             135            141           5          0.7        1350.5       0.3X
128 x 8 deep x 1000 rows (read in-mem)                45             53           9          2.2         445.2       0.9X
128 x 8 deep x 1000 rows (exec in-mem)               155            160           4          0.6        1553.0       0.2X
128 x 8 deep x 1000 rows (read parquet)              173            217          31          0.6        1729.8       0.2X
128 x 8 deep x 1000 rows (write parquet)             139            154          10          0.7        1389.9       0.3X
1024 x 11 deep x 100 rows (read in-mem)               73             77           4          1.4         730.2       0.5X
1024 x 11 deep x 100 rows (exec in-mem)              733            738           8          0.1        7326.1       0.1X
1024 x 11 deep x 100 rows (read parquet)             652            660           8          0.2        6517.6       0.1X
1024 x 11 deep x 100 rows (write parquet)            171            186          20          0.6        1706.4       0.2X
================================================================================================
wide array field read and write
================================================================================================
OpenJDK 64-Bit Server VM 11.0.5+10-post-Ubuntu-0ubuntu1.118.04 on Linux 4.15.0-1044-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
wide array field r/w:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
1 wide x 100000 rows (read in-mem)                   43             46           4          2.3         429.7       1.0X
1 wide x 100000 rows (exec in-mem)                   54             57           4          1.8         542.4       0.8X
1 wide x 100000 rows (read parquet)                  82             87           8          1.2         816.6       0.5X
1 wide x 100000 rows (write parquet)                137            159          19          0.7        1374.9       0.3X
100 wide x 1000 rows (read in-mem)                   37             39           4          2.7         367.1       1.2X
100 wide x 1000 rows (exec in-mem)                   45             50           6          2.2         451.6       1.0X
100 wide x 1000 rows (read parquet)                  52             57           5          1.9         520.8       0.8X
100 wide x 1000 rows (write parquet)                125            131           8          0.8        1247.0       0.3X
2500 wide x 40 rows (read in-mem)                    35             39           4          2.9         348.8       1.2X
2500 wide x 40 rows (exec in-mem)                    46             49           5          2.2         456.0       0.9X
2500 wide x 40 rows (read parquet)                   51             55           6          2.0         508.3       0.8X
2500 wide x 40 rows (write parquet)                 129            135           6          0.8        1287.3       0.3X
================================================================================================
wide map field read and write
================================================================================================
OpenJDK 64-Bit Server VM 11.0.5+10-post-Ubuntu-0ubuntu1.118.04 on Linux 4.15.0-1044-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
wide map field r/w:                       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
1 wide x 100000 rows (read in-mem)                   39             48           9          2.5         394.2       1.0X
1 wide x 100000 rows (exec in-mem)                   51             56           9          1.9         514.4       0.8X
1 wide x 100000 rows (read parquet)                 119            124           7          0.8        1195.0       0.3X
1 wide x 100000 rows (write parquet)                130            138           8          0.8        1299.8       0.3X
100 wide x 1000 rows (read in-mem)                   31             32           3          3.3         306.5       1.3X
100 wide x 1000 rows (exec in-mem)                   40             42           3          2.5         402.7       1.0X
100 wide x 1000 rows (read parquet)                  65             70           6          1.5         651.8       0.6X
100 wide x 1000 rows (write parquet)                123            129           6          0.8        1228.5       0.3X
2500 wide x 40 rows (read in-mem)                    33             37           6          3.0         330.1       1.2X
2500 wide x 40 rows (exec in-mem)                    43             44           3          2.3         426.6       0.9X
2500 wide x 40 rows (read parquet)                   66             69           9          1.5         657.8       0.6X
2500 wide x 40 rows (write parquet)                 123            127           2          0.8        1234.4       0.3X