spark-instrumented-optimizer/sql/core
Dongjoon Hyun cb501771fa [SPARK-25668][SQL][TESTS] Refactor TPCDSQueryBenchmark to use main method
### What changes were proposed in this pull request?

This PR aims the followings.
- Refactor `TPCDSQueryBenchmark` to use main method to improve the usability.
- Reduce the number of iteration from 5 to 2 because it takes too long. (2 is okay because we have `Stdev` field now. If there is an irregular run, we can notice easily with that).
- Generate one result file for TPCDS scale factor 1. (Note that this test suite can be used for the other scale factors, too.)
  - AWS EC2 `r3.xlarge` with `ami-06f2f779464715dc5 (ubuntu-bionic-18.04-amd64-server-20190722.1)` is used.

This PR adds a JDK8 result based on the TPCDS ScaleFactor 1G data generated by the following.
```
# `spark-tpcds-datagen` needs this. (JDK8)
$ git clone https://github.com/apache/spark.git -b branch-2.4 --depth 1 spark-2.4
$ export SPARK_HOME=$PWD
$ ./build/mvn clean package -DskipTests

# Generate data. (JDK8)
$ git clone gitgithub.com:maropu/spark-tpcds-datagen.git
$ cd spark-tpcds-datagen/
$ build/mvn clean package
$ mkdir -p /data/tpcds
$ ./bin/dsdgen --output-location /data/tpcds/s1  // This need `Spark 2.4`
```

### Why are the changes needed?

Although the generated TPCDS data is random, we can keep the record.

### Does this PR introduce any user-facing change?

No. (This is dev-only test benchmark).

### How was this patch tested?

Manually run the benchmark. Please note that you need to have TPCDS data.
```
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.TPCDSQueryBenchmark --data-location /data/tpcds/s1"
```

Closes #26049 from dongjoon-hyun/SPARK-25668.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-10-08 13:33:42 +09:00
..
benchmarks [SPARK-25668][SQL][TESTS] Refactor TPCDSQueryBenchmark to use main method 2019-10-08 13:33:42 +09:00
src [SPARK-25668][SQL][TESTS] Refactor TPCDSQueryBenchmark to use main method 2019-10-08 13:33:42 +09:00
v1.2.1/src [SPARK-28744][SQL][TEST] rename SharedSQLContext to SharedSparkSession 2019-08-19 19:01:56 +08:00
v2.3.5/src [SPARK-28744][SQL][TEST] rename SharedSQLContext to SharedSparkSession 2019-08-19 19:01:56 +08:00
pom.xml [SPARK-29296][BUILD][CORE] Remove use of .par to make 2.13 support easier; add scala-2.13 profile to enable pulling in par collections library separately, for the future 2019-10-03 08:56:08 -05:00