cb501771fa
### What changes were proposed in this pull request?

This PR aims to do the following.
- Refactor `TPCDSQueryBenchmark` to use a main method to improve usability.
- Reduce the number of iterations from 5 to 2 because the benchmark takes too long. (2 is okay because we have the `Stdev` field now. If there is an irregular run, we can notice it easily with that.)
- Generate one result file for TPCDS scale factor 1. (Note that this test suite can be used for other scale factors, too.)
- AWS EC2 `r3.xlarge` with `ami-06f2f779464715dc5 (ubuntu-bionic-18.04-amd64-server-20190722.1)` is used.

This PR adds a JDK8 result based on the TPCDS ScaleFactor 1G data generated by the following.

```
# `spark-tpcds-datagen` needs this. (JDK8)
$ git clone https://github.com/apache/spark.git -b branch-2.4 --depth 1 spark-2.4
$ cd spark-2.4
$ export SPARK_HOME=$PWD
$ ./build/mvn clean package -DskipTests

# Generate data. (JDK8)
$ git clone git@github.com:maropu/spark-tpcds-datagen.git
$ cd spark-tpcds-datagen/
$ build/mvn clean package
$ mkdir -p /data/tpcds
$ ./bin/dsdgen --output-location /data/tpcds/s1  # This needs Spark 2.4
```

### Why are the changes needed?

Although the generated TPCDS data is random, checking in the result file keeps a record of the benchmark.

### Does this PR introduce any user-facing change?

No. (This is a dev-only test benchmark.)

### How was this patch tested?

Ran the benchmark manually. Please note that you need to have TPCDS data.

```
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.TPCDSQueryBenchmark --data-location /data/tpcds/s1"
```

Closes #26049 from dongjoon-hyun/SPARK-25668.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>

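
Because the benchmark needs pre-generated TPCDS data, it can help to check the data directory before starting a long sbt run. The following is a minimal sketch, not part of this PR: the function name `tpcds_benchmark_cmd` is hypothetical, and it only prints the benchmark command quoted in the description above rather than running it.

```shell
# Hypothetical helper (an assumption, not part of the PR): checks that the
# TPCDS data directory exists, then prints the benchmark command from the
# PR description so it can be reviewed before running.
tpcds_benchmark_cmd() {
  local data_dir="${1:-}"
  if [ -z "$data_dir" ] || [ ! -d "$data_dir" ]; then
    echo "TPCDS data directory not found: $data_dir" >&2
    return 1
  fi
  # The command text is copied from the PR description; only the data
  # location is substituted.
  printf 'SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.TPCDSQueryBenchmark --data-location %s"\n' "$data_dir"
}
```

Calling `tpcds_benchmark_cmd /data/tpcds/s1` from the Spark source root prints the command to run, or returns a non-zero status with a message on stderr if the directory is missing.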
||
---|---|---|
benchmarks | ||
src | ||
v1.2.1/src | ||
v2.3.5/src | ||
pom.xml |