cb501771fa
### What changes were proposed in this pull request?

This PR aims to do the following.
- Refactor `TPCDSQueryBenchmark` to use a main method to improve usability.
- Reduce the number of iterations from 5 to 2 because the run takes too long. (2 is okay because we have the `Stdev` field now. If there is an irregular run, we can easily notice it with that.)
- Generate one result file for TPCDS scale factor 1. (Note that this test suite can be used for other scale factors, too.)
- AWS EC2 `r3.xlarge` with `ami-06f2f779464715dc5 (ubuntu-bionic-18.04-amd64-server-20190722.1)` is used.

This PR adds a JDK8 result based on the TPCDS ScaleFactor 1G data generated by the following.
```
# `spark-tpcds-datagen` needs this. (JDK8)
$ git clone https://github.com/apache/spark.git -b branch-2.4 --depth 1 spark-2.4
$ cd spark-2.4
$ export SPARK_HOME=$PWD
$ ./build/mvn clean package -DskipTests

# Generate data. (JDK8)
$ git clone git@github.com:maropu/spark-tpcds-datagen.git
$ cd spark-tpcds-datagen/
$ build/mvn clean package
$ mkdir -p /data/tpcds
$ ./bin/dsdgen --output-location /data/tpcds/s1  # This needs `Spark 2.4`
```

### Why are the changes needed?

Although the generated TPCDS data is random, we can keep the record.

### Does this PR introduce any user-facing change?

No. (This is a dev-only test benchmark.)

### How was this patch tested?

Manually ran the benchmark. Please note that you need to have TPCDS data.
```
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.TPCDSQueryBenchmark --data-location /data/tpcds/s1"
```

Closes #26049 from dongjoon-hyun/SPARK-25668.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>

- AggregateBenchmark-jdk11-results.txt
- AggregateBenchmark-results.txt
- BloomFilterBenchmark-jdk11-results.txt
- BloomFilterBenchmark-results.txt
- BuiltInDataSourceWriteBenchmark-jdk11-results.txt
- BuiltInDataSourceWriteBenchmark-results.txt
- ColumnarBatchBenchmark-jdk11-results.txt
- ColumnarBatchBenchmark-results.txt
- CompressionSchemeBenchmark-jdk11-results.txt
- CompressionSchemeBenchmark-results.txt
- CSVBenchmark-jdk11-results.txt
- CSVBenchmark-results.txt
- DatasetBenchmark-jdk11-results.txt
- DatasetBenchmark-results.txt
- DataSourceReadBenchmark-jdk11-results.txt
- DataSourceReadBenchmark-results.txt
- DateTimeBenchmark-jdk11-results.txt
- DateTimeBenchmark-results.txt
- ExternalAppendOnlyUnsafeRowArrayBenchmark-jdk11-results.txt
- ExternalAppendOnlyUnsafeRowArrayBenchmark-results.txt
- ExtractBenchmark-jdk11-results.txt
- ExtractBenchmark-results.txt
- FilterPushdownBenchmark-results.txt
- HashedRelationMetricsBenchmark-jdk11-results.txt
- HashedRelationMetricsBenchmark-results.txt
- InExpressionBenchmark-results.txt
- JoinBenchmark-jdk11-results.txt
- JoinBenchmark-results.txt
- JSONBenchmark-jdk11-results.txt
- JsonBenchmark-results.txt
- MakeDateTimeBenchmark-jdk11-results.txt
- MakeDateTimeBenchmark-results.txt
- MiscBenchmark-jdk11-results.txt
- MiscBenchmark-results.txt
- OrcNestedSchemaPruningBenchmark-jdk11-results.txt
- OrcNestedSchemaPruningBenchmark-results.txt
- OrcV2NestedSchemaPruningBenchmark-jdk11-results.txt
- OrcV2NestedSchemaPruningBenchmark-results.txt
- ParquetNestedSchemaPruningBenchmark-jdk11-results.txt
- ParquetNestedSchemaPruningBenchmark-results.txt
- PrimitiveArrayBenchmark-jdk11-results.txt
- PrimitiveArrayBenchmark-results.txt
- RangeBenchmark-jdk11-results.txt
- RangeBenchmark-results.txt
- SortBenchmark-jdk11-results.txt
- SortBenchmark-results.txt
- TPCDSQueryBenchmark-results.txt
- UDFBenchmark-jdk11-results.txt
- UDFBenchmark-results.txt
- UnsafeArrayDataBenchmark-jdk11-results.txt
- UnsafeArrayDataBenchmark-results.txt
- WideSchemaBenchmark-results.txt
- WideTableBenchmark-jdk11-results.txt
- WideTableBenchmark-results.txt
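
The commit message above notes that `TPCDSQueryBenchmark` needs existing TPCDS data at the path passed via `--data-location`. A minimal sketch of a wrapper that fails early when that directory is missing or empty might look like the following. The function names `check_tpcds_data` and `run_tpcds_benchmark` are hypothetical and not part of Spark; only the `build/sbt` invocation is taken from the commit message.

```shell
# Hypothetical helper (not part of Spark): fail early if the TPCDS data
# directory is missing or empty.
check_tpcds_data() {
  data_location="$1"
  if [ ! -d "$data_location" ]; then
    echo "TPCDS data directory not found: $data_location" >&2
    return 1
  fi
  if [ -z "$(ls -A "$data_location")" ]; then
    echo "TPCDS data directory is empty: $data_location" >&2
    return 1
  fi
  return 0
}

# Hypothetical wrapper around the command quoted in the commit message.
# Assumes it is called from the root of a Spark checkout.
run_tpcds_benchmark() {
  data_location="$1"
  check_tpcds_data "$data_location" || return 1
  SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt \
    "sql/test:runMain org.apache.spark.sql.execution.benchmark.TPCDSQueryBenchmark --data-location $data_location"
}
```

Under these assumptions, `run_tpcds_benchmark /data/tpcds/s1` would run the benchmark only after the data check passes.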