spark-instrumented-optimizer/sql/core
Achuth17 d36539741f [SPARK-24626][SQL] Improve location size calculation in Analyze Table command
## What changes were proposed in this pull request?

Currently, the Analyze Table command calculates the table size sequentially, one partition location at a time. This change parallelizes the size calculation across partitions.
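
To illustrate the idea of computing partition sizes concurrently rather than one after another, here is a minimal sketch. It is not the code from this patch: it uses the local filesystem in place of S3 or HDFS, and every name in it (`location_size`, `total_size_sequential`, `total_size_parallel`, `make_demo_table`) is hypothetical. The assumption is that listing a partition location is I/O-bound, so a thread pool can overlap the waiting time of many listings.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def location_size(path):
    """Return the total size in bytes of all regular files under one partition directory."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def total_size_sequential(partition_paths):
    """Baseline: visit each partition location one after another."""
    return sum(location_size(p) for p in partition_paths)


def total_size_parallel(partition_paths, max_workers=8):
    """Size each partition location on a thread pool and sum the results.

    max_workers is an illustrative default, not a value taken from the patch.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(location_size, partition_paths))


def make_demo_table(base_dir, num_partitions=4, bytes_per_file=10):
    """Create a toy partitioned layout (part=0, part=1, ...) for the demo."""
    paths = []
    for i in range(num_partitions):
        part = os.path.join(base_dir, f"part={i}")
        os.makedirs(part)
        with open(os.path.join(part, "data.bin"), "wb") as f:
            f.write(b"x" * bytes_per_file)
        paths.append(part)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        parts = make_demo_table(base)
        # Both strategies must agree on the total; only the wall-clock time differs.
        print(total_size_sequential(parts), total_size_parallel(parts))
```

On a local disk the two functions take about the same time; the benefit appears when each listing has high latency, as with a remote object store.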

Results: tested on a table with 100 partitions and data stored in S3.

With changes:
- 10.429s
- 10.557s
- 10.439s
- 9.893s

Without changes:
- 110.034s
- 99.510s
- 100.743s
- 99.106s

## How was this patch tested?

Simple unit test.

Closes #21608 from Achuth17/improveAnalyze.

Lead-authored-by: Achuth17 <Achuth.narayan@gmail.com>
Co-authored-by: arajagopal17 <arajagopal@qubole.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
2018-08-09 08:29:24 -07:00
benchmarks [SPARK-24549][SQL] Support Decimal type push down to the parquet data sources 2018-07-16 15:44:51 +08:00
src [SPARK-24626][SQL] Improve location size calculation in Analyze Table command 2018-08-09 08:29:24 -07:00
pom.xml [SPARK-25019][BUILD] Fix orc dependency to use the same exclusion rules 2018-08-06 12:00:39 -07:00