spark-instrumented-optimizer/sql/core
Wenchen Fan 469423f338 [SPARK-28595][SQL] explain should not trigger partition listing
## What changes were proposed in this pull request?

Sometimes explaining a query hangs for a while. What's worse, it hangs again every time you explain the query again.

This is caused by `FileSourceScanExec`:
1. In its `toString`, it reports the number of partitions it reads. Computing that number requires querying the Hive metastore.
2. In its `outputOrdering`, it needs the full list of files. Getting that list also requires querying the Hive metastore.
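
To make the failure mode concrete, here is a minimal toy model of it. `ToyCatalog` and `ToyScan` are hypothetical names invented for this sketch and are not Spark classes. The sketch also assumes that every explain renders a freshly built node, which is one plausible reason a repeated explain would stall again; the commit message does not state the exact mechanism.

```scala
// Toy model only: ToyCatalog and ToyScan are hypothetical, not Spark classes.
final class ToyCatalog {
  var listCalls = 0 // how many times the pretend metastore was queried

  def listPartitions(): Seq[String] = {
    listCalls += 1 // stands in for a slow metastore round trip
    Seq("dt=2019-08-01", "dt=2019-08-02")
  }
}

final class ToyScan(catalog: ToyCatalog) {
  // Computed on first access. A new ToyScan instance starts with nothing cached.
  lazy val selectedPartitions: Seq[String] = catalog.listPartitions()

  // Rendering the node forces the listing even though nothing is executed.
  override def toString: String =
    s"ToyScan(partitions=${selectedPartitions.size})"
}

object ExplainTriggersListing {
  def main(args: Array[String]): Unit = {
    val catalog = new ToyCatalog
    println(new ToyScan(catalog)) // first "explain": one listing
    println(new ToyScan(catalog)) // second "explain" on a new node: another listing
    println(s"listings triggered: ${catalog.listCalls}") // listings triggered: 2
  }
}
```

In this toy, merely printing the node costs one listing per node instance, which mirrors the symptom described above.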

This PR fixes it as follows:
1. `toString` does not need to report the number of partitions it reads. We should report it via SQL metrics instead.
2. The `outputOrdering` is not very useful. We can only apply it if a) all the bucket columns are read, and b) there is only one file in each bucket. This condition is really hard to meet, and even when it is met, sorting an already sorted file is pretty fast, so avoiding the sort does not gain much. I think it's worth giving up this optimization so that explain doesn't get stuck.
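
The first point can be sketched as follows, under stated assumptions: `CountingCatalog`, `ToyMetric`, and `LazyScan` are hypothetical names made up for illustration, and the real patch works against Spark's own SQL metrics rather than this toy class. The idea shown is only that the string form never touches the partition listing, while the partition count is recorded when the scan actually runs.

```scala
// Toy model only: these classes are hypothetical and are not Spark APIs.
final class CountingCatalog {
  var listCalls = 0 // how many times the pretend metastore was queried

  def listPartitions(): Seq[String] = {
    listCalls += 1 // stands in for a slow metastore round trip
    Seq("dt=2019-08-01", "dt=2019-08-02", "dt=2019-08-03")
  }
}

// A tiny stand-in for a named execution metric.
final class ToyMetric(val name: String) {
  private var current = 0L
  def set(v: Long): Unit = { current = v }
  def value: Long = current
}

final class LazyScan(catalog: CountingCatalog) {
  val numPartitions = new ToyMetric("number of partitions read")

  // Still lazy, but only execute() ever forces it.
  private lazy val selectedPartitions: Seq[String] = catalog.listPartitions()

  // The string form no longer depends on the listing.
  override def toString: String = "LazyScan"

  // The partition count is reported as a metric at execution time.
  def execute(): Seq[String] = {
    val parts = selectedPartitions
    numPartitions.set(parts.size.toLong)
    parts
  }
}

object ExplainWithoutListing {
  def main(args: Array[String]): Unit = {
    val catalog = new CountingCatalog
    val scan = new LazyScan(catalog)
    println(scan)                                       // LazyScan
    println(s"listings so far: ${catalog.listCalls}")   // listings so far: 0
    scan.execute()
    println(s"${scan.numPartitions.name}: ${scan.numPartitions.value}")
  }
}
```

With this shape, rendering the plan is free, and the cost of listing partitions is paid only by queries that really execute the scan.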

## How was this patch tested?

existing tests

Closes #25328 from cloud-fan/ui.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-08-07 19:14:25 +08:00
| Name | Last commit | Date |
| --- | --- | --- |
| benchmarks | [SPARK-27707][SQL] Prune unnecessary nested fields from Generate | 2019-07-18 23:32:07 -07:00 |
| src | [SPARK-28595][SQL] explain should not trigger partition listing | 2019-08-07 19:14:25 +08:00 |
| v1.2.1/src | [SPARK-28108][SQL][test-hadoop3.2] Simplify OrcFilters | 2019-06-24 12:23:52 +08:00 |
| v2.3.5/src | [SPARK-28108][SQL][test-hadoop3.2] Simplify OrcFilters | 2019-06-24 12:23:52 +08:00 |
| pom.xml | [SPARK-27521][SQL] Move data source v2 to catalyst module | 2019-06-05 09:55:55 -07:00 |