spark-instrumented-optimizer/sql/core/v2.3.5/src
Wenchen Fan faf220aad9 [SPARK-29277][SQL][test-hadoop3.2] Add early DSv2 filter and projection pushdown
Bring back https://github.com/apache/spark/pull/25955

### What changes were proposed in this pull request?

This adds a new rule, `V2ScanRelationPushDown`, to push filters and projections in to a new `DataSourceV2ScanRelation` in the optimizer. That scan is then used when converting to a physical scan node. The new relation correctly reports stats based on the scan.

To run scan pushdown before rules where stats are used, this adds a new optimizer override, `earlyScanPushDownRules` and a batch for early pushdown in the optimizer, before cost-based join reordering. The other early pushdown rule, `PruneFileSourcePartitions`, is moved into the early pushdown rule set.

This also moves pushdown helper methods from `DataSourceV2Strategy` into a util class.

### Why are the changes needed?

This is needed for DSv2 sources to supply stats for cost-based rules in the optimizer.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

This updates the implementation of stats from `DataSourceV2Relation` so tests will fail if stats are accessed before early pushdown for v2 relations.

Closes #26341 from cloud-fan/back.

Lead-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Ryan Blue <blue@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-31 08:25:32 -07:00
..
main [SPARK-28108][SQL][test-hadoop3.2] Simplify OrcFilters 2019-06-24 12:23:52 +08:00
test/scala/org/apache/spark/sql/execution/datasources/orc [SPARK-29277][SQL][test-hadoop3.2] Add early DSv2 filter and projection pushdown 2019-10-31 08:25:32 -07:00