spark-instrumented-optimizer

History

Wenchen Fan faf220aad9 [SPARK-29277][SQL][test-hadoop3.2] Add early DSv2 filter and projection pushdown Bring back https://github.com/apache/spark/pull/25955 ### What changes were proposed in this pull request? This adds a new rule, `V2ScanRelationPushDown`, to push filters and projections in to a new `DataSourceV2ScanRelation` in the optimizer. That scan is then used when converting to a physical scan node. The new relation correctly reports stats based on the scan. To run scan pushdown before rules where stats are used, this adds a new optimizer override, `earlyScanPushDownRules` and a batch for early pushdown in the optimizer, before cost-based join reordering. The other early pushdown rule, `PruneFileSourcePartitions`, is moved into the early pushdown rule set. This also moves pushdown helper methods from `DataSourceV2Strategy` into a util class. ### Why are the changes needed? This is needed for DSv2 sources to supply stats for cost-based rules in the optimizer. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This updates the implementation of stats from `DataSourceV2Relation` so tests will fail if stats are accessed before early pushdown for v2 relations. Closes #26341 from cloud-fan/back. Lead-authored-by: Wenchen Fan <wenchen@databricks.com> Co-authored-by: Ryan Blue <blue@apache.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-10-31 08:25:32 -07:00
..
main	[SPARK-28108][SQL][test-hadoop3.2] Simplify OrcFilters	2019-06-24 12:23:52 +08:00
test/scala/org/apache/spark/sql/execution/datasources/orc	[SPARK-29277][SQL][test-hadoop3.2] Add early DSv2 filter and projection pushdown	2019-10-31 08:25:32 -07:00

Wenchen Fan faf220aad9 [SPARK-29277][SQL][test-hadoop3.2] Add early DSv2 filter and projection pushdown

Bring back https://github.com/apache/spark/pull/25955

### What changes were proposed in this pull request?

This adds a new rule, `V2ScanRelationPushDown`, to push filters and projections in to a new `DataSourceV2ScanRelation` in the optimizer. That scan is then used when converting to a physical scan node. The new relation correctly reports stats based on the scan.

To run scan pushdown before rules where stats are used, this adds a new optimizer override, `earlyScanPushDownRules` and a batch for early pushdown in the optimizer, before cost-based join reordering. The other early pushdown rule, `PruneFileSourcePartitions`, is moved into the early pushdown rule set.

This also moves pushdown helper methods from `DataSourceV2Strategy` into a util class.

### Why are the changes needed?

This is needed for DSv2 sources to supply stats for cost-based rules in the optimizer.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

This updates the implementation of stats from `DataSourceV2Relation` so tests will fail if stats are accessed before early pushdown for v2 relations.

Closes #26341 from cloud-fan/back.

Lead-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Ryan Blue <blue@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

2019-10-31 08:25:32 -07:00

main

[SPARK-28108][SQL][test-hadoop3.2] Simplify OrcFilters

2019-06-24 12:23:52 +08:00

test/scala/org/apache/spark/sql/execution/datasources/orc

[SPARK-29277][SQL][test-hadoop3.2] Add early DSv2 filter and projection pushdown

2019-10-31 08:25:32 -07:00