spark-instrumented-optimizer

History

yi.wu f05b940749 [SPARK-34354][SQL] Fix failure when apply CostBasedJoinReorder on self-join ### What changes were proposed in this pull request? This PR introduces a new analysis rule `DeduplicateRelations`, which deduplicates any duplicate relations in a plan first and then deduplicates conflicting attributes(which resued the `dedupRight` of `ResolveReferences`). ### Why are the changes needed? `CostBasedJoinReorder` could fail when applying on self-join, e.g., ```scala // test in JoinReorderSuite test("join reorder with self-join") { val plan = t2.join(t1, Inner, Some(nameToAttr("t1.k-1-2") === nameToAttr("t2.k-1-5"))) .select(nameToAttr("t1.v-1-10")) .join(t2, Inner, Some(nameToAttr("t1.v-1-10") === nameToAttr("t2.k-1-5"))) // this can fail Optimize.execute(plan.analyze) } ``` Besides, with the new rule `DeduplicateRelations`, we'd be able to enable some optimizations, e.g., LeftSemiAnti pushdown, redundant project removal, as reflects in updated unit tests. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Added and updated unit tests. Closes #31470 from Ngone51/join-reorder. Lead-authored-by: yi.wu <yi.wu@databricks.com> Co-authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>		2021-03-31 14:28:35 +08:00
..
benchmarks	[SPARK-34815][SQL] Update CSVBenchmark	2021-03-22 10:49:53 +03:00
src	[SPARK-34354][SQL] Fix failure when apply CostBasedJoinReorder on self-join	2021-03-31 14:28:35 +08:00
pom.xml	[SPARK-33662][BUILD] Setting version to 3.2.0-SNAPSHOT	2020-12-04 14:10:42 -08:00