spark-instrumented-optimizer

Wenchen Fan 636119c54b [SPARK-31607][SQL] Improve the perf of CTESubstitution

### What changes were proposed in this pull request?

In `CTESubstitution`, resolve CTE relations first, then traverse the main plan only once to substitute CTE relations.

### Why are the changes needed?

Currently we will traverse the main query many times (if there are many CTE relations), which can be pretty slow if the main query is large.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

local perf test

```
scala> :pa
// Entering paste mode (ctrl-D to finish)

def test(i: Int): Unit = 1.to(i).foreach { _ =>
  spark.sql("""
    with
    t1 as (select 1),
    t2 as (select 1),
    t3 as (select 1),
    t4 as (select 1),
    t5 as (select 1),
    t6 as (select 1),
    t7 as (select 1),
    t8 as (select 1),
    t9 as (select 1)
    select * from t1, t2, t3, t4, t5, t6, t7, t8, t9""").queryExecution.assertAnalyzed()
}

// Exiting paste mode, now interpreting.

test: (i: Int)Unit

scala> test(10000)

scala> println(org.apache.spark.sql.catalyst.rules.RuleExecutor.dumpTimeSpent)
```

The result before this patch

```
Rule               Effective Time / Total Time    Effective Runs / Total Runs
CTESubstitution    3328796344 / 3924576425        10000 / 20000
```

The result after this patch

```
Rule               Effective Time / Total Time    Effective Runs / Total Runs
CTESubstitution    1503085936 / 2091992092        10000 / 20000
```

About 2 times faster.

Closes #28407 from cloud-fan/cte.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

2020-04-30 12:11:16 +00:00
benchmarks	[SPARK-30413][SQL] Avoid WrappedArray roundtrip in GenericArrayData constructor, plus related optimization in ParquetMapConverter	2020-01-19 19:12:19 -08:00
src	[SPARK-31607][SQL] Improve the perf of CTESubstitution	2020-04-30 12:11:16 +00:00
pom.xml	[SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT	2020-02-25 19:44:31 -08:00