spark-instrumented-optimizer/sql/catalyst
Wenchen Fan 636119c54b [SPARK-31607][SQL] Improve the perf of CTESubstitution
### What changes were proposed in this pull request?

In `CTESubstitution`, resolve CTE relations first, then traverse the main plan only once to substitute CTE relations.

### Why are the changes needed?

Currently, the main query is traversed many times when there are many CTE relations, which can be slow if the main query is large.
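The difference between the two strategies can be sketched on a toy plan tree. This is a minimal illustration, not the Catalyst implementation: `Plan`, `Relation`, `Literal`, `Join`, `transform`, and the `visits` counter are all hypothetical names made up for this sketch, and it assumes the CTE relations are already resolved and do not reference each other.

```scala
// Toy plan tree; these types are illustrative and are NOT Catalyst classes.
sealed trait Plan
case class Relation(name: String) extends Plan         // unresolved reference, e.g. t1
case class Literal(value: Int) extends Plan            // stands in for `select 1`
case class Join(left: Plan, right: Plan) extends Plan

object CteSketch {
  // Counts visited nodes so the two strategies can be compared.
  var visits = 0

  // Rebuilds the tree bottom-up, applying `f` to each node after its children.
  def transform(plan: Plan)(f: PartialFunction[Plan, Plan]): Plan = {
    visits += 1
    val withNewChildren = plan match {
      case Join(l, r) => Join(transform(l)(f), transform(r)(f))
      case other      => other
    }
    f.applyOrElse(withNewChildren, (p: Plan) => p)
  }

  // Repeated strategy: one full traversal of the main plan per CTE relation.
  def substituteOneByOne(main: Plan, ctes: Seq[(String, Plan)]): Plan =
    ctes.foldLeft(main) { case (current, (name, ctePlan)) =>
      transform(current) { case Relation(n) if n == name => ctePlan }
    }

  // Single-pass strategy: collect the relations first, then traverse once.
  def substituteOnce(main: Plan, ctes: Seq[(String, Plan)]): Plan = {
    val byName = ctes.toMap
    transform(main) { case Relation(n) if byName.contains(n) => byName(n) }
  }

  def main(args: Array[String]): Unit = {
    // Mirrors the shape of the test query: nine CTEs joined together.
    val names = (1 to 9).map(i => s"t$i")
    val ctes = names.map(n => n -> (Literal(1): Plan))
    val mainPlan = names.map(n => Relation(n): Plan).reduceLeft(Join(_, _))

    visits = 0
    val a = substituteOneByOne(mainPlan, ctes)
    val oneByOneVisits = visits

    visits = 0
    val b = substituteOnce(mainPlan, ctes)
    val onceVisits = visits

    println(s"same result: ${a == b}")
    println(s"visits: one-by-one = $oneByOneVisits, single pass = $onceVisits")
  }
}
```

In this sketch the main plan has 17 nodes (9 relations and 8 joins), so the one-by-one strategy visits 153 nodes while the single pass visits 17. The real rule does more work per node, so the measured speedup need not match this ratio.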

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Local perf test:
```
scala> :pa
// Entering paste mode (ctrl-D to finish)

def test(i: Int): Unit = 1.to(i).foreach { _ =>
  spark.sql("""
    with
    t1 as (select 1),
    t2 as (select 1),
    t3 as (select 1),
    t4 as (select 1),
    t5 as (select 1),
    t6 as (select 1),
    t7 as (select 1),
    t8 as (select 1),
    t9 as (select 1)
    select * from t1, t2, t3, t4, t5, t6, t7, t8, t9""").queryExecution.assertAnalyzed()
}

// Exiting paste mode, now interpreting.

test: (i: Int)Unit

scala> test(10000)

scala> println(org.apache.spark.sql.catalyst.rules.RuleExecutor.dumpTimeSpent)
```

The result before this patch:
```
Rule                                       Effective Time / Total Time                     Effective Runs / Total Runs
CTESubstitution                            3328796344 / 3924576425                         10000 / 20000
```
The result after this patch:
```
Rule                                       Effective Time / Total Time                     Effective Runs / Total Runs
CTESubstitution                            1503085936 / 2091992092                         10000 / 20000
```
About 2 times faster.
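The "about 2 times" figure can be checked directly from the two `CTESubstitution` rows. The sketch below only divides the reported numbers; the object and value names are made up for illustration.

```scala
object Speedup {
  // Values copied from the two dumpTimeSpent outputs (effective time / total time).
  val effectiveBefore = 3328796344L
  val totalBefore     = 3924576425L
  val effectiveAfter  = 1503085936L
  val totalAfter      = 2091992092L

  // Ratio of time before the patch to time after it.
  def ratio(before: Long, after: Long): Double = before.toDouble / after

  def main(args: Array[String]): Unit = {
    val effective = ratio(effectiveBefore, effectiveAfter)
    val total = ratio(totalBefore, totalAfter)
    println(f"effective time: $effective%.2fx, total time: $total%.2fx")
  }
}
```

Effective time improves by roughly 2.2x and total time by roughly 1.9x, which is consistent with the rounded figure.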

Closes #28407 from cloud-fan/cte.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-04-30 12:11:16 +00:00
benchmarks [SPARK-30413][SQL] Avoid WrappedArray roundtrip in GenericArrayData constructor, plus related optimization in ParquetMapConverter 2020-01-19 19:12:19 -08:00
src [SPARK-31607][SQL] Improve the perf of CTESubstitution 2020-04-30 12:11:16 +00:00
pom.xml [SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT 2020-02-25 19:44:31 -08:00