spark-instrumented-optimizer/sql/core
Liang-Chi Hsieh 4b725e50a7 [SPARK-27439][SQL] Explainging Dataset should show correct resolved plans
## What changes were proposed in this pull request?

Because a review is resolved during analysis when we create a dataset, the content of the view is determined when the dataset is created, not when it is evaluated. Now the explain result of a dataset is not correctly consistent with the collected result of it, because we use pre-analyzed logical plan of the dataset in explain command. The explain command will analyzed the logical plan passed in. So if a view is changed after the dataset was created, the plans shown by explain command aren't the same with the plan of the dataset.

```scala
scala> spark.range(10).createOrReplaceTempView("test")
scala> spark.range(5).createOrReplaceTempView("test2")
scala> spark.sql("select * from test").createOrReplaceTempView("tmp001")
scala> val df = spark.sql("select * from tmp001")
scala> spark.sql("select * from test2").createOrReplaceTempView("tmp001")
scala> df.show
+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
|  5|
|  6|
|  7|
|  8|
|  9|
+---+
scala> df.explain(true)
```

Before:
```scala
== Parsed Logical Plan ==
'Project [*]
+- 'UnresolvedRelation `tmp001`

== Analyzed Logical Plan ==
id: bigint
Project [id#2L]
+- SubqueryAlias `tmp001`
   +- Project [id#2L]
      +- SubqueryAlias `test2`
         +- Range (0, 5, step=1, splits=Some(12))

== Optimized Logical Plan ==
Range (0, 5, step=1, splits=Some(12))

== Physical Plan ==
*(1) Range (0, 5, step=1, splits=12)
```

After:
```scala
== Parsed Logical Plan ==
'Project [*]
+- 'UnresolvedRelation `tmp001`

== Analyzed Logical Plan ==
id: bigint
Project [id#0L]
+- SubqueryAlias `tmp001`
   +- Project [id#0L]
      +- SubqueryAlias `test`
         +- Range (0, 10, step=1, splits=Some(12))

== Optimized Logical Plan ==
Range (0, 10, step=1, splits=Some(12))

== Physical Plan ==
*(1) Range (0, 10, step=1, splits=12)
```

To fix it, this passes query execution of Dataset when explaining it. The query execution contains pre-analyzed plan which is consistent with Dataset's result.

## How was this patch tested?

Manually test and unit test.

Closes #24464 from viirya/SPARK-27439-2.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-05-05 23:19:19 -07:00
..
benchmarks [SPARK-27535][SQL][TEST] Date and timestamp JSON benchmarks 2019-04-23 11:09:14 +09:00
src [SPARK-27439][SQL] Explainging Dataset should show correct resolved plans 2019-05-05 23:19:19 -07:00
v1.2.1/src [SPARK-27182][SQL] Move the conflict source code of the sql/core module to sql/core/v1.2.1 2019-03-26 22:32:03 -07:00
v2.3.4/src [SPARK-27176][SQL] Upgrade hadoop-3's built-in Hive maven dependencies to 2.3.4 2019-04-08 08:42:21 -07:00
pom.xml [SPARK-27182][SQL] Move the conflict source code of the sql/core module to sql/core/v1.2.1 2019-03-26 22:32:03 -07:00