[SPARK-27439][SQL] Use analyzed plan when explaining Dataset

## What changes were proposed in this pull request? Because a review is resolved during analysis when we create a dataset, the content of the view is determined when the dataset is created, not when it is evaluated. Now the explain result of a dataset is not correctly consistent with the collected result of it, because we use pre-analyzed logical plan of the dataset in explain command. The explain command will analyzed the logical plan passed in. So if a view is changed after the dataset was created, the plans shown by explain command aren't the same with the plan of the dataset. ```scala scala> spark.range(10).createOrReplaceTempView("test") scala> spark.range(5).createOrReplaceTempView("test2") scala> spark.sql("select * from test").createOrReplaceTempView("tmp001") scala> val df = spark.sql("select * from tmp001") scala> spark.sql("select * from test2").createOrReplaceTempView("tmp001") scala> df.show +---+ | id| +---+ | 0| | 1| | 2| | 3| | 4| | 5| | 6| | 7| | 8| | 9| +---+ scala> df.explain ``` Before: ```scala == Physical Plan == *(1) Range (0, 5, step=1, splits=12) ``` After: ```scala == Physical Plan == *(1) Range (0, 10, step=1, splits=12) ``` ## How was this patch tested? Manually test and unit test. Closes #24415 from viirya/SPARK-27439. Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-04-21 10:25:56 -07:00 · 2019-04-21 10:25:56 -07:00 · ad60c6d9be
parent 4cb1cd6ab7
commit ad60c6d9be
2 changed files with 22 additions and 2 deletions
--- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@ -498,7 +498,10 @@ class Dataset[T] private[sql](
   * @since 1.6.0
   */
  def explain(extended: Boolean): Unit = {
-    val explain = ExplainCommand(queryExecution.logical, extended = extended)
+    // Because views are possibly resolved in the analyzed plan of this dataset. We use analyzed
+    // plan in `ExplainCommand`, for consistency. Otherwise, the plans shown by explain command
+    // might be inconsistent with the evaluated data of this dataset.
+    val explain = ExplainCommand(queryExecution.analyzed, extended = extended)
    sparkSession.sessionState.executePlan(explain).executedPlan.executeCollect().foreach {
      // scalastyle:off println
      r => println(r.getString(0))
--- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
@ -17,7 +17,7 @@

 package org.apache.spark.sql

-import java.io.File
+import java.io.{ByteArrayOutputStream, File}
 import java.nio.charset.StandardCharsets
 import java.sql.{Date, Timestamp}
 import java.util.UUID
@ -2133,4 +2133,21 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
      checkAnswer(res, Row("1-1", 6, 6))
    }
  }
+
+  test("SPARK-27439: Explain result should match collected result after view change") {
+    withTempView("test", "test2", "tmp") {
+      spark.range(10).createOrReplaceTempView("test")
+      spark.range(5).createOrReplaceTempView("test2")
+      spark.sql("select * from test").createOrReplaceTempView("tmp")
+      val df = spark.sql("select * from tmp")
+      spark.sql("select * from test2").createOrReplaceTempView("tmp")
+
+      val captured = new ByteArrayOutputStream()
+      Console.withOut(captured) {
+        df.explain()
+      }
+      checkAnswer(df, spark.range(10).toDF)
+      assert(captured.toString().contains("Range (0, 10, step=1, splits=2)"))
+    }
+  }
 }