Compare commits

...

5 Commits

Author SHA1 Message Date
panbingkun d736bec27b [SPARK-37939][SQL][3.3] Use error classes in the parsing errors of properties
### What changes were proposed in this pull request?
Migrate the following errors in QueryParsingErrors to use error classes:

- cannotCleanReservedNamespacePropertyError => UNSUPPORTED_FEATURE
- cannotCleanReservedTablePropertyError => UNSUPPORTED_FEATURE
- invalidPropertyKeyForSetQuotedConfigurationError => INVALID_PROPERTY_KEY
- invalidPropertyValueForSetQuotedConfigurationError => INVALID_PROPERTY_VALUE
- propertiesAndDbPropertiesBothSpecifiedError => UNSUPPORTED_FEATURE

This is a backport of https://github.com/apache/spark/pull/36561.
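
For reference, the shape of the migration (shown in full in the QueryParsingErrors diff below) moves the message text out of the call site and into `error-classes.json`, leaving only an error class name plus message parameters:

```scala
// Before: the message string is hard-coded at the call site.
new ParseException(s"'$keyCandidate' is an invalid property key, please " +
  s"use quotes, e.g. SET `$keyCandidate`=`$valueStr`", ctx)

// After: the template lives in error-classes.json under INVALID_PROPERTY_KEY,
// and the call site supplies only the class name and substituted parameters.
new ParseException(errorClass = "INVALID_PROPERTY_KEY",
  messageParameters = Array(toSQLConf(keyCandidate),
    toSQLConf(keyCandidate), toSQLConf(valueStr)), ctx)
```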

### Why are the changes needed?
Porting the parsing errors of properties to the new error framework improves test coverage and documents the expected error messages in tests.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running new test:
```
$ build/sbt "sql/testOnly *QueryParsingErrorsSuite*"
```

Closes #36916 from panbingkun/branch-3.3-SPARK-37939.

Authored-by: panbingkun <pbk1982@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
2022-06-21 17:12:39 +03:00
panbingkun 5d3f3365c0 [SPARK-39163][SQL][3.3] Throw an exception w/ error class for an invalid bucket file
### What changes were proposed in this pull request?
In the PR, I propose to use the INVALID_BUCKET_FILE error class for an invalid bucket file.

This is a backport of https://github.com/apache/spark/pull/36603.
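
As an illustrative sketch (not part of the PR), callers can now match on the stable error class instead of parsing message text; `getErrorClass` is available because `SparkException` implements `SparkThrowable` in 3.3, and `safeCount` here is a hypothetical helper:

```scala
import org.apache.spark.SparkException
import org.apache.spark.sql.DataFrame

// Hypothetical helper: count df, reporting a corrupted bucketed table
// by its error class rather than by matching the message string.
def safeCount(df: DataFrame): Option[Long] =
  try Some(df.count())
  catch {
    case e: SparkException if e.getErrorClass == "INVALID_BUCKET_FILE" =>
      System.err.println(s"Malformed bucket file: ${e.getMessage}")
      None
  }
```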

### Why are the changes needed?
Porting the execution error for an invalid bucket file to the new error framework should improve the user experience with Spark SQL.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit tests.

Closes #36913 from panbingkun/branch-3.3-SPARK-39163.

Authored-by: panbingkun <pbk1982@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2022-06-20 11:04:14 +09:00
panbingkun 458be83fda [SPARK-38688][SQL][TESTS][3.3] Use error classes in the compilation errors of deserializer
### What changes were proposed in this pull request?
Migrate the following errors in QueryCompilationErrors:

* dataTypeMismatchForDeserializerError -> UNSUPPORTED_DESERIALIZER.DATA_TYPE_MISMATCH
* fieldNumberMismatchForDeserializerError -> UNSUPPORTED_DESERIALIZER.FIELD_NUMBER_MISMATCH

This is a backport of https://github.com/apache/spark/pull/36479.
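
For illustration, a minimal repro of the DATA_TYPE_MISMATCH case, adapted from the new test in this diff (assumes a spark-shell session where `spark` and its implicits are in scope):

```scala
case class StringIntClass(a: String, b: Int)
case class ArrayClass(arr: Seq[StringIntClass])

// Deserializing an INT column into an array-typed field now raises
// AnalysisException with errorClass = "UNSUPPORTED_DESERIALIZER":
//   The deserializer is not supported: need a(n) "ARRAY" field but got "INT".
spark.sql("select 1 as arr").as[ArrayClass]
```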

### Why are the changes needed?
Porting the compilation errors for unsupported deserializers to the new error framework.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Add new UT.

Closes #36897 from panbingkun/branch-3.3-SPARK-38688.

Authored-by: panbingkun <pbk1982@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
2022-06-19 12:34:41 +03:00
Bruce Robbins 1dea5746fe [SPARK-39496][SQL] Handle null struct in `Inline.eval`
### What changes were proposed in this pull request?

Change `Inline.eval` to return a row of null values rather than a null row in the case of a null input struct.

### Why are the changes needed?

Consider the following query:
```
set spark.sql.codegen.wholeStage=false;
select inline(array(named_struct('a', 1, 'b', 2), null));
```
This query fails with a `NullPointerException`:
```
22/06/16 15:10:06 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NullPointerException
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
	at org.apache.spark.sql.execution.GenerateExec.$anonfun$doExecute$11(GenerateExec.scala:122)
```
(In Spark 3.1.3, you don't need to set `spark.sql.codegen.wholeStage` to false to reproduce the error, since Spark 3.1.3 has no codegen path for `Inline`).

This query fails regardless of the setting of `spark.sql.codegen.wholeStage`:
```
val dfWide = (Seq((1))
  .toDF("col0")
  .selectExpr(Seq.tabulate(99)(x => s"$x as col${x + 1}"): _*))

val df = (dfWide
  .selectExpr("*", "array(named_struct('a', 1, 'b', 2), null) as struct_array"))

df.selectExpr("*", "inline(struct_array)").collect
```
It fails with
```
22/06/16 15:18:55 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NullPointerException
	at org.apache.spark.sql.catalyst.expressions.JoinedRow.isNullAt(JoinedRow.scala:80)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_8$(Unknown Source)
```
When `Inline.eval` returns a null row in the collection, GenerateExec gets a NullPointerException either when joining the null row with required child output, or projecting the null row.
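
(As a hedged aside on why the substitute row is safe: a `GenericInternalRow` built with only a size holds all-null fields, so downstream code can probe it where a null reference would throw.)

```scala
import org.apache.spark.sql.catalyst.expressions.GenericInternalRow

// A size-only GenericInternalRow backs every field with null, so calls
// such as isNullAt(i) succeed where a null row reference would NPE.
val nullRow = new GenericInternalRow(2)
assert(nullRow.isNullAt(0) && nullRow.isNullAt(1))
```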

This PR avoids producing the null row and produces a row of null values instead:
```
spark-sql> set spark.sql.codegen.wholeStage=false;
spark.sql.codegen.wholeStage	false
Time taken: 3.095 seconds, Fetched 1 row(s)
spark-sql> select inline(array(named_struct('a', 1, 'b', 2), null));
1	2
NULL	NULL
Time taken: 1.214 seconds, Fetched 2 row(s)
spark-sql>
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New unit test.

Closes #36903 from bersprockets/inline_eval_null_struct_issue.

Authored-by: Bruce Robbins <bersprockets@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit c4d5390dd0)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2022-06-18 09:25:21 +09:00
Sean Owen ad90195de5 [SPARK-39505][UI] Escape log content rendered in UI
### What changes were proposed in this pull request?

Escape log content rendered to the UI.

### Why are the changes needed?

Log content may contain reserved HTML characters or other markup and be misinterpreted as HTML when rendered in the UI.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests

Closes #36902 from srowen/LogViewEscape.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2022-06-17 13:30:33 -05:00
18 changed files with 246 additions and 81 deletions

core/src/main/resources/error/error-classes.json View File

@@ -95,6 +95,9 @@
 "INVALID_ARRAY_INDEX_IN_ELEMENT_AT" : {
 "message" : [ "The index <indexValue> is out of bounds. The array has <arraySize> elements. Use `try_element_at` to tolerate accessing element at invalid index and return NULL instead. If necessary set <config> to \"false\" to bypass this error." ]
 },
+"INVALID_BUCKET_FILE" : {
+"message" : [ "Invalid bucket file: <path>" ]
+},
 "INVALID_FIELD_NAME" : {
 "message" : [ "Field name <fieldName> is invalid: <path> is not a struct." ],
 "sqlState" : "42000"
@@ -110,6 +113,12 @@
 "message" : [ "The value of parameter(s) '<parameter>' in <functionName> is invalid: <expected>" ],
 "sqlState" : "22023"
 },
+"INVALID_PROPERTY_KEY" : {
+"message" : [ "<key> is an invalid property key, please use quotes, e.g. SET <key>=<value>" ]
+},
+"INVALID_PROPERTY_VALUE" : {
+"message" : [ "<value> is an invalid property value, please use quotes, e.g. SET <key>=<value>" ]
+},
 "INVALID_SQL_SYNTAX" : {
 "message" : [ "Invalid SQL syntax: <inputString>" ],
 "sqlState" : "42000"
@@ -174,6 +183,17 @@
 "message" : [ "Unsupported data type <typeName>" ],
 "sqlState" : "0A000"
 },
+"UNSUPPORTED_DESERIALIZER" : {
+"message" : [ "The deserializer is not supported: " ],
+"subClass" : {
+"DATA_TYPE_MISMATCH" : {
+"message" : [ "need a(n) <desiredType> field but got <dataType>." ]
+},
+"FIELD_NUMBER_MISMATCH" : {
+"message" : [ "try to map <schema> to Tuple<ordinal>, but failed as the number of fields does not line up." ]
+}
+}
+},
 "UNSUPPORTED_FEATURE" : {
 "message" : [ "The feature is not supported: <feature>" ],
 "sqlState" : "0A000"

core/src/main/resources/org/apache/spark/ui/static/utils.js View File

@@ -85,7 +85,7 @@ function loadMore() {
 if (retStartByte == 0) {
 disableMoreButton();
 }
-$("pre", ".log-content").prepend(cleanData);
+$("pre", ".log-content").prepend(document.createTextNode(cleanData));
 curLogLength = curLogLength + (startByte - retStartByte);
 startByte = retStartByte;
@@ -115,7 +115,7 @@ function loadNew() {
 var retLogLength = dataInfo[2];
 var cleanData = data.substring(newlineIndex + 1);
-$("pre", ".log-content").append(cleanData);
+$("pre", ".log-content").append(document.createTextNode(cleanData));
 curLogLength = curLogLength + (retEndByte - retStartByte);
 endByte = retEndByte;

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala View File

@@ -452,13 +452,17 @@ case class Inline(child: Expression) extends UnaryExpression with CollectionGene
 private lazy val numFields = elementSchema.fields.length
+private lazy val generatorNullRow = new GenericInternalRow(elementSchema.length)
 override def eval(input: InternalRow): TraversableOnce[InternalRow] = {
 val inputArray = child.eval(input).asInstanceOf[ArrayData]
 if (inputArray == null) {
 Nil
 } else {
-for (i <- 0 until inputArray.numElements())
-yield inputArray.getStruct(i, numFields)
+for (i <- 0 until inputArray.numElements()) yield {
+val s = inputArray.getStruct(i, numFields)
+if (s == null) generatorNullRow else s
+}
 }
 }

sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala View File

@@ -146,16 +146,18 @@ private[sql] object QueryCompilationErrors extends QueryErrorsBase {
 def dataTypeMismatchForDeserializerError(
 dataType: DataType, desiredType: String): Throwable = {
-val quantifier = if (desiredType.equals("array")) "an" else "a"
 new AnalysisException(
-s"need $quantifier $desiredType field but got " + dataType.catalogString)
+errorClass = "UNSUPPORTED_DESERIALIZER",
+messageParameters =
+Array("DATA_TYPE_MISMATCH", toSQLType(desiredType), toSQLType(dataType)))
 }
 def fieldNumberMismatchForDeserializerError(
 schema: StructType, maxOrdinal: Int): Throwable = {
 new AnalysisException(
-s"Try to map ${schema.catalogString} to Tuple${maxOrdinal + 1}, " +
-"but failed as the number of fields does not line up.")
+errorClass = "UNSUPPORTED_DESERIALIZER",
+messageParameters =
+Array("FIELD_NUMBER_MISMATCH", toSQLType(schema), (maxOrdinal + 1).toString))
 }
 def upCastFailureError(

sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryErrorsBase.scala View File

@@ -79,6 +79,10 @@ private[sql] trait QueryErrorsBase {
 quoteByDefault(t.sql)
 }
+def toSQLType(text: String): String = {
+quoteByDefault(text.toUpperCase(Locale.ROOT))
+}
 def toSQLConf(conf: String): String = {
 quoteByDefault(conf)
 }

sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala View File

@@ -2075,4 +2075,9 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase {
 new SparkException(errorClass = "NULL_COMPARISON_RESULT",
 messageParameters = Array(), cause = null)
 }
+def invalidBucketFile(path: String): Throwable = {
+new SparkException(errorClass = "INVALID_BUCKET_FILE", messageParameters = Array(path),
+cause = null)
+}
 }

sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala View File

@@ -260,17 +260,20 @@ private[sql] object QueryParsingErrors extends QueryErrorsBase {
 }
 def cannotCleanReservedNamespacePropertyError(
-property: String, ctx: ParserRuleContext, msg: String): Throwable = {
-new ParseException(s"$property is a reserved namespace property, $msg.", ctx)
+property: String, ctx: ParserRuleContext, msg: String): ParseException = {
+new ParseException("UNSUPPORTED_FEATURE",
+Array(s"$property is a reserved namespace property, $msg."), ctx)
 }
-def propertiesAndDbPropertiesBothSpecifiedError(ctx: CreateNamespaceContext): Throwable = {
-new ParseException("Either PROPERTIES or DBPROPERTIES is allowed.", ctx)
+def propertiesAndDbPropertiesBothSpecifiedError(ctx: CreateNamespaceContext): ParseException = {
+new ParseException("UNSUPPORTED_FEATURE",
+Array("set PROPERTIES and DBPROPERTIES at the same time."), ctx)
 }
 def cannotCleanReservedTablePropertyError(
-property: String, ctx: ParserRuleContext, msg: String): Throwable = {
-new ParseException(s"$property is a reserved table property, $msg.", ctx)
+property: String, ctx: ParserRuleContext, msg: String): ParseException = {
+new ParseException("UNSUPPORTED_FEATURE",
+Array(s"$property is a reserved table property, $msg."), ctx)
 }
 def duplicatedTablePathsFoundError(
@@ -367,15 +370,17 @@ private[sql] object QueryParsingErrors extends QueryErrorsBase {
 }
 def invalidPropertyKeyForSetQuotedConfigurationError(
-keyCandidate: String, valueStr: String, ctx: ParserRuleContext): Throwable = {
-new ParseException(s"'$keyCandidate' is an invalid property key, please " +
-s"use quotes, e.g. SET `$keyCandidate`=`$valueStr`", ctx)
+keyCandidate: String, valueStr: String, ctx: ParserRuleContext): ParseException = {
+new ParseException(errorClass = "INVALID_PROPERTY_KEY",
+messageParameters = Array(toSQLConf(keyCandidate),
+toSQLConf(keyCandidate), toSQLConf(valueStr)), ctx)
 }
 def invalidPropertyValueForSetQuotedConfigurationError(
-valueCandidate: String, keyStr: String, ctx: ParserRuleContext): Throwable = {
-new ParseException(s"'$valueCandidate' is an invalid property value, please " +
-s"use quotes, e.g. SET `$keyStr`=`$valueCandidate`", ctx)
+valueCandidate: String, keyStr: String, ctx: ParserRuleContext): ParseException = {
+new ParseException(errorClass = "INVALID_PROPERTY_VALUE",
+messageParameters = Array(toSQLConf(valueCandidate),
+toSQLConf(keyStr), toSQLConf(valueCandidate)), ctx)
 }
 def unexpectedFormatForResetConfigurationError(ctx: ResetConfigurationContext): Throwable = {

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/EncoderResolutionSuite.scala View File

@@ -118,7 +118,7 @@ class EncoderResolutionSuite extends PlanTest {
 val encoder = ExpressionEncoder[ArrayClass]
 val attrs = Seq('arr.int)
 assert(intercept[AnalysisException](encoder.resolveAndBind(attrs)).message ==
-"need an array field but got int")
+"""The deserializer is not supported: need a(n) "ARRAY" field but got "INT".""")
 }
 test("the real type is not compatible with encoder schema: array element type") {
@@ -134,7 +134,7 @@ class EncoderResolutionSuite extends PlanTest {
 withClue("inner element is not array") {
 val attrs = Seq('nestedArr.array(new StructType().add("arr", "int")))
 assert(intercept[AnalysisException](encoder.resolveAndBind(attrs)).message ==
-"need an array field but got int")
+"""The deserializer is not supported: need a(n) "ARRAY" field but got "INT".""")
 }
 withClue("nested array element type is not compatible") {
@@ -168,15 +168,16 @@ class EncoderResolutionSuite extends PlanTest {
 {
 val attrs = Seq('a.string, 'b.long, 'c.int)
 assert(intercept[AnalysisException](encoder.resolveAndBind(attrs)).message ==
-"Try to map struct<a:string,b:bigint,c:int> to Tuple2, " +
-"but failed as the number of fields does not line up.")
+"""The deserializer is not supported: """ +
+"""try to map "STRUCT<a: STRING, b: BIGINT, c: INT>" to Tuple2, """ +
+"""but failed as the number of fields does not line up.""")
 }
 {
 val attrs = Seq('a.string)
 assert(intercept[AnalysisException](encoder.resolveAndBind(attrs)).message ==
-"Try to map struct<a:string> to Tuple2, " +
-"but failed as the number of fields does not line up.")
+"""The deserializer is not supported: try to map "STRUCT<a: STRING>" to Tuple2, """ +
+"""but failed as the number of fields does not line up.""")
 }
 }
@@ -186,15 +187,17 @@ class EncoderResolutionSuite extends PlanTest {
 {
 val attrs = Seq('a.string, 'b.struct('x.long, 'y.string, 'z.int))
 assert(intercept[AnalysisException](encoder.resolveAndBind(attrs)).message ==
-"Try to map struct<x:bigint,y:string,z:int> to Tuple2, " +
-"but failed as the number of fields does not line up.")
+"""The deserializer is not supported: """ +
+"""try to map "STRUCT<x: BIGINT, y: STRING, z: INT>" to Tuple2, """ +
+"""but failed as the number of fields does not line up.""")
 }
 {
 val attrs = Seq('a.string, 'b.struct('x.long))
 assert(intercept[AnalysisException](encoder.resolveAndBind(attrs)).message ==
-"Try to map struct<x:bigint> to Tuple2, " +
-"but failed as the number of fields does not line up.")
+"""The deserializer is not supported: """ +
+"""try to map "STRUCT<x: BIGINT>" to Tuple2, """ +
+"""but failed as the number of fields does not line up.""")
 }
 }

sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala View File

@@ -31,6 +31,7 @@ import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.plans.QueryPlan
 import org.apache.spark.sql.catalyst.plans.physical.{HashPartitioning, Partitioning, UnknownPartitioning}
 import org.apache.spark.sql.catalyst.util.truncatedString
+import org.apache.spark.sql.errors.QueryExecutionErrors
 import org.apache.spark.sql.execution.datasources._
 import org.apache.spark.sql.execution.datasources.parquet.{ParquetFileFormat => ParquetSource}
 import org.apache.spark.sql.execution.datasources.v2.PushedDownOperators
@@ -592,8 +593,7 @@ case class FileSourceScanExec(
 }.groupBy { f =>
 BucketingUtils
 .getBucketId(new Path(f.filePath).getName)
-// TODO(SPARK-39163): Throw an exception w/ error class for an invalid bucket file
-.getOrElse(throw new IllegalStateException(s"Invalid bucket file ${f.filePath}"))
+.getOrElse(throw QueryExecutionErrors.invalidBucketFile(f.filePath))
 }
 val prunedFilesGroupedToBuckets = if (optionalBucketSet.isDefined) {

sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala View File

@@ -1002,24 +1002,6 @@ class DatasetSuite extends QueryTest
 checkDataset(cogrouped, "a13", "b24")
 }
-test("give nice error message when the real number of fields doesn't match encoder schema") {
-val ds = Seq(ClassData("a", 1), ClassData("b", 2)).toDS()
-val message = intercept[AnalysisException] {
-ds.as[(String, Int, Long)]
-}.message
-assert(message ==
-"Try to map struct<a:string,b:int> to Tuple3, " +
-"but failed as the number of fields does not line up.")
-val message2 = intercept[AnalysisException] {
-ds.as[Tuple1[String]]
-}.message
-assert(message2 ==
-"Try to map struct<a:string,b:int> to Tuple1, " +
-"but failed as the number of fields does not line up.")
-}
 test("SPARK-13440: Resolving option fields") {
 val df = Seq(1, 2, 3).toDS()
 val ds = df.as[Option[Int]]

sql/core/src/test/scala/org/apache/spark/sql/GeneratorFunctionSuite.scala View File

@@ -23,6 +23,7 @@ import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCo
 import org.apache.spark.sql.catalyst.expressions.codegen.Block._
 import org.apache.spark.sql.catalyst.trees.LeafLike
 import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.SharedSparkSession
 import org.apache.spark.sql.types.{IntegerType, StructType}
@@ -389,7 +390,7 @@ class GeneratorFunctionSuite extends QueryTest with SharedSparkSession {
 }
 }
-test("SPARK-39061: inline should handle null struct") {
+def testNullStruct(): Unit = {
 val df = sql(
 """select * from values
 |(
@@ -413,6 +414,16 @@
 sql("select a, inline(b) from t1"),
 Row(1, 0, 1) :: Row(1, null, null) :: Row(1, 2, 3) :: Row(1, null, null) :: Nil)
 }
+test("SPARK-39061: inline should handle null struct") {
+testNullStruct
+}
+test("SPARK-39496: inline eval path should handle null struct") {
+withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
+testNullStruct
+}
+}
 }
 case class EmptyGenerator() extends Generator with LeafLike[Expression] {

sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala View File

@@ -17,7 +17,7 @@
 package org.apache.spark.sql.errors
-import org.apache.spark.sql.{AnalysisException, IntegratedUDFTestUtils, QueryTest}
+import org.apache.spark.sql.{AnalysisException, ClassData, IntegratedUDFTestUtils, QueryTest}
 import org.apache.spark.sql.functions.{grouping, grouping_id, sum}
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.SharedSparkSession
@@ -28,6 +28,8 @@ case class StringIntClass(a: String, b: Int)
 case class ComplexClass(a: Long, b: StringLongClass)
+case class ArrayClass(arr: Seq[StringIntClass])
 class QueryCompilationErrorsSuite extends QueryTest with SharedSparkSession {
 import testImplicits._
@@ -173,4 +175,34 @@ class QueryCompilationErrorsSuite extends QueryTest with SharedSparkSession {
 "The feature is not supported: " +
 "Pandas UDF aggregate expressions don't support pivot.")
 }
+test("UNSUPPORTED_DESERIALIZER: data type mismatch") {
+val e = intercept[AnalysisException] {
+sql("select 1 as arr").as[ArrayClass]
+}
+assert(e.errorClass === Some("UNSUPPORTED_DESERIALIZER"))
+assert(e.message ===
+"""The deserializer is not supported: need a(n) "ARRAY" field but got "INT".""")
+}
+test("UNSUPPORTED_DESERIALIZER:" +
+"the real number of fields doesn't match encoder schema") {
+val ds = Seq(ClassData("a", 1), ClassData("b", 2)).toDS()
+val e1 = intercept[AnalysisException] {
+ds.as[(String, Int, Long)]
+}
+assert(e1.errorClass === Some("UNSUPPORTED_DESERIALIZER"))
+assert(e1.message ===
+"The deserializer is not supported: try to map \"STRUCT<a: STRING, b: INT>\" " +
+"to Tuple3, but failed as the number of fields does not line up.")
+val e2 = intercept[AnalysisException] {
+ds.as[Tuple1[String]]
+}
+assert(e2.errorClass === Some("UNSUPPORTED_DESERIALIZER"))
+assert(e2.message ===
+"The deserializer is not supported: try to map \"STRUCT<a: STRING, b: INT>\" " +
+"to Tuple1, but failed as the number of fields does not line up.")
+}
 }

sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala View File

@@ -17,6 +17,9 @@
 package org.apache.spark.sql.errors
+import java.io.File
+import java.net.URI
 import org.apache.spark.{SparkArithmeticException, SparkException, SparkIllegalArgumentException, SparkRuntimeException, SparkUnsupportedOperationException, SparkUpgradeException}
 import org.apache.spark.sql.{DataFrame, QueryTest, SaveMode}
 import org.apache.spark.sql.execution.datasources.orc.OrcTest
@@ -286,4 +289,26 @@ class QueryExecutionErrorsSuite extends QueryTest
 assert(e2.getMessage === "The save mode NULL is not supported for: an existent path.")
 }
 }
+test("INVALID_BUCKET_FILE: error if there exists any malformed bucket files") {
+val df1 = (0 until 50).map(i => (i % 5, i % 13, i.toString)).
+toDF("i", "j", "k").as("df1")
+withTable("bucketed_table") {
+df1.write.format("parquet").bucketBy(8, "i").
+saveAsTable("bucketed_table")
+val warehouseFilePath = new URI(spark.sessionState.conf.warehousePath).getPath
+val tableDir = new File(warehouseFilePath, "bucketed_table")
+Utils.deleteRecursively(tableDir)
+df1.write.parquet(tableDir.getAbsolutePath)
+val aggregated = spark.table("bucketed_table").groupBy("i").count()
+val e = intercept[SparkException] {
+aggregated.count()
+}
+assert(e.getErrorClass === "INVALID_BUCKET_FILE")
+assert(e.getMessage.matches("Invalid bucket file: .+"))
+}
+}
 }
}

sql/core/src/test/scala/org/apache/spark/sql/errors/QueryParsingErrorsSuite.scala View File

@@ -213,4 +213,96 @@ class QueryParsingErrorsSuite extends QueryTest with SharedSparkSession {
 |--------------------------------------------^^^
 |""".stripMargin)
 }
+test("UNSUPPORTED_FEATURE: cannot set reserved namespace property") {
+val sql = "CREATE NAMESPACE IF NOT EXISTS a.b.c WITH PROPERTIES ('location'='/home/user/db')"
+val msg = """The feature is not supported: location is a reserved namespace property, """ +
+"""please use the LOCATION clause to specify it.(line 1, pos 0)"""
+validateParsingError(
+sqlText = sql,
+errorClass = "UNSUPPORTED_FEATURE",
+sqlState = "0A000",
+message =
+s"""
+|$msg
+|
+|== SQL ==
+|$sql
+|^^^
+|""".stripMargin)
+}
+test("UNSUPPORTED_FEATURE: cannot set reserved table property") {
+val sql = "CREATE TABLE student (id INT, name STRING, age INT) " +
+"USING PARQUET TBLPROPERTIES ('provider'='parquet')"
+val msg = """The feature is not supported: provider is a reserved table property, """ +
+"""please use the USING clause to specify it.(line 1, pos 66)"""
+validateParsingError(
+sqlText = sql,
+errorClass = "UNSUPPORTED_FEATURE",
+sqlState = "0A000",
+message =
+s"""
+|$msg
+|
+|== SQL ==
+|$sql
+|------------------------------------------------------------------^^^
+|""".stripMargin)
+}
+test("INVALID_PROPERTY_KEY: invalid property key for set quoted configuration") {
+val sql = "set =`value`"
+val msg = """"" is an invalid property key, please use quotes, """ +
+"""e.g. SET ""="value"(line 1, pos 0)"""
+validateParsingError(
+sqlText = sql,
+errorClass = "INVALID_PROPERTY_KEY",
+sqlState = null,
+message =
+s"""
+|$msg
+|
+|== SQL ==
+|$sql
+|^^^
+|""".stripMargin)
+}
+test("INVALID_PROPERTY_VALUE: invalid property value for set quoted configuration") {
+val sql = "set `key`=1;2;;"
+val msg = """"1;2;;" is an invalid property value, please use quotes, """ +
+"""e.g. SET "key"="1;2;;"(line 1, pos 0)"""
+validateParsingError(
+sqlText = sql,
+errorClass = "INVALID_PROPERTY_VALUE",
+sqlState = null,
+message =
+s"""
+|$msg
+|
+|== SQL ==
+|$sql
+|^^^
+|""".stripMargin)
+}
+test("UNSUPPORTED_FEATURE: cannot set Properties and DbProperties at the same time") {
+val sql = "CREATE NAMESPACE IF NOT EXISTS a.b.c WITH PROPERTIES ('a'='a', 'b'='b', 'c'='c') " +
+"WITH DBPROPERTIES('a'='a', 'b'='b', 'c'='c')"
+val msg = """The feature is not supported: set PROPERTIES and DBPROPERTIES at the same time.""" +
+"""(line 1, pos 0)"""
+validateParsingError(
+sqlText = sql,
+errorClass = "UNSUPPORTED_FEATURE",
+sqlState = "0A000",
+message =
+s"""
+|$msg
+|
+|== SQL ==
+|$sql
+|^^^
+|""".stripMargin)
+}
 }

sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala View File

@@ -168,11 +168,11 @@ class SparkSqlParserSuite extends AnalysisTest {
 intercept("SET a=1;2;;", expectedErrMsg)
 intercept("SET a b=`1;;`",
-"'a b' is an invalid property key, please use quotes, e.g. SET `a b`=`1;;`")
+"\"a b\" is an invalid property key, please use quotes, e.g. SET \"a b\"=\"1;;\"")
 intercept("SET `a`=1;2;;",
-"'1;2;;' is an invalid property value, please use quotes, e.g." +
-" SET `a`=`1;2;;`")
+"\"1;2;;\" is an invalid property value, please use quotes, e.g." +
+" SET \"a\"=\"1;2;;\"")
 }
 test("refresh resource") {
test("refresh resource") {

sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala View File

@@ -23,6 +23,7 @@ import java.net.URI
 import org.apache.logging.log4j.Level
 import org.scalatest.PrivateMethodTester
+import org.apache.spark.SparkException
 import org.apache.spark.scheduler.{SparkListener, SparkListenerEvent, SparkListenerJobStart}
 import org.apache.spark.sql.{Dataset, QueryTest, Row, SparkSession, Strategy}
 import org.apache.spark.sql.catalyst.optimizer.{BuildLeft, BuildRight}
@@ -856,12 +857,11 @@ class AdaptiveQueryExecSuite
 df1.write.parquet(tableDir.getAbsolutePath)
 val aggregated = spark.table("bucketed_table").groupBy("i").count()
-val error = intercept[IllegalStateException] {
+val error = intercept[SparkException] {
 aggregated.count()
 }
-// TODO(SPARK-39163): Throw an exception w/ error class for an invalid bucket file
-assert(error.toString contains "Invalid bucket file")
-assert(error.getSuppressed.size === 0)
+assert(error.getErrorClass === "INVALID_BUCKET_FILE")
+assert(error.getMessage contains "Invalid bucket file")
 }
 }
 }
}

sql/core/src/test/scala/org/apache/spark/sql/execution/command/CreateNamespaceParserSuite.scala View File

@@ -84,7 +84,8 @@ class CreateNamespaceParserSuite extends AnalysisTest {
 |WITH PROPERTIES ('a'='a', 'b'='b', 'c'='c')
 |WITH DBPROPERTIES ('a'='a', 'b'='b', 'c'='c')
 """.stripMargin
-intercept(sql, "Either PROPERTIES or DBPROPERTIES is allowed")
+intercept(sql, "The feature is not supported: " +
+"set PROPERTIES and DBPROPERTIES at the same time.")
 }
 test("create namespace - support for other types in PROPERTIES") {

sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala View File

@@ -17,9 +17,6 @@
 package org.apache.spark.sql.sources
-import java.io.File
-import java.net.URI
 import scala.util.Random
 import org.apache.spark.sql._
@@ -36,7 +33,6 @@ import org.apache.spark.sql.functions._
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.internal.StaticSQLConf.CATALOG_IMPLEMENTATION
 import org.apache.spark.sql.test.{SharedSparkSession, SQLTestUtils}
-import org.apache.spark.util.Utils
 import org.apache.spark.util.collection.BitSet
 class BucketedReadWithoutHiveSupportSuite
@@ -832,23 +828,6 @@ abstract class BucketedReadSuite extends QueryTest with SQLTestUtils with Adapti
 }
 }
-test("error if there exists any malformed bucket files") {
-withTable("bucketed_table") {
-df1.write.format("parquet").bucketBy(8, "i").saveAsTable("bucketed_table")
-val warehouseFilePath = new URI(spark.sessionState.conf.warehousePath).getPath
-val tableDir = new File(warehouseFilePath, "bucketed_table")
-Utils.deleteRecursively(tableDir)
-df1.write.parquet(tableDir.getAbsolutePath)
-val aggregated = spark.table("bucketed_table").groupBy("i").count()
-val e = intercept[IllegalStateException] {
-aggregated.count()
-}
-// TODO(SPARK-39163): Throw an exception w/ error class for an invalid bucket file
-assert(e.toString contains "Invalid bucket file")
-}
-}
 test("disable bucketing when the output doesn't contain all bucketing columns") {
 withTable("bucketed_table") {
 df1.write.format("parquet").bucketBy(8, "i").saveAsTable("bucketed_table")