spark-instrumented-optimizer/sql/core
Pablo Langa f12793de20 [SPARK-35320][SQL] Improve error message for unsupported key types in MapType in from_json expression
### What changes were proposed in this pull request?

Currently, when a map is parsed in a from_json function, only StringType key is supported. If you try to parse other type, it results on a cast exception.
For example:
```scala
Seq((s"""{"2021-05-05T20:05:08": "sampleValue"}"""))
  .toDF("value")
  .withColumn("value1", from_json(col("value"),  MapType(TimestampType, StringType)))
  .show
```
```
Exception in thread "main" java.lang.ClassCastException: class org.apache.spark.unsafe.types.UTF8String cannot be cast to class java.lang.Long (org.apache.spark.unsafe.types.UTF8String is in unnamed module of loader 'app'; java.lang.Long is in module java.base of loader 'bootstrap')
	at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107)
	at org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToString$8$adapted(Cast.scala:297)
	at org.apache.spark.sql.catalyst.expressions.CastBase.buildCast(Cast.scala:285)
	at org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToString$7(Cast.scala:297)
```
This PR proposes to improve the error message.
```
org.apache.spark.sql.AnalysisException: cannot resolve 'entries' due to data type mismatch: Input schema map<timestamp,string> can only contain StringType as a key type for a MapType.;
'Project [unresolvedalias(from_json(MapType(TimestampType,StringType,true), value#1, Some(America/Los_Angeles)), Some(org.apache.spark.sql.Column$$Lambda$1496/54693608710e5bf9c))]
+- LocalRelation [value#1]
	at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:197)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:182)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$2(TreeNode.scala:535)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
...
```
In https://github.com/apache/spark/pull/32599 we decide to improve the error message instead of support this.

### Why are the changes needed?

Avoid confusion in the interpretation of the error

### Does this PR introduce _any_ user-facing change?

Yes, the error message returned in this case

### How was this patch tested?

Unit testing and manual testing

Closes #33525 from planga82/feature/spark35320_improve_error_message.

Authored-by: Pablo Langa <soypab@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-07-28 14:46:06 +08:00
..
benchmarks [SPARK-34981][SQL][FOLLOWUP] Use SpecificInternalRow in ApplyFunctionExpression 2021-05-24 17:25:24 +09:00
src [SPARK-35320][SQL] Improve error message for unsupported key types in MapType in from_json expression 2021-07-28 14:46:06 +08:00
pom.xml [SPARK-35996][BUILD] Setting version to 3.3.0-SNAPSHOT 2021-07-02 13:47:36 -07:00