f12793de20
### What changes were proposed in this pull request?

Currently, when a map is parsed by the `from_json` function, only `StringType` keys are supported. Trying to parse any other key type results in a cast exception. For example:

```scala
Seq((s"""{"2021-05-05T20:05:08": "sampleValue"}"""))
  .toDF("value")
  .withColumn("value1", from_json(col("value"), MapType(TimestampType, StringType)))
  .show
```

```
Exception in thread "main" java.lang.ClassCastException: class org.apache.spark.unsafe.types.UTF8String cannot be cast to class java.lang.Long (org.apache.spark.unsafe.types.UTF8String is in unnamed module of loader 'app'; java.lang.Long is in module java.base of loader 'bootstrap')
  at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107)
  at org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToString$8$adapted(Cast.scala:297)
  at org.apache.spark.sql.catalyst.expressions.CastBase.buildCast(Cast.scala:285)
  at org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToString$7(Cast.scala:297)
```

This PR proposes to improve the error message:
```
org.apache.spark.sql.AnalysisException: cannot resolve 'entries' due to data type mismatch: Input schema map<timestamp,string> can only contain StringType as a key type for a MapType.;
'Project [unresolvedalias(from_json(MapType(TimestampType,StringType,true), value#1, Some(America/Los_Angeles)), Some(org.apache.spark.sql.Column$$Lambda$1496/54693608710e5bf9c))]
+- LocalRelation [value#1]
  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:197)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:182)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$2(TreeNode.scala:535)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
  ...
```

In https://github.com/apache/spark/pull/32599 it was decided to improve the error message instead of supporting non-string key types.

### Why are the changes needed?

To avoid confusion when interpreting the error.

### Does this PR introduce _any_ user-facing change?

Yes, the error message returned in this case changes.

### How was this patch tested?

Unit tests and manual testing.

Closes #33525 from planga82/feature/spark35320_improve_error_message.

Authored-by: Pablo Langa <soypab@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
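Since non-string map keys are not supported by `from_json`, one possible workaround (not part of this PR, just an illustrative sketch) is to parse the map with `StringType` keys and then convert the keys afterwards with `transform_keys` (available since Spark 3.0). The session setup and the column names `parsed` and `tsKeyed` below are assumptions for the example:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// Illustrative sketch: a local Spark session; names here are hypothetical.
val spark = SparkSession.builder().master("local[1]").appName("from_json-workaround").getOrCreate()
import spark.implicits._

val df = Seq("""{"2021-05-05T20:05:08": "sampleValue"}""").toDF("value")
  // Parse with the supported StringType key type...
  .withColumn("parsed", from_json(col("value"), MapType(StringType, StringType)))
  // ...then cast each key to a timestamp after parsing.
  .withColumn("tsKeyed",
    transform_keys(col("parsed"), (k, _) => to_timestamp(k, "yyyy-MM-dd'T'HH:mm:ss")))

df.printSchema()
```

This trades one extra projection for avoiding the unsupported schema entirely.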