[SPARK-15547][SQL] nested case class in encoder can have different number of fields from the real schema

## What changes were proposed in this pull request?

There are 2 kinds of `GetStructField`:

1. Resolved from `UnresolvedExtractValue`; this kind has a `name` property.
2. Created when we build the deserializer expression for a nested tuple; this kind has no `name` property.

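When validating the ordinals of a nested tuple, we should only collect `GetStructField` expressions that have no `name` property. A nested case class, whose fields are resolved by name, can then have a different number of fields from the real schema.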
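The filtering step can be sketched outside Spark as follows. This is a minimal, self-contained illustration, not the Catalyst code: `Expr`, `Attr`, `NewInstance`, `flatten`, and `OrdinalCheckSketch` are hypothetical stand-ins, and only the shape of `GetStructField` (a child, an ordinal, and an optional name) mirrors the expression discussed here.

```scala
// Simplified stand-ins for Catalyst expressions (hypothetical, NOT the real Spark classes).
sealed trait Expr { def children: Seq[Expr] }

case class Attr(id: String) extends Expr {
  val children: Seq[Expr] = Nil
}

// `name` is defined for by-name field access and empty for positional (tuple) access.
case class GetStructField(child: Expr, ordinal: Int, name: Option[String] = None) extends Expr {
  val children: Seq[Expr] = Seq(child)
}

case class NewInstance(args: Seq[Expr]) extends Expr {
  val children: Seq[Expr] = args
}

object OrdinalCheckSketch {
  // Pre-order traversal returning every node; stands in for a tree-wide `collect`.
  def flatten(e: Expr): Seq[Expr] = e +: e.children.flatMap(flatten)

  // Keep only the unnamed GetStructField nodes, then gather the distinct,
  // sorted ordinals used against each struct-typed child.
  def ordinalsByChild(deserializer: Expr): Map[Expr, Seq[Int]] =
    flatten(deserializer)
      .collect { case g: GetStructField if g.name.isEmpty => g }
      .groupBy(_.child)
      .map { case (child, gs) => child -> gs.map(_.ordinal).distinct.sorted }
}
```

In this sketch, named accesses such as `GetStructField(Attr("b"), 0, Some("a"))` are skipped, so only positional accesses into nested tuples remain for the ordinal check.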

## How was this patch tested?

new test in `EncoderResolutionSuite`

Author: Wenchen Fan <wenchen@databricks.com>

Closes #13474 from cloud-fan/ordinal-check.
Wenchen Fan 2016-06-03 14:26:24 -07:00 committed by Cheng Lian
parent eb10b481ca
commit 61b80d552a
2 changed files with 12 additions and 1 deletions


@@ -1964,7 +1964,12 @@ class Analyzer(
    */
   private def validateNestedTupleFields(deserializer: Expression): Unit = {
     val structChildToOrdinals = deserializer
-      .collect { case g: GetStructField => g }
+      // There are 2 kinds of `GetStructField`:
+      //   1. resolved from `UnresolvedExtractValue`, and it will have a `name` property.
+      //   2. created when we build deserializer expression for nested tuple, no `name` property.
+      // Here we want to validate the ordinals of nested tuple, so we should only catch
+      // `GetStructField` without the name property.
+      .collect { case g: GetStructField if g.name.isEmpty => g }
       .groupBy(_.child)
       .mapValues(_.map(_.ordinal).distinct.sorted)


@@ -115,6 +115,12 @@ class EncoderResolutionSuite extends PlanTest {
     }
   }
 
+  test("nested case class can have different number of fields from the real schema") {
+    val encoder = ExpressionEncoder[(String, StringIntClass)]
+    val attrs = Seq('a.string, 'b.struct('a.string, 'b.int, 'c.int))
+    encoder.resolveAndBind(attrs)
+  }
+
   test("throw exception if real type is not compatible with encoder schema") {
     val msg1 = intercept[AnalysisException] {
       ExpressionEncoder[StringIntClass].resolveAndBind(Seq('a.string, 'b.long))