spark-instrumented-optimizer/external
Erik Krogen 9371ea8c7b [SPARK-34133][AVRO] Respect case sensitivity when performing Catalyst-to-Avro field matching
### What changes were proposed in this pull request?
Make the field name matching between Avro and Catalyst schemas, on both the reader and writer paths, respect the global SQL settings for case sensitivity (i.e. case-insensitive by default). `AvroSerializer` and `AvroDeserializer` share a common utility in `AvroUtils` to search for an Avro field to match a given Catalyst field.

### Why are the changes needed?
Spark SQL is normally case-insensitive (by default), but currently when `AvroSerializer` and `AvroDeserializer` perform matching between Catalyst schemas and Avro schemas, the matching is done in a case-sensitive manner. So for example the following will fail:
```scala
      val avroSchema =
        """
          |{
          |  "type" : "record",
          |  "name" : "test_schema",
          |  "fields" : [
          |    {"name": "foo", "type": "int"},
          |    {"name": "BAR", "type": "int"}
          |  ]
          |}
      """.stripMargin
      val df = Seq((1, 3), (2, 4)).toDF("FOO", "bar")

      df.write.option("avroSchema", avroSchema).format("avro").save(savePath)
```

The same is true on the read path, if we assume `testAvro` has been written using the schema above, the below will fail to match the fields:
```scala
df.read.schema(new StructType().add("FOO", IntegerType).add("bar", IntegerType))
  .format("avro").load(testAvro)
```

### Does this PR introduce _any_ user-facing change?
When reading Avro data, or writing Avro data using the `avroSchema` option, field matching will be performed with case sensitivity respecting the global SQL settings.

### How was this patch tested?
New tests added to `AvroSuite` to validate the case sensitivity logic in an end-to-end manner through the SQL engine.

Closes #31201 from xkrogen/xkrogen-SPARK-34133-avro-serde-casesensitivity-errormessages.

Authored-by: Erik Krogen <xkrogen@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-01-25 04:54:41 +00:00
..
avro [SPARK-34133][AVRO] Respect case sensitivity when performing Catalyst-to-Avro field matching 2021-01-25 04:54:41 +00:00
docker [SPARK-32353][TEST] Update docker/spark-test and clean up unused stuff 2020-07-17 12:05:45 -07:00
docker-integration-tests [SPARK-34186][SQL][TESTS] Fix DockerJDBCIntegrationSuites to reflect the change of SPARK-33888 2021-01-24 18:44:14 -08:00
kafka-0-10 [SPARK-33346][CORE][SQL][MLLIB][DSTREAM][K8S] Change the never changed 'var' to 'val' 2021-01-15 08:47:02 -06:00
kafka-0-10-assembly [SPARK-27733][CORE] Upgrade Avro to version 1.10.1 2021-01-20 15:42:27 -08:00
kafka-0-10-sql [SPARK-34187][SS] Use available offset range obtained during polling when checking offset validation 2021-01-24 11:50:56 -08:00
kafka-0-10-token-provider [SPARK-33212][BUILD] Upgrade to Hadoop 3.2.2 and move to shaded clients for Hadoop 3.x profile 2021-01-15 14:06:50 -08:00
kinesis-asl [SPARK-33662][BUILD] Setting version to 3.2.0-SNAPSHOT 2020-12-04 14:10:42 -08:00
kinesis-asl-assembly [SPARK-27733][CORE] Upgrade Avro to version 1.10.1 2021-01-20 15:42:27 -08:00
spark-ganglia-lgpl [SPARK-33662][BUILD] Setting version to 3.2.0-SNAPSHOT 2020-12-04 14:10:42 -08:00