spark-instrumented-optimizer

Erik Krogen 4dd41b9678 [SPARK-34365][AVRO] Add support for positional Catalyst-to-Avro schema matching

### What changes were proposed in this pull request?

Provide the (configurable) ability to perform Avro-to-Catalyst schema field matching using the position of the fields instead of their names. A new `option` is added for the Avro datasource, `positionalFieldMatching`, which instructs `AvroSerializer`/`AvroDeserializer` to perform positional field matching instead of matching by name.

### Why are the changes needed?

This by-name matching is somewhat recent; prior to PR #24635, at least on the write path, schemas were matched positionally ("structural" comparison). While by-name matching is the better default, users should be able to configure this behavior. Even at the time that PR #24635 was handled, there was [interest in making this behavior configurable](https://github.com/apache/spark/pull/24635#issuecomment-494205251), but it appears it went unaddressed. There is precedent for making this behavior configurable in PR #29737, which added this support for ORC. Besides this precedent, Hive performs matching positionally ([ref](https://cwiki.apache.org/confluence/display/Hive/AvroSerDe#AvroSerDe-WritingtablestoAvrofiles)), so this is behavior that Hadoop/Hive ecosystem users are familiar with.

### Does this PR introduce _any_ user-facing change?

Yes, a new option is provided for the Avro datasource, `positionalFieldMatching`, which provides compatibility with Hive and pre-3.0.0 Spark behavior.

### How was this patch tested?

New unit tests are added within `AvroSuite`, `AvroSchemaHelperSuite`, and `AvroSerdeSuite`, and most of the existing tests within `AvroSerdeSuite` are adapted to run the same test using both by-name and positional matching to ensure feature parity.

Closes #31490 from xkrogen/xkrogen-SPARK-34365-avro-positional-field-matching.

Authored-by: Erik Krogen <xkrogen@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>		2021-06-30 16:20:45 +08:00
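
To make the difference between the two matching modes concrete, here is a minimal, self-contained sketch in plain Python. It is not Spark's implementation: the `match_fields` helper, the `(name, type)` tuple representation, and the error messages are all hypothetical and exist only for illustration. Real schema matching also has to deal with nested records, nullability, and case sensitivity, all of which this sketch ignores.

```python
from typing import List, Tuple

# Hypothetical, simplified field representation: (field name, type name).
Field = Tuple[str, str]


def match_fields(catalyst: List[Field], avro: List[Field],
                 positional: bool = False) -> List[Tuple[str, str]]:
    """Pair each Catalyst field with an Avro field.

    positional=False pairs fields that share a name (order is irrelevant).
    positional=True pairs fields by index (names are irrelevant).
    Returns (catalyst_name, avro_name) pairs; raises ValueError on mismatch.
    """
    if positional:
        if len(catalyst) != len(avro):
            raise ValueError("positional matching needs equal field counts")
        pairs = list(zip(catalyst, avro))
    else:
        avro_by_name = {name: (name, typ) for name, typ in avro}
        pairs = []
        for field in catalyst:
            if field[0] not in avro_by_name:
                raise ValueError(f"no Avro field named {field[0]!r}")
            pairs.append((field, avro_by_name[field[0]]))

    # In both modes the paired fields must still agree on type.
    for (c_name, c_type), (a_name, a_type) in pairs:
        if c_type != a_type:
            raise ValueError(
                f"type mismatch: {c_name}:{c_type} vs {a_name}:{a_type}")
    return [(c[0], a[0]) for c, a in pairs]


catalyst_schema = [("id", "long"), ("name", "string")]
renamed_avro = [("user_id", "long"), ("user_name", "string")]
reordered_avro = [("name", "string"), ("id", "long")]

# Same order, different names: only positional matching succeeds.
positional_pairs = match_fields(catalyst_schema, renamed_avro, positional=True)
# Same names, different order: only by-name matching succeeds.
by_name_pairs = match_fields(catalyst_schema, reordered_avro)
```

In this sketch, positional mode tolerates renamed fields but fails on reordered ones, and by-name mode does the opposite. That trade-off is why the commit keeps by-name matching as the default and makes positional matching opt-in through the `positionalFieldMatching` option.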
benchmarks	[SPARK-35817][SQL] Restore performance of queries against wide Avro tables	2021-06-23 22:36:56 +08:00
src	[SPARK-34365][AVRO] Add support for positional Catalyst-to-Avro schema matching	2021-06-30 16:20:45 +08:00
pom.xml	[SPARK-35838][BUILD][TESTS] Ensure all modules can be maven test independently in Scala 2.13	2021-06-22 06:31:24 -07:00