spark-instrumented-optimizer

History

Ivan Sadikov fc0b85fb26 [SPARK-36803][SQL] Fix ArrayType conversion when reading Parquet files written in legacy mode ### What changes were proposed in this pull request? This PR fixes an issue when reading of a Parquet file written with legacy mode would fail due to incorrect Parquet LIST to ArrayType conversion. The issue arises when using schema evolution and utilising the parquet-mr reader. 2-level LIST annotated types could be parsed incorrectly as 3-level LIST annotated types because their underlying element type does not match the full inferred Catalyst schema. ### Why are the changes needed? It appears to be a long-standing issue with the legacy mode due to the imprecise check in ParquetRowConverter that was trying to determine Parquet backward compatibility using Catalyst schema: `DataType.equalsIgnoreCompatibleNullability(guessedElementType, elementType)` in https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala#L606. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added a new test case in ParquetInteroperabilitySuite.scala. Closes #34044 from sadikovi/parquet-legacy-write-mode-list-issue. Authored-by: Ivan Sadikov <ivan.sadikov@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit `ec26d94eac`) Signed-off-by: Wenchen Fan <wenchen@databricks.com>		2021-09-22 17:40:55 +08:00
..
benchmarks	[SPARK-34981][SQL][FOLLOWUP] Use SpecificInternalRow in ApplyFunctionExpression	2021-05-24 17:25:24 +09:00
src	[SPARK-36803][SQL] Fix ArrayType conversion when reading Parquet files written in legacy mode	2021-09-22 17:40:55 +08:00
pom.xml	[SPARK-36670][FOLLOWUP][TEST] Remove brotli-codec dependency	2021-09-21 10:57:34 -07:00