fc0b85fb26
### What changes were proposed in this pull request?
This PR fixes an issue where reading a Parquet file written in legacy mode would fail because of an incorrect conversion from a Parquet LIST type to a Catalyst ArrayType.
The issue arises when schema evolution is used together with the parquet-mr reader: 2-level LIST-annotated types could be incorrectly parsed as 3-level LIST-annotated types because their underlying element type does not match the full inferred Catalyst schema.
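To illustrate why the two layouts can be confused, here is a minimal sketch using a hypothetical, simplified model of Parquet types (`PType`, `PPrimitive`, `PGroup` and the two helper functions are illustrative only, not the parquet-mr API). When the repeated field inside a LIST-annotated group is a group with a single child, it can be read either as the element itself (2-level) or as a wrapper around the element (3-level). For an `array<struct<a: int>>` written with a 2-level layout, the 3-level reading silently drops the struct level.

```scala
// Hypothetical, simplified model of a Parquet schema; this is NOT the parquet-mr API.
sealed trait PType { def name: String }
final case class PPrimitive(name: String, typeName: String) extends PType
final case class PGroup(name: String, fields: List[PType]) extends PType

// The repeated child of a LIST-annotated group in a 2-level layout
// for a column of type array<struct<a: int>>:
//
//   optional group f (LIST) {
//     repeated group array {
//       optional int32 a;
//     }
//   }
val repeatedField: PGroup = PGroup("array", List(PPrimitive("a", "int32")))

// 2-level reading: the repeated field itself is the list element.
def elementAs2Level(repeated: PType): PType = repeated

// 3-level reading: the repeated field is only a wrapper; its single child is the element.
def elementAs3Level(repeated: PGroup): PType = repeated.fields.head

// Correct for this file: each element is a struct with one field `a`.
val correct: PType = elementAs2Level(repeatedField)

// Misread: each element is taken to be a bare int32, losing the struct level.
val misread: PType = elementAs3Level(repeatedField)
```

Because the group has exactly one child, the schema alone does not decide between the two readings, so the reader needs some other signal to pick one.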
### Why are the changes needed?
It appears to be a long-standing issue with legacy mode, caused by an imprecise check in ParquetRowConverter that tries to determine Parquet backward compatibility using the Catalyst schema: `DataType.equalsIgnoreCompatibleNullability(guessedElementType, elementType)` in https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala#L606.
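The failure mode of that check can be sketched with hypothetical stand-ins for Catalyst types (`CType`, `CInt`, `CString`, `CStruct` and `treatAs2Level` are illustrative names, not Spark's `DataType` API; nullability is not modelled, so plain structural equality stands in for `equalsIgnoreCompatibleNullability`). The element type guessed from the file only matches the requested element type when the schemas agree exactly; once schema evolution adds a field, the check fails and the reader falls back to the 3-level interpretation.

```scala
// Hypothetical, simplified stand-ins for Catalyst types; NOT Spark's DataType API.
sealed trait CType
case object CInt extends CType
case object CString extends CType
final case class CStruct(fields: List[(String, CType)]) extends CType

// Stand-in for the equality-based heuristic: treat the repeated field as the element
// (2-level layout) only if its converted type equals the requested element type.
def treatAs2Level(guessedElementType: CType, elementType: CType): Boolean =
  guessedElementType == elementType

// Element type converted from the repeated field in the file: struct<a: int>.
val guessed: CType = CStruct(List("a" -> CInt))

// Without schema evolution, the requested element type matches,
// so the 2-level layout is detected.
val sameSchema: CType = CStruct(List("a" -> CInt))

// With schema evolution (e.g. a merged schema that adds column `b`), the requested
// element type has an extra field, so the check fails and the 3-level reading is used,
// which is wrong for this file.
val evolvedSchema: CType = CStruct(List("a" -> CInt, "b" -> CString))
```

This is why comparing against the full inferred Catalyst schema is not a reliable signal for detecting the legacy layout.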
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Added a new test case in ParquetInteroperabilitySuite.scala.
Closes #34044 from sadikovi/parquet-legacy-write-mode-list-issue.
Authored-by: Ivan Sadikov <ivan.sadikov@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit