e6c6f90a55
## What changes were proposed in this pull request?

With https://github.com/apache/spark/pull/21389, data source schemas are validated on the driver side before read/write tasks are launched. However:

1. Putting all the validations together in `DataSourceUtils` is tricky and hard to maintain. On second thought after review, I found that the `OrcFileFormat` in the hive package is not matched, so its validation does not take effect.
2. `DataSourceUtils.verifyWriteSchema` and `DataSourceUtils.verifyReadSchema` are not supposed to be called in every file format; we can move them to some upper-level entry point.

So, I propose adding a new method `validateDataType` in `FileFormat`. File format implementations can override the method to specify their supported/unsupported data types. Although we should focus on the data source V2 API, `FileFormat` will remain in use for some time, so adding this new method should be helpful.

## How was this patch tested?

Unit test

Author: Gengliang Wang <gengliang.wang@databricks.com>

Closes #21667 from gengliangwang/refactorSchemaValidate.
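The shape of the proposed refactoring can be sketched as follows. This is a simplified, self-contained illustration, not Spark's actual classes: `DataType`, `FileFormat`, `CsvLikeFormat`, and `verifySchema` here are stand-ins showing how a per-format overridable `validateDataType` lets one upper-level entry point validate a schema, instead of each format calling `DataSourceUtils.verifyReadSchema`/`verifyWriteSchema` itself.

```scala
// Hypothetical sketch, not Spark's real API.
sealed trait DataType
case object IntType    extends DataType
case object StringType extends DataType
case object BinaryType extends DataType

trait FileFormat {
  // Formats override this to declare which data types they support.
  // Default: accept everything.
  def validateDataType(dt: DataType): Boolean = true
}

// Example: a CSV-like format that cannot store raw binary data.
object CsvLikeFormat extends FileFormat {
  override def validateDataType(dt: DataType): Boolean = dt match {
    case BinaryType => false
    case _          => true
  }
}

// Single upper-level entry point: validate the whole schema once,
// rather than sprinkling verification calls across every format.
def verifySchema(format: FileFormat, schema: Seq[DataType]): Boolean =
  schema.forall(format.validateDataType)
```

With this shape, adding support for a new type in one format is a local change to that format's `validateDataType` override, and the driver-side check stays in one place.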