267160b360
## What changes were proposed in this pull request?

Currently, File source V2 allows each data source to declare its supported data types by implementing the method `supportsDataType` in `FileScan` and `FileWriteBuilder`. However, on the read path the validation checks every data type in `readSchema`, which may contain partition columns. This is a regression: e.g. the Text data source supports only the String data type, yet its partition columns can still be of Integer type, since partition columns are processed by Spark itself.

This PR:
1. Refactors the schema validation to check the data schema only.
2. Filters partition columns out of the data schema when a user-specified schema is provided.

## How was this patch tested?

Unit tests.

Closes #24203 from gengliangwang/schemaValidation.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
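The idea behind the fix can be sketched as follows. This is a hypothetical, heavily simplified model, not Spark's actual implementation: `StructField`/`StructType` here are plain case classes standing in for `org.apache.spark.sql.types`, and `validateReadSchema`/`partitionCols` are illustrative names. It shows why dropping partition columns before calling `supportsDataType` lets an Integer partition column pass validation for a String-only source such as Text.

```scala
// Simplified stand-ins for Spark's schema classes (assumption: not the real API).
case class StructField(name: String, dataType: String)
case class StructType(fields: Seq[StructField])

// A Text-like source: only String is a supported *data* type.
def supportsDataType(dt: String): Boolean = dt == "string"

// Read-path validation after the fix: partition columns are filtered out of a
// user-specified schema before checking, since Spark processes them itself.
def validateReadSchema(userSchema: StructType, partitionCols: Set[String]): Boolean = {
  val dataSchema = userSchema.fields.filterNot(f => partitionCols.contains(f.name))
  dataSchema.forall(f => supportsDataType(f.dataType))
}

object Demo extends App {
  val schema = StructType(Seq(
    StructField("value", "string"), // data column
    StructField("year", "int")      // partition column, handled by Spark
  ))
  // Before the fix, validating all of readSchema would reject the int column;
  // checking the data schema only accepts it.
  println(validateReadSchema(schema, Set("year")))   // true
  println(validateReadSchema(schema, Set.empty))     // false
}
```

With the old behavior (validating the full `readSchema`, i.e. the `Set.empty` case), the Integer partition column fails validation even though the source never has to read it.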