spark-instrumented-optimizer/sql/hive
Gengliang Wang e6c6f90a55 [SPARK-24691][SQL] Dispatch the type support check in FileFormat implementation
## What changes were proposed in this pull request?

With https://github.com/apache/spark/pull/21389, the data source schema is validated on the driver side before read/write tasks are launched.
However:

1. Putting all the validations together in `DataSourceUtils` is tricky and hard to maintain. On second thought, after review, I found that the `OrcFileFormat` in the hive package is not matched, so its validation is wrong.
2. `DataSourceUtils.verifyWriteSchema` and `DataSourceUtils.verifyReadSchema` are not supposed to be called in every file format. We can move them to a higher-level entry point.

So, I propose adding a new method, `validateDataType`, to `FileFormat`. File format implementations can override the method to specify their supported and unsupported data types.
Although we should focus on the data source V2 API, `FileFormat` should remain workable for some time, so adding this new method should be helpful.
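A minimal Scala sketch of the proposed dispatch follows. It assumes simplified stand-in types rather than Spark's real `org.apache.spark.sql.types` classes and `FileFormat` trait. The `(dataType, isReadPath)` signature, `OrcLikeFormat`, and `SchemaCheck.verifySchema` are hypothetical names chosen for illustration, and rejecting the interval type is an assumed restriction. The default method accepts every type, a format overrides it to reject what it cannot handle, and a single driver-side check calls it once before tasks are launched.

```scala
// Hypothetical, simplified stand-ins for Spark SQL's type hierarchy.
// These are NOT the real org.apache.spark.sql.types classes.
sealed trait DataType
case object IntegerType extends DataType
case object StringType extends DataType
case object CalendarIntervalType extends DataType
final case class ArrayType(elementType: DataType) extends DataType
final case class StructField(name: String, dataType: DataType)
final case class StructType(fields: Seq[StructField]) extends DataType

trait FileFormat {
  def name: String

  // Hypothetical signature: each format declares which types it can read
  // or write. The default accepts every type, so existing formats keep working.
  def validateDataType(dataType: DataType, isReadPath: Boolean): Boolean = true
}

// An ORC-like format that (by assumption) cannot store interval values,
// including intervals nested inside arrays or structs.
object OrcLikeFormat extends FileFormat {
  val name = "orc"

  override def validateDataType(dataType: DataType, isReadPath: Boolean): Boolean =
    dataType match {
      case CalendarIntervalType => false
      case ArrayType(elem)      => validateDataType(elem, isReadPath)
      case StructType(fields)   => fields.forall(f => validateDataType(f.dataType, isReadPath))
      case _                    => true
    }
}

object SchemaCheck {
  // Called once from a higher-level entry point on the driver,
  // instead of from inside every FileFormat implementation.
  def verifySchema(format: FileFormat, schema: StructType, isReadPath: Boolean): Unit =
    schema.fields.foreach { field =>
      if (!format.validateDataType(field.dataType, isReadPath)) {
        throw new IllegalArgumentException(
          s"${format.name} data source does not support ${field.dataType} data type (column `${field.name}`)")
      }
    }
}
```

Because the default returns `true`, only formats with restrictions need to override the method, and the recursive cases check nested arrays and structs with the same rule.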

## How was this patch tested?

Unit tests.

Author: Gengliang Wang <gengliang.wang@databricks.com>

Closes #21667 from gengliangwang/refactorSchemaValidate.
2018-07-13 00:26:49 +08:00
compatibility/src/test/scala/org/apache/spark/sql/hive/execution [SPARK-23170][SQL] Dump the statistics of effective runs of analyzer and optimizer rules 2018-01-22 04:31:24 -08:00
src [SPARK-24691][SQL] Dispatch the type support check in FileFormat implementation 2018-07-13 00:26:49 +08:00
pom.xml [SPARK-23028] Bump master branch version to 2.4.0-SNAPSHOT 2018-01-13 00:37:59 +08:00