e6c6f90a55
## What changes were proposed in this pull request?

With https://github.com/apache/spark/pull/21389, data source schemas are validated on the driver side before read/write tasks are launched. However:

1. Putting all the validations together in `DataSourceUtils` is tricky and hard to maintain. On second thought after review, I found that the `OrcFileFormat` in the hive package is not matched, so its validation does not take effect.
2. `DataSourceUtils.verifyWriteSchema` and `DataSourceUtils.verifyReadSchema` are not supposed to be called in every file format; we can move them to some upper-level entry point.

So, I propose adding a new method `validateDataType` in `FileFormat`. File format implementations can override the method to specify their supported/unsupported data types. Although we should focus on the data source V2 API, `FileFormat` will remain in use for some time, so adding this new method should be helpful.

## How was this patch tested?

Unit test

Author: Gengliang Wang <gengliang.wang@databricks.com>

Closes #21667 from gengliangwang/refactorSchemaValidate.
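The shape of the proposed refactoring can be sketched as follows. This is a simplified, self-contained illustration, not Spark's actual classes: `DataType`, `FileFormat`, `CsvLikeFormat`, and `verifySchema` here are stand-ins showing how a per-format overridable `validateDataType` lets one upper-level entry point validate a schema, instead of each format calling `DataSourceUtils.verifyReadSchema`/`verifyWriteSchema` itself.

```scala
// Hypothetical sketch, not Spark's real API.
sealed trait DataType
case object IntType    extends DataType
case object StringType extends DataType
case object BinaryType extends DataType

trait FileFormat {
  // Formats override this to declare which data types they support.
  // Default: accept everything.
  def validateDataType(dt: DataType): Boolean = true
}

// Example: a CSV-like format that cannot store raw binary data.
object CsvLikeFormat extends FileFormat {
  override def validateDataType(dt: DataType): Boolean = dt match {
    case BinaryType => false
    case _          => true
  }
}

// Single upper-level entry point: validate the whole schema once,
// rather than sprinkling verification calls across every format.
def verifySchema(format: FileFormat, schema: Seq[DataType]): Boolean =
  schema.forall(format.validateDataType)
```

With this shape, adding support for a new type in one format is a local change to that format's `validateDataType` override, and the driver-side check stays in one place.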