spark-instrumented-optimizer

History

Gengliang Wang 4dce45a599 [SPARK-26744][SQL] Support schema validation in FileDataSourceV2 framework ## What changes were proposed in this pull request? The file source has a schema validation feature, which validates 2 schemas: 1. the user-specified schema when reading. 2. the schema of input data when writing. If a file source doesn't support the schema, we can fail the query earlier. This PR is to implement the same feature in the `FileDataSourceV2` framework. Comparing to `FileFormat`, `FileDataSourceV2` has multiple layers. The API is added in two places: 1. Read path: the table schema is determined in `TableProvider.getTable`. The actual read schema can be a subset of the table schema. This PR proposes to validate the actual read schema in `FileScan`. 2. Write path: validate the actual output schema in `FileWriteBuilder`. ## How was this patch tested? Unit test Closes #23714 from gengliangwang/schemaValidationV2. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>		2019-02-16 17:11:36 +08:00
..
benchmarks	[SPARK-26745][SQL] Revert count optimization in JSON datasource by SPARK-24959	2019-01-31 14:32:31 +08:00
src	[SPARK-26744][SQL] Support schema validation in FileDataSourceV2 framework	2019-02-16 17:11:36 +08:00
pom.xml	[SPARK-25956] Make Scala 2.12 as default Scala version in Spark 3.0	2018-11-14 16:22:23 -08:00