spark-instrumented-optimizer

Gengliang Wang 568db94e0c [SPARK-27356][SQL] File source V2: Fix the case that data columns overlap with partition schema

## What changes were proposed in this pull request?

In the current file source V2 framework, the schema of `FileScan` is not returned correctly if there are overlapping columns between `dataSchema` and `partitionSchema`. The actual schema should be `dataSchema - overlapSchema + partitionSchema`, which might have a different column order from the pushed-down `requiredSchema` in `SupportsPushDownRequiredColumns.pruneColumns`.

For example, if the data schema is `[a: String, b: String, c: String]` and the partition schema is `[b: Int, d: Int]`, the result schema is `[a: String, b: Int, c: String, d: Int]` in the current `FileTable` and `HadoopFsRelation`, while the actual scan schema is `[a: String, c: String, b: Int, d: Int]` in `FileScan`.

To fix the corner case, this PR proposes that the output schema of `FileTable` should be `dataSchema - overlapSchema + partitionSchema`, so that the column order is consistent with `FileScan`. Putting all the partition columns at the end of the table schema is more reasonable.

## How was this patch tested?

Unit test.

Closes #24284 from gengliangwang/FixReadSchema.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

2019-04-05 13:34:46 +08:00
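
The two column orders described in the commit message can be illustrated with a small sketch. This is plain Python, not Spark code: schemas are modeled as lists of `(name, type)` tuples, the function names `merged_in_place` and `partition_columns_last` are hypothetical, and column names are matched exactly, which is a simplification.

```python
# Illustrative sketch only; not the Spark implementation.
# A schema is modeled as an ordered list of (column_name, type_name) tuples.

def merged_in_place(data_schema, partition_schema):
    """Order described for the previous FileTable / HadoopFsRelation schema:
    overlapping columns keep their position in the data schema but take the
    partition type; the remaining partition columns are appended."""
    partition_types = dict(partition_schema)
    data_names = {name for name, _ in data_schema}
    merged = [(name, partition_types.get(name, dtype)) for name, dtype in data_schema]
    merged += [(name, dtype) for name, dtype in partition_schema if name not in data_names]
    return merged

def partition_columns_last(data_schema, partition_schema):
    """Order proposed by the PR: dataSchema - overlapSchema + partitionSchema,
    so every partition column ends up after the data columns."""
    partition_names = {name for name, _ in partition_schema}
    non_overlapping = [(name, dtype) for name, dtype in data_schema
                       if name not in partition_names]
    return non_overlapping + list(partition_schema)

# The example from the commit message.
data_schema = [("a", "String"), ("b", "String"), ("c", "String")]
partition_schema = [("b", "Int"), ("d", "Int")]

old_order = merged_in_place(data_schema, partition_schema)
new_order = partition_columns_last(data_schema, partition_schema)
print(old_order)
print(new_order)
```

With the example schemas, the first function reproduces the `[a, b, c, d]` order and the second the `[a, c, b, d]` order that the commit message attributes to `FileScan`.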
benchmarks	[SPARK-27327][SQL] New JSON benchmarks: functions, Dataset[String]	2019-04-01 08:33:16 +09:00
src	[SPARK-27356][SQL] File source V2: Fix the case that data columns overlap with partition schema	2019-04-05 13:34:46 +08:00
v1.2.1/src	[SPARK-27182][SQL] Move the conflict source code of the sql/core module to sql/core/v1.2.1	2019-03-26 22:32:03 -07:00
v2.3.4/src	[SPARK-27182][SQL] Move the conflict source code of the sql/core module to sql/core/v1.2.1	2019-03-26 22:32:03 -07:00
pom.xml	[SPARK-27182][SQL] Move the conflict source code of the sql/core module to sql/core/v1.2.1	2019-03-26 22:32:03 -07:00