spark-instrumented-optimizer

Liang-Chi Hsieh 3030b82c89 [SPARK-25363][SQL] Fix schema pruning in where clause by ignoring unnecessary root fields

## What changes were proposed in this pull request?

Schema pruning doesn't work if nested column is used in where clause. For example,

```
sql("select name.first from contacts where name.first = 'David'")

== Physical Plan ==
*(1) Project [name#19.first AS first#40]
+- *(1) Filter (isnotnull(name#19) && (name#19.first = David))
   +- *(1) FileScan parquet [name#19] Batched: false, Format: Parquet, PartitionFilters: [], PushedFilters: [IsNotNull(name)], ReadSchema: struct<name:struct<first:string,middle:string,last:string>>
```

In above query plan, the scan node reads the entire schema of `name` column.

This issue is reported by: https://github.com/apache/spark/pull/21320#issuecomment-419290197

The cause is that we infer a root field from expression `IsNotNull(name)`. However, for such expression, we don't really use the nested fields of this root field, so we can ignore the unnecessary nested fields.

## How was this patch tested?

Unit tests.

Closes #22357 from viirya/SPARK-25363.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>

2018-09-12 17:43:40 +00:00
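The idea in the commit message can be illustrated with a small, self-contained toy model. This is not Spark's implementation: `FieldRef`, `prune_schema`, and the `null_check_only` flag are hypothetical names made up for this sketch, and the schema is modelled as a plain nested dict. It only shows why a whole-struct reference that comes solely from a null check can be dropped once some nested field of the same root is already required.

```python
import copy
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FieldRef:
    """A column reference collected from a query (hypothetical model)."""
    path: Tuple[str, ...]          # ("name", "first") stands for name.first
    null_check_only: bool = False  # True if it comes only from IsNotNull/IsNull


def _add_path(full: dict, out: dict, path: Tuple[str, ...]) -> None:
    """Copy the part of `full` addressed by `path` into `out`."""
    head, rest = path[0], path[1:]
    sub = full[head]
    if not rest or not isinstance(sub, dict):
        # Whole field requested (or a leaf type): keep it entirely.
        out[head] = copy.deepcopy(sub)
    else:
        # Nested access: descend and keep only the requested child.
        _add_path(sub, out.setdefault(head, {}), rest)


def prune_schema(schema: dict, refs: list) -> dict:
    """Return the sub-schema a scan would need under this toy model."""
    # Roots for which at least one nested field is accessed.
    nested_roots = {r.path[0] for r in refs if len(r.path) > 1}
    pruned: dict = {}
    for r in refs:
        # Skip a bare root reference that exists only for a null check
        # when a nested field of the same root is already required.
        if r.null_check_only and len(r.path) == 1 and r.path[0] in nested_roots:
            continue
        _add_path(schema, pruned, r.path)
    return pruned


contacts = {
    "name": {"first": "string", "middle": "string", "last": "string"},
    "address": "string",
}

# select name.first from contacts where name.first = 'David'
refs = [
    FieldRef(("name", "first")),              # projection and filter
    FieldRef(("name",), null_check_only=True),  # from IsNotNull(name)
]
print(prune_schema(contacts, refs))
```

If the `IsNotNull(name)` reference were treated like an ordinary use of `name` (that is, `null_check_only=False`), the same call would return the full `name` struct, which corresponds to the unpruned `ReadSchema` shown in the plan above.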
benchmarks	[SPARK-25306][SQL] Avoid skewed filter trees to speed up `createFilter` in ORC	2018-09-05 10:24:13 +08:00
src	[SPARK-25363][SQL] Fix schema pruning in where clause by ignoring unnecessary root fields	2018-09-12 17:43:40 +00:00
pom.xml	[SPARK-25019][BUILD] Fix orc dependency to use the same exclusion rules	2018-08-06 12:00:39 -07:00