spark-instrumented-optimizer/sql/core
Liang-Chi Hsieh c59988aa79 [SPARK-34638][SQL] Single field nested column prune on generator output
### What changes were proposed in this pull request?

This patch proposes an improvement on nested column pruning if the pruning target is generator's output. Previously we disallow such case. This patch allows to prune on it if there is only one single nested column is accessed after `Generate`.

E.g., `df.select(explode($"items").as('item)).select($"item.itemId")`. As we only need `itemId` from `item`, we can prune other fields out and only keep `itemId`.

In this patch, we only address explode-like generators. We will address other generators in followups.

### Why are the changes needed?

This helps to extend the availability of nested column pruning.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit test

Closes #31966 from viirya/SPARK-34638.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
2021-04-26 09:32:22 -07:00
..
benchmarks [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
src [SPARK-34638][SQL] Single field nested column prune on generator output 2021-04-26 09:32:22 -07:00
pom.xml [SPARK-33662][BUILD] Setting version to 3.2.0-SNAPSHOT 2020-12-04 14:10:42 -08:00