c59988aa79
### What changes were proposed in this pull request?

This patch improves nested column pruning when the pruning target is a generator's output. Previously, this case was not allowed. This patch allows pruning on it if only a single nested column is accessed after `Generate`. For example, in `df.select(explode($"items").as('item)).select($"item.itemId")`, only `itemId` is needed from `item`, so the other fields can be pruned and only `itemId` kept.

This patch only addresses explode-like generators. Other generators will be addressed in follow-ups.

### Why are the changes needed?

This extends the set of queries to which nested column pruning can apply.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit test

Closes #31966 from viirya/SPARK-34638.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
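To make the pruning idea concrete, here is a toy Scala model of it. It is a sketch under stated assumptions, not Spark's Catalyst implementation: the `DataType`, `StructType`, `ArrayType`, `Field` types and the `pruneExplodeInput` function are hypothetical stand-ins (they are not `org.apache.spark.sql.types`). It shows how, when exactly one field of the exploded struct is read, the schema of the array fed into an explode-like generator can shrink to that one field.

```scala
// Hypothetical, minimal schema model (NOT Spark's org.apache.spark.sql.types).
sealed trait DataType
case object IntType extends DataType
case object StringType extends DataType
final case class Field(name: String, dataType: DataType)
final case class StructType(fields: Seq[Field]) extends DataType
final case class ArrayType(elementType: DataType) extends DataType

object GeneratorPruningSketch {
  // Given the schema of the array column passed to an explode-like generator and
  // the set of nested fields read from the generator's output, return a pruned
  // array schema. Mirroring the restriction described above, pruning is only
  // attempted when exactly one nested field is accessed.
  def pruneExplodeInput(input: ArrayType, accessed: Set[String]): Option[ArrayType] =
    if (accessed.size != 1) None
    else
      input.elementType match {
        case StructType(fields) =>
          // Keep only the single accessed field; None if it does not exist.
          fields.find(_.name == accessed.head).map(f => ArrayType(StructType(Seq(f))))
        case _ =>
          None // element is not a struct, so there are no nested fields to prune
      }

  def main(args: Array[String]): Unit = {
    // Roughly the shape of `items` in the example: array<struct<itemId, name, price>>.
    val items = ArrayType(StructType(Seq(
      Field("itemId", IntType),
      Field("name", StringType),
      Field("price", IntType))))

    // Only `item.itemId` is read after explode, so only `itemId` needs to be kept.
    println(pruneExplodeInput(items, Set("itemId")))
    // Two nested fields accessed: this sketch leaves the schema unpruned.
    println(pruneExplodeInput(items, Set("itemId", "name")))
  }
}
```

Returning `None` when more than one field is accessed models the single-nested-column restriction of this patch; the real optimizer rewrites logical plans rather than bare schemas.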
| Name |
|---|
| benchmarks |
| src |
| pom.xml |