46f80a3073
## What changes were proposed in this pull request? As part of the bugfix in https://github.com/apache/spark/pull/12279, if a row batch consist of both dictionary encoded and non-dictionary encoded pages, we explicitly decode the dictionary for the values that are already dictionary encoded. Currently we reset the dictionary while reading every page that can potentially cause ` java.lang.ArrayIndexOutOfBoundsException` while decoding older pages. This patch fixes the problem by maintaining a single dictionary per row-batch in vectorized parquet reader. ## How was this patch tested? Manual Tests against a number of hand-generated parquet files. Author: Sameer Agarwal <sameerag@cs.berkeley.edu> Closes #14225 from sameeragarwal/vectorized. |
||
---|---|---|
.. | ||
main | ||
test |