a2c9acb0e5
## What changes were proposed in this pull request? This patch fixes a bug in the vectorized parquet reader that's caused by re-using the same dictionary column vector while reading consecutive row groups. Specifically, this issue manifests for a certain distribution of dictionary/plain encoded data while we read/populate the underlying bit packed dictionary data into a column-vector based data structure. ## How was this patch tested? Manually tested on datasets provided by the community. Thanks to Chris Perluss and Keith Kraus for their invaluable help in tracking down this issue! Author: Sameer Agarwal <sameerag@cs.berkeley.edu> Closes #14941 from sameeragarwal/parquet-exception-2. |
||
---|---|---|
.. | ||
java/org/apache/spark/sql | ||
resources | ||
scala/org/apache/spark/sql |