spark-instrumented-optimizer

Sameer Agarwal a2c9acb0e5 [SPARK-16334] Reusing same dictionary column for decoding consecutive row groups shouldn't throw an error	2016-09-02 15:16:16 -07:00

## What changes were proposed in this pull request?

This patch fixes a bug in the vectorized parquet reader that's caused by re-using the same dictionary column vector while reading consecutive row groups. Specifically, this issue manifests for a certain distribution of dictionary/plain encoded data while we read/populate the underlying bit packed dictionary data into a column-vector based data structure.

## How was this patch tested?

Manually tested on datasets provided by the community. Thanks to Chris Perluss and Keith Kraus for their invaluable help in tracking down this issue!

Author: Sameer Agarwal <sameerag@cs.berkeley.edu>

Closes #14941 from sameeragarwal/parquet-exception-2.
java/org/apache/spark/sql	[SPARK-16334] Reusing same dictionary column for decoding consecutive row groups shouldn't throw an error	2016-09-02 15:16:16 -07:00
resources	[SPARK-16031] Add debug-only socket source in Structured Streaming	2016-06-19 21:27:04 -07:00
scala/org/apache/spark/sql	[SPARK-17230] [SQL] Should not pass optimized query into QueryExecution in DataFrameWriter	2016-09-02 15:10:12 -07:00