spark-instrumented-optimizer

History

Sameer Agarwal 46f80a3073 [SPARK-16334] Maintain single dictionary per row-batch in vectorized parquet reader ## What changes were proposed in this pull request? As part of the bugfix in https://github.com/apache/spark/pull/12279, if a row batch consist of both dictionary encoded and non-dictionary encoded pages, we explicitly decode the dictionary for the values that are already dictionary encoded. Currently we reset the dictionary while reading every page that can potentially cause ` java.lang.ArrayIndexOutOfBoundsException` while decoding older pages. This patch fixes the problem by maintaining a single dictionary per row-batch in vectorized parquet reader. ## How was this patch tested? Manual Tests against a number of hand-generated parquet files. Author: Sameer Agarwal <sameerag@cs.berkeley.edu> Closes #14225 from sameeragarwal/vectorized.	2016-07-21 15:34:32 -07:00
..
main	[SPARK-16334] Maintain single dictionary per row-batch in vectorized parquet reader	2016-07-21 15:34:32 -07:00
test	[SPARK-16656][SQL] Try to make CreateTableAsSelectSuite more stable	2016-07-21 12:10:26 -07:00

Sameer Agarwal 46f80a3073 [SPARK-16334] Maintain single dictionary per row-batch in vectorized parquet reader

## What changes were proposed in this pull request?

As part of the bugfix in https://github.com/apache/spark/pull/12279, if a row batch consist of both dictionary encoded and non-dictionary encoded pages, we explicitly decode the dictionary for the values that are already dictionary encoded. Currently we reset the dictionary while reading every page that can potentially cause ` java.lang.ArrayIndexOutOfBoundsException` while decoding older pages. This patch fixes the problem by maintaining a single dictionary per row-batch in vectorized parquet reader.

## How was this patch tested?

Manual Tests against a number of hand-generated parquet files.

Author: Sameer Agarwal <sameerag@cs.berkeley.edu>

Closes #14225 from sameeragarwal/vectorized.

2016-07-21 15:34:32 -07:00

main

[SPARK-16334] Maintain single dictionary per row-batch in vectorized parquet reader

2016-07-21 15:34:32 -07:00

test

[SPARK-16656][SQL] Try to make CreateTableAsSelectSuite more stable

2016-07-21 12:10:26 -07:00