spark-instrumented-optimizer


Reynold Xin 2f0b882e5c [SPARK-14482][SQL] Change default Parquet codec from gzip to snappy

## What changes were proposed in this pull request?
Based on our tests, gzip decompression is very slow (< 100MB/s), making queries decompression bound. Snappy can decompress at ~ 500MB/s on a single core.

This patch changes the default compression codec for Parquet output from gzip to snappy, and also introduces a ParquetOptions class to be more consistent with other data sources (e.g. CSV, JSON).

## How was this patch tested?
Should be covered by existing unit tests.

Author: Reynold Xin <rxin@databricks.com>

Closes #12256 from rxin/SPARK-14482.

2016-04-08 23:52:04 -07:00
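
To make the throughput figures in the commit message concrete, here is a rough back-of-the-envelope sketch. The two rates come from the message itself; the 10,000 MB workload and the helper name `decompress_seconds` are hypothetical, and the message does not say whether the rates refer to compressed or uncompressed bytes, so the sketch simply treats them as "MB processed per second" on a single core.

```python
# Throughput figures quoted in the commit message (MB/s, single core).
GZIP_MB_PER_S = 100.0    # upper bound: the message says "< 100MB/s"
SNAPPY_MB_PER_S = 500.0  # approximate: the message says "~ 500MB/s"


def decompress_seconds(size_mb: float, throughput_mb_per_s: float) -> float:
    """Time to process size_mb of data at a fixed throughput (hypothetical helper)."""
    return size_mb / throughput_mb_per_s


# Hypothetical workload: 10,000 MB processed on one core.
size_mb = 10_000.0
gzip_s = decompress_seconds(size_mb, GZIP_MB_PER_S)
snappy_s = decompress_seconds(size_mb, SNAPPY_MB_PER_S)

print(f"gzip:    at least {gzip_s:.0f} s")
print(f"snappy:  about {snappy_s:.0f} s")
print(f"speedup: roughly {gzip_s / snappy_s:.0f}x or more")
```

Under these assumptions the decompression step alone is about five times faster with snappy, which is the sense in which the message calls gzip-backed queries decompression bound. The sketch ignores I/O and differences in compressed file size.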
src	[SPARK-14482][SQL] Change default Parquet codec from gzip to snappy	2016-04-08 23:52:04 -07:00
pom.xml	[SPARK-14103][SQL] Parse unescaped quotes in CSV data source.	2016-04-08 00:28:59 -07:00
