[SPARK-25366][SQL] Zstd and brotli CompressionCodec are not supported for parquet files
## What changes were proposed in this pull request?

Hadoop 2.6 and Hadoop 2.7 do not contain the `zstd` and `brotli` compression codecs, and Hadoop 3.1 contains only the `zstd` codec. So I think we should remove `zstd` and `brotli` for the time being.

**Set `spark.sql.parquet.compression.codec=brotli`:**

```
Caused by: org.apache.parquet.hadoop.BadConfigurationException: Class org.apache.hadoop.io.compress.BrotliCodec was not found
	at org.apache.parquet.hadoop.CodecFactory.getCodec(CodecFactory.java:235)
	at org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.<init>(CodecFactory.java:142)
	at org.apache.parquet.hadoop.CodecFactory.createCompressor(CodecFactory.java:206)
	at org.apache.parquet.hadoop.CodecFactory.getCompressor(CodecFactory.java:189)
	at org.apache.parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:153)
	at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:411)
	at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:37)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:161)
```

**Set `spark.sql.parquet.compression.codec=zstd`:**

```
Caused by: org.apache.parquet.hadoop.BadConfigurationException: Class org.apache.hadoop.io.compress.ZStandardCodec was not found
	at org.apache.parquet.hadoop.CodecFactory.getCodec(CodecFactory.java:235)
	at org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.<init>(CodecFactory.java:142)
	at org.apache.parquet.hadoop.CodecFactory.createCompressor(CodecFactory.java:206)
	at org.apache.parquet.hadoop.CodecFactory.getCompressor(CodecFactory.java:189)
	at org.apache.parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:153)
	at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:411)
	at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:37)
	at org.apache.parquet.hadoop.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:161)
```

## How was this patch tested?

Existing unit tests.

Closes #22358 from 10110346/notsupportzstdandbrotil.

Authored-by: liuxian <liu.xian3@zte.com.cn>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
commit 4d114fc9a2 (parent 2f51e72356)
```diff
@@ -965,6 +965,8 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     `parquet.compression` is specified in the table-specific options/properties, the precedence would be
     `compression`, `parquet.compression`, `spark.sql.parquet.compression.codec`. Acceptable values include:
     none, uncompressed, snappy, gzip, lzo, brotli, lz4, zstd.
+    Note that `zstd` requires `ZStandardCodec` to be installed before Hadoop 2.9.0, `brotli` requires
+    `BrotliCodec` to be installed.
   </td>
 </tr>
 <tr>
```
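The precedence order documented above (`compression` write option, then the `parquet.compression` table property, then the `spark.sql.parquet.compression.codec` session config) can be illustrated with a small standalone sketch. This is plain Python, not Spark code; the `resolve_parquet_codec` function is hypothetical and only mimics the resolution order described in the docs:

```python
# Hypothetical sketch of the codec-resolution precedence documented above:
# write option `compression` > table property `parquet.compression`
# > session config `spark.sql.parquet.compression.codec` (default: snappy).

SUPPORTED = {"none", "uncompressed", "snappy", "gzip", "lzo", "brotli", "lz4", "zstd"}

def resolve_parquet_codec(write_options, table_props, session_conf):
    """Return the effective codec name, following the documented precedence."""
    codec = (
        write_options.get("compression")
        or table_props.get("parquet.compression")
        or session_conf.get("spark.sql.parquet.compression.codec", "snappy")
    )
    codec = codec.lower()
    if codec not in SUPPORTED:
        raise ValueError(f"unsupported parquet compression codec: {codec}")
    return codec

# The table property wins over the session config when no write option is set:
print(resolve_parquet_codec(
    {}, {"parquet.compression": "gzip"},
    {"spark.sql.parquet.compression.codec": "zstd"}))  # gzip
```

As the added documentation note warns, resolving to `zstd` or `brotli` here would still fail at write time on a Hadoop build that lacks `ZStandardCodec` or `BrotliCodec`.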