[SPARK-25366][SQL] Zstd and brotli CompressionCodec are not supported for parquet files

## What changes were proposed in this pull request?
Hadoop2.6  and  hadoop2.7 do not contain zstd and brotli compressioncodec ,hadoop 3.1 also contains only zstd  compressioncodec .
 So I think we should remove zstd and brotil  for the time being.

**set  `spark.sql.parquet.compression.codec=brotli`:**
Caused by: org.apache.parquet.hadoop.BadConfigurationException: Class org.apache.hadoop.io.compress.BrotliCodec was not found
        at org.apache.parquet.hadoop.CodecFactory.getCodec(CodecFactory.java:235)
        at org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.<init>(CodecFactory.java:142)
        at org.apache.parquet.hadoop.CodecFactory.createCompressor(CodecFactory.java:206)
        at org.apache.parquet.hadoop.CodecFactory.getCompressor(CodecFactory.java:189)
        at org.apache.parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:153)
        at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:411)
        at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:37)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:161)

**set  `spark.sql.parquet.compression.codec=zstd`:**
Caused by: org.apache.parquet.hadoop.BadConfigurationException: Class org.apache.hadoop.io.compress.ZStandardCodec was not found
        at org.apache.parquet.hadoop.CodecFactory.getCodec(CodecFactory.java:235)
        at org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.<init>(CodecFactory.java:142)
        at org.apache.parquet.hadoop.CodecFactory.createCompressor(CodecFactory.java:206)
        at org.apache.parquet.hadoop.CodecFactory.getCompressor(CodecFactory.java:189)
        at org.apache.parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:153)
        at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:411)
        at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:37)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:161)

## How was this patch tested?
Exist unit test

Closes #22358 from 10110346/notsupportzstdandbrotil.

Authored-by: liuxian <liu.xian3@zte.com.cn>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
This commit is contained in:
liuxian 2018-09-20 16:53:48 -05:00 committed by Sean Owen
parent 2f51e72356
commit 4d114fc9a2

View file

@ -965,6 +965,8 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
`parquet.compression` is specified in the table-specific options/properties, the precedence would be
`compression`, `parquet.compression`, `spark.sql.parquet.compression.codec`. Acceptable values include:
none, uncompressed, snappy, gzip, lzo, brotli, lz4, zstd.
Note that `zstd` requires `ZStandardCodec` to be installed before Hadoop 2.9.0, `brotli` requires
`BrotliCodec` to be installed.
</td>
</tr>
<tr>