From 4d114fc9a2cb0be7256560bc8b2e4ce72adb7a7f Mon Sep 17 00:00:00 2001
From: liuxian
Date: Thu, 20 Sep 2018 16:53:48 -0500
Subject: [PATCH] [SPARK-25366][SQL] Zstd and brotli CompressionCodec are not
 supported for parquet files

## What changes were proposed in this pull request?

Hadoop 2.6 and Hadoop 2.7 do not contain the zstd and brotli compression codecs, and Hadoop 3.1 contains only the zstd codec. So we should remove zstd and brotli from the documented list for the time being.

**Setting `spark.sql.parquet.compression.codec=brotli`:**

```
Caused by: org.apache.parquet.hadoop.BadConfigurationException: Class org.apache.hadoop.io.compress.BrotliCodec was not found
	at org.apache.parquet.hadoop.CodecFactory.getCodec(CodecFactory.java:235)
	at org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.<init>(CodecFactory.java:142)
	at org.apache.parquet.hadoop.CodecFactory.createCompressor(CodecFactory.java:206)
	at org.apache.parquet.hadoop.CodecFactory.getCompressor(CodecFactory.java:189)
	at org.apache.parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:153)
	at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:411)
	at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:37)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:161)
```

**Setting `spark.sql.parquet.compression.codec=zstd`:**

```
Caused by: org.apache.parquet.hadoop.BadConfigurationException: Class org.apache.hadoop.io.compress.ZStandardCodec was not found
	at org.apache.parquet.hadoop.CodecFactory.getCodec(CodecFactory.java:235)
	at org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.<init>(CodecFactory.java:142)
	at org.apache.parquet.hadoop.CodecFactory.createCompressor(CodecFactory.java:206)
	at org.apache.parquet.hadoop.CodecFactory.getCompressor(CodecFactory.java:189)
	at org.apache.parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:153)
	at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:411)
	at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:37)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:161)
```

## How was this patch tested?

Existing unit tests.

Closes #22358 from 10110346/notsupportzstdandbrotil.

Authored-by: liuxian
Signed-off-by: Sean Owen
---
 docs/sql-programming-guide.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index d2e3ee3e77..8ec4865d58 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -965,6 +965,8 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     `parquet.compression` is specified in the table-specific options/properties, the precedence would be
     `compression`, `parquet.compression`, `spark.sql.parquet.compression.codec`.
     Acceptable values include: none, uncompressed, snappy, gzip, lzo, brotli, lz4, zstd.
+    Note that `zstd` requires `ZStandardCodec` to be installed before Hadoop 2.9.0, `brotli` requires
+    `BrotliCodec` to be installed.
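The failure mode documented above can be sketched in a few lines of Scala. Parquet resolves the Hadoop `CompressionCodec` class by name at write time, so a codec value is only usable if that class is on the classpath. The `ParquetCodecCheck` helper below is hypothetical (it is not part of Spark or Parquet); it is a minimal sketch of the same availability check that fails with `BadConfigurationException` in the stack traces above.

```scala
// Hypothetical helper (not part of Spark) sketching why `zstd` and `brotli`
// fail on older Hadoop versions: Parquet's CodecFactory looks up the Hadoop
// codec class by name, and writing fails if the class is missing.
object ParquetCodecCheck {
  // Codecs from the documented list that depend on an extra Hadoop codec class.
  val needsHadoopCodec: Map[String, String] = Map(
    "zstd"   -> "org.apache.hadoop.io.compress.ZStandardCodec", // bundled only from Hadoop 2.9.0
    "brotli" -> "org.apache.hadoop.io.compress.BrotliCodec"     // not bundled with Hadoop
  )

  def isAvailable(codec: String): Boolean =
    needsHadoopCodec.get(codec.toLowerCase) match {
      case None => true // snappy, gzip, lzo, lz4, etc. need no extra class here
      case Some(className) =>
        try { Class.forName(className); true }
        catch { case _: ClassNotFoundException => false }
    }
}
```

On a classpath without the Hadoop zstd/brotli codec classes, `isAvailable("brotli")` returns `false` while codecs with no entry in the map pass through, which mirrors why the docs now warn about these two values rather than listing them unconditionally.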