spark-instrumented-optimizer/licenses
Sital Kedia 444bce1c98 [SPARK-19112][CORE] Support for ZStandard codec
## What changes were proposed in this pull request?

Using zstd compression for Spark jobs that spill hundreds of TBs of data, we could reduce the amount of data written to disk by as much as 50%. This translates to a significant latency gain because of fewer disk I/O operations. CPU time degrades by 2-5% because of zstd compression overhead, but for jobs that are bottlenecked by disk I/O, this hit is acceptable.
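
For illustration, a minimal sketch of how a job might opt in to the codec, assuming a Spark build that includes this change. `spark.io.compression.codec` is the existing setting that selects the codec for shuffle and spill data; the `ZstdSpillConfig` object and its `build` method are hypothetical names used only for this example and are not part of the patch.

```scala
import org.apache.spark.SparkConf

// Hypothetical helper for illustration; not part of Spark or of this patch.
object ZstdSpillConfig {
  def build(): SparkConf = {
    // Passing false skips loading spark.* system properties, so the
    // resulting configuration contains only what is set here.
    val conf = new SparkConf(false)
    // Select zstd for internal data such as shuffle outputs and spills.
    conf.set("spark.io.compression.codec", "zstd")
    // Both settings default to true; they are listed to make explicit
    // which outputs are compressed with the chosen codec.
    conf.set("spark.shuffle.compress", "true")
    conf.set("spark.shuffle.spill.compress", "true")
    conf
  }

  def main(args: Array[String]): Unit = {
    // Print the resulting key/value pairs for inspection.
    build().getAll.foreach { case (k, v) => println(s"$k=$v") }
  }
}
```

The same setting can be passed on the command line as `--conf spark.io.compression.codec=zstd`. Whether the trade-off pays off depends on the job being bound by disk I/O rather than CPU, as described above.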

## Benchmark
Please note that this benchmark uses a real-world, compute-heavy production workload spilling TBs of data to disk.

| Metric | zstd performance compared to LZ4 |
| ------------- | -----:|
| spill/shuffle bytes | -48% |
| cpu time | +3% |
| cpu reservation time | -40% |
| latency | -40% |

## How was this patch tested?

Tested by running a few jobs that spill a large amount of data on the cluster; the amount of intermediate data written to disk was reduced by as much as 50%.

Author: Sital Kedia <skedia@fb.com>

Closes #18805 from sitalkedia/skedia/upstream_zstd.
2017-11-01 14:54:08 +01:00
..
LICENSE-AnchorJS.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-antlr.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-boto.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-cloudpickle.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-d3.min.js.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-dagre-d3.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-DPark.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-f2j.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-graphlib-dot.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-heapq.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-javolution.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-jbcrypt.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-jline.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-jpmml-model.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-jquery.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-junit-interface.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-kryo.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-minlog.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-Mockito.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-modernizr.txt [MINOR][BUILD] Add modernizr MIT license; specify "2014 and onwards" in license copyright 2016-06-04 21:41:27 +01:00
LICENSE-netlib.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-paranamer.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-postgresql.txt [SPARK-14050][ML] Add multiple languages support and additional methods for Stop Words Remover 2016-05-06 13:58:12 -07:00
LICENSE-protobuf.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-py4j.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-pyrolite.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-reflectasm.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-sbt-launch-lib.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-scala.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-scalacheck.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-scopt.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-slf4j.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-SnapTree.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-sorttable.js.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-spire.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-xmlenc.txt [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE 2015-09-28 22:56:43 -04:00
LICENSE-zstd-jni.txt [SPARK-19112][CORE] Support for ZStandard codec 2017-11-01 14:54:08 +01:00
LICENSE-zstd.txt [SPARK-19112][CORE] Support for ZStandard codec 2017-11-01 14:54:08 +01:00