spark-instrumented-optimizer/core
Patrick Wendell 35886f3474 Merge pull request #41 from pwendell/shuffle-benchmark
Provide Instrumentation for Shuffle Write Performance

Shuffle write performance can have a major impact on the performance of jobs. This patch adds a few pieces of instrumentation related to shuffle writes. They are:

1. A listing of the time spent performing blocking writes for each task. This is implemented by keeping track of the aggregate delay seen by many individual writes.
2. An undocumented option `spark.shuffle.sync` which forces shuffle data to sync to disk. This is necessary for measuring shuffle performance in the absence of the OS buffer cache.
3. An internal utility which micro-benchmarks write throughput for simulated shuffle outputs.

I'm going to do some performance testing on this to see whether these small timing calls add overhead. From a feature perspective, however, I consider this complete. Any feedback is appreciated.
2013-10-20 22:20:32 -07:00
..
src Merge pull request #41 from pwendell/shuffle-benchmark 2013-10-20 22:20:32 -07:00
pom.xml Add a zookeeper compile dependency to fix build in maven 2013-10-11 16:31:47 +08:00