spark-instrumented-optimizer/core/src/main
Andrew Or a46b8f2d71 [SPARK-3277] Fix external spilling with LZ4 assertion error
**Summary of the changes**

The bulk of this PR consists of tests and documentation; the actual fix is a single added line of code (see `BlockObjectWriter.scala`). Until now, the `External*` test suites were not run with different compression codecs; doing so would have caught the bug reported in [SPARK-3277](https://issues.apache.org/jira/browse/SPARK-3277). This PR extends the existing tests to exercise spilling with every compression codec known to Spark, including `LZ4`.

**The bug itself**

In `DiskBlockObjectWriter`, we record the shuffle bytes written only before closing the streams, never after. With `LZ4`, our metrics reported 0 bytes written because `flush()` was, for reasons not yet identified, not taking effect. In general, a compression codec may write additional bytes to the file once we call `close()`, so we must also capture those bytes in our shuffle write metrics.
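
The effect described above can be reproduced outside Spark. The following is a minimal sketch, not the code from this PR: it assumes the JDK's `GZIPOutputStream` as a stand-in for a Spark compression codec and a `ByteArrayOutputStream` as a stand-in for the spill file. The class and method names (`CloseThenMeasure`, `measure`) are hypothetical and exist only for illustration. It measures the sink size once after `flush()` and once after `close()`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration class; not part of Spark.
class CloseThenMeasure {

    /** Returns { bytes in the sink after flush(), bytes in the sink after close() }. */
    static long[] measure(byte[] payload) throws IOException {
        // Stand-in for the spill file on disk.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Stand-in for a compression codec; syncFlush defaults to false,
        // so flush() does not force out the compressor's pending data.
        GZIPOutputStream out = new GZIPOutputStream(sink);

        out.write(payload);
        out.flush();
        long afterFlush = sink.size();   // what a "measure before close" metric would see

        out.close();                     // finishes the stream and writes the remaining bytes
        long afterClose = sink.size();   // the size that actually ends up in the file

        return new long[] { afterFlush, afterClose };
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[1500];
        Arrays.fill(payload, (byte) 'x');
        long[] sizes = measure(payload);
        System.out.println("after flush: " + sizes[0] + ", after close: " + sizes[1]);
    }
}
```

The second number is always larger than the first here, because GZIP writes at least its trailer during `close()`. A metric sampled only before `close()` therefore under-reports the file size, which is the same class of problem this PR addresses for shuffle write metrics.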

Thanks mridulm and pwendell for help with debugging.

Author: Andrew Or <andrewor14@gmail.com>
Author: Patrick Wendell <pwendell@gmail.com>

Closes #2187 from andrewor14/fix-lz4-spilling and squashes the following commits:

1b54bdc [Andrew Or] Speed up tests by not compressing everything
1c4624e [Andrew Or] Merge branch 'master' of github.com:apache/spark into fix-lz4-spilling
6b2e7d1 [Andrew Or] Fix compilation error
92e251b [Patrick Wendell] Better documentation for BlockObjectWriter.
a1ad536 [Andrew Or] Fix tests
089593f [Andrew Or] Actually fix SPARK-3277 (tests still fail)
4bbcf68 [Andrew Or] Update tests to actually test all compression codecs
b264a84 [Andrew Or] ExternalAppendOnlyMapSuite code style fixes (minor)
1bfa743 [Andrew Or] Add more information to assert for better debugging
2014-08-28 17:05:21 -07:00

| Name | Last commit | Date |
| --- | --- | --- |
| java/org/apache/spark | [SPARK-2936] Migrate Netty network module from Java to Scala | 2014-08-10 20:36:54 -07:00 |
| resources/org/apache/spark | [SPARK-2976] Replace tabs with spaces | 2014-08-25 19:40:23 -07:00 |
| scala/org/apache | [SPARK-3277] Fix external spilling with LZ4 assertion error | 2014-08-28 17:05:21 -07:00 |