spark-instrumented-optimizer

History

sandeep katta ba04562234 [SPARK-29435][CORE] MapId in Shuffle Block is inconsistent at the writer and reader part when spark.shuffle.useOldFetchProtocol=true ### What changes were proposed in this pull request? Shuffle Block Construction during Shuffle Write and Read is wrong Shuffle Map Task (Shuffle Write) 19/10/11 22:07:32\| ERROR\| [Executor task launch worker for task 3] org.apache.spark.shuffle.IndexShuffleBlockResolver: ####### For Debug ############ /tmp/hadoop-root1/nm-local-dir/usercache/root1/appcache/application_1570422377362_0008/blockmgr-6d03250d-6e7c-4bc2-bbb7-22b8e3981c35/0d/shuffle_0_3_0.index Result Task (Shuffle Read) 19/10/11 22:07:32\| ERROR\| [Executor task launch worker for task 6] org.apache.spark.storage.ShuffleBlockFetcherIterator: Error occurred while fetching local blocks java.nio.file.NoSuchFileException: /tmp/hadoop-root1/nm-local-dir/usercache/root1/appcache/application_1570422377362_0008/blockmgr-6d03250d-6e7c-4bc2-bbb7-22b8e3981c35/30/shuffle_0_0_0.index at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) As per [SPARK-25341](https://issues.apache.org/jira/browse/SPARK-25341) `mapId` of `SortShuffleManager.getWriter `changed to `context.taskAttemptId() ` from `partitionId` [code]( https://github.com/apache/spark/pull/25620/files#diff-363c53ca5a72cfdc37dac4a723309638R54) But `MapOutputTracker.convertMapStatuses` returns the wrong ShuffleBlock, if `spark.shuffle.useOldFetchProtocol `enabled, it returns `paritionId` as `mapID` which is wrong . [Code](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/MapOutputTracker.scala#L912) ### Why are the changes needed? Already MapStatus returned by the ShuffleWriter has the mapId for e.g.[ code here](https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java#L134). So it's nice to use `status.mapTaskId` ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing UT and manually tested with `spark.shuffle.useOldFetchProtocol` as true and false ![image](https://user-images.githubusercontent.com/35216143/66716530-4f4caa80-edec-11e9-833d-7131a9fbd442.png) Closes #26095 from sandeep-katta/shuffleIssue. Lead-authored-by: sandeep katta <sandeep.katta2007@gmail.com> Co-authored-by: Yuanjian Li <xyliyuanjian@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>		2019-10-15 00:03:13 +08:00
..
benchmarks	[SPARK-29297][TESTS] Compare `core`/`mllib` module benchmarks in JDK8/11	2019-09-29 21:43:58 -07:00
src	[SPARK-29435][CORE] MapId in Shuffle Block is inconsistent at the writer and reader part when spark.shuffle.useOldFetchProtocol=true	2019-10-15 00:03:13 +08:00
pom.xml	[SPARK-29296][BUILD][CORE] Remove use of .par to make 2.13 support easier; add scala-2.13 profile to enable pulling in par collections library separately, for the future	2019-10-03 08:56:08 -05:00