38ef4771d4
### What changes were proposed in this pull request?

This is one of the patches for SPIP SPARK-30602 for push-based shuffle. Summary of changes:

- Introduce `MergeStatus`, which tracks the partition-level metadata for a merged shuffle partition in the Spark driver.
- Unify `MergeStatus` and `MapStatus` under a single trait to allow code reuse inside `MapOutputTracker`.
- Extend `MapOutputTracker` to:
  - support registering and unregistering `MergeStatus`;
  - calculate preferred locations for a shuffle, taking merged shuffle partitions into consideration;
  - serve reducer requests for block-fetch locations that include merged shuffle partitions.

The APIs added to `MapOutputTracker` will be used by `DAGScheduler` in SPARK-32920 and by `ShuffleBlockFetcherIterator` in SPARK-32922.

### Why are the changes needed?

Refer to SPARK-30602.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added unit tests.

Lead-authored-by: Min Shen <mshen@linkedin.com>
Co-authored-by: Chandni Singh <chsingh@linkedin.com>
Co-authored-by: Venkata Sowrirajan <vsowrirajan@linkedin.com>

Closes #30480 from Victsm/SPARK-32921.

Lead-authored-by: Venkata krishnan Sowrirajan <vsowrirajan@linkedin.com>
Co-authored-by: Min Shen <mshen@linkedin.com>
Co-authored-by: Chandni Singh <singh.chandni@gmail.com>
Co-authored-by: Chandni Singh <chsingh@linkedin.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
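The Scala sketch here illustrates the ideas in the commit summary: a shared parent trait for map and merge statuses, reducer fetch resolution that mixes merged and unmerged blocks, and merge-aware preferred locations. It is a minimal, self-contained illustration under stated assumptions, not Spark's actual code. Every name in it is hypothetical, including `ShuffleOutputStatusSketch`, `MapStatusSketch`, `MergeStatusSketch`, `fetchRequests`, `preferredLocations` and the `fraction` threshold.

The sketch makes three further assumptions:

- A merge status records the merger's location, the set of map indices merged into the block, and the block's total size.
- A `scala.collection.immutable.BitSet` stands in for whatever structure the real code uses to track merged map indices.
- Preferred locations follow a simple threshold rule.

```scala
import scala.collection.immutable.BitSet

// Hypothetical, simplified stand-ins for illustration only; these are NOT
// Spark's real MapStatus / MergeStatus / MapOutputTracker classes.
final case class Location(host: String, port: Int)

// Shared parent so tracker code can handle both kinds of status uniformly.
sealed trait ShuffleOutputStatusSketch {
  def location: Location
}

// Output of one map task: one block size per reduce partition.
final case class MapStatusSketch(location: Location, sizes: Vector[Long])
    extends ShuffleOutputStatusSketch

// Assumed metadata for one merged reduce partition: where the merged block
// lives, which map indices were merged into it, and its total size.
final case class MergeStatusSketch(location: Location, mergedMaps: BitSet, totalSize: Long)
    extends ShuffleOutputStatusSketch {
  // Map indices whose blocks were not merged and must be fetched directly.
  def missingMaps(numMaps: Int): Seq[Int] =
    (0 until numMaps).filterNot(mergedMaps.contains)
}

// What a reducer would fetch, and from where.
sealed trait FetchRequest
final case class MergedBlock(location: Location, reduceId: Int, size: Long) extends FetchRequest
final case class MapBlock(location: Location, mapIndex: Int, reduceId: Int, size: Long)
    extends FetchRequest

object MergedShuffleSketch {
  // Fetch the merged block if one exists, then fall back to the original
  // map outputs for every map index it does not cover (skipping empty blocks).
  def fetchRequests(
      reduceId: Int,
      mapStatuses: IndexedSeq[MapStatusSketch],
      mergeStatus: Option[MergeStatusSketch]): Seq[FetchRequest] = {
    val merged: Seq[FetchRequest] =
      mergeStatus.map(ms => MergedBlock(ms.location, reduceId, ms.totalSize)).toList
    val unmergedIndices: Seq[Int] =
      mergeStatus.map(_.missingMaps(mapStatuses.length)).getOrElse(mapStatuses.indices)
    val unmerged: Seq[FetchRequest] = unmergedIndices
      .map(i => MapBlock(mapStatuses(i).location, i, reduceId, mapStatuses(i).sizes(reduceId)))
      .filter(_.size > 0)
    merged ++ unmerged
  }

  // Assumed threshold rule: prefer the merger's host when the merged block
  // covers at least `fraction` of the map outputs; otherwise no preference.
  def preferredLocations(
      numMaps: Int,
      mergeStatus: Option[MergeStatusSketch],
      fraction: Double): Seq[String] =
    mergeStatus match {
      case Some(ms) if numMaps > 0 && ms.mergedMaps.size.toDouble / numMaps >= fraction =>
        Seq(ms.location.host)
      case _ => Seq.empty
    }
}
```

Consider three map tasks where map indices 0 and 1 were merged for reduce partition 0. In that case `fetchRequests` returns one merged block from the merger plus map 2's block fetched directly. With no merge status, it falls back to fetching every non-empty map output. Placing both status types under one trait is what lets a single tracker store and query them through shared code paths.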
| Name |
|---|
| benchmarks |
| src |
| pom.xml |