spark-instrumented-optimizer


..
src	[SPARK-19803][CORE][TEST] Proactive replication test failures	2017-03-28 09:47:29 +08:00
pom.xml	[SPARK-17807][CORE] split test-tags into test-JAR	2016-12-21 16:37:20 -08:00

Shubham Chopra a250933c62 [SPARK-19803][CORE][TEST] Proactive replication test failures

## What changes were proposed in this pull request?
Executors cache a list of their peers, which is refreshed every minute by default. Stale cached references were randomly being used as replication targets. Because those executors had already been removed from the master, they did not appear in the block locations reported by the master. This was fixed by:
1. Refreshing the peer cache in the block manager before trying to proactively replicate. This minimizes the probability of replicating to a failed executor.
2. Explicitly stopping the block manager in the tests. This shuts down the RPC endpoint used by the block manager. This way, even if a block manager tries to replicate using a stale reference, the attempt fails and the replication logic takes care of refreshing the list of peers after the failure.
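
The two fixes above can be illustrated with a small toy model. This is only a sketch of the idea, not Spark's `BlockManager` code: `PeerCache`, `replicate_proactively`, the `send` callback, and the executor names are hypothetical, and the 60-second TTL simply mirrors the "every minute" default mentioned above.

```python
import time
from typing import Callable, List, Optional, Set


class PeerCache:
    """Toy stand-in for an executor's cached peer list (hypothetical, not Spark's API)."""

    def __init__(self, fetch_peers: Callable[[], List[str]],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_peers = fetch_peers      # asks the "master" for live peers
        self._ttl = ttl_seconds
        self._clock = clock
        self._peers: List[str] = []
        self._last_fetch: Optional[float] = None

    def get(self, force_refresh: bool = False) -> List[str]:
        now = self._clock()
        expired = self._last_fetch is None or now - self._last_fetch >= self._ttl
        if force_refresh or expired:
            self._peers = list(self._fetch_peers())
            self._last_fetch = now
        return list(self._peers)


def replicate_proactively(cache: PeerCache,
                          send: Callable[[str], bool],
                          replicas_needed: int) -> List[str]:
    """Pick replication targets, refreshing the peer list first and after each failure."""
    done: List[str] = []
    failed: Set[str] = set()
    # Fix 1: never start from a possibly stale cached list.
    candidates = cache.get(force_refresh=True)
    while len(done) < replicas_needed:
        remaining = [p for p in candidates if p not in failed and p not in done]
        if not remaining:
            break                            # not enough live peers
        target = remaining[0]
        if send(target):
            done.append(target)
        else:
            # Fix 2 relies on this path: a stopped peer rejects the send,
            # so the list is refreshed before the next attempt.
            failed.add(target)
            candidates = cache.get(force_refresh=True)
    return done


# Demo: exec-1 is removed from the master after the cache was warmed.
live = ["exec-1", "exec-2", "exec-3"]
cache = PeerCache(lambda: list(live))
cache.get()                                  # warm the cache
live.remove("exec-1")
stale = cache.get()                          # cached list still names exec-1
targets = replicate_proactively(cache, lambda peer: peer in live, 2)
print(stale)    # ['exec-1', 'exec-2', 'exec-3']
print(targets)  # ['exec-2', 'exec-3']
```

In this model the forced refresh keeps the removed executor out of the candidate list, while the failure branch covers the case where a peer is still listed but its endpoint has been shut down.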

## How was this patch tested?
Tested manually

Author: Shubham Chopra <schopra31@bloomberg.net>
Author: Kay Ousterhout <kayousterhout@gmail.com>
Author: Shubham Chopra <shubhamchopra@users.noreply.github.com>

Closes #17325 from shubhamchopra/SPARK-19803.

2017-03-28 09:47:29 +08:00
