spark-instrumented-optimizer/core
Dongjoon Hyun a908b67502 [SPARK-33218][CORE] Update misleading log messages for removed shuffle blocks
### What changes were proposed in this pull request?

This updates the misleading log messages for removed shuffle block during migration.

### Why are the changes needed?

1. For the deleted shuffle blocks, `IndexShuffleBlockResolver` shows users WARN message saying `skipping migration`. However, `BlockManagerDecommissioner` shows users INFO message including `Migrated ShuffleBlockInfo(...)` inconsistently. Technically, we didn't migrated. We should not show `Migrated` message in this case.

```
INFO BlockManagerDecommissioner: Trying to migrate shuffle ShuffleBlockInfo(109,18924) to BlockManagerId(...) (2 / 3)
WARN IndexShuffleBlockResolver: Failed to resolve shuffle block ShuffleBlockInfo(109,18924), skipping migration. This is expected to occur if a block is removed after decommissioning has started.
INFO BlockManagerDecommissioner: Got migration sub-blocks List()
...
INFO BlockManagerDecommissioner: Migrated ShuffleBlockInfo(109,18924) to BlockManagerId(...)
```

2. In addition, if the shuffle file is deleted while the information is in the queue, the above messages are repeated multiple times, `spark.storage.decommission.maxReplicationFailuresPerBlock`. We had better use one line instead of the group of messages for that case.
```
INFO BlockManagerDecommissioner: Trying to migrate shuffle ShuffleBlockInfo(109,18924) to BlockManagerId(...) (0 / 3)
...
INFO BlockManagerDecommissioner: Trying to migrate shuffle ShuffleBlockInfo(109,18924) to BlockManagerId(...) (1 / 3)
...
INFO BlockManagerDecommissioner: Trying to migrate shuffle ShuffleBlockInfo(109,18924) to BlockManagerId(...) (2 / 3)
```

3. Skipping or not is a role of `BlockManagerDecommissioner` class. `IndexShuffleBlockResolver.getMigrationBlocks` is used twice differently like the following. We had better inform users at `BlockManagerDecommissioner` once.
    - At the beginning, to get the sub-blocks.
    - In case of `IOException`, to determine whether ignoring it or re-throwing. And, `BlockManagerDecommissioner` shows WARN message (`Skipping block ...`) again.

### Does this PR introduce _any_ user-facing change?

No. This is an update for log message info to be consistent.

### How was this patch tested?

Manually.

Closes #30129 from dongjoon-hyun/SPARK-33218.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-10-22 01:10:24 -07:00
..
benchmarks [SPARK-32437][CORE] Improve MapStatus deserialization speed with RoaringBitmap 0.9.0 2020-07-25 08:07:28 -07:00
src [SPARK-33218][CORE] Update misleading log messages for removed shuffle blocks 2020-10-22 01:10:24 -07:00
pom.xml [SPARK-33212][BUILD] Move to shaded clients for Hadoop 3.x profile 2020-10-22 03:21:34 +00:00