ffdbbae1d4
### What changes were proposed in this pull request?

This PR allows an external agent to inform the Master that certain hosts are being decommissioned.

### Why are the changes needed?

Currently, decommissioning is triggered when the Worker receives a SIGPWR (possibly sent out of band by some cleanup hook), after which the Worker informs the Master. This approach may not be feasible in environments that cannot trigger a cleanup hook on the Worker. In addition, when a large number of worker nodes are decommissioned at once, the Master receives a flood of messages.

This PR therefore adds a new POST endpoint, `/workers/kill`, to the MasterWebUI. An external agent can use it to inform the Master, in bulk, about all nodes being decommissioned. The nodes are specified as a list of hostnames, and all Workers on those hosts are decommissioned. This API is merely a new entry point into the existing decommissioning logic. It does not change how a decommissioning request is handled at its core.

### Does this PR introduce _any_ user-facing change?

Yes, a new endpoint `/workers/kill` is added to the MasterWebUI. By default, only requests originating from an IP address local to the MasterWebUI are allowed.

### How was this patch tested?

Added unit tests.

Closes #29015 from agrawaldevesh/master_decom_endpoint.

Authored-by: Devesh Agrawal <devesh.agrawal@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
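An external agent could call the `/workers/kill` endpoint described in this PR roughly as follows. This is a minimal sketch using only the Python standard library. It rests on two assumptions not stated in the PR description. First, hostnames are assumed to be sent as repeated form fields named `host`; check `MasterWebUI` before relying on that name. Second, the helper names `build_decommission_request` and `decommission_hosts` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_decommission_request(master_ui_url, hostnames):
    """Build a POST to the MasterWebUI's /workers/kill endpoint (hypothetical helper).

    ASSUMPTION: each hostname is sent as a repeated form field named "host";
    the PR description does not name the field.
    """
    if not hostnames:
        raise ValueError("at least one hostname is required")
    # Form-encode all hostnames into one bulk request, e.g. host=a&host=b.
    body = urlencode([("host", h) for h in hostnames]).encode("ascii")
    url = master_ui_url.rstrip("/") + "/workers/kill"
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def decommission_hosts(master_ui_url, hostnames, timeout=10.0):
    """Send the bulk decommission request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, for example
    if the Master rejects the request.
    """
    request = build_decommission_request(master_ui_url, hostnames)
    with urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, running `decommission_hosts("http://localhost:8080", ["worker-1", "worker-2"])` on the Master's own machine would ask the Master to decommission every Worker on those two hosts. The address and hostnames here are placeholders. Sending all hosts in one request reflects the PR's motivation of avoiding a flood of per-Worker messages. Because only local requests are allowed by default, an agent on another machine should expect to be rejected unless that policy is changed.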
| Name |
|---|
| benchmarks |
| src |
| pom.xml |