[SPARK-35399][DOCUMENTATION] State is still needed in the event of executor failure
### What changes were proposed in this pull request?

Fix the incorrect statement that state is no longer needed in the event of executor failure, and document that it is needed in the case of a flaky app causing occasional executor failure. SO [discussion](https://stackoverflow.com/questions/67466878/can-spark-with-external-shuffle-service-use-saved-shuffle-files-in-the-event-of/67507439#67507439).

### Why are the changes needed?

To fix the documentation and guide users as to an additional use case for the Shuffle Service.

### Does this PR introduce _any_ user-facing change?

Documentation only.

### How was this patch tested?

N/A.

Closes #32538 from chrisheaththomas/shuffle-service-and-executor-failure.

Authored-by: Chris Thomas <chrisheaththomas@hotmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
Commit ceb8122c40 (parent b4348b7e56)
```diff
@@ -943,8 +943,8 @@ Apart from these, the following properties are also available, and may be useful
       <td>false</td>
       <td>
         Enables the external shuffle service. This service preserves the shuffle files written by
-        executors so the executors can be safely removed. The external shuffle service
-        must be set up in order to enable it. See
+        executors e.g. so that executors can be safely removed, or so that shuffle fetches can continue in
+        the event of executor failure. The external shuffle service must be set up in order to enable it. See
         <a href="job-scheduling.html#configuration-and-setup">dynamic allocation
         configuration and setup documentation</a> for more information.
       </td>
```
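The property documented in the hunk above is set like any other Spark configuration entry. A minimal sketch of a `spark-defaults.conf` fragment (the key names are Spark's real ones; the pairing with dynamic allocation is one common use, not the only one, and the values here are illustrative):

```properties
# spark-defaults.conf (illustrative fragment)
# Keep shuffle files available even after the executor that wrote
# them is removed (dynamic allocation) or fails.
spark.shuffle.service.enabled      true
# Often enabled together with the shuffle service so idle executors
# can be released without losing their shuffle output.
spark.dynamicAllocation.enabled    true
```

Note that setting the property alone is not sufficient: as the doc text says, the external shuffle service itself must also be set up on each node, which is cluster-manager specific.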
```diff
@@ -142,13 +142,12 @@ an executor should not be idle if there are still pending tasks to be scheduled.

 ### Graceful Decommission of Executors

-Before dynamic allocation, a Spark executor exits either on failure or when the associated
-application has also exited. In both scenarios, all state associated with the executor is no
-longer needed and can be safely discarded. With dynamic allocation, however, the application
-is still running when an executor is explicitly removed. If the application attempts to access
-state stored in or written by the executor, it will have to perform a recompute the state. Thus,
-Spark needs a mechanism to decommission an executor gracefully by preserving its state before
-removing it.
+Before dynamic allocation, if a Spark executor exits when the associated application has also exited
+then all state associated with the executor is no longer needed and can be safely discarded.
+With dynamic allocation, however, the application is still running when an executor is explicitly
+removed. If the application attempts to access state stored in or written by the executor, it will
+have to perform a recompute the state. Thus, Spark needs a mechanism to decommission an executor
+gracefully by preserving its state before removing it.

 This requirement is especially important for shuffles. During a shuffle, the Spark executor first
 writes its own map outputs locally to disk, and then acts as the server for those files when other
```
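The `job-scheduling.html#configuration-and-setup` page referenced in the first hunk covers the actual service setup. On YARN, for example, the external shuffle service runs as a NodeManager auxiliary service; a sketch of that registration in `yarn-site.xml`, per the Spark-on-YARN documentation (the service name `spark_shuffle` and class name are Spark's documented values):

```xml
<!-- yarn-site.xml fragment: register Spark's shuffle service as a
     NodeManager auxiliary service so it survives executor exits. -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
```

Standalone mode is simpler: the Worker process can host the shuffle service itself when `spark.shuffle.service.enabled` is set.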