[SPARK-6343] Doc driver-worker network reqs
Attempt at making the driver-worker networking requirement more explicit and up-front in the documentation (see https://issues.apache.org/jira/browse/SPARK-6343).

Update cluster overview diagram to show connections from workers to driver. Add a bullet below about how driver listens / accepts connections from workers.

Author: Peter Parente <pparent@us.ibm.com>

Closes #5382 from parente/SPARK-6343 and squashes the following commits:

0b2fb9d [Peter Parente] [SPARK-6343] Doc driver-worker network reqs
parent 2fe0a1aaee
commit b9c51c0493
@@ -33,7 +33,11 @@ There are several useful things to note about this architecture:
 2. Spark is agnostic to the underlying cluster manager. As long as it can acquire executor
 processes, and these communicate with each other, it is relatively easy to run it even on a
 cluster manager that also supports other applications (e.g. Mesos/YARN).
-3. Because the driver schedules tasks on the cluster, it should be run close to the worker
+3. The driver program must listen for and accept incoming connections from its executors throughout
+its lifetime (e.g., see [spark.driver.port and spark.fileserver.port in the network config
+section](configuration.html#networking)). As such, the driver program must be network
+addressable from the worker nodes.
+4. Because the driver schedules tasks on the cluster, it should be run close to the worker
 nodes, preferably on the same local area network. If you'd like to send requests to the
 cluster remotely, it's better to open an RPC to the driver and have it submit operations
 from nearby than to run a driver far away from the worker nodes.
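The new bullet 3 requires the driver to be network addressable from the worker nodes. One way to sanity-check that requirement is a plain TCP probe run from a worker host against the driver's address. The following is a minimal sketch using only the Python standard library, not part of Spark or of this commit; the function name `driver_reachable` is hypothetical, and it assumes the driver's port has been pinned to a known value (for example through `spark.driver.port`) rather than left to be chosen at startup.

```python
import socket


def driver_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds.

    Hypothetical helper for illustration: `host` is the driver's address as seen
    from a worker node, and `port` is assumed to be the value the driver was
    configured to listen on (e.g. via spark.driver.port).
    """
    try:
        # create_connection resolves the host name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling `driver_reachable(host, port)` from each worker node, with the driver's host name and configured port, gives a quick yes/no answer. A `False` result only shows that the TCP connection failed; it does not say whether the cause is a firewall, NAT, DNS, or a driver that is not running.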
Binary file not shown.
Size before: 27 KiB. Size after: 33 KiB.
Binary file not shown.