[SPARK-6343] Doc driver-worker network reqs

Attempt at making the driver-worker networking requirement more explicit and up-front in the documentation (see https://issues.apache.org/jira/browse/SPARK-6343).

Update the cluster overview diagram to show connections from the workers to the driver. Add a bullet below the diagram explaining that the driver listens for and accepts connections from the workers.

Author: Peter Parente <pparent@us.ibm.com>

Closes #5382 from parente/SPARK-6343 and squashes the following commits:

0b2fb9d [Peter Parente] [SPARK-6343] Doc driver-worker network reqs
Peter Parente 2015-04-09 06:37:20 -04:00 committed by Sean Owen
parent 2fe0a1aaee
commit b9c51c0493
3 changed files with 5 additions and 1 deletions


@@ -33,7 +33,11 @@ There are several useful things to note about this architecture:
 2. Spark is agnostic to the underlying cluster manager. As long as it can acquire executor
 processes, and these communicate with each other, it is relatively easy to run it even on a
 cluster manager that also supports other applications (e.g. Mesos/YARN).
-3. Because the driver schedules tasks on the cluster, it should be run close to the worker
+3. The driver program must listen for and accept incoming connections from its executors throughout
+its lifetime (e.g., see [spark.driver.port and spark.fileserver.port in the network config
+section](configuration.html#networking)). As such, the driver program must be network
+addressable from the worker nodes.
+4. Because the driver schedules tasks on the cluster, it should be run close to the worker
 nodes, preferably on the same local area network. If you'd like to send requests to the
 cluster remotely, it's better to open an RPC to the driver and have it submit operations
 from nearby than to run a driver far away from the worker nodes.
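
The requirement that the driver be network addressable from the worker nodes can be checked before submitting a job. The following is a minimal sketch, not part of this commit: it assumes `spark.driver.port` has been pinned to a fixed value (it is chosen at random by default), and the helper name `driver_reachable` is hypothetical. It only verifies that a TCP connection can be opened from the machine it runs on.

```python
import socket


def driver_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: `host` is the driver's address and `port` is
    assumed to be the value configured as spark.driver.port.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Run from a worker node, a call such as `driver_reachable("driver.example.internal", 51000)` (both values are placeholders) returning `False` would suggest that a firewall, NAT, or a wrong address stands between the workers and the driver.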

Binary file not shown (image: 27 KiB before, 33 KiB after).

Binary file not shown.