[SPARK-23104][K8S][DOCS] Changes to Kubernetes scheduler documentation

## What changes were proposed in this pull request?

Docs changes:
- Adding a warning that the backend is experimental.
- Removing a defunct internal-only option from documentation
- Clarifying that node selectors can be used right away, and other minor cosmetic changes

## How was this patch tested?

Docs-only change.

Author: foxish <ramanathana@google.com>

Closes #20314 from foxish/ambiguous-docs.
foxish 2018-01-19 10:23:13 -08:00 committed by Marcelo Vanzin
parent d8aaa771e2
commit 73d3b230f3
2 changed files with 22 additions and 25 deletions


@@ -52,8 +52,8 @@ The system currently supports three cluster managers:
 * [Apache Mesos](running-on-mesos.html) -- a general cluster manager that can also run Hadoop MapReduce
   and service applications.
 * [Hadoop YARN](running-on-yarn.html) -- the resource manager in Hadoop 2.
-* [Kubernetes](running-on-kubernetes.html) -- [Kubernetes](https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/)
-is an open-source platform that provides container-centric infrastructure.
+* [Kubernetes](running-on-kubernetes.html) -- an open-source system for automating deployment, scaling,
+and management of containerized applications.
 A third-party project (not supported by the Spark project) exists to add support for
 [Nomad](https://github.com/hashicorp/nomad-spark) as a cluster manager.


@@ -8,6 +8,10 @@ title: Running Spark on Kubernetes
 Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). This feature makes use of native
 Kubernetes scheduler that has been added to Spark.
+**The Kubernetes scheduler is currently experimental.
+In future versions, there may be behavioral changes around configuration,
+container images and entrypoints.**
# Prerequisites
* A runnable distribution of Spark 2.3 or above.
@@ -41,11 +45,10 @@ logs and remains in "completed" state in the Kubernetes API until it's eventuall
 Note that in the completed state, the driver pod does *not* use any computational or memory resources.
-The driver and executor pod scheduling is handled by Kubernetes. It will be possible to affect Kubernetes scheduling
-decisions for driver and executor pods using advanced primitives like
-[node selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
-and [node/pod affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
-in a future release.
+The driver and executor pod scheduling is handled by Kubernetes. It is possible to schedule the
+driver and executor pods on a subset of available nodes through a [node selector](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+using the configuration property for it. It will be possible to use more advanced
+scheduling hints like [node/pod affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity) in a future release.
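For reviewers, the node-selector behavior the new wording refers to can be sketched with the `spark.kubernetes.node.selector.[labelKey]` configuration property; the node name, apiserver address, and image below are placeholders, not values from this patch:

```shell
# Label a node, then ask Spark to schedule driver and executor pods
# only on nodes carrying that label. All <...> values are placeholders.
$ kubectl label nodes <node-name> disktype=ssd
$ ./bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.kubernetes.container.image=<spark-image> \
    --conf spark.kubernetes.node.selector.disktype=ssd \
    local:///opt/spark/examples/jars/spark-examples.jar
```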
# Submitting Applications to Kubernetes
@@ -62,8 +65,10 @@ use with the Kubernetes backend.
 Example usage is:
-./bin/docker-image-tool.sh -r <repo> -t my-tag build
-./bin/docker-image-tool.sh -r <repo> -t my-tag push
+```bash
+$ ./bin/docker-image-tool.sh -r <repo> -t my-tag build
+$ ./bin/docker-image-tool.sh -r <repo> -t my-tag push
+```
## Cluster Mode
@@ -94,7 +99,7 @@ must consist of lower case alphanumeric characters, `-`, and `.` and must start
 If you have a Kubernetes cluster setup, one way to discover the apiserver URL is by executing `kubectl cluster-info`.
 ```bash
-kubectl cluster-info
+$ kubectl cluster-info
Kubernetes master is running at http://127.0.0.1:6443
```
@@ -105,7 +110,7 @@ authenticating proxy, `kubectl proxy` to communicate to the Kubernetes API.
 The local proxy can be started by:
 ```bash
-kubectl proxy
+$ kubectl proxy
```
If the local proxy is running at localhost:8001, `--master k8s://http://127.0.0.1:8001` can be used as the argument to
@@ -173,7 +178,7 @@ Logs can be accessed using the Kubernetes API and the `kubectl` CLI. When a Spar
 to stream logs from the application using:
 ```bash
-kubectl -n=<namespace> logs -f <driver-pod-name>
+$ kubectl -n=<namespace> logs -f <driver-pod-name>
```
The same logs can also be accessed through the
@@ -186,7 +191,7 @@ The UI associated with any application can be accessed locally using
 [`kubectl port-forward`](https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/#forward-a-local-port-to-a-port-on-the-pod).
 ```bash
-kubectl port-forward <driver-pod-name> 4040:4040
+$ kubectl port-forward <driver-pod-name> 4040:4040
```
Then, the Spark driver UI can be accessed on `http://localhost:4040`.
@@ -200,13 +205,13 @@ are errors during the running of the application, often, the best way to investi
 To get some basic information about the scheduling decisions made around the driver pod, you can run:
 ```bash
-kubectl describe pod <spark-driver-pod>
+$ kubectl describe pod <spark-driver-pod>
 ```
 If the pod has encountered a runtime error, the status can be probed further using:
 ```bash
-kubectl logs <spark-driver-pod>
+$ kubectl logs <spark-driver-pod>
```
Status and logs of failed executor pods can be checked in similar ways. Finally, deleting the driver pod will clean up the entire spark
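The cleanup step described above (deleting the driver pod tears down the application) can be illustrated briefly; the pod name is a placeholder:

```shell
# Removing the driver pod also cleans up the executors it owns.
# <spark-driver-pod> is a placeholder for the actual pod name.
$ kubectl delete pod <spark-driver-pod>
```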
@@ -254,7 +259,7 @@ To create a custom service account, a user can use the `kubectl create serviceac
 following command creates a service account named `spark`:
 ```bash
-kubectl create serviceaccount spark
+$ kubectl create serviceaccount spark
```
To grant a service account a `Role` or `ClusterRole`, a `RoleBinding` or `ClusterRoleBinding` is needed. To create
@@ -263,7 +268,7 @@ for `ClusterRoleBinding`) command. For example, the following command creates an
 namespace and grants it to the `spark` service account created above:
 ```bash
-kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
+$ kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
```
Note that a `Role` can only be used to grant access to resources (like pods) within a single namespace, whereas a
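To connect the RBAC steps above to a submission, the `spark` service account is referenced through the `spark.kubernetes.authenticate.driver.serviceAccountName` property; a sketch in which the apiserver address and image are placeholders:

```shell
# Run the driver pod under the `spark` service account created above.
# <k8s-apiserver-host>, <k8s-apiserver-port>, and <spark-image> are placeholders.
$ ./bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.kubernetes.container.image=<spark-image> \
    local:///opt/spark/examples/jars/spark-examples.jar
```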
@@ -543,14 +548,6 @@ specific to Spark on Kubernetes.
   to avoid name conflicts.
   </td>
 </tr>
-<tr>
-  <td><code>spark.kubernetes.executor.podNamePrefix</code></td>
-  <td>(none)</td>
-  <td>
-    Prefix for naming the executor pods.
-    If not set, the executor pod name is set to driver pod name suffixed by an integer.
-  </td>
-</tr>
<tr>
<td><code>spark.kubernetes.executor.lostCheck.maxAttempts</code></td>
<td><code>10</code></td>