[SPARK-26015][K8S] Set a default UID for Spark on K8S Images
Adds `USER` directives to the Dockerfiles, configurable via a build argument (`spark_uid`) for easy customisation. A `-u` flag is added to `bin/docker-image-tool.sh` to make this easy to customise, e.g.

```
> bin/docker-image-tool.sh -r rvesse -t uid -u 185 build
> bin/docker-image-tool.sh -r rvesse -t uid push
```

If no UID is explicitly specified it defaults to `185`. This is per skonto's suggestion to align with the OpenShift standard reserved UID for Java apps (https://lists.openshift.redhat.com/openshift-archives/users/2016-March/msg00283.html).

Notes:

- We have to make the `WORKDIR` writable by the root group, otherwise jobs will fail with `AccessDeniedException`.

To Do:

- [x] Debug and resolve issue with client mode test
- [x] Consider whether to always propagate `SPARK_USER_NAME` to the environment of driver and executor pods so `entrypoint.sh` can insert that into the `/etc/passwd` entry
- [x] Rebase once PR #23013 is merged and update documentation accordingly

Testing:

- Built the Docker images with the new Dockerfiles that include the `USER` directives.
- Ran the Spark on K8S integration tests against the new images. All pass except client mode, which I am currently debugging further.
- Manually dropped myself into the resulting container images via `docker run` and checked `id -u` output to see that the UID is as expected.
- Tried customising the UID from the default via the new `-u` argument to `docker-image-tool.sh` and again checked the resulting image for the correct runtime UID.

cc felixcheung skonto vanzin

Closes #23017 from rvesse/SPARK-26015.

Authored-by: Rob Vesse <rvesse@dotnetrdf.org>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
Parent: 24e78b7f16
Commit: 1144df3b5d
`bin/docker-image-tool.sh`:

```diff
@@ -146,6 +146,12 @@ function build {
   fi
 
+  local BUILD_ARGS=(${BUILD_PARAMS})
+
+  # If a custom SPARK_UID was set add it to build arguments
+  if [ -n "$SPARK_UID" ]; then
+    BUILD_ARGS+=(--build-arg spark_uid=$SPARK_UID)
+  fi
 
   local BINDING_BUILD_ARGS=(
     ${BUILD_PARAMS}
     --build-arg
```
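The conditional append above only adds the `spark_uid` build argument when `-u` was actually given, so images built without it keep the Dockerfile default. A minimal standalone sketch of the same pattern (the `BUILD_PARAMS` value here is a hypothetical stand-in for whatever `-b` flags a user might supply):

```shell
# Hypothetical inputs; in the real script these come from the -b and -u flags
BUILD_PARAMS="--build-arg java_image_tag=8-jre-slim"
SPARK_UID=185

# Start from the generic build params (word-split into array elements)...
BUILD_ARGS=(${BUILD_PARAMS})

# ...and append the spark_uid build arg only when a UID was requested
if [ -n "$SPARK_UID" ]; then
  BUILD_ARGS+=(--build-arg spark_uid=$SPARK_UID)
fi

echo "${BUILD_ARGS[@]}"
```

With `SPARK_UID` left empty, `BUILD_ARGS` would contain only the original parameters, which is what preserves the image default of `185`.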
```diff
@@ -207,6 +213,8 @@ Options:
   -t tag    Tag to apply to the built image, or to identify the image to be pushed.
   -m        Use minikube's Docker daemon.
   -n        Build docker image with --no-cache
+  -u uid    UID to use in the USER directive to set the user the main Spark process runs as inside the
+            resulting container
   -b arg    Build arg to build or push the image. For multiple build args, this option needs to
             be used separately for each build arg.
```
```diff
@@ -243,7 +251,8 @@ PYDOCKERFILE=
 RDOCKERFILE=
 NOCACHEARG=
 BUILD_PARAMS=
-while getopts f:p:R:mr:t:nb: option
+SPARK_UID=
+while getopts f:p:R:mr:t:nb:u: option
 do
   case "${option}"
   in
```
```diff
@@ -263,6 +272,7 @@ do
       fi
       eval $(minikube docker-env)
       ;;
+    u) SPARK_UID=${OPTARG};;
   esac
 done
```
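The extended `getopts` string (`u:` added) and the new case arm can be exercised in isolation. This sketch substitutes a hypothetical argument list for the script's real command line:

```shell
# Simulated command line; the real script parses "$@" as given by the user
SPARK_UID=
set -- -r rvesse -t uid -u 185

# Same optstring as the updated script; letters followed by ':' take an argument
while getopts f:p:R:mr:t:nb:u: option
do
  case "${option}" in
    r) REPO=${OPTARG};;
    t) TAG=${OPTARG};;
    u) SPARK_UID=${OPTARG};;   # new in this change: capture the requested UID
  esac
done

echo "repo=$REPO tag=$TAG uid=$SPARK_UID"
```

Because `u:` carries a trailing colon, `getopts` requires a value after `-u`, so an empty `SPARK_UID` can only mean the flag was never passed.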
`docs/running-on-kubernetes.md`:

```diff
@@ -19,9 +19,9 @@ Please see [Spark Security](security.html) and the specific advice below before
 
 ## User Identity
 
-Images built from the project provided Dockerfiles do not contain any [`USER`](https://docs.docker.com/engine/reference/builder/#user) directives. This means that the resulting images will be running the Spark processes as `root` inside the container. On unsecured clusters this may provide an attack vector for privilege escalation and container breakout. Therefore security conscious deployments should consider providing custom images with `USER` directives specifying an unprivileged UID and GID.
+Images built from the project provided Dockerfiles contain a default [`USER`](https://docs.docker.com/engine/reference/builder/#user) directive with a default UID of `185`. This means that the resulting images will be running the Spark processes as this UID inside the container. Security conscious deployments should consider providing custom images with `USER` directives specifying their desired unprivileged UID and GID. The resulting UID should include the root group in its supplementary groups in order to be able to run the Spark executables. Users building their own images with the provided `docker-image-tool.sh` script can use the `-u <uid>` option to specify the desired UID.
 
-Alternatively the [Pod Template](#pod-template) feature can be used to add a [Security Context](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#volumes-and-file-systems) with a `runAsUser` to the pods that Spark submits. Please bear in mind that this requires cooperation from your users and as such may not be a suitable solution for shared environments. Cluster administrators should use [Pod Security Policies](https://kubernetes.io/docs/concepts/policy/pod-security-policy/#users-and-groups) if they wish to limit the users that pods may run as.
+Alternatively the [Pod Template](#pod-template) feature can be used to add a [Security Context](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#volumes-and-file-systems) with a `runAsUser` to the pods that Spark submits. This can be used to override the `USER` directives in the images themselves. Please bear in mind that this requires cooperation from your users and as such may not be a suitable solution for shared environments. Cluster administrators should use [Pod Security Policies](https://kubernetes.io/docs/concepts/policy/pod-security-policy/#users-and-groups) if they wish to limit the users that pods may run as.
 
 ## Volume Mounts
```
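The pod-template alternative the documentation mentions might look like the fragment below. This is a hedged sketch, not part of the commit: the `runAsUser` value `1000` is an arbitrary example, and field names follow the Kubernetes `PodSecurityContext` API.

```yaml
# Hypothetical pod template fragment overriding the image's USER directive.
apiVersion: v1
kind: Pod
spec:
  securityContext:
    runAsUser: 1000
    # Keep the root group among supplementary groups so the Spark
    # executables and the group-writable work-dir stay accessible,
    # matching the docs' advice about the root group.
    supplementalGroups: [0]
```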
````diff
@@ -87,6 +87,7 @@ Example usage is:
 $ ./bin/docker-image-tool.sh -r <repo> -t my-tag build
 $ ./bin/docker-image-tool.sh -r <repo> -t my-tag push
 ```
+This will build using the projects provided default `Dockerfiles`. To see more options available for customising the behaviour of this tool, including providing custom `Dockerfiles`, please run with the `-h` flag.
 
 By default `bin/docker-image-tool.sh` builds docker image for running JVM jobs. You need to opt-in to build additional
 language binding docker images.
````
`resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile`:

```diff
@@ -17,6 +17,8 @@
 
 FROM openjdk:8-alpine
 
+ARG spark_uid=185
+
 # Before building the docker image, first build and make a Spark distribution following
 # the instructions in http://spark.apache.org/docs/latest/building-spark.html.
 # If this docker file is being used in the context of building your images from a Spark
```
```diff
@@ -47,5 +49,9 @@ COPY data /opt/spark/data
 ENV SPARK_HOME /opt/spark
 
 WORKDIR /opt/spark/work-dir
+RUN chmod g+w /opt/spark/work-dir
 
 ENTRYPOINT [ "/opt/entrypoint.sh" ]
+
+# Specify the User that the actual main process will run as
+USER ${spark_uid}
```
`resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/R/Dockerfile`:

```diff
@@ -16,8 +16,14 @@
 #
 
 ARG base_img
+ARG spark_uid=185
 
 FROM $base_img
 WORKDIR /
+
+# Reset to root to run installation tasks
+USER 0
 
 RUN mkdir ${SPARK_HOME}/R
 
 RUN apk add --no-cache R R-dev
```
```diff
@@ -27,3 +33,6 @@ ENV R_HOME /usr/lib/R
 
 WORKDIR /opt/spark/work-dir
 ENTRYPOINT [ "/opt/entrypoint.sh" ]
+
+# Specify the User that the actual main process will run as
+USER ${spark_uid}
```
`resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile`:

```diff
@@ -16,8 +16,14 @@
 #
 
 ARG base_img
+ARG spark_uid=185
 
 FROM $base_img
 WORKDIR /
+
+# Reset to root to run installation tasks
+USER 0
 
 RUN mkdir ${SPARK_HOME}/python
 # TODO: Investigate running both pip and pip3 via virtualenvs
 RUN apk add --no-cache python && \
```
```diff
@@ -37,3 +43,6 @@ ENV PYTHONPATH ${SPARK_HOME}/python/lib/pyspark.zip:${SPARK_HOME}/python/lib/py4
 
 WORKDIR /opt/spark/work-dir
 ENTRYPOINT [ "/opt/entrypoint.sh" ]
+
+# Specify the User that the actual main process will run as
+USER ${spark_uid}
```
`resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh`:

```diff
@@ -30,7 +30,7 @@ set -e
 # If there is no passwd entry for the container UID, attempt to create one
 if [ -z "$uidentry" ] ; then
     if [ -w /etc/passwd ] ; then
-        echo "$myuid:x:$myuid:$mygid:anonymous uid:$SPARK_HOME:/bin/false" >> /etc/passwd
+        echo "$myuid:x:$myuid:$mygid:${SPARK_USER_NAME:-anonymous uid}:$SPARK_HOME:/bin/false" >> /etc/passwd
     else
         echo "Container ENTRYPOINT failed to add passwd entry for anonymous UID"
     fi
```
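The `${SPARK_USER_NAME:-anonymous uid}` default expansion can be demonstrated outside the container. This sketch writes to a temporary file rather than `/etc/passwd`, with stand-in values for the UID/GID that the real entrypoint derives at runtime:

```shell
unset SPARK_USER_NAME          # simulate a pod that does not propagate the name
myuid=185
mygid=0
SPARK_HOME=/opt/spark
passwd_file=$(mktemp)          # stand-in for /etc/passwd

# Without SPARK_USER_NAME, the :- default supplies "anonymous uid"
echo "$myuid:x:$myuid:$mygid:${SPARK_USER_NAME:-anonymous uid}:$SPARK_HOME:/bin/false" >> "$passwd_file"

# With SPARK_USER_NAME propagated by the driver/executor pods,
# the real user name is recorded instead
SPARK_USER_NAME=spark
echo "$myuid:x:$myuid:$mygid:${SPARK_USER_NAME:-anonymous uid}:$SPARK_HOME:/bin/false" >> "$passwd_file"

cat "$passwd_file"
```

This is why the to-do item about always propagating `SPARK_USER_NAME` matters: without it, every container UID shows up as "anonymous uid" in its own passwd entry.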
`ClientModeTestsSuite.scala`:

```diff
@@ -62,11 +62,12 @@ private[spark] trait ClientModeTestsSuite { k8sSuite: KubernetesSuite =>
       .endMetadata()
       .withNewSpec()
       .withServiceAccountName(kubernetesTestComponents.serviceAccountName)
       .withRestartPolicy("Never")
       .addNewContainer()
       .withName("spark-example")
       .withImage(image)
       .withImagePullPolicy("IfNotPresent")
-      .withCommand("/opt/spark/bin/run-example")
+      .addToArgs("/opt/spark/bin/run-example")
       .addToArgs("--master", s"k8s://https://kubernetes.default.svc")
       .addToArgs("--deploy-mode", "client")
       .addToArgs("--conf", s"spark.kubernetes.container.image=$image")
```