spark-instrumented-optimizer/bin
Rob Vesse 69dab94b13 [SPARK-26687][K8S] Fix handling of custom Dockerfile paths
## What changes were proposed in this pull request?

With the changes from vanzin's PR #23019 (SPARK-26025), we use a pared-down temporary Docker build context, which significantly improves build times. However, the way this is implemented leads to non-intuitive behaviour when supplying custom Dockerfile paths. This is because of the following code snippet:

```
(cd $(img_ctx_dir base) && docker build $NOCACHEARG "${BUILD_ARGS[@]}" \
    -t $(image_ref spark) \
    -f "$BASEDOCKERFILE" .)
```

Since the script changes to the temporary build context directory and then runs `docker build` there, any path given for the Dockerfile is taken as relative to the temporary build context directory rather than to the directory where the user invoked the script. This is rather unintuitive and produces somewhat unhelpful errors, e.g.:

```
> ./bin/docker-image-tool.sh -r rvesse -t badpath -p resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile build
Sending build context to Docker daemon  218.4MB
Step 1/15 : FROM openjdk:8-alpine
 ---> 5801f7d008e5
Step 2/15 : ARG spark_uid=185
 ---> Using cache
 ---> 5fd63df1ca39
...
Successfully tagged rvesse/spark:badpath
unable to prepare context: unable to evaluate symlinks in Dockerfile path: lstat /Users/rvesse/Documents/Work/Code/spark/target/tmp/docker/pyspark/resource-managers: no such file or directory
Failed to build PySpark Docker image, please refer to Docker build output for details.
```

Here we can see that the relative path that was valid where the user typed the command was not valid inside the build context directory.
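The mechanism can be reproduced without Docker. The following is a minimal sketch, assuming a throwaway directory layout. The names `repo`, `dockerfiles` and `target/tmp/docker/ctx` are hypothetical stand-ins, not Spark's real paths. A relative path that resolves from the invocation directory stops resolving once a subshell `cd`s elsewhere, mirroring what happens to the `-f` argument after the script changes into the temporary build context.

```shell
#!/usr/bin/env bash
# Illustrative reproduction only: the directory layout below is hypothetical.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/repo/dockerfiles" "$demo_root/repo/target/tmp/docker/ctx"
touch "$demo_root/repo/dockerfiles/Dockerfile"

# Simulate the user invoking the tool from the repository root.
cd "$demo_root/repo" || exit 1
rel_path="dockerfiles/Dockerfile"

# From the invocation directory, the relative path is valid.
if [ -f "$rel_path" ]; then echo "found from invocation directory"; fi

# After cd'ing into the build context (in a subshell, as the script does),
# the same relative string is resolved against a different directory.
(cd target/tmp/docker/ctx && [ -f "$rel_path" ]) || echo "missing from build context"
```

Running it prints `found from invocation directory` followed by `missing from build context`.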

To resolve this, we need to ensure that relative Dockerfile paths are resolved appropriately, which we do by adding a `resolve_file` function to the script and invoking it on the supplied Dockerfile paths.
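A `resolve_file` helper along these lines might look as follows. This is a minimal sketch, assuming only that the helper should turn a possibly relative path into an absolute one anchored at the directory the user ran the script from. It is not necessarily the exact code that landed in `docker-image-tool.sh`, and the local variable names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch of a resolve_file helper (assumption: not the verbatim upstream code).
# Prints an absolute version of the given path, resolved against the current
# working directory, so it stays valid after the script later cd's into the
# temporary build context.
resolve_file() {
  local file="$1"
  # Leave empty input alone so "no custom Dockerfile" keeps meaning "use default".
  if [ -n "$file" ]; then
    local dir
    # cd inside a command substitution (a subshell) so the caller's working
    # directory is untouched; return non-zero if the directory does not exist.
    dir="$(cd "$(dirname "$file")" && pwd)" || return 1
    file="${dir}/$(basename "$file")"
  fi
  printf '%s\n' "$file"
}
```

It would be called on each user-supplied path before the script changes directory, for example `BASEDOCKERFILE=$(resolve_file "$BASEDOCKERFILE")`, so that the later `-f "$BASEDOCKERFILE"` argument is already absolute. Resolving via `cd` plus `pwd` in a subshell is one portable choice; tools such as `realpath` or `readlink -f` would also work where available, but their presence and flags differ between GNU and BSD userlands.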

## How was this patch tested?

Validated that relative paths now work as expected:

```
> ./bin/docker-image-tool.sh -r rvesse -t badpath -p resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile build
Sending build context to Docker daemon  218.4MB
Step 1/15 : FROM openjdk:8-alpine
 ---> 5801f7d008e5
Step 2/15 : ARG spark_uid=185
 ---> Using cache
 ---> 5fd63df1ca39
Step 3/15 : RUN set -ex &&     apk upgrade --no-cache &&     apk add --no-cache bash tini libc6-compat linux-pam krb5 krb5-libs &&     mkdir -p /opt/spark &&     mkdir -p /opt/spark/examples &&     mkdir -p /opt/spark/work-dir &&     touch /opt/spark/RELEASE &&     rm /bin/sh &&     ln -sv /bin/bash /bin/sh &&     echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su &&     chgrp root /etc/passwd && chmod ug+rw /etc/passwd
 ---> Using cache
 ---> eb0a568e032f
Step 4/15 : COPY jars /opt/spark/jars
...
Successfully tagged rvesse/spark:badpath
Sending build context to Docker daemon  6.599MB
Step 1/13 : ARG base_img
Step 2/13 : ARG spark_uid=185
Step 3/13 : FROM $base_img
 ---> 8f4fff16f903
Step 4/13 : WORKDIR /
 ---> Running in 25466e66f27f
Removing intermediate container 25466e66f27f
 ---> 1470b6efae61
Step 5/13 : USER 0
 ---> Running in b094b739df37
Removing intermediate container b094b739df37
 ---> 6a27eb4acad3
Step 6/13 : RUN mkdir ${SPARK_HOME}/python
 ---> Running in bc8002c5b17c
Removing intermediate container bc8002c5b17c
 ---> 19bb12f4286a
Step 7/13 : RUN apk add --no-cache python &&     apk add --no-cache python3 &&     python -m ensurepip &&     python3 -m ensurepip &&     rm -r /usr/lib/python*/ensurepip &&     pip install --upgrade pip setuptools &&     rm -r /root/.cache
 ---> Running in 12dcba5e527f
...
Successfully tagged rvesse/spark-py:badpath
```

Closes #23613 from rvesse/SPARK-26687.

Authored-by: Rob Vesse <rvesse@dotnetrdf.org>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-01-24 10:11:55 -08:00
beeline [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed 2016-11-16 14:22:15 -08:00
beeline.cmd [SPARK-21877][DEPLOY, WINDOWS] Handle quotes in Windows command scripts 2017-10-06 23:38:47 +09:00
docker-image-tool.sh [SPARK-26687][K8S] Fix handling of custom Dockerfile paths 2019-01-24 10:11:55 -08:00
find-spark-home [MINOR] Fix a bunch of typos 2018-01-02 07:10:19 +09:00
find-spark-home.cmd [SPARK-22597][SQL] Add spark-sql cmd script for Windows users 2017-11-24 19:55:26 +01:00
load-spark-env.cmd [SPARK-22466][SPARK SUBMIT] export SPARK_CONF_DIR while conf is default 2017-11-09 14:33:08 +09:00
load-spark-env.sh [SPARK-26076][BUILD][MINOR] Revise ambiguous error message from load-spark-env.sh 2018-11-20 08:29:59 -06:00
pyspark [SPARK-25891][PYTHON] Upgrade to Py4J 0.10.8.1 2018-10-31 09:55:03 -07:00
pyspark.cmd [SPARK-21877][DEPLOY, WINDOWS] Handle quotes in Windows command scripts 2017-10-06 23:38:47 +09:00
pyspark2.cmd [SPARK-25891][PYTHON] Upgrade to Py4J 0.10.8.1 2018-10-31 09:55:03 -07:00
run-example [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed 2016-11-16 14:22:15 -08:00
run-example.cmd [SPARK-22597][SQL] Add spark-sql cmd script for Windows users 2017-11-24 19:55:26 +01:00
spark-class [SPARK-20546][DEPLOY] spark-class gets syntax error in posix mode 2017-05-05 11:36:51 +01:00
spark-class.cmd [SPARK-21877][DEPLOY, WINDOWS] Handle quotes in Windows command scripts 2017-10-06 23:38:47 +09:00
spark-class2.cmd [SPARK-22495] Fix setup of SPARK_HOME variable on Windows 2017-11-23 12:47:38 +09:00
spark-shell [SPARK-25906][SHELL] Documents '-I' option (from Scala REPL) in spark-shell 2018-11-06 10:39:58 +08:00
spark-shell.cmd [SPARK-21877][DEPLOY, WINDOWS] Handle quotes in Windows command scripts 2017-10-06 23:38:47 +09:00
spark-shell2.cmd [SPARK-25906][SHELL] Documents '-I' option (from Scala REPL) in spark-shell 2018-11-06 10:39:58 +08:00
spark-sql [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed 2016-11-16 14:22:15 -08:00
spark-sql.cmd [SPARK-22597][SQL] Add spark-sql cmd script for Windows users 2017-11-24 19:55:26 +01:00
spark-sql2.cmd [SPARK-22597][SQL] Add spark-sql cmd script for Windows users 2017-11-24 19:55:26 +01:00
spark-submit [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed 2016-11-16 14:22:15 -08:00
spark-submit.cmd [SPARK-21877][DEPLOY, WINDOWS] Handle quotes in Windows command scripts 2017-10-06 23:38:47 +09:00
spark-submit2.cmd [SPARK-11518][DEPLOY, WINDOWS] Handle spaces in Windows command scripts 2016-02-10 09:54:22 +00:00
sparkR [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed 2016-11-16 14:22:15 -08:00
sparkR.cmd [SPARK-21877][DEPLOY, WINDOWS] Handle quotes in Windows command scripts 2017-10-06 23:38:47 +09:00
sparkR2.cmd [SPARK-22597][SQL] Add spark-sql cmd script for Windows users 2017-11-24 19:55:26 +01:00