[SPARK-26083][K8S] Add a COPY of pyspark into the corresponding directory in the pyspark Dockerfile

When I try to run the `./bin/pyspark` command in a pod on Kubernetes (image built, unmodified, from the pyspark Dockerfile), I get an error:
```
$SPARK_HOME/bin/pyspark --deploy-mode client --master k8s://https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT_HTTPS ...
Python 2.7.15 (default, Aug 22 2018, 13:24:18)
[GCC 6.4.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Could not open PYTHONSTARTUP
IOError: [Errno 2] No such file or directory: '/opt/spark/python/pyspark/shell.py'
```
This happens because the `pyspark` folder doesn't exist under `/opt/spark/python/`.
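
The zipped library on `PYTHONPATH` is not enough here: `bin/pyspark` points `PYTHONSTARTUP` at `shell.py` inside the package directory, and Python opens a `PYTHONSTARTUP` file directly from disk rather than importing it, so it cannot be served out of `pyspark.zip`. Roughly (a paraphrased sketch of the relevant `bin/pyspark` logic, not a verbatim excerpt):
```
# bin/pyspark (paraphrased sketch): the interactive-shell bootstrap must be a
# real file on disk; it is not resolved from pyspark.zip on PYTHONPATH.
export PYTHONSTARTUP="${SPARK_HOME}/python/pyspark/shell.py"
exec "${SPARK_HOME}"/bin/spark-submit pyspark-shell-main --name "PySparkShell" "$@"
```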

## What changes were proposed in this pull request?

Added `COPY python/pyspark ${SPARK_HOME}/python/pyspark` to the pyspark Dockerfile to resolve the issue above.
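
For reference, a minimal rebuild-and-push sketch with the patched Dockerfile; the registry and tag are placeholders for your environment, and depending on the Spark version the pyspark image may need the `-p` flag to point at the Dockerfile:
```
# Rebuild the Spark container images from a checkout/distribution containing
# the fixed Dockerfile, then push them so the cluster pulls the corrected
# pyspark image. <registry> and the tag are placeholders.
./bin/docker-image-tool.sh -r <registry> -t fixed-pyspark build
./bin/docker-image-tool.sh -r <registry> -t fixed-pyspark push
```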

## How was this patch tested?
Tested on Google Kubernetes Engine.
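
A verification along these lines (a sketch; the pod name, image name, and registry are placeholders, not the exact commands from the original test):
```
# Start a throwaway pod from the rebuilt image, then, inside the pod, open the
# PySpark shell; with the fix, /opt/spark/python/pyspark/shell.py exists and
# the REPL starts instead of failing with IOError.
kubectl run pyspark-test --rm -it --restart=Never \
  --image=<registry>/spark-py:fixed-pyspark -- /bin/bash

# From the shell inside the pod:
$SPARK_HOME/bin/pyspark --deploy-mode client \
  --master k8s://https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT_HTTPS
```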

Closes #23037 from AzureQ/master.

Authored-by: Qi Shao <qi.shao.nyu@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
commit 0889fbaf95 (parent a24e1a126c), committed 2018-12-03
2 changed files with 2 additions and 0 deletions

```
@@ -107,6 +107,7 @@ function create_dev_build_context {(
     "$PYSPARK_CTX/kubernetes/dockerfiles"
   mkdir "$PYSPARK_CTX/python"
   cp -r "python/lib" "$PYSPARK_CTX/python/lib"
+  cp -r "python/pyspark" "$PYSPARK_CTX/python/pyspark"

   local R_CTX="$CTX_DIR/sparkr"
   mkdir -p "$R_CTX/kubernetes"
```

```
@@ -38,6 +38,7 @@ RUN apk add --no-cache python && \
     # Removed the .cache to save space
     rm -r /root/.cache

+COPY python/pyspark ${SPARK_HOME}/python/pyspark
 COPY python/lib ${SPARK_HOME}/python/lib
 ENV PYTHONPATH ${SPARK_HOME}/python/lib/pyspark.zip:${SPARK_HOME}/python/lib/py4j-*.zip
```
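
A quick local sanity check of a rebuilt image (a sketch; the image name is a placeholder) is to confirm that both the plain package directory and the zipped library are present:
```
# Both paths should now exist in the image: shell.py for the interactive
# shell, and pyspark.zip for the PYTHONPATH entry above.
docker run --rm --entrypoint ls <registry>/spark-py:fixed-pyspark \
  /opt/spark/python/pyspark/shell.py /opt/spark/python/lib/pyspark.zip
```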