spark-instrumented-optimizer/resource-managers/kubernetes
Marcelo Vanzin a6ddc9d083 [SPARK-24736][K8S] Let spark-submit handle dependency resolution.
Before this change, there was some code in the k8s backend to deal
with how to resolve dependencies and make them available to the
Spark application. It turns out that none of that code is necessary,
since spark-submit already handles all that for applications started
in client mode - like the k8s driver that is run inside a Spark-created
pod.

For that reason, specifically for pyspark, there's no need for the
k8s backend to deal with PYTHONPATH; or, in general, to change the URIs
provided by the user at all. spark-submit takes care of that.

For testing, I created a pyspark script that depends on another module
that is shipped with --py-files. Then I used:

- --py-files http://.../dep.py http://.../test.py
- --py-files http://.../dep.zip http://.../test.py
- --py-files local:/.../dep.py local:/.../test.py
- --py-files local:/.../dep.zip local:/.../test.py

Without this change, all of the above commands fail. With the change, the
driver is able to see the dependencies in all the above cases; but executors
don't see the dependencies in the last two. That's a bug in shared Spark code
that deals with local: dependencies in pyspark (SPARK-26934).

I also tested a Scala app using the main jar from an http server.

Closes #23793 from vanzin/SPARK-24736.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-02-27 09:49:31 -08:00
..
core [SPARK-24736][K8S] Let spark-submit handle dependency resolution. 2019-02-27 09:49:31 -08:00
docker/src/main/dockerfiles/spark [SPARK-24736][K8S] Let spark-submit handle dependency resolution. 2019-02-27 09:49:31 -08:00
integration-tests Revert "[SPARK-26742][K8S] Update Kubernetes-Client version to 4.1.2" 2019-02-26 13:42:07 -08:00