431 commits

Author SHA1 Message Date
Cong Du 54b97b2e14 [MINOR][DOCS] Fix a typo in ContainerPlacementStrategy's class comment
### What changes were proposed in this pull request?
This PR fixes a typo in deploy/yarn/LocalityPreferredContainerPlacementStrategy.scala file.

### Why are the changes needed?
To deliver correct explanation about how the placement policy works.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Unit tests as specified, although this shouldn't affect any functionality since the change is only in a comment.

Closes #28267 from asclepiusaka/master.

Authored-by: Cong Du <asclepius1993@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-04-22 09:44:43 -05:00
Marcelo Vanzin b8ccd75524
[SPARK-29905][K8S] Improve pod lifecycle manager behavior with dynamic allocation
This issue mainly shows up when you enable dynamic allocation:
because there are many executor state changes (because of executors
being requested and starting to run, and later stopped), the lifecycle
manager class could end up logging information about the same executor
multiple times, since the different events would cause the same
executor update to be present in multiple pod snapshots. On top of that,
it could end up making multiple redundant calls into the API server
for the same pod.

Another issue was when the config was set to not delete executor
pods; with dynamic allocation, that means pods keep accumulating
in the API server, and every time the full sync is done by the
polling source, all executors, even the finished ones that Spark
technically does not care about anymore, would be processed.

The change modifies the lifecycle monitor so that it:

- logs each executor update a single time, even if it shows up in
  multiple snapshots, by checking whether the state change
  happened before.
- marks finished-but-not-deleted-in-k8s executors with a label
  so that they can be easily filtered out.

This reduces the amount of logging done by the lifecycle manager,
which is a minor thing in general since the logs are at debug level.
But it also reduces the amount of data that needs to be fetched
from the API server under certain configurations, and overall
reduces interaction with the API server when dynamic allocation is on.
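
A minimal sketch of the two behaviors listed above, assuming a snapshot is a
plain mapping from executor ID to a state string. The class name, the state
names, and the `labeled` set (standing in for a label applied to the pod in
k8s) are hypothetical and are not taken from the actual change:

```python
from dataclasses import dataclass, field


@dataclass
class LifecycleTracker:
    # Last state already handled for each executor ID.
    seen: dict = field(default_factory=dict)
    # Messages that would be written to the debug log.
    log: list = field(default_factory=list)
    # Executors that finished but were left in k8s (stand-in for a pod label).
    labeled: set = field(default_factory=set)

    def on_snapshot(self, snapshot: dict, delete_on_termination: bool) -> None:
        """Process one snapshot mapping executor ID -> state string."""
        for exec_id, state in snapshot.items():
            if self.seen.get(exec_id) == state:
                continue  # the same update was handled in an earlier snapshot
            self.seen[exec_id] = state
            self.log.append(f"executor {exec_id} is now {state}")
            if state in ("succeeded", "failed") and not delete_on_termination:
                self.labeled.add(exec_id)


def active_pods(pods: dict, labeled: set) -> dict:
    """Filter out pods that were marked as finished-but-not-deleted."""
    return {k: v for k, v in pods.items() if k not in labeled}
```

Filtering on the marker is what would let a full sync skip finished executors
instead of processing every pod that is still present in the API server.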

There's also a change in the snapshot store to ensure that the
same subscriber is not called concurrently. That is kind of a bug,
since it means subscribers could be processing snapshots out of order,
or even that they could block multiple threads (e.g. the allocator
callback was synchronized). I actually ran into the "concurrent calls"
situation in the lifecycle manager during testing, and while it did not
seem to cause problems, it did make for some head scratching while
looking at the logs. It seemed safer to fix that.
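
One way to guarantee that a subscriber is never called concurrently is to
queue snapshots per subscriber and let a single thread at a time drain the
queue. The sketch below illustrates that idea only; `SerialSubscriber` is a
hypothetical name and this is not the snapshot store's actual code:

```python
import threading
from collections import deque


class SerialSubscriber:
    """Delivers snapshots to one callback in arrival order, never concurrently."""

    def __init__(self, callback):
        self._callback = callback
        self._pending = deque()
        self._lock = threading.Lock()
        self._draining = False

    def publish(self, snapshot):
        with self._lock:
            self._pending.append(snapshot)
            if self._draining:
                return  # another thread is delivering and will pick this up
            self._draining = True
        self._drain()

    def _drain(self):
        try:
            while True:
                with self._lock:
                    if not self._pending:
                        self._draining = False
                        return
                    snapshot = self._pending.popleft()
                # Called outside the lock so publishers are never blocked
                # by a slow subscriber.
                self._callback(snapshot)
        except BaseException:
            # Release the drain role so later publishes are still delivered.
            with self._lock:
                self._draining = False
            raise
```

Because only the thread holding the drain role invokes the callback,
snapshots reach the subscriber one at a time and in the order they were queued.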

Unit tests were updated to check for the changes. Also tested in a real
cluster with dynamic allocation on.

Closes #26535 from vanzin/SPARK-29905.

Lead-authored-by: Marcelo Vanzin <vanzin@apache.org>
Co-authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-04-16 14:15:10 -07:00
Seongjin Cho 7699f765f5
[SPARK-31394][K8S] Adds support for Kubernetes NFS volume mounts
### What changes were proposed in this pull request?
This PR (SPARK-31394) adds a new feature that enables mounting of Kubernetes NFS volumes. Most of the code is just a slight modification of the existing code for EmptyDir/HostDir/PVC support.

### Why are the changes needed?
Kubernetes supports various kinds of volumes, but Spark for Kubernetes supports only EmptyDir/HostDir/PVC. By adding support for NFS, we can use Spark for Kubernetes with NFS storage.

In order to use NFS with the current Spark using PVC, the user needs to first create an empty new PVC with NFS. Kubernetes' NFS provisioner will create a new empty dir in NFS under some pre-configured dir for this PVC, for example, `/nfs/k8s/sjcho-my-notebook-pvc-dce84888-7a9d-11e6-b1ee-5254001e0c1b`. Then the user should add files to process in the newly created PVC using some file-copying job, and then run the desired Spark job using that populated PVC. And then to get the final results out, the user should run another file-copying job.

This in theory works, but for data analysis tasks, is quite cumbersome. With this change, one could simply use existing files in NFS, say `/nfs/home/sjcho/myfiles10.sstable` from the Spark job directly, and also write the results directly to some existing dir under NFS such as `/nfs/home/sjcho/output`.

This PR doesn't use any features other than the features already provided by Kubernetes itself, so there should be no compatibility issues (other than limited by k8s) between the wide variety of NFS choices. This PR merely enables an existing volume type `nfs` supported officially by Kubernetes, just like Spark is currently supporting `hostPath` and `persistentVolumeClaim` right now.

### Does this PR introduce any user-facing change?
Users can now mount NFS volumes by running commands like:
```
spark-submit \
--conf spark.kubernetes.driver.volumes.nfs.myshare.mount.path=/myshare \
--conf spark.kubernetes.driver.volumes.nfs.myshare.mount.readOnly=false \
--conf spark.kubernetes.driver.volumes.nfs.myshare.options.server=nfs.example.com \
--conf spark.kubernetes.driver.volumes.nfs.myshare.options.path=/storage/myshare \
...
```
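
The configuration keys above follow the pattern
`spark.kubernetes.driver.volumes.<type>.<name>.<setting>`. As a rough
illustration of how such keys can be grouped per volume (a sketch only; the
function name is hypothetical and this is not Spark's parsing code):

```python
PREFIX = "spark.kubernetes.driver.volumes."


def parse_volume_conf(conf: dict) -> dict:
    """Group volume settings by (volume type, volume name)."""
    volumes = {}
    for key, value in conf.items():
        if not key.startswith(PREFIX):
            continue
        parts = key[len(PREFIX):].split(".", 2)
        if len(parts) != 3:
            continue  # not of the form <type>.<name>.<setting>
        vol_type, name, setting = parts
        volumes.setdefault((vol_type, name), {})[setting] = value
    return volumes


conf = {
    "spark.kubernetes.driver.volumes.nfs.myshare.mount.path": "/myshare",
    "spark.kubernetes.driver.volumes.nfs.myshare.mount.readOnly": "false",
    "spark.kubernetes.driver.volumes.nfs.myshare.options.server": "nfs.example.com",
    "spark.kubernetes.driver.volumes.nfs.myshare.options.path": "/storage/myshare",
}
print(parse_volume_conf(conf))
```

All four settings end up under the single key `("nfs", "myshare")`, which
mirrors how one named volume is described by several `--conf` entries.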

### How was this patch tested?
Test cases were added just like the existing EmptyDir support.

The code was tested on minikube using the following script:
https://gist.github.com/w4-sjcho/4ba48f8c35a9685f5307fbd46b2c0656#file-run-test-sh

The script creates a new minikube cluster, launches an NFS server inside the cluster, copies the `README.md` file to the NFS share, and runs the `JavaWordCount` example against the file located in NFS.

Closes #27364 from w4-sjcho/master.

Authored-by: Seongjin Cho <sjcho@wisefour.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-04-15 03:45:39 -07:00
Nicholas Marcott 8b4862953a [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling
### What changes were proposed in this pull request?

[Delay scheduling](http://elmeleegy.com/khaled/papers/delay_scheduling.pdf) is an optimization that sacrifices fairness for data locality in order to improve cluster and workload throughput.

One useful definition of "delay" here is how much time has passed since the TaskSet was using its fair share of resources.

However, it is impractical to calculate this delay, as it would require running simulations assuming no delay scheduling. Tasks would be run in different orders with different run times.

Currently the heuristic used to estimate this delay is the time since a task was last launched for a TaskSet. The problem is that it essentially does not account for resource utilization, potentially leaving the cluster heavily underutilized.

This PR modifies the heuristic in an attempt to move closer to the useful definition of delay above.
The newly proposed delay is the time since a TaskSet last launched a task **and** did not reject any resources due to delay scheduling when offered its "fair share".
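
The difference between the two heuristics can be sketched as a timer-reset
rule. This is a simplified reading of the description above, not the
scheduler's implementation; `LocalityTimer` and its parameters are
hypothetical names:

```python
class LocalityTimer:
    """Tracks the delay used to decide when one TaskSet may relax locality."""

    def __init__(self, now: float):
        self.last_reset = now

    def on_offer_round(self, now: float, launched_task: bool,
                       rejected_for_locality: bool,
                       offered_fair_share: bool) -> None:
        # Old heuristic: reset whenever a task is launched.
        # Proposed heuristic: reset only if a task was launched AND no
        # resources were rejected due to delay scheduling while the TaskSet
        # was offered its fair share.
        rejected_fair_share = offered_fair_share and rejected_for_locality
        if launched_task and not rejected_fair_share:
            self.last_reset = now

    def delay(self, now: float) -> float:
        return now - self.last_reset

    def may_relax(self, now: float, locality_wait: float) -> bool:
        """True once the accumulated delay reaches the locality wait."""
        return self.delay(now) >= locality_wait
```

Under the old rule, a steady trickle of task launches would keep resetting
the timer even while most offered resources were being rejected. Under the
proposed rule, those rejections let the delay keep growing until locality is
relaxed.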

See the last comments of #26696 for more discussion.

### Why are the changes needed?

The cluster can become heavily underutilized, as described in [SPARK-18886](https://issues.apache.org/jira/browse/SPARK-18886?jql=project%20%3D%20SPARK%20AND%20text%20~%20delay).

### How was this patch tested?

TaskSchedulerImplSuite

cloud-fan
tgravescs
squito

Closes #27207 from bmarcott/nmarcott-fulfill-slots-2.

Authored-by: Nicholas Marcott <481161+bmarcott@users.noreply.github.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-04-09 11:00:29 +00:00
Dongjoon Hyun dba525c997 [SPARK-31313][K8S][TEST] Add m01 node name to support Minikube 1.8.x
### What changes were proposed in this pull request?

This PR adds `m01` as an additional node name in `PVTestsSuite`.

### Why are the changes needed?

Minikube 1.8.0 through 1.8.2 generate a cluster with the node name `m01`, while all other versions use `minikube`. This causes `PVTestsSuite` to fail.
```
$ minikube --vm-driver=hyperkit start --memory 6000 --cpus 8
* minikube v1.8.2 on Darwin 10.15.3
  - MINIKUBE_ACTIVE_DOCKERD=minikube
* Using the hyperkit driver based on user configuration
* Creating hyperkit VM (CPUs=8, Memory=6000MB, Disk=20000MB) ...
* Preparing Kubernetes v1.18.0 on Docker 19.03.6 ...
* Launching Kubernetes ...
* Enabling addons: default-storageclass, storage-provisioner
* Waiting for cluster to come online ...
* Done! kubectl is now configured to use "minikube"

$ kubectl get nodes
NAME   STATUS   ROLES    AGE   VERSION
m01    Ready    master   22s   v1.17.3
```

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

This only adds a new node name, so the K8s Jenkins job should pass.
In addition, the `K8s` integration test suite should be tested on `minikube 1.8.2` manually.

```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python2 to test a pyfiles example
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- PVs with local storage
- Launcher client dependencies
- Test basic decommissioning
- Run SparkR on simple dataframe.R example
Run completed in 10 minutes, 23 seconds.
Total number of tests run: 20
Suites: completed 2, aborted 0
Tests: succeeded 20, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```

For the above test, Minikube 1.8.2 and K8s v1.18.0 are used.
```
$ minikube version
minikube version: v1.8.2
commit: eb13446e786c9ef70cb0a9f85a633194e62396a1

$ kubectl version --short
Client Version: v1.18.0
Server Version: v1.18.0
```

Closes #28080 from dongjoon-hyun/SPARK-31313.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: DB Tsai <d_tsai@apple.com>
2020-04-01 03:42:26 +00:00
Đặng Minh Dũng 1d0fc9aa85
[SPARK-29574][K8S][FOLLOWUP] Fix bash comparison error in Docker entrypoint.sh
### What changes were proposed in this pull request?
A small change to fix an error in Docker `entrypoint.sh`

### Why are the changes needed?
When running Spark on Kubernetes, I got the following logs:
```log
+ '[' -n ']'
+ '[' -z ']'
++ /bin/hadoop classpath
/opt/entrypoint.sh: line 62: /bin/hadoop: No such file or directory
+ export SPARK_DIST_CLASSPATH=
+ SPARK_DIST_CLASSPATH=
```
This is because some quotes are missing in the bash comparisons.
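
The `'[' -n ']'` line in the log is what an unquoted empty variable expands
to. The sketch below reproduces the effect by running the test in `sh` from
Python; the variable name `HADOOP_HOME` is an assumption inferred from the
`/bin/hadoop` path in the log, and the helper name is hypothetical:

```python
import subprocess


def bracket_test_passes(expr: str) -> bool:
    """Evaluate a `[ ... ]` test in sh with HADOOP_HOME unset."""
    script = "unset HADOOP_HOME; " + expr
    return subprocess.run(["sh", "-c", script]).returncode == 0


# Unquoted: the empty expansion disappears, leaving `[ -n ]`, which only
# checks that the literal string "-n" is non-empty, so the test succeeds.
unquoted = bracket_test_passes("[ -n ${HADOOP_HOME} ]")

# Quoted: the empty string is kept as an argument, so `-n ""` fails.
quoted = bracket_test_passes('[ -n "${HADOOP_HOME}" ]')

print(unquoted, quoted)
```

With the unquoted form, the script takes the "variable is set" branch even
though it is empty, which is consistent with the attempt to run `/bin/hadoop`.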

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
CI

Closes #28075 from dungdm93/patch-1.

Authored-by: Đặng Minh Dũng <dungdm93@live.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-03-30 15:41:57 -07:00
Prashant Sharma f87957371d
[SPARK-31200][K8S] Enforce to use https in /etc/apt/sources.list
…n progress errors.

### What changes were proposed in this pull request?
Switching to `https` instead of `http` in the debian mirror urls.

### Why are the changes needed?
My ISP was trying to intercept (or serve from cache) the `http` traffic, and this was causing very confusing errors while building the Spark image. I thought that by posting this, I could help someone save time and energy if they encounter the same issue.
```
bash-3.2$ bin/docker-image-tool.sh -r scrapcodes -t v3.1.0-f1cc86 build
Sending build context to Docker daemon  203.4MB
Step 1/18 : ARG java_image_tag=8-jre-slim
Step 2/18 : FROM openjdk:${java_image_tag}
 ---> 381b20190cf7
Step 3/18 : ARG spark_uid=185
 ---> Using cache
 ---> 65c06f86753c
Step 4/18 : RUN set -ex &&     apt-get update &&     ln -s /lib /lib64 &&     apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps &&     mkdir -p /opt/spark &&     mkdir -p /opt/spark/examples &&     mkdir -p /opt/spark/work-dir &&     touch /opt/spark/RELEASE &&     rm /bin/sh &&     ln -sv /bin/bash /bin/sh &&     echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su &&     chgrp root /etc/passwd && chmod ug+rw /etc/passwd &&     rm -rf /var/cache/apt/*
 ---> Running in 96bcbe927d35
+ apt-get update
Get:1 http://deb.debian.org/debian buster InRelease [122 kB]
Get:2 http://deb.debian.org/debian buster-updates InRelease [49.3 kB]
Get:3 http://deb.debian.org/debian buster/main amd64 Packages [7907 kB]
Err:3 http://deb.debian.org/debian buster/main amd64 Packages
  File has unexpected size (13217 != 7906744). Mirror sync in progress? [IP: 151.101.10.133 80]
  Hashes of expected file:
   - Filesize:7906744 [weak]
   - SHA256:80ed5d1cc1f31a568b77e4fadfd9e01fa4d65e951243fd2ce29eee14d4b532cc
   - MD5Sum:80b6d9c1b6630b2234161e42f4040ab3 [weak]
  Release file created at: Sat, 08 Feb 2020 10:57:10 +0000
Get:5 http://deb.debian.org/debian buster-updates/main amd64 Packages [7380 B]
Err:5 http://deb.debian.org/debian buster-updates/main amd64 Packages
  File has unexpected size (13233 != 7380). Mirror sync in progress? [IP: 151.101.10.133 80]
  Hashes of expected file:
   - Filesize:7380 [weak]
   - SHA256:6af9ea081b6a3da33cfaf76a81978517f65d38e45230089a5612e56f2b6b789d
  Release file created at: Fri, 20 Mar 2020 02:28:11 +0000
Get:4 http://security-cdn.debian.org/debian-security buster/updates InRelease [65.4 kB]
Get:6 http://security-cdn.debian.org/debian-security buster/updates/main amd64 Packages [183 kB]
Fetched 419 kB in 1s (327 kB/s)
Reading package lists...
E: Failed to fetch 80ed5d1cc1  File has unexpected size (13217 != 7906744). Mirror sync in progress? [IP: 151.101.10.133 80]
   Hashes of expected file:
    - Filesize:7906744 [weak]
    - SHA256:80ed5d1cc1f31a568b77e4fadfd9e01fa4d65e951243fd2ce29eee14d4b532cc
    - MD5Sum:80b6d9c1b6630b2234161e42f4040ab3 [weak]
   Release file created at: Sat, 08 Feb 2020 10:57:10 +0000
E: Failed to fetch 6af9ea081b  File has unexpected size (13233 != 7380). Mirror sync in progress? [IP: 151.101.10.133 80]
   Hashes of expected file:
    - Filesize:7380 [weak]
    - SHA256:6af9ea081b6a3da33cfaf76a81978517f65d38e45230089a5612e56f2b6b789d
   Release file created at: Fri, 20 Mar 2020 02:28:11 +0000
E: Some index files failed to download. They have been ignored, or old ones used instead.
The command '/bin/sh -c set -ex &&     apt-get update &&     ln -s /lib /lib64 &&     apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps &&     mkdir -p /opt/spark &&     mkdir -p /opt/spark/examples &&     mkdir -p /opt/spark/work-dir &&     touch /opt/spark/RELEASE &&     rm /bin/sh &&     ln -sv /bin/bash /bin/sh &&     echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su &&     chgrp root /etc/passwd && chmod ug+rw /etc/passwd &&     rm -rf /var/cache/apt/*' returned a non-zero code: 100
Failed to build Spark JVM Docker image, please refer to Docker build output for details.
```
### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Manually, by switching to `https` mirrors while on the offending ISP (which I am already on).

Closes #27966 from ScrapCodes/docker-mirror.

Authored-by: Prashant Sharma <prashsh1@in.ibm.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-03-27 09:13:55 -07:00
Thomas Graves 474b1bb5c2 [SPARK-29154][CORE] Update Spark scheduler for stage level scheduling
### What changes were proposed in this pull request?

These are the core scheduler changes to support stage level scheduling.

The main changes here include modification to the DAGScheduler to look at the ResourceProfiles associated with an RDD and have those applied inside the scheduler.
Currently, if multiple RDDs in a stage have conflicting ResourceProfiles, we throw an error. Logic to allow this will be added in SPARK-29153. I added the interfaces to RDD to add and get the ResourceProfile so that I could add unit tests for the scheduler. These are marked as private for now until we finish the feature and will be exposed in SPARK-29150. If you think this is confusing, I can remove those and the tests and add them back later.
I modified the task scheduler to make sure it only schedules on executors that exactly match the resource profile. It then checks those executors to make sure the current resources meet the task's needs before assigning it. Here I also changed the way we do the custom resource assignment.
Other changes here include having the cpus per task passed around so that we can properly account for them. Previously we just used the one global config, but now it can change based on the ResourceProfile.
I removed the exceptions that require the cores to be the limiting resource. With this change, all the places I found that used executor cores / task cpus as slots have been updated to use the ResourceProfile logic and look to see what resource is limiting.
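
To illustrate what "look to see what resource is limiting" means, here is a
minimal sketch that derives the number of task slots on an executor from
every resource rather than from cores alone. The function names and the
dictionary layout are hypothetical and do not reflect Spark's
`ResourceProfile` API:

```python
def slots_per_executor(executor_resources: dict, task_requirements: dict) -> tuple:
    """Return (slots, limiting_resource) for one executor.

    Assumes task_requirements is non-empty and all amounts are positive.
    """
    slots_by_resource = {
        name: int(executor_resources.get(name, 0) // amount)
        for name, amount in task_requirements.items()
    }
    limiting = min(slots_by_resource, key=slots_by_resource.get)
    return slots_by_resource[limiting], limiting


def eligible_executors(executors: list, profile_id: int) -> list:
    """Keep only executors whose resource profile matches exactly."""
    return [e for e in executors if e["profile_id"] == profile_id]


# 8 cores but only 2 GPUs: the GPU count caps the executor at 2 slots.
print(slots_per_executor({"cpus": 8, "gpu": 2}, {"cpus": 1, "gpu": 1}))
```

With only the old cores / task cpus rule, the same executor would have been
counted as having 8 slots.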

### Why are the changes needed?

Stage level scheduling feature

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

unit tests and lots of manual testing

Closes #27773 from tgravescs/SPARK-29154.

Lead-authored-by: Thomas Graves <tgraves@nvidia.com>
Co-authored-by: Thomas Graves <tgraves@apache.org>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2020-03-26 09:46:36 -05:00
Dongjoon Hyun f206bbde3a
[SPARK-31244][K8S][TEST] Use Minio instead of Ceph in K8S DepsTestsSuite
### What changes were proposed in this pull request?

This PR (SPARK-31244) replaces `Ceph` with `Minio` in K8S `DepsTestsSuite`.

### Why are the changes needed?

Currently, `DepsTestsSuite` uses `ceph` for S3 storage. However, the version in use and all newer releases are broken on new `minikube` releases. We should use a smaller and more robust alternative.

```
$ minikube version
minikube version: v1.8.2

$ minikube -p minikube docker-env | source

$ docker run -it --rm -e NETWORK_AUTO_DETECT=4 -e RGW_FRONTEND_PORT=8000 -e SREE_PORT=5001 -e CEPH_DEMO_UID=nano -e CEPH_DAEMON=demo ceph/daemon:v4.0.3-stable-4.0-nautilus-centos-7-x86_64 /bin/sh
2020-03-25 04:26:21  /opt/ceph-container/bin/entrypoint.sh: ERROR- it looks like we have not been able to discover the network settings

$ docker run -it --rm -e NETWORK_AUTO_DETECT=4 -e RGW_FRONTEND_PORT=8000 -e SREE_PORT=5001 -e CEPH_DEMO_UID=nano -e CEPH_DAEMON=demo ceph/daemon:v4.0.11-stable-4.0-nautilus-centos-7 /bin/sh
2020-03-25 04:20:30  /opt/ceph-container/bin/entrypoint.sh: ERROR- it looks like we have not been able to discover the network settings
```

Also, the image size is unnecessarily big (almost `1GB`) and growing while `minio` is `55.8MB` with the same features.
```
$ docker images | grep ceph
ceph/daemon v4.0.3-stable-4.0-nautilus-centos-7-x86_64 a6a05ccdf924 6 months ago 852MB
ceph/daemon v4.0.11-stable-4.0-nautilus-centos-7       87f695550d8e 12 hours ago 901MB

$ docker images | grep minio
minio/minio latest                                     95c226551ea6 5 days ago   55.8MB
```

### Does this PR introduce any user-facing change?

No. (This is a test case change)

### How was this patch tested?

Pass the existing Jenkins K8s integration test job and test with the latest minikube.
```
$ minikube version
minikube version: v1.8.2

$ kubectl version --short
Client Version: v1.17.4
Server Version: v1.17.4

$ NO_MANUAL=1 ./dev/make-distribution.sh --r --pip --tgz -Pkubernetes
$ resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh --spark-tgz $PWD/spark-*.tgz
...
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python2 to test a pyfiles example
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- PVs with local storage *** FAILED *** // This is irrelevant to this PR.
- Launcher client dependencies          // This is the fixed test case by this PR.
- Test basic decommissioning
- Run SparkR on simple dataframe.R example
Run completed in 12 minutes, 4 seconds.
...
```

The following is a working snapshot of the `DepsTestsSuite` test.
```
$ kubectl get all -ncf9438dd8a65436686b1196a6b73000f
NAME                                                  READY   STATUS    RESTARTS   AGE
pod/minio-0                                           1/1     Running   0          70s
pod/spark-test-app-8494bddca3754390b9e59a2ef47584eb   1/1     Running   0          55s

NAME                                                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
service/minio-s3                                     NodePort    10.109.54.180   <none>        9000:30678/TCP               70s
service/spark-test-app-fd916b711061c7b8-driver-svc   ClusterIP   None            <none>        7078/TCP,7079/TCP,4040/TCP   55s

NAME                     READY   AGE
statefulset.apps/minio   1/1     70s
```

Closes #28015 from dongjoon-hyun/SPARK-31244.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-03-25 12:38:15 -07:00
Prashant Sharma 3799d2b9d8
[SPARK-30715][K8S][TESTS][FOLLOWUP] Update k8s client version in IT as well
### What changes were proposed in this pull request?
This is a follow-up to SPARK-30715. It brings the Kubernetes client version back in sync between integration-tests and kubernetes/core.

### Why are the changes needed?
More than once, the Kubernetes client version has gone out of sync between integration tests and kubernetes/core. This brings them back in sync and adds a comment to save us from needing this kind of follow-up in the future.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Manually.

Closes #27948 from ScrapCodes/follow-up-spark-30715.

Authored-by: Prashant Sharma <prashsh1@in.ibm.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-03-21 18:26:53 -07:00
Holden Karau 57d27e900f
[SPARK-31125][K8S] Terminating pods have a deletion timestamp but they are not yet dead
### What changes were proposed in this pull request?

Change what we consider a deleted pod to not include "Terminating"

### Why are the changes needed?

If we get a new snapshot while a pod is in the process of being cleaned up we shouldn't delete the executor until it is fully terminated.
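
A simplified sketch of the distinction, assuming a pod is described only by
its phase and an optional deletion timestamp. The state names and both
functions are hypothetical and much coarser than the real snapshot logic:

```python
from typing import Optional


def classify_pod(phase: str, deletion_timestamp: Optional[str]) -> str:
    """Map a pod's phase and deletion timestamp to a coarse executor state."""
    if phase == "Succeeded":
        return "succeeded"
    if phase == "Failed":
        return "failed"
    if deletion_timestamp is not None:
        # The pod has been asked to shut down but is still being cleaned up.
        return "terminating"
    if phase == "Running":
        return "running"
    return "unknown"


def should_remove_executor(state: str) -> bool:
    """Only fully finished pods lead to removing the executor."""
    return state in ("succeeded", "failed")
```

Treating `terminating` as its own state, rather than as deleted, leaves the
executor in place until the pod has fully terminated.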

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

This should be covered by the decommissioning tests in that they currently are flaky because we sometimes delete the executor instead of allowing it to decommission all the way.

I also ran this in a loop locally ~80 times with the only failures being the PV suite because of unrelated minikube mount issues.

Closes #27905 from holdenk/SPARK-31125-Processing-state-snapshots-incorrect.

Authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-03-17 12:04:06 -07:00
Pedro Rossi ed06d98044
[SPARK-25355][K8S] Add proxy user to driver if present on spark-submit
### What changes were proposed in this pull request?

This PR adds the proxy user given on the spark-submit command to the childArgs, so that it can be retrieved in `KubernetesClientApplication` and added to the driver container args.

### Why are the changes needed?

The proxy user passed to spark-submit doesn't work in the Kubernetes environment because the `--proxy-user` argument is not added to the driver container. When I added it manually to the Pod definition, it worked just fine.
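
The intended effect on the driver container arguments can be sketched as
follows. The argument layout and the function name are hypothetical; only
the `--proxy-user` flag itself comes from the description above:

```python
from typing import List, Optional


def driver_container_args(main_class: str, app_args: List[str],
                          proxy_user: Optional[str] = None) -> List[str]:
    """Build illustrative driver args, forwarding the proxy user if present."""
    args = ["driver"]
    if proxy_user:
        args += ["--proxy-user", proxy_user]
    args += ["--class", main_class, *app_args]
    return args


print(driver_container_args("org.example.Main", ["input.txt"], proxy_user="alice"))
```

When no proxy user is given, the argument list is unchanged, so existing
submissions would not be affected.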

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Tests were added

Closes #27422 from PedroRossi/SPARK-25355.

Authored-by: Pedro Rossi <pgrr@cin.ufpe.br>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-03-16 21:53:58 -07:00
Dale Clarke 2a4fed0443 [SPARK-30654][WEBUI] Bootstrap4 WebUI upgrade
### What changes were proposed in this pull request?
Spark's Web UI is using an older version of Bootstrap (v. 2.3.2) for the portal pages. Bootstrap 2.x was moved to EOL in Aug 2013 and Bootstrap 3.x was moved to EOL in July 2019 (https://github.com/twbs/release). Older versions of Bootstrap are also getting flagged in security scans for various CVEs:

https://snyk.io/vuln/SNYK-JS-BOOTSTRAP-72889
https://snyk.io/vuln/SNYK-JS-BOOTSTRAP-173700
https://snyk.io/vuln/npm:bootstrap:20180529
https://snyk.io/vuln/npm:bootstrap:20160627

I haven't validated each CVE, but it would be nice to resolve any potential issues and get on a supported release.

The bad news is that there have been quite a few changes between Bootstrap 2 and Bootstrap 4. I've tried updating the library, refactoring/tweaking the CSS and JS to maintain a similar appearance and functionality, and testing the UI for functionality and appearance. This is a fairly large change so I'm sure additional testing and fixes will be needed.

### How was this patch tested?
This has been manually tested, but there is a ton of functionality and there are many pages and detail pages so it is very possible bugs introduced from the upgrade were missed. Additional testing and feedback is welcomed. If it appears a whole page was missed let me know and I'll take a pass at addressing that page/section.

Closes #27370 from clarkead/bootstrap4-core-upgrade.

Authored-by: Dale Clarke <a.dale.clarke@gmail.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
2020-03-13 15:24:48 -07:00
beliefer 1cd80fa9fa [SPARK-31109][MESOS][DOC] Add version information to the configuration of Mesos
### What changes were proposed in this pull request?
Add version information to the configuration of `Mesos`.

I sorted out some information, shown below.

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.mesos.$taskType.secret.names | 2.3.0 | SPARK-22131 | 5415963d2caaf95604211419ffc4e29fff38e1d7#diff-91e6e5f871160782dc50d4060d6faea3 |  
spark.mesos.$taskType.secret.values | 2.3.0 | SPARK-22131 | 5415963d2caaf95604211419ffc4e29fff38e1d7#diff-91e6e5f871160782dc50d4060d6faea3 |  
spark.mesos.$taskType.secret.envkeys | 2.3.0 | SPARK-22131 | 5415963d2caaf95604211419ffc4e29fff38e1d7#diff-91e6e5f871160782dc50d4060d6faea3 |  
spark.mesos.$taskType.secret.filenames | 2.3.0 | SPARK-22131 | 5415963d2caaf95604211419ffc4e29fff38e1d7#diff-91e6e5f871160782dc50d4060d6faea3 |  
spark.mesos.principal | 1.5.0 | SPARK-6284 | d86bbb4e286f16f77ba125452b07827684eafeed#diff-02a6d899f7a529eb7cfbb12182a110b0 |  
spark.mesos.principal.file | 2.4.0 | SPARK-16501 | 7f10cf83f311526737fc96d5bb8281d12e41932f#diff-daf48dabbe58afaeed8787751750b01d |  
spark.mesos.secret | 1.5.0 | SPARK-6284 | d86bbb4e286f16f77ba125452b07827684eafeed#diff-02a6d899f7a529eb7cfbb12182a110b0 |  
spark.mesos.secret.file | 2.4.0 | SPARK-16501 | 7f10cf83f311526737fc96d5bb8281d12e41932f#diff-daf48dabbe58afaeed8787751750b01d |  
spark.shuffle.cleaner.interval | 2.0.0 | SPARK-12583 | 310981d49a332bd329303f610b150bbe02cf5f87#diff-2fafefee94f2a2023ea9765536870258 |  
spark.mesos.dispatcher.webui.url | 2.0.0 | SPARK-13492 | a4a0addccffb7cd0ece7947d55ce2538afa54c97#diff-f541460c7a74cee87cbb460b3b01665e |  
spark.mesos.dispatcher.historyServer.url | 2.1.0 | SPARK-16809 | 62e62124419f3fa07b324f5e42feb2c5b4fde715#diff-3779e2035d9a09fa5f6af903925b9512 |  
spark.mesos.driver.labels | 2.3.0 | SPARK-21000 | 8da3f7041aafa71d7596b531625edb899970fec2#diff-91e6e5f871160782dc50d4060d6faea3 |  
spark.mesos.driver.webui.url | 2.0.0 | SPARK-13492 | a4a0addccffb7cd0ece7947d55ce2538afa54c97#diff-e3a5e67b8de2069ce99801372e214b8e |  
spark.mesos.driver.failoverTimeout | 2.3.0 | SPARK-21456 | c42ef953343073a50ef04c5ce848b574ff7f2238#diff-91e6e5f871160782dc50d4060d6faea3 |  
spark.mesos.network.name | 2.1.0 | SPARK-18232 | d89bfc92302424406847ac7a9cfca714e6b742fc#diff-ab5bf34f1951a8f7ea83c9456a6c3ab7 |  
spark.mesos.network.labels | 2.3.0 | SPARK-21694 | ce0d3bb377766bdf4df7852272557ae846408877#diff-91e6e5f871160782dc50d4060d6faea3 |  
spark.mesos.driver.constraints | 2.2.1 | SPARK-19606 | f6ee3d90d5c299e67ae6e2d553c16c0d9759d4b5#diff-91e6e5f871160782dc50d4060d6faea3 |  
spark.mesos.driver.frameworkId | 2.1.0 | SPARK-16809 | 62e62124419f3fa07b324f5e42feb2c5b4fde715#diff-02a6d899f7a529eb7cfbb12182a110b0 |  
spark.executor.uri | 0.8.0 | None | 46eecd110a4017ea0c86cbb1010d0ccd6a5eb2ef#diff-a885e7df97790e9b59c21c63353e7476 |  
spark.mesos.proxy.baseURL | 2.3.0 | SPARK-13041 | 663f30d14a0c9219e07697af1ab56e11a714d9a6#diff-0b9b4e122eb666155aa189a4321a6ca8 |  
spark.mesos.coarse | 0.6.0 | None | 63051dd2bcc4bf09d413ff7cf89a37967edc33ba#diff-eaf125f56ce786d64dcef99cf446a751 |  
spark.mesos.coarse.shutdownTimeout | 2.0.0 | SPARK-12330 | c756bda477f458ba4aad7fdb2026263507e0ad9b#diff-d425d35aa23c47a62fbb538554f2f2cf |  
spark.mesos.maxDrivers | 1.4.0 | SPARK-5338 | 53befacced828bbac53c6e3a4976ec3f036bae9e#diff-b964c449b99c51f0a5fd77270b2951a4 |  
spark.mesos.retainedDrivers | 1.4.0 | SPARK-5338 | 53befacced828bbac53c6e3a4976ec3f036bae9e#diff-b964c449b99c51f0a5fd77270b2951a4 |  
spark.mesos.cluster.retry.wait.max | 1.4.0 | SPARK-5338 | 53befacced828bbac53c6e3a4976ec3f036bae9e#diff-b964c449b99c51f0a5fd77270b2951a4 |  
spark.mesos.fetcherCache.enable | 2.1.0 | SPARK-15994 | e34b4e12673fb76c92f661d7c03527410857a0f8#diff-772ea7311566edb25f11a4c4f882179a |  
spark.mesos.appJar.local.resolution.mode | 2.4.0 | SPARK-24326 | 22df953f6bb191858053eafbabaa5b3ebca29f56#diff-6e4d0a0445975f03f975fdc1e3d80e49 |  
spark.mesos.rejectOfferDuration | 2.2.0 | SPARK-19702 | 2e30c0b9bcaa6f7757bd85d1f1ec392d5f916f83#diff-daf48dabbe58afaeed8787751750b01d |  
spark.mesos.rejectOfferDurationForUnmetConstraints | 1.6.0 | SPARK-10471 | 74f50275e429e649212928a9f36552941b862edc#diff-02a6d899f7a529eb7cfbb12182a110b0 |  
spark.mesos.rejectOfferDurationForReachedMaxCores | 2.0.0 | SPARK-13001 | 1e7d9bfb5a41f5c2479ab3b4d4081f00bf00bd31#diff-02a6d899f7a529eb7cfbb12182a110b0 |  
spark.mesos.uris | 1.5.0 | SPARK-8798 | a2f805729b401c68b60bd690ad02533b8db57b58#diff-e3a5e67b8de2069ce99801372e214b8e |  
spark.mesos.executor.home | 1.1.1 | SPARK-3264 | 069ecfef02c4af69fc0d3755bd78be321b68b01d#diff-e3a5e67b8de2069ce99801372e214b8e |  
spark.mesos.mesosExecutor.cores | 1.4.0 | SPARK-6350 | 6fbeb82e13db7117d8f216e6148632490a4bc5be#diff-e3a5e67b8de2069ce99801372e214b8e |  
spark.mesos.extra.cores | 0.6.0 | None | 2d761e3353651049f6707c74bb5ffdd6e86f6f35#diff-37af8c6e3634f97410ade813a5172621 |  
spark.mesos.executor.memoryOverhead | 1.1.1 | SPARK-3535 | 6f150978477830bbc14ba983786dd2bce12d1fe2#diff-6b498f5407d10e848acac4a1b182457c |  
spark.mesos.executor.docker.image | 1.4.0 | SPARK-2691 | 8f50a07d2188ccc5315d979755188b1e5d5b5471#diff-e3a5e67b8de2069ce99801372e214b8e |  
spark.mesos.executor.docker.forcePullImage | 2.1.0 | SPARK-15271 | 978cd5f125eb5a410bad2e60bf8385b11cf1b978#diff-0dd025320c7ecda2ea310ed7172d7f5a |  
spark.mesos.executor.docker.portmaps | 1.4.0 | SPARK-7373 | 226033cfffa2f37ebaf8bc2c653f094e91ef0c9b#diff-b964c449b99c51f0a5fd77270b2951a4 |  
spark.mesos.executor.docker.parameters | 2.2.0 | SPARK-19740 | a888fed3099e84c2cf45e9419f684a3658ada19d#diff-4139e6605a8c7f242f65cde538770c99 |  
spark.mesos.executor.docker.volumes | 1.4.0 | SPARK-7373 | 226033cfffa2f37ebaf8bc2c653f094e91ef0c9b#diff-b964c449b99c51f0a5fd77270b2951a4 |  
spark.mesos.gpus.max | 2.1.0 | SPARK-14082 | 29f186bfdf929b1e8ffd8e33ee37b76d5dc5af53#diff-d427ee890b913c5a7056be21eb4f39d7 |  
spark.mesos.task.labels | 2.2.0 | SPARK-20085 | c8fc1f3badf61bcfc4bd8eeeb61f73078ca068d1#diff-387c5d0c916278495fc28420571adf9e |  
spark.mesos.constraints | 1.5.0 | SPARK-6707 | 1165b17d24cdf1dbebb2faca14308dfe5c2a652c#diff-e3a5e67b8de2069ce99801372e214b8e |  
spark.mesos.containerizer | 2.1.0 | SPARK-16637 | 266b92faffb66af24d8ed2725beb80770a2d91f8#diff-0dd025320c7ecda2ea310ed7172d7f5a |  
spark.mesos.role | 1.5.0 | SPARK-6284 | d86bbb4e286f16f77ba125452b07827684eafeed#diff-02a6d899f7a529eb7cfbb12182a110b0 |  
The following appears in the document |   |   |   |  
spark.mesos.driverEnv.[EnvironmentVariableName] | 2.1.0 | SPARK-16194 | 235cb256d06653bcde4c3ed6b081503a94996321#diff-b964c449b99c51f0a5fd77270b2951a4 |  
spark.mesos.dispatcher.driverDefault.[PropertyName] | 2.1.0 | SPARK-16927 and SPARK-16923 | eca58755fbbc11937b335ad953a3caff89b818e6#diff-b964c449b99c51f0a5fd77270b2951a4 |  

### Why are the changes needed?
Supplemental configuration version information.

### Does this PR introduce any user-facing change?
'No'.

### How was this patch tested?
Existing UT

Closes #27863 from beliefer/add-version-to-mesos-config.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-12 11:02:29 +09:00
beliefer 1254c88034 [SPARK-31118][K8S][DOC] Add version information to the configuration of K8S
### What changes were proposed in this pull request?
Add version information to the configuration of `K8S`.

I sorted out some information, shown below.

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.kubernetes.context | 3.0.0 | SPARK-25887 | c542c247bbfe1214c0bf81076451718a9e8931dc#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.master | 3.0.0 | SPARK-30371 | f14061c6a4729ad419902193aa23575d8f17f597#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.namespace | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.container.image | 2.3.0 | SPARK-22994 | b94debd2b01b87ef1d2a34d48877e38ade0969e6#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.container.image | 2.3.0 | SPARK-22807 | fb3636b482be3d0940345b1528c1d5090bbc25e6#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.container.image | 2.3.0 | SPARK-22807 | fb3636b482be3d0940345b1528c1d5090bbc25e6#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.container.image.pullPolicy | 2.3.0 | SPARK-22807 | fb3636b482be3d0940345b1528c1d5090bbc25e6#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.container.image.pullSecrets | 2.4.0 | SPARK-23668 | cccaaa14ad775fb981e501452ba2cc06ff5c0f0a#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.submission.requestTimeout | 3.0.0 | SPARK-27023 | e9e8bb33ef9ad785473ded168bc85867dad4ee70#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.submission.connectionTimeout | 3.0.0 | SPARK-27023 | e9e8bb33ef9ad785473ded168bc85867dad4ee70#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.requestTimeout | 3.0.0 | SPARK-27023 | e9e8bb33ef9ad785473ded168bc85867dad4ee70#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.connectionTimeout | 3.0.0 | SPARK-27023 | e9e8bb33ef9ad785473ded168bc85867dad4ee70#diff-6e882d5561424e7e6651eb46f10104b8 |  
KUBERNETES_AUTH_DRIVER_CONF_PREFIX.serviceAccountName | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 | spark.kubernetes.authenticate.driver
KUBERNETES_AUTH_EXECUTOR_CONF_PREFIX.serviceAccountName | 3.1.0 | SPARK-30122 | f9f06eee9853ad4b6458ac9d31233e729a1ca226#diff-6e882d5561424e7e6651eb46f10104b8 | spark.kubernetes.authenticate.executor
spark.kubernetes.driver.limit.cores | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.request.cores | 3.0.0 | SPARK-27754 | 1a8c09334db87b0e938c38cd6b59d326bdcab3c3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.submitInDriver | 2.4.0 | SPARK-22839 | f15906da153f139b698e192ec6f82f078f896f1e#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.limit.cores | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.scheduler.name | 3.0.0 | SPARK-29436 | f800fa383131559c4e841bf062c9775d09190935#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.request.cores | 2.4.0 | SPARK-23285 | fe2b7a4568d65a62da6e6eb00fff05f248b4332c#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.pod.name | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.resourceNamePrefix | 3.0.0 | SPARK-25876 | 6be272b75b4ae3149869e19df193675cc4117763#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.podNamePrefix | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.allocation.batch.size | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.allocation.batch.delay | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.lostCheck.maxAttempts | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.submission.waitAppCompletion | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.report.interval | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.apiPollingInterval | 2.4.0 | SPARK-24248 | 270a9a3cac25f3e799460320d0fc94ccd7ecfaea#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.eventProcessingInterval | 2.4.0 | SPARK-24248 | 270a9a3cac25f3e799460320d0fc94ccd7ecfaea#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.memoryOverheadFactor | 2.4.0 | SPARK-23984 | 1a644afbac35c204f9ad55f86999319a9ab458c6#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.pyspark.pythonVersion | 2.4.0 | SPARK-23984 | a791c29bd824adadfb2d85594bc8dad4424df936#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.kerberos.krb5.path | 3.0.0 | SPARK-23257 | 6c9c84ffb9c8d98ee2ece7ba4b010856591d383d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.kerberos.krb5.configMapName | 3.0.0 | SPARK-23257 | 6c9c84ffb9c8d98ee2ece7ba4b010856591d383d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.hadoop.configMapName | 3.0.0 | SPARK-23257 | 6c9c84ffb9c8d98ee2ece7ba4b010856591d383d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.kerberos.tokenSecret.name | 3.0.0 | SPARK-23257 | 6c9c84ffb9c8d98ee2ece7ba4b010856591d383d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.kerberos.tokenSecret.itemKey | 3.0.0 | SPARK-23257 | 6c9c84ffb9c8d98ee2ece7ba4b010856591d383d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.resource.type | 2.4.1 | SPARK-25021 | 9031c784847353051bc0978f63ef4146ae9095ff#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.local.dirs.tmpfs | 3.0.0 | SPARK-25262 | da6fa3828bb824b65f50122a8a0a0d4741551257#diff-6e882d5561424e7e6651eb46f10104b8 | It exists in branch-3.0, but in pom.xml it is 2.4.0-snapshot
spark.kubernetes.driver.podTemplateFile | 3.0.0 | SPARK-24434 | f6cc354d83c2c9a757f9b507aadd4dbdc5825cca#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.podTemplateFile | 3.0.0 | SPARK-24434 | f6cc354d83c2c9a757f9b507aadd4dbdc5825cca#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.podTemplateContainerName | 3.0.0 | SPARK-24434 | f6cc354d83c2c9a757f9b507aadd4dbdc5825cca#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.podTemplateContainerName | 3.0.0 | SPARK-24434 | f6cc354d83c2c9a757f9b507aadd4dbdc5825cca#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.deleteOnTermination | 3.0.0 | SPARK-25515 | 0c2935b01def8a5f631851999d9c2d57b63763e6#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.dynamicAllocation.deleteGracePeriod | 3.0.0 | SPARK-28487 | 0343854f54b48b206ca434accec99355011560c2#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.appKillPodDeletionGracePeriod | 3.0.0 | SPARK-24793 | 05168e725d2a17c4164ee5f9aa068801ec2454f4#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.file.upload.path | 3.0.0 | SPARK-23153 | 5e74570c8f5e7dfc1ca1c53c177827c5cea57bf1#diff-6e882d5561424e7e6651eb46f10104b8 |  
The following appears in the document |   |   |   |  
spark.kubernetes.authenticate.submission.caCertFile | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.submission.clientKeyFile | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.submission.clientCertFile | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.submission.oauthToken | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.submission.oauthTokenFile | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.caCertFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.clientKeyFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.clientCertFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.oauthToken | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.oauthTokenFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.mounted.caCertFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.mounted.clientKeyFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.mounted.clientCertFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.driver.mounted.oauthTokenFile | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.caCertFile | 2.4.0 | SPARK-23146 | 571a6f0574e50e53cea403624ec3795cd03aa204#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.clientKeyFile | 2.4.0 | SPARK-23146 | 571a6f0574e50e53cea403624ec3795cd03aa204#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.clientCertFile | 2.4.0 | SPARK-23146 | 571a6f0574e50e53cea403624ec3795cd03aa204#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.oauthToken | 2.4.0 | SPARK-23146 | 571a6f0574e50e53cea403624ec3795cd03aa204#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.authenticate.oauthTokenFile | 2.4.0 | SPARK-23146 | 571a6f0574e50e53cea403624ec3795cd03aa204#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.label.[LabelName] | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.annotation.[AnnotationName] | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.label.[LabelName] | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.annotation.[AnnotationName] | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.node.selector.[labelKey] | 2.3.0 | SPARK-18278 | e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driverEnv.[EnvironmentVariableName] | 2.3.0 | SPARK-22646 | 3f4060c340d6bac412e8819c4388ccba226efcf3#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.secrets.[SecretName] | 2.3.0 | SPARK-22757 | 171f6ddadc6185ffcc6ad82e5f48952fb49095b2#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.secrets.[SecretName] | 2.3.0 | SPARK-22757 | 171f6ddadc6185ffcc6ad82e5f48952fb49095b2#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.secretKeyRef.[EnvName] | 2.4.0 | SPARK-24232 | 21e1fc7d4aed688d7b685be6ce93f76752159c98#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.secretKeyRef.[EnvName] | 2.4.0 | SPARK-24232 | 21e1fc7d4aed688d7b685be6ce93f76752159c98#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].mount.path | 2.4.0 | SPARK-23529 | 5ff1b9ba1983d5601add62aef64a3e87d07050eb#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].mount.subPath | 3.0.0 | SPARK-25960 | 3df307aa515b3564686e75d1b71754bbcaaf2dec#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].mount.readOnly | 2.4.0 | SPARK-23529 | 5ff1b9ba1983d5601add62aef64a3e87d07050eb#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].options.[OptionName] | 2.4.0 | SPARK-23529 | 5ff1b9ba1983d5601add62aef64a3e87d07050eb#diff-b5527f236b253e0d9f5db5164bdb43e9 |  
spark.kubernetes.executor.volumes.[VolumeType].[VolumeName].mount.path | 2.4.0 | SPARK-23529 | 5ff1b9ba1983d5601add62aef64a3e87d07050eb#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.volumes.[VolumeType].[VolumeName].mount.subPath | 3.0.0 | SPARK-25960 | 3df307aa515b3564686e75d1b71754bbcaaf2dec#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.volumes.[VolumeType].[VolumeName].mount.readOnly | 2.4.0 | SPARK-23529 | 5ff1b9ba1983d5601add62aef64a3e87d07050eb#diff-6e882d5561424e7e6651eb46f10104b8 |  
spark.kubernetes.executor.volumes.[VolumeType].[VolumeName].options.[OptionName] | 2.4.0 | SPARK-23529 | 5ff1b9ba1983d5601add62aef64a3e87d07050eb#diff-b5527f236b253e0d9f5db5164bdb43e9 |  

### Why are the changes needed?
Supplemental configuration version information.

### Does this PR introduce any user-facing change?
'No'

### How was this patch tested?
Existing UT

Closes #27875 from beliefer/add-version-to-k8s-config.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-12 09:54:08 +09:00
beliefer 0722dc5fb8 [SPARK-31092][YARN][DOC] Add version information to the configuration of Yarn
### What changes were proposed in this pull request?
Add version information to the configuration of `Yarn`.

I sorted out some information, shown below.

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.yarn.tags | 1.5.0 | SPARK-9782 | 9b731fad2b43ca18f3c5274062d4c7bc2622ab72#diff-b050df3f55b82065803d6e83453b9706 |  
spark.yarn.priority | 3.0.0 | SPARK-29603 | 4615769736f4c052ae1a2de26e715e229154cd2f#diff-4804e0f83ca7f891183eb0db229b4b9a |  
spark.yarn.am.attemptFailuresValidityInterval | 1.6.0 | SPARK-10739 | f97e9323b526b3d0b0fee0ca03f4276f37bb5750#diff-b050df3f55b82065803d6e83453b9706 |
spark.yarn.executor.failuresValidityInterval | 2.0.0 | SPARK-6735 | 8b44bd52fa40c0fc7d34798c3654e31533fd3008#diff-14b8ed2ef4e3da985300b8d796a38fa9 |
spark.yarn.maxAppAttempts | 1.3.0 | SPARK-2165 | 8fdd48959c93b9cf809f03549e2ae6c4687d1fcd#diff-b050df3f55b82065803d6e83453b9706 |
spark.yarn.user.classpath.first | 1.3.0 | SPARK-5087 | 8d45834debc6986e61831d0d6e982d5528dccc51#diff-b050df3f55b82065803d6e83453b9706 |  
spark.yarn.config.gatewayPath | 1.5.0 | SPARK-8302 | 37bf76a2de2143ec6348a3d43b782227849520cc#diff-b050df3f55b82065803d6e83453b9706 |  
spark.yarn.config.replacementPath | 1.5.0 | SPARK-8302 | 37bf76a2de2143ec6348a3d43b782227849520cc#diff-b050df3f55b82065803d6e83453b9706 |  
spark.yarn.queue | 1.0.0 | SPARK-1126 | 1617816090e7b20124a512a43860a21232ebf511#diff-ae6a41a938a767e5bb97b5d738371a5b |  
spark.yarn.historyServer.address | 1.0.0 | SPARK-1408 | 0058b5d2c74147d24b127a5432f89ebc7050dc18#diff-923ae58523a12397f74dd590744b8b41 |  
spark.yarn.historyServer.allowTracking | 2.2.0 | SPARK-19554 | 4661d30b988bf773ab45a15b143efb2908d33743#diff-4804e0f83ca7f891183eb0db229b4b9a |
spark.yarn.archive | 2.0.0 | SPARK-13577 | 07f1c5447753a3d593cd6ececfcb03c11b1cf8ff#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.jars | 2.0.0 | SPARK-13577 | 07f1c5447753a3d593cd6ececfcb03c11b1cf8ff#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.dist.archives | 1.0.0 | SPARK-1126 | 1617816090e7b20124a512a43860a21232ebf511#diff-ae6a41a938a767e5bb97b5d738371a5b |  
spark.yarn.dist.files | 1.0.0 | SPARK-1126 | 1617816090e7b20124a512a43860a21232ebf511#diff-ae6a41a938a767e5bb97b5d738371a5b |  
spark.yarn.dist.jars | 2.0.0 | SPARK-12343 | 8ba2b7f28fee39c4839e5ea125bd25f5091a3a1e#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.preserve.staging.files | 1.1.0 | SPARK-2933 | b92d823ad13f6fcc325eeb99563bea543871c6aa#diff-85a1f4b2810b3e11b8434dcefac5bb85 |  
spark.yarn.submit.file.replication | 0.8.1 | None | 4668fcb9ff8f9c176c4866480d52dde5d67c8522#diff-b050df3f55b82065803d6e83453b9706 |
spark.yarn.submit.waitAppCompletion | 1.4.0 | SPARK-3591 | b65bad65c3500475b974ca0219f218eef296db2c#diff-b050df3f55b82065803d6e83453b9706 |
spark.yarn.report.interval | 0.9.0 | None | ebdfa6bb9766209bc5a3c4241fa47141c5e9c5cb#diff-e0a7ae95b6d8e04a67ebca0945d27b65 |  
spark.yarn.clientLaunchMonitorInterval | 2.3.0 | SPARK-16019 | 1cad31f00644d899d8e74d58c6eb4e9f72065473#diff-4804e0f83ca7f891183eb0db229b4b9a |
spark.yarn.am.waitTime | 1.3.0 | SPARK-3779 | 253b72b56fe908bbab5d621eae8a5f359c639dfd#diff-87125050a2e2eaf87ea83aac9c19b200 |  
spark.yarn.metrics.namespace | 2.4.0 | SPARK-24594 | d2436a85294a178398525c37833dae79d45c1452#diff-4804e0f83ca7f891183eb0db229b4b9a |
spark.yarn.am.nodeLabelExpression | 1.6.0 | SPARK-7173 | 7db3610327d0725ec2ad378bc873b127a59bb87a#diff-b050df3f55b82065803d6e83453b9706 |
spark.yarn.containerLauncherMaxThreads | 1.2.0 | SPARK-1713 | 1f4a648d4e30e837d6cf3ea8de1808e2254ad70b#diff-801a04f9e67321f3203399f7f59234c1 |  
spark.yarn.max.executor.failures | 1.0.0 | SPARK-1183 | 698373211ef3cdf841c82d48168cd5dbe00a57b4#diff-0c239e58b37779967e0841fb42f3415a |  
spark.yarn.scheduler.reporterThread.maxFailures | 1.2.0 | SPARK-3304 | 11c10df825419372df61a8d23c51e8c3cc78047f#diff-85a1f4b2810b3e11b8434dcefac5bb85 |  
spark.yarn.scheduler.heartbeat.interval-ms | 0.8.1 | None | ee22be0e6c302fb2cdb24f83365c2b8a43a1baab#diff-87125050a2e2eaf87ea83aac9c19b200 |  
spark.yarn.scheduler.initial-allocation.interval | 1.4.0 | SPARK-7533 | 3ddf051ee7256f642f8a17768d161c7b5f55c7e1#diff-87125050a2e2eaf87ea83aac9c19b200 |  
spark.yarn.am.finalMessageLimit | 2.4.0 | SPARK-25174 | f8346d2fc01f1e881e4e3f9c4499bf5f9e3ceb3f#diff-4804e0f83ca7f891183eb0db229b4b9a |  
spark.yarn.am.cores | 1.3.0 | SPARK-1507 | 2be82b1e66cd188456bbf1e5abb13af04d1629d5#diff-746d34aa06bfa57adb9289011e725472 |  
spark.yarn.am.extraJavaOptions | 1.3.0 | SPARK-5087 | 8d45834debc6986e61831d0d6e982d5528dccc51#diff-b050df3f55b82065803d6e83453b9706 |  
spark.yarn.am.extraLibraryPath | 1.4.0 | SPARK-7281 | 7b5dd3e3c0030087eea5a8224789352c03717c1d#diff-b050df3f55b82065803d6e83453b9706 |  
spark.yarn.am.memoryOverhead | 1.3.0 | SPARK-1953 | e96645206006a009e5c1a23bbd177dcaf3ef9b83#diff-746d34aa06bfa57adb9289011e725472 |  
spark.yarn.am.memory | 1.3.0 | SPARK-1953 | e96645206006a009e5c1a23bbd177dcaf3ef9b83#diff-746d34aa06bfa57adb9289011e725472 |  
spark.driver.appUIAddress | 1.1.0 | SPARK-1291 | 72ea56da8e383c61c6f18eeefef03b9af00f5158#diff-2b4617e158e9c5999733759550440b96 |  
spark.yarn.executor.nodeLabelExpression | 1.4.0 | SPARK-6470 | 82fee9d9aad2c9ba2fb4bd658579fe99218cafac#diff-d4620cf162e045960d84c88b2e0aa428 |  
spark.yarn.unmanagedAM.enabled | 3.0.0 | SPARK-22404 | f06bc0cd1dee2a58e04ebf24bf719a2f7ef2dc4e#diff-4804e0f83ca7f891183eb0db229b4b9a |  
spark.yarn.rolledLog.includePattern | 2.0.0 | SPARK-15990 | 272a2f78f3ff801b94a81fa8fcc6633190eaa2f4#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.rolledLog.excludePattern | 2.0.0 | SPARK-15990 | 272a2f78f3ff801b94a81fa8fcc6633190eaa2f4#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.user.jar | 1.1.0 | SPARK-1395 | e380767de344fd6898429de43da592658fd86a39#diff-50e237ea17ce94c3ccfc44143518a5f7 |  
spark.yarn.secondary.jars | 0.9.2 | SPARK-1870 | 1d3aab96120c6770399e78a72b5692cf8f61a144#diff-50b743cff4885220c828b16c44eeecfd |  
spark.yarn.cache.filenames | 2.0.0 | SPARK-14602 | f47dbf27fa034629fab12d0f3c89ab75edb03f86#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.cache.sizes | 2.0.0 | SPARK-14602 | f47dbf27fa034629fab12d0f3c89ab75edb03f86#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.cache.timestamps | 2.0.0 | SPARK-14602 | f47dbf27fa034629fab12d0f3c89ab75edb03f86#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.cache.visibilities | 2.0.0 | SPARK-14602 | f47dbf27fa034629fab12d0f3c89ab75edb03f86#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.cache.types | 2.0.0 | SPARK-14602 | f47dbf27fa034629fab12d0f3c89ab75edb03f86#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.cache.confArchive | 2.0.0 | SPARK-14602 | f47dbf27fa034629fab12d0f3c89ab75edb03f86#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.blacklist.executor.launch.blacklisting.enabled | 2.4.0 | SPARK-16630 | b56e9c613fb345472da3db1a567ee129621f6bf3#diff-4804e0f83ca7f891183eb0db229b4b9a |  
spark.yarn.exclude.nodes | 3.0.0 | SPARK-26688 | caceaec93203edaea1d521b88e82ef67094cdea9#diff-4804e0f83ca7f891183eb0db229b4b9a |  
The following appears in the document |   |   |   |  
spark.yarn.am.resource.{resource-type}.amount | 3.0.0 | SPARK-20327 | 3946de773498621f88009c309254b019848ed490#diff-4804e0f83ca7f891183eb0db229b4b9a |  
spark.yarn.driver.resource.{resource-type}.amount | 3.0.0 | SPARK-20327 | 3946de773498621f88009c309254b019848ed490#diff-4804e0f83ca7f891183eb0db229b4b9a |  
spark.yarn.executor.resource.{resource-type}.amount | 3.0.0 | SPARK-20327 | 3946de773498621f88009c309254b019848ed490#diff-4804e0f83ca7f891183eb0db229b4b9a |  
spark.yarn.appMasterEnv.[EnvironmentVariableName] | 1.1.0 | SPARK-1680 | 7b798e10e214cd407d3399e2cab9e3789f9a929e#diff-50e237ea17ce94c3ccfc44143518a5f7 |  
spark.yarn.kerberos.relogin.period | 2.3.0 | SPARK-22290 | dc2714da50ecba1bf1fdf555a82a4314f763a76e#diff-4804e0f83ca7f891183eb0db229b4b9a |  

### Why are the changes needed?
Supplemental configuration version information.

### Does this PR introduce any user-facing change?
'No'.

### How was this patch tested?
Existing UT

Closes #27856 from beliefer/add-version-to-yarn-config.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-12 09:52:57 +09:00
Holden Karau 2825237448 [SPARK-31062][K8S][TESTS] Improve spark decommissioning k8s test reliability
### What changes were proposed in this pull request?

Replace a sleep with waiting for the first collect to happen to try and make the K8s test code more reliable.
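
As a rough illustration of the pattern only (this is not the actual K8s test code), replacing a fixed sleep with a bounded wait on a condition could look like the sketch below. `wait_until` and its parameters are hypothetical names chosen for this example.

```python
import time


def wait_until(condition, timeout_s=60.0, interval_s=0.5):
    """Poll `condition` until it returns True or `timeout_s` elapses.

    Hypothetical helper: returns True as soon as the condition holds,
    False if the timeout is reached first.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval_s)
    # One last check so a condition that became true during the final
    # sleep is not reported as a timeout.
    return bool(condition())
```

Unlike a fixed sleep, the wait returns as soon as the event of interest (here, the first collect) has happened, and it fails explicitly when the event never occurs.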

### Why are the changes needed?

Currently the Decommissioning test appears to be flaky in Jenkins.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Ran K8s test suite in a loop on minikube on my desktop for 10 iterations without this test failing on any of the runs.

Closes #27858 from holdenk/SPARK-31062-Improve-Spark-Decommissioning-K8s-test-teliability.

Authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
2020-03-11 14:42:31 -07:00
Thomas Graves 0e2ca11d80 [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling
### What changes were proposed in this pull request?

Yarn side changes for Stage level scheduling.  The previous PR for dynamic allocation changes was https://github.com/apache/spark/pull/27313

Modified the data structures to store things on a per ResourceProfile basis.
I tried to keep the code changes to a minimum: the main loop that makes requests just goes through each ResourceProfile, and the logic inside for each one stayed very close to the same.
On submission we now have to give each ResourceProfile a separate YARN Priority because YARN doesn't support asking for containers with different resources at the same Priority. We just use the profile id as the priority level.
Using a different Priority actually makes it easier, when the containers come back, to match them to the ResourceProfile they were requested for.
The expectation is that yarn will only give you a container with resource amounts you requested or more. It should never give you a container if it doesn't satisfy your resource requests.
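
The idea of using the profile id as the priority can be sketched as follows. This is a simplified Python model, not the Scala code in the PR: the `ResourceProfile` and `Container` classes and both function names are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceProfile:  # hypothetical stand-in for Spark's ResourceProfile
    id: int
    cores: int
    memory_mb: int


@dataclass(frozen=True)
class Container:  # hypothetical stand-in for an allocated YARN container
    priority: int
    cores: int
    memory_mb: int


def priority_for(profile):
    """The profile id doubles as the request priority."""
    return profile.id


def match_profile(container, profiles_by_id):
    """Map an allocated container back to the profile it was requested for."""
    profile = profiles_by_id[container.priority]
    # Expectation from the description: the container has at least the
    # requested resources, never less.
    if container.cores < profile.cores or container.memory_mb < profile.memory_mb:
        raise ValueError("container does not satisfy the requested resources")
    return profile
```

Because each profile has its own priority, the lookup on the way back is a plain dictionary access rather than a comparison of resource amounts.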

If you want to see the full feature changes you can look at https://github.com/apache/spark/pull/27053/files for reference

### Why are the changes needed?

For stage level scheduling YARN support.

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

Tested manually on YARN cluster and then unit tests.

Closes #27583 from tgravescs/SPARK-29149.

Authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2020-02-28 15:23:33 -06:00
gatorsmile 28b8713036 [SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT
### What changes were proposed in this pull request?
This patch is to bump the master branch version to 3.1.0-SNAPSHOT.

### Why are the changes needed?
N/A

### Does this PR introduce any user-facing change?
N/A

### How was this patch tested?
N/A

Closes #27698 from gatorsmile/updateVersion.

Authored-by: gatorsmile <gatorsmile@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-25 19:44:31 -08:00
Holden Karau d273a2bb0f [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support
This PR is based on an existing/previous PR - https://github.com/apache/spark/pull/19045

### What changes were proposed in this pull request?

This change adds a decommissioning state that we can enter when the cloud provider/scheduler lets us know we aren't going to be removed immediately but instead will be removed soon. This concept fits nicely in K8s and also with spot instances on AWS / preemptible instances, for all of which we can get a notice that our host is going away. For now we simply stop scheduling jobs; in the future we could perform some kind of migration of data during scale-down, or at least stop accepting new blocks to cache.
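
A minimal sketch of the "stop scheduling on decommissioning hosts" behaviour, under the assumption that executor state is tracked in a simple map. The enum, the function names, and the executor ids are all hypothetical and do not mirror the classes touched by this PR.

```python
from enum import Enum


class ExecutorState(Enum):  # hypothetical state model
    RUNNING = "running"
    DECOMMISSIONING = "decommissioning"


def mark_decommissioning(states, executor_id):
    """Record that the host announced it will go away soon."""
    states[executor_id] = ExecutorState.DECOMMISSIONING


def schedulable_executors(states):
    """Return the ids of executors that may still receive new work."""
    return sorted(
        executor_id
        for executor_id, state in states.items()
        if state is ExecutorState.RUNNING
    )
```

In this model a decommissioning executor keeps its existing work but is simply left out when new work is placed.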

There is a design document at https://docs.google.com/document/d/1xVO1b6KAwdUhjEJBolVPl9C6sLj7oOveErwDSYdT-pE/edit?usp=sharing

### Why are the changes needed?

With the growing move to preemptible multi-tenancy, serverless environments, and spot instances, better handling of node scale-down is required.

### Does this PR introduce any user-facing change?

There is no API change; however, an additional configuration flag is added to enable/disable this behaviour.

### How was this patch tested?

New integration tests in the Spark K8s integration testing. Extension of the AppClientSuite to test decommissioning separately from K8s.

Closes #26440 from holdenk/SPARK-20628-keep-track-of-nodes-which-are-going-to-be-shutdown-r4.

Lead-authored-by: Holden Karau <hkarau@apple.com>
Co-authored-by: Holden Karau <holden@pigscanfly.ca>
Signed-off-by: Holden Karau <hkarau@apple.com>
2020-02-14 12:36:52 -08:00
Dongjoon Hyun 74cd46eb69 [SPARK-30816][K8S][TESTS] Fix dev-run-integration-tests.sh to ignore empty params
### What changes were proposed in this pull request?

This PR aims to fix `dev-run-integration-tests.sh` to ignore empty params correctly.

### Why are the changes needed?

The script currently runs the `mvn` integration test as follows.
```
$ resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh
...
build/mvn integration-test
-f /Users/dongjoon/APACHE/spark/pom.xml
-pl resource-managers/kubernetes/integration-tests
-am
-Pscala-2.12
-Pkubernetes
-Pkubernetes-integration-tests
-Djava.version=8
-Dspark.kubernetes.test.sparkTgz=N/A
-Dspark.kubernetes.test.imageTag=N/A
-Dspark.kubernetes.test.imageRepo=docker.io/kubespark
-Dspark.kubernetes.test.deployMode=minikube
-Dtest.include.tags=k8s
-Dspark.kubernetes.test.namespace=
-Dspark.kubernetes.test.serviceAccountName=
-Dspark.kubernetes.test.kubeConfigContext=
-Dspark.kubernetes.test.master=
-Dtest.exclude.tags=
-Dspark.kubernetes.test.jvmImage=spark
-Dspark.kubernetes.test.pythonImage=spark-py
-Dspark.kubernetes.test.rImage=spark-r
```

After this PR, empty parameters like the following will be skipped, as in the original design.
```
-Dspark.kubernetes.test.namespace=
-Dspark.kubernetes.test.serviceAccountName=
-Dspark.kubernetes.test.kubeConfigContext=
-Dspark.kubernetes.test.master=
-Dtest.exclude.tags=
```
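
The skipping behaviour can be modelled in a few lines. The actual fix is in the shell script, so the Python below is only an analogy; `build_properties` is a hypothetical helper, and the property names are taken from the output above.

```python
def build_properties(params):
    """Turn key/value pairs into -D flags, dropping keys with empty values."""
    return [f"-D{key}={value}" for key, value in params.items() if value]


flags = build_properties(
    {
        "spark.kubernetes.test.deployMode": "minikube",
        "spark.kubernetes.test.namespace": "",  # empty: skipped
        "test.include.tags": "k8s",
        "test.exclude.tags": "",  # empty: skipped
    }
)
```

Only the two non-empty properties end up in `flags`, so the build sees no `-Dkey=` arguments with empty values.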

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins K8S integration test.

Closes #27566 from dongjoon-hyun/SPARK-30816.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-13 11:42:00 -08:00
Dongjoon Hyun 859699135c [SPARK-30807][K8S][TESTS] Support Java 11 in K8S integration tests
### What changes were proposed in this pull request?

This PR aims to support JDK11 testing in the K8S integration tests.
- This is an update to the testing framework rather than to individual tests.
- This enables the JDK11 runtime test even when JDK11 is not installed on your local system.

### Why are the changes needed?

Apache Spark 3.0.0 adds JDK11 support, but the K8s integration tests have used JDK8 until now.

### Does this PR introduce any user-facing change?

No. This is a dev-only test-related PR.

### How was this patch tested?

This is irrelevant to Jenkins UT, but Jenkins K8S IT (JDK8) should pass.
- https://github.com/apache/spark/pull/27559#issuecomment-585903489 (JDK8 Passed)

And manually run the following for the JDK11 test.
```
$ NO_MANUAL=1 ./dev/make-distribution.sh --r --pip --tgz -Phadoop-3.2 -Pkubernetes
$ resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh --java-image-tag 11-jre-slim --spark-tgz $PWD/spark-*.tgz
```

```
$ docker run -it --rm kubespark/spark:1318DD8A-2B15-4A00-BC69-D0E90CED235B /usr/local/openjdk-11/bin/java --version | tail -n1
OpenJDK 64-Bit Server VM 18.9 (build 11.0.6+10, mixed mode)
```

Closes #27559 from dongjoon-hyun/SPARK-30807.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-13 11:17:27 -08:00
Thomas Graves 496f6ac860 [SPARK-29148][CORE] Add stage level scheduling dynamic allocation and scheduler backend changes
### What changes were proposed in this pull request?

This is another PR for stage level scheduling. In particular this adds changes to the dynamic allocation manager and the scheduler backend to be able to track what executors are needed per ResourceProfile. Note the API is still private to Spark until the entire feature gets in, so this functionality will be there but only usable by tests for profiles other than the DefaultProfile.

The main changes here are simply tracking things on a ResourceProfile basis as well as sending the executor requests to the scheduler backend for all ResourceProfiles.

I introduce a ResourceProfileManager in this PR that will track all the actual ResourceProfile objects so that we can keep them all in a single place and just pass around and use in datastructures the resource profile id. The resource profile id can be used with the ResourceProfileManager to get the actual ResourceProfile contents.
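
The "pass the id around, look up the contents when needed" design can be illustrated with a small registry. This is a hypothetical Python model whose method names are invented for the example; the `ResourceProfileManager` introduced by the PR is Scala and its API is not reproduced here.

```python
class ProfileRegistry:
    """Hypothetical registry: one place that owns all profile objects."""

    def __init__(self):
        self._profiles = {}

    def add(self, profile_id, profile):
        # Keep the first registration so every holder of an id
        # resolves to the same profile contents.
        self._profiles.setdefault(profile_id, profile)

    def get(self, profile_id):
        if profile_id not in self._profiles:
            raise KeyError(f"unknown resource profile id: {profile_id}")
        return self._profiles[profile_id]


registry = ProfileRegistry()
registry.add(0, {"cores": 1, "memory_mb": 1024})  # default profile
registry.add(1, {"cores": 4, "memory_mb": 8192})

# Data structures elsewhere only need to store the integer id.
executors_needed = {0: 2, 1: 5}
```

Keeping only ids in other data structures means the profile contents live in a single place and are resolved on demand.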

There are various places in the code that use executor "slots" for things. The ResourceProfile adds functionality to keep that calculation in it. This logic is more complex than it should be because standalone mode and Mesos coarse-grained mode do not set the executor cores config. They default to all cores on the worker, so calculating slots is harder there.
This PR keeps the functionality that makes cores the limiting resource, because the scheduler still uses that for "slots" for a few things.

This PR also adds the resource profile id to the Stage and stage info classes to make testing easier. The full set of changes will come with the scheduler PR that follows this one.

The PR stops at the scheduler backend pieces for the cluster manager, and the real YARN support hasn't been added in this PR; that will again be in a separate PR. So this PR has a few of the API changes up to the cluster manager and then just uses the default profile requests to continue.

The code for the entire feature is here for reference: https://github.com/apache/spark/pull/27053/files although it needs to be upmerged again as well.

### Why are the changes needed?

Needed for stage level scheduling feature.

### Does this PR introduce any user-facing change?

No user-facing API changes are added here.

### How was this patch tested?

Lots of unit tests and manual testing. I tested on YARN, K8s, standalone, and local modes, and ran both failure and success cases.

Closes #27313 from tgravescs/SPARK-29148.

Authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2020-02-12 16:45:42 -06:00
Dongjoon Hyun 9d907bc84d [SPARK-30743][K8S][TESTS] Use JRE instead of JDK in K8S test docker image
### What changes were proposed in this pull request?

This PR aims to replace the JDK with the JRE in K8S integration test docker images.

### Why are the changes needed?

This will save some resources and ensure that we only need a JRE at runtime and in testing.
- https://lists.apache.org/thread.html/3145150b711d7806a86bcd3ab43e18bcd0e4892ab5f11600689ba087%40%3Cdev.spark.apache.org%3E

### Does this PR introduce any user-facing change?

No. This is a dev-only test environment.

### How was this patch tested?

Pass the Jenkins K8s Integration Test.
- https://github.com/apache/spark/pull/27469#issuecomment-582681125

Closes #27469 from dongjoon-hyun/SPARK-30743.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-05 16:55:45 -08:00
yudovin f9f06eee98 [SPARK-30122][K8S] Support spark.kubernetes.authenticate.executor.serviceAccountName
### What changes were proposed in this pull request?

Currently, it doesn't seem to be possible to have the Spark driver set the serviceAccountName for the executor pods it launches.

### Why are the changes needed?

It will allow setting serviceAccountName for executor pods.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

It was covered by unit test.

Closes #27034 from ayudovin/srevice-account-name-for-executor-pods.

Authored-by: yudovin <artsiom.yudovin@profitero.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-05 14:16:59 -08:00
Dongjoon Hyun 9d90c8b898 [SPARK-30738][K8S] Use specific image version in "Launcher client dependencies" test
### What changes were proposed in this pull request?

This PR uses a specific version of the docker image instead of `latest`. As of today, when I run the K8s integration test locally, this test case always fails.

Also, in this PR, I show two consecutive failures with a dummy change.
- https://github.com/apache/spark/pull/27465#issuecomment-582326614
- https://github.com/apache/spark/pull/27465#issuecomment-582329114
```
- Launcher client dependencies *** FAILED ***
```

After that, I added the patch and K8s Integration test passed.
- https://github.com/apache/spark/pull/27465#issuecomment-582361696

### Why are the changes needed?

[SPARK-28465](https://github.com/apache/spark/pull/25222) switched from `v4.0.0-stable-4.0-master-centos-7-x86_64` to `latest` to catch up with the API change. However, the API seems to have changed again. We had better use a specific version to prevent accidental failures.

```scala
- .withImage("ceph/daemon:v4.0.0-stable-4.0-master-centos-7-x86_64")
+ .withImage("ceph/daemon:latest")
```

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass `Launcher client dependencies` test in Jenkins K8s Integration Suite.
Or, run K8s Integration test locally.

Closes #27465 from dongjoon-hyun/SPARK-K8S-IT.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-05 11:01:53 -08:00
Onur Satici 86fdb818bf [SPARK-30715][K8S] Bump fabric8 to 4.7.1
### What changes were proposed in this pull request?
Bump fabric8 kubernetes-client to 4.7.1

### Why are the changes needed?
New fabric8 version brings support for Kubernetes 1.17 clusters.
Full release notes:
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.7.0
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.7.1

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing unit and integration tests cover creation of K8S objects. They were adjusted to work with the new fabric8 version.

Closes #27443 from onursatici/os/bump-fabric8.

Authored-by: Onur Satici <onursatici@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-05 01:17:30 -08:00
Thomas Graves 878094f972 [SPARK-30689][CORE][YARN] Add resource discovery plugin api to support YARN versions with resource scheduling
### What changes were proposed in this pull request?

This change makes resource discovery for custom scheduled resources (GPUs, FPGAs, etc.) more flexible. Users are asking for it to work with Hadoop 2.x versions that do not support resource scheduling in YARN, and/or they may not run in an isolated environment.
This change creates a plugin API that lets users write their own resource discovery class, which allows a lot more flexibility. The user can chain plugins for different resource types. The user-specified plugins execute in the order specified and fall back to the discovery script plugin if they don't return information for a particular resource.

I had to make a few of the classes public, change them to not be case classes, and mark them as developer API so that the plugin can get the information it needs.

I also relaxed the YARN side so that if YARN isn't configured for resource scheduling we just warn and go on. This helps users that have YARN 3.1 but haven't configured the resource scheduling side on their cluster yet, or aren't running in an isolated environment.

The user would configure this like:
--conf spark.resources.discovery.plugin="org.apache.spark.resource.ResourceDiscoveryFPGAPlugin, org.apache.spark.resource.ResourceDiscoveryGPUPlugin"

Note the executor side had to be wrapped with a classloader to make sure we include the user classpath for jars they specified on submission.

Note this is more flexible because the discovery script has limitations, such as being spawned in a separate process. This means that if you are trying to allocate resources in that process, they might be released when the script returns. A plugin class also makes it easier to integrate with existing systems and solutions for assigning resources.
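
The ordered chain with a final fallback described above can be sketched like this. The sketch is an illustration only: `ResourceInfo`, `DiscoveryPlugin`, and `ChainedDiscovery` are hypothetical, simplified names and are not the plugin interface added by this PR.

```scala
// Hypothetical, simplified stand-ins; not Spark's actual plugin interface.
final case class ResourceInfo(name: String, addresses: Seq[String])

trait DiscoveryPlugin {
  // Return None when this plugin has no information for the resource.
  def discover(resourceName: String): Option[ResourceInfo]
}

object ChainedDiscovery {
  // Try the user plugins in the configured order, then the fallback,
  // and return the first answer any of them provides.
  def discover(
      plugins: Seq[DiscoveryPlugin],
      fallback: DiscoveryPlugin,
      resourceName: String): Option[ResourceInfo] =
    (plugins :+ fallback).iterator
      .map(_.discover(resourceName))
      .collectFirst { case Some(info) => info }
}
```

Because the iterator is lazy, plugins later in the chain are not invoked once an earlier one has returned a result.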

### Why are the changes needed?

To more easily use Spark resource scheduling with older versions of Hadoop or in non-isolated environments.

### Does this PR introduce any user-facing change?

Yes, a plugin API.

### How was this patch tested?

Unit tests added and manual testing done on yarn and standalone modes.

Closes #27410 from tgravescs/hadoop27spark3.

Lead-authored-by: Thomas Graves <tgraves@nvidia.com>
Co-authored-by: Thomas Graves <tgraves@apache.org>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2020-01-31 22:20:28 -06:00
Thomas Graves 3d2b8d8b13 [SPARK-30638][CORE] Add resources allocated to PluginContext
### What changes were proposed in this pull request?

Add the allocated resources to the PluginContext so that any plugin in the driver or executor can use this information to initialize devices or otherwise make use of it.

### Why are the changes needed?

To allow users to initialize/track devices once at the executor level before each task runs to use them.

### Does this PR introduce any user-facing change?

Yes, for people using the Executor/Driver plugin interface.

### How was this patch tested?

Unit tests, and manually by writing a plugin that initialized GPUs using this interface.

Closes #27367 from tgravescs/pluginWithResources.

Lead-authored-by: Thomas Graves <tgraves@nvidia.com>
Co-authored-by: Thomas Graves <tgraves@apache.org>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2020-01-31 08:25:32 -06:00
Jiaxin Shan f86a1b9590 [SPARK-30626][K8S] Add SPARK_APPLICATION_ID into driver pod env
### What changes were proposed in this pull request?
Add the SPARK_APPLICATION_ID environment variable when Spark configures the driver pod.

### Why are the changes needed?
Currently, the driver doesn't have this in its environment, so it's not convenient to retrieve the Spark application id.
The use case is that we want to look up the Spark application id, create an application folder, and redirect driver logs to that folder.

### Does this PR introduce any user-facing change?
no

### How was this patch tested?
Unit tested. I also built a new distribution and container image, kicked off a job in Kubernetes, and confirmed that SPARK_APPLICATION_ID was added there.

Closes #27347 from Jeffwan/SPARK-30626.

Authored-by: Jiaxin Shan <seedjeffwan@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-01-24 12:00:30 -08:00
xushiwei 00425595 f14061c6a4 [SPARK-30371][K8S] Add spark.kubernetes.driver.master conf
### What changes were proposed in this pull request?

Make KUBERNETES_MASTER_INTERNAL_URL configurable.

### Why are the changes needed?

We do not always use the default port number 443 to access our kube-apiserver, and in some multi-tenant clusters people do not use the service `kubernetes.default.svc` to access the kube-apiserver, so making the internal master configurable is necessary.

### Does this PR introduce any user-facing change?

Users can configure the internal master URL with:
```
--conf spark.kubernetes.driver.master=https://kubernetes.default.svc:6443
```

### How was this patch tested?

Ran in a multi-cluster setup that does not use https://kubernetes.default.svc to access the kube-apiserver.

Closes #27029 from wackxu/internalmaster.

Authored-by: xushiwei 00425595 <xushiwei5@huawei.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-01-19 14:14:45 -08:00
Thomas Graves 6dbfa2bb9c [SPARK-29306][CORE] Stage Level Sched: Executors need to track what ResourceProfile they are created with
### What changes were proposed in this pull request?

This is the second PR for the Stage Level Scheduling. This is adding in the necessary executor side changes:
1) executors to know what ResourceProfile they should be using
2) handle parsing the resource profile settings - these are not in the global configs
3) then reporting back to the driver what resource profile it was started with.

This PR adds all the piping for YARN to pass the information all the way to executors, but it just uses the default ResourceProfile (which is the global application level configs).

At a high level these changes include:
1) adding a new --resourceProfileId option to the CoarseGrainedExecutorBackend
2) Add the ResourceProfile settings to new internal confs that gets passed into the Executor
3) Executor changes that use the resource profile id passed in to read the corresponding ResourceProfile confs and then parse those requests and discover resources as necessary
4) Executor registers with the Driver using the resource profile id so that the ExecutorMonitor can track how many executors with each profile are running
5) YARN side changes to show that passing the resource profile id and confs actually works. Just uses the DefaultResourceProfile for now.

I also removed a check from the CoarseGrainedExecutorBackend that used to check to make sure there were task requirements before parsing any custom resource executor requests.  With the resource profiles this becomes much more expensive because we would then have to pass the task requests to each executor and the check was just a short cut and not really needed. It was much cleaner just to remove it.

Note there were some changes to the ResourceProfile, ExecutorResourceRequests, and TaskResourceRequests in this PR as well because I discovered some issues with things not being immutable. That API now looks like:

val rpBuilder = new ResourceProfileBuilder()
val ereq = new ExecutorResourceRequests()
val treq = new TaskResourceRequests()

ereq.cores(2).memory("6g").memoryOverhead("2g").pysparkMemory("2g").resource("gpu", 2, "/home/tgraves/getGpus")
treq.cpus(2).resource("gpu", 2)

val resourceProfile = rpBuilder.require(ereq).require(treq).build

This makes it so that ResourceProfile is immutable and Spark can use it directly without worrying about the user changing it.

### Why are the changes needed?

These changes are needed for the executor to report which ResourceProfile it is using so that ultimately the dynamic allocation manager can use that information to know how many executors with a profile are running and how many more it needs to request. It's also needed to get the resource profile confs to the executor so that it can run the appropriate discovery script if needed.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Unit tests and manually on YARN.

Closes #26682 from tgravescs/SPARK-29306.

Authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2020-01-17 08:15:25 -06:00
Marcelo Vanzin dca838058f [SPARK-29950][K8S] Blacklist deleted executors in K8S with dynamic allocation
The issue here is that when Spark is downscaling the application and deletes
a few pod requests that aren't needed anymore, it may actually race with the
K8S scheduler, which may be bringing up those executors. They may then have
enough time to connect back to the driver and register, only to be deleted
soon after. This wastes resources and causes misleading entries in the driver log.

The change (ab)uses the blacklisting mechanism to consider the deleted excess
pods as blacklisted, so that if they try to connect back, the driver will deny
it.

It also changes the executor registration slightly, since even with the above
change there were misleading logs. That was because the executor registration
message was an RPC that always succeeded (bar network issues), so the executor
would always try to send an unregistration message to the driver, which would
then log several messages about not knowing anything about the executor. The
change makes the registration RPC succeed or fail directly, instead of using
the separate failure message that would lead to this issue.

Note the last change required some changes in a standalone test suite related
to dynamic allocation, since it relied on the driver not throwing exceptions
when a duplicate executor registration happened.

Tested with existing unit tests, and with live cluster with dyn alloc on.

Closes #26586 from vanzin/SPARK-29950.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2020-01-16 13:37:11 -08:00
yi.wu 4a093176ea [SPARK-30359][CORE] Don't clear executorsPendingToRemove at the beginning of CoarseGrainedSchedulerBackend.reset
### What changes were proposed in this pull request?

Remove `executorsPendingToRemove.clear()` from `CoarseGrainedSchedulerBackend.reset()`.

### Why are the changes needed?

Clearing `executorsPendingToRemove` before removing executors causes all tasks running on those "pending to remove" executors to count as failures. But that's not correct for the case of `executorsPendingToRemove(execId)=true`.

Besides, `executorsPendingToRemove` is cleaned up within `removeExecutor()` at the end anyway, just like `executorsPendingLossReason`.
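
The effect of clearing the map too early can be shown with a small model. This is a simplified sketch and not the `CoarseGrainedSchedulerBackend` code: the class `PendingRemovalModel` is hypothetical, and it assumes that a `true` entry means task failures on that executor should not be counted.

```scala
import scala.collection.mutable

// Hypothetical, simplified model; the semantics of the Boolean are an assumption.
class PendingRemovalModel {
  // execId -> true when task failures on that executor should NOT be counted.
  val executorsPendingToRemove = mutable.HashMap.empty[String, Boolean]

  // Removes the executor's entry and reports whether failures of its
  // running tasks should be counted.
  def removeExecutor(execId: String): Boolean = {
    val exempt = executorsPendingToRemove.remove(execId).getOrElse(false)
    !exempt
  }
}
```

In this model, calling `executorsPendingToRemove.clear()` before `removeExecutor` loses the `true` entry, so the executor's tasks are counted as failures even though they were marked exempt.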

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Added a new test in `TaskSetManagerSuite`.

Closes #27017 from Ngone51/dont-clear-eptr-in-reset.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-01-03 22:54:05 +08:00
Jobit Mathew 1b0570c6af [SPARK-30387] Improving stop hook log message
### What changes were proposed in this pull request?

ShutdownHook of YarnClientSchedulerBackend prints just "Stopped" which can be improved to "YarnClientSchedulerBackend Stopped" for better understanding.

### Why are the changes needed?

While stopping or gracefully exiting spark-shell/spark-sql with `--master yarn`, printing only `Stopped` is not informative.

### Does this PR introduce any user-facing change?

Yes. Log info message change.

### How was this patch tested?

Manually

Closes #27049 from jobitmathew/imp_stop_message.

Authored-by: Jobit Mathew <jobit.mathew@huawei.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-01-02 14:48:36 -06:00
Yuming Wang 696288f623 [INFRA] Reverts commit 56dcd79 and c216ef1
### What changes were proposed in this pull request?
1. Revert "Preparing development version 3.0.1-SNAPSHOT": 56dcd79

2. Revert "Preparing Spark release v3.0.0-preview2-rc2": c216ef1

### Why are the changes needed?
Shouldn't change master.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
manual test:
https://github.com/apache/spark/compare/5de5e46..wangyum:revert-master

Closes #26915 from wangyum/revert-master.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Yuming Wang <wgyumg@gmail.com>
2019-12-16 19:57:44 -07:00
Yuming Wang 56dcd79992 Preparing development version 3.0.1-SNAPSHOT 2019-12-17 01:57:27 +00:00
Yuming Wang c216ef1d03 Preparing Spark release v3.0.0-preview2-rc2 2019-12-17 01:57:21 +00:00
Shahin Shakeri b573f23ed1 [SPARK-29574][K8S] Add SPARK_DIST_CLASSPATH to the executor class path
### What changes were proposed in this pull request?
Include `$SPARK_DIST_CLASSPATH` in class path when launching `CoarseGrainedExecutorBackend` on Kubernetes executors using the provided `entrypoint.sh`

### Why are the changes needed?
For user provided Hadoop, `$SPARK_DIST_CLASSPATH` contains the required jars.

### Does this PR introduce any user-facing change?
no

### How was this patch tested?
Kubernetes 1.14, Spark 2.4.4, Hadoop 3.2.1. Adding $SPARK_DIST_CLASSPATH to  `-cp ` param of entrypoint.sh enables launching the executors correctly.

Closes #26493 from sshakeri/master.

Authored-by: Shahin Shakeri <shahin.shakeri@pwc.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-12-16 10:11:50 -08:00
Dongjoon Hyun cc276f8a6e [SPARK-30243][BUILD][K8S] Upgrade K8s client dependency to 4.6.4
### What changes were proposed in this pull request?

This PR aims to upgrade K8s client library from 4.6.1 to 4.6.4 for `3.0.0-preview2`.

### Why are the changes needed?

This will bring the latest bug fixes.
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.6.4
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.6.3
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.6.2

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with K8s integration test.

Closes #26874 from dongjoon-hyun/SPARK-30243.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-13 08:25:51 -08:00
Ilan Filonenko 708cf16be9 [SPARK-30111][K8S] Apt-get update to fix debian issues
### What changes were proposed in this pull request?
Added apt-get update as per [docker best-practices](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#apt-get)

### Why are the changes needed?
Builder is failing because:
Without doing apt-get update, the APT lists get outdated and begin referring to package versions that no longer exist, hence the 404 when trying to download them (Debian does not keep old versions in the archive when a package is updated).

### Does this PR introduce any user-facing change?
no

### How was this patch tested?
k8s builder

Closes #26753 from ifilonenko/SPARK-30111.

Authored-by: Ilan Filonenko <ifilonenko@bloomberg.net>
Signed-off-by: shane knapp <incomplete@gmail.com>
2019-12-03 17:59:02 -08:00
Sean Owen 1febd373ea [MINOR][TESTS] Replace JVM assert with JUnit Assert in tests
### What changes were proposed in this pull request?

Use JUnit assertions in tests uniformly, not JVM assert() statements.

### Why are the changes needed?

assert() statements do not produce as useful errors when they fail, and, if they were somehow disabled, would fail to test anything.

### Does this PR introduce any user-facing change?

No. The assertion logic should be identical.

### How was this patch tested?

Existing tests.

Closes #26581 from srowen/assertToJUnit.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-11-20 14:04:15 -06:00
ulysses c0507e0f75 [SPARK-29833][YARN] Add FileNotFoundException check for spark.yarn.jars
### What changes were proposed in this pull request?

When `spark.yarn.jars=/xxx/xxx` is set to a path without a scheme, Spark throws a NullPointerException.

The reason is that HDFS returns null from `pathFs.globStatus(path)` if the path does not exist, and Spark just uses `pathFs.globStatus(path).filter(_.isFile())` without checking for null.
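
A null check of this kind can be sketched as below. This is an illustration under stated assumptions rather than the patch itself: `JarsPathCheck.requireMatches` is a hypothetical helper, and the array argument stands in for the result of a glob call that may be null.

```scala
import java.io.FileNotFoundException

object JarsPathCheck {
  // Hypothetical helper: `globResult` stands in for the value returned by a
  // glob call, which is null when the path does not exist.
  def requireMatches[A <: AnyRef](path: String, globResult: Array[A]): Array[A] =
    Option(globResult).getOrElse(
      throw new FileNotFoundException(s"Path $path does not exist"))
}
```

Wrapping the result in `Option` turns the null into `None`, so the caller gets a descriptive `FileNotFoundException` instead of a `NullPointerException` from a later `.filter` call.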

### Why are the changes needed?

Avoid NullPointerException.

### Does this PR introduce any user-facing change?

Yes. Users will get a FileNotFoundException instead of a NullPointerException when the `spark.yarn.jars` path has no scheme and does not exist.

### How was this patch tested?

Add UT.

Closes #26462 from ulysses-you/check-yarn-jars-path-exist.

Authored-by: ulysses <youxiduo@weidian.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-11-15 16:17:24 -08:00
Marcelo Vanzin b095232f63 [SPARK-29865][K8S] Ensure client-mode executors have same name prefix
This basically does what BasicDriverFeatureStep already does to achieve the
same thing in cluster mode; but since that class (or any other feature) is
not invoked in client mode, it needs to be done elsewhere.

I also modified the client mode integration test to check the executor name
prefix; while there I had to fix the minikube backend to parse the output
from newer minikube versions (I have 1.5.2).

Closes #26488 from vanzin/SPARK-29865.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Erik Erlandson <eerlands@redhat.com>
2019-11-14 15:52:39 -07:00
Nishchal Venkataramana 833a9f12e2 [SPARK-24203][CORE] Make executor's bindAddress configurable
### What changes were proposed in this pull request?
With this change, executor's bindAddress is passed as an input parameter for RPCEnv.create.
A previous PR, https://github.com/apache/spark/pull/21261, which addressed the same issue, used a Spark conf property to get the bindAddress, which wouldn't have worked for multiple executors.
This PR is to enable anyone overriding CoarseGrainedExecutorBackend with their custom one to be able to invoke CoarseGrainedExecutorBackend.main() along with the option to configure bindAddress.

### Why are the changes needed?
This is required when Kernel-based Virtual Machines (KVMs) are used inside a Linux container where the hostname is not the same as the container hostname.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Tested by running jobs with executors on KVMs inside a linux container.

Closes #26331 from nishchalv/SPARK-29670.

Lead-authored-by: Nishchal Venkataramana <nishchal@apple.com>
Co-authored-by: nishchal <nishchal@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
2019-11-13 22:01:48 +00:00
Kent Yao 4615769736 [SPARK-29603][YARN] Support application priority for YARN priority scheduling
### What changes were proposed in this pull request?

Add an application priority that YARN uses to order pending applications; those with higher priority have a better chance of being activated. This applies to the YARN CapacityScheduler only.

### Why are the changes needed?

To order pending Spark applications.

### Does this PR introduce any user-facing change?

Yes, it adds a conf.

### How was this patch tested?

Added a unit test.

Closes #26255 from yaooqinn/SPARK-29603.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-11-06 10:12:27 -08:00
Xingbo Jiang 8207c835b4 Revert "Prepare Spark release v3.0.0-preview-rc2"
This reverts commit 007c873ae3.
2019-10-30 17:45:44 -07:00
Xingbo Jiang 007c873ae3 Prepare Spark release v3.0.0-preview-rc2
### What changes were proposed in this pull request?

To push the built jars to the Maven release repository, we need to remove the 'SNAPSHOT' tag from the version name.

Made the following changes in this PR:
* Update all the `3.0.0-SNAPSHOT` version name to `3.0.0-preview`
* Update the sparkR version number check logic to allow jvm version like `3.0.0-preview`

**Please note those changes were generated by the release script in the past, but this time since we manually add tags on master branch, we need to manually apply those changes too.**

We shall revert the changes after the 3.0.0-preview release passes.

### Why are the changes needed?

To make the Maven release repository accept the built jars.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

N/A
2019-10-30 17:42:59 -07:00
Xingbo Jiang b33a58c0c6 Revert "Prepare Spark release v3.0.0-preview-rc1"
This reverts commit 5eddbb5f1d.
2019-10-28 22:32:34 -07:00
Xingbo Jiang 5eddbb5f1d Prepare Spark release v3.0.0-preview-rc1
### What changes were proposed in this pull request?

To push the built jars to the Maven release repository, we need to remove the 'SNAPSHOT' tag from the version name.

Made the following changes in this PR:
* Update all the `3.0.0-SNAPSHOT` version name to `3.0.0-preview`
* Update the PySpark version from `3.0.0.dev0` to `3.0.0`

**Please note those changes were generated by the release script in the past, but this time since we manually add tags on master branch, we need to manually apply those changes too.**

We shall revert the changes after the 3.0.0-preview release passes.

### Why are the changes needed?

To make the Maven release repository accept the built jars.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

N/A

Closes #26243 from jiangxb1987/3.0.0-preview-prepare.

Lead-authored-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Co-authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>
2019-10-28 22:31:29 -07:00