## What changes were proposed in this pull request?
Fix build warnings -- see some details below.
But mostly, remove use of postfix syntax where it causes warnings without the `scala.language.postfixOps` import. This is mostly in expressions like "120000 milliseconds". Which, I'd like to simplify to things like "2.minutes" anyway.
## How was this patch tested?
Existing tests.
Closes#24314 from srowen/SPARK-27404.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
When you submit a stage on a large cluster, rack resolving takes a long time when initializing TaskSetManager because a script is invoked to resolve the rack of each host, one by one.
Based on current implementation, it takes 30~40 seconds to resolve the racks in our 5000 nodes' cluster. After applied the patch, it decreased to less than 15 seconds.
YARN-9332 has added an interface to handle multiple hosts in one invocation to save time. But before upgrading to the newest Hadoop, we could construct the same tool in Spark to resolve this issue.
## How was this patch tested?
UT and manually testing on a 5000 node cluster.
Closes#24245 from squito/SPARK-13704_update.
Lead-authored-by: LantaoJin <jinlantao@gmail.com>
Co-authored-by: Imran Rashid <irashid@cloudera.com>
Signed-off-by: Imran Rashid <irashid@cloudera.com>
## What changes were proposed in this pull request?
Fix testing issues with `yarn` module in Hadoop-3:
1. Upgrade jersey-1 to `1.19` to fix ```Cause: java.lang.NoClassDefFoundError: com/sun/jersey/spi/container/servlet/ServletContainer```.
2. Copy `ServerSocketUtil` from hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/ServerSocketUtil.java to fix ```java.lang.NoClassDefFoundError: org/apache/hadoop/net/ServerSocketUtil```.
3. Adapte `SessionHandler` from jetty-9.3.25.v20180904/jetty-server/src/main/java/org/eclipse/jetty/server/session/SessionHandler.java to fix ```java.lang.NoSuchMethodError: org.eclipse.jetty.server.session.SessionHandler.getSessionManager()Lorg/eclipse/jetty/server/SessionManager```.
## How was this patch tested?
manual tests:
```shell
build/sbt yarn/test -Pyarn
build/sbt yarn/test -Phadoop-3.2 -Pyarn
build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.deploy.yarn.YarnClusterSuite -pl resource-managers/yarn test -Pyarn
build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.deploy.yarn.YarnClusterSuite -pl resource-managers/yarn test -Pyarn -Phadoop-3.2
```
Closes#24115 from wangyum/hadoop3-yarn.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
Use Single Abstract Method syntax where possible (and minor related cleanup). Comments below. No logic should change here.
## How was this patch tested?
Existing tests.
Closes#24241 from srowen/SPARK-27323.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
```
[WARNING] Some problems were encountered while building the effective model for org.apache.spark:spark-kubernetes_2.12🫙3.0.0-SNAPSHOT
[WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must be unique: org.mockito:mockito-core:jar -> duplicate declaration of version (?) org.apache.spark:spark-kubernetes_2.12:[unknown-version], /Users/yumwang/spark/resource-managers/kubernetes/core/pom.xml, line 98, column 17
```
This pr remove duplicate declaration of `mockito-core`.
## How was this patch tested?
N/A
Closes#24256 from wangyum/SPARK-24793-FOLLOW-UP.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
- Adds persistent volume integration tests
- Adds a custom tag to the test to exclude it if it is run against a cloud backend.
- Assumes default fs type for the host, AFAIK that is ext4.
## How was this patch tested?
Manually run the tests against minikube as usual:
```
[INFO] --- scalatest-maven-plugin:1.0:test (integration-test) spark-kubernetes-integration-tests_2.12 ---
Discovery starting.
Discovery completed in 192 milliseconds.
Run starting. Expected test count is: 16
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python2 to test a pyfiles example
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- Test PVs with local storage
```
Closes#23514 from skonto/pvctests.
Authored-by: Stavros Kontopoulos <stavros.kontopoulos@lightbend.com>
Signed-off-by: shane knapp <incomplete@gmail.com>
## What changes were proposed in this pull request?
Remove Scala 2.11 support in build files and docs, and in various parts of code that accommodated 2.11. See some targeted comments below.
## How was this patch tested?
Existing tests.
Closes#23098 from srowen/SPARK-26132.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
There is some hardcode configs in code, I think it best to modify。
## How was this patch tested?
Existing tests
Closes#24103 from wangjiaochun/yarnHardCode.
Authored-by: 10087686 <wang.jiaochun@zte.com.cn>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
Allow specifying system properties to customise the image names for the images used in the integration testing. Useful if your CI/CD pipeline or policy requires using a different naming format.
This is one part of addressing SPARK-26729, I plan to have a follow up patch that will also make the names configurable when using `docker-image-tool.sh`
## How was this patch tested?
Ran integration tests against custom images generated by our CI/CD pipeline that do not follow Spark's existing hardcoded naming conventions using the new system properties to override the image names appropriately:
```
mvn clean integration-test -pl :spark-kubernetes-integration-tests_${SCALA_VERSION} \
-Pkubernetes -Pkubernetes-integration-tests \
-P${SPARK_HADOOP_PROFILE} -Dhadoop.version=${HADOOP_VERSION} \
-Dspark.kubernetes.test.sparkTgz=${TARBALL} \
-Dspark.kubernetes.test.imageTag=${TAG} \
-Dspark.kubernetes.test.imageRepo=${REPO} \
-Dspark.kubernetes.test.namespace=${K8S_NAMESPACE} \
-Dspark.kubernetes.test.kubeConfigContext=${K8S_CONTEXT} \
-Dspark.kubernetes.test.deployMode=${K8S_TEST_DEPLOY_MODE} \
-Dspark.kubernetes.test.jvmImage=apache-spark \
-Dspark.kubernetes.test.pythonImage=apache-spark-py \
-Dspark.kubernetes.test.rImage=apache-spark-r \
-Dtest.include.tags=k8s
...
[INFO] --- scalatest-maven-plugin:1.0:test (integration-test) spark-kubernetes-integration-tests_2.12 ---
Discovery starting.
Discovery completed in 230 milliseconds.
Run starting. Expected test count is: 15
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python2 to test a pyfiles example
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
Run completed in 8 minutes, 33 seconds.
Total number of tests run: 15
Suites: completed 2, aborted 0
Tests: succeeded 15, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```
Closes#23846 from rvesse/SPARK-26729.
Authored-by: Rob Vesse <rvesse@dotnetrdf.org>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
To avoid the case where the YARN libraries would swallow the exception and
prevent YarnAllocator from shutting down, call the offending code in a
separate thread, so that the parent thread can respond appropriately to
the shut down.
As a safeguard, also explicitly stop the executor launch thread pool when
shutting down the application, to prevent new executors from coming up
after the application started its shutdown.
Tested with unit tests + some internal tests on real cluster.
Closes#24017 from vanzin/SPARK-27094.
Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
## What changes were proposed in this pull request?
while performing some tests on our existing minikube and k8s infrastructure, i noticed that the integration tests were failing. i dug in and discovered the following message buried at the end of the stacktrace:
```
Caused by: java.io.FileNotFoundException: /usr/lib/libnss3.so
at sun.security.pkcs11.Secmod.initialize(Secmod.java:193)
at sun.security.pkcs11.SunPKCS11.<init>(SunPKCS11.java:218)
... 81 more
```
after i added the `nss` package to `resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile`, everything worked.
this is also impacting current builds. see: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/8959/console
## How was this patch tested?
i tested locally before pushing, and the build system will test the rest.
Closes#24111 from shaneknapp/add-nss-package-to-dockerfile.
Authored-by: shane knapp <incomplete@gmail.com>
Signed-off-by: shane knapp <incomplete@gmail.com>
## What changes were proposed in this pull request?
When we run YarnSchedulerBackendSuite, the class path seems to be made from the classes folder(resource-managers/yarn/target/scala-2.12/classes) instead of jar (resource-managers/yarn/target/spark-yarn_2.12-3.0.0-SNAPSHOT.jar) . ui.getHandlers is in spark-core and its loaded from spark-core.jar which is shaded and hence refers to org.spark_project.jetty.servlet.ServletContextHandler
Here in org.apache.spark.scheduler.cluster.YarnSchedulerBackend, as its not shaded, it expects org.eclipse.jetty.servlet.ServletContextHandler
Refer discussion https://issues.apache.org/jira/browse/SPARK-27122?focusedCommentId=16792318&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16792318
Hence as a fix, org.apache.spark.ui.WebUI must only return a wrapper class instance or references so that Jetty classes can be avoided in getters which are accessed outside spark-core
## How was this patch tested?
Existing UT can pass
Closes#24088 from ajithme/shadebug.
Authored-by: Ajith <ajith2489@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
Currently, when enabled streaming dynamic allocation for streaming applications, the maxNumExecutorFailures in ApplicationMaster is still computed with `spark.dynamicAllocation.maxExecutors`.
Actually, we should consider `spark.streaming.dynamicAllocation.maxExecutors` instead.
Related codes:
f87153a3ac/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala (L101)
## How was this patch tested?
NA
Please review http://spark.apache.org/contributing.html before opening a pull request.
Closes#23845 from liupc/Fix-incorrect-maxNumExecutorFailures-for-streaming.
Lead-authored-by: Liupengcheng <liupengcheng@xiaomi.com>
Co-authored-by: liupengcheng <liupengcheng@xiaomi.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
Speed up running k8s integration tests locally by allowing folks to skip the tgz dist build and extraction
Run tests locally without a distribution of Spark, just a local build
Closes#23380 from holdenk/SPARK-26343-Speed-up-running-the-kubernetes-integration-tests-locally.
Authored-by: Holden Karau <holden@pigscanfly.ca>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
## What changes were proposed in this pull request?
Expose Spark UI port on driver service to access logs from service.
## How was this patch tested?
The patch was tested using unit tests being contributed as a part of the PR
Closes#23990 from chandulal/SPARK-27061.
Authored-by: chandulal.kavar <cckavar@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
## What changes were proposed in this pull request?
This pr adds 2 maven properties to help us upgrade the built-in Hive.
| Property Name | Default | In future |
| ------ | ------ | ------ |
| hive.classifier | (none) | core |
| hive.parquet.group | com.twitter | org.apache.parquet |
## How was this patch tested?
existing tests
Closes#23996 from wangyum/add_2_maven_properties.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
Make k8s client timeouts configurable. No test suite exists for the client factory class, happy to add one if needed
Closes#23928 from onursatici/os/k8s-client-timeouts.
Lead-authored-by: Onur Satici <osatici@palantir.com>
Co-authored-by: Onur Satici <onursatici@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
Escape arguments for submissions sent to a Mesos dispatcher; analogous change to https://issues.apache.org/jira/browse/SPARK-24380 for confs.
Since this changes behavior than some users are undoubtedly already working around, probably best to only only merge into master.
## How was this patch tested?
Added a new unit test, covering some existing behavior as well.
Closes#23967 from mwlon/SPARK-27015.
Authored-by: mwlon <mloncaric@hmc.edu>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
## What changes were proposed in this pull request?
Introducing new config for initially blacklisted YARN nodes.
## How was this patch tested?
With existing and a new unit test.
Closes#23616 from attilapiros/SPARK-26688.
Lead-authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Co-authored-by: Attila Zsolt Piros <2017933+attilapiros@users.noreply.github.com>
Signed-off-by: Imran Rashid <irashid@cloudera.com>
## What changes were proposed in this pull request?
Retrieve enableFetcherCache option from submission conf rather than dispatcher conf. This resolves some confusing behavior where Spark drivers currently get this conf from the dispatcher, whereas Spark executors get this conf from the submission. After this change, the conf will only need to be specified once.
## How was this patch tested?
With (updated) existing tests.
Closes#23924 from mwlon/SPARK-26192.
Authored-by: mwlon <mloncaric@hmc.edu>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Running Spark in Docker image with Alpine Linux 3.9.0 throws errors when using snappy.
The issue can be reproduced for example as follows: `Seq(1,2).toDF("id").write.format("parquet").save("DELETEME1")`
The key part of the error stack is as follows `SparkException: Task failed while writing rows. .... Caused by: java.lang.UnsatisfiedLinkError: /tmp/snappy-1.1.7-2b4872f1-7c41-4b84-bda1-dbcb8dd0ce4c-libsnappyjava.so: Error loading shared library ld-linux-x86-64.so.2: Noded by /tmp/snappy-1.1.7-2b4872f1-7c41-4b84-bda1-dbcb8dd0ce4c-libsnappyjava.so)`
The source of the error appears to be that libsnappyjava.so needs ld-linux-x86-64.so.2 and looks for it in /lib, while in Alpine Linux 3.9.0 with libc6-compat version 1.1.20-r3 ld-linux-x86-64.so.2 is located in /lib64.
Note: this issue is not present with Alpine Linux 3.8 and libc6-compat version 1.1.19-r10
## What changes were proposed in this pull request?
A possible workaround proposed with this PR is to modify the Dockerfile by adding a symbolic link between /lib and /lib64 so that linux-x86-64.so.2 can be found in /lib. This is probably not the cleanest solution, but I have observed that this is what happened/happens already when using Alpine Linux 3.8.1 (a version of Alpine Linux which was not affected by the issue reported here).
## How was this patch tested?
Manually tested by running a simple workload with spark-shell, using docker on a client machine and using Spark on a Kubernetes cluster. The test workload is: `Seq(1,2).toDF("id").write.format("parquet").save("DELETEME1")`
Added a test to the KubernetesSuite / BasicTestsSuite
Closes#23898 from LucaCanali/dockerfileUpdateSPARK26995.
Authored-by: Luca Canali <luca.canali@cern.ch>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
## What changes were proposed in this pull request?
`MetricsSystem` instance creations have a scattered distribution in the project code. So do their names. It may cause some inconvenience for browsing and management.
This PR tries to put them together. In this way, we can have a uniform location for adding or removing them, and have a overall view of `MetircsSystem `instances in current project.
It's also helpful for maintaining user documents by avoiding missing something.
## How was this patch tested?
Existing unit tests.
Closes#23869 from SongYadong/metrics_system_inst_manage.
Lead-authored-by: SongYadong <song.yadong1@zte.com.cn>
Co-authored-by: walter2001 <ydsong2007@163.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
Using the current time as an ID is more prone to clashes than people generally
realize, so try to make things a bit more unique without necessarily using a
UUID, which would eat too much space in the names otherwise.
The implemented approach uses some bits from the current time, plus some random
bits, which should be more resistant to clashes.
Closes#23805 from vanzin/SPARK-26420.
Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Before this change, there was some code in the k8s backend to deal
with how to resolve dependencies and make them available to the
Spark application. It turns out that none of that code is necessary,
since spark-submit already handles all that for applications started
in client mode - like the k8s driver that is run inside a Spark-created
pod.
For that reason, specifically for pyspark, there's no need for the
k8s backend to deal with PYTHONPATH; or, in general, to change the URIs
provided by the user at all. spark-submit takes care of that.
For testing, I created a pyspark script that depends on another module
that is shipped with --py-files. Then I used:
- --py-files http://.../dep.pyhttp://.../test.py
- --py-files http://.../dep.ziphttp://.../test.py
- --py-files local:/.../dep.py local:/.../test.py
- --py-files local:/.../dep.zip local:/.../test.py
Without this change, all of the above commands fail. With the change, the
driver is able to see the dependencies in all the above cases; but executors
don't see the dependencies in the last two. That's a bug in shared Spark code
that deals with local: dependencies in pyspark (SPARK-26934).
I also tested a Scala app using the main jar from an http server.
Closes#23793 from vanzin/SPARK-24736.
Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
## What changes were proposed in this pull request?
Comparing whether Boolean expression is equal to true is redundant
For example:
The datatype of `a` is boolean.
Before:
if (a == true)
After:
if (a)
## How was this patch tested?
N/A
Closes#23884 from 10110346/simplifyboolean.
Authored-by: liuxian <liu.xian3@zte.com.cn>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
This patch applies redaction to command line arguments before logging them. This applies to two resource managers: standalone cluster and YARN.
This patch only concerns about arguments starting with `-D` since Spark is likely passing the Spark configuration to command line arguments as `-Dspark.blabla=blabla`. More change is necessary if we also want to handle the case of `--conf spark.blabla=blabla`.
## How was this patch tested?
Added UT for redact logic. This patch only touches how to log so not easy to add UT regarding it.
Closes#23820 from HeartSaVioR/MINOR-redact-command-line-args-for-running-driver-executor.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
Since the yarn module is actually private to Spark, this interface was never
actually "public". Since it has no use inside of Spark, let's avoid adding
a yarn-specific extension that isn't public, and point any potential users
are more general solutions (like using a SparkListener).
Closes#23839 from vanzin/SPARK-26788.
Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
Changed the `kubernetes-client` version to 4.1.2. Latest version fix error with exec credentials (used by aws eks) and this will be used to talk with kubernetes API server. Users can submit spark job to EKS api endpoint now with this patch.
## How was this patch tested?
unit tests and manual tests.
Closes#23814 from Jeffwan/update_k8s_sdk.
Authored-by: Jiaxin Shan <seedjeffwan@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
This PR aims to fix some outdated comments about task schedulers.
1. Change "ClusterScheduler" to "YarnScheduler" in comments of `YarnClusterScheduler`
According to [SPARK-1140 Remove references to ClusterScheduler](https://issues.apache.org/jira/browse/SPARK-1140), ClusterScheduler is not used anymore.
I also searched "ClusterScheduler" within the whole project, no other occurrences are found in comments or test cases. Note classes like `YarnClusterSchedulerBackend` or `MesosClusterScheduler` are not relevant.
2. Update comments about `statusUpdate` from `TaskSetManager`
`statusUpdate` has been moved to `TaskSchedulerImpl`. StatusUpdate event handling is delegated to `handleSuccessfulTask`/`handleFailedTask`.
## How was this patch tested?
N/A. Fix comments only.
Closes#23844 from seancxmao/taskscheduler-comments.
Authored-by: seancxmao <seancxmao@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
Currently, when running applications on yarn mode, the app staging directory of is controlled by `spark.yarn.stagingDir` config if specified, and this directory cannot separate different users, sometimes, it's inconvenient for file and quota management for users.
Sometimes, there might be an unexpected increasing of the staging files, two possible reasons are:
1. The `spark.yarn.preserve.staging.files` provided can be misused by users
2. cron task constantly starting new applications on non-existent yarn queue(wrong configuration).
But now, we are not easy to find out the which user obtains the most HDFS files or spaces.
what's more, even we want set HDFS name quota or space quota for each user to limit the increase is impossible.
So I propose to add user sub directories under this app staging directory which is more clear.
existing UT
Closes#23786 from liupc/Support-user-level-app-staging-dir.
Authored-by: Liupengcheng <liupengcheng@xiaomi.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
Since the host name is derived from the app name, which can contain arbitrary
characters, it needs to be sanitized so that only valid characters are allowed.
On top of that, take extra care that truncation doesn't leave characters that
are valid except at the start of a host name.
Closes#23781 from vanzin/SPARK-24894.
Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
## What changes were proposed in this pull request?
Add the kubernetes integration tests to the scalastyle profiles.
## How was this patch tested?
Run ./dev/scalastyle with a bad change manually
## Follow on work
See SPARK-26898 to add scalastyle for k8s integration to the CI
Closes#23792 from holdenk/SPARK-26882-check-k8s-integration-tests-when-linting.
Authored-by: Holden Karau <holden@pigscanfly.ca>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
The test "RequestExecutors reflects node blacklist and is serializable" is flaky because of multi threaded access of the mock task scheduler. For details check [Mockito FAQ (occasional exceptions like: WrongTypeOfReturnValue)](https://github.com/mockito/mockito/wiki/FAQ#is-mockito-thread-safe). So instead of mocking the task scheduler in the test TaskSchedulerImpl is simply subclassed.
This multithreaded access of the `nodeBlacklist()` method is coming from:
1) the unit test thread via calling of the method `prepareRequestExecutors()`
2) the `DriverEndpoint.onStart` which runs a periodic task that ends up calling this method
Existing unittest.
Closes#23801 from attilapiros/SPARK-26891.
Authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
## What changes were proposed in this pull request?
`HadoopDelegationTokenProvider` has basically the same functionality just like `ServiceCredentialProvider` so the interfaces can be merged.
`YARNHadoopDelegationTokenManager` now loads `ServiceCredentialProvider`s in one step. The drawback of this if one provider fails all others are not loaded. `HadoopDelegationTokenManager` loads `HadoopDelegationTokenProvider`s independently so it provides more robust behaviour.
In this PR I've I've made the following changes:
* Deleted `YARNHadoopDelegationTokenManager` and `ServiceCredentialProvider`
* Made `HadoopDelegationTokenProvider` a `DeveloperApi`
## How was this patch tested?
Existing unit tests.
Closes#23686 from gaborgsomogyi/SPARK-26772.
Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
## What changes were proposed in this pull request?
This patch proposes to change the approach on extracting log urls as well as attributes from YARN executor:
- AS-IS: extract information from `Container` API and include them to container launch context
- TO-BE: let YARN executor self-extracting information
This approach leads us to populate more attributes like nodemanager's IPC port which can let us configure custom log url to JHS log url directly.
## How was this patch tested?
Existing unit tests.
Closes#23706 from HeartSaVioR/SPARK-26790.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
## What changes were proposed in this pull request?
In the PR, I propose to use `System.nanoTime()` instead of `System.currentTimeMillis()` in measurements of time intervals.
`System.currentTimeMillis()` returns current wallclock time and will follow changes to the system clock. Thus, negative wallclock adjustments can cause timeouts to "hang" for a long time (until wallclock time has caught up to its previous value again). This can happen when ntpd does a "step" after the network has been disconnected for some time. The most canonical example is during system bootup when DHCP takes longer than usual. This can lead to failures that are really hard to understand/reproduce. `System.nanoTime()` is guaranteed to be monotonically increasing irrespective of wallclock changes.
## How was this patch tested?
By existing test suites.
Closes#23727 from MaxGekk/system-nanotime.
Lead-authored-by: Maxim Gekk <max.gekk@gmail.com>
Co-authored-by: Maxim Gekk <maxim.gekk@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
This patch makes hardcoded configs in "mesos" module to use ConfigEntry, avoiding issues on mistake like SPARK-26082.
Please note that there're some changes on type while migrating to ConfigEntry: specifically "comma-separated list on a string" becomes "sequence of strings". While SparkConf smoothly handles on the change (comma-separated list on a string is still supported so backward compatible), there're some methods in utility class (`mesos` package private) to depend on the type change, so this patch also modifies the method signature for them a bit.
## How was this patch tested?
Existing tests.
Closes#23743 from HeartSaVioR/SPARK-26843.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
#23744 added a UT to prevent a future regression. However, it breaks Scala-2.11 build. This fixes that.
## How was this patch tested?
Manual test with Scala-2.11 profile.
Closes#23755 from HeartSaVioR/SPARK-26082-FOLLOW-UP-V2.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
Delegation token providers interface now has a parameter `fileSystems` but this is needed only for `HadoopFSDelegationTokenProvider`.
In this PR I've addressed this issue in the following way:
* Removed `fileSystems` parameter from `HadoopDelegationTokenProvider`
* Moved `YarnSparkHadoopUtil.hadoopFSsToAccess` into `HadoopFSDelegationTokenProvider`
* Moved `spark.yarn.stagingDir` into core
* Moved `spark.yarn.access.namenodes` into core and renamed to `spark.kerberos.access.namenodes`
* Moved `spark.yarn.access.hadoopFileSystems` into core and renamed to `spark.kerberos.access.hadoopFileSystems`
## How was this patch tested?
Existing unit tests.
Closes#23698 from gaborgsomogyi/SPARK-26766.
Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
## What changes were proposed in this pull request?
This patch adds UT on testing SPARK-26082 to avoid regression. While #23743 reduces the possibility to make a similar mistake, the needed lines of code for adding tests are not that huge, so I guess it might be worth to add them.
## How was this patch tested?
Newly added UTs. Test "supports setting fetcher cache" fails when #23743 is not applied and succeeds when #23743 is applied.
Closes#23744 from HeartSaVioR/SPARK-26082-add-unit-test.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
* change MesosClusterScheduler to use correct argument name for Mesos fetch cache (spark.mesos.fetchCache.enable -> spark.mesos.fetcherCache.enable)
## How was this patch tested?
Not sure this requires a test, since it's just a string change.
Closes#23734 from mwlon/SPARK-26082.
Authored-by: mwlon <mloncaric@hmc.edu>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Merge both case statements, and remove unused variables that
are not set by the Scala code anymore.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes#23655 from vanzin/SPARK-26733.
## What changes were proposed in this pull request?
- Covers latest minikube versions.
- keeps the older version support
Note: While I was facing disk pressure issues locally on machine, I noticed minikube status command would report that everything was working fine even if some kube-system pods were not up. I don't think the output is 100% reliable but it is good enough for most cases.
## How was this patch tested?
Run it against latest version of minikube (v0.32.0).
Author: Stavros Kontopoulos <stavros.kontopoulos@lightbend.com>
Closes#23520 from skonto/update-mini-backend.
## What changes were proposed in this pull request?
This patch proposes adding a new configuration on SHS: custom executor log URL pattern. This will enable end users to replace executor logs to other than RM provide, like external log service, which enables to serve executor logs when NodeManager becomes unavailable in case of YARN.
End users can build their own of custom executor log URLs with pre-defined patterns which would be vary on each resource manager. This patch adds some patterns to YARN resource manager. (For others, there's even no executor log url available so cannot define patterns as well.)
Please refer the doc change as well as added UTs in this patch to see how to set up the feature.
## How was this patch tested?
Added UT, as well as manual test with YARN cluster
Closes#23260 from HeartSaVioR/SPARK-26311.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
This change addes a new mode for credential renewal that does not require
a keytab; it uses the local ticket cache instead, so it works while the
user keeps the cache valid.
This can be useful for, e.g., people running long spark-shell sessions where
their kerberos login is kept up-to-date.
The main change to enable this behavior is in HadoopDelegationTokenManager,
with a small change in the HDFS token provider. The other changes are to avoid
creating duplicate tokens when submitting the application to YARN; they allow
the tokens from the scheduler to be sent to the YARN AM, reducing the round trips
to HDFS.
For that, the scheduler initialization code was changed a little bit so that
the tokens are available when the YARN client is initialized. That basically
takes care of a long-standing TODO that was in the code to clean up configuration
propagation to the driver's RPC endpoint (in CoarseGrainedSchedulerBackend).
Tested with an app designed to stress this functionality, with both keytab and
cache-based logins. Some basic kerberos tests on k8s also.
Closes#23525 from vanzin/SPARK-26595.
Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
## What changes were proposed in this pull request?
Providing a new configuration "spark.yarn.un-managed-am" (defaults to false) to enable the Unmanaged AM Application in Yarn Client mode which launches the Application Master service as part of the Client. It utilizes the existing code for communicating between the Application Master <-> Task Scheduler for the container requests/allocations/launch, and eliminates these,
1. Allocating and launching the Application Master container
2. Remote Node/Process communication between Application Master <-> Task Scheduler
## How was this patch tested?
I verified manually running the applications in yarn-client mode with "spark.yarn.un-managed-am" enabled, and also ensured that there is no impact to the existing execution flows.
I would like to hear others feedback/thoughts on this.
Closes#19616 from devaraj-kavali/SPARK-22404.
Authored-by: Devaraj K <devaraj@apache.org>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>