Commit graph

198 commits

Author SHA1 Message Date
Cong Du 54b97b2e14 [MINOR][DOCS] Fix a typo in ContainerPlacementStrategy's class comment
### What changes were proposed in this pull request?
This PR fixes a typo in deploy/yarn/LocalityPreferredContainerPlacementStrategy.scala file.

### Why are the changes needed?
To deliver correct explanation about how the placement policy works.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
UT as specified, although shouldn't influence any functionality since it's in the comment.

Closes #28267 from asclepiusaka/master.

Authored-by: Cong Du <asclepius1993@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-04-22 09:44:43 -05:00
beliefer 0722dc5fb8 [SPARK-31092][YARN][DOC] Add version information to the configuration of Yarn
### What changes were proposed in this pull request?
Add version information to the configuration of `Yarn`.

I sorted out some information show below.

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.yarn.tags | 1.5.0 | SPARK-9782 | 9b731fad2b43ca18f3c5274062d4c7bc2622ab72#diff-b050df3f55b82065803d6e83453b9706 |  
spark.yarn.priority | 3.0.0 | SPARK-29603 | 4615769736f4c052ae1a2de26e715e229154cd2f#diff-4804e0f83ca7f891183eb0db229b4b9a |  
spark.yarn.am.attemptFailuresValidityInterval | 1.6.0 | SPARK-10739 | f97e9323b526b3d0b0fee0ca03f4276f37bb5750#diff-b050df3f55b82065803d6e83453b9706 |
spark.yarn.executor.failuresValidityInterval | 2.0.0 | SPARK-6735 | 8b44bd52fa40c0fc7d34798c3654e31533fd3008#diff-14b8ed2ef4e3da985300b8d796a38fa9 |
spark.yarn.maxAppAttempts | 1.3.0 | SPARK-2165 | 8fdd48959c93b9cf809f03549e2ae6c4687d1fcd#diff-b050df3f55b82065803d6e83453b9706 |
spark.yarn.user.classpath.first | 1.3.0 | SPARK-5087 | 8d45834debc6986e61831d0d6e982d5528dccc51#diff-b050df3f55b82065803d6e83453b9706 |  
spark.yarn.config.gatewayPath | 1.5.0 | SPARK-8302 | 37bf76a2de2143ec6348a3d43b782227849520cc#diff-b050df3f55b82065803d6e83453b9706 |  
spark.yarn.config.replacementPath | 1.5.0 | SPARK-8302 | 37bf76a2de2143ec6348a3d43b782227849520cc#diff-b050df3f55b82065803d6e83453b9706 |  
spark.yarn.queue | 1.0.0 | SPARK-1126 | 1617816090e7b20124a512a43860a21232ebf511#diff-ae6a41a938a767e5bb97b5d738371a5b |  
spark.yarn.historyServer.address | 1.0.0 | SPARK-1408 | 0058b5d2c74147d24b127a5432f89ebc7050dc18#diff-923ae58523a12397f74dd590744b8b41 |  
spark.yarn.historyServer.allowTracking | 2.2.0 | SPARK-19554 | 4661d30b988bf773ab45a15b143efb2908d33743#diff-4804e0f83ca7f891183eb0db229b4b9a |
spark.yarn.archive | 2.0.0 | SPARK-13577 | 07f1c5447753a3d593cd6ececfcb03c11b1cf8ff#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.jars | 2.0.0 | SPARK-13577 | 07f1c5447753a3d593cd6ececfcb03c11b1cf8ff#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.dist.archives | 1.0.0 | SPARK-1126 | 1617816090e7b20124a512a43860a21232ebf511#diff-ae6a41a938a767e5bb97b5d738371a5b |  
spark.yarn.dist.files | 1.0.0 | SPARK-1126 | 1617816090e7b20124a512a43860a21232ebf511#diff-ae6a41a938a767e5bb97b5d738371a5b |  
spark.yarn.dist.jars | 2.0.0 | SPARK-12343 | 8ba2b7f28fee39c4839e5ea125bd25f5091a3a1e#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.preserve.staging.files | 1.1.0 | SPARK-2933 | b92d823ad13f6fcc325eeb99563bea543871c6aa#diff-85a1f4b2810b3e11b8434dcefac5bb85 |  
spark.yarn.submit.file.replication | 0.8.1 | None | 4668fcb9ff8f9c176c4866480d52dde5d67c8522#diff-b050df3f55b82065803d6e83453b9706 |
spark.yarn.submit.waitAppCompletion | 1.4.0 | SPARK-3591 | b65bad65c3500475b974ca0219f218eef296db2c#diff-b050df3f55b82065803d6e83453b9706 |
spark.yarn.report.interval | 0.9.0 | None | ebdfa6bb9766209bc5a3c4241fa47141c5e9c5cb#diff-e0a7ae95b6d8e04a67ebca0945d27b65 |  
spark.yarn.clientLaunchMonitorInterval | 2.3.0 | SPARK-16019 | 1cad31f00644d899d8e74d58c6eb4e9f72065473#diff-4804e0f83ca7f891183eb0db229b4b9a |
spark.yarn.am.waitTime | 1.3.0 | SPARK-3779 | 253b72b56fe908bbab5d621eae8a5f359c639dfd#diff-87125050a2e2eaf87ea83aac9c19b200 |  
spark.yarn.metrics.namespace | 2.4.0 | SPARK-24594 | d2436a85294a178398525c37833dae79d45c1452#diff-4804e0f83ca7f891183eb0db229b4b9a |
spark.yarn.am.nodeLabelExpression | 1.6.0 | SPARK-7173 | 7db3610327d0725ec2ad378bc873b127a59bb87a#diff-b050df3f55b82065803d6e83453b9706 |
spark.yarn.containerLauncherMaxThreads | 1.2.0 | SPARK-1713 | 1f4a648d4e30e837d6cf3ea8de1808e2254ad70b#diff-801a04f9e67321f3203399f7f59234c1 |  
spark.yarn.max.executor.failures | 1.0.0 | SPARK-1183 | 698373211ef3cdf841c82d48168cd5dbe00a57b4#diff-0c239e58b37779967e0841fb42f3415a |  
spark.yarn.scheduler.reporterThread.maxFailures | 1.2.0 | SPARK-3304 | 11c10df825419372df61a8d23c51e8c3cc78047f#diff-85a1f4b2810b3e11b8434dcefac5bb85 |  
spark.yarn.scheduler.heartbeat.interval-ms | 0.8.1 | None | ee22be0e6c302fb2cdb24f83365c2b8a43a1baab#diff-87125050a2e2eaf87ea83aac9c19b200 |  
spark.yarn.scheduler.initial-allocation.interval | 1.4.0 | SPARK-7533 | 3ddf051ee7256f642f8a17768d161c7b5f55c7e1#diff-87125050a2e2eaf87ea83aac9c19b200 |  
spark.yarn.am.finalMessageLimit | 2.4.0 | SPARK-25174 | f8346d2fc01f1e881e4e3f9c4499bf5f9e3ceb3f#diff-4804e0f83ca7f891183eb0db229b4b9a |  
spark.yarn.am.cores | 1.3.0 | SPARK-1507 | 2be82b1e66cd188456bbf1e5abb13af04d1629d5#diff-746d34aa06bfa57adb9289011e725472 |  
spark.yarn.am.extraJavaOptions | 1.3.0 | SPARK-5087 | 8d45834debc6986e61831d0d6e982d5528dccc51#diff-b050df3f55b82065803d6e83453b9706 |  
spark.yarn.am.extraLibraryPath | 1.4.0 | SPARK-7281 | 7b5dd3e3c0030087eea5a8224789352c03717c1d#diff-b050df3f55b82065803d6e83453b9706 |  
spark.yarn.am.memoryOverhead | 1.3.0 | SPARK-1953 | e96645206006a009e5c1a23bbd177dcaf3ef9b83#diff-746d34aa06bfa57adb9289011e725472 |  
spark.yarn.am.memory | 1.3.0 | SPARK-1953 | e96645206006a009e5c1a23bbd177dcaf3ef9b83#diff-746d34aa06bfa57adb9289011e725472 |  
spark.driver.appUIAddress | 1.1.0 | SPARK-1291 | 72ea56da8e383c61c6f18eeefef03b9af00f5158#diff-2b4617e158e9c5999733759550440b96 |  
spark.yarn.executor.nodeLabelExpression | 1.4.0 | SPARK-6470 | 82fee9d9aad2c9ba2fb4bd658579fe99218cafac#diff-d4620cf162e045960d84c88b2e0aa428 |  
spark.yarn.unmanagedAM.enabled | 3.0.0 | SPARK-22404 | f06bc0cd1dee2a58e04ebf24bf719a2f7ef2dc4e#diff-4804e0f83ca7f891183eb0db229b4b9a |  
spark.yarn.rolledLog.includePattern | 2.0.0 | SPARK-15990 | 272a2f78f3ff801b94a81fa8fcc6633190eaa2f4#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.rolledLog.excludePattern | 2.0.0 | SPARK-15990 | 272a2f78f3ff801b94a81fa8fcc6633190eaa2f4#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.user.jar | 1.1.0 | SPARK-1395 | e380767de344fd6898429de43da592658fd86a39#diff-50e237ea17ce94c3ccfc44143518a5f7 |  
spark.yarn.secondary.jars | 0.9.2 | SPARK-1870 | 1d3aab96120c6770399e78a72b5692cf8f61a144#diff-50b743cff4885220c828b16c44eeecfd |  
spark.yarn.cache.filenames | 2.0.0 | SPARK-14602 | f47dbf27fa034629fab12d0f3c89ab75edb03f86#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.cache.sizes | 2.0.0 | SPARK-14602 | f47dbf27fa034629fab12d0f3c89ab75edb03f86#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.cache.timestamps | 2.0.0 | SPARK-14602 | f47dbf27fa034629fab12d0f3c89ab75edb03f86#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.cache.visibilities | 2.0.0 | SPARK-14602 | f47dbf27fa034629fab12d0f3c89ab75edb03f86#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.cache.types | 2.0.0 | SPARK-14602 | f47dbf27fa034629fab12d0f3c89ab75edb03f86#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.cache.confArchive | 2.0.0 | SPARK-14602 | f47dbf27fa034629fab12d0f3c89ab75edb03f86#diff-14b8ed2ef4e3da985300b8d796a38fa9 |  
spark.yarn.blacklist.executor.launch.blacklisting.enabled | 2.4.0 | SPARK-16630 | b56e9c613fb345472da3db1a567ee129621f6bf3#diff-4804e0f83ca7f891183eb0db229b4b9a |  
spark.yarn.exclude.nodes | 3.0.0 | SPARK-26688 | caceaec93203edaea1d521b88e82ef67094cdea9#diff-4804e0f83ca7f891183eb0db229b4b9a |  
The following appears in the document |   |   |   |  
spark.yarn.am.resource.{resource-type}.amount | 3.0.0 | SPARK-20327 | 3946de773498621f88009c309254b019848ed490#diff-4804e0f83ca7f891183eb0db229b4b9a |  
spark.yarn.driver.resource.{resource-type}.amount | 3.0.0 | SPARK-20327 | 3946de773498621f88009c309254b019848ed490#diff-4804e0f83ca7f891183eb0db229b4b9a |  
spark.yarn.executor.resource.{resource-type}.amount | 3.0.0 | SPARK-20327 | 3946de773498621f88009c309254b019848ed490#diff-4804e0f83ca7f891183eb0db229b4b9a |  
spark.yarn.appMasterEnv.[EnvironmentVariableName] | 1.1.0 | SPARK-1680 | 7b798e10e214cd407d3399e2cab9e3789f9a929e#diff-50e237ea17ce94c3ccfc44143518a5f7 |  
spark.yarn.kerberos.relogin.period | 2.3.0 | SPARK-22290 | dc2714da50ecba1bf1fdf555a82a4314f763a76e#diff-4804e0f83ca7f891183eb0db229b4b9a |  

### Why are the changes needed?
Supplemental configuration version information.

### Does this PR introduce any user-facing change?
'No'.

### How was this patch tested?
Exists UT

Closes #27856 from beliefer/add-version-to-yarn-config.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-03-12 09:52:57 +09:00
Thomas Graves 0e2ca11d80 [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling
### What changes were proposed in this pull request?

Yarn side changes for Stage level scheduling.  The previous PR for dynamic allocation changes was https://github.com/apache/spark/pull/27313

Modified the data structures to store things on a per ResourceProfile basis.
 I tried to keep the code changes to a minimum, the main loop that requests just goes through each Resourceprofile and the logic inside for each one stayed very close to the same.
On submission we now have to give each ResourceProfile a separate yarn Priority because yarn doesn't support asking for containers with different resources at the same Priority. We just use the profile id as the priority level.
Using a different Priority actually makes things easier when the containers come back to match them again which ResourceProfile they were requested for.
The expectation is that yarn will only give you a container with resource amounts you requested or more. It should never give you a container if it doesn't satisfy your resource requests.

If you want to see the full feature changes you can look at https://github.com/apache/spark/pull/27053/files for reference

### Why are the changes needed?

For stage level scheduling YARN support.

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

Tested manually on YARN cluster and then unit tests.

Closes #27583 from tgravescs/SPARK-29149.

Authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2020-02-28 15:23:33 -06:00
gatorsmile 28b8713036 [SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT
### What changes were proposed in this pull request?
This patch is to bump the master branch version to 3.1.0-SNAPSHOT.

### Why are the changes needed?
N/A

### Does this PR introduce any user-facing change?
N/A

### How was this patch tested?
N/A

Closes #27698 from gatorsmile/updateVersion.

Authored-by: gatorsmile <gatorsmile@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-25 19:44:31 -08:00
Thomas Graves 496f6ac860 [SPARK-29148][CORE] Add stage level scheduling dynamic allocation and scheduler backend changes
### What changes were proposed in this pull request?

This is another PR for stage level scheduling. In particular this adds changes to the dynamic allocation manager and the scheduler backend to be able to track what executors are needed per ResourceProfile.  Note the api is still private to Spark until the entire feature gets in, so this functionality will be there but only usable by tests for profiles other then the DefaultProfile.

The main changes here are simply tracking things on a ResourceProfile basis as well as sending the executor requests to the scheduler backend for all ResourceProfiles.

I introduce a ResourceProfileManager in this PR that will track all the actual ResourceProfile objects so that we can keep them all in a single place and just pass around and use in datastructures the resource profile id. The resource profile id can be used with the ResourceProfileManager to get the actual ResourceProfile contents.

There are various places in the code that use executor "slots" for things.  The ResourceProfile adds functionality to keep that calculation in it.   This logic is more complex then it should due to standalone mode and mesos coarse grained not setting the executor cores config. They default to all cores on the worker, so calculating slots is harder there.
This PR keeps the functionality to make the cores the limiting resource because the scheduler still uses that for "slots" for a few things.

This PR does also add the resource profile id to the Stage and stage info classes to be able to test things easier.   That full set of changes will come with the scheduler PR that will be after this one.

The PR stops at the scheduler backend pieces for the cluster manager and the real YARN support hasn't been added in this PR, that again will be in a separate PR, so this has a few of the API changes up to the cluster manager and then just uses the default profile requests to continue.

The code for the entire feature is here for reference: https://github.com/apache/spark/pull/27053/files although it needs to be upmerged again as well.

### Why are the changes needed?

Needed for stage level scheduling feature.

### Does this PR introduce any user-facing change?

No user facing api changes added here.

### How was this patch tested?

Lots of unit tests and manually testing. I tested on yarn, k8s, standalone, local modes. Ran both failure and success cases.

Closes #27313 from tgravescs/SPARK-29148.

Authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2020-02-12 16:45:42 -06:00
Thomas Graves 878094f972 [SPARK-30689][CORE][YARN] Add resource discovery plugin api to support YARN versions with resource scheduling
### What changes were proposed in this pull request?

This change is to allow custom resource scheduler (GPUs,FPGAs,etc) resource discovery to be more flexible. Users are asking for it to work with hadoop 2.x versions that do not support resource scheduling in YARN and/or also they may not run in an isolated environment.
This change creates a plugin api that users can write their own resource discovery class that allows a lot more flexibility. The user can chain plugins for different resource types. The user specified plugins execute in the order specified and will fall back to use the discovery script plugin if they don't return information for a particular resource.

I had to open up a few of the classes to be public and change them to not be case classes and make them developer api in order for the the plugin to get enough information it needs.

I also relaxed the yarn side so that if yarn isn't configured for resource scheduling we just warn and go on. This helps users that have yarn 3.1 but haven't configured the resource scheduling side on their cluster yet, or aren't running in isolated environment.

The user would configured this like:
--conf spark.resources.discovery.plugin="org.apache.spark.resource.ResourceDiscoveryFPGAPlugin, org.apache.spark.resource.ResourceDiscoveryGPUPlugin"

Note the executor side had to be wrapped with a classloader to make sure we include the user classpath for jars they specified on submission.

Note this is more flexible because the discovery script has limitations such as spawning it in a separate process. This means if you are trying to allocate resources in that process they might be released when the script returns. Other things are the class makes it more flexible to be able to integrate with existing systems and solutions for assigning resources.

### Why are the changes needed?

to more easily use spark resource scheduling with older versions of hadoop or in non-isolated enivronments.

### Does this PR introduce any user-facing change?

Yes a plugin api

### How was this patch tested?

Unit tests added and manual testing done on yarn and standalone modes.

Closes #27410 from tgravescs/hadoop27spark3.

Lead-authored-by: Thomas Graves <tgraves@nvidia.com>
Co-authored-by: Thomas Graves <tgraves@apache.org>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2020-01-31 22:20:28 -06:00
Thomas Graves 6dbfa2bb9c [SPARK-29306][CORE] Stage Level Sched: Executors need to track what ResourceProfile they are created with
### What changes were proposed in this pull request?

This is the second PR for the Stage Level Scheduling. This is adding in the necessary executor side changes:
1) executors to know what ResourceProfile they should be using
2) handle parsing the resource profile settings - these are not in the global configs
3) then reporting back to the driver what resource profile it was started with.

This PR adds all the piping for YARN to pass the information all the way to executors, but it just uses the default ResourceProfile (which is the global applicatino level configs).

At a high level these changes include:
1) adding a new --resourceProfileId option to the CoarseGrainedExecutorBackend
2) Add the ResourceProfile settings to new internal confs that gets passed into the Executor
3) Executor changes that use the resource profile id passed in to read the corresponding ResourceProfile confs and then parse those requests and discover resources as necessary
4) Executor registers to Driver with the Resource profile id so that the ExecutorMonitor can track how many executor with each profile are running
5) YARN side changes to show that passing the resource profile id and confs actually works. Just uses the DefaultResourceProfile for now.

I also removed a check from the CoarseGrainedExecutorBackend that used to check to make sure there were task requirements before parsing any custom resource executor requests.  With the resource profiles this becomes much more expensive because we would then have to pass the task requests to each executor and the check was just a short cut and not really needed. It was much cleaner just to remove it.

Note there were some changes to the ResourceProfile, ExecutorResourceRequests, and TaskResourceRequests in this PR as well because I discovered some issues with things not being immutable. That api now look like:

val rpBuilder = new ResourceProfileBuilder()
val ereq = new ExecutorResourceRequests()
val treq = new TaskResourceRequests()

ereq.cores(2).memory("6g").memoryOverhead("2g").pysparkMemory("2g").resource("gpu", 2, "/home/tgraves/getGpus")
treq.cpus(2).resource("gpu", 2)

val resourceProfile = rpBuilder.require(ereq).require(treq).build

This makes is so that ResourceProfile is immutable and Spark can use it directly without worrying about the user changing it.

### Why are the changes needed?

These changes are needed for the executor to report which ResourceProfile they are using so that ultimately the dynamic allocation manager can use that information to know how many with a profile are running and how many more it needs to request.  Its also needed to get the resource profile confs to the executor so that it can run the appropriate discovery script if needed.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Unit tests and manually on YARN.

Closes #26682 from tgravescs/SPARK-29306.

Authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2020-01-17 08:15:25 -06:00
yi.wu 4a093176ea [SPARK-30359][CORE] Don't clear executorsPendingToRemove at the beginning of CoarseGrainedSchedulerBackend.reset
### What changes were proposed in this pull request?

Remove `executorsPendingToRemove.clear()` from `CoarseGrainedSchedulerBackend.reset()`.

### Why are the changes needed?

Clear `executorsPendingToRemove` before remove executors will cause all tasks running on those "pending to remove" executors to count failures. But that's not true for the case of `executorsPendingToRemove(execId)=true`.

Besides, `executorsPendingToRemove` will be cleaned up within `removeExecutor()` at the end just as same as `executorsPendingLossReason`.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Added a new test in `TaskSetManagerSuite`.

Closes #27017 from Ngone51/dont-clear-eptr-in-reset.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-01-03 22:54:05 +08:00
Jobit Mathew 1b0570c6af [SPARK-30387] Improving stop hook log message
### What changes were proposed in this pull request?

ShutdownHook of YarnClientSchedulerBackend prints just "Stopped" which can be improved to "YarnClientSchedulerBackend Stopped" for better understanding.

### Why are the changes needed?

While stopping or gracefully exiting the spark-shell/spark-sql --master yarn, only printing `stopped` is useless.
### Does this PR introduce any user-facing change?

Yes. Log info message change.

### How was this patch tested?

Manually

Closes #27049 from jobitmathew/imp_stop_message.

Authored-by: Jobit Mathew <jobit.mathew@huawei.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-01-02 14:48:36 -06:00
Yuming Wang 696288f623 [INFRA] Reverts commit 56dcd79 and c216ef1
### What changes were proposed in this pull request?
1. Revert "Preparing development version 3.0.1-SNAPSHOT": 56dcd79

2. Revert "Preparing Spark release v3.0.0-preview2-rc2": c216ef1

### Why are the changes needed?
Shouldn't change master.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
manual test:
https://github.com/apache/spark/compare/5de5e46..wangyum:revert-master

Closes #26915 from wangyum/revert-master.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Yuming Wang <wgyumg@gmail.com>
2019-12-16 19:57:44 -07:00
Yuming Wang 56dcd79992 Preparing development version 3.0.1-SNAPSHOT 2019-12-17 01:57:27 +00:00
Yuming Wang c216ef1d03 Preparing Spark release v3.0.0-preview2-rc2 2019-12-17 01:57:21 +00:00
ulysses c0507e0f75 [SPARK-29833][YARN] Add FileNotFoundException check for spark.yarn.jars
### What changes were proposed in this pull request?

When set `spark.yarn.jars=/xxx/xxx` which is just a no schema path, spark will throw a NullPointerException.

The reason is hdfs will return null if pathFs.globStatus(path) is not exist, and spark just use `pathFs.globStatus(path).filter(_.isFile())` without check it.

### Why are the changes needed?

Avoid NullPointerException.

### Does this PR introduce any user-facing change?

Yes. User will get a FileNotFoundException instead NullPointerException when `spark.yarn.jars` does not have schema and not exists.

### How was this patch tested?

Add UT.

Closes #26462 from ulysses-you/check-yarn-jars-path-exist.

Authored-by: ulysses <youxiduo@weidian.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-11-15 16:17:24 -08:00
Nishchal Venkataramana 833a9f12e2 [SPARK-24203][CORE] Make executor's bindAddress configurable
### What changes were proposed in this pull request?
With this change, executor's bindAddress is passed as an input parameter for RPCEnv.create.
A previous PR https://github.com/apache/spark/pull/21261 which addressed the same, was using a Spark Conf property to get the bindAddress which wouldn't have worked for multiple executors.
This PR is to enable anyone overriding CoarseGrainedExecutorBackend with their custom one to be able to invoke CoarseGrainedExecutorBackend.main() along with the option to configure bindAddress.

### Why are the changes needed?
This is required when Kernel-based Virtual Machine (KVM)'s are used inside Linux container where the hostname is not the same as container hostname.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Tested by running jobs with executors on KVMs inside a linux container.

Closes #26331 from nishchalv/SPARK-29670.

Lead-authored-by: Nishchal Venkataramana <nishchal@apple.com>
Co-authored-by: nishchal <nishchal@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
2019-11-13 22:01:48 +00:00
Kent Yao 4615769736 [SPARK-29603][YARN] Support application priority for YARN priority scheduling
### What changes were proposed in this pull request?

Priority for YARN to define pending applications ordering policy, those with higher priority have a better opportunity to be activated. YARN CapacityScheduler only.

### Why are the changes needed?

Ordering pending spark apps
### Does this PR introduce any user-facing change?

add a conf
### How was this patch tested?

add ut

Closes #26255 from yaooqinn/SPARK-29603.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-11-06 10:12:27 -08:00
Xingbo Jiang 8207c835b4 Revert "Prepare Spark release v3.0.0-preview-rc2"
This reverts commit 007c873ae3.
2019-10-30 17:45:44 -07:00
Xingbo Jiang 007c873ae3 Prepare Spark release v3.0.0-preview-rc2
### What changes were proposed in this pull request?

To push the built jars to maven release repository, we need to remove the 'SNAPSHOT' tag from the version name.

Made the following changes in this PR:
* Update all the `3.0.0-SNAPSHOT` version name to `3.0.0-preview`
* Update the sparkR version number check logic to allow jvm version like `3.0.0-preview`

**Please note those changes were generated by the release script in the past, but this time since we manually add tags on master branch, we need to manually apply those changes too.**

We shall revert the changes after 3.0.0-preview release passed.

### Why are the changes needed?

To make the maven release repository to accept the built jars.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

N/A
2019-10-30 17:42:59 -07:00
Xingbo Jiang b33a58c0c6 Revert "Prepare Spark release v3.0.0-preview-rc1"
This reverts commit 5eddbb5f1d.
2019-10-28 22:32:34 -07:00
Xingbo Jiang 5eddbb5f1d Prepare Spark release v3.0.0-preview-rc1
### What changes were proposed in this pull request?

To push the built jars to maven release repository, we need to remove the 'SNAPSHOT' tag from the version name.

Made the following changes in this PR:
* Update all the `3.0.0-SNAPSHOT` version name to `3.0.0-preview`
* Update the PySpark version from `3.0.0.dev0` to `3.0.0`

**Please note those changes were generated by the release script in the past, but this time since we manually add tags on master branch, we need to manually apply those changes too.**

We shall revert the changes after 3.0.0-preview release passed.

### Why are the changes needed?

To make the maven release repository to accept the built jars.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

N/A

Closes #26243 from jiangxb1987/3.0.0-preview-prepare.

Lead-authored-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Co-authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>
2019-10-28 22:31:29 -07:00
Dongjoon Hyun bd031c2173 [SPARK-29307][BUILD][TESTS] Remove scalatest deprecation warnings
### What changes were proposed in this pull request?

This PR aims to remove `scalatest` deprecation warnings with the following changes.
- `org.scalatest.mockito.MockitoSugar` -> `org.scalatestplus.mockito.MockitoSugar`
- `org.scalatest.selenium.WebBrowser` -> `org.scalatestplus.selenium.WebBrowser`
- `org.scalatest.prop.Checkers` -> `org.scalatestplus.scalacheck.Checkers`
- `org.scalatest.prop.GeneratorDrivenPropertyChecks` -> `org.scalatestplus.scalacheck.ScalaCheckDrivenPropertyChecks`

### Why are the changes needed?

According to the Jenkins logs, there are 118 warnings about this.
```
 grep "is deprecated" ~/consoleText | grep scalatest | wc -l
     118
```

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

After Jenkins passes, we need to check the Jenkins log.

Closes #25982 from dongjoon-hyun/SPARK-29307.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-09-30 21:00:11 -07:00
Sean Owen e1ea806b30 [SPARK-29291][CORE][SQL][STREAMING][MLLIB] Change procedure-like declaration to function + Unit for 2.13
### What changes were proposed in this pull request?

Scala 2.13 emits a deprecation warning for procedure-like declarations:

```
def foo() {
 ...
```

This is equivalent to the following, so should be changed to avoid a warning:

```
def foo(): Unit = {
  ...
```

### Why are the changes needed?

It will avoid about a thousand compiler warnings when we start to support Scala 2.13. I wanted to make the change in 3.0 as there are less likely to be back-ports from 3.0 to 2.4 than 3.1 to 3.0, for example, minimizing that downside to touching so many files.

Unfortunately, that makes this quite a big change.

### Does this PR introduce any user-facing change?

No behavior change at all.

### How was this patch tested?

Existing tests.

Closes #25968 from srowen/SPARK-29291.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-09-30 10:03:23 -07:00
LantaoJin 0b6775e6e9 [SPARK-29112][YARN] Expose more details when ApplicationMaster reporter faces a fatal exception
### What changes were proposed in this pull request?
In `ApplicationMaster.Reporter` thread, fatal exception information is swallowed. It's better to expose it.
We found our thrift server was shutdown due to a fatal exception but no useful information from log.

> 19/09/16 06:59:54,498 INFO [Reporter] yarn.ApplicationMaster:54 : Final app status: FAILED, exitCode: 12, (reason: Exception was thrown 1 time(s) from Reporter thread.)
19/09/16 06:59:54,500 ERROR [Driver] thriftserver.HiveThriftServer2:91 : Error starting HiveThriftServer2
java.lang.InterruptedException: sleep interrupted
        at java.lang.Thread.sleep(Native Method)
        at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:160)
        at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:708)

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Manual test

Closes #25810 from LantaoJin/SPARK-29112.

Authored-by: LantaoJin <jinlantao@gmail.com>
Signed-off-by: jerryshao <jerryshao@tencent.com>
2019-09-18 14:11:39 +08:00
Andy Zhang 956f6e988c [SPARK-29080][CORE][SPARKR] Support R file extension case-insensitively
### What changes were proposed in this pull request?

Make r file extension check case insensitive for spark-submit.

### Why are the changes needed?

spark-submit does not accept `.r` files as R scripts. Some codebases have r files that end with lowercase file extensions. It is inconvenient to use spark-submit with lowercase extension R files. The error is not very clear (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L232).

```
$ ./bin/spark-submit examples/src/main/r/dataframe.r
Exception in thread "main" org.apache.spark.SparkException: Cannot load main class from JAR file:/Users/dongjoon/APACHE/spark-release/spark-2.4.4-bin-hadoop2.7/examples/src/main/r/dataframe.r
```

### Does this PR introduce any user-facing change?

Yes. spark-submit can now be used to run R scripts with `.r` file extension.

### How was this patch tested?

Manual.

```
$ mv examples/src/main/r/dataframe.R examples/src/main/r/dataframe.r
$ ./bin/spark-submit examples/src/main/r/dataframe.r
```

Closes #25778 from Loquats/r-case.

Authored-by: Andy Zhang <yue.zhang@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-09-15 00:17:11 -07:00
Sean Owen 6378d4bc06 [SPARK-28980][CORE][SQL][STREAMING][MLLIB] Remove most items deprecated in Spark 2.2.0 or earlier, for Spark 3
### What changes were proposed in this pull request?

- Remove SQLContext.createExternalTable and Catalog.createExternalTable, deprecated in favor of createTable since 2.2.0, plus tests of deprecated methods
- Remove HiveContext, deprecated in 2.0.0, in favor of `SparkSession.builder.enableHiveSupport`
- Remove deprecated KinesisUtils.createStream methods, plus tests of deprecated methods, deprecate in 2.2.0
- Remove deprecated MLlib (not Spark ML) linear method support, mostly utility constructors and 'train' methods, and associated docs. This includes methods in LinearRegression, LogisticRegression, Lasso, RidgeRegression. These have been deprecated since 2.0.0
- Remove deprecated Pyspark MLlib linear method support, including LogisticRegressionWithSGD, LinearRegressionWithSGD, LassoWithSGD
- Remove 'runs' argument in KMeans.train() method, which has been a no-op since 2.0.0
- Remove deprecated ChiSqSelector isSorted protected method
- Remove deprecated 'yarn-cluster' and 'yarn-client' master argument in favor of 'yarn' and deploy mode 'cluster', etc

Notes:

- I was not able to remove deprecated DataFrameReader.json(RDD) in favor of DataFrameReader.json(Dataset); the former was deprecated in 2.2.0, but, it is still needed to support Pyspark's .json() method, which can't use a Dataset.
- Looks like SQLContext.createExternalTable was not actually deprecated in Pyspark, but, almost certainly was meant to be? Catalog.createExternalTable was.
- I afterwards noted that the toDegrees, toRadians functions were almost removed fully in SPARK-25908, but Felix suggested keeping just the R version as they hadn't been technically deprecated. I'd like to revisit that. Do we really want the inconsistency? I'm not against reverting it again, but then that implies leaving SQLContext.createExternalTable just in Pyspark too, which seems weird.
- I *kept* LogisticRegressionWithSGD, LinearRegressionWithSGD, LassoWithSGD, RidgeRegressionWithSGD in Pyspark, though deprecated, as it is hard to remove them (still used by StreamingLogisticRegressionWithSGD?) and they are not fully removed in Scala. Maybe should not have been deprecated.

### Why are the changes needed?

Deprecated items are easiest to remove in a major release, so we should do so as much as possible for Spark 3. This does not target items deprecated 'recently' as of Spark 2.3, which is still 18 months old.

### Does this PR introduce any user-facing change?

Yes, in that deprecated items are removed from some public APIs.

### How was this patch tested?

Existing tests.

Closes #25684 from srowen/SPARK-28980.

Lead-authored-by: Sean Owen <sean.owen@databricks.com>
Co-authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-09-09 10:19:40 -05:00
yangjie01 a07f795aea [SPARK-28577][YARN] Resource capability requested for each executor add offHeapMemorySize
## What changes were proposed in this pull request?

If MEMORY_OFFHEAP_ENABLED is true, add MEMORY_OFFHEAP_SIZE to resource requested for executor to ensure instance has enough memory to use.

In this pr add a helper method `executorOffHeapMemorySizeAsMb` in `YarnSparkHadoopUtil`.

## How was this patch tested?
Add 3 new test suite to test `YarnSparkHadoopUtil#executorOffHeapMemorySizeAsMb`

Closes #25309 from LuciferYang/spark-28577.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2019-09-04 09:00:12 -05:00
Alessandro Bellina dd0725d7ea [SPARK-28679][YARN] changes to setResourceInformation to handle empty resources and reflection error handling
## What changes were proposed in this pull request?

This fixes issues that can arise when the jars for different hadoop versions mix, and short-circuits the case where we are running with a spark that was not built for yarn 3 (resource support).

## How was this patch tested?

I tested it manually.

Closes #25403 from abellina/SPARK-28679.

Authored-by: Alessandro Bellina <abellina@nvidia.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-08-26 12:00:33 -07:00
Marcelo Vanzin 5f6eb5d20d [SPARK-28634][YARN] Ignore kerberos login config in client mode AM
This change makes the client mode AM ignore any login configuration,
which is now always handled by the driver. The previous code tried
to achieve that by modifying the configuration visible to the AM, but
that missed the case where old configuration names were being used.

Tested in real cluster with reproduction provided in the bug.

Closes #25467 from vanzin/SPARK-28634.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-08-19 11:06:02 -07:00
Yuanjian Li db39f45baf [SPARK-28593][CORE] Rename ShuffleClient to BlockStoreClient which more close to its usage
## What changes were proposed in this pull request?

After SPARK-27677, the shuffle client not only handles the shuffle block but also responsible for local persist RDD blocks. For better code scalability and precise semantics(as the [discussion](https://github.com/apache/spark/pull/24892#discussion_r300173331)), here we did several changes:

- Rename ShuffleClient to BlockStoreClient.
- Correspondingly rename the ExternalShuffleClient to ExternalBlockStoreClient, also change the server-side class from ExternalShuffleBlockHandler to ExternalBlockHandler.
- Move MesosExternalBlockStoreClient to Mesos package.

Note, we still keep the name of BlockTransferService, because the `Service` contains both client and server, also the name of BlockTransferService is not referencing shuffle client only.

## How was this patch tested?

Existing UT.

Closes #25327 from xuanyuanking/SPARK-28593.

Lead-authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Co-authored-by: Yuanjian Li <yuanjian.li@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-08-05 14:54:45 +08:00
Thomas Graves 43d68cd4ff [SPARK-27959][YARN] Change YARN resource configs to use .amount
## What changes were proposed in this pull request?

we are adding in generic resource support into spark where we have suffix for the amount of the resource so that we could support other configs.

Spark on yarn already had added configs to request resources via the configs spark.yarn.{executor/driver/am}.resource=<some amount>, where the <some amount> is value and unit together.  We should change those configs to have a `.amount` suffix on them to match the spark configs and to allow future configs to be more easily added. YARN itself already supports tags and attributes so if we want the user to be able to pass those from spark at some point having a suffix makes sense. it would allow for a spark.yarn.{executor/driver/am}.resource.{resource}.tag= type config.

## How was this patch tested?

Tested via unit tests and manually on a yarn 3.x cluster with GPU resources configured on.

Closes #24989 from tgravescs/SPARK-27959-yarn-resourceconfigs.

Authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-07-16 10:56:07 -07:00
Gabor Somogyi 8313015e8d [SPARK-28005][YARN] Remove unnecessary log from SparkRackResolver
## What changes were proposed in this pull request?

SparkRackResolver generates an INFO message every time is called with 0 arguments.
In this PR I've deleted it because it's too verbose.

## How was this patch tested?

Existing unit tests + spark-shell.

Closes #24935 from gaborgsomogyi/SPARK-28005.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Imran Rashid <irashid@cloudera.com>
2019-06-26 09:50:54 -05:00
Xiangrui Meng 7056e004ee [SPARK-27823][CORE] Refactor resource handling code
## What changes were proposed in this pull request?

Continue the work from https://github.com/apache/spark/pull/24821. Refactor resource handling code to make the code more readable. Major changes:

* Moved resource-related classes to `spark.resource` from `spark`.
* Added ResourceUtils and helper classes so we don't need to directly deal with Spark conf.
 * ResourceID: resource identifier and it provides conf keys
 * ResourceRequest/Allocation: abstraction for requested and allocated resources
* Added `TestResourceIDs` to reference commonly used resource IDs in tests like `spark.executor.resource.gpu`.

cc: tgravescs jiangxb1987 Ngone51

## How was this patch tested?

Unit tests for added utils and existing unit tests.

Closes #24856 from mengxr/SPARK-27823.

Lead-authored-by: Xiangrui Meng <meng@databricks.com>
Co-authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>
2019-06-18 17:18:17 -07:00
Thomas Graves d30284b5a5 [SPARK-27760][CORE] Spark resources - change user resource config from .count to .amount
## What changes were proposed in this pull request?

Change the resource config spark.{executor/driver}.resource.{resourceName}.count to .amount to allow future usage of containing both a count and a unit.  Right now we only support counts - # of gpus for instance, but in the future we may want to support units for things like memory - 25G. I think making the user only have to specify a single config .amount is better then making them specify 2 separate configs of a .count and then a .unit.  Change it now since its a user facing config.

Amount also matches how the spark on yarn configs are setup.

## How was this patch tested?

Unit tests and manually verified on yarn and local cluster mode

Closes #24810 from tgravescs/SPARK-27760-amount.

Authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2019-06-06 14:16:05 -05:00
HyukjinKwon 8b18ef5c7b [MINOR] Avoid hardcoded py4j-0.10.8.1-src.zip in Scala
## What changes were proposed in this pull request?

This PR targets to deduplicate hardcoded `py4j-0.10.8.1-src.zip` in order to make py4j upgrade easier.

## How was this patch tested?

N/A

Closes #24770 from HyukjinKwon/minor-py4j-dedup.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-06-02 21:23:17 -07:00
Steven Rand 568512cc82 [SPARK-27773][SHUFFLE] add metrics for number of exceptions caught in ExternalShuffleBlockHandler
## What changes were proposed in this pull request?

Add a metric for number of exceptions caught in the `ExternalShuffleBlockHandler`, the idea being that spikes in this metric over some time window (or more desirably, the lack thereof) can be used as an indicator of the health of an external shuffle service. (Where "health" refers to its ability to successfully respond to client requests.)

## How was this patch tested?

Deployed a build of this PR to a YARN cluster, and confirmed that the NodeManagers' JMX metrics include `numCaughtExceptions`.

Closes #24645 from sjrand/SPARK-27773.

Authored-by: Steven Rand <srand@palantir.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-05-30 13:57:15 -07:00
Thomas Graves 0ced4c0b13 [SPARK-27378][YARN] YARN support for GPU-aware scheduling
## What changes were proposed in this pull request?

Add yarn support for GPU-aware scheduling. Since SPARK-20327 already added yarn custom resource support, this jira is really just making sure the spark resource configs get mapped into the yarn resource configs and user doesn't specify both yarn and spark config for the known types of resources (gpu and fpga are the known types on yarn).

You can find more details on the design and requirements documented: https://issues.apache.org/jira/browse/SPARK-27376

Note that the running on yarn docs already state to use it, it must be yarn 3.0+. We will add any further documentation under SPARK-20327

## How was this patch tested?

Unit tests and manually testing on yarn cluster

Closes #24634 from tgravescs/SPARK-27361.

Authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-05-30 13:23:46 -07:00
wenxuanguan e7443d6412 [SPARK-27774][CORE][MLLIB] Avoid hardcoded configs
## What changes were proposed in this pull request?

avoid hardcoded configs in `SparkConf` and `SparkSubmit` and test

## How was this patch tested?

N/A

Closes #24631 from wenxuanguan/minor-fix.

Authored-by: wenxuanguan <choose_home@126.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-05-22 10:45:11 +09:00
Thomas Graves db2e3c4341 [SPARK-27024] Executor interface for cluster managers to support GPU and other resources
## What changes were proposed in this pull request?

Add in GPU and generic resource type allocation to the executors.

Note this is part of a bigger feature for gpu-aware scheduling and is just how the executor find the resources. The general flow :

   - users ask for a certain set of resources, for instance number of gpus - each cluster manager has a specific way to do this.
  -  cluster manager allocates a container or set of resources (standalone mode)
-    When spark launches the executor in that container, the executor either has to be told what resources it has or it has to auto discover them.
  -  Executor has to register with Driver and tell the driver the set of resources it has so the scheduler can use that to schedule tasks that requires a certain amount of each of those resources

In this pr I added configs and arguments to the executor to be able discover resources. The argument to the executor is intended to be used by standalone mode or other cluster managers that don't have isolation so that it can assign specific resources to specific executors in case there are multiple executors on a node. The argument is a file contains JSON Array of ResourceInformation objects.

The discovery script is meant to be used in an isolated environment where the executor only sees the resources it should use.

Note that there will be follow on PRs to add other parts like the scheduler part. See the epic high level jira: https://issues.apache.org/jira/browse/SPARK-24615

## How was this patch tested?

Added unit tests and manually tested.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Closes #24406 from tgravescs/gpu-sched-executor-clean.

Authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2019-05-14 08:41:41 -05:00
Adi Muraru 8ef4da753d [SPARK-27610][YARN] Shade netty native libraries
## What changes were proposed in this pull request?

Fixed the `spark-<version>-yarn-shuffle.jar` artifact packaging to shade the native netty libraries:
- shade the `META-INF/native/libnetty_*` native libraries when packagin
the yarn shuffle service jar. This is required as netty library loader
derives that based on shaded package name.
- updated the `org/spark_project` shade package prefix to `org/sparkproject`
(i.e. removed underscore) as the former breaks the netty native lib loading.

This was causing the yarn external shuffle service to fail
when spark.shuffle.io.mode=EPOLL

## How was this patch tested?
Manual tests

Closes #24502 from amuraru/SPARK-27610_master.

Authored-by: Adi Muraru <amuraru@adobe.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-05-07 10:47:36 -07:00
Sean Owen a6716d3f03 [SPARK-27571][CORE][YARN][EXAMPLES] Avoid scala.language.reflectiveCalls
## What changes were proposed in this pull request?

This PR avoids usage of reflective calls in Scala. It removes the import that suppresses the warnings and rewrites code in small ways to avoid accessing methods that aren't technically accessible.

## How was this patch tested?

Existing tests.

Closes #24463 from srowen/SPARK-27571.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-04-29 11:16:45 -05:00
gatorsmile cd4a284030 [SPARK-27460][FOLLOW-UP][TESTS] Fix flaky tests
## What changes were proposed in this pull request?

This patch makes several test flakiness fixes.

## How was this patch tested?
N/A

Closes #24434 from gatorsmile/fixFlakyTest.

Lead-authored-by: gatorsmile <gatorsmile@gmail.com>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-04-24 17:36:29 +08:00
Sean Owen 4ec7f631aa [SPARK-27404][CORE][SQL][STREAMING][YARN] Fix build warnings for 3.0: postfixOps edition
## What changes were proposed in this pull request?

Fix build warnings -- see some details below.

But mostly, remove use of postfix syntax where it causes warnings without the `scala.language.postfixOps` import. This is mostly in expressions like "120000 milliseconds". Which, I'd like to simplify to things like "2.minutes" anyway.

## How was this patch tested?

Existing tests.

Closes #24314 from srowen/SPARK-27404.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-04-11 13:43:44 -05:00
LantaoJin 52838e74af [SPARK-13704][CORE][YARN] Reduce rack resolution time
## What changes were proposed in this pull request?

When you submit a stage on a large cluster, rack resolving takes a long time when initializing TaskSetManager because a script is invoked to resolve the rack of each host, one by one.
Based on current implementation, it takes 30~40 seconds to resolve the racks in our 5000 nodes' cluster. After applied the patch, it decreased to less than 15 seconds.

YARN-9332 has added an interface to handle multiple hosts in one invocation to save time. But before upgrading to the newest Hadoop, we could construct the same tool in Spark to resolve this issue.

## How was this patch tested?

UT and manually testing on a 5000 node cluster.

Closes #24245 from squito/SPARK-13704_update.

Lead-authored-by: LantaoJin <jinlantao@gmail.com>
Co-authored-by: Imran Rashid <irashid@cloudera.com>
Signed-off-by: Imran Rashid <irashid@cloudera.com>
2019-04-08 10:47:06 -05:00
Yuming Wang 13c5c1fb4b [SPARK-27180][BUILD][YARN] Fix testing issues with yarn module in Hadoop-3
## What changes were proposed in this pull request?

Fix testing issues with `yarn` module in Hadoop-3:

1. Upgrade jersey-1 to `1.19` to fix ```Cause: java.lang.NoClassDefFoundError: com/sun/jersey/spi/container/servlet/ServletContainer```.
2. Copy `ServerSocketUtil` from hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/ServerSocketUtil.java to fix ```java.lang.NoClassDefFoundError: org/apache/hadoop/net/ServerSocketUtil```.
3. Adapte `SessionHandler` from jetty-9.3.25.v20180904/jetty-server/src/main/java/org/eclipse/jetty/server/session/SessionHandler.java  to fix ```java.lang.NoSuchMethodError: org.eclipse.jetty.server.session.SessionHandler.getSessionManager()Lorg/eclipse/jetty/server/SessionManager```.

## How was this patch tested?

manual tests:
```shell
build/sbt yarn/test -Pyarn
build/sbt yarn/test -Phadoop-3.2 -Pyarn

build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.deploy.yarn.YarnClusterSuite -pl resource-managers/yarn test -Pyarn
build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.deploy.yarn.YarnClusterSuite -pl resource-managers/yarn test -Pyarn -Phadoop-3.2
```

Closes #24115 from wangyum/hadoop3-yarn.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-04-02 15:38:26 -05:00
Sean Owen d4420b455a [SPARK-27323][CORE][SQL][STREAMING] Use Single-Abstract-Method support in Scala 2.12 to simplify code
## What changes were proposed in this pull request?

Use Single Abstract Method syntax where possible (and minor related cleanup). Comments below. No logic should change here.

## How was this patch tested?

Existing tests.

Closes #24241 from srowen/SPARK-27323.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-04-02 07:37:05 -07:00
10087686 8204dc1e54 [SPARK-27141][YARN] Use ConfigEntry for hardcoded configs for Yarn
## What changes were proposed in this pull request?
There is some hardcode configs in code, I think it best to modify。

## How was this patch tested?
Existing tests

Closes #24103 from wangjiaochun/yarnHardCode.

Authored-by: 10087686 <wang.jiaochun@zte.com.cn>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-03-22 05:29:29 -05:00
Marcelo Vanzin ec5e34205a [SPARK-27094][YARN] Work around RackResolver swallowing thread interrupt.
To avoid the case where the YARN libraries would swallow the exception and
prevent YarnAllocator from shutting down, call the offending code in a
separate thread, so that the parent thread can respond appropriately to
the shut down.

As a safeguard, also explicitly stop the executor launch thread pool when
shutting down the application, to prevent new executors from coming up
after the application started its shutdown.

Tested with unit tests + some internal tests on real cluster.

Closes #24017 from vanzin/SPARK-27094.

Authored-by: Marcelo Vanzin <vanzin@cloudera.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-03-20 11:48:06 -07:00
Ajith c324e1da9d [SPARK-27122][CORE] Jetty classes must not be return via getters in org.apache.spark.ui.WebUI
## What changes were proposed in this pull request?

When we run YarnSchedulerBackendSuite, the class path seems to be made from the classes folder(resource-managers/yarn/target/scala-2.12/classes) instead of jar (resource-managers/yarn/target/spark-yarn_2.12-3.0.0-SNAPSHOT.jar) . ui.getHandlers is in spark-core and its loaded from spark-core.jar which is shaded and hence refers to org.spark_project.jetty.servlet.ServletContextHandler

Here in  org.apache.spark.scheduler.cluster.YarnSchedulerBackend, as its not shaded, it expects org.eclipse.jetty.servlet.ServletContextHandler
Refer discussion  https://issues.apache.org/jira/browse/SPARK-27122?focusedCommentId=16792318&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16792318

Hence as a fix, org.apache.spark.ui.WebUI must only return a wrapper class instance or references so that Jetty classes can be avoided in getters which are accessed outside spark-core

## How was this patch tested?

Existing UT can pass

Closes #24088 from ajithme/shadebug.

Authored-by: Ajith <ajith2489@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-03-17 06:44:02 -05:00
Liupengcheng cad475dcc9 [SPARK-26941][YARN] Fix incorrect computation of maxNumExecutorFailures in ApplicationMaster for streaming
## What changes were proposed in this pull request?

Currently, when enabled streaming dynamic allocation for streaming applications, the maxNumExecutorFailures in ApplicationMaster is still computed with `spark.dynamicAllocation.maxExecutors`.

Actually, we should consider `spark.streaming.dynamicAllocation.maxExecutors` instead.

Related codes:
f87153a3ac/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala (L101)

## How was this patch tested?

NA

Please review http://spark.apache.org/contributing.html before opening a pull request.

Closes #23845 from liupc/Fix-incorrect-maxNumExecutorFailures-for-streaming.

Lead-authored-by: Liupengcheng <liupengcheng@xiaomi.com>
Co-authored-by: liupengcheng <liupengcheng@xiaomi.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-03-16 19:45:05 -05:00
Yuming Wang d70b6a39e1 [MINOR][BUILD] Add 2 maven properties(hive.classifier and hive.parquet.group)
## What changes were proposed in this pull request?

This pr adds 2 maven properties to help us upgrade the built-in Hive.

| Property Name | Default | In future |
| ------ | ------ | ------ |
| hive.classifier | (none) | core |
| hive.parquet.group | com.twitter | org.apache.parquet |

## How was this patch tested?

existing tests

Closes #23996 from wangyum/add_2_maven_properties.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-03-07 16:46:07 -06:00
“attilapiros” caceaec932 [SPARK-26688][YARN] Provide configuration of initially blacklisted YARN nodes
## What changes were proposed in this pull request?

Introducing new config for initially blacklisted YARN nodes.

## How was this patch tested?

With existing and a new unit test.

Closes #23616 from attilapiros/SPARK-26688.

Lead-authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Co-authored-by: Attila Zsolt Piros <2017933+attilapiros@users.noreply.github.com>
Signed-off-by: Imran Rashid <irashid@cloudera.com>
2019-03-04 14:14:20 -06:00