The bug was introduced in SPARK-22290, which changed how the app's user
is impersonated in the AM. The changed missed an initialization function
that needs to be run as the app owner (who has the right credentials to
read from HDFS).
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes#19566 from vanzin/SPARK-22341.
Hive delegation tokens are only needed when the Spark driver has no access
to the kerberos TGT. That happens only in two situations:
- when using a proxy user
- when using cluster mode without a keytab
This change modifies the Hive provider so that it only generates delegation
tokens in those situations, and tweaks the YARN AM so that it makes the proper
user visible to the Hive code when running with keytabs, so that the TGT
can be used instead of a delegation token.
The effect of this change is that now it's possible to initialize multiple,
non-concurrent SparkContext instances in the same JVM. Before, the second
invocation would fail to fetch a new Hive delegation token, which then could
make the second (or third or...) application fail once the token expired.
With this change, the TGT will be used to authenticate to the HMS instead.
This change also avoids polluting the current logged in user's credentials
when launching applications. The credentials are copied only when running
applications as a proxy user. This makes it possible to implement SPARK-11035
later, where multiple threads might be launching applications, and each app
should have its own set of credentials.
Tested by verifying HDFS and Hive access in following scenarios:
- client and cluster mode
- client and cluster mode with proxy user
- client and cluster mode with principal / keytab
- long-running cluster app with principal / keytab
- pyspark app that creates (and stops) multiple SparkContext instances
through its lifetime
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes#19509 from vanzin/SPARK-22290.
## What changes were proposed in this pull request?
When the libraries temp directory(i.e. __spark_libs__*.zip dir) file system and staging dir(destination) file systems are the same then the __spark_libs__*.zip is not copying to the staging directory. But after making this decision the libraries zip file is getting deleted immediately and becoming unavailable for the Node Manager's localization.
With this change, client copies the files to remote always when the source scheme is "file".
## How was this patch tested?
I have verified it manually in yarn/cluster and yarn/client modes with hdfs and local file systems.
Author: Devaraj K <devaraj@apache.org>
Closes#19141 from devaraj-kavali/SPARK-21384.
The live listener bus now cleans up after itself and releases listeners
after stopping, so code cannot get references to listeners after the
Spark context is stopped.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes#19297 from vanzin/SPARK-18838.hotfix.
…build; fix some things that will be warnings or errors in 2.12; restore Scala 2.12 profile infrastructure
## What changes were proposed in this pull request?
This change adds back the infrastructure for a Scala 2.12 build, but does not enable it in the release or Python test scripts.
In order to make that meaningful, it also resolves compile errors that the code hits in 2.12 only, in a way that still works with 2.11.
It also updates dependencies to the earliest minor release of dependencies whose current version does not yet support Scala 2.12. This is in a sense covered by other JIRAs under the main umbrella, but implemented here. The versions below still work with 2.11, and are the _latest_ maintenance release in the _earliest_ viable minor release.
- Scalatest 2.x -> 3.0.3
- Chill 0.8.0 -> 0.8.4
- Clapper 1.0.x -> 1.1.2
- json4s 3.2.x -> 3.4.2
- Jackson 2.6.x -> 2.7.9 (required by json4s)
This change does _not_ fully enable a Scala 2.12 build:
- It will also require dropping support for Kafka before 0.10. Easy enough, just didn't do it yet here
- It will require recreating `SparkILoop` and `Main` for REPL 2.12, which is SPARK-14650. Possible to do here too.
What it does do is make changes that resolve much of the remaining gap without affecting the current 2.11 build.
## How was this patch tested?
Existing tests and build. Manually tested with `./dev/change-scala-version.sh 2.12` to verify it compiles, modulo the exceptions above.
Author: Sean Owen <sowen@cloudera.com>
Closes#18645 from srowen/SPARK-14280.
In the current code, if NM recovery is not enabled then `YarnShuffleService` will write shuffle metadata to NM local dir-1, if this local dir-1 is on bad disk, then `YarnShuffleService` will be failed to start. So to solve this issue, in Spark side if NM recovery is not enabled, then Spark will not persist data into leveldb, in that case yarn shuffle service can still be served but lose the ability for recovery, (it is fine because the failure of NM will kill the containers as well as applications).
Tested in the local cluster with NM recovery off and on to see if folder is created or not. MiniCluster UT isn't added because in MiniCluster NM will always set port to 0, but NM recovery requires non-ephemeral port.
Author: jerryshao <sshao@hortonworks.com>
Closes#19032 from jerryshao/SPARK-17321.
Change-Id: I8f2fe73d175e2ad2c4e380caede3873e0192d027
## What changes were proposed in this pull request?
Fix typos
## How was this patch tested?
Existing tests
Author: Andrew Ash <andrew@andrewash.com>
Closes#18996 from ash211/patch-2.
## What changes were proposed in this pull request?
Add Kerberos Support to Mesos. This includes kinit and --keytab support, but does not include delegation token renewal.
## How was this patch tested?
Manually against a Secure DC/OS Apache HDFS cluster.
Author: ArtRand <arand@soe.ucsc.edu>
Author: Michael Gummelt <mgummelt@mesosphere.io>
Closes#18519 from mgummelt/SPARK-16742-kerberos.
This version fixes a few issues in the import order checker; it provides
better error messages, and detects more improper ordering (thus the need
to change a lot of files in this patch). The main fix is that it correctly
complains about the order of packages vs. classes.
As part of the above, I moved some "SparkSession" import in ML examples
inside the "$example on$" blocks; that didn't seem consistent across
different source files to start with, and avoids having to add more on/off blocks
around specific imports.
The new scalastyle also seems to have a better header detector, so a few
license headers had to be updated to match the expected indentation.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes#18943 from vanzin/SPARK-21731.
The main goal of this change is to avoid the situation described
in the bug, where an AM restart in the middle of a job may cause
no new executors to be allocated because of faulty logic in the
reset path.
The change does two things:
- fixes the executor alloc manager's reset() so that it does not
stop allocation after a reset() in the middle of a job
- re-orders the initialization of the YarnAllocator class so that
it fetches the current executor ID before triggering the reset()
above.
This ensures both that the new allocator gets new requests for executors,
and that it starts from the correct executor id.
Tested with unit tests and by manually causing AM restarts while
running jobs using spark-shell in YARN mode.
Closes#17882
Author: Marcelo Vanzin <vanzin@cloudera.com>
Author: Guoqiang Li <witgo@qq.com>
Closes#18663 from vanzin/SPARK-20079.
If you run a spark job without creating the SparkSession or SparkContext, the spark job logs says it succeeded but yarn says it fails and retries 3 times. Also, since, Application Master unregisters with Resource Manager and exits successfully, it deletes the spark staging directory, so when yarn makes subsequent retries, it fails to find the staging directory and thus, the retries fail.
Added a flag to check whether user has initialized SparkContext. If it is true, we let Application Master unregister with Resource Manager else, we do not let AM unregister with RM.
## How was this patch tested?
Manually tested the fix.
Before:
<img width="1253" alt="screen shot-before" src="https://user-images.githubusercontent.com/22228190/28647214-69bf81e2-722b-11e7-9ed0-d416d2bf23be.png">
After:
<img width="1319" alt="screen shot-after" src="https://user-images.githubusercontent.com/22228190/28647220-70f9eea2-722b-11e7-85c6-e56276b15614.png">
Please review http://spark.apache.org/contributing.html before opening a pull request.
Author: pgandhi <pgandhi@yahoo-inc.com>
Author: pgandhi999 <parthkgandhi9@gmail.com>
Closes#18741 from pgandhi999/SPARK-21541.
There was some code based on the old SASL handler in the new auth client that
was incorrectly using the SASL user as the user to authenticate against the
external shuffle service. This caused the external service to not be able to
find the correct secret to authenticate the connection, failing the connection.
In the course of debugging, I found that some log messages from the YARN shuffle
service were a little noisy, so I silenced some of them, and also added a couple
of new ones that helped find this issue. On top of that, I found that a check
in the code that records app secrets was wrong, causing more log spam and also
using an O(n) operation instead of an O(1) call.
Also added a new integration suite for the YARN shuffle service with auth on,
and verified it failed before, and passes now.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes#18706 from vanzin/SPARK-21494.
When NodeManagers launching Executors,
the `missing` value will exceed the
real value when the launch is slow, this can lead to YARN allocates more resource.
We add the `numExecutorsRunning` when calculate the `missing` to avoid this.
Test by experiment.
Author: DjvuLee <lihu@bytedance.com>
Closes#18651 from djvulee/YarnAllocate.
Instead of using the host's cpu count, use the number of cores allocated
for the Spark process when sizing the RPC dispatch thread pool. This avoids
creating large thread pools on large machines when the number of allocated
cores is small.
Tested by verifying number of threads with spark.executor.cores set
to 1 and 4; same thing for YARN AM.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes#18639 from vanzin/SPARK-21408.
## What changes were proposed in this pull request?
In the current `YARNHadoopDelegationTokenManager`, `FileSystem` to which to get tokens are created out of KDC logged UGI, using these `FileSystem` to get new tokens will lead to exception. The main thing is that Spark code trying to get new tokens from the FS created with token auth-ed UGI, but Hadoop can only grant new tokens in kerberized UGI. To fix this issue, we should lazily create these FileSystem within KDC logged UGI.
## How was this patch tested?
Manual verification in secure cluster.
CC vanzin mgummelt please help to review, thanks!
Author: jerryshao <sshao@hortonworks.com>
Closes#18633 from jerryshao/SPARK-21411.
## What changes were proposed in this pull request?
In this issue we have a long running Spark application with secure HBase, which requires `HBaseCredentialProvider` to get tokens periodically, we specify HBase related jars with `--packages`, but these dependencies are not added into AM classpath, so when `HBaseCredentialProvider` tries to initialize HBase connections to get tokens, it will be failed.
Currently because jars specified with `--jars` or `--packages` are not added into AM classpath, the only way to extend AM classpath is to use "spark.driver.extraClassPath" which supposed to be used in yarn cluster mode.
So in this fix, we proposed to use/reuse a classloader for `AMCredentialRenewer` to acquire new tokens.
Also in this patch, we fixed AM cannot get tokens from HDFS issue, it is because FileSystem is gotten before kerberos logged, so using this FS to get tokens will throw exception.
## How was this patch tested?
Manual verification.
Author: jerryshao <sshao@hortonworks.com>
Closes#18616 from jerryshao/SPARK-21377.
## What changes were proposed in this pull request?
The current code is very verbose on shutdown.
The changes I propose is to change the log level when the driver is shutting down and the RPC connections are closed (RpcEnvStoppedException).
## How was this patch tested?
Tested with word count(deploy-mode = cluster, master = yarn, num-executors = 4) with 300GB of data.
Author: John Lee <jlee2@yahoo-inc.com>
Closes#18547 from yoonlee95/SPARK-21321.
When localizing the gateway config files in a YARN application, avoid
overwriting final configs by distributing the gateway files to a separate
directory, and explicitly loading them into the Hadoop config, instead
of placing those files before the cluster's files in the classpath.
This is done by saving the gateway's config to a separate XML file
distributed with the rest of the Spark app's config, and loading that
file when creating a new config through `YarnSparkHadoopUtil`.
Tested with existing unit tests, and by verifying the behavior in a YARN
cluster (final values are not overridden, non-final values are).
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes#18370 from vanzin/SPARK-9825.
## What changes were proposed in this pull request?
This issue happens in long running application with yarn cluster mode, because yarn#client doesn't sync token with AM, so it will always keep the initial token, this token may be expired in the long running scenario, so when yarn#client tries to clean up staging directory after application finished, it will use this expired token and meet token expire issue.
## How was this patch tested?
Manual verification is secure cluster.
Author: jerryshao <sshao@hortonworks.com>
Closes#18617 from jerryshao/SPARK-21376.
Currently the code monitoring the launch of the client AM uses the value of
spark.yarn.report.interval as the interval for polling the RM; if someone
has that value to a really large interval, it would take that long to detect
that the client AM has started, which is not expected.
Instead, have a separate config for the interval to use when the client AM is
starting. The other config is still used in cluster mode, and to detect the
status of the client AM after it is already running.
Tested by running client and cluster mode apps with a modified value of
spark.yarn.report.interval, verifying client AM launch is detected before
that interval elapses.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes#18380 from vanzin/SPARK-16019.
## What changes were proposed in this pull request?
This PR aims to bump Py4J in order to fix the following float/double bug.
Py4J 0.10.5 fixes this (https://github.com/bartdag/py4j/issues/272) and the latest Py4J is 0.10.6.
**BEFORE**
```
>>> df = spark.range(1)
>>> df.select(df['id'] + 17.133574204226083).show()
+--------------------+
|(id + 17.1335742042)|
+--------------------+
| 17.1335742042|
+--------------------+
```
**AFTER**
```
>>> df = spark.range(1)
>>> df.select(df['id'] + 17.133574204226083).show()
+-------------------------+
|(id + 17.133574204226083)|
+-------------------------+
| 17.133574204226083|
+-------------------------+
```
## How was this patch tested?
Manual.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes#18546 from dongjoon-hyun/SPARK-21278.
## What changes were proposed in this pull request?
When I set different clusters for "spark.hadoop.fs.defaultFS" and "spark.yarn.stagingDir" as follows:
```
spark.hadoop.fs.defaultFS hdfs://tl-nn-tdw.tencent-distribute.com:54310
spark.yarn.stagingDir hdfs://ss-teg-2-v2/tmp/spark
```
The staging dir can not be deleted, it will prompt following message:
```
java.lang.IllegalArgumentException: Wrong FS: hdfs://ss-teg-2-v2/tmp/spark/.sparkStaging/application_1496819138021_77618, expected: hdfs://tl-nn-tdw.tencent-distribute.com:54310
```
## How was this patch tested?
Existing tests
Author: sharkdtu <sharkdtu@tencent.com>
Closes#18352 from sharkdtu/master.
## What changes were proposed in this pull request?
Move Hadoop delegation token code from `spark-yarn` to `spark-core`, so that other schedulers (such as Mesos), may use it. In order to avoid exposing Hadoop interfaces in spark-core, the new Hadoop delegation token classes are kept private. In order to provider backward compatiblity, and to allow YARN users to continue to load their own delegation token providers via Java service loading, the old YARN interfaces, as well as the client code that uses them, have been retained.
Summary:
- Move registered `yarn.security.ServiceCredentialProvider` classes from `spark-yarn` to `spark-core`. Moved them into a new, private hierarchy under `HadoopDelegationTokenProvider`. Client code in `HadoopDelegationTokenManager` now loads credentials from a whitelist of three providers (`HadoopFSDelegationTokenProvider`, `HiveDelegationTokenProvider`, `HBaseDelegationTokenProvider`), instead of service loading, which means that users are not able to implement their own delegation token providers, as they are in the `spark-yarn` module.
- The `yarn.security.ServiceCredentialProvider` interface has been kept for backwards compatibility, and to continue to allow YARN users to implement their own delegation token provider implementations. Client code in YARN now fetches tokens via the new `YARNHadoopDelegationTokenManager` class, which fetches tokens from the core providers through `HadoopDelegationTokenManager`, as well as service loads them from `yarn.security.ServiceCredentialProvider`.
Old Hierarchy:
```
yarn.security.ServiceCredentialProvider (service loaded)
HadoopFSCredentialProvider
HiveCredentialProvider
HBaseCredentialProvider
yarn.security.ConfigurableCredentialManager
```
New Hierarchy:
```
HadoopDelegationTokenManager
HadoopDelegationTokenProvider (not service loaded)
HadoopFSDelegationTokenProvider
HiveDelegationTokenProvider
HBaseDelegationTokenProvider
yarn.security.ServiceCredentialProvider (service loaded)
yarn.security.YARNHadoopDelegationTokenManager
```
## How was this patch tested?
unit tests
Author: Michael Gummelt <mgummelt@mesosphere.io>
Author: Dr. Stefan Schimanski <sttts@mesosphere.io>
Closes#17723 from mgummelt/SPARK-20434-refactor-kerberos.
In Spark on YARN, when configuring "spark.yarn.jars" with local jars (jars started with "local" scheme), we will get inaccurate classpath for AM and containers. This is because we don't remove "local" scheme when concatenating classpath. It is OK to run because classpath is separated with ":" and java treat "local" as a separate jar. But we could improve it to remove the scheme.
Updated `ClientSuite` to check "local" is not in the classpath.
cc jerryshao
Author: Li Yichao <lyc@zhihu.com>
Author: Li Yichao <liyichao.good@gmail.com>
Closes#18129 from liyichao/SPARK-20365.
## What changes were proposed in this pull request?
Deleted generated JARs archive after distribution to HDFS
## How was this patch tested?
Please review http://spark.apache.org/contributing.html before opening a pull request.
Author: Lior Regev <lioregev@gmail.com>
Closes#17986 from liorregev/master.
## What changes were proposed in this pull request?
After SPARK-10997, client mode Netty RpcEnv doesn't require to start server, so port configurations are not used any more, here propose to remove these two configurations: "spark.executor.port" and "spark.am.port".
## How was this patch tested?
Existing UTs.
Author: jerryshao <sshao@hortonworks.com>
Closes#17866 from jerryshao/SPARK-20605.
## What changes were proposed in this pull request?
Currently, `spark.executor.instances` is deprecated in `spark-env.sh`, because we suggest config it in `spark-defaults.conf` or other config file. And also this parameter is useless even if you set it in `spark-env.sh`, so remove it in this patch.
## How was this patch tested?
Existing tests.
Please review http://spark.apache.org/contributing.html before opening a pull request.
Author: Xianyang Liu <xianyang.liu@intel.com>
Closes#17881 from ConeyLiu/deprecatedParam.
## What changes were proposed in this pull request?
Fix build warnings primarily related to Breeze 0.13 operator changes, Java style problems
## How was this patch tested?
Existing tests
Author: Sean Owen <sowen@cloudera.com>
Closes#17803 from srowen/SPARK-20523.
This change does a more thorough redaction of sensitive information from logs and UI
Add unit tests that ensure that no regressions happen that leak sensitive information to the logs.
The motivation for this change was appearance of password like so in `SparkListenerEnvironmentUpdate` in event logs under some JVM configurations:
`"sun.java.command":"org.apache.spark.deploy.SparkSubmit ... --conf spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password ..."
`
Previously redaction logic was only checking if the key matched the secret regex pattern, it'd redact it's value. That worked for most cases. However, in the above case, the key (sun.java.command) doesn't tell much, so the value needs to be searched. This PR expands the check to check for values as well.
## How was this patch tested?
New unit tests added that ensure that no sensitive information is present in the event logs or the yarn logs. Old unit test in UtilsSuite was modified because the test was asserting that a non-sensitive property's value won't be redacted. However, the non-sensitive value had the literal "secret" in it which was causing it to redact. Simply updating the non-sensitive property's value to another arbitrary value (that didn't have "secret" in it) fixed it.
Author: Mark Grover <mark@apache.org>
Closes#17725 from markgrover/spark-20435.
## What changes were proposed in this pull request?
Typo fix: distitrbuted -> distributed
## How was this patch tested?
Existing tests
Author: Andrew Ash <andrew@andrewash.com>
Closes#17664 from ash211/patch-1.
## What changes were proposed in this pull request?
Add Locale.ROOT to internal calls to String `toLowerCase`, `toUpperCase`, to avoid inadvertent locale-sensitive variation in behavior (aka the "Turkish locale problem").
The change looks large but it is just adding `Locale.ROOT` (the locale with no country or language specified) to every call to these methods.
## How was this patch tested?
Existing tests.
Author: Sean Owen <sowen@cloudera.com>
Closes#17527 from srowen/SPARK-20156.
Current test code tries to override the RackResolver used by setting
configuration params, but because YARN libs statically initialize the
resolver the first time it's used, that means that those configs don't
really take effect during Spark tests.
This change adds a wrapper class that easily allows tests to override the
behavior of the resolver for the Spark code that uses it.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes#17508 from vanzin/SPARK-20191.
## What changes were proposed in this pull request?
Currently we use system classloader to find HBase jars, if it is specified by `--jars`, then it will be failed with ClassNotFound issue. So here changing to use child classloader.
Also putting added jars and main jar into classpath of submitted application in yarn cluster mode, otherwise HBase jars specified with `--jars` will never be honored in cluster mode, and fetching tokens in client side will always be failed.
## How was this patch tested?
Unit test and local verification.
Author: jerryshao <sshao@hortonworks.com>
Closes#17388 from jerryshao/SPARK-20059.
## What changes were proposed in this pull request?
In the current Spark on YARN code, we will obtain tokens from provided services, but we're not going to add these tokens to the current user's credentials. This will make all the following operations to these services still require TGT rather than delegation tokens. This is unnecessary since we already got the tokens, also this will lead to failure in user impersonation scenario, because the TGT is granted by real user, not proxy user.
So here changing to put all the tokens to the current UGI, so that following operations to these services will honor tokens rather than TGT, and this will further handle the proxy user issue mentioned above.
## How was this patch tested?
Local verified in secure cluster.
vanzin tgravescs mridulm dongjoon-hyun please help to review, thanks a lot.
Author: jerryshao <sshao@hortonworks.com>
Closes#17335 from jerryshao/SPARK-19995.
## What changes were proposed in this pull request?
Adding additional information to existing logging messages:
- YarnAllocator: log the executor ID together with the container id when a container for an executor is launched.
- NettyRpcEnv: log the receiver address when there is a timeout waiting for an answer to a remote call.
- ExecutorAllocationManager: fix a typo in the logging message for the list of executors to be removed.
## How was this patch tested?
Build spark and submit the word count example to a YARN cluster using cluster mode
Author: Juan Rodriguez Hortala <hortala@amazon.com>
Closes#17411 from juanrh/logging-improvements.
Add parentheses so that both lines form a single statement; also add
a log message so that the issue becomes more explicit if it shows up
again.
Tested manually with integration test that exercises the feature.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes#17198 from vanzin/SPARK-19857.
## What changes were proposed in this pull request?
self.environment will be propagated to executor. Should set PYTHONHASHSEED as long as the python version is greater than 3.3
## How was this patch tested?
Manually tested it.
Author: Jeff Zhang <zjffdu@apache.org>
Closes#11211 from zjffdu/SPARK-13330.
## What changes were proposed in this pull request?
Because yarn#client will reset the `spark.yarn.keytab` configuration to point to the location in distributed file, so if user still uses the old `SparkConf` to create `SparkSession` with Hive enabled, it will read keytab from the path in distributed cached. This is OK for yarn cluster mode, but in yarn client mode where driver is running out of container, it will be failed to fetch the keytab.
So here we should avoid reseting this configuration in the `yarn#client` and only overwriting it for AM, so using `spark.yarn.keytab` could get correct keytab path no matter running in client (keytab in local fs) or cluster (keytab in distributed cache) mode.
## How was this patch tested?
Verified in security cluster.
Author: jerryshao <sshao@hortonworks.com>
Closes#16923 from jerryshao/SPARK-19038.
Allow an application to use the History Server URL as the tracking
URL in the YARN RM, so there's still a link to the web UI somewhere
in YARN even if the driver's UI is disabled. This is useful, for
example, if an admin wants to disable the driver UI by default for
applications, since it's harder to secure it (since it involves non
trivial ssl certificate and auth management that admins may not want
to expose to user apps).
This needs to be opt-in, because of the way the YARN proxy works, so
a new configuration was added to enable the option.
The YARN RM will proxy requests to live AMs instead of redirecting
the client, so pages in the SHS UI will not render correctly since
they'll reference invalid paths in the RM UI. The proxy base support
in the SHS cannot be used since that would prevent direct access to
the SHS.
So, to solve this problem, for the feature to work end-to-end, a new
YARN-specific filter was added that detects whether the requests come
from the proxy and redirects the client appropriatly. The SHS admin has
to add this filter manually if they want the feature to work.
Tested with new unit test, and by running with the documented configuration
set in a test cluster. Also verified the driver UI is used when it's
enabled.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes#16946 from vanzin/SPARK-19554.
## What changes were proposed in this pull request?
In https://github.com/apache/spark/pull/14065, we introduced a configurable credential manager for Spark running on YARN. Also two configs `spark.yarn.credentials.renewalTime` and `spark.yarn.credentials.updateTime` were added, one is for the credential renewer and the other updater. But now we just query `spark.yarn.credentials.renewalTime` by mistake during CREDENTIALS UPDATING, where should be actually `spark.yarn.credentials.updateTime` .
This PR fixes this mistake.
## How was this patch tested?
existing test
cc jerryshao vanzin
Author: Kent Yao <yaooqinn@hotmail.com>
Closes#16955 from yaooqinn/cred_update.
## What changes were proposed in this pull request?
`askSync` is already added in `RpcEndpointRef` (see SPARK-19347 and https://github.com/apache/spark/pull/16690#issuecomment-276850068) and `askWithRetry` is marked as deprecated.
As mentioned SPARK-18113(https://github.com/apache/spark/pull/16503#event-927953218):
>askWithRetry is basically an unneeded API, and a leftover from the akka days that doesn't make sense anymore. It's prone to cause deadlocks (exactly because it's blocking), it imposes restrictions on the caller (e.g. idempotency) and other things that people generally don't pay that much attention to when using it.
Since `askWithRetry` is just used inside spark and not in user logic. It might make sense to replace all of them with `askSync`.
## How was this patch tested?
This PR doesn't change code logic, existing unit test can cover.
Author: jinxing <jinxing@meituan.com>
Closes#16790 from jinxing64/SPARK-19450.
- Move external/java8-tests tests into core, streaming, sql and remove
- Remove MaxPermGen and related options
- Fix some reflection / TODOs around Java 8+ methods
- Update doc references to 1.7/1.8 differences
- Remove Java 7/8 related build profiles
- Update some plugins for better Java 8 compatibility
- Fix a few Java-related warnings
For the future:
- Update Java 8 examples to fully use Java 8
- Update Java tests to use lambdas for simplicity
- Update Java internal implementations to use lambdas
## How was this patch tested?
Existing tests
Author: Sean Owen <sowen@cloudera.com>
Closes#16871 from srowen/SPARK-19493.
## What changes were proposed in this pull request?
Due to the newly added API in Hadoop 2.6.4+, Spark builds against Hadoop 2.6.0~2.6.3 will meet compile error. So here still reverting back to use reflection to handle this issue.
## How was this patch tested?
Manual verification.
Author: jerryshao <sshao@hortonworks.com>
Closes#16884 from jerryshao/SPARK-19545.
Add back mockito test dep in YARN module, as it ends up being required in a Maven build
## How was this patch tested?
PR builder again, but also a local `mvn` run using the command that the broken Jenkins job uses
Author: Sean Owen <sowen@cloudera.com>
Closes#16853 from srowen/SPARK-19464.2.
## What changes were proposed in this pull request?
- Remove support for Hadoop 2.5 and earlier
- Remove reflection and code constructs only needed to support multiple versions at once
- Update docs to reflect newer versions
- Remove older versions' builds and profiles.
## How was this patch tested?
Existing tests
Author: Sean Owen <sowen@cloudera.com>
Closes#16810 from srowen/SPARK-19464.
That method is prone to stack overflows when the input map is really
large; instead, use plain "map". Also includes a unit test that was
tested and caused stack overflows without the fix.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes#16667 from vanzin/SPARK-18750.
## What changes were proposed in this pull request?
remove ununsed imports and outdated comments, and fix some minor code style issue.
## How was this patch tested?
existing ut
Author: uncleGen <hustyugm@gmail.com>
Closes#16591 from uncleGen/SPARK-19227.
## What changes were proposed in this pull request?
`spark.yarn.access.namenodes` configuration cannot actually reflects the usage of it, inside the code it is the Hadoop filesystems we get tokens, not NNs. So here propose to update the name of this configuration, also change the related code and doc.
## How was this patch tested?
Local verification.
Author: jerryshao <sshao@hortonworks.com>
Closes#16560 from jerryshao/SPARK-19179.
## What changes were proposed in this pull request?
#16092 moves YARN resource manager related code to resource-managers/yarn directory. The test case ```YarnSchedulerBackendSuite``` was added after that but with the wrong place. I move it to correct directory in this PR.
## How was this patch tested?
Existing test.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes#16595 from yanboliang/yarn.
Currently Spark can only get token renewal interval from security HDFS (hdfs://), if Spark runs with other security file systems like webHDFS (webhdfs://), wasb (wasb://), ADLS, it will ignore these tokens and not get token renewal intervals from these tokens. These will make Spark unable to work with these security clusters. So instead of only checking HDFS token, we should generalize to support different DelegationTokenIdentifier.
## How was this patch tested?
Manually verified in security cluster.
Author: jerryshao <sshao@hortonworks.com>
Closes#16432 from jerryshao/SPARK-19021.
## What changes were proposed in this pull request?
There are many locations in the Spark repo where the same word occurs consecutively. Sometimes they are appropriately placed, but many times they are not. This PR removes the inappropriately duplicated words.
## How was this patch tested?
N/A since only docs or comments were updated.
Author: Niranjan Padmanabhan <niranjan.padmanabhan@gmail.com>
Closes#16455 from neurons/np.structure_streaming_doc.
## What changes were proposed in this pull request?
LauncherState should be only set to SUBMITTED after the application is submitted.
Currently the state is set before the application is actually submitted.
## How was this patch tested?
no test is added in this patch
Author: mingfei <mingfei.smf@alipay.com>
Closes#16459 from shimingfei/fixLauncher.
## What changes were proposed in this pull request?
The configuration `spark.yarn.security.tokens.{service}.enabled` is deprecated. Now we should use `spark.yarn.security.credentials.{service}.enabled`. Some places in the doc is not updated yet.
## How was this patch tested?
N/A. Just doc change.
Please review http://spark.apache.org/contributing.html before opening a pull request.
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes#16444 from viirya/minor-credential-provider-doc.
Remove spark-tag's compile-scope dependency (and, indirectly, spark-core's compile-scope transitive-dependency) on scalatest by splitting test-oriented tags into spark-tags' test JAR.
Alternative to #16303.
Author: Ryan Williams <ryan.blake.williams@gmail.com>
Closes#16311 from ryan-williams/tt.
## What changes were proposed in this pull request?
93cdb8a7d0 Introduced a compile error under scala 2.10, this fixes that error.
## How was this patch tested?
locally ran
```
dev/change-version-to-2.10.sh
build/sbt -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Dscala-2.10 "project yarn" "test-only *YarnAllocatorSuite"
```
(which failed at test compilation before this change)
Author: Imran Rashid <irashid@cloudera.com>
Closes#16298 from squito/blacklist-2.10.
## What changes were proposed in this pull request?
This builds upon the blacklisting introduced in SPARK-17675 to add blacklisting of executors and nodes for an entire Spark application. Resources are blacklisted based on tasks that fail, in tasksets that eventually complete successfully; they are automatically returned to the pool of active resources based on a timeout. Full details are available in a design doc attached to the jira.
## How was this patch tested?
Added unit tests, ran them via Jenkins, also ran a handful of them in a loop to check for flakiness.
The added tests include:
- verifying BlacklistTracker works correctly
- verifying TaskSchedulerImpl interacts with BlacklistTracker correctly (via a mock BlacklistTracker)
- an integration test for the entire scheduler with blacklisting in a few different scenarios
Author: Imran Rashid <irashid@cloudera.com>
Author: mwws <wei.mao@intel.com>
Closes#14079 from squito/blacklist-SPARK-8425.
## What changes were proposed in this pull request?
Fix `java.util.NoSuchElementException` when running Spark in non-hdfs security environment.
In the current code, we assume `HDFS_DELEGATION_KIND` token will be found in Credentials. But in some cloud environments, HDFS is not required, so we should avoid this exception.
## How was this patch tested?
Manually verified in local environment.
Author: jerryshao <sshao@hortonworks.com>
Closes#16265 from jerryshao/SPARK-18840.