---
layout: global
displayTitle: Spark Security
title: Security
---

* This will become a table of contents (this text will be scraped).
{:toc}

# Spark RPC

## Authentication

Spark currently supports authentication for RPC channels using a shared secret. Authentication can
be turned on by setting the `spark.authenticate` configuration parameter.

The exact mechanism used to generate and distribute the shared secret is deployment-specific.

For Spark on [YARN](running-on-yarn.html) and local deployments, Spark will automatically handle
generating and distributing the shared secret. Each application will use a unique shared secret. In
the case of YARN, this feature relies on YARN RPC encryption being enabled for the distribution of
secrets to be secure.

For other resource managers, `spark.authenticate.secret` must be configured on each of the nodes.
This secret will be shared by all the daemons and applications, so this deployment configuration is
not as secure as the above, especially when considering multi-tenant clusters. In this
configuration, a user with the secret can effectively impersonate any other user.

The REST Submission Server and the MesosClusterDispatcher do not support authentication. You should
ensure that all network access to the REST API and MesosClusterDispatcher (ports 6066 and 7077,
respectively, by default) is restricted to hosts that are trusted to submit jobs.

<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
  <td><code>spark.authenticate</code></td>
  <td>false</td>
  <td>Whether Spark authenticates its internal connections.</td>
</tr>
<tr>
  <td><code>spark.authenticate.secret</code></td>
  <td>None</td>
  <td>
    The secret key used for authentication. See above for when this configuration should be set.
  </td>
</tr>
</table>
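For example, on a standalone cluster the shared secret might be supplied as follows. The secret value here is a placeholder; the same secret must be configured on every daemon and every application:

```
# On the master and worker daemons (e.g. via conf/spark-env.sh):
export SPARK_MASTER_OPTS="-Dspark.authenticate=true -Dspark.authenticate.secret=placeholder-secret"
export SPARK_WORKER_OPTS="-Dspark.authenticate=true -Dspark.authenticate.secret=placeholder-secret"

# When submitting an application:
spark-submit \
    --conf spark.authenticate=true \
    --conf spark.authenticate.secret=placeholder-secret \
    my-app.jar
```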

## Encryption

Spark supports AES-based encryption for RPC connections. For encryption to be enabled, RPC
authentication must also be enabled and properly configured. AES encryption uses the
[Apache Commons Crypto](https://commons.apache.org/proper/commons-crypto/) library, and Spark's
configuration system allows access to that library's configuration for advanced users.

There is also support for SASL-based encryption, although it should be considered deprecated. It
is still required when talking to shuffle services from Spark versions older than 2.2.0.

The following table describes the different options available for configuring this feature.

<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
  <td><code>spark.network.crypto.enabled</code></td>
  <td>false</td>
  <td>
    Enable AES-based RPC encryption, including the new authentication protocol added in 2.2.0.
  </td>
</tr>
<tr>
  <td><code>spark.network.crypto.keyLength</code></td>
  <td>128</td>
  <td>
    The length in bits of the encryption key to generate. Valid values are 128, 192 and 256.
  </td>
</tr>
<tr>
  <td><code>spark.network.crypto.keyFactoryAlgorithm</code></td>
  <td>PBKDF2WithHmacSHA1</td>
  <td>
    The key factory algorithm to use when generating encryption keys. Should be one of the
    algorithms supported by the javax.crypto.SecretKeyFactory class in the JRE being used.
  </td>
</tr>
<tr>
  <td><code>spark.network.crypto.config.*</code></td>
  <td>None</td>
  <td>
    Configuration values for the commons-crypto library, such as which cipher implementations to
    use. The config name should be the name of the commons-crypto configuration without the
    <code>commons.crypto</code> prefix.
  </td>
</tr>
<tr>
  <td><code>spark.network.crypto.saslFallback</code></td>
  <td>true</td>
  <td>
    Whether to fall back to SASL authentication if authentication fails using Spark's internal
    mechanism. This is useful when the application is connecting to old shuffle services that
    do not support the internal Spark authentication protocol. On the shuffle service side,
    disabling this feature will block older clients from authenticating.
  </td>
</tr>
<tr>
  <td><code>spark.authenticate.enableSaslEncryption</code></td>
  <td>false</td>
  <td>
    Enable SASL-based encrypted communication.
  </td>
</tr>
<tr>
  <td><code>spark.network.sasl.serverAlwaysEncrypt</code></td>
  <td>false</td>
  <td>
    Disable unencrypted connections for ports using SASL authentication. This will deny connections
    from clients that have authentication enabled, but do not request SASL-based encryption.
  </td>
</tr>
</table>
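For intuition, the kind of key derivation selected by `spark.network.crypto.keyFactoryAlgorithm` (PBKDF2 with HMAC-SHA1) can be sketched with Python's standard library. The salt, iteration count and secret below are illustrative only, not Spark's actual parameters:

```python
import hashlib

def derive_key(secret: str, salt: bytes, key_length_bits: int = 128) -> bytes:
    """Stretch a shared secret into an AES key, PBKDF2-with-HMAC-SHA1 style."""
    if key_length_bits not in (128, 192, 256):
        raise ValueError("valid key lengths are 128, 192 and 256 bits")
    # dklen is in bytes; spark.network.crypto.keyLength is expressed in bits.
    return hashlib.pbkdf2_hmac("sha1", secret.encode(), salt,
                               iterations=10000, dklen=key_length_bits // 8)

key = derive_key("app-shared-secret", b"illustrative-salt", key_length_bits=256)
```

Both endpoints derive the same key from the shared secret, which is one reason RPC authentication must be configured before encryption can be enabled.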

# Local Storage Encryption

Spark supports encrypting temporary data written to local disks. This covers shuffle files, shuffle
spills and data blocks stored on disk (for both caching and broadcast variables). It does not cover
encrypting output data generated by applications with APIs such as `saveAsHadoopFile` or
`saveAsTable`.

The following settings cover enabling encryption for data written to disk:

<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
  <td><code>spark.io.encryption.enabled</code></td>
  <td>false</td>
  <td>
    Enable local disk I/O encryption. Currently supported by all modes except Mesos. It's strongly
    recommended that RPC encryption be enabled when using this feature.
  </td>
</tr>
<tr>
  <td><code>spark.io.encryption.keySizeBits</code></td>
  <td>128</td>
  <td>
    IO encryption key size in bits. Supported values are 128, 192 and 256.
  </td>
</tr>
<tr>
  <td><code>spark.io.encryption.keygen.algorithm</code></td>
  <td>HmacSHA1</td>
  <td>
    The algorithm to use when generating the IO encryption key. The supported algorithms are
    described in the KeyGenerator section of the Java Cryptography Architecture Standard Algorithm
    Name Documentation.
  </td>
</tr>
<tr>
  <td><code>spark.io.encryption.commons.config.*</code></td>
  <td>None</td>
  <td>
    Configuration values for the commons-crypto library, such as which cipher implementations to
    use. The config name should be the name of the commons-crypto configuration without the
    <code>commons.crypto</code> prefix.
  </td>
</tr>
</table>
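As a sketch, local disk encryption can be turned on together with RPC encryption (recommended above) in `spark-defaults.conf`; the 256-bit key size shown is one of the supported values, not the default:

```
spark.authenticate              true
spark.network.crypto.enabled    true
spark.io.encryption.enabled     true
spark.io.encryption.keySizeBits 256
```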

# Web UI

## Authentication and Authorization

Enabling authentication for the Web UIs is done using [javax servlet filters](https://docs.oracle.com/javaee/6/api/javax/servlet/Filter.html).
You will need a filter that implements the authentication method you want to deploy. Spark does not
provide any built-in authentication filters.

Spark also supports access control to the UI when an authentication filter is present. Each
application can be configured with its own separate access control lists (ACLs). Spark
differentiates between "view" permissions (who is allowed to see the application's UI), and "modify"
permissions (who can do things like kill jobs in a running application).

ACLs can be configured for either users or groups. Configuration entries accept comma-separated
lists as input, meaning multiple users or groups can be given the desired privileges. This can be
used if you run on a shared cluster and have a set of administrators or developers who need to
monitor applications they may not have started themselves. A wildcard (`*`) added to a specific ACL
means that all users will have the respective privilege. By default, only the user submitting the
application is added to the ACLs.

Group membership is established by using a configurable group mapping provider. The mapper is
configured using the <code>spark.user.groups.mapping</code> config option, described in the table
below.

The following options control the authentication of Web UIs:

<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
  <td><code>spark.ui.filters</code></td>
  <td>None</td>
  <td>
    See the <a href="configuration.html#spark-ui">Spark UI</a> configuration for how to configure
    filters.
  </td>
</tr>
<tr>
  <td><code>spark.acls.enable</code></td>
  <td>false</td>
  <td>
    Whether UI ACLs should be enabled. If enabled, this checks to see if the user has access
    permissions to view or modify the application. Note this requires the user to be authenticated,
    so if no authentication filter is installed, this option does not do anything.
  </td>
</tr>
<tr>
  <td><code>spark.admin.acls</code></td>
  <td>None</td>
  <td>
    Comma-separated list of users that have view and modify access to the Spark application.
  </td>
</tr>
<tr>
  <td><code>spark.admin.acls.groups</code></td>
  <td>None</td>
  <td>
    Comma-separated list of groups that have view and modify access to the Spark application.
  </td>
</tr>
<tr>
  <td><code>spark.modify.acls</code></td>
  <td>None</td>
  <td>
    Comma-separated list of users that have modify access to the Spark application.
  </td>
</tr>
<tr>
  <td><code>spark.modify.acls.groups</code></td>
  <td>None</td>
  <td>
    Comma-separated list of groups that have modify access to the Spark application.
  </td>
</tr>
<tr>
  <td><code>spark.ui.view.acls</code></td>
  <td>None</td>
  <td>
    Comma-separated list of users that have view access to the Spark application.
  </td>
</tr>
<tr>
  <td><code>spark.ui.view.acls.groups</code></td>
  <td>None</td>
  <td>
    Comma-separated list of groups that have view access to the Spark application.
  </td>
</tr>
<tr>
  <td><code>spark.user.groups.mapping</code></td>
  <td><code>org.apache.spark.security.ShellBasedGroupsMappingProvider</code></td>
  <td>
    The list of groups for a user is determined by a group mapping service defined by the trait
    <code>org.apache.spark.security.GroupMappingServiceProvider</code>, which can be configured by
    this property.

    <br />By default, a Unix shell-based implementation is used, which collects this information
    from the host OS.

    <br /><em>Note:</em> This implementation supports only Unix/Linux-based environments.
    Windows environments are currently <b>not</b> supported. However, a new platform/protocol can
    be supported by implementing the trait mentioned above.
  </td>
</tr>
</table>
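To make the group mapping concrete, here is a rough, self-contained Python analogue of what a provider computes. Spark's real providers implement the Scala trait `org.apache.spark.security.GroupMappingServiceProvider`; this sketch merely parses `/etc/group`-style text:

```python
def groups_for_user(etc_group: str, user: str) -> set:
    """Return the set of groups listing `user` as a member, given
    /etc/group-style content (group_name:password:GID:member1,member2)."""
    groups = set()
    for line in etc_group.splitlines():
        if not line or line.startswith("#"):
            continue
        name, _password, _gid, members = line.split(":", 3)
        if user in [m for m in members.split(",") if m]:
            groups.add(name)
    return groups

sample = "wheel:x:10:alice,bob\nspark:x:501:alice\nusers:x:100:"
print(sorted(groups_for_user(sample, "alice")))  # ['spark', 'wheel']
```

An ACL check then amounts to intersecting this set with a configured list such as `spark.ui.view.acls.groups`.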

On YARN, the view and modify ACLs are provided to the YARN service when submitting applications, and
control who has the respective privileges via YARN interfaces.

## Spark History Server ACLs

Authentication for the SHS Web UI is enabled the same way as for regular applications, using
servlet filters.

To enable authorization in the SHS, a few extra options are used:

<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
  <td><code>spark.history.ui.acls.enable</code></td>
  <td>false</td>
  <td>
    Specifies whether ACLs should be checked to authorize users viewing the applications in
    the history server. If enabled, access control checks are performed regardless of what the
    individual applications had set for <code>spark.ui.acls.enable</code>. The application owner
    will always have authorization to view their own application, and any users specified via
    <code>spark.ui.view.acls</code> and groups specified via <code>spark.ui.view.acls.groups</code>
    when the application was run will also have authorization to view that application.
    If disabled, no access control checks are made for any application UIs available through
    the history server.
  </td>
</tr>
<tr>
  <td><code>spark.history.ui.admin.acls</code></td>
  <td>None</td>
  <td>
    Comma-separated list of users that have view access to all the Spark applications in the
    history server.
  </td>
</tr>
<tr>
  <td><code>spark.history.ui.admin.acls.groups</code></td>
  <td>None</td>
  <td>
    Comma-separated list of groups that have view access to all the Spark applications in the
    history server.
  </td>
</tr>
</table>

The SHS uses the same options to configure the group mapping provider as regular applications.
In this case, the group mapping provider will apply to all UIs served by the SHS, and individual
application configurations will be ignored.

## SSL Configuration

Configuration for SSL is organized hierarchically. The user can configure the default SSL settings
which will be used for all the supported communication protocols unless they are overwritten by
protocol-specific settings. This way the user can easily provide the common settings for all the
protocols without disabling the ability to configure each one individually. The following table
describes the SSL configuration namespaces:

<table class="table">
<tr>
  <th>Config Namespace</th>
  <th>Component</th>
</tr>
<tr>
  <td><code>spark.ssl</code></td>
  <td>
    The default SSL configuration. These values will apply to all namespaces below, unless
    explicitly overridden at the namespace level.
  </td>
</tr>
<tr>
  <td><code>spark.ssl.ui</code></td>
  <td>Spark application Web UI</td>
</tr>
<tr>
  <td><code>spark.ssl.standalone</code></td>
  <td>Standalone Master / Worker Web UI</td>
</tr>
<tr>
  <td><code>spark.ssl.historyServer</code></td>
  <td>History Server Web UI</td>
</tr>
</table>
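For example, settings in the default `spark.ssl` namespace apply to every component above unless a more specific namespace overrides them. The path and password below are placeholders:

```
spark.ssl.enabled            true
spark.ssl.protocol           TLSv1.2
spark.ssl.keyStore           /path/to/keystore.jks
spark.ssl.keyStorePassword   placeholder-password

# Override the port only for the standalone Master/Worker UI; the port option
# must always be set in a specific namespace, never in the default one.
spark.ssl.standalone.port    8480
```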

The full breakdown of available SSL options can be found below. The `${ns}` placeholder should be
replaced with one of the above namespaces.

<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
  <td><code>${ns}.enabled</code></td>
  <td>false</td>
  <td>Enables SSL. When enabled, <code>${ns}.protocol</code> is required.</td>
</tr>
<tr>
  <td><code>${ns}.port</code></td>
  <td>None</td>
  <td>
    The port on which the SSL service will listen.

    <br />The port must be defined within a specific namespace configuration. The default
    namespace is ignored when reading this configuration.

    <br />When not set, the SSL port will be derived from the non-SSL port for the
    same service. A value of "0" will make the service bind to an ephemeral port.
  </td>
</tr>
<tr>
  <td><code>${ns}.enabledAlgorithms</code></td>
  <td>None</td>
  <td>
    A comma-separated list of ciphers. The specified ciphers must be supported by the JVM.

    <br />The reference list of cipher suites can be found in the "JSSE Cipher Suite Names" section
    of the Java security guide. The list for Java 8 can be found at
    <a href="https://docs.oracle.com/javase/8/docs/technotes/guides/security/StandardNames.html#ciphersuites">this</a>
    page.

    <br />Note: If not set, the default cipher suite for the JRE will be used.
  </td>
</tr>
<tr>
  <td><code>${ns}.keyPassword</code></td>
  <td>None</td>
  <td>
    The password to the private key in the key store.
  </td>
</tr>
<tr>
  <td><code>${ns}.keyStore</code></td>
  <td>None</td>
  <td>
    Path to the key store file. The path can be absolute or relative to the directory in which the
    process is started.
  </td>
</tr>
<tr>
  <td><code>${ns}.keyStorePassword</code></td>
  <td>None</td>
  <td>Password to the key store.</td>
</tr>
<tr>
  <td><code>${ns}.keyStoreType</code></td>
  <td>JKS</td>
  <td>The type of the key store.</td>
</tr>
<tr>
  <td><code>${ns}.protocol</code></td>
  <td>None</td>
  <td>
    TLS protocol to use. The protocol must be supported by the JVM.

    <br />The reference list of protocols can be found in the "Additional JSSE Standard Names"
    section of the Java security guide. For Java 8, the list can be found at
    <a href="https://docs.oracle.com/javase/8/docs/technotes/guides/security/StandardNames.html#jssenames">this</a>
    page.
  </td>
</tr>
<tr>
  <td><code>${ns}.needClientAuth</code></td>
  <td>false</td>
  <td>Whether to require client authentication.</td>
</tr>
<tr>
  <td><code>${ns}.trustStore</code></td>
  <td>None</td>
  <td>
    Path to the trust store file. The path can be absolute or relative to the directory in which
    the process is started.
  </td>
</tr>
<tr>
  <td><code>${ns}.trustStorePassword</code></td>
  <td>None</td>
  <td>Password for the trust store.</td>
</tr>
<tr>
  <td><code>${ns}.trustStoreType</code></td>
  <td>JKS</td>
  <td>The type of the trust store.</td>
</tr>
</table>

Spark also supports retrieving `${ns}.keyPassword`, `${ns}.keyStorePassword` and `${ns}.trustStorePassword` from
[Hadoop Credential Providers](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html).
Users can store passwords in a credential file and make them accessible to different components, like:

```
hadoop credential create spark.ssl.keyPassword -value password \
    -provider jceks://hdfs@nn1.example.com:9001/user/backup/ssl.jceks
```

To configure the location of the credential provider, set the `hadoop.security.credential.provider.path`
config option in the Hadoop configuration used by Spark, like:

```
<property>
  <name>hadoop.security.credential.provider.path</name>
  <value>jceks://hdfs@nn1.example.com:9001/user/backup/ssl.jceks</value>
</property>
```

Or via SparkConf: `spark.hadoop.hadoop.security.credential.provider.path=jceks://hdfs@nn1.example.com:9001/user/backup/ssl.jceks`.

## Preparing the key stores

Key stores can be generated by the `keytool` program. The reference documentation for this tool for
Java 8 is [here](https://docs.oracle.com/javase/8/docs/technotes/tools/unix/keytool.html).
The most basic steps to configure the key stores and the trust store for a Spark Standalone
deployment mode are as follows:

* Generate a key pair for each node
* Export the public key of the key pair to a file on each node
* Import all exported public keys into a single trust store
* Distribute the trust store to the cluster nodes
### YARN mode
To provide a local trust store or key store file to drivers running in cluster mode, they can be
distributed with the application using the `--files` command line argument (or the equivalent
`spark.files` configuration). The files will be placed in the driver's working directory, so the TLS
configuration should just reference the file name with no absolute path.

Distributing local key stores this way may require the files to be staged in HDFS (or another
distributed file system used by the cluster), so it's recommended that the underlying file system be
configured with security in mind (e.g. by enabling authentication and wire encryption).
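
For example, a trust store staged this way might be referenced as follows (a sketch; the
application jar and the SSL password are placeholders):

```
spark-submit --master yarn --deploy-mode cluster \
    --files /local/path/to/truststore \
    --conf spark.ssl.trustStore=truststore \
    --conf spark.ssl.trustStorePassword=password \
    app.jar
```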
### Standalone mode
The user needs to provide key stores and configuration options for master and workers. They have to
be set by attaching appropriate Java system properties to the `SPARK_MASTER_OPTS` and
`SPARK_WORKER_OPTS` environment variables, or just to `SPARK_DAEMON_JAVA_OPTS`.

The user may allow the executors to use the SSL settings inherited from the worker process. That
can be accomplished by setting `spark.ssl.useNodeLocalConf` to `true`. In that case, the settings
provided by the user on the client side are not used.
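
A minimal sketch of such a configuration in `conf/spark-env.sh` (the paths and passwords are
placeholders):

```
SPARK_DAEMON_JAVA_OPTS="-Dspark.ssl.enabled=true \
  -Dspark.ssl.keyStore=/path/to/keystore -Dspark.ssl.keyStorePassword=password \
  -Dspark.ssl.keyPassword=password \
  -Dspark.ssl.trustStore=/path/to/truststore -Dspark.ssl.trustStorePassword=password"
```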
### Mesos mode
Mesos 1.3.0 and newer supports `Secrets` primitives as both file-based and environment-based
secrets. Spark allows the specification of file-based and environment variable based secrets with
`spark.mesos.driver.secret.filenames` and `spark.mesos.driver.secret.envkeys`, respectively.

Depending on the secret store backend, secrets can be passed by reference or by value with the
`spark.mesos.driver.secret.names` and `spark.mesos.driver.secret.values` configuration properties,
respectively.

Reference type secrets are served by the secret store and referred to by name, for example
`/mysecret`. Value type secrets are passed on the command line and translated into their
appropriate files or environment variables.
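
For example, a reference type secret could be mounted into the driver as a file like this (a
sketch; the secret name and target file name are placeholders):

```
spark.mesos.driver.secret.names=/mysecret
spark.mesos.driver.secret.filenames=mysecret.txt
```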
## HTTP Security Headers
Apache Spark can be configured to include HTTP headers to aid in preventing Cross Site Scripting
(XSS), Cross-Frame Scripting (XFS), and MIME-sniffing, and also to enforce HTTP Strict Transport
Security.

<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
  <td><code>spark.ui.xXssProtection</code></td>
  <td><code>1; mode=block</code></td>
  <td>
    Value for the HTTP X-XSS-Protection response header. You can choose an appropriate value
    from below:
    <ul>
      <li><code>0</code> (Disables XSS filtering)</li>
      <li><code>1</code> (Enables XSS filtering. If a cross-site scripting attack is detected,
        the browser will sanitize the page.)</li>
      <li><code>1; mode=block</code> (Enables XSS filtering. The browser will prevent rendering
        of the page if an attack is detected.)</li>
    </ul>
  </td>
</tr>
<tr>
  <td><code>spark.ui.xContentTypeOptions.enabled</code></td>
  <td><code>true</code></td>
  <td>
    When enabled, the X-Content-Type-Options HTTP response header will be set to "nosniff".
  </td>
</tr>
<tr>
  <td><code>spark.ui.strictTransportSecurity</code></td>
  <td>None</td>
  <td>
    Value for the HTTP Strict Transport Security (HSTS) response header. You can choose an
    appropriate value from below and set <code>expire-time</code> accordingly. This option is only
    used when SSL/TLS is enabled.
    <ul>
      <li><code>max-age=&lt;expire-time&gt;</code></li>
      <li><code>max-age=&lt;expire-time&gt;; includeSubDomains</code></li>
      <li><code>max-age=&lt;expire-time&gt;; preload</code></li>
    </ul>
  </td>
</tr>
</table>
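
For example, in `spark-defaults.conf` (the one-year HSTS max-age is an arbitrary illustration):

```
spark.ui.xXssProtection               1; mode=block
spark.ui.xContentTypeOptions.enabled  true
spark.ui.strictTransportSecurity      max-age=31536000
```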
# Configuring Ports for Network Security
Generally speaking, a Spark cluster and its services are not deployed on the public internet.
They are generally private services, and should only be accessible within the network of the
organization that deploys Spark. Access to the hosts and ports used by Spark services should
be limited to origin hosts that need to access the services.

Below are the primary ports that Spark uses for its communication and how to
configure those ports.
## Standalone mode only
<table class="table">
  <tr>
    <th>From</th><th>To</th><th>Default Port</th><th>Purpose</th><th>Configuration
    Setting</th><th>Notes</th>
  </tr>
  <tr>
    <td>Browser</td>
    <td>Standalone Master</td>
    <td>8080</td>
    <td>Web UI</td>
    <td><code>spark.master.ui.port /<br> SPARK_MASTER_WEBUI_PORT</code></td>
    <td>Jetty-based. Standalone mode only.</td>
  </tr>
  <tr>
    <td>Browser</td>
    <td>Standalone Worker</td>
    <td>8081</td>
    <td>Web UI</td>
    <td><code>spark.worker.ui.port /<br> SPARK_WORKER_WEBUI_PORT</code></td>
    <td>Jetty-based. Standalone mode only.</td>
  </tr>
  <tr>
    <td>Driver /<br> Standalone Worker</td>
    <td>Standalone Master</td>
    <td>7077</td>
    <td>Submit job to cluster /<br> Join cluster</td>
    <td><code>SPARK_MASTER_PORT</code></td>
    <td>Set to "0" to choose a port randomly. Standalone mode only.</td>
  </tr>
  <tr>
    <td>External Service</td>
    <td>Standalone Master</td>
    <td>6066</td>
    <td>Submit job to cluster via REST API</td>
    <td><code>spark.master.rest.port</code></td>
    <td>Use <code>spark.master.rest.enabled</code> to enable/disable this service. Standalone mode only.</td>
  </tr>
  <tr>
    <td>Standalone Master</td>
    <td>Standalone Worker</td>
    <td>(random)</td>
    <td>Schedule executors</td>
    <td><code>SPARK_WORKER_PORT</code></td>
    <td>Set to "0" to choose a port randomly. Standalone mode only.</td>
  </tr>
</table>
## All cluster managers
<table class="table">
  <tr>
    <th>From</th><th>To</th><th>Default Port</th><th>Purpose</th><th>Configuration
    Setting</th><th>Notes</th>
  </tr>
  <tr>
    <td>Browser</td>
    <td>Application</td>
    <td>4040</td>
    <td>Web UI</td>
    <td><code>spark.ui.port</code></td>
    <td>Jetty-based</td>
  </tr>
  <tr>
    <td>Browser</td>
    <td>History Server</td>
    <td>18080</td>
    <td>Web UI</td>
    <td><code>spark.history.ui.port</code></td>
    <td>Jetty-based</td>
  </tr>
  <tr>
    <td>Executor /<br> Standalone Master</td>
    <td>Driver</td>
    <td>(random)</td>
    <td>Connect to application /<br> Notify executor state changes</td>
    <td><code>spark.driver.port</code></td>
    <td>Set to "0" to choose a port randomly.</td>
  </tr>
  <tr>
    <td>Executor / Driver</td>
    <td>Executor / Driver</td>
    <td>(random)</td>
    <td>Block Manager port</td>
    <td><code>spark.blockManager.port</code></td>
    <td>Raw socket via ServerSocketChannel</td>
  </tr>
</table>
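
When the randomly chosen ports must pass through a firewall, they can be pinned to fixed values in
`spark-defaults.conf` (the port numbers below are arbitrary examples):

```
spark.driver.port        40000
spark.blockManager.port  40010
```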
# Kerberos
Spark supports submitting applications in environments that use Kerberos for authentication.
In most cases, Spark relies on the credentials of the currently logged-in user when authenticating
to Kerberos-aware services. Such credentials can be obtained by logging in to the configured KDC
with tools like `kinit`.

When talking to Hadoop-based services, Spark needs to obtain delegation tokens so that non-local
processes can authenticate. Spark ships with support for HDFS and other Hadoop file systems, Hive
and HBase.

When using a Hadoop filesystem (such as HDFS or WebHDFS), Spark will acquire the relevant tokens
for the service hosting the user's home directory.

An HBase token will be obtained if HBase is in the application's classpath, and the HBase
configuration has Kerberos authentication turned on (`hbase.security.authentication=kerberos`).

Similarly, a Hive token will be obtained if Hive is in the classpath, and the configuration includes
URIs for remote metastore services (`hive.metastore.uris` is not empty).

Delegation token support is currently only available in YARN and Mesos modes. Consult the
deployment-specific page for more information.

The following options provide finer-grained control for this feature:
<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
  <td><code>spark.security.credentials.${service}.enabled</code></td>
  <td><code>true</code></td>
  <td>
    Controls whether to obtain credentials for services when security is enabled.
    By default, credentials for all supported services are retrieved when those services are
    configured, but it's possible to disable that behavior if it somehow conflicts with the
    application being run.
  </td>
</tr>
</table>
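
For example, to skip obtaining HBase delegation tokens for an application that does not need them
(a sketch; `hbase` stands in for whichever supported service you want to disable):

```
spark.security.credentials.hbase.enabled=false
```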
## Long-Running Applications
Long-running applications may run into issues if their run time exceeds the maximum delegation
token lifetime configured in the services they need to access.

Spark supports automatically creating new tokens for these applications when running in YARN mode.
Kerberos credentials need to be provided to the Spark application via the `spark-submit` command,
using the `--principal` and `--keytab` parameters.

The provided keytab will be copied over to the machine running the Application Master via the Hadoop
Distributed Cache. For this reason, it's strongly recommended that both YARN and HDFS be secured
with encryption, at least.

The Kerberos login will be periodically renewed using the provided credentials, and new delegation
tokens for supported services will be created.
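
Such an application might be submitted as follows (a sketch; the principal, keytab path, and
application jar are placeholders):

```
spark-submit --master yarn --deploy-mode cluster \
    --principal alice@EXAMPLE.COM \
    --keytab /path/to/alice.keytab \
    app.jar
```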
# Event Logging
If your applications are using event logging, the directory where the event logs go
(`spark.eventLog.dir`) should be manually created with proper permissions. To secure the log files,
the directory permissions should be set to `drwxrwxrwt`. The owner and group of the directory
should correspond to the super user who is running the Spark History Server.

This will allow all users to write to the directory but will prevent unprivileged users from
reading, removing or renaming a file unless they own it. The event log files will be created by
Spark with permissions such that only the user and group have read and write access.
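
On a local filesystem, the directory could be prepared like this (the path is an example; for a
directory on HDFS, use the equivalent `hadoop fs` commands):

```shell
# Create the event log directory with mode 1777 (drwxrwxrwt):
# world-writable plus the sticky bit, so users can only remove their own files.
mkdir -p /var/log/spark-events
chmod 1777 /var/log/spark-events
```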