---
layout: global
displayTitle: Spark Security
title: Security
license: |
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
---

* This will become a table of contents (this text will be scraped).
{:toc}

# Spark Security: Things You Need To Know

Security features like authentication are not enabled by default. When deploying a cluster that is open to the internet or an untrusted network, it's important to secure access to the cluster to prevent unauthorized applications from running on the cluster.

Spark supports multiple deployment types, and each one supports different levels of security. Not all deployment types are secure in all environments, and none are secure by default. Be sure to evaluate your environment and what Spark supports, and take the appropriate measures to secure your Spark deployment.

There are many different types of security concerns, and Spark does not necessarily protect against all of them. Listed below are some of the things Spark supports. Also check the deployment documentation for the type of deployment you are using for deployment-specific settings. Anything not documented, Spark does not support.

# Spark RPC (Communication protocol between Spark processes)

## Authentication

Spark currently supports authentication for RPC channels using a shared secret. Authentication can be turned on by setting the `spark.authenticate` configuration parameter.

The exact mechanism used to generate and distribute the shared secret is deployment-specific. Unless specified below, the secret must be defined by setting the `spark.authenticate.secret` config option. The same secret is shared by all Spark applications and daemons in that case, which limits the security of these deployments, especially on multi-tenant clusters.

The REST Submission Server and the MesosClusterDispatcher do not support authentication. You should ensure that all network access to the REST API and the MesosClusterDispatcher (ports 6066 and 7077, respectively, by default) is restricted to hosts that are trusted to submit jobs.

### YARN

For Spark on [YARN](running-on-yarn.html), Spark will automatically handle generating and distributing the shared secret. Each application will use a unique shared secret. In the case of YARN, this feature relies on YARN RPC encryption being enabled for the distribution of secrets to be secure.

### Kubernetes

On Kubernetes, Spark will also automatically generate an authentication secret unique to each application. The secret is propagated to executor pods using environment variables. This means that any user who can list pods in the namespace where the Spark application is running can also see their authentication secret. Access control rules should be properly set up by the Kubernetes admin to ensure that Spark authentication is secure.
Property Name | Default | Meaning | Since Version
---|---|---|---
`spark.authenticate` | false | Whether Spark authenticates its internal connections. | 1.0.0
`spark.authenticate.secret` | None | The secret key used for authentication. See above for when this configuration should be set. | 1.0.0
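For example, a minimal `spark-defaults.conf` fragment enabling shared-secret authentication might look like the following sketch; the secret value is a placeholder and must be replaced with a securely generated random string:

```
spark.authenticate        true
spark.authenticate.secret <replace-with-securely-generated-secret>
```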
Property Name | Default | Meaning | Since Version
---|---|---|---
`spark.authenticate.secret.file` | None | Path pointing to the secret key to use for securing connections. Ensure that the contents of the file have been securely generated. This file is loaded on both the driver and the executors unless other settings override this (see below). | 3.0.0
`spark.authenticate.secret.driver.file` | The value of `spark.authenticate.secret.file` | When specified, overrides the location from which the Spark driver reads the secret. Useful in client mode, when the location of the secret file may differ between the pod and the node the driver is running on. When this is specified, `spark.authenticate.secret.executor.file` must also be specified so that the driver and the executors both use files to load the secret key. Ensure that the contents of the file on the driver are identical to the contents of the file on the executors. | 3.0.0
`spark.authenticate.secret.executor.file` | The value of `spark.authenticate.secret.file` | When specified, overrides the location from which the Spark executors read the secret. Useful in client mode, when the location of the secret file may differ between the pod and the node the driver is running on. When this is specified, `spark.authenticate.secret.driver.file` must also be specified so that the driver and the executors both use files to load the secret key. Ensure that the contents of the file on the driver are identical to the contents of the file on the executors. | 3.0.0
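As a sketch of how a file-based secret might be provisioned (for example, before mounting it into driver and executor pods), the following generates a 256-bit random secret; the file path is a placeholder chosen for illustration:

```shell
# Sketch: generate a 256-bit random secret for use with
# spark.authenticate.secret.file. The path below is a placeholder; in
# practice the file would be distributed securely (e.g. as a mounted
# Kubernetes secret).
umask 077                          # ensure the secret file is not world-readable
secret_file=/tmp/spark-rpc-secret  # placeholder location
# 32 random bytes rendered as 64 hex characters, with whitespace stripped
od -An -tx1 -N32 /dev/urandom | tr -d ' \n' > "$secret_file"
echo "wrote $(wc -c < "$secret_file") hex characters to $secret_file"
```

The same file contents must then be made available to both the driver and the executors, via `spark.authenticate.secret.file` or the driver/executor-specific variants above.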
## Encryption

The following options control AES-based encryption of RPC connections:

Property Name | Default | Meaning | Since Version
---|---|---|---
`spark.network.crypto.enabled` | false | Enable AES-based RPC encryption, including the new authentication protocol added in 2.2.0. | 2.2.0
`spark.network.crypto.keyLength` | 128 | The length in bits of the encryption key to generate. Valid values are 128, 192 and 256. | 2.2.0
`spark.network.crypto.keyFactoryAlgorithm` | PBKDF2WithHmacSHA1 | The key factory algorithm to use when generating encryption keys. Should be one of the algorithms supported by the `javax.crypto.SecretKeyFactory` class in the JRE being used. | 2.2.0
`spark.network.crypto.config.*` | None | Configuration values for the commons-crypto library, such as which cipher implementations to use. The config name should be the name of the commons-crypto configuration without the `commons.crypto` prefix. | 2.2.0
`spark.network.crypto.saslFallback` | true | Whether to fall back to SASL authentication if authentication fails using Spark's internal mechanism. This is useful when the application is connecting to old shuffle services that do not support the internal Spark authentication protocol. On the shuffle service side, disabling this feature will block older clients from authenticating. | 2.2.0
`spark.authenticate.enableSaslEncryption` | false | Enable SASL-based encrypted communication. | 2.2.0
`spark.network.sasl.serverAlwaysEncrypt` | false | Disable unencrypted connections for ports using SASL authentication. This will deny connections from clients that have authentication enabled, but do not request SASL-based encryption. | 1.4.0
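A sketch of a `spark-defaults.conf` fragment enabling AES-based RPC encryption on top of authentication; the key length shown is one of the documented valid values:

```
spark.authenticate             true
spark.network.crypto.enabled   true
spark.network.crypto.keyLength 256
```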
# Local Storage Encryption

The following options control encryption of data written by Spark to local disks:

Property Name | Default | Meaning | Since Version
---|---|---|---
`spark.io.encryption.enabled` | false | Enable local disk I/O encryption. Currently supported by all modes except Mesos. It's strongly recommended that RPC encryption be enabled when using this feature. | 2.1.0
`spark.io.encryption.keySizeBits` | 128 | IO encryption key size in bits. Supported values are 128, 192 and 256. | 2.1.0
`spark.io.encryption.keygen.algorithm` | HmacSHA1 | The algorithm to use when generating the IO encryption key. The supported algorithms are described in the KeyGenerator section of the Java Cryptography Architecture Standard Algorithm Name Documentation. | 2.1.0
`spark.io.encryption.commons.config.*` | None | Configuration values for the commons-crypto library, such as which cipher implementations to use. The config name should be the name of the commons-crypto configuration without the `commons.crypto` prefix. | 2.1.0
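A sketch enabling local disk I/O encryption alongside authentication and RPC encryption, as recommended above:

```
spark.authenticate              true
spark.network.crypto.enabled    true
spark.io.encryption.enabled     true
spark.io.encryption.keySizeBits 256
```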
# Web UI

## Authentication and Authorization

ACLs can be configured for either users or groups. Group membership is established through a configurable group mapping provider, specified via the `spark.user.groups.mapping` config option, described in the table below.
The following options control the authentication of Web UIs:
Property Name | Default | Meaning | Since Version
---|---|---|---
`spark.ui.filters` | None | See the Spark UI configuration for how to configure filters. | 1.0.0
`spark.acls.enable` | false | Whether UI ACLs should be enabled. If enabled, this checks to see if the user has access permissions to view or modify the application. Note this requires the user to be authenticated, so if no authentication filter is installed, this option does not do anything. | 1.1.0
`spark.admin.acls` | None | Comma-separated list of users that have view and modify access to the Spark application. | 1.1.0
`spark.admin.acls.groups` | None | Comma-separated list of groups that have view and modify access to the Spark application. | 2.0.0
`spark.modify.acls` | None | Comma-separated list of users that have modify access to the Spark application. | 1.1.0
`spark.modify.acls.groups` | None | Comma-separated list of groups that have modify access to the Spark application. | 2.0.0
`spark.ui.view.acls` | None | Comma-separated list of users that have view access to the Spark application. | 1.0.0
`spark.ui.view.acls.groups` | None | Comma-separated list of groups that have view access to the Spark application. | 2.0.0
`spark.user.groups.mapping` | `org.apache.spark.security.ShellBasedGroupsMappingProvider` | The list of groups for a user is determined by a group mapping service defined by the trait `org.apache.spark.security.GroupMappingServiceProvider`, which can be configured by this property. By default, a Unix shell-based implementation is used, which collects this information from the host OS. Note: this implementation supports only Unix/Linux-based environments; Windows environments are currently not supported. However, a new platform/protocol can be supported by implementing the trait mentioned above. | 2.0.0
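As an illustrative sketch, a deployment might combine these ACL options as follows; the user and group names are hypothetical:

```
spark.acls.enable       true
spark.admin.acls        alice
spark.admin.acls.groups admins
spark.ui.view.acls      bob,carol
spark.modify.acls       bob
```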
The following options control ACLs in the Spark History Server:

Property Name | Default | Meaning | Since Version
---|---|---|---
`spark.history.ui.acls.enable` | false | Specifies whether ACLs should be checked to authorize users viewing the applications in the history server. If enabled, access control checks are performed regardless of what the individual applications had set for `spark.ui.acls.enable`. The application owner will always have authorization to view their own application, and any users specified via `spark.ui.view.acls` and groups specified via `spark.ui.view.acls.groups` when the application was run will also have authorization to view that application. If disabled, no access control checks are made for any application UIs available through the history server. | 1.0.1
`spark.history.ui.admin.acls` | None | Comma-separated list of users that have view access to all the Spark applications in the history server. | 2.1.1
`spark.history.ui.admin.acls.groups` | None | Comma-separated list of groups that have view access to all the Spark applications in the history server. | 2.1.1
## SSL Configuration

Spark's SSL settings are organized into configuration namespaces. The available namespaces are:

Config Namespace | Component
---|---
`spark.ssl` | The default SSL configuration. These values will apply to all namespaces below, unless explicitly overridden at the namespace level.
`spark.ssl.ui` | Spark application Web UI
`spark.ssl.standalone` | Standalone Master / Worker Web UI
`spark.ssl.historyServer` | History Server Web UI
In the table below, `${ns}` is a placeholder for one of the namespaces above.

Property Name | Default | Meaning
---|---|---
`${ns}.enabled` | false | Enables SSL. When enabled, `${ns}.ssl.protocol` is required.
`${ns}.port` | None | The port on which the SSL service will listen. The port must be defined within a specific namespace configuration; the default namespace is ignored when reading this configuration. When not set, the SSL port will be derived from the non-SSL port for the same service. A value of "0" will make the service bind to an ephemeral port.
`${ns}.enabledAlgorithms` | None | A comma-separated list of ciphers. The specified ciphers must be supported by the JVM. The reference list of cipher suites can be found in the "JSSE Cipher Suite Names" section of the Java security guide. The list for Java 8 can be found at this page. Note: if not set, the default cipher suite for the JRE will be used.
`${ns}.keyPassword` | None | The password to the private key in the key store.
`${ns}.keyStore` | None | Path to the key store file. The path can be absolute or relative to the directory in which the process is started.
`${ns}.keyStorePassword` | None | Password to the key store.
`${ns}.keyStoreType` | JKS | The type of the key store.
`${ns}.protocol` | None | TLS protocol to use. The protocol must be supported by the JVM. The reference list of protocols can be found in the "Additional JSSE Standard Names" section of the Java security guide. For Java 8, the list can be found at this page.
`${ns}.needClientAuth` | false | Whether to require client authentication.
`${ns}.trustStore` | None | Path to the trust store file. The path can be absolute or relative to the directory in which the process is started.
`${ns}.trustStorePassword` | None | Password for the trust store.
`${ns}.trustStoreType` | JKS | The type of the trust store.
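A sketch of SSL configuration for the application Web UI namespace; the keystore path and passwords are placeholders:

```
spark.ssl.ui.enabled          true
spark.ssl.ui.protocol         TLSv1.2
spark.ssl.ui.keyStore         /path/to/keystore.jks
spark.ssl.ui.keyStorePassword <keystore-password>
spark.ssl.ui.keyPassword      <key-password>
```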
# HTTP Security Headers

Property Name | Default | Meaning | Since Version
---|---|---|---
`spark.ui.xXssProtection` | 1; mode=block | Value for the HTTP X-XSS-Protection response header. You can choose an appropriate value from: `0` (disables XSS filtering); `1` (enables XSS filtering; if a cross-site scripting attack is detected, the browser will sanitize the page); `1; mode=block` (enables XSS filtering; the browser will prevent rendering of the page if an attack is detected). | 2.3.0
`spark.ui.xContentTypeOptions.enabled` | true | When enabled, the X-Content-Type-Options HTTP response header will be set to "nosniff". | 2.3.0
`spark.ui.strictTransportSecurity` | None | Value for the HTTP Strict Transport Security (HSTS) response header. You can choose an appropriate value and set the expire-time accordingly, e.g. `max-age=<expire-time>`, `max-age=<expire-time>; includeSubDomains`, or `max-age=<expire-time>; preload`. This option is only used when SSL/TLS is enabled. | 2.3.0
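For example, to send an HSTS header with a one-year expiry (31536000 seconds) that also covers subdomains; the expiry chosen here is illustrative:

```
spark.ui.strictTransportSecurity max-age=31536000; includeSubDomains
```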
# Configuring Ports for Network Security

## Standalone mode only

From | To | Default Port | Purpose | Configuration Setting | Notes
---|---|---|---|---|---
Browser | Standalone Master | 8080 | Web UI | `spark.master.ui.port` | Jetty-based. Standalone mode only.
Browser | Standalone Worker | 8081 | Web UI | `spark.worker.ui.port` | Jetty-based. Standalone mode only.
Driver / Standalone Worker | Standalone Master | 7077 | Submit job to cluster / Join cluster | `SPARK_MASTER_PORT` | Set to "0" to choose a port randomly. Standalone mode only.
External Service | Standalone Master | 6066 | Submit job to cluster via REST API | `spark.master.rest.port` | Use `spark.master.rest.enabled` to enable/disable this service. Standalone mode only.
Standalone Master | Standalone Worker | (random) | Schedule executors | `SPARK_WORKER_PORT` | Set to "0" to choose a port randomly. Standalone mode only.
## All cluster managers

From | To | Default Port | Purpose | Configuration Setting | Notes
---|---|---|---|---|---
Browser | Application | 4040 | Web UI | `spark.ui.port` | Jetty-based
Browser | History Server | 18080 | Web UI | `spark.history.ui.port` | Jetty-based
Executor / Standalone Master | Driver | (random) | Connect to application / Notify executor state changes | `spark.driver.port` | Set to "0" to choose a port randomly.
Executor / Driver | Executor / Driver | (random) | Block Manager port | `spark.blockManager.port` | Raw socket via ServerSocketChannel
# Kerberos

The following options control how Spark obtains credentials for Kerberos-secured services:

Property Name | Default | Meaning | Since Version
---|---|---|---
`spark.security.credentials.${service}.enabled` | true | Controls whether to obtain credentials for services when security is enabled. By default, credentials for all supported services are retrieved when those services are configured, but it's possible to disable that behavior if it somehow conflicts with the application being run. | 2.3.0
`spark.kerberos.access.hadoopFileSystems` | (none) | A comma-separated list of secure Hadoop filesystems your Spark application is going to access. For example, `spark.kerberos.access.hadoopFileSystems=hdfs://nn1.com:8032,hdfs://nn2.com:8032,webhdfs://nn3.com:50070`. The Spark application must have access to the filesystems listed, and Kerberos must be properly configured to be able to access them (either in the same realm or in a trusted realm). Spark acquires security tokens for each of the filesystems so that the Spark application can access those remote Hadoop filesystems. | 3.0.0
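For example, to have Spark acquire delegation tokens for additional secure filesystems, using the same illustrative hostnames as in the table above:

```
spark.kerberos.access.hadoopFileSystems hdfs://nn1.com:8032,webhdfs://nn3.com:50070
```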