spark-instrumented-optimizer/docs/security.md

149 lines
7 KiB
Markdown
Raw Normal View History

SPARK-1189: Add Security to Spark - Akka, Http, ConnectionManager, UI use servlets resubmit pull request. was https://github.com/apache/incubator-spark/pull/332. Author: Thomas Graves <tgraves@apache.org> Closes #33 from tgravescs/security-branch-0.9-with-client-rebase and squashes the following commits: dfe3918 [Thomas Graves] Fix merge conflict since startUserClass now using runAsUser 05eebed [Thomas Graves] Fix dependency lost in upmerge d1040ec [Thomas Graves] Fix up various imports 05ff5e0 [Thomas Graves] Fix up imports after upmerging to master ac046b3 [Thomas Graves] Merge remote-tracking branch 'upstream/master' into security-branch-0.9-with-client-rebase 13733e1 [Thomas Graves] Pass securityManager and SparkConf around where we can. Switch to use sparkConf for reading config whereever possible. Added ConnectionManagerSuite unit tests. 4a57acc [Thomas Graves] Change UI createHandler routines to createServlet since they now return servlets 2f77147 [Thomas Graves] Rework from comments 50dd9f2 [Thomas Graves] fix header in SecurityManager ecbfb65 [Thomas Graves] Fix spacing and formatting b514bec [Thomas Graves] Fix reference to config ed3d1c1 [Thomas Graves] Add security.md 6f7ddf3 [Thomas Graves] Convert SaslClient and SaslServer to scala, change spark.authenticate.ui to spark.ui.acls.enable, and fix up various other things from review comments 2d9e23e [Thomas Graves] Merge remote-tracking branch 'upstream/master' into security-branch-0.9-with-client-rebase_rework 5721c5a [Thomas Graves] update AkkaUtilsSuite test for the actorSelection changes, fix typos based on comments, and remove extra lines I missed in rebase from AkkaUtils f351763 [Thomas Graves] Add Security to Spark - Akka, Http, ConnectionManager, UI to use servlets
2014-03-06 19:27:50 -05:00
---
layout: global
title: Spark Security
---
[SPARK-1566] consolidate programming guide, and general doc updates This is a fairly large PR to clean up and update the docs for 1.0. The major changes are: * A unified programming guide for all languages replaces language-specific ones and shows language-specific info in tabs * New programming guide sections on key-value pairs, unit testing, input formats beyond text, migrating from 0.9, and passing functions to Spark * Spark-submit guide moved to a separate page and expanded slightly * Various cleanups of the menu system, security docs, and others * Updated look of title bar to differentiate the docs from previous Spark versions You can find the updated docs at http://people.apache.org/~matei/1.0-docs/_site/ and in particular http://people.apache.org/~matei/1.0-docs/_site/programming-guide.html. Author: Matei Zaharia <matei@databricks.com> Closes #896 from mateiz/1.0-docs and squashes the following commits: 03e6853 [Matei Zaharia] Some tweaks to configuration and YARN docs 0779508 [Matei Zaharia] tweak ef671d4 [Matei Zaharia] Keep frames in JavaDoc links, and other small tweaks 1bf4112 [Matei Zaharia] Review comments 4414f88 [Matei Zaharia] tweaks d04e979 [Matei Zaharia] Fix some old links to Java guide a34ed33 [Matei Zaharia] tweak 541bb3b [Matei Zaharia] miscellaneous changes fcefdec [Matei Zaharia] Moved submitting apps to separate doc 61d72b4 [Matei Zaharia] stuff 181f217 [Matei Zaharia] migration guide, remove old language guides e11a0da [Matei Zaharia] Add more API functions 6a030a9 [Matei Zaharia] tweaks 8db0ae3 [Matei Zaharia] Added key-value pairs section 318d2c9 [Matei Zaharia] tweaks 1c81477 [Matei Zaharia] New section on basics and function syntax e38f559 [Matei Zaharia] Actually added programming guide to Git a33d6fe [Matei Zaharia] First pass at updating programming guide to support all languages, plus other tweaks throughout 3b6a876 [Matei Zaharia] More CSS tweaks 01ec8bf [Matei Zaharia] More CSS tweaks e6d252e [Matei Zaharia] Change color of doc title bar to differentiate from 0.9.0
2014-05-30 03:34:33 -04:00
Spark currently supports authentication via a shared secret. Authentication can be configured to be on via the `spark.authenticate` configuration parameter. This parameter controls whether the Spark communication protocols do authentication using the shared secret. This authentication is a basic handshake to make sure both sides have the same shared secret and are allowed to communicate. If the shared secret is not identical they will not be allowed to communicate. The shared secret is created as follows:
SPARK-1189: Add Security to Spark - Akka, Http, ConnectionManager, UI use servlets resubmit pull request. was https://github.com/apache/incubator-spark/pull/332. Author: Thomas Graves <tgraves@apache.org> Closes #33 from tgravescs/security-branch-0.9-with-client-rebase and squashes the following commits: dfe3918 [Thomas Graves] Fix merge conflict since startUserClass now using runAsUser 05eebed [Thomas Graves] Fix dependency lost in upmerge d1040ec [Thomas Graves] Fix up various imports 05ff5e0 [Thomas Graves] Fix up imports after upmerging to master ac046b3 [Thomas Graves] Merge remote-tracking branch 'upstream/master' into security-branch-0.9-with-client-rebase 13733e1 [Thomas Graves] Pass securityManager and SparkConf around where we can. Switch to use sparkConf for reading config whereever possible. Added ConnectionManagerSuite unit tests. 4a57acc [Thomas Graves] Change UI createHandler routines to createServlet since they now return servlets 2f77147 [Thomas Graves] Rework from comments 50dd9f2 [Thomas Graves] fix header in SecurityManager ecbfb65 [Thomas Graves] Fix spacing and formatting b514bec [Thomas Graves] Fix reference to config ed3d1c1 [Thomas Graves] Add security.md 6f7ddf3 [Thomas Graves] Convert SaslClient and SaslServer to scala, change spark.authenticate.ui to spark.ui.acls.enable, and fix up various other things from review comments 2d9e23e [Thomas Graves] Merge remote-tracking branch 'upstream/master' into security-branch-0.9-with-client-rebase_rework 5721c5a [Thomas Graves] update AkkaUtilsSuite test for the actorSelection changes, fix typos based on comments, and remove extra lines I missed in rebase from AkkaUtils f351763 [Thomas Graves] Add Security to Spark - Akka, Http, ConnectionManager, UI to use servlets
2014-03-06 19:27:50 -05:00
[SPARK-1566] consolidate programming guide, and general doc updates This is a fairly large PR to clean up and update the docs for 1.0. The major changes are: * A unified programming guide for all languages replaces language-specific ones and shows language-specific info in tabs * New programming guide sections on key-value pairs, unit testing, input formats beyond text, migrating from 0.9, and passing functions to Spark * Spark-submit guide moved to a separate page and expanded slightly * Various cleanups of the menu system, security docs, and others * Updated look of title bar to differentiate the docs from previous Spark versions You can find the updated docs at http://people.apache.org/~matei/1.0-docs/_site/ and in particular http://people.apache.org/~matei/1.0-docs/_site/programming-guide.html. Author: Matei Zaharia <matei@databricks.com> Closes #896 from mateiz/1.0-docs and squashes the following commits: 03e6853 [Matei Zaharia] Some tweaks to configuration and YARN docs 0779508 [Matei Zaharia] tweak ef671d4 [Matei Zaharia] Keep frames in JavaDoc links, and other small tweaks 1bf4112 [Matei Zaharia] Review comments 4414f88 [Matei Zaharia] tweaks d04e979 [Matei Zaharia] Fix some old links to Java guide a34ed33 [Matei Zaharia] tweak 541bb3b [Matei Zaharia] miscellaneous changes fcefdec [Matei Zaharia] Moved submitting apps to separate doc 61d72b4 [Matei Zaharia] stuff 181f217 [Matei Zaharia] migration guide, remove old language guides e11a0da [Matei Zaharia] Add more API functions 6a030a9 [Matei Zaharia] tweaks 8db0ae3 [Matei Zaharia] Added key-value pairs section 318d2c9 [Matei Zaharia] tweaks 1c81477 [Matei Zaharia] New section on basics and function syntax e38f559 [Matei Zaharia] Actually added programming guide to Git a33d6fe [Matei Zaharia] First pass at updating programming guide to support all languages, plus other tweaks throughout 3b6a876 [Matei Zaharia] More CSS tweaks 01ec8bf [Matei Zaharia] More CSS tweaks e6d252e [Matei Zaharia] Change color of doc title bar to differentiate from 0.9.0
2014-05-30 03:34:33 -04:00
* For Spark on [YARN](running-on-yarn.html) deployments, configuring `spark.authenticate` to `true` will automatically handle generating and distributing the shared secret. Each application will use a unique shared secret.
* For other types of Spark deployments, the Spark parameter `spark.authenticate.secret` should be configured on each of the nodes. This secret will be used by all the Master/Workers and applications.
[SPARK-2157] Enable tight firewall rules for Spark The goal of this PR is to allow users of Spark to write tight firewall rules for their clusters. This is currently not possible because Spark uses random ports in many places, notably the communication between executors and drivers. The changes in this PR are based on top of ash211's changes in #1107. The list covered here may or may not be the complete set of port needed for Spark to operate perfectly. However, as of the latest commit there are no known sources of random ports (except in tests). I have not documented a few of the more obscure configs. My spark-env.sh looks like this: ``` export SPARK_MASTER_PORT=6060 export SPARK_WORKER_PORT=7070 export SPARK_MASTER_WEBUI_PORT=9090 export SPARK_WORKER_WEBUI_PORT=9091 ``` and my spark-defaults.conf looks like this: ``` spark.master spark://andrews-mbp:6060 spark.driver.port 5001 spark.fileserver.port 5011 spark.broadcast.port 5021 spark.replClassServer.port 5031 spark.blockManager.port 5041 spark.executor.port 5051 ``` Author: Andrew Or <andrewor14@gmail.com> Author: Andrew Ash <andrew@andrewash.com> Closes #1777 from andrewor14/configure-ports and squashes the following commits: 621267b [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports 8a6b820 [Andrew Or] Use a random UI port during tests 7da0493 [Andrew Or] Fix tests 523c30e [Andrew Or] Add test for isBindCollision b97b02a [Andrew Or] Minor fixes c22ad00 [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports 93d359f [Andrew Or] Executors connect to wrong port when collision occurs d502e5f [Andrew Or] Handle port collisions when creating Akka systems a2dd05c [Andrew Or] Patrick's comment nit 86461e2 [Andrew Or] Remove spark.executor.env.port and spark.standalone.client.port 1d2d5c6 [Andrew Or] Fix ports for standalone cluster mode cb3be88 [Andrew Or] Various doc fixes (broken link, format etc.) e837cde [Andrew Or] Remove outdated TODOs bfbab28 [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports de1b207 [Andrew Or] Update docs to reflect new ports b565079 [Andrew Or] Add spark.ports.maxRetries 2551eb2 [Andrew Or] Remove spark.worker.watcher.port 151327a [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports 9868358 [Andrew Or] Add a few miscellaneous ports 6016e77 [Andrew Or] Add spark.executor.port 8d836e6 [Andrew Or] Also document SPARK_{MASTER/WORKER}_WEBUI_PORT 4d9e6f3 [Andrew Or] Fix super subtle bug 3f8e51b [Andrew Or] Correct erroneous docs... e111d08 [Andrew Or] Add names for UI services 470f38c [Andrew Or] Special case non-"Address already in use" exceptions 1d7e408 [Andrew Or] Treat 0 ports specially + return correct ConnectionManager port ba32280 [Andrew Or] Minor fixes 6b550b0 [Andrew Or] Assorted fixes 73fbe89 [Andrew Or] Move start service logic to Utils ec676f4 [Andrew Or] Merge branch 'SPARK-2157' of github.com:ash211/spark into configure-ports 038a579 [Andrew Ash] Trust the server start function to report the port the service started on 7c5bdc4 [Andrew Ash] Fix style issue 0347aef [Andrew Ash] Unify port fallback logic to a single place 24a4c32 [Andrew Ash] Remove type on val to match surrounding style 9e4ad96 [Andrew Ash] Reformat for style checker 5d84e0e [Andrew Ash] Document new port configuration options 066dc7a [Andrew Ash] Fix up HttpServer port increments cad16da [Andrew Ash] Add fallover increment logic for HttpServer c5a0568 [Andrew Ash] Fix ConnectionManager to retry with increment b80d2fd [Andrew Ash] Make Spark's block manager port configurable 17c79bb [Andrew Ash] Add a configuration option for spark-shell's class server f34115d [Andrew Ash] SPARK-1176 Add port configuration for HttpBroadcast 49ee29b [Andrew Ash] SPARK-1174 Add port configuration for HttpFileServer 1c0981a [Andrew Ash] Make port in HttpServer configurable
2014-08-06 03:07:40 -04:00
* **IMPORTANT NOTE:** *The experimental Netty shuffle path (`spark.shuffle.use.netty`) is not secured, so do not use Netty for shuffles if running with authentication.*
## Web UI
SPARK-1189: Add Security to Spark - Akka, Http, ConnectionManager, UI use servlets resubmit pull request. was https://github.com/apache/incubator-spark/pull/332. Author: Thomas Graves <tgraves@apache.org> Closes #33 from tgravescs/security-branch-0.9-with-client-rebase and squashes the following commits: dfe3918 [Thomas Graves] Fix merge conflict since startUserClass now using runAsUser 05eebed [Thomas Graves] Fix dependency lost in upmerge d1040ec [Thomas Graves] Fix up various imports 05ff5e0 [Thomas Graves] Fix up imports after upmerging to master ac046b3 [Thomas Graves] Merge remote-tracking branch 'upstream/master' into security-branch-0.9-with-client-rebase 13733e1 [Thomas Graves] Pass securityManager and SparkConf around where we can. Switch to use sparkConf for reading config whereever possible. Added ConnectionManagerSuite unit tests. 4a57acc [Thomas Graves] Change UI createHandler routines to createServlet since they now return servlets 2f77147 [Thomas Graves] Rework from comments 50dd9f2 [Thomas Graves] fix header in SecurityManager ecbfb65 [Thomas Graves] Fix spacing and formatting b514bec [Thomas Graves] Fix reference to config ed3d1c1 [Thomas Graves] Add security.md 6f7ddf3 [Thomas Graves] Convert SaslClient and SaslServer to scala, change spark.authenticate.ui to spark.ui.acls.enable, and fix up various other things from review comments 2d9e23e [Thomas Graves] Merge remote-tracking branch 'upstream/master' into security-branch-0.9-with-client-rebase_rework 5721c5a [Thomas Graves] update AkkaUtilsSuite test for the actorSelection changes, fix typos based on comments, and remove extra lines I missed in rebase from AkkaUtils f351763 [Thomas Graves] Add Security to Spark - Akka, Http, ConnectionManager, UI to use servlets
2014-03-06 19:27:50 -05:00
The Spark UI can also be secured by using [javax servlet filters](http://docs.oracle.com/javaee/6/api/javax/servlet/Filter.html) via the `spark.ui.filters` setting. A user may want to secure the UI if it has data that other users should not be allowed to see. The javax servlet filter specified by the user can authenticate the user and then once the user is logged in, Spark can compare that user versus the view ACLs to make sure they are authorized to view the UI. The configs `spark.acls.enable` and `spark.ui.view.acls` control the behavior of the ACLs. Note that the user who started the application always has view access to the UI. On YARN, the Spark UI uses the standard YARN web application proxy mechanism and will authenticate via any installed Hadoop filters.
Spark also supports modify ACLs to control who has access to modify a running Spark application. This includes things like killing the application or a task. This is controlled by the configs `spark.acls.enable` and `spark.modify.acls`. Note that if you are authenticating the web UI, in order to use the kill button on the web UI it might be necessary to add the users in the modify acls to the view acls also. On YARN, the modify acls are passed in and control who has modify access via YARN interfaces.
Spark allows for a set of administrators to be specified in the acls who always have view and modify permissions to all the applications. is controlled by the config `spark.admin.acls`. This is useful on a shared cluster where you might have administrators or support staff who help users debug applications.
[SPARK-2157] Enable tight firewall rules for Spark The goal of this PR is to allow users of Spark to write tight firewall rules for their clusters. This is currently not possible because Spark uses random ports in many places, notably the communication between executors and drivers. The changes in this PR are based on top of ash211's changes in #1107. The list covered here may or may not be the complete set of port needed for Spark to operate perfectly. However, as of the latest commit there are no known sources of random ports (except in tests). I have not documented a few of the more obscure configs. My spark-env.sh looks like this: ``` export SPARK_MASTER_PORT=6060 export SPARK_WORKER_PORT=7070 export SPARK_MASTER_WEBUI_PORT=9090 export SPARK_WORKER_WEBUI_PORT=9091 ``` and my spark-defaults.conf looks like this: ``` spark.master spark://andrews-mbp:6060 spark.driver.port 5001 spark.fileserver.port 5011 spark.broadcast.port 5021 spark.replClassServer.port 5031 spark.blockManager.port 5041 spark.executor.port 5051 ``` Author: Andrew Or <andrewor14@gmail.com> Author: Andrew Ash <andrew@andrewash.com> Closes #1777 from andrewor14/configure-ports and squashes the following commits: 621267b [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports 8a6b820 [Andrew Or] Use a random UI port during tests 7da0493 [Andrew Or] Fix tests 523c30e [Andrew Or] Add test for isBindCollision b97b02a [Andrew Or] Minor fixes c22ad00 [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports 93d359f [Andrew Or] Executors connect to wrong port when collision occurs d502e5f [Andrew Or] Handle port collisions when creating Akka systems a2dd05c [Andrew Or] Patrick's comment nit 86461e2 [Andrew Or] Remove spark.executor.env.port and spark.standalone.client.port 1d2d5c6 [Andrew Or] Fix ports for standalone cluster mode cb3be88 [Andrew Or] Various doc fixes (broken link, format etc.) e837cde [Andrew Or] Remove outdated TODOs bfbab28 [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports de1b207 [Andrew Or] Update docs to reflect new ports b565079 [Andrew Or] Add spark.ports.maxRetries 2551eb2 [Andrew Or] Remove spark.worker.watcher.port 151327a [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports 9868358 [Andrew Or] Add a few miscellaneous ports 6016e77 [Andrew Or] Add spark.executor.port 8d836e6 [Andrew Or] Also document SPARK_{MASTER/WORKER}_WEBUI_PORT 4d9e6f3 [Andrew Or] Fix super subtle bug 3f8e51b [Andrew Or] Correct erroneous docs... e111d08 [Andrew Or] Add names for UI services 470f38c [Andrew Or] Special case non-"Address already in use" exceptions 1d7e408 [Andrew Or] Treat 0 ports specially + return correct ConnectionManager port ba32280 [Andrew Or] Minor fixes 6b550b0 [Andrew Or] Assorted fixes 73fbe89 [Andrew Or] Move start service logic to Utils ec676f4 [Andrew Or] Merge branch 'SPARK-2157' of github.com:ash211/spark into configure-ports 038a579 [Andrew Ash] Trust the server start function to report the port the service started on 7c5bdc4 [Andrew Ash] Fix style issue 0347aef [Andrew Ash] Unify port fallback logic to a single place 24a4c32 [Andrew Ash] Remove type on val to match surrounding style 9e4ad96 [Andrew Ash] Reformat for style checker 5d84e0e [Andrew Ash] Document new port configuration options 066dc7a [Andrew Ash] Fix up HttpServer port increments cad16da [Andrew Ash] Add fallover increment logic for HttpServer c5a0568 [Andrew Ash] Fix ConnectionManager to retry with increment b80d2fd [Andrew Ash] Make Spark's block manager port configurable 17c79bb [Andrew Ash] Add a configuration option for spark-shell's class server f34115d [Andrew Ash] SPARK-1176 Add port configuration for HttpBroadcast 49ee29b [Andrew Ash] SPARK-1174 Add port configuration for HttpFileServer 1c0981a [Andrew Ash] Make port in HttpServer configurable
2014-08-06 03:07:40 -04:00
## Event Logging
[SPARK-1566] consolidate programming guide, and general doc updates This is a fairly large PR to clean up and update the docs for 1.0. The major changes are: * A unified programming guide for all languages replaces language-specific ones and shows language-specific info in tabs * New programming guide sections on key-value pairs, unit testing, input formats beyond text, migrating from 0.9, and passing functions to Spark * Spark-submit guide moved to a separate page and expanded slightly * Various cleanups of the menu system, security docs, and others * Updated look of title bar to differentiate the docs from previous Spark versions You can find the updated docs at http://people.apache.org/~matei/1.0-docs/_site/ and in particular http://people.apache.org/~matei/1.0-docs/_site/programming-guide.html. Author: Matei Zaharia <matei@databricks.com> Closes #896 from mateiz/1.0-docs and squashes the following commits: 03e6853 [Matei Zaharia] Some tweaks to configuration and YARN docs 0779508 [Matei Zaharia] tweak ef671d4 [Matei Zaharia] Keep frames in JavaDoc links, and other small tweaks 1bf4112 [Matei Zaharia] Review comments 4414f88 [Matei Zaharia] tweaks d04e979 [Matei Zaharia] Fix some old links to Java guide a34ed33 [Matei Zaharia] tweak 541bb3b [Matei Zaharia] miscellaneous changes fcefdec [Matei Zaharia] Moved submitting apps to separate doc 61d72b4 [Matei Zaharia] stuff 181f217 [Matei Zaharia] migration guide, remove old language guides e11a0da [Matei Zaharia] Add more API functions 6a030a9 [Matei Zaharia] tweaks 8db0ae3 [Matei Zaharia] Added key-value pairs section 318d2c9 [Matei Zaharia] tweaks 1c81477 [Matei Zaharia] New section on basics and function syntax e38f559 [Matei Zaharia] Actually added programming guide to Git a33d6fe [Matei Zaharia] First pass at updating programming guide to support all languages, plus other tweaks throughout 3b6a876 [Matei Zaharia] More CSS tweaks 01ec8bf [Matei Zaharia] More CSS tweaks e6d252e [Matei Zaharia] Change color of doc title bar to differentiate from 0.9.0
2014-05-30 03:34:33 -04:00
If your applications are using event logging, the directory where the event logs go (`spark.eventLog.dir`) should be manually created and have the proper permissions set on it. If you want those log files secured, the permissions should be set to `drwxrwxrwxt` for that directory. The owner of the directory should be the super user who is running the history server and the group permissions should be restricted to super user group. This will allow all users to write to the directory but will prevent unprivileged users from removing or renaming a file unless they own the file or directory. The event log files will be created by Spark with permissions such that only the user and group have read and write access.
SPARK-1189: Add Security to Spark - Akka, Http, ConnectionManager, UI use servlets resubmit pull request. was https://github.com/apache/incubator-spark/pull/332. Author: Thomas Graves <tgraves@apache.org> Closes #33 from tgravescs/security-branch-0.9-with-client-rebase and squashes the following commits: dfe3918 [Thomas Graves] Fix merge conflict since startUserClass now using runAsUser 05eebed [Thomas Graves] Fix dependency lost in upmerge d1040ec [Thomas Graves] Fix up various imports 05ff5e0 [Thomas Graves] Fix up imports after upmerging to master ac046b3 [Thomas Graves] Merge remote-tracking branch 'upstream/master' into security-branch-0.9-with-client-rebase 13733e1 [Thomas Graves] Pass securityManager and SparkConf around where we can. Switch to use sparkConf for reading config whereever possible. Added ConnectionManagerSuite unit tests. 4a57acc [Thomas Graves] Change UI createHandler routines to createServlet since they now return servlets 2f77147 [Thomas Graves] Rework from comments 50dd9f2 [Thomas Graves] fix header in SecurityManager ecbfb65 [Thomas Graves] Fix spacing and formatting b514bec [Thomas Graves] Fix reference to config ed3d1c1 [Thomas Graves] Add security.md 6f7ddf3 [Thomas Graves] Convert SaslClient and SaslServer to scala, change spark.authenticate.ui to spark.ui.acls.enable, and fix up various other things from review comments 2d9e23e [Thomas Graves] Merge remote-tracking branch 'upstream/master' into security-branch-0.9-with-client-rebase_rework 5721c5a [Thomas Graves] update AkkaUtilsSuite test for the actorSelection changes, fix typos based on comments, and remove extra lines I missed in rebase from AkkaUtils f351763 [Thomas Graves] Add Security to Spark - Akka, Http, ConnectionManager, UI to use servlets
2014-03-06 19:27:50 -05:00
[SPARK-2157] Enable tight firewall rules for Spark The goal of this PR is to allow users of Spark to write tight firewall rules for their clusters. This is currently not possible because Spark uses random ports in many places, notably the communication between executors and drivers. The changes in this PR are based on top of ash211's changes in #1107. The list covered here may or may not be the complete set of port needed for Spark to operate perfectly. However, as of the latest commit there are no known sources of random ports (except in tests). I have not documented a few of the more obscure configs. My spark-env.sh looks like this: ``` export SPARK_MASTER_PORT=6060 export SPARK_WORKER_PORT=7070 export SPARK_MASTER_WEBUI_PORT=9090 export SPARK_WORKER_WEBUI_PORT=9091 ``` and my spark-defaults.conf looks like this: ``` spark.master spark://andrews-mbp:6060 spark.driver.port 5001 spark.fileserver.port 5011 spark.broadcast.port 5021 spark.replClassServer.port 5031 spark.blockManager.port 5041 spark.executor.port 5051 ``` Author: Andrew Or <andrewor14@gmail.com> Author: Andrew Ash <andrew@andrewash.com> Closes #1777 from andrewor14/configure-ports and squashes the following commits: 621267b [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports 8a6b820 [Andrew Or] Use a random UI port during tests 7da0493 [Andrew Or] Fix tests 523c30e [Andrew Or] Add test for isBindCollision b97b02a [Andrew Or] Minor fixes c22ad00 [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports 93d359f [Andrew Or] Executors connect to wrong port when collision occurs d502e5f [Andrew Or] Handle port collisions when creating Akka systems a2dd05c [Andrew Or] Patrick's comment nit 86461e2 [Andrew Or] Remove spark.executor.env.port and spark.standalone.client.port 1d2d5c6 [Andrew Or] Fix ports for standalone cluster mode cb3be88 [Andrew Or] Various doc fixes (broken link, format etc.) e837cde [Andrew Or] Remove outdated TODOs bfbab28 [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports de1b207 [Andrew Or] Update docs to reflect new ports b565079 [Andrew Or] Add spark.ports.maxRetries 2551eb2 [Andrew Or] Remove spark.worker.watcher.port 151327a [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports 9868358 [Andrew Or] Add a few miscellaneous ports 6016e77 [Andrew Or] Add spark.executor.port 8d836e6 [Andrew Or] Also document SPARK_{MASTER/WORKER}_WEBUI_PORT 4d9e6f3 [Andrew Or] Fix super subtle bug 3f8e51b [Andrew Or] Correct erroneous docs... e111d08 [Andrew Or] Add names for UI services 470f38c [Andrew Or] Special case non-"Address already in use" exceptions 1d7e408 [Andrew Or] Treat 0 ports specially + return correct ConnectionManager port ba32280 [Andrew Or] Minor fixes 6b550b0 [Andrew Or] Assorted fixes 73fbe89 [Andrew Or] Move start service logic to Utils ec676f4 [Andrew Or] Merge branch 'SPARK-2157' of github.com:ash211/spark into configure-ports 038a579 [Andrew Ash] Trust the server start function to report the port the service started on 7c5bdc4 [Andrew Ash] Fix style issue 0347aef [Andrew Ash] Unify port fallback logic to a single place 24a4c32 [Andrew Ash] Remove type on val to match surrounding style 9e4ad96 [Andrew Ash] Reformat for style checker 5d84e0e [Andrew Ash] Document new port configuration options 066dc7a [Andrew Ash] Fix up HttpServer port increments cad16da [Andrew Ash] Add fallover increment logic for HttpServer c5a0568 [Andrew Ash] Fix ConnectionManager to retry with increment b80d2fd [Andrew Ash] Make Spark's block manager port configurable 17c79bb [Andrew Ash] Add a configuration option for spark-shell's class server f34115d [Andrew Ash] SPARK-1176 Add port configuration for HttpBroadcast 49ee29b [Andrew Ash] SPARK-1174 Add port configuration for HttpFileServer 1c0981a [Andrew Ash] Make port in HttpServer configurable
2014-08-06 03:07:40 -04:00
## Configuring Ports for Network Security
Spark makes heavy use of the network, and some environments have strict requirements for using tight
firewall settings. Below are the primary ports that Spark uses for its communication and how to
configure those ports.
### Standalone mode only
<table class="table">
<tr>
<th>From</th><th>To</th><th>Default Port</th><th>Purpose</th><th>Configuration
Setting</th><th>Notes</th>
</tr>
<tr>
<td>Browser</td>
<td>Standalone Master</td>
<td>8080</td>
<td>Web UI</td>
<td><code>spark.master.ui.port /<br> SPARK_MASTER_WEBUI_PORT</code></td>
<td>Jetty-based. Standalone mode only.</td>
</tr>
<tr>
<td>Browser</td>
<td>Standalone Worker</td>
<td>8081</td>
<td>Web UI</td>
<td><code>spark.worker.ui.port /<br> SPARK_WORKER_WEBUI_PORT</code></td>
<td>Jetty-based. Standalone mode only.</td>
</tr>
<tr>
<td>Driver /<br> Standalone Worker</td>
<td>Standalone Master</td>
<td>7077</td>
<td>Submit job to cluster /<br> Join cluster</td>
<td><code>SPARK_MASTER_PORT</code></td>
<td>Akka-based. Set to "0" to choose a port randomly. Standalone mode only.</td>
</tr>
<tr>
<td>Standalone Master</td>
<td>Standalone Worker</td>
<td>(random)</td>
<td>Schedule executors</td>
<td><code>SPARK_WORKER_PORT</code></td>
<td>Akka-based. Set to "0" to choose a port randomly. Standalone mode only.</td>
</tr>
</table>
### All cluster managers
<table class="table">
<tr>
<th>From</th><th>To</th><th>Default Port</th><th>Purpose</th><th>Configuration
Setting</th><th>Notes</th>
</tr>
<tr>
<td>Browser</td>
<td>Application</td>
<td>4040</td>
<td>Web UI</td>
<td><code>spark.ui.port</code></td>
<td>Jetty-based</td>
</tr>
<tr>
<td>Browser</td>
<td>History Server</td>
<td>18080</td>
<td>Web UI</td>
<td><code>spark.history.ui.port</code></td>
<td>Jetty-based</td>
</tr>
<tr>
<td>Executor /<br> Standalone Master</td>
<td>Driver</td>
<td>(random)</td>
<td>Connect to application /<br> Notify executor state changes</td>
<td><code>spark.driver.port</code></td>
<td>Akka-based. Set to "0" to choose a port randomly.</td>
</tr>
<tr>
<td>Driver</td>
<td>Executor</td>
<td>(random)</td>
<td>Schedule tasks</td>
<td><code>spark.executor.port</code></td>
<td>Akka-based. Set to "0" to choose a port randomly.</td>
</tr>
<tr>
<td>Executor</td>
<td>Driver</td>
<td>(random)</td>
<td>File server for files and jars</td>
<td><code>spark.fileserver.port</code></td>
<td>Jetty-based</td>
</tr>
<tr>
<td>Executor</td>
<td>Driver</td>
<td>(random)</td>
<td>HTTP Broadcast</td>
<td><code>spark.broadcast.port</code></td>
<td>Jetty-based. Not used by TorrentBroadcast, which sends data through the block manager
instead.</td>
</tr>
<tr>
<td>Executor</td>
<td>Driver</td>
<td>(random)</td>
<td>Class file server</td>
<td><code>spark.replClassServer.port</code></td>
<td>Jetty-based. Only used in Spark shells.</td>
</tr>
<tr>
<td>Executor / Driver</td>
<td>Executor / Driver</td>
<td>(random)</td>
<td>Block Manager port</td>
<td><code>spark.blockManager.port</code></td>
<td>Raw socket via ServerSocketChannel</td>
</tr>
</table>
SPARK-1189: Add Security to Spark - Akka, Http, ConnectionManager, UI use servlets resubmit pull request. was https://github.com/apache/incubator-spark/pull/332. Author: Thomas Graves <tgraves@apache.org> Closes #33 from tgravescs/security-branch-0.9-with-client-rebase and squashes the following commits: dfe3918 [Thomas Graves] Fix merge conflict since startUserClass now using runAsUser 05eebed [Thomas Graves] Fix dependency lost in upmerge d1040ec [Thomas Graves] Fix up various imports 05ff5e0 [Thomas Graves] Fix up imports after upmerging to master ac046b3 [Thomas Graves] Merge remote-tracking branch 'upstream/master' into security-branch-0.9-with-client-rebase 13733e1 [Thomas Graves] Pass securityManager and SparkConf around where we can. Switch to use sparkConf for reading config whereever possible. Added ConnectionManagerSuite unit tests. 4a57acc [Thomas Graves] Change UI createHandler routines to createServlet since they now return servlets 2f77147 [Thomas Graves] Rework from comments 50dd9f2 [Thomas Graves] fix header in SecurityManager ecbfb65 [Thomas Graves] Fix spacing and formatting b514bec [Thomas Graves] Fix reference to config ed3d1c1 [Thomas Graves] Add security.md 6f7ddf3 [Thomas Graves] Convert SaslClient and SaslServer to scala, change spark.authenticate.ui to spark.ui.acls.enable, and fix up various other things from review comments 2d9e23e [Thomas Graves] Merge remote-tracking branch 'upstream/master' into security-branch-0.9-with-client-rebase_rework 5721c5a [Thomas Graves] update AkkaUtilsSuite test for the actorSelection changes, fix typos based on comments, and remove extra lines I missed in rebase from AkkaUtils f351763 [Thomas Graves] Add Security to Spark - Akka, Http, ConnectionManager, UI to use servlets
2014-03-06 19:27:50 -05:00
[SPARK-2157] Enable tight firewall rules for Spark The goal of this PR is to allow users of Spark to write tight firewall rules for their clusters. This is currently not possible because Spark uses random ports in many places, notably the communication between executors and drivers. The changes in this PR are based on top of ash211's changes in #1107. The list covered here may or may not be the complete set of port needed for Spark to operate perfectly. However, as of the latest commit there are no known sources of random ports (except in tests). I have not documented a few of the more obscure configs. My spark-env.sh looks like this: ``` export SPARK_MASTER_PORT=6060 export SPARK_WORKER_PORT=7070 export SPARK_MASTER_WEBUI_PORT=9090 export SPARK_WORKER_WEBUI_PORT=9091 ``` and my spark-defaults.conf looks like this: ``` spark.master spark://andrews-mbp:6060 spark.driver.port 5001 spark.fileserver.port 5011 spark.broadcast.port 5021 spark.replClassServer.port 5031 spark.blockManager.port 5041 spark.executor.port 5051 ``` Author: Andrew Or <andrewor14@gmail.com> Author: Andrew Ash <andrew@andrewash.com> Closes #1777 from andrewor14/configure-ports and squashes the following commits: 621267b [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports 8a6b820 [Andrew Or] Use a random UI port during tests 7da0493 [Andrew Or] Fix tests 523c30e [Andrew Or] Add test for isBindCollision b97b02a [Andrew Or] Minor fixes c22ad00 [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports 93d359f [Andrew Or] Executors connect to wrong port when collision occurs d502e5f [Andrew Or] Handle port collisions when creating Akka systems a2dd05c [Andrew Or] Patrick's comment nit 86461e2 [Andrew Or] Remove spark.executor.env.port and spark.standalone.client.port 1d2d5c6 [Andrew Or] Fix ports for standalone cluster mode cb3be88 [Andrew Or] Various doc fixes (broken link, format etc.) e837cde [Andrew Or] Remove outdated TODOs bfbab28 [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports de1b207 [Andrew Or] Update docs to reflect new ports b565079 [Andrew Or] Add spark.ports.maxRetries 2551eb2 [Andrew Or] Remove spark.worker.watcher.port 151327a [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports 9868358 [Andrew Or] Add a few miscellaneous ports 6016e77 [Andrew Or] Add spark.executor.port 8d836e6 [Andrew Or] Also document SPARK_{MASTER/WORKER}_WEBUI_PORT 4d9e6f3 [Andrew Or] Fix super subtle bug 3f8e51b [Andrew Or] Correct erroneous docs... e111d08 [Andrew Or] Add names for UI services 470f38c [Andrew Or] Special case non-"Address already in use" exceptions 1d7e408 [Andrew Or] Treat 0 ports specially + return correct ConnectionManager port ba32280 [Andrew Or] Minor fixes 6b550b0 [Andrew Or] Assorted fixes 73fbe89 [Andrew Or] Move start service logic to Utils ec676f4 [Andrew Or] Merge branch 'SPARK-2157' of github.com:ash211/spark into configure-ports 038a579 [Andrew Ash] Trust the server start function to report the port the service started on 7c5bdc4 [Andrew Ash] Fix style issue 0347aef [Andrew Ash] Unify port fallback logic to a single place 24a4c32 [Andrew Ash] Remove type on val to match surrounding style 9e4ad96 [Andrew Ash] Reformat for style checker 5d84e0e [Andrew Ash] Document new port configuration options 066dc7a [Andrew Ash] Fix up HttpServer port increments cad16da [Andrew Ash] Add fallover increment logic for HttpServer c5a0568 [Andrew Ash] Fix ConnectionManager to retry with increment b80d2fd [Andrew Ash] Make Spark's block manager port configurable 17c79bb [Andrew Ash] Add a configuration option for spark-shell's class server f34115d [Andrew Ash] SPARK-1176 Add port configuration for HttpBroadcast 49ee29b [Andrew Ash] SPARK-1174 Add port configuration for HttpFileServer 1c0981a [Andrew Ash] Make port in HttpServer configurable
2014-08-06 03:07:40 -04:00
See the [configuration page](configuration.html) for more details on the security configuration
parameters, and <a href="{{site.SPARK_GITHUB_URL}}/tree/master/core/src/main/scala/org/apache/spark/SecurityManager.scala">
<code>org.apache.spark.SecurityManager</code></a> for implementation details about security.