From 2e1e1f83e49f69e37229c65592c1a2e87b1efe56 Mon Sep 17 00:00:00 2001
From: Sean Owen
Date: Sat, 17 Apr 2021 08:44:00 -0500
Subject: [PATCH] [MINOR][DOCS] Soften security warning and keep it in cluster management docs only

### What changes were proposed in this pull request?

Soften security warning and keep it in cluster management docs only, not in the main doc page, where it's not necessarily relevant.

### Why are the changes needed?

The statement is perhaps unnecessarily 'frightening' as the first section in the main docs page. It applies to clusters not local mode, anyhow.

### Does this PR introduce _any_ user-facing change?

Just a docs change.

### How was this patch tested?

N/A

Closes #32206 from srowen/SecurityStatement.

Authored-by: Sean Owen
Signed-off-by: Sean Owen
---
 docs/index.md                 | 5 -----
 docs/quick-start.md           | 5 -----
 docs/running-on-kubernetes.md | 6 ++++--
 docs/running-on-mesos.md      | 4 +++-
 docs/running-on-yarn.md       | 4 +++-
 docs/security.md              | 5 ++++-
 docs/spark-standalone.md      | 4 +++-
 7 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/docs/index.md b/docs/index.md
index de30186141..f5a4f9cf43 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -25,11 +25,6 @@ It provides high-level APIs in Java, Scala, Python and R,
 and an optimized engine that supports general execution graphs.
 It also supports a rich set of higher-level tools including [Spark SQL](sql-programming-guide.html) for SQL and structured data processing, [MLlib](ml-guide.html) for machine learning, [GraphX](graphx-programming-guide.html) for graph processing, and [Structured Streaming](structured-streaming-programming-guide.html) for incremental computation and stream processing.
 
-# Security
-
-Security in Spark is OFF by default. This could mean you are vulnerable to attack by default.
-Please see [Spark Security](security.html) before downloading and running Spark.
-
 # Downloading
 
 Get Spark from the [downloads page](https://spark.apache.org/downloads.html) of the project website. This documentation is for Spark version {{site.SPARK_VERSION}}. Spark uses Hadoop's client libraries for HDFS and YARN. Downloads are pre-packaged for a handful of popular Hadoop versions.
diff --git a/docs/quick-start.md b/docs/quick-start.md
index 557fc187fb..958e1ba920 100644
--- a/docs/quick-start.md
+++ b/docs/quick-start.md
@@ -32,11 +32,6 @@ you can download a package for any version of Hadoop.
 
 Note that, before Spark 2.0, the main programming interface of Spark was the Resilient Distributed Dataset (RDD). After Spark 2.0, RDDs are replaced by Dataset, which is strongly-typed like an RDD, but with richer optimizations under the hood. The RDD interface is still supported, and you can get a more detailed reference at the [RDD programming guide](rdd-programming-guide.html). However, we highly recommend you to switch to use Dataset, which has better performance than RDD. See the [SQL programming guide](sql-programming-guide.html) to get more information about Dataset.
 
-# Security
-
-Security in Spark is OFF by default. This could mean you are vulnerable to attack by default.
-Please see [Spark Security](security.html) before running Spark.
-
 # Interactive Analysis with the Spark Shell
 
 ## Basics
diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
index e1d2f96ad0..530951e839 100644
--- a/docs/running-on-kubernetes.md
+++ b/docs/running-on-kubernetes.md
@@ -25,8 +25,10 @@ Kubernetes scheduler that has been added to Spark.
 
 # Security
 
-Security in Spark is OFF by default. This could mean you are vulnerable to attack by default.
-Please see [Spark Security](security.html) and the specific advice below before running Spark.
+Security features like authentication are not enabled by default. When deploying a cluster that is open to the internet
+or an untrusted network, it's important to secure access to the cluster to prevent unauthorized applications
+from running on the cluster.
+Please see [Spark Security](security.html) and the specific security sections in this doc before running Spark.
 
 ## User Identity
 
diff --git a/docs/running-on-mesos.md b/docs/running-on-mesos.md
index bd01639ea7..52325f370d 100644
--- a/docs/running-on-mesos.md
+++ b/docs/running-on-mesos.md
@@ -32,7 +32,9 @@ The advantages of deploying Spark with Mesos include:
 
 # Security
 
-Security in Spark is OFF by default. This could mean you are vulnerable to attack by default.
+Security features like authentication are not enabled by default. When deploying a cluster that is open to the internet
+or an untrusted network, it's important to secure access to the cluster to prevent unauthorized applications
+from running on the cluster.
 Please see [Spark Security](security.html) and the specific security sections in this doc before running Spark.
 
 # How it Works
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index 73bb76af65..5969ed33f5 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -26,7 +26,9 @@ was added to Spark in version 0.6.0, and improved in subsequent releases.
 
 # Security
 
-Security in Spark is OFF by default. This could mean you are vulnerable to attack by default.
+Security features like authentication are not enabled by default. When deploying a cluster that is open to the internet
+or an untrusted network, it's important to secure access to the cluster to prevent unauthorized applications
+from running on the cluster.
 Please see [Spark Security](security.html) and the specific security sections in this doc before running Spark.
 
 # Launching Spark on YARN
diff --git a/docs/security.md b/docs/security.md
index a4ede9f05b..a75ca82e32 100644
--- a/docs/security.md
+++ b/docs/security.md
@@ -23,7 +23,10 @@ license: |
 
 # Spark Security: Things You Need To Know
 
-Security in Spark is OFF by default. This could mean you are vulnerable to attack by default.
+Security features like authentication are not enabled by default. When deploying a cluster that is open to the internet
+or an untrusted network, it's important to secure access to the cluster to prevent unauthorized applications
+from running on the cluster.
+
 Spark supports multiple deployments types and each one supports different levels of security. Not
 all deployment types will be secure in all environments and none are secure by default. Be
 sure to evaluate your environment, what Spark supports, and take the appropriate measure to secure
diff --git a/docs/spark-standalone.md b/docs/spark-standalone.md
index 4344893fd3..1991d64fe4 100644
--- a/docs/spark-standalone.md
+++ b/docs/spark-standalone.md
@@ -25,7 +25,9 @@ In addition to running on the Mesos or YARN cluster managers, Spark also provide
 
 # Security
 
-Security in Spark is OFF by default. This could mean you are vulnerable to attack by default.
+Security features like authentication are not enabled by default. When deploying a cluster that is open to the internet
+or an untrusted network, it's important to secure access to the cluster to prevent unauthorized applications
+from running on the cluster.
 Please see [Spark Security](security.html) and the specific security sections in this doc before running Spark.
 
 # Installing Spark Standalone to a Cluster