From bc8933faf238bcf14e7976bd1ac1465dc32b8e2b Mon Sep 17 00:00:00 2001
From: hyukjinkwon
Date: Tue, 12 Dec 2017 17:02:04 +0900
Subject: [PATCH] [SPARK-3685][CORE] Prints explicit warnings when configured
 local directories are set to URIs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## What changes were proposed in this pull request?

This PR proposes to print warnings before creating local directories via `java.io.File`. We cannot simply disallow such values and throw an exception, because values like `hdfs:/tmp/foo` may already be in use and rejecting them could break compatibility. Note that `hdfs:/tmp/foo` currently creates a local directory literally named `hdfs:/`.

There was a long discussion about whether we should support this on other file systems; however, since the JIRA targets "Spark's local dir should accept only local paths", this PR resolves it by simply printing warnings. If supporting non-local file systems is something we want, we could open another JIRA and a design doc separately. For reference, [SPARK-1529](https://issues.apache.org/jira/browse/SPARK-1529) was resolved as `Won't Fix`.

**Before**

```
./bin/spark-shell --conf spark.local.dir=file:/a/b/c
```

This creates a local directory as below:

```
file:/
└── a
    └── b
        └── c
...
```

**After**

```bash
./bin/spark-shell --conf spark.local.dir=file:/a/b/c
```

Now, it prints a warning as below:

```
...
17/12/09 21:58:49 WARN Utils: The configured local directories are not expected to be URIs; however, got suspicious values [file:/a/b/c]. Please check your configured local directories.
...
```

```bash
./bin/spark-shell --conf spark.local.dir=file:/a/b/c,/tmp/a/b/c,hdfs:/a/b/c
```

It also works with comma-separated values:

```
...
17/12/09 22:05:01 WARN Utils: The configured local directories are not expected to be URIs; however, got suspicious values [file:/a/b/c, hdfs:/a/b/c]. Please check your configured local directories.
...
```

## How was this patch tested?
Manually tested:

```
./bin/spark-shell --conf spark.local.dir=C:\\a\\b\\c
./bin/spark-shell --conf spark.local.dir=/tmp/a/b/c
./bin/spark-shell --conf spark.local.dir=a/b/c
./bin/spark-shell --conf spark.local.dir=a/b/c,/tmp/a/b/c,C:\\a\\b\\c
```

Author: hyukjinkwon

Closes #19934 from HyukjinKwon/SPARK-3685.
---
 .../main/scala/org/apache/spark/util/Utils.scala | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala b/core/src/main/scala/org/apache/spark/util/Utils.scala
index 1ed09dc489..fe5b4ea244 100644
--- a/core/src/main/scala/org/apache/spark/util/Utils.scala
+++ b/core/src/main/scala/org/apache/spark/util/Utils.scala
@@ -829,7 +829,18 @@ private[spark] object Utils extends Logging {
   }
 
   private def getOrCreateLocalRootDirsImpl(conf: SparkConf): Array[String] = {
-    getConfiguredLocalDirs(conf).flatMap { root =>
+    val configuredLocalDirs = getConfiguredLocalDirs(conf)
+    val uris = configuredLocalDirs.filter { root =>
+      // Here, we guess if the given value is a URI at its best - check if scheme is set.
+      Try(new URI(root).getScheme != null).getOrElse(false)
+    }
+    if (uris.nonEmpty) {
+      logWarning(
+        "The configured local directories are not expected to be URIs; however, got suspicious " +
+          s"values [${uris.mkString(", ")}]. Please check your configured local directories.")
+    }
+
+    configuredLocalDirs.flatMap { root =>
       try {
         val rootDir = new File(root)
         if (rootDir.exists || rootDir.mkdirs()) {
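The heuristic in the diff can be exercised in isolation. Below is a minimal standalone sketch (the `LocalDirCheck` object and `looksLikeUri` name are illustrative, not Spark's actual code) showing how the `Try(new URI(root).getScheme != null)` check classifies the values from the examples above, using only the JDK and the Scala standard library:

```scala
import java.net.URI
import scala.util.Try

object LocalDirCheck {
  // Best-effort guess: treat a configured value as a URI only if it
  // parses as one AND carries a scheme (e.g. "file", "hdfs").
  // Plain paths like "/tmp/a/b/c" parse but have a null scheme;
  // Windows paths like "C:\a\b\c" fail to parse, so Try yields false.
  def looksLikeUri(root: String): Boolean =
    Try(new URI(root).getScheme != null).getOrElse(false)

  def main(args: Array[String]): Unit = {
    val dirs = Seq("file:/a/b/c", "/tmp/a/b/c", "a/b/c", "hdfs:/a/b/c")
    val suspicious = dirs.filter(looksLikeUri)
    println(suspicious.mkString(", "))  // file:/a/b/c, hdfs:/a/b/c
  }
}
```

Only the scheme-bearing values are flagged, which matches the warning output shown in the PR description.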