Merge pull request #462 from mateiz/conf-file-fix

Remove Typesafe Config usage and conf files to fix nested property names

With Typesafe Config we had the subtle problem of no longer allowing
nested property names, which are used for a few of our properties:
http://apache-spark-developers-list.1001551.n3.nabble.com/Config-properties-broken-in-master-td208.html

This PR is for branch 0.9 but should be added into master too.
Patrick Wendell 2014-01-18 16:17:34 -08:00
commit 34e911ce9a
6 changed files with 41 additions and 71 deletions
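To illustrate the problem being fixed: under a HOCON-style parser, keys such as spark.test.a and spark.test.a.b conflict, because the former would have to be both a plain value and an object containing b. The patch sidesteps this by copying spark.* Java system properties straight into SparkConf's settings map. Below is a minimal, self-contained sketch of that approach in isolation; the property names are illustrative (mirroring the new unit test), not taken from the patch.

// Sketch of the replacement loading logic: read spark.* entries straight out
// of the JVM system properties, so dotted keys like "spark.test.a" and
// "spark.test.a.b" can coexist as ordinary string values.
// (Property names below are illustrative only.)
import scala.collection.JavaConverters._
import scala.collection.mutable.HashMap

object NestedPropertyDemo extends App {
  System.setProperty("spark.test.a", "a")
  System.setProperty("spark.test.a.b", "a.b")   // nests "under" spark.test.a

  val settings = new HashMap[String, String]()
  for ((k, v) <- System.getProperties.asScala if k.startsWith("spark.")) {
    settings(k) = v
  }

  assert(settings("spark.test.a") == "a")
  assert(settings("spark.test.a.b") == "a.b")   // both keys survive
  println(settings)
}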


@@ -20,19 +20,17 @@ package org.apache.spark
 import scala.collection.JavaConverters._
 import scala.collection.mutable.HashMap

-import com.typesafe.config.ConfigFactory
 import java.io.{ObjectInputStream, ObjectOutputStream, IOException}

 /**
  * Configuration for a Spark application. Used to set various Spark parameters as key-value pairs.
  *
  * Most of the time, you would create a SparkConf object with `new SparkConf()`, which will load
- * values from both the `spark.*` Java system properties and any `spark.conf` on your application's
- * classpath (if it has one). In this case, system properties take priority over `spark.conf`, and
- * any parameters you set directly on the `SparkConf` object take priority over both of those.
+ * values from any `spark.*` Java system properties set in your application as well. In this case,
+ * parameters you set directly on the `SparkConf` object take priority over system properties.
  *
  * For unit tests, you can also call `new SparkConf(false)` to skip loading external settings and
- * get the same configuration no matter what is on the classpath.
+ * get the same configuration no matter what the system properties are.
  *
  * All setter methods in this class support chaining. For example, you can write
  * `new SparkConf().setMaster("local").setAppName("My app")`.
@@ -40,7 +38,7 @@ import java.io.{ObjectInputStream, ObjectOutputStream, IOException}
  * Note that once a SparkConf object is passed to Spark, it is cloned and can no longer be modified
  * by the user. Spark does not support modifying the configuration at runtime.
  *
- * @param loadDefaults whether to load values from the system properties and classpath
+ * @param loadDefaults whether to also load values from Java system properties
  */
 class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging {
@@ -50,11 +48,9 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging {
   private val settings = new HashMap[String, String]()

   if (loadDefaults) {
-    ConfigFactory.invalidateCaches()
-    val typesafeConfig = ConfigFactory.systemProperties()
-      .withFallback(ConfigFactory.parseResources("spark.conf"))
-    for (e <- typesafeConfig.entrySet().asScala if e.getKey.startsWith("spark.")) {
-      settings(e.getKey) = e.getValue.unwrapped.toString
+    // Load any spark.* system properties
+    for ((k, v) <- System.getProperties.asScala if k.startsWith("spark.")) {
+      settings(k) = v
     }
   }


@@ -1,8 +0,0 @@
-# A simple spark.conf file used only in our unit tests
-
-spark.test.intTestProperty = 1
-
-spark.test {
-  stringTestProperty = "hi"
-  listTestProperty = ["a", "b"]
-}


@@ -20,35 +20,23 @@ package org.apache.spark
 import org.scalatest.FunSuite

 class SparkConfSuite extends FunSuite with LocalSparkContext {
-  // This test uses the spark.conf in core/src/test/resources, which has a few test properties
-  test("loading from spark.conf") {
-    val conf = new SparkConf()
-    assert(conf.get("spark.test.intTestProperty") === "1")
-    assert(conf.get("spark.test.stringTestProperty") === "hi")
-    // NOTE: we don't use list properties yet, but when we do, we'll have to deal with this syntax
-    assert(conf.get("spark.test.listTestProperty") === "[a, b]")
-  }
-
-  // This test uses the spark.conf in core/src/test/resources, which has a few test properties
-  test("system properties override spark.conf") {
+  test("loading from system properties") {
     try {
-      System.setProperty("spark.test.intTestProperty", "2")
+      System.setProperty("spark.test.testProperty", "2")
       val conf = new SparkConf()
-      assert(conf.get("spark.test.intTestProperty") === "2")
-      assert(conf.get("spark.test.stringTestProperty") === "hi")
+      assert(conf.get("spark.test.testProperty") === "2")
     } finally {
-      System.clearProperty("spark.test.intTestProperty")
+      System.clearProperty("spark.test.testProperty")
     }
   }

   test("initializing without loading defaults") {
     try {
-      System.setProperty("spark.test.intTestProperty", "2")
+      System.setProperty("spark.test.testProperty", "2")
       val conf = new SparkConf(false)
-      assert(!conf.contains("spark.test.intTestProperty"))
-      assert(!conf.contains("spark.test.stringTestProperty"))
+      assert(!conf.contains("spark.test.testProperty"))
     } finally {
-      System.clearProperty("spark.test.intTestProperty")
+      System.clearProperty("spark.test.testProperty")
     }
   }
@@ -124,4 +112,25 @@ class SparkConfSuite extends FunSuite with LocalSparkContext {
     assert(sc.master === "local[2]")
     assert(sc.appName === "My other app")
   }
+
+  test("nested property names") {
+    // This wasn't supported by some external conf parsing libraries
+    try {
+      System.setProperty("spark.test.a", "a")
+      System.setProperty("spark.test.a.b", "a.b")
+      System.setProperty("spark.test.a.b.c", "a.b.c")
+      val conf = new SparkConf()
+      assert(conf.get("spark.test.a") === "a")
+      assert(conf.get("spark.test.a.b") === "a.b")
+      assert(conf.get("spark.test.a.b.c") === "a.b.c")
+      conf.set("spark.test.a.b", "A.B")
+      assert(conf.get("spark.test.a") === "a")
+      assert(conf.get("spark.test.a.b") === "A.B")
+      assert(conf.get("spark.test.a.b.c") === "a.b.c")
+    } finally {
+      System.clearProperty("spark.test.a")
+      System.clearProperty("spark.test.a.b")
+      System.clearProperty("spark.test.a.b.c")
+    }
+  }
 }


@@ -18,8 +18,8 @@ Spark provides three locations to configure the system:
 Spark properties control most application settings and are configured separately for each application.
 The preferred way to set them is by passing a [SparkConf](api/core/index.html#org.apache.spark.SparkConf)
 class to your SparkContext constructor.
-Alternatively, Spark will also load them from Java system properties (for compatibility with old versions
-of Spark) and from a [`spark.conf` file](#configuration-files) on your classpath.
+Alternatively, Spark will also load them from Java system properties, for compatibility with old versions
+of Spark.

 SparkConf lets you configure most of the common properties to initialize a cluster (e.g., master URL and
 application name), as well as arbitrary key-value pairs through the `set()` method. For example, we could
@@ -468,30 +468,6 @@ Apart from these, the following properties are also available, and may be useful
 The application web UI at `http://<driver>:4040` lists Spark properties in the "Environment" tab.
 This is a useful place to check to make sure that your properties have been set correctly.

-## Configuration Files
-
-You can also configure Spark properties through a `spark.conf` file on your Java classpath.
-Because these properties are usually application-specific, we recommend putting this fine *only* on your
-application's classpath, and not in a global Spark classpath.
-
-The `spark.conf` file uses Typesafe Config's [HOCON format](https://github.com/typesafehub/config#json-superset),
-which is a superset of Java properties files and JSON. For example, the following is a simple config file:
-
-{% highlight awk %}
-# Comments are allowed
-spark.executor.memory = 512m
-spark.serializer = org.apache.spark.serializer.KryoSerializer
-{% endhighlight %}
-
-The format also allows hierarchical nesting, as follows:
-
-{% highlight awk %}
-spark.akka {
-  threads = 8
-  timeout = 200
-}
-{% endhighlight %}
-
 # Environment Variables

 Certain Spark settings can be configured through environment variables, which are read from the `conf/spark-env.sh`
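For context (not part of the patch), the SparkConf usage the documentation above describes looks roughly like the following; the master URL, application name, and memory value are placeholders.

// Rough sketch of configuring an application through SparkConf, as the docs
// describe: common properties via setters plus arbitrary key-value pairs
// via set(). Values here are placeholders.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local[2]")                  // cluster master URL
  .setAppName("My app")                   // application name
  .set("spark.executor.memory", "512m")   // arbitrary key-value pair
val sc = new SparkContext(conf)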


@@ -277,7 +277,6 @@ object SparkBuild extends Build {
         "com.codahale.metrics" % "metrics-graphite" % "3.0.0",
         "com.twitter" %% "chill" % "0.3.1",
         "com.twitter" % "chill-java" % "0.3.1",
-        "com.typesafe" % "config" % "1.0.2",
         "com.clearspring.analytics" % "stream" % "2.5.1"
       )
     )


@@ -61,14 +61,12 @@ class SparkConf(object):
     Most of the time, you would create a SparkConf object with
     C{SparkConf()}, which will load values from C{spark.*} Java system
-    properties and any C{spark.conf} on your Spark classpath. In this
-    case, system properties take priority over C{spark.conf}, and any
-    parameters you set directly on the C{SparkConf} object take priority
-    over both of those.
+    properties as well. In this case, any parameters you set directly on
+    the C{SparkConf} object take priority over system properties.

     For unit tests, you can also call C{SparkConf(false)} to skip
     loading external settings and get the same configuration no matter
-    what is on the classpath.
+    what the system properties are.

     All setter methods in this class support chaining. For example,
     you can write C{conf.setMaster("local").setAppName("My app")}.

@@ -82,7 +80,7 @@ class SparkConf(object):
         Create a new Spark configuration.

         @param loadDefaults: whether to load values from Java system
-          properties and classpath (True by default)
+          properties (True by default)
         @param _jvm: internal parameter used to pass a handle to the
           Java VM; does not need to be set by users
         """