[SPARK-31970][CORE] Make MDC configuration step be consistent between setLocalProperty and log4j.properties

### What changes were proposed in this pull request?

This PR proposes to use "mdc.XXX" as the consistent key for both `sc.setLocalProperty` and `log4j.properties` when setting up configurations for MDC.

### Why are the changes needed?

It's weird that we use "mdc.XXX" as the key to set an MDC value via `sc.setLocalProperty` while using the bare "XXX" as the key for the MDC pattern in `log4j.properties`. The mismatch also places an extra burden on users.
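With this change the same key appears on both sides. A hedged sketch, assuming a running `SparkContext` named `sc`; `mdc.appId` is a made-up example key, not one Spark sets:

```scala
// After this patch, the property key and the log4j MDC key are identical.
// "mdc.appId" is a hypothetical user-defined key for illustration.
sc.setLocalProperty("mdc.appId", "my-app")

// ...and conf/log4j.properties references the very same key in its pattern:
// log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %X{mdc.appId} %m%n
```

Previously the user had to write `%X{appId}` in the pattern (with the `mdc.` prefix stripped), which is the inconsistency this PR removes.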

### Does this PR introduce _any_ user-facing change?

No, as the MDC feature was added in version 3.1, which hasn't been released yet.

### How was this patch tested?

Tested manually.

Closes #28801 from Ngone51/consistent-mdc.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Commit: 54e702c0dd (parent: 1e40bccf44)
Authored by yi.wu, 2020-06-14 14:26:11 -07:00; committed by Dongjoon Hyun
2 changed files with 6 additions and 9 deletions


```diff
@@ -323,10 +323,7 @@ private[spark] class Executor(
     val threadName = s"Executor task launch worker for task $taskId"
     val taskName = taskDescription.name
     val mdcProperties = taskDescription.properties.asScala
-      .filter(_._1.startsWith("mdc.")).map { item =>
-        val key = item._1.substring(4)
-        (key, item._2)
-      }.toSeq
+      .filter(_._1.startsWith("mdc.")).toSeq
 
   /** If specified, this task has been killed and this option contains the reason. */
   @volatile private var reasonIfKilled: Option[String] = None
@@ -705,7 +702,7 @@ private[spark] class Executor(
       MDC.clear()
       mdc.foreach { case (key, value) => MDC.put(key, value) }
       // avoid overriding the takName by the user
-      MDC.put("taskName", taskName)
+      MDC.put("mdc.taskName", taskName)
     }
 
   /**
```
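The executor-side change above can be reproduced on plain Scala collections, with no Spark dependency. This is a sketch using a made-up `mdc.userId` property to show how the key handling differs before and after the commit:

```scala
// Hypothetical task-local properties; only "mdc.userId" carries the prefix.
val props = Map("mdc.userId" -> "alice", "spark.job.description" -> "etl")

// Before this commit: the "mdc." prefix was stripped, so the MDC key became "userId".
val before = props.filter(_._1.startsWith("mdc."))
  .map { case (k, v) => (k.substring(4), v) }.toSeq

// After this commit: the full property key is kept, so the MDC key stays "mdc.userId".
val after = props.filter(_._1.startsWith("mdc.")).toSeq

assert(before.toMap == Map("userId" -> "alice"))
assert(after.toMap == Map("mdc.userId" -> "alice"))
```

Keeping the full key means a `%X{mdc.userId}` pattern in `log4j.properties` matches the exact string passed to `setLocalProperty`.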


```diff
@@ -2955,11 +2955,11 @@ Spark uses [log4j](http://logging.apache.org/log4j/) for logging. You can config
 `log4j.properties` file in the `conf` directory. One way to start is to copy the existing
 `log4j.properties.template` located there.
 
-By default, Spark adds 1 record to the MDC (Mapped Diagnostic Context): `taskName`, which shows something
-like `task 1.0 in stage 0.0`. You can add `%X{taskName}` to your patternLayout in
+By default, Spark adds 1 record to the MDC (Mapped Diagnostic Context): `mdc.taskName`, which shows something
+like `task 1.0 in stage 0.0`. You can add `%X{mdc.taskName}` to your patternLayout in
 order to print it in the logs.
 
-Moreover, you can use `spark.sparkContext.setLocalProperty("mdc." + name, "value")` to add user specific data into MDC.
-The key in MDC will be the string after the `mdc.` prefix.
+Moreover, you can use `spark.sparkContext.setLocalProperty(s"mdc.$name", "value")` to add user specific data into MDC.
+The key in MDC will be the string of "mdc.$name".
 
 # Overriding configuration directory
```
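The documented `%X{mdc.taskName}` pattern could be exercised with a minimal `conf/log4j.properties` along these lines. This is a sketch, not part of the patch; the appender names are the usual log4j 1.x conventions, and `mdc.userId` is a hypothetical user-set key:

```
# Minimal console logging setup that prints the built-in mdc.taskName entry
# plus a user-supplied mdc.userId key set via sc.setLocalProperty("mdc.userId", ...).
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %X{mdc.taskName} %X{mdc.userId} %m%n
```

With this layout, executor log lines carry the task name (e.g. `task 1.0 in stage 0.0`) and any user data keyed under the `mdc.` prefix.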