#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# syntax: [instance].sink|source.[name].[options]=[value]
# This file configures Spark's internal metrics system. The metrics system is
# divided into instances which correspond to internal components.
# Each instance can be configured to report its metrics to one or more sinks.
# Accepted values for [instance] are "master", "worker", "executor", "driver",
# and "applications". A wildcard "*" can be used as an instance name, in
# which case all instances will inherit the supplied property.
#
# Within an instance, a "source" specifies a particular set of grouped metrics.
# There are two kinds of sources:
# 1. Spark internal sources, like MasterSource, WorkerSource, etc., which will
# collect a Spark component's internal state. Each instance is paired with a
# Spark source that is added automatically.
# 2. Common sources, like JvmSource, which will collect low level state.
# These can be added through configuration options and are then loaded
# using reflection.
#
# A "sink" specifies where metrics are delivered to. Each instance can be
# assigned one or more sinks.
#
# The sink|source field specifies whether the property relates to a sink or
# source.
#
# The [name] field specifies the name of the source or sink.
#
# The [options] field specifies a property of this source or sink. The
# source or sink is responsible for parsing this property.
#
# Notes:
# 1. To add a new sink, set the "class" option to a fully qualified class
# name (see examples below).
# 2. Some sinks involve a polling period. The minimum allowed polling period
# is 1 second.
# 3. Wildcard properties can be overridden by more specific properties.
# For example, master.sink.console.period takes precedence over
# *.sink.console.period.
# 4. A metrics-specific configuration,
# "spark.metrics.conf=${SPARK_HOME}/conf/metrics.properties", should be
# added to the Java system properties using -Dspark.metrics.conf=xxx if
# you want to customize the metrics system. You can also put the file in
# ${SPARK_HOME}/conf and it will be loaded automatically.
# 5. The MetricsServlet sink is added by default as a sink in the master,
# worker and driver, and you can send HTTP requests to the "/metrics/json"
# endpoint to get a snapshot of all the registered metrics in JSON format.
# For master, requests to the "/metrics/master/json" and
# "/metrics/applications/json" endpoints can be sent separately to get
# metrics snapshots of the master instance and applications. This
# MetricsServlet does not have to be configured.
# 6. The metrics system can also be configured using Spark configuration
# parameters. The relevant parameter names are formed by adding the
# prefix "spark.metrics.conf." to the configuration entries detailed in
# this file (see examples below).
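#
# For example (illustrative values; adjust host, port, and sink names to your
# deployment), a snapshot of the master's metrics can be fetched from its web
# UI port (8080 by default):
#   curl http://localhost:8080/metrics/master/json/
# and the same ConsoleSink polling period shown in the examples further below
# can be set either in this file or, equivalently, as a Spark configuration
# entry:
#   *.sink.console.period=10                      # in metrics.properties
#   spark.metrics.conf.*.sink.console.period=10   # e.g. in spark-defaults.conf
#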
## List of available common sources and their properties.
# org.apache.spark.metrics.source.JvmSource
# Note: Currently, JvmSource is the only available common source.
# It can be added to an instance by setting the "class" option to its
# fully qualified class name (see examples below).
## List of available sinks and their properties.
# org.apache.spark.metrics.sink.ConsoleSink
#   Name:     Default:   Description:
#   period    10         Poll period
#   unit      seconds    Unit of the poll period
# org.apache.spark.metrics.sink.CsvSink
#   Name:       Default:   Description:
#   period      10         Poll period
#   unit        seconds    Unit of the poll period
#   directory   /tmp       Where to store CSV files
# org.apache.spark.metrics.sink.GangliaSink
#   Name:     Default:    Description:
#   host      NONE        Hostname or multicast group of the Ganglia server,
#                         must be set
#   port      NONE        Port of the Ganglia server(s), must be set
#   period    10          Poll period
#   unit      seconds     Unit of the poll period
#   ttl       1           TTL of messages sent by Ganglia
#   dmax      0           Lifetime in seconds of metrics (0 never expired)
#   mode      multicast   Ganglia network mode ('unicast' or 'multicast')
# org.apache.spark.metrics.sink.JmxSink
# org.apache.spark.metrics.sink.MetricsServlet
#   Name:     Default:   Description:
#   path      VARIES*    Path prefix from the web server root
#   sample    false      Whether to show entire set of samples for histograms
#                        ('false' or 'true')
#
#   * Default path is /metrics/json for all instances except the master. The
#     master has two paths:
#       /metrics/applications/json   # App information
#       /metrics/master/json         # Master information
# org.apache.spark.metrics.sink.PrometheusServlet
#   Name:     Default:   Description:
#   path      VARIES*    Path prefix from the web server root
#
#   * Default path is /metrics/prometheus for all instances except the master.
#     The master has two paths:
#       /metrics/applications/prometheus   # App information
#       /metrics/master/prometheus         # Master information
# org.apache.spark.metrics.sink.GraphiteSink
#   Name:      Default:       Description:
#   host       NONE           Hostname of the Graphite server, must be set
#   port       NONE           Port of the Graphite server, must be set
#   period     10             Poll period
#   unit       seconds        Unit of the poll period
#   prefix     EMPTY STRING   Prefix to prepend to every metric's name
#   protocol   tcp            Protocol ("tcp" or "udp") to use
#   regex      NONE           Optional filter to send only metrics matching this regex string
# org.apache.spark.metrics.sink.StatsdSink
#   Name:     Default:       Description:
#   host      127.0.0.1      Hostname or IP of StatsD server
#   port      8125           Port of StatsD server
#   period    10             Poll period
#   unit      seconds        Units of poll period
#   prefix    EMPTY STRING   Prefix to prepend to metric name
## Examples
# Enable JmxSink for all instances by class name
#*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
# Enable ConsoleSink for all instances by class name
#*.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink
# Enable StatsdSink for all instances by class name
#*.sink.statsd.class=org.apache.spark.metrics.sink.StatsdSink
#*.sink.statsd.prefix=spark
# Polling period for the ConsoleSink
#*.sink.console.period=10
# Unit of the polling period for the ConsoleSink
#*.sink.console.unit=seconds
# Polling period for the ConsoleSink specific for the master instance
#master.sink.console.period=15
# Unit of the polling period for the ConsoleSink specific for the master
# instance
#master.sink.console.unit=seconds
# Enable CsvSink for all instances by class name
#*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
# Polling period for the CsvSink
#*.sink.csv.period=1
# Unit of the polling period for the CsvSink
#*.sink.csv.unit=minutes
# Output directory for the CsvSink
#*.sink.csv.directory=/tmp/
# Polling period for the CsvSink specific for the worker instance
#worker.sink.csv.period=10
# Unit of the polling period for the CsvSink specific for the worker instance
#worker.sink.csv.unit=minutes
# Enable Slf4jSink for all instances by class name
#*.sink.slf4j.class=org.apache.spark.metrics.sink.Slf4jSink
# Polling period for the Slf4jSink
#*.sink.slf4j.period=1
# Unit of the polling period for the Slf4jSink
#*.sink.slf4j.unit=minutes
# Example configuration for Graphite sink
#*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
#*.sink.graphite.host=<graphiteEndPoint_hostName>
#*.sink.graphite.port=<listening_port>
#*.sink.graphite.period=10
#*.sink.graphite.unit=seconds
#*.sink.graphite.prefix=<optional_value>
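# Example configuration for the GangliaSink (a sketch with placeholder values;
# note that GangliaSink lives in the separate spark-ganglia-lgpl module for
# licensing reasons, so it is only available in builds that include it)
#*.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink
#*.sink.ganglia.host=<gangliaHostOrMulticastGroup>
#*.sink.ganglia.port=<ganglia_listening_port>
#*.sink.ganglia.period=10
#*.sink.ganglia.unit=seconds
#*.sink.ganglia.mode=multicast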
# Enable JvmSource for the master, worker, driver and executor instances
#master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
#worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
#driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
#executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
# Example configuration for PrometheusServlet
#*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
#*.sink.prometheusServlet.path=/metrics/prometheus
#master.sink.prometheusServlet.path=/metrics/master/prometheus
#applications.sink.prometheusServlet.path=/metrics/applications/prometheus
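# Example override for the MetricsServlet sink, which is registered
# automatically (a sketch; it assumes "servlet" is the sink name Spark uses
# for the default registration, and only turns on full histogram samples in
# the JSON output)
#*.sink.servlet.sample=true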