8f057a9612
### What changes were proposed in this pull request?

This PR aims to simplify `Prometheus` support by adding `PrometheusServlet`. The main use cases are `K8s` and `Spark Standalone` cluster environments.

### Why are the changes needed?

Prometheus.io is a CNCF project used widely with K8s.
- https://github.com/prometheus/prometheus

For `Master/Worker/Driver`, the `Spark JMX Sink` and `Prometheus JMX Converter` combination is used in many cases. One way to achieve that is with the following.

**JMX Sink (conf/metrics.properties)**
```
*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
```

**JMX Converter (conf/spark-env.sh)**
- https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.12.0/jmx_prometheus_javaagent-0.12.0.jar
```
export SPARK_DAEMON_JAVA_OPTS="-javaagent:${PWD}/jmx_prometheus_javaagent-${JMX_PROMETHEUS_VERSION}.jar=${PORT_AGENT}:jmx_prometheus.yaml"
```

This agent approach additionally requires a `PORT_AGENT` port. Instead, this PR natively supports the `Prometheus` format by reusing the REST API port, for a better user experience.

### Does this PR introduce any user-facing change?

Yes. New web endpoints are added alongside the existing JSON API.

|        | JSON End Point              | Prometheus End Point              |
| ------ | --------------------------- | --------------------------------- |
| Master | /metrics/master/json/       | /metrics/master/prometheus/       |
| Master | /metrics/applications/json/ | /metrics/applications/prometheus/ |
| Worker | /metrics/json/              | /metrics/prometheus/              |
| Driver | /metrics/json/              | /metrics/prometheus/              |

### How was this patch tested?

Manually connected to the new endpoints with `curl`.
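With endpoints like these, a Prometheus server can scrape a standalone cluster directly, without the JMX agent. A hypothetical `prometheus.yml` fragment is sketched below; the job names and `host:port` targets are illustrative placeholders (matching the default UI ports used in the examples in this description), not values defined by this PR.

```yaml
# Hypothetical scrape config; targets are deployment-specific examples only.
scrape_configs:
  - job_name: spark-master
    metrics_path: /metrics/master/prometheus/
    static_configs:
      - targets: ["localhost:8080"]
  - job_name: spark-applications
    metrics_path: /metrics/applications/prometheus/
    static_configs:
      - targets: ["localhost:8080"]
  - job_name: spark-worker
    metrics_path: /metrics/prometheus/
    static_configs:
      - targets: ["localhost:8081"]
  - job_name: spark-driver
    metrics_path: /metrics/prometheus/
    static_configs:
      - targets: ["localhost:4040"]
```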
**Setup (Master/Worker/Driver)**

Add the following at `conf/metrics.properties` (`conf/metrics.properties.template` has these examples).
```
*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus
master.sink.prometheusServlet.path=/metrics/master/prometheus
applications.sink.prometheusServlet.path=/metrics/applications/prometheus
```

```
$ sbin/start-master.sh
$ sbin/start-slave.sh spark://`hostname`:7077
$ bin/spark-shell --master spark://`hostname`:7077
```

```
$ curl -s http://localhost:8080/metrics/master/json/ | jq
{
  "version": "3.1.3",
  "gauges": {
    "master.aliveWorkers": {
      "value": 1
    },
    "master.apps": {
      "value": 1
    },
    "master.waitingApps": {
      "value": 0
    },
    "master.workers": {
      "value": 1
    }
  },
  ...

$ curl -s http://localhost:8080/metrics/master/prometheus/ | grep master
metrics_master_aliveWorkers_Value 1
metrics_master_apps_Value 1
metrics_master_waitingApps_Value 0
metrics_master_workers_Value 1
```

```
$ curl -s http://localhost:8080/metrics/applications/json/ | jq
{
  "version": "3.1.3",
  "gauges": {
    "application.Spark shell.1568261490667.cores": {
      "value": 16
    },
    "application.Spark shell.1568261490667.runtime_ms": {
      "value": 108966
    },
    "application.Spark shell.1568261490667.status": {
      "value": "RUNNING"
    }
  },
  ...

$ curl -s http://localhost:8080/metrics/applications/prometheus/ | grep application
metrics_application_Spark_shell_1568261490667_cores_Value 16
metrics_application_Spark_shell_1568261490667_runtime_ms_Value 143174
```

```
$ curl -s http://localhost:8081/metrics/json/ | jq
{
  "version": "3.1.3",
  "gauges": {
    "worker.coresFree": {
      "value": 0
    },
    "worker.coresUsed": {
      "value": 16
    },
    "worker.executors": {
      "value": 1
    },
    "worker.memFree_MB": {
      "value": 30720
    },
    "worker.memUsed_MB": {
      "value": 1024
    }
  },
  ...
$ curl -s http://localhost:8081/metrics/prometheus/ | grep worker
metrics_worker_coresFree_Value 0
metrics_worker_coresUsed_Value 16
metrics_worker_executors_Value 1
metrics_worker_memFree_MB_Value 30720
metrics_worker_memUsed_MB_Value 1024
```

```
$ curl -s http://localhost:4040/metrics/json/ | jq
{
  "version": "3.1.3",
  "gauges": {
    "app-20190911211130-0000.driver.BlockManager.disk.diskSpaceUsed_MB": {
      "value": 0
    },
    "app-20190911211130-0000.driver.BlockManager.memory.maxMem_MB": {
      "value": 732
    },
    "app-20190911211130-0000.driver.BlockManager.memory.maxOffHeapMem_MB": {
      "value": 0
    },
    "app-20190911211130-0000.driver.BlockManager.memory.maxOnHeapMem_MB": {
      "value": 732
    },
  ...

$ curl -s http://localhost:4040/metrics/prometheus/ | head -n5
metrics_app_20190911211130_0000_driver_BlockManager_disk_diskSpaceUsed_MB_Value 0
metrics_app_20190911211130_0000_driver_BlockManager_memory_maxMem_MB_Value 732
metrics_app_20190911211130_0000_driver_BlockManager_memory_maxOffHeapMem_MB_Value 0
metrics_app_20190911211130_0000_driver_BlockManager_memory_maxOnHeapMem_MB_Value 732
metrics_app_20190911211130_0000_driver_BlockManager_memory_memUsed_MB_Value 0
```

Closes #25769 from dongjoon-hyun/SPARK-29032-2.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
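Comparing the JSON and Prometheus outputs above, each Dropwizard gauge key appears to be mapped to a Prometheus metric name by replacing every non-alphanumeric character (dot, space, hyphen) with an underscore and adding a `metrics_` prefix and a `_Value` suffix. A minimal Python sketch of that apparent mapping, inferred only from the sample outputs here (the function name `to_prometheus_name` is made up for illustration, not part of Spark):

```python
import re

def to_prometheus_name(gauge_key: str) -> str:
    """Map a Dropwizard gauge key from the JSON endpoint to the metric
    name the Prometheus endpoint appears to emit: replace every
    non-alphanumeric character with '_', prefix 'metrics_', suffix '_Value'."""
    return "metrics_" + re.sub(r"[^a-zA-Z0-9]", "_", gauge_key) + "_Value"

# Examples taken from the curl outputs in this description:
print(to_prometheus_name("master.aliveWorkers"))
# metrics_master_aliveWorkers_Value
print(to_prometheus_name("application.Spark shell.1568261490667.cores"))
# metrics_application_Spark_shell_1568261490667_cores_Value
```

This also explains the driver names, where the hyphens in an application ID like `app-20190911211130-0000` show up as underscores in the Prometheus output.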
Changed files:
- fairscheduler.xml.template
- log4j.properties.template
- metrics.properties.template
- slaves.template
- spark-defaults.conf.template
- spark-env.sh.template