spark-instrumented-optimizer/dev/.rat-excludes
Wing Yew Poon 80ab19b9fd [SPARK-26329][CORE] Faster polling of executor memory metrics.
## What changes were proposed in this pull request?

Prior to this change, an executor polls its memory metrics only on each heartbeat and sends them in that heartbeat. The heartbeat interval is 10s by default. With this change, an executor can optionally poll its memory metrics in a separate poller at a shorter interval.
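A minimal sketch of a poller that runs on its own thread at a fixed interval, independent of the heartbeat. It is Python for illustration only (Spark's executor code is Scala). `IntervalPoller` and `poll_fn` are hypothetical names, and `poll_fn` stands in for reading the executor's current memory metrics.

```python
import threading
from typing import Callable


class IntervalPoller:
    """Hypothetical sketch: calls poll_fn every interval_s seconds on a daemon thread."""

    def __init__(self, poll_fn: Callable[[], None], interval_s: float) -> None:
        if interval_s <= 0:
            raise ValueError("interval_s must be positive")
        self._poll_fn = poll_fn
        self._interval_s = interval_s
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Event.wait returns False on timeout (time to poll) and True once stop() is called.
        while not self._stopped.wait(self._interval_s):
            self._poll_fn()

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        # Signal the loop to exit, then wait for any in-flight poll to finish.
        self._stopped.set()
        self._thread.join()
```

In this arrangement the heartbeat keeps its own schedule and only reads what the poller has accumulated since the previous heartbeat.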

For each executor, we use a map of (stageId, stageAttemptId) to (count of running tasks, executor metric peaks) to track which stages are active as well as the per-stage memory metric peaks. When polling the executor memory metrics, we attribute the memory to the active stage(s) and update their peaks. In a heartbeat, we send the per-stage peaks (for stages active at that time) and then reset the peaks. As a result, the per-stage peaks sent in each heartbeat are the peaks since the last heartbeat.
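The per-stage bookkeeping described above can be sketched as follows, assuming a single thread and metrics represented as a fixed-length list of integers. `StagePeakTracker`, `StageEntry`, and their methods are hypothetical names, not Spark's API. Reporting a stage one last time at the next heartbeat and then dropping it once its running-task count reaches zero is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

StageKey = Tuple[int, int]  # (stageId, stageAttemptId)


@dataclass
class StageEntry:
    running_tasks: int
    peaks: List[int]


class StagePeakTracker:
    """Hypothetical, single-threaded sketch of per-stage task counts and metric peaks."""

    def __init__(self, num_metrics: int) -> None:
        self.num_metrics = num_metrics
        self.stages: Dict[StageKey, StageEntry] = {}

    def on_task_start(self, stage: StageKey) -> None:
        # Create the stage entry on its first task, then count the task.
        entry = self.stages.setdefault(stage, StageEntry(0, [0] * self.num_metrics))
        entry.running_tasks += 1

    def on_task_end(self, stage: StageKey) -> None:
        # Keep the entry (even at zero tasks) so the next heartbeat still reports it.
        entry = self.stages.get(stage)
        if entry is not None and entry.running_tasks > 0:
            entry.running_tasks -= 1

    def poll(self, current: List[int]) -> None:
        # Attribute the executor-wide values to every tracked stage.
        for entry in self.stages.values():
            entry.peaks = [max(p, v) for p, v in zip(entry.peaks, current)]

    def heartbeat(self) -> Dict[StageKey, List[int]]:
        # Report the peaks since the last heartbeat, then reset them.
        report: Dict[StageKey, List[int]] = {}
        for stage, entry in list(self.stages.items()):
            report[stage] = list(entry.peaks)
            if entry.running_tasks == 0:
                del self.stages[stage]  # no running tasks left: stop tracking the stage
            else:
                entry.peaks = [0] * self.num_metrics
        return report
```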

We also keep a map of taskId to memory metric peaks, which tracks each task's metric peaks over its lifetime; the polling thread updates this map as well. At the end of a task, we send its peak metric values in the task result. If the task fails, we send them in the `TaskFailedReason` instead.
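A similar sketch for the per-task peaks, under the same assumptions (single-threaded, metrics as a list of integers, hypothetical names). Returning zeros for a task that was never registered is a choice made here, not something stated in this change.

```python
from typing import Dict, List


class TaskPeakTracker:
    """Hypothetical, single-threaded sketch of per-task metric peaks."""

    def __init__(self, num_metrics: int) -> None:
        self.num_metrics = num_metrics
        self.tasks: Dict[int, List[int]] = {}  # taskId -> peaks over the task's lifetime

    def on_task_start(self, task_id: int) -> None:
        self.tasks[task_id] = [0] * self.num_metrics

    def poll(self, current: List[int]) -> None:
        # Update the peaks of every running task in place with the latest values.
        for peaks in self.tasks.values():
            for i, value in enumerate(current):
                if value > peaks[i]:
                    peaks[i] = value

    def on_task_end(self, task_id: int) -> List[int]:
        # Remove the task and return its peaks for the task result (or failure reason).
        return self.tasks.pop(task_id, [0] * self.num_metrics)
```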

We continue to do the stage-level aggregation in the EventLoggingListener.
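Because each heartbeat carries the peaks since the previous heartbeat, the peak for a stage on a given executor over the whole stage is the element-wise maximum of every heartbeat reported for that stage. A hypothetical sketch of that aggregation follows; it is not the actual `EventLoggingListener` code, and the class and method names are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

StageKey = Tuple[int, int]  # (stageId, stageAttemptId)


class StageExecutorPeakAggregator:
    """Hypothetical sketch: per-stage, per-executor peaks combined across heartbeats."""

    def __init__(self) -> None:
        self.peaks: Dict[StageKey, Dict[str, List[int]]] = defaultdict(dict)

    def on_heartbeat(self, executor_id: str, updates: Dict[StageKey, List[int]]) -> None:
        # Fold each heartbeat's "peaks since last heartbeat" into the running maximum.
        for stage, values in updates.items():
            per_executor = self.peaks[stage]
            old = per_executor.get(executor_id)
            if old is None:
                per_executor[executor_id] = list(values)
            else:
                per_executor[executor_id] = [max(a, b) for a, b in zip(old, values)]

    def on_stage_completed(self, stage: StageKey) -> Dict[str, List[int]]:
        # Emit the per-executor peaks for the finished stage and forget them.
        return self.peaks.pop(stage, {})
```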

For the driver, we still poll only on heartbeats. The driver sends the current values of its metrics at the time of the heartbeat, which is semantically the same as before.

## How was this patch tested?

Unit tests. Manually tested applications on an actual system and checked the event logs; the metrics appear in the SparkListenerTaskEnd and SparkListenerStageExecutorMetrics events.

Closes #23767 from wypoon/wypoon_SPARK-26329.

Authored-by: Wing Yew Poon <wypoon@cloudera.com>
Signed-off-by: Imran Rashid <irashid@cloudera.com>
2019-08-01 09:09:46 -05:00


target
cache
.gitignore
.gitattributes
.project
.classpath
.mima-excludes
.generated-mima-excludes
.generated-mima-class-excludes
.generated-mima-member-excludes
.rat-excludes
.*md
derby.log
licenses/*
licenses-binary/*
LICENSE
NOTICE
TAGS
RELEASE
control
docs
slaves
spark-env.cmd
bootstrap-tooltip.js
jquery-3.4.1.min.js
d3.min.js
dagre-d3.min.js
graphlib-dot.min.js
sorttable.js
vis.min.js
vis.min.css
dataTables.bootstrap.css
dataTables.bootstrap.min.js
dataTables.rowsGroup.js
jquery.blockUI.min.js
jquery.cookies.2.2.0.min.js
jquery.dataTables.1.10.18.min.css
jquery.dataTables.1.10.18.min.js
jquery.mustache.js
jsonFormatter.min.css
jsonFormatter.min.js
.*avsc
.*txt
.*json
.*data
.*log
pyspark-coverage-site/
cloudpickle.py
heapq3.py
join.py
SparkExprTyper.scala
SparkILoop.scala
SparkILoopInit.scala
SparkIMain.scala
SparkImports.scala
SparkJLineCompletion.scala
SparkJLineReader.scala
SparkMemberHandlers.scala
SparkReplReporter.scala
sbt
sbt-launch-lib.bash
plugins.sbt
work
.*\.q
.*\.qv
golden
test.out/*
.*iml
service.properties
db.lck
build/*
dist/*
.*out
.*ipr
.*iws
logs
.*scalastyle-output.xml
.*dependency-reduced-pom.xml
known_translations
json_expectation
app-20180109111548-0000
app-20161115172038-0000
app-20161116163331-0000
application_1516285256255_0012
application_1553914137147_0018
stat
local-1422981759269
local-1422981780767
local-1425081759269
local-1426533911241
local-1426633911242
local-1430917381534
local-1430917381535_1
local-1430917381535_2
DESCRIPTION
NAMESPACE
test_support/*
.*Rd
help/*
html/*
INDEX
.lintr
gen-java.*
.*avpr
.*parquet
spark-deps-.*
.*csv
.*tsv
.*\.sql
.Rbuildignore
META-INF/*
spark-warehouse
structured-streaming/*
kafka-source-initial-offset-version-2.1.0.bin
kafka-source-initial-offset-future-version.bin
vote.tmpl
SessionManager.java
SessionHandler.java
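This file is read by Apache RAT (Release Audit Tool); as far as we know, Spark's `dev/check-license` script passes it to RAT through the `-E` exclude-file option. The sketch below assumes, from our understanding of RAT's exclude-file handling rather than anything stated in this file, that each non-blank line is tried against a file's base name as an exact name, as a wildcard, and as a full-match regular expression. Python's `re` and `fnmatch` only approximate Java regex and Commons IO wildcard semantics, and `load_patterns` and `is_excluded` are hypothetical helpers, so treat this as an illustration of why a given file would or would not be skipped.

```python
import fnmatch
import re
from typing import List


def load_patterns(text: str) -> List[str]:
    # One pattern per line; blank lines are skipped.
    return [line.strip() for line in text.splitlines() if line.strip()]


def is_excluded(file_name: str, patterns: List[str]) -> bool:
    """Assumed RAT behavior: a base name is excluded if any pattern matches it as an
    exact name, a wildcard, or a fully matching regular expression."""
    for pattern in patterns:
        if file_name == pattern or fnmatch.fnmatchcase(file_name, pattern):
            return True
        try:
            if re.fullmatch(pattern, file_name):
                return True
        except re.error:
            # Skip patterns that are not valid Python regular expressions.
            continue
    return False
```

For example, `.*md` excludes `README.md` only through the regular-expression interpretation, since as a wildcard it would require the name to start with a literal dot.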