spark-instrumented-optimizer/dev
Venkata krishnan Sowrirajan 73747ecb97 [SPARK-36038][CORE] Speculation metrics summary at stage level
### What changes were proposed in this pull request?

Spark currently exposes no speculation metrics at the application, job, or stage level. This PR adds basic speculation metrics for a stage when speculative execution is enabled.

These are similar to the existing stage-level metrics and track numTotal (total number of speculated tasks), numCompleted (total number of successful speculated tasks), numFailed (total number of failed speculated tasks), numKilled (total number of killed speculated tasks), etc.
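
As a rough illustration of what such a per-stage summary holds, here is a minimal Python sketch. It is not the code added by this PR: the `SpeculationSummary` class, the `summarize` helper, and the state names `"SUCCESS"`, `"FAILED"`, and `"KILLED"` are assumptions made for the example. Only the four counter names come from the description above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SpeculationSummary:
    """Hypothetical per-stage tally; counter names follow the PR description."""
    numTotal: int = 0
    numCompleted: int = 0
    numFailed: int = 0
    numKilled: int = 0


def summarize(outcomes):
    """Count the states of speculative task attempts for one stage.

    `outcomes` is an iterable of state strings. The state names used here
    are assumed for this sketch and are not taken from Spark itself.
    """
    counts = Counter(outcomes)
    return SpeculationSummary(
        numTotal=sum(counts.values()),    # every speculated attempt, in any state
        numCompleted=counts["SUCCESS"],   # speculated attempts that succeeded
        numFailed=counts["FAILED"],       # speculated attempts that failed
        numKilled=counts["KILLED"],       # speculated attempts that were killed
    )
```

Calling `summarize` with the list of attempt states for a stage returns one summary object. Comparing `numCompleted` against `numKilled` then gives a quick sense of how often speculation paid off.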

This new set of metrics helps in understanding how speculative execution behaves in the context of an application and in tuning the speculative execution configuration knobs.

Screenshot of Spark UI with speculation summary:
![Screen Shot 2021-09-22 at 12 12 20 PM](https://user-images.githubusercontent.com/8871522/135321311-db7699ad-f1ae-4729-afea-d1e2c4e86103.png)

Screenshot of Spark UI with API output:
![Screen Shot 2021-09-22 at 12 10 37 PM](https://user-images.githubusercontent.com/8871522/135321486-4dbb7a67-5580-47f8-bccf-81c758c2e988.png)

### Why are the changes needed?

Additional metrics for speculative execution.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit tests were added, and the change has been deployed on our internal platform for quite some time.

Lead-authored by: Venkata krishnan Sowrirajan <vsowrirajan@linkedin.com>
Co-authored by: Ron Hu <rhu@linkedin.com>
Co-authored by: Thejdeep Gudivada <tgudivada@linkedin.com>

Closes #33253 from venkata91/speculation-metrics.

Authored-by: Venkata krishnan Sowrirajan <vsowrirajan@linkedin.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
2021-10-01 16:59:29 +09:00
ansible-for-test-node [SPARK-32797][SPARK-32391][SPARK-33242][SPARK-32666][ANSIBLE] updating a bunch of python packages 2021-07-21 15:22:06 -07:00
create-release [SPARK-36551][BUILD] Add sphinx-plotly-directive in Spark release Dockerfile 2021-08-20 20:02:24 +08:00
deps [SPARK-36893][BUILD][MESOS] Upgrade mesos into 1.4.3 2021-09-29 21:49:22 -07:00
sparktestsupport [SPARK-36263][SQL][PYTHON] Add Dataframe.observation to PySpark 2021-07-28 01:39:34 +08:00
tests Spelling r common dev mlib external project streaming resource managers python 2020-11-27 10:22:45 -06:00
.gitignore [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*.py to .gitignore file. 2018-01-31 00:51:00 +09:00
.rat-excludes [SPARK-36038][CORE] Speculation metrics summary at stage level 2021-10-01 16:59:29 +09:00
.scalafmt.conf [SPARK-35255][BUILD] Automated formatting for Scala Code for Blank Lines 2021-04-30 11:45:58 +09:00
appveyor-guide.md Spelling r common dev mlib external project streaming resource managers python 2020-11-27 10:22:45 -06:00
appveyor-install-dependencies.ps1 [SPARK-33105][INFRA] Change default R arch from i386 to x64 and parametrize BINPREF 2020-10-10 13:48:26 +09:00
change-scala-version.sh [SPARK-36712][BUILD][FOLLOWUP] Improve the regex to avoid breaking pom.xml 2021-09-14 16:26:50 -07:00
check-license [MINOR][INFRA] Suppress warning in check-license 2020-11-23 10:38:40 +09:00
checkstyle-suppressions.xml [SPARK-29674][CORE] Update dropwizard metrics to 4.1.x for JDK 9+ 2019-11-03 15:13:06 -08:00
checkstyle.xml [SPARK-35609][BUILD] Add style rules to prohibit to use a Guava's API which is incompatible with newer versions 2021-06-03 21:52:41 +09:00
eslint.json [SPARK-35175][BUILD] Add linter for JavaScript source files 2021-05-07 21:55:08 +09:00
github_jira_sync.py Spelling r common dev mlib external project streaming resource managers python 2020-11-27 10:22:45 -06:00
lint-java [SPARK-23063][K8S] K8s changes for publishing scripts (and a couple of other misses) 2018-01-13 21:34:28 -08:00
lint-js [SPARK-35175][BUILD] Add linter for JavaScript source files 2021-05-07 21:55:08 +09:00
lint-python [SPARK-34943][BUILD] Upgrade flake8 to 3.8.0 or above in Jenkins 2021-09-15 09:24:50 +09:00
lint-r [SPARK-29932][R][TESTS] lint-r should do non-zero exit in case of errors 2019-11-17 10:09:46 -08:00
lint-r.R [MINOR][R] small tidying of sh scripts for R 2020-04-30 16:58:05 -07:00
lint-scala [SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles 2019-03-15 08:20:42 +09:00
make-distribution.sh [SPARK-31041][BUILD] Show Maven errors from within make-distribution.sh 2020-03-11 08:22:02 -05:00
merge_spark_pr.py [MINOR] Fix usage print to guide pip3 to install jira-python library 2020-09-03 01:10:59 +09:00
mima [SPARK-36780][BUILD] Make dev/mima runs on Java 17 2021-09-17 08:54:49 -07:00
package-lock.json [SPARK-35175][BUILD] Add linter for JavaScript source files 2021-05-07 21:55:08 +09:00
package.json [SPARK-35175][BUILD] Add linter for JavaScript source files 2021-05-07 21:55:08 +09:00
pip-sanity-check.py [SPARK-32319][PYSPARK] Disallow the use of unused imports 2020-08-08 08:51:57 -07:00
README.md Merge pull request #565 from pwendell/dev-scripts. Closes #565. 2014-02-08 23:13:34 -08:00
reformat-python [SPARK-35499][PYTHON] Apply black to pandas API on Spark codes 2021-06-06 17:30:07 -07:00
requirements.txt [SPARK-36092][INFRA][BUILD][PYTHON] Migrate to GitHub Actions with Codecov from Jenkins 2021-08-01 21:37:19 +09:00
run-pip-tests [SPARK-36144][INFRA][TESTS] Use Python 3.9 in run-pip-tests conda environment 2021-07-15 11:45:00 +09:00
run-tests [SPARK-29672][PYSPARK] update spark testing framework to use python3 2019-11-14 10:18:55 -08:00
run-tests-jenkins [SPARK-33535][INFRA][TESTS] Export LANG to en_US.UTF-8 in run-tests-jenkins script 2020-11-24 09:50:10 -08:00
run-tests-jenkins.py [SPARK-36166][TESTS] Support Scala 2.13 test in dev/run-tests.py 2021-07-15 19:26:07 -07:00
run-tests.py [SPARK-36393][BUILD] Try to raise memory for GHA 2021-08-05 01:31:35 -07:00
sbt-checkstyle [SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles 2019-03-15 08:20:42 +09:00
scalafmt [SPARK-30570][BUILD] Update scalafmt plugin to 1.0.3 with onlyChangedFiles feature 2020-01-23 12:44:43 -08:00
scalastyle Revert "[SPARK-30534][INFRA] Use mvn in dev/scalastyle" 2020-01-21 18:23:03 +09:00
test-dependencies.sh [SPARK-36863][BUILD] Update dependency manifest files for all released Spark artifacts 2021-09-28 19:22:48 +09:00
tox.ini initial commit for skeleton ansible for jenkins worker config 2021-06-30 10:05:27 -07:00

Spark Developer Scripts

This directory contains scripts useful to developers when packaging, testing, or committing to Spark.

Many of these scripts require Apache credentials to work correctly.