From 89aba69378cec141fd99dd8ac79adc9e07d90755 Mon Sep 17 00:00:00 2001
From: Liang-Chi Hsieh
Date: Fri, 6 Sep 2019 15:56:50 -0700
Subject: [PATCH] [SPARK-28935][SQL][DOCS] Document SQL metrics for Details for Query Plan

### What changes were proposed in this pull request?

This patch adds descriptions of common SQL metrics to the web UI document.

### Why are the changes needed?

The current web UI document describes the query plan but does not describe the meaning of SQL metrics, so end users might not understand what the metrics mean.

### Does this PR introduce any user-facing change?

No. This is just a documentation change.

### How was this patch tested?

Built the docs locally.

![image](https://user-images.githubusercontent.com/11567269/64463485-1583d800-d0b9-11e9-9916-141f5c09f009.png)

Closes #25658 from viirya/SPARK-28935.

Lead-authored-by: Liang-Chi Hsieh
Co-authored-by: Xiao Li
Signed-off-by: Xiao Li
---
 docs/web-ui.md | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/docs/web-ui.md b/docs/web-ui.md
index 9b22926016..72423d9468 100644
--- a/docs/web-ui.md
+++ b/docs/web-ui.md
@@ -363,6 +363,41 @@ number of written shuffle records, total data size, etc.
 Clicking the 'Details' link on the bottom displays the logical plans and the physical plan, which
 illustrate how Spark parses, analyzes, optimizes and performs the query.
 
+### SQL metrics
+
+The metrics of SQL operators are shown in the block of physical operators. The SQL metrics can be useful
+when we want to dive into the execution details of each operator. For example, "number of output rows"
+can answer how many rows are output after a Filter operator, and "shuffle bytes written total" in an Exchange
+operator shows the number of bytes written by a shuffle.
+
+Here is the list of SQL metrics:
+
+<table class="table">
+<tr><th>SQL metrics</th><th>Meaning</th><th>Operators</th></tr>
+<tr><td> <code>number of output rows</code> </td><td> the number of output rows of the operator </td><td> Aggregate operators, Join operators, Sample, Range, Scan operators, Filter, etc. </td></tr>
+<tr><td> <code>data size</code> </td><td> the size of broadcast/shuffled/collected data of the operator </td><td> BroadcastExchange, ShuffleExchange, Subquery </td></tr>
+<tr><td> <code>time to collect</code> </td><td> the time spent on collecting data </td><td> BroadcastExchange, Subquery </td></tr>
+<tr><td> <code>scan time</code> </td><td> the time spent on scanning data </td><td> ColumnarBatchScan, FileSourceScan </td></tr>
+<tr><td> <code>metadata time</code> </td><td> the time spent on getting metadata such as the number of partitions and the number of files </td><td> FileSourceScan </td></tr>
+<tr><td> <code>shuffle bytes written</code> </td><td> the number of bytes written </td><td> CollectLimit, TakeOrderedAndProject, ShuffleExchange </td></tr>
+<tr><td> <code>shuffle records written</code> </td><td> the number of records written </td><td> CollectLimit, TakeOrderedAndProject, ShuffleExchange </td></tr>
+<tr><td> <code>shuffle write time</code> </td><td> the time spent on shuffle writing </td><td> CollectLimit, TakeOrderedAndProject, ShuffleExchange </td></tr>
+<tr><td> <code>remote blocks read</code> </td><td> the number of blocks read remotely </td><td> CollectLimit, TakeOrderedAndProject, ShuffleExchange </td></tr>
+<tr><td> <code>remote bytes read</code> </td><td> the number of bytes read remotely </td><td> CollectLimit, TakeOrderedAndProject, ShuffleExchange </td></tr>
+<tr><td> <code>remote bytes read to disk</code> </td><td> the number of bytes read from remote to local disk </td><td> CollectLimit, TakeOrderedAndProject, ShuffleExchange </td></tr>
+<tr><td> <code>local blocks read</code> </td><td> the number of blocks read locally </td><td> CollectLimit, TakeOrderedAndProject, ShuffleExchange </td></tr>
+<tr><td> <code>local bytes read</code> </td><td> the number of bytes read locally </td><td> CollectLimit, TakeOrderedAndProject, ShuffleExchange </td></tr>
+<tr><td> <code>fetch wait time</code> </td><td> the time spent on fetching data (local and remote) </td><td> CollectLimit, TakeOrderedAndProject, ShuffleExchange </td></tr>
+<tr><td> <code>records read</code> </td><td> the number of records read </td><td> CollectLimit, TakeOrderedAndProject, ShuffleExchange </td></tr>
+<tr><td> <code>sort time</code> </td><td> the time spent on sorting </td><td> Sort </td></tr>
+<tr><td> <code>peak memory</code> </td><td> the peak memory usage in the operator </td><td> Sort, HashAggregate </td></tr>
+<tr><td> <code>spill size</code> </td><td> the number of bytes spilled to disk from memory in the operator </td><td> Sort, HashAggregate </td></tr>
+<tr><td> <code>time in aggregation build</code> </td><td> the time spent on aggregation </td><td> HashAggregate, ObjectHashAggregate </td></tr>
+<tr><td> <code>avg hash probe bucket list iters</code> </td><td> the average bucket list iterations per lookup during aggregation </td><td> HashAggregate </td></tr>
+<tr><td> <code>data size of build side</code> </td><td> the size of the built hash map </td><td> ShuffledHashJoin </td></tr>
+<tr><td> <code>time to build hash map</code> </td><td> the time spent on building the hash map </td><td> ShuffledHashJoin </td></tr>
+</table>
+
 ## Streaming Tab
 The web UI includes a Streaming tab if the application uses Spark streaming. This tab displays
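The table describes "avg hash probe bucket list iters" only as the average number of bucket list iterations per lookup during aggregation. The Python sketch below is a hypothetical toy model of that ratio, not Spark's implementation. It assumes a fixed-capacity open-addressing map with linear probing (the table does not say how HashAggregate's map probes) and computes the metric as total probe iterations divided by key lookups. The class `ProbingSumMap` and its methods are made up for this illustration.

```python
"""Toy model of the "avg hash probe bucket list iters" SQL metric.

Hypothetical illustration, not Spark's implementation: a fixed-capacity
open-addressing map with linear probing (an assumption) that sums values
per grouping key and counts key lookups and probe iterations.
"""
from collections.abc import Hashable
from typing import Dict, List, Optional


class ProbingSumMap:
    """Hypothetical hash-aggregation map; None is reserved as the empty-bucket marker."""

    def __init__(self, capacity: int) -> None:
        self.keys: List[Optional[Hashable]] = [None] * capacity
        self.sums: List[int] = [0] * capacity
        self.num_key_lookups = 0  # incremented once per key lookup
        self.num_probes = 0  # incremented once per bucket inspected

    def _find_slot(self, key: Hashable) -> int:
        """Return the bucket holding `key`, or the first empty bucket on its probe path."""
        self.num_key_lookups += 1
        capacity = len(self.keys)
        pos = hash(key) % capacity
        for _ in range(capacity):
            self.num_probes += 1
            if self.keys[pos] is None or self.keys[pos] == key:
                return pos
            pos = (pos + 1) % capacity  # linear probing: move to the next bucket
        raise RuntimeError("map is full")

    def add(self, key: Hashable, value: int) -> None:
        """Aggregate `value` into the running sum for `key`."""
        pos = self._find_slot(key)
        self.keys[pos] = key
        self.sums[pos] += value

    def avg_probes_per_lookup(self) -> float:
        """Assumed definition of the metric: total probe iterations / key lookups."""
        if self.num_key_lookups == 0:
            return 0.0
        return self.num_probes / self.num_key_lookups

    def results(self) -> Dict[Hashable, int]:
        """One output row per distinct key, as a hash aggregate would emit."""
        return {k: s for k, s in zip(self.keys, self.sums) if k is not None}


if __name__ == "__main__":
    colliding = ProbingSumMap(capacity=8)
    # In CPython, small non-negative ints hash to themselves, so 0, 8 and 16 all start at bucket 0.
    for k in [0, 8, 16, 0]:
        colliding.add(k, 1)
    print(colliding.results(), colliding.avg_probes_per_lookup())
```

In this model, the four lookups in the demo take seven probes, giving a ratio of 1.75, while keys that never collide (for example 1, 2 and 3 in an 8-bucket map) keep it at exactly 1.0. A value well above 1.0 therefore signals many collisions in the toy map.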