spark-instrumented-optimizer/common
Marcelo Vanzin 1c70da3bfb [SPARK-20657][CORE] Speed up rendering of the stages page.
There are two main changes that speed up rendering of the tasks list
on the stage page.

The first change makes the code load only the tasks shown on the
current page of the tasks table, and only the information related to
those tasks. One side effect of this change is that the graph that
shows task-related events now shows events only for the tasks on
the current page, instead of the previously hardcoded limit of "events
for the first 1000 tasks". That ends up helping with readability,
though.
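A minimal sketch of the paging idea, assuming the tasks are already ordered by the selected sort column. `TaskPage` and `taskPage` are hypothetical names, not Spark APIs. Rows before the requested page are stepped over without being copied, and iteration stops as soon as the page is full, so nothing past the page is read. The same slice would then be the only input to the event-timeline graph.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: TaskPage/taskPage are illustrative names, not Spark's code.
final class TaskPage {
    private TaskPage() {}

    /**
     * Returns the rows for one page of a table whose data is already sorted.
     * pageIndex is zero-based; the last page may be shorter than pageSize.
     */
    static <T> List<T> taskPage(Iterable<T> sortedTasks, int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        long toSkip = (long) pageIndex * pageSize; // rows belonging to earlier pages
        List<T> page = new ArrayList<>(pageSize);
        for (T task : sortedTasks) {
            if (toSkip > 0) {
                toSkip--; // step over a row from an earlier page without keeping it
                continue;
            }
            page.add(task);
            if (page.size() == pageSize) {
                break; // stop reading as soon as the page is full
            }
        }
        return page;
    }
}
```

With a store that supports a native skip, the stepping loop could be replaced by a single skip call, which is where a disk-backed store would avoid deserializing the skipped rows.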

To make sorting efficient when using a disk store, the task wrapper
was extended to include many new indices: one for each sortable
column in the UI, and one for each metric for which quantiles are
calculated.
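The sketch below illustrates the idea of one secondary index per sortable column: each index keeps the tasks in its own order, so sorting by a column becomes a read of that index rather than a sort at render time. It is an in-memory illustration only; `IndexedTasks`, `Task`, and the index names are hypothetical and do not reflect Spark's KVStore API, and it assumes each task is written once (updates are not handled).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch of per-column secondary indices; not Spark's KVStore API.
final class IndexedTasks {
    static final class Task {
        final long id;
        final long durationMs;
        final long gcTimeMs;

        Task(long id, long durationMs, long gcTimeMs) {
            this.id = id;
            this.durationMs = durationMs;
            this.gcTimeMs = gcTimeMs;
        }
    }

    // One ordered set per index name, maintained on every write.
    private final Map<String, TreeSet<Task>> indices = new HashMap<>();

    IndexedTasks(Map<String, Comparator<Task>> definitions) {
        for (Map.Entry<String, Comparator<Task>> def : definitions.entrySet()) {
            // Break ties on task id so distinct tasks with equal values are all kept.
            indices.put(def.getKey(), new TreeSet<>(def.getValue().thenComparingLong(t -> t.id)));
        }
    }

    /** Adds a task to every index (assumes each task is written only once). */
    void write(Task task) {
        for (TreeSet<Task> index : indices.values()) {
            index.add(task);
        }
    }

    /** Reads tasks in the order of the named index; no sorting happens here. */
    List<Task> sortedBy(String indexName, boolean descending) {
        TreeSet<Task> index = indices.get(indexName);
        if (index == null) {
            throw new IllegalArgumentException("no such index: " + indexName);
        }
        return new ArrayList<>(descending ? index.descendingSet() : index);
    }
}
```

Copying the whole index into a list is only for demonstration; combined with paging, a caller would iterate the index and keep just the rows for the current page.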

The second change alters the way metric quantiles are calculated for stages.
Instead of using the "Distribution" class to process data for all task
metrics, which requires scanning all tasks of a stage, the code now
uses the KVStore "skip()" functionality to read only the tasks that contain
interesting information for the desired quantiles.
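The following sketch shows why skipping helps: if a metric's index is sorted ascending, each quantile is a single position in that index, so only one value per quantile has to be materialized. The `SkippingIterator` interface and `ListCursor` are hypothetical stand-ins for a store cursor, and the position formula `min((long) (q * count), count - 1)` is one common convention, assumed here rather than taken from the patch.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch of quantiles via skip(); not Spark's implementation.
final class SkipQuantiles {
    private SkipQuantiles() {}

    /** Hypothetical cursor: skip() advances without materializing the skipped values. */
    interface SkippingIterator<T> extends Iterator<T> {
        void skip(long n);
    }

    /** In-memory cursor that counts how many values were actually read via next(). */
    static final class ListCursor implements SkippingIterator<Double> {
        private final List<Double> sorted;
        private int pos = 0;
        int materialized = 0;

        ListCursor(List<Double> sorted) {
            this.sorted = sorted;
        }

        public boolean hasNext() {
            return pos < sorted.size();
        }

        public Double next() {
            if (!hasNext()) throw new NoSuchElementException();
            materialized++;
            return sorted.get(pos++);
        }

        public void skip(long n) {
            pos = (int) Math.min(sorted.size(), pos + n); // never move past the end
        }
    }

    /**
     * Reads only the values at the quantile positions of an ascending index.
     * qs must be sorted ascending so the cursor only ever moves forward.
     */
    static double[] quantiles(SkippingIterator<Double> cursor, long count, double[] qs) {
        if (count <= 0) throw new IllegalArgumentException("count must be > 0");
        double[] result = new double[qs.length];
        long pos = 0; // index of the element the next call to next() would return
        for (int i = 0; i < qs.length; i++) {
            if (qs[i] < 0.0 || qs[i] > 1.0 || (i > 0 && qs[i] < qs[i - 1])) {
                throw new IllegalArgumentException("quantiles must be ascending and within [0, 1]");
            }
            long target = Math.min((long) (qs[i] * count), count - 1);
            if (target < pos) {
                // Same position as the previous quantile: reuse its value.
                result[i] = result[i - 1];
                continue;
            }
            cursor.skip(target - pos);
            result[i] = cursor.next();
            pos = target + 1;
        }
        return result;
    }
}
```

For 100 values and the quantiles 0, 0.25, 0.5, 0.75 and 1.0, only five values are read; with a disk store, the skipped entries would not need to be deserialized at all.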

This is still not cheap: because the UI and API track many metrics,
the code needs to scan the index for each metric to gather the
information. Savings come mainly from skipping deserialization
when using the disk store, but the in-memory code also seems to be
faster than before (most likely because of other changes in this
patch).

To make subsequent calls faster, some quantiles are cached in the
status store. This makes the UI much faster after a stage has been
loaded for the first time.
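One way such a cache could work is sketched below. It assumes, as a hypothetical design rather than the patch's actual scheme, that entries are keyed by stage id, stage attempt, and quantile, and that each entry records the number of tasks it was computed from, so a cached value is reused only while the task count is unchanged. `QuantileCache` and its methods are illustrative names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.DoubleSupplier;

// Hypothetical sketch of a quantile cache; not Spark's implementation.
final class QuantileCache {
    private static final class Key {
        final int stageId;
        final int attemptId;
        final double quantile;

        Key(int stageId, int attemptId, double quantile) {
            this.stageId = stageId;
            this.attemptId = attemptId;
            this.quantile = quantile;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return stageId == k.stageId && attemptId == k.attemptId
                && Double.compare(quantile, k.quantile) == 0;
        }

        @Override
        public int hashCode() {
            return Objects.hash(stageId, attemptId, quantile);
        }
    }

    private static final class Cached {
        final long taskCount; // number of tasks the value was computed from
        final double value;

        Cached(long taskCount, double value) {
            this.taskCount = taskCount;
            this.value = value;
        }
    }

    private final Map<Key, Cached> cache = new HashMap<>();
    int computations = 0; // how many times the expensive computation actually ran

    /** Returns the cached value if it was computed from the same number of tasks. */
    double get(int stageId, int attemptId, double quantile, long taskCount, DoubleSupplier compute) {
        Key key = new Key(stageId, attemptId, quantile);
        Cached hit = cache.get(key);
        if (hit != null && hit.taskCount == taskCount) {
            return hit.value;
        }
        double value = compute.getAsDouble();
        computations++;
        cache.put(key, new Cached(taskCount, value));
        return value;
    }
}
```

Tying validity to the task count means a stage that is still running keeps getting fresh values, while a finished stage is computed once and then served from the cache.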

With the above changes, a lot of code in the UI layer could be simplified.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #20013 from vanzin/SPARK-20657.
2018-01-11 19:41:48 +08:00
kvstore [SPARK-20657][CORE] Speed up rendering of the stages page. 2018-01-11 19:41:48 +08:00
network-common [SPARK-21475][CORE][2ND ATTEMPT] Change to use NIO's Files API for external shuffle service 2018-01-04 11:39:42 -08:00
network-shuffle [SPARK-21475][CORE][2ND ATTEMPT] Change to use NIO's Files API for external shuffle service 2018-01-04 11:39:42 -08:00
network-yarn [SPARK-17321][YARN] Avoid writing shuffle metadata to disk if NM recovery is disabled 2017-08-31 09:26:20 +08:00
sketch [MINOR] Fix a bunch of typos 2018-01-02 07:10:19 +09:00
tags [SPARK-20453] Bump master branch version to 2.3.0-SNAPSHOT 2017-04-24 21:48:04 -07:00
unsafe [SPARK-22997] Add additional defenses against use of freed MemoryBlocks 2018-01-10 00:45:47 -08:00