[SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md

### What changes were proposed in this pull request?

This is a FOLLOW-UP PR addressing a review comment on #27208 : https://github.com/apache/spark/pull/27208#pullrequestreview-347451714

This PR documents the new feature `Event Log Compaction` in a new section of `monitoring.md`, as it only has one configuration on the SHS side and it's hard to explain everything in the description of a single configuration.

### Why are the changes needed?

Event log compaction lacks documentation explaining what it is and how it helps. This PR adds that explanation.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Built docs via jekyll.

> change on the new section

<img width="951" alt="Screen Shot 2020-02-16 at 2 23 18 PM" src="https://user-images.githubusercontent.com/1317309/74599587-eb9efa80-50c7-11ea-942c-f7744268e40b.png">

> change on the table

<img width="1126" alt="Screen Shot 2020-01-30 at 5 08 12 PM" src="https://user-images.githubusercontent.com/1317309/73431190-2e9c6680-4383-11ea-8ce0-815f10917ddd.png">

Closes #27398 from HeartSaVioR/SPARK-30481-FOLLOWUP-document-new-feature.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Committed by Dongjoon Hyun on 2020-02-25 15:17:16 -08:00 (commit 02f8165343, parent 8f247e5d36).


@@ -95,6 +95,48 @@ The history server can be configured as follows:
</tr>
</table>
### Applying compaction on rolling event log files
A long-running application (e.g. streaming) can produce a single huge event log file which may be costly to maintain and
also requires a lot of resources to replay per each update in the Spark History Server.
Enabling <code>spark.eventLog.rolling.enabled</code> and <code>spark.eventLog.rolling.maxFileSize</code>
lets you have rolling event log files instead of a single huge event log file, which may help in some scenarios on its own,
but it still doesn't reduce the overall size of the logs.
Spark History Server can apply compaction on the rolling event log files to reduce the overall size of the
logs, via the configuration <code>spark.history.fs.eventLog.rolling.maxFilesToRetain</code> set on the
Spark History Server.
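For example, a combined setup might look like the following (the values shown are illustrative, not recommendations):

```
# In the application's spark-defaults.conf (or via --conf):
spark.eventLog.enabled                true
spark.eventLog.rolling.enabled        true
spark.eventLog.rolling.maxFileSize    128m

# In the Spark History Server's configuration:
spark.history.fs.eventLog.rolling.maxFilesToRetain    2
```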
Details are described below, but please note up front that compaction is a LOSSY operation.
Compaction discards some events which will no longer be visible on the UI - you may want to check which events will be discarded
before enabling the option.
When compaction happens, the History Server lists all the available event log files for the application, and selects as
targets of compaction the event log files whose index is smaller than the smallest index among the files to be retained.
For example, if application A has 5 event log files and <code>spark.history.fs.eventLog.rolling.maxFilesToRetain</code> is set to 2, then the first 3 log files will be selected for compaction.
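The selection rule can be sketched in a few lines (a hypothetical helper for illustration only; the History Server's actual logic is implemented in its Scala code):

```python
def select_files_to_compact(event_log_indices, max_files_to_retain):
    """Given the indices of an application's rolling event log files,
    return the indices that would be compacted: every file whose index
    is smaller than the smallest index among the retained files."""
    indices = sorted(event_log_indices)
    # The newest `max_files_to_retain` files are kept as-is.
    retained = indices[-max_files_to_retain:]
    smallest_retained = retained[0]
    return [i for i in indices if i < smallest_retained]

# Application A has 5 event log files and maxFilesToRetain is set to 2:
print(select_files_to_compact([1, 2, 3, 4, 5], 2))  # [1, 2, 3]
```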
Once the targets are selected, it analyzes them to figure out which events can be excluded, and rewrites them
into one compact file, discarding the events that were decided to be excluded.
Compaction tries to exclude the events which point to outdated data. As of now, the candidates for exclusion are:
* Events for jobs which are finished, and their related stage/task events
* Events for executors which are terminated
* Events for SQL executions which are finished, and their related job/stage/task events
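As a rough, simplified illustration of these exclusion rules (the event records and the `is_excludable` helper here are hypothetical; the real filtering is done by Spark's internal event filters in the History Server):

```python
# Hypothetical, simplified state; real events are Spark's
# SparkListenerEvent subclasses.
FINISHED_JOBS = {1, 2}          # job IDs that have completed
TERMINATED_EXECUTORS = {"3"}    # executor IDs that have been removed
FINISHED_SQL_EXECUTIONS = {10}  # SQL execution IDs that have completed

def is_excludable(event):
    """Return True if the event points to outdated data and can be
    dropped during compaction."""
    kind = event["kind"]
    if kind in ("JobStart", "JobEnd", "StageSubmitted", "TaskEnd"):
        return event.get("jobId") in FINISHED_JOBS
    if kind in ("ExecutorAdded", "ExecutorRemoved"):
        return event.get("executorId") in TERMINATED_EXECUTORS
    if kind in ("SQLExecutionStart", "SQLExecutionEnd"):
        return event.get("executionId") in FINISHED_SQL_EXECUTIONS
    return False  # everything else is kept

events = [
    {"kind": "JobEnd", "jobId": 1},                  # finished -> excluded
    {"kind": "JobStart", "jobId": 3},                # running  -> kept
    {"kind": "ExecutorRemoved", "executorId": "3"},  # gone     -> excluded
]
compacted = [e for e in events if not is_excludable(e)]
print(compacted)  # [{'kind': 'JobStart', 'jobId': 3}]
```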
Once rewriting is done, the original log files are deleted on a best-effort basis. The History Server may not be able to delete
the original log files, but this does not affect the operation of the History Server.
Please note that the Spark History Server may not compact the old event log files if it determines that not much space
would be reclaimed by compaction. For streaming queries we normally expect compaction
to run, since each micro-batch triggers one or more jobs which finish shortly afterwards; for batch queries compaction will
often not run.
Please also note that this is a new feature introduced in Spark 3.0, and may not be completely stable. Under some circumstances,
compaction may exclude more events than you expect, leading to UI issues in the History Server for the application.
Use it with caution.
### Spark History Server Configuration Options
Security options for the Spark History Server are covered in more detail in the
@@ -303,19 +345,8 @@ Security options for the Spark History Server are covered in more detail in the
<td>Int.MaxValue</td>
<td>
The maximum number of event log files which will be retained as non-compacted. By default,
all event log files will be retained. The lowest value is 1 for technical reasons.<br/>
Please read the section "Applying compaction on rolling event log files" for more details.
</td>
</tr>
</table>