spark-instrumented-optimizer/sql/core
Jungtaek Lim (HeartSaVioR) 5a258b0b67
[SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID
### What changes were proposed in this pull request?

This patch adds the new method `getLatestBatchId()` in CompactibleFileStreamLog in complement of getLatest() which doesn't read the content of the latest batch metadata log file, and apply to both FileStreamSource and FileStreamSink to avoid unnecessary latency on reading log file.

### Why are the changes needed?

Once compacted metadata log file becomes huge, writing outputs for the compact + 1 batch is also affected due to unnecessarily reading the compacted metadata log file. This unnecessary latency can be simply avoided.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

New UT. Also manually tested under query which has huge metadata log on file stream sink:

> before applying the patch

![Screen Shot 2020-02-21 at 4 20 19 PM](https://user-images.githubusercontent.com/1317309/75016223-d3ffb180-54cd-11ea-9063-49405943049d.png)

> after applying the patch

![Screen Shot 2020-02-21 at 4 06 18 PM](https://user-images.githubusercontent.com/1317309/75016220-d235ee00-54cd-11ea-81a7-7c03a43c4db4.png)

Peaks are compact batches - please compare the next batch after compact batches, especially the area of "light brown".

Closes #27664 from HeartSaVioR/SPARK-30915.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>
2020-05-22 16:46:17 -07:00
..
benchmarks [SPARK-31630][SQL] Fix perf regression by skipping timestamps rebasing after some threshold 2020-05-05 14:11:53 +00:00
src [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID 2020-05-22 16:46:17 -07:00
v1.2/src [SPARK-31489][SQL] Fix pushing down filters with java.time.LocalDate values in ORC 2020-04-26 15:49:00 -07:00
v2.3/src [SPARK-31489][SQL] Fix pushing down filters with java.time.LocalDate values in ORC 2020-04-26 15:49:00 -07:00
pom.xml Revert "[SPARK-31765][WEBUI] Upgrade HtmlUnit >= 2.37.0" 2020-05-21 16:00:58 -07:00