7b9d7551a6
### What changes were proposed in this pull request?

The idea is to improve the performance of HybridStore by adding batch write support to LevelDB. #28412 introduces HybridStore. HybridStore writes data to InMemoryStore first and uses a background thread to dump the data to LevelDB once writing to InMemoryStore is complete. In the comments on #28412, mridulm mentioned that batch writing can improve the performance of this dumping process, and he wrote the code of writeAll().

### Why are the changes needed?

I compared the HybridStore switching time between one-by-one write and batch write on an HDD. When the disk is idle, batch write gives around a 25% improvement, and when the disk is 100% busy, batch write gives a 7x - 10x improvement.

When the disk is at 0% utilization:

| log size, jobs and tasks per job | original switching time, with write() | switching time with writeAll() |
| ---------------------------------- | ------------------------------------- | ------------------------------ |
| 133m, 400 jobs, 100 tasks per job | 16s | 13s |
| 265m, 400 jobs, 200 tasks per job | 30s | 23s |
| 1.3g, 1000 jobs, 400 tasks per job | 136s | 108s |

When the disk is at 100% utilization:

| log size, jobs and tasks per job | original switching time, with write() | switching time with writeAll() |
| --------------------------------- | ------------------------------------- | ------------------------------ |
| 133m, 400 jobs, 100 tasks per job | 116s | 17s |
| 265m, 400 jobs, 200 tasks per job | 251s | 26s |

I also ran some write-related benchmark tests in LevelDBBenchmark.java and measured the total time to write 1024 objects. The tests were conducted with the disk at 0% utilization.

| Benchmark test | with write(), ms | with writeAll(), ms |
| ------------------------ | ---------------- | ------------------- |
| randomUpdatesIndexed | 213.06 | 157.356 |
| randomUpdatesNoIndex | 57.869 | 35.439 |
| randomWritesIndexed | 298.854 | 229.274 |
| randomWritesNoIndex | 66.764 | 38.361 |
| sequentialUpdatesIndexed | 87.019 | 56.219 |
| sequentialUpdatesNoIndex | 61.851 | 41.942 |
| sequentialWritesIndexed | 94.044 | 56.534 |
| sequentialWritesNoIndex | 118.345 | 66.483 |

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Manually tested.

Closes #29149 from baohe-zhang/SPARK-32350.

Authored-by: Baohe Zhang <baohe.zhang@verizonmedia.com>
Signed-off-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
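
The difference between one-by-one and batch writing described above can be illustrated with a minimal Java sketch. It is not the code from this PR: `ToyStore`, its `commits` counter, and `sampleEntries` are hypothetical names invented for illustration, and the counter only stands in for the fixed per-write overhead (for example, a disk round trip) that a real store would pay.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy store, NOT Spark's kvstore or LevelDB API. It only counts
// how many commits (stand-ins for disk round trips) each strategy needs.
class BatchWriteSketch {

  static final class ToyStore {
    private final Map<String, String> data = new LinkedHashMap<>();
    private int commits = 0;

    // One-by-one path: every object pays for its own commit.
    void write(String key, String value) {
      data.put(key, value);
      commits++;
    }

    // Batch path: stage all entries, then commit once for the whole group.
    void writeAll(Map<String, String> entries) {
      if (entries.isEmpty()) {
        return; // nothing to commit
      }
      data.putAll(entries);
      commits++;
    }

    int size() {
      return data.size();
    }

    int commits() {
      return commits;
    }
  }

  // Builds n distinct key/value pairs to write (illustrative data only).
  static Map<String, String> sampleEntries(int n) {
    Map<String, String> entries = new LinkedHashMap<>();
    for (int i = 0; i < n; i++) {
      entries.put("task-" + i, "payload-" + i);
    }
    return entries;
  }

  public static void main(String[] args) {
    Map<String, String> entries = sampleEntries(1024);

    ToyStore oneByOne = new ToyStore();
    for (Map.Entry<String, String> e : entries.entrySet()) {
      oneByOne.write(e.getKey(), e.getValue());
    }

    ToyStore batched = new ToyStore();
    batched.writeAll(entries);

    System.out.println("one-by-one commits: " + oneByOne.commits());
    System.out.println("batched commits:    " + batched.commits());
  }
}
```

Both stores end up holding the same objects; only the number of commits differs. That is consistent with the measurements above, where the gap widens when the disk is busy and each individual write becomes more expensive.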
||
---|---|---|
kvstore | ||
network-common | ||
network-shuffle | ||
network-yarn | ||
sketch | ||
tags | ||
unsafe |