Merge branch 'master' of gitlab.odin.cse.buffalo.edu:odin-lab/Website
commit 0e79b62d13
29
src/news/2018-11-02-SQL_Query_Log_Summaries.md
Normal file
@@ -0,0 +1,29 @@
---
title: Query Log Compression for Workload Analytics
author: Ting Xie
---

Analyzing database access logs is a key part of performance tuning, intrusion
detection, and many other database administration tasks. Unfortunately, it is
common for production databases to process millions of queries or more
each day, so these logs must be summarized before they can be used. On one
hand, we want to compress logs to facilitate efficient storage and human inspection.
On the other hand, we want to accurately infer the frequencies of patterns that
are of interest to workload-analytic applications. We established a framework
for inferring pattern frequencies in a principled way using only a small subset
of patterns and proposed an efficiently computable measure of overall inference
accuracy. Achieving higher accuracy requires more patterns, but we found that
the runtime of pattern mining algorithms also increases steeply. We hypothesized that
this is due to mixed workloads and proposed partitioning the log into separate
clusters. Clustering reduces the search space of candidate patterns, and
we showed empirically that state-of-the-art pattern mining algorithms
improve greatly in both runtime and accuracy. We further improved the effectiveness
of clustering to the point that, as we create more clusters, each cluster
becomes simple enough for pattern mining that different algorithms do not
vary much in accuracy. As a result, we proposed naive mixture encodings,
which focus on partitioning workload mixtures and summarize each partition
using the most efficient, though naive, encoding. We showed that naive mixture
encoding is orders of magnitude faster to construct and provides summarization
accuracy competitive with more complicated pattern mining algorithms.
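
To make the idea concrete, here is a minimal sketch of how a naive mixture encoding might be built and queried. It rests on assumptions not spelled out above: each query is represented as a set of string features, a "naive" encoding keeps only per-feature marginal frequencies and estimates a pattern's frequency as if features were independent, and the partition into clusters is already given. All function names and the toy features are hypothetical.

```python
from collections import Counter
from math import prod

def naive_encoding(queries):
    """Summarize queries (feature sets) by per-feature marginal frequency."""
    n = len(queries)
    counts = Counter(f for q in queries for f in set(q))
    return {f: c / n for f, c in counts.items()}

def estimate(encoding, pattern):
    """Estimate a pattern's frequency, assuming features are independent."""
    return prod(encoding.get(f, 0.0) for f in pattern)

def naive_mixture_encoding(clusters):
    """One naive encoding per cluster, weighted by the cluster's share of the log."""
    total = sum(len(c) for c in clusters)
    return [(len(c) / total, naive_encoding(c)) for c in clusters if c]

def estimate_mixture(mixture, pattern):
    """Weighted sum of the per-cluster estimates."""
    return sum(w * estimate(enc, pattern) for w, enc in mixture)

# Toy log mixing two workloads (hypothetical features).
users = [{"FROM users", "WHERE id = ?"}] * 2
orders = [{"FROM orders", "WHERE date > ?"}] * 2
log = users + orders
clusters = [users, orders]  # assumed output of some clustering step
pattern = {"FROM users", "WHERE id = ?"}  # true frequency in the log: 0.5

print(estimate(naive_encoding(log), pattern))                      # 0.25
print(estimate_mixture(naive_mixture_encoding(clusters), pattern))  # 0.5
```

In this toy log, a single naive encoding underestimates the pattern because the two workloads blur each other's marginals, while the per-cluster encodings recover the true frequency. Real logs will not separate this cleanly.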

Read more in the [preprint](https://odin.cse.buffalo.edu/papers/2018/VLDB-LogCompression.pdf)
BIN
src/papers/2018/VLDB-LogCompression.pdf
Normal file
Binary file not shown.