Merge branch 'master' of gitlab.odin.cse.buffalo.edu:odin-lab/Website

Oliver Kennedy 2018-11-05 10:27:11 -05:00
commit 0e79b62d13
2 changed files with 29 additions and 0 deletions

@@ -0,0 +1,29 @@
---
title: Query Log Compression for Workload Analytics
author: Ting Xie
---
Analyzing database access logs is a key part of performance tuning, intrusion
detection, and many other database administration tasks. Unfortunately, it is
common for production databases to process millions of queries or more
each day, so these logs must be summarized before they can be used. On one
hand, we want to compress logs to facilitate efficient storage and human inspection.
On the other hand, we want to accurately infer frequencies of patterns that
are of interest to workload-analytic applications. We established a framework
for inferring pattern frequencies in a principled way using only a small subset
of patterns and proposed an efficiently computable measure of overall inference
accuracy. Achieving higher accuracy requires more patterns, but we found that
the runtime of pattern mining algorithms also increases steeply. We hypothesized
that this is due to a mixture of workloads in the log and proposed partitioning
the log into separate clusters. Clustering reduces the search space of candidate
patterns, and we showed empirically that state-of-the-art pattern mining algorithms
improve greatly in both runtime and accuracy. We further improved the effectiveness
of clustering to the point that, as we create more clusters, each cluster
becomes easy enough to mine that different algorithms vary little in
accuracy. As a result, we proposed naive mixture encodings,
which focus on partitioning workload mixtures and summarize each partition
using the most efficient, though naive, encoding. We showed that naive mixture
encoding is orders of magnitude faster to construct and provides summarization
accuracy competitive with more complicated pattern mining algorithms.
Read more in the [preprint](https://odin.cse.buffalo.edu/papers/2018/VLDB-LogCompression.pdf)
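
The idea of a naive mixture encoding can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the abstract: each query is modeled as a set of feature strings, cluster labels come from some external clustering step, and "naive" is read as keeping only per-feature marginal frequencies and treating features as independent within a partition. All function names are hypothetical; see the preprint for the actual definitions.

```python
from collections import Counter, defaultdict
from math import prod


def naive_encoding(queries):
    """Summarize one partition by the marginal frequency of each feature."""
    n = len(queries)
    counts = Counter(f for q in queries for f in set(q))
    return {f: c / n for f, c in counts.items()}


def naive_mixture_encoding(queries, labels):
    """Split the log by cluster label (assumed given) and encode each part."""
    parts = defaultdict(list)
    for q, label in zip(queries, labels):
        parts[label].append(q)
    total = len(queries)
    # Each entry: (share of the log in this partition, feature marginals).
    return [(len(p) / total, naive_encoding(p)) for p in parts.values()]


def estimate_frequency(encoding, pattern):
    """Estimate the fraction of queries containing every feature in `pattern`,
    assuming features are independent within each partition."""
    return sum(w * prod(m.get(f, 0.0) for f in pattern) for w, m in encoding)


# Toy log: two workloads mixed together (illustrative data only).
log = [
    {"FROM users", "WHERE id = ?"},
    {"FROM users", "WHERE id = ?"},
    {"FROM orders", "WHERE status = ?"},
    {"FROM orders", "WHERE status = ?"},
]
pattern = {"FROM users", "WHERE id = ?"}

mixture = naive_mixture_encoding(log, labels=[0, 0, 1, 1])
single = naive_mixture_encoding(log, labels=[0, 0, 0, 0])

print(estimate_frequency(mixture, pattern))  # 0.5, matches the true frequency
print(estimate_frequency(single, pattern))   # 0.25, independence underestimates
```

In this toy log, partitioning by workload lets the naive per-partition summary recover the true pattern frequency, while a single unpartitioned summary underestimates it.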

Binary file not shown.