Merge branch 'master' of gitlab.odin.cse.buffalo.edu:odin-lab/Website

2018-11-05 10:27:11 -05:00 · 2018-11-05 10:27:11 -05:00 · 0e79b62d13
parent a49baf4420 de0f7e9405
commit 0e79b62d13
2 changed files with 29 additions and 0 deletions
--- a/src/news/2018-11-02-SQL_Query_Log_Summaries.md
+++ b/src/news/2018-11-02-SQL_Query_Log_Summaries.md
@@ -0,0 +1,29 @@
+---
+title: Query Log Compression for Workload Analytics
+author: Ting Xie
+---
+
+Analyzing database access logs is a key part of performance tuning, intrusion
+detection, and many other database administration tasks. Unfortunately, it is
+common for production databases to handle millions of queries or more
+each day, so these logs must be summarized before they can be used. On one
+hand, we want to compress logs to facilitate efficient storage and human inspection.
+On the other hand, we want to accurately infer frequencies of patterns that
+are of interest to workload-analytic applications. We established a framework
+for inferring pattern frequencies in a principled way using only a small subset
+of patterns and proposed an efficiently computable measure of overall inference
+accuracy. Achieving higher accuracy requires more patterns, but we found that
+the runtime of pattern mining algorithms also increases steeply. We hypothesized
+that this is due to a mix of workloads and proposed partitioning the log into separate
+clusters. Clustering reduces the search space of candidate patterns, and
+we showed empirically that state-of-the-art pattern mining algorithms improve
+greatly in both runtime and accuracy. We further improved the effectiveness
+of clustering to the point that, as we create more clusters, each cluster
+becomes simple enough to mine that different algorithms differ little
+in accuracy. As a result, we proposed naive mixture encodings,
+which partition workload mixtures and summarize each partition
+using the most efficient, though naive, encoding. We showed that naive mixture
+encodings are orders of magnitude faster to construct and provide summarization
+accuracy competitive with more complicated pattern mining algorithms.
+
+Read more in the [preprint](https://odin.cse.buffalo.edu/papers/2018/VLDB-LogCompression.pdf)
--- a/src/papers/2018/VLDB-LogCompression.pdf
+++ b/src/papers/2018/VLDB-LogCompression.pdf