Updating ACM stylesheet and cleaning up the nastier consequences

master
Oliver Kennedy 2020-12-20 19:07:41 -05:00
parent a2b8867edb
commit 163d4007f4
Signed by: okennedy
GPG Key ID: 3E5F9B3ABD3FDB60
4 changed files with 548 additions and 204 deletions


@@ -1,7 +1,7 @@
%root: main.tex
%!TEX root=./main.tex
\begin{abstract}
-The problem of computing the marginal probability of a tuple in the result of a query over set-probabilistic databases (PDBs) can be reduced to calculating the probability of the \emph{lineage formula} of the result, a Boolean formula over random variables representing the existence of tuples in each of the database's possible worlds.
+The problem of computing the marginal probability of a tuple in the result of a query over set-probabilistic databases (PDBs) can be reduced to calculating the probability of the \emph{lineage formula} of the result, a Boolean formula over random variables representing the existence of tuples in the database's possible worlds.
The analog for bag semantics is a natural number-valued polynomial over random variables that evaluates to the multiplicity of the tuple in each world.
In this work, we study the problem of calculating the expectation of such polynomials (a tuple's expected multiplicity) exactly and approximately.
For tuple-independent databases (TIDBs), the expected multiplicity of a query result tuple can trivially be computed in linear time in the size of the tuple's lineage, if this polynomial is encoded as a sum of products.

File diff suppressed because it is too large


@@ -22,8 +22,8 @@ Analogously, this problem can be reduced to computing the expectation of the lin
This problem has received much less attention, perhaps because the problem is trivially tractable.
In fact it is linear time when the lineage polynomial is encoded in the typical sum of products (SOP) representation.
However, there exist compressed representations of polynomials, e.g., factorizations~\cite{factorized-db}, that can be polynomially more concise than the SOP representation of a polynomial.
-These compression schemes are close analogs of typical database optimizations like projection push-down~\cite{DBLP:conf/pods/KhamisNR16}, hinting that perhaps even Bag-PDBs inherently have higher query processing complexity than deterministic databases.
-In this paper, we confirm this intuition, first proving (by reduction from counting $k$-matchings) that computing the expected count of a query result tuple is super-linear (\sharpwonehard) in the size of a compressed (factorized~\cite{factorized-db}) lineage representation, and then relating the size of the compressed lineage to the cost of answering a deterministic query.
+These compression schemes are analogous to typical database optimizations like projection push-down~\cite{DBLP:conf/pods/KhamisNR16}, hinting that perhaps even Bag-PDBs have higher query processing complexity than deterministic databases.
+In this paper, we confirm this intuition, first proving (by reduction from counting $k$-matchings) that computing the expected count of a query result tuple is super-linear (\sharpwonehard) in the size of a compressed lineage representation, and then relating the size of the compressed lineage to the cost of answering a deterministic query.
In spite of this negative result, not everything is lost.
We develop an approximation algorithm for expected counts of SPJU query results over Bag-PDBs that is, to our knowledge, the first linear time (in the size of the factorized lineage) $(1-\epsilon)$-approximation.


@@ -95,6 +95,7 @@ sensitive=true
% \orcid{1234-5678-9012}
\affiliation{%
\institution{Illinois Institute of Technology}
+\country{USA}
}
\email{sfeng14@hawk.iit.edu,bglavic@iit.edu}
@@ -102,6 +103,7 @@ sensitive=true
% \orcid{1234-5678-9012}
\affiliation{%
\institution{University at Buffalo}
+\country{USA}
}
\email{ahuber,okennedy,atri@buffalo.edu}