More changes per @atri 112420 Element suggestions

Aaron Huber 2020-11-25 13:36:09 -05:00
parent d8366d1b4e
commit ca2ef2b233


@@ -64,29 +64,29 @@ In practice, modern production databases, e.g., Postgres, Oracle, etc. use bag s
\draw (left)--(right);
\draw (right)--(top);
\end{tikzpicture}
\caption{Output edges of $\poly$}
\caption{Graph of tuples in table E}
\label{fig:intro-ex-graph}
\end{figure}
\begin{Example}\label{ex:intro}
Suppose we are given the following query $\poly() := R(A), E(A, B), R(B)$ over a Tuple Independent Database ($\ti$). The $\ti$ relations are given in ~\cref{fig:intro-ex}. While for completeness we should include annotations for Table E, since each tuple has a probability of $1$, we drop them for simplicity. The output for $\poly$ can be visualized as the graph in ~\cref{fig:intro-ex-graph}.
Suppose we are given the following Boolean query $\poly() := R(A), E(A, B), R(B)$ over a Tuple Independent Database ($\ti$), where the output polynomial consists of all tuple annotations contributing to the output. The $\ti$ relations are given in~\cref{fig:intro-ex}. For completeness we should include annotations for Table E, but since each of its tuples has probability $1$, we drop them for simplicity. Note that the attribute column $\Phi$ contains either a variable or a value: a variable ranges over $[0, 1]$ and denotes the tuple's marginal probability of appearing in the set of possible worlds, while a value is the fixed (marginal) probability of the tuple across the set of possible worlds. Finally, the tuples in table E can be visualized as the graph in~\cref{fig:intro-ex-graph}.
\end{Example}
While our work handles Block Independent Disjoint Databases ($\bi$), for now we consider the $\ti$ model. Define the probability distribution to be $P[W_i = 1] = \prob$ for $i$ in $\{a, b, c\}$.
Note that the query of ~\cref{ex:intro} in set semantics is indeed \#-P hard, since it is a query that is non-hierarchical, i.e., for $Vars(\poly)$ denoting the set of variables occuring across all atoms of $\poly$, a function $sg(x)$ whose output is the set of all atoms that contain variable $x$, we have that $sg(A) \cap sg(B) \neq \emptyset$ and $sg(A)\not\subseteq sg(B)$ and $sg(B)\not\subseteq sg(A)$, as defined by Dalvi and Suciu in ~\cite{10.1145/1265530.1265571}. Thus, computing $\expct\pbox{\poly(W_a, W_b, W_c)}$, i.e. the probability of the output tuple with annotation $\poly(W_a, W_b, W_c)$, ($\prob(q)$ in Dalvi, Sucui) is hard in set semantics. To see this intuitively, for query $\poly$ over set semantics, we have that the output polynomial $\poly(W_a, W_b, W_c) = W_aW_b \vee W_bW_c \vee W_cW_a$. Note that the conjunctive clauses are not independent and the computation of the probability is not linear in the size of $\poly(W_a, W_b, W_c)$ but exponential in the worst case.
Note that the query of~\cref{ex:intro} is indeed \#P-hard in set semantics, since it is non-hierarchical as defined by Dalvi and Suciu in~\cite{10.1145/1265530.1265571}. That is, for $Vars(\poly)$ denoting the set of variables occurring across all atoms of $\poly$, and $sg(x)$ denoting the set of all atoms that contain variable $x \in Vars(\poly)$, we have that $sg(A) \cap sg(B) \neq \emptyset$, $sg(A)\not\subseteq sg(B)$, and $sg(B)\not\subseteq sg(A)$. Thus, computing $\expct\pbox{\poly(W_a, W_b, W_c)}$, i.e., the probability of the output with annotation $\poly(W_a, W_b, W_c)$ ($\prob(q)$ in Dalvi and Suciu), is hard in set semantics. To see this intuitively, for query $\poly$ over set semantics, the output polynomial is $\poly(W_a, W_b, W_c) = W_aW_b \vee W_bW_c \vee W_cW_a$. The conjunctive clauses are not independent, so computing the probability is not linear in the size of $\poly(W_a, W_b, W_c)$ but exponential in the worst case.
%Using Shannon's Expansion,
%\begin{align*}
%&W_aW_b \vee W_bW_c \vee W_cW_a
%= &W_a
%\end{align*}
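As a sanity check on the set-semantics polynomial above, its probability can be worked out by inclusion-exclusion for this small instance. The derivation below is a sketch that assumes, as stated above, that $W_a, W_b, W_c$ are independent with $P[W_i = 1] = \prob$; it uses only the fact that the conjunction of any two or more of the clauses is $W_aW_bW_c$.
\begin{align*}
&P[W_aW_b \vee W_bW_c \vee W_cW_a]\\
= &P[W_aW_b] + P[W_bW_c] + P[W_cW_a] - 3P[W_aW_bW_c] + P[W_aW_bW_c]\\
= &3\prob^2 - 2\prob^3.
\end{align*}
The correction term $-2\prob^3$ arises because the clauses share variables; for a general formula, the number of such inclusion-exclusion terms can grow exponentially in the number of clauses.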
However, in the bag setting, the output polynomial is $\poly(W_a, W_b, W_c) = W_aW_b + W_bW_c + W_cW_a$. The expectation computation the output polynomial is a computation of what the 'average' multiplicity of the tuple across possible worlds. In ~\cref{ex:intro}, the expectation is simply
\AH{The value $\expct\pbox{\poly(W_a, W_b, W_c)}$ needs to be computed, but I don't think I've arrived at a correct answer. In the interest of time, I am coming back to this. I appreciate any help as I feel \textit{ashamedly} lacking education on this. I have googled \textit{extensively} about this, but most instructional resources involve using shannon's expansion as a tool for multiplexers and not directly for computing the probability of propositional formulas...and I haven't seemed to make the connections.}
However, in the bag setting, the output polynomial is $\poly(W_a, W_b, W_c) = W_aW_b + W_bW_c + W_cW_a$. The expectation of the output polynomial is the `average' multiplicity of the output across possible worlds, i.e., the average number of tuples contributing to the output. In~\cref{ex:intro}, the expectation is simply
\begin{align*}
&\expct\pbox{\poly(W_a, W_b, W_c)} = \expct\pbox{W_aW_b} + \expct\pbox{W_bW_c} + \expct\pbox{W_cW_a}\\
= &\expct\pbox{W_a}\expct\pbox{W_b} + \expct\pbox{W_b}\expct\pbox{W_c} + \expct\pbox{W_c}\expct\pbox{W_a}\\
= &\prob^2 + \prob^2 + \prob^2,
= &\prob^2 + \prob^2 + \prob^2 = 3\prob^2,
\end{align*}
which is indeed linear in the size of the output polynomial as the number of operations in the computation is \textit{exactly} the number of output polynomial operations. Note that the answer is the same as $\poly(\prob, \prob, \prob)$, although this is coincidental and not true for the general case.
which is indeed linear in the size of the output polynomial, as the number of operations in the computation is \textit{exactly} the number of operations in the output polynomial. The first equality holds by linearity of expectation. The second holds because, in the $\ti$ model, all variables are independent, so the expectation of each product is the product of the expectations. Note that the answer is the same as $\poly(\prob, \prob, \prob)$, although this is coincidental and does not hold in general.
Now, consider the query
\begin{equation*}