Fixing conflicts.

Aaron Huber 2020-12-11 20:34:29 -05:00
commit 43814e2488
4 changed files with 28 additions and 4 deletions

conclusions.tex Normal file

@@ -0,0 +1,14 @@
\section{Conclusions and Future Work}\label{sec:concl-future-work}
We have studied the problem of calculating the expectation of polynomials over random integer variables. This problem has a practical application in probabilistic databases over multisets, where it corresponds to calculating the expected multiplicity of a query result tuple from the tuple's provenance polynomial. This problem has been studied extensively for sets (lineage formulas), but the bag setting has not received much attention so far. While the expectation of a polynomial in sum-of-products normal form can be calculated in time linear in the size of the polynomial, the problem is \sharpwonehard for factorized polynomials. We have proven this claim through a reduction from the problem of counting $k$-matchings. When restricting attention to the polynomials of result tuples of UCQs over TIDBs and BIDBs (under the assumption that there are $O(1)$ cancellations), we prove that the expectation of a polynomial can be approximated in linear time.
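As a minimal worked illustration of the linear-time claim for sum-of-products polynomials, the following sketch assumes mutually independent random variables $X_1, \ldots, X_n$ (as for the input tuples of a TIDB); the symbols $M$, $c_m$, and $d_{m,i}$ are hypothetical notation introduced only for this illustration. Linearity of expectation, followed by independence within each monomial, gives:

```latex
% Illustrative sketch only: assumes X_1, ..., X_n are mutually independent;
% M, c_m, and d_{m,i} are hypothetical notation, not taken from the paper.
\begin{align*}
\poly(X_1, \ldots, X_n) &= \sum_{m=1}^{M} c_m \prod_{i=1}^{n} X_i^{d_{m,i}}\\
\expct\pbox{\poly(X_1, \ldots, X_n)} &= \sum_{m=1}^{M} c_m \prod_{i=1}^{n} \expct\pbox{X_i^{d_{m,i}}}
\end{align*}
```

Each monomial contributes one product of moments, so, given the moments $\expct\pbox{X_i^{d}}$, the total work is linear in the size of the sum-of-products representation. For $\{0,1\}$-valued variables with $\prob(X_i = 1) = p_i$, every moment with $d \geq 1$ equals $p_i$. This sketch does not cover BIDBs, where variables in the same block are not independent.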
\BG{I am not sure what interesting future work is here. Some wild guesses; if anybody agrees, I'll try to flesh them out:
\textbullet{More queries: what happens with negation? Can circuits with monus be used?}
\textbullet{More databases: can we push beyond BIDBs? E.g., C-tables, aggregate semimodules, or just TIDBs where the multiplicity of each input tuple is a random variable over $\mathbb{N}$?}
\textbullet{Other results: can we extend the work to approximate $P(R(t) = n)$?}
}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End:


@@ -2,7 +2,8 @@
\section{Introduction}
Modern production databases like Postgres and Oracle use bag semantics. In contrast, most implementations of probabilistic databases (PDBs) are built in the setting of set semantics, where computing the probability of an output tuple is analogous to weighted model counting (a known \sharpphard problem).
%the annotation of the tuple is a lineage formula ~\cite{DBLP:series/synthesis/2011Suciu}, which can essentially be thought of as a boolean formula. It is known that computing the probability of a lineage formula is \#-P hard in general
In PDBs, a boolean formula, also called a lineage formula~\cite{DBLP:series/synthesis/2011Suciu}, encodes the conditions under which each output tuple appears in the result.
%The marginal probability of this formula being true is the tuple's probability to appear in a possible world.
@@ -99,7 +100,9 @@ Assume the following $\mathbb{B}/\mathbb{N}$ variable assignments: $W_a\mapsto T
\end{align*}
In the set (lineage) setting, we find that the boolean query is satisfied, while the bag evaluation counts how many combinations of the input satisfy the query.
\end{Example}
Note that computing the probability of the query of~\cref{ex:intro} in set semantics is indeed \sharpphard, since the query is non-hierarchical%
%, i.e., for $Vars(\poly)$ denoting the set of variables occurring across all atoms of $\poly$, a function $sg(x)$ whose output is the set of all atoms that contain variable $x$, we have that $sg(A) \cap sg(B) \neq \emptyset$ and $sg(A)\not\subseteq sg(B)$ and $sg(B)\not\subseteq sg(A)$,
~\cite{10.1145/1265530.1265571}. %Thus, computing $\expct\pbox{\poly(W_a, W_b, W_c)}$, i.e. the probability of the output with annotation $\poly(W_a, W_b, W_c)$, ($\prob(q)$ in Dalvi, Suciu) is hard in set semantics.
To see why this computation is hard for query $\poly$ under set semantics, note that the query input yields the output lineage formula $\poly(W_a, W_b, W_c) = W_aW_b \vee W_bW_c \vee W_cW_a$. The conjunctive clauses are not independent of one another (every pair of clauses shares a variable), so computing the probability is not linear in the size of $\poly(W_a, W_b, W_c)$:


@@ -169,8 +169,9 @@ sensitive=true
\input{single_p}
\input{lin_sys}
\input{approx_alg}
% \input{bi_cancellation}
\input{related-work}
\input{conclusions}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

related-work.tex Normal file

@@ -0,0 +1,6 @@
\section{Related Work}\label{sec:related-work}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End: