Small changes and notes.

master
Aaron Huber 2021-09-01 11:27:11 -04:00
parent ed16f49249
commit 6f593b3627
6 changed files with 14 additions and 14 deletions

@ -12,7 +12,7 @@ so our results imply that computing probabilities for Bag-PDB based on the resul
% Such factorized representations are necessary to realize the performance of modern join algorithms (e.g., worst-case optimal joins), and so our results imply that a Bag-PDB doing exact computations (via these factorized representations) can never be as fast as a classical (deterministic) database.
The problem stays hard even for polynomials generated by project-join queries if all input tuples have a fixed probability $\prob \in (0,1)$.
We proceed to study polynomials of result tuples of positive relational algebra queries ($\raPlus$) over TIDBs and over a non-trivial subclass of block-independent databases (BIDBs).
We develop a sampling algorithm that computes a $1 \pm \epsilon$-approximation of the expectation of polynomial circuits in linear time in the size of the polynomial.
We develop a sampling algorithm that computes a $1 \pm \epsilon$-approximation of the expectation of polynomial (arithmetic) circuits in linear time in the size of the circuit.
By removing Bag-PDB's reliance on the sum-of-products representation of polynomials, this result paves the way for future work on PDBs that are competitive with deterministic databases.
\end{abstract}

@ -70,13 +70,13 @@ $f_\ell(s) = \eset{1}$.
Note, importantly, that when we discuss $f_\ell^{-1}$, each \textit{edge} present in $\eset{1}$ must have an edge in $s\in f_\ell^{-1}(\eset{1})$ that projects down to it. In particular, if $|\eset{1}| = 3$, then each $s\in f_\ell^{-1}(\eset{1})$ must consist of a set of edges of the form $\{ (e_i, b), (e_j, b'), (e_m, b'') \}$, where $i,j$ and $m$ are distinct.
\begin{Lemma}\label{lem:fk-func}
$f_\ell$ is a function.
\end{Lemma}
\begin{proof}\label{subsubsec:proof-fk}
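For any $b \in \{0,\ldots, \ell-1\}$, the map $(e, b) \mapsto e$ is a function, since it assigns exactly one edge $e$ to each pair $(e, b)$. Since $f_\ell(s)$ is the image of the set $s$ under this map, it is uniquely determined by $s$, and hence $f_\ell$ is a function.\qed
\end{proof}
%\begin{Lemma}\label{lem:fk-func}
%$f_\ell$ is a function.
%\end{Lemma}
%
%\begin{proof}\label{subsubsec:proof-fk}
%For any $b \in \{0,\ldots, \ell-1\}$, the map $(e, b) \mapsto e$ is a function, since it assigns exactly one edge $e$ to each pair $(e, b)$. Since $f_\ell(s)$ is the image of the set $s$ under this map, it is uniquely determined by $s$, and hence $f_\ell$ is a function.\qed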
%\end{proof}
We are now ready to prove the structural lemmas. Note that $f_\ell$ maps each subset of three edges in $\graph{\ell}$ to a subset of at most three edges in $\esetType{1}$. To prove the structural lemmas, we will use the map $f_\ell^{-1}$. In particular, to count the number of occurrences of $\tri$ and $\threedis$ in $\graph{\ell}$, we count, for each $S\in\binom{E_1}{\le 3}$, how many $\threedis$ and $\tri$ subgraphs appear in $f_\ell^{-1}(S)$.
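As a sanity check on this counting strategy, the Python sketch below (an illustration, not part of the paper's construction) enumerates all 3-edge subsets $s$ of $\graph{\ell}$ for a toy edge set, groups them by their image $f_\ell(s)$, and tallies the preimage sizes. It assumes, following the notation above, that the edges of $\graph{\ell}$ are the pairs $(e, b)$ with $e \in E_1$ and $b \in \{0,\ldots,\ell-1\}$; the names `graph_ell_edges`, `f_ell`, and `preimage_sizes` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def graph_ell_edges(e1, ell):
    """Edges of G_ell, assumed to be the pairs (e, b) with e in E_1 and 0 <= b < ell."""
    return [(e, b) for e in e1 for b in range(ell)]


def f_ell(s):
    """Project each edge (e, b) of s down to e; for a 3-edge s the image has 1 to 3 edges."""
    return frozenset(e for (e, _b) in s)


def preimage_sizes(e1, ell):
    """For each image S (a subset of E_1 of size <= 3), count the 3-edge subsets of G_ell mapping to S."""
    counts = Counter()
    for s in combinations(graph_ell_edges(e1, ell), 3):
        counts[f_ell(s)] += 1
    return counts


if __name__ == "__main__":
    # Toy E_1: a triangle on vertices 1, 2, 3.
    e1 = [(1, 2), (2, 3), (1, 3)]
    counts = preimage_sizes(e1, 2)
    for S, n in sorted(counts.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(S), n)
```

For a set $S$ of three distinct edges, every $s \in f_\ell^{-1}(S)$ picks exactly one copy of each edge, so the sketch reports $\ell^3$ preimages for such $S$, in line with the description of $f_\ell^{-1}$ above.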

@ -83,7 +83,7 @@ Since $\semNX$-PDBs $\pxdb$ are a complete representation system for $\semN$-PDB
\subsection{\tis and \bis in the $\semNX$-PDB model}\label{subsec:supp-mat-ti-bi-def}
Two important subclasses of $\semNX$-PDBs that are of interest to us are the bag versions of tuple-independent databases (\tis) and block-independent databases (\bis). Under set semantics, a \ti is a deterministic database $\db$ where each tuple $\tup$ is assigned a probability $\prob_\tup$. The set of possible worlds represented by a \ti $\db$ consists of all subsets of $\db$. The probability of each world is the product of the probabilities of all tuples that exist in it and of one minus the probability of each tuple of $\db$ that is not part of this world, i.e., tuples are treated as independent random events. In a \bi, we also assign each tuple a probability, but additionally partition $\db$ into blocks. The possible worlds of a \bi $\db$ are all subsets of $\db$ that contain at most one tuple from each block. Note then that tuples sharing the same block are disjoint (mutually exclusive) events, and the sum of the probabilities of all the tuples in the same block $\block$ is $1$. \AH{Reviewer complaint: This is not true by definition.}
Two important subclasses of $\semNX$-PDBs that are of interest to us are the bag versions of tuple-independent databases (\tis) and block-independent databases (\bis). Under set semantics, a \ti is a deterministic database $\db$ where each tuple $\tup$ is assigned a probability $\prob_\tup$. The set of possible worlds represented by a \ti $\db$ consists of all subsets of $\db$. The probability of each world is the product of the probabilities of all tuples that exist in it and of one minus the probability of each tuple of $\db$ that is not part of this world, i.e., tuples are treated as independent random events. In a \bi, we also assign each tuple a probability, but additionally partition $\db$ into blocks. The possible worlds of a \bi $\db$ are all subsets of $\db$ that contain at most one tuple from each block. Note then that tuples sharing the same block are disjoint (mutually exclusive) events, and the sum of the probabilities of all the tuples in the same block $\block$ is at most $1$. \AH{Reviewer complaint: This is not true by definition.}
The probability of such a world is the product of the probabilities of all tuples present in the world. %and one minus the sum of the probabilities of all tuples from blocks for which no tuple is present in the world.
For bag \tis and \bis, we define the probability of a tuple to be the probability that the tuple exists with multiplicity at least $1$.
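To make the set-semantics definitions concrete, the Python sketch below enumerates the possible worlds of a small \ti and a small \bi and computes each world's probability. It is an illustration under assumptions: tuples are plain strings, and for the \bi it uses the common convention (which appears in commented-out form in the text above) that a block contributes a factor of one minus the sum of its tuple probabilities when none of its tuples is present. The functions `ti_worlds` and `bi_worlds` are hypothetical names.

```python
from itertools import product


def ti_worlds(probs):
    """Enumerate the worlds of a set-semantics TI; probs maps each tuple to its probability."""
    tuples = list(probs)
    worlds = {}
    for present in product([False, True], repeat=len(tuples)):
        world = frozenset(t for t, keep in zip(tuples, present) if keep)
        p = 1.0
        for t, keep in zip(tuples, present):
            # Independent tuples: factor p_t if present, 1 - p_t if absent.
            p *= probs[t] if keep else 1.0 - probs[t]
        worlds[world] = p
    return worlds


def bi_worlds(blocks):
    """Enumerate the worlds of a BI; blocks is a list of {tuple: probability} dicts whose
    probabilities each sum to at most 1. A world keeps at most one tuple per block."""
    # Per block, the options are: pick exactly one of its tuples, or pick none (None).
    options = [list(block.items()) + [(None, 1.0 - sum(block.values()))] for block in blocks]
    worlds = {}
    for choice in product(*options):
        world = frozenset(t for t, _q in choice if t is not None)
        p = 1.0
        for _t, q in choice:
            p *= q  # distinct blocks are independent of each other
        worlds[world] = worlds.get(world, 0.0) + p
    return worlds
```

In both cases the world probabilities sum to $1$, and no world of the \bi contains two tuples from the same block.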
@ -171,9 +171,9 @@ Then in expectation we have
&= \sum_{\substack{\vct{d} \in \{0,\ldots,B\}^\numvar\\\wedge~\isInd{\vct{d}}}}c_{\vct{d}}\cdot \prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar \prob_i\label{p1-s4}\\
&= \rpoly(\prob_1,\ldots, \prob_\numvar).\label{p1-s5}
\end{align}
\Cref{p1-s1a} is the result of substituting in the definition of $\poly$ given above. Then we arrive at \cref{p1-s1b} by linearity of expectation. Next, \cref{p1-s1c} is the result of the independence constraint of \abbrBIDB\xplural, specifically that no monomial can be composed of dependent variables, i.e., variables from the same block $\block$. \Cref{p1-s2} follows from the fact that all variables in each monomial are independent, which allows the expectation to be pushed through the product. In \cref{p1-s3}, note that $\randWorld_i \in \{0, 1\}$, which further implies that for any exponent $e \geq 1$, $\randWorld_i^e = \randWorld_i$. Next, in \cref{p1-s4} we use that the expectation of each $\randWorld_i$ is the probability $\prob_i$ of its tuple.
\Cref{p1-s1a} is the result of substituting in the definition of $\poly$ given above. Then we arrive at \cref{p1-s1b} by linearity of expectation. Next, \cref{p1-s1c} is the result of the independence constraint of \abbrBIDB\xplural, specifically that any monomial composed of dependent variables, i.e., variables from the same block $\block$, is $0$ in every possible world. \Cref{p1-s2} follows from the fact that all variables in each monomial are independent, which allows the expectation to be pushed through the product. In \cref{p1-s3}, since $\randWorld_i \in \{0, 1\}$, it is the case that for any exponent $e \geq 1$, $\randWorld_i^e = \randWorld_i$. Next, in \cref{p1-s4} we use that the expectation of each $\randWorld_i$ is the probability $\prob_i$ of its tuple.
Finally, \Cref{p1-s5} holds because, by the construction in \Cref{lem:pre-poly-rpoly}, $\rpoly(\prob_1,\ldots, \prob_\numvar)$ is exactly the sum, over all monomials in \cref{p1-s4}, of each monomial's coefficient times the product of the probabilities of its variables.
Finally, it can be verified that \Cref{p1-s5} follows since \cref{p1-s4} satisfies the construction of \Cref{lem:pre-poly-rpoly}, i.e., $\rpoly(\prob_1,\ldots, \prob_\numvar)$ is exactly the sum, over all monomials, of each monomial's coefficient times the product of the probabilities of its variables.
\qed
\end{proof}
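The chain of equalities above can be checked numerically on small instances. The Python sketch below (an illustration, not the paper's algorithm) represents a polynomial as a map from exponent vectors $\vct{d}$ to coefficients $c_{\vct{d}}$, builds $\rpoly$ by following the steps of the derivation (dropping monomials that combine two variables from the same block, then capping every exponent at $1$), and compares $\rpoly(\prob_1,\ldots,\prob_\numvar)$ against the expectation of $\poly$ computed by brute force over all worlds of a \abbrBIDB. The block encoding (lists of variable indices) and the names `reduce_poly`, `evaluate`, and `brute_force_expectation` are assumptions made for the example.

```python
from itertools import product


def reduce_poly(poly, block_of):
    """Sketch of rpoly: drop monomials using two variables from the same block
    (they are 0 in every world), then cap every exponent at 1."""
    reduced = {}
    for d, c in poly.items():
        used = [i for i, di in enumerate(d) if di >= 1]
        if len({block_of[i] for i in used}) < len(used):
            continue  # dependent variables: the monomial vanishes in every world
        key = tuple(min(di, 1) for di in d)
        reduced[key] = reduced.get(key, 0) + c
    return reduced


def evaluate(poly, x):
    """Evaluate a polynomial given as {exponent vector: coefficient} at the point x."""
    total = 0.0
    for d, c in poly.items():
        term = c
        for xi, di in zip(x, d):
            term *= xi ** di
        total += term
    return total


def brute_force_expectation(poly, probs, blocks):
    """E[poly(W)] over all worlds of a BIDB: each block sets at most one of its
    variables W_i to 1, and distinct blocks are independent."""
    n = len(probs)
    options = [[(i, probs[i]) for i in blk] + [(None, 1.0 - sum(probs[i] for i in blk))]
               for blk in blocks]
    expectation = 0.0
    for choice in product(*options):
        w = [0] * n
        weight = 1.0
        for i, q in choice:
            weight *= q
            if i is not None:
                w[i] = 1
        expectation += weight * evaluate(poly, w)
    return expectation
```

For instance, with $\poly = (X_1 + X_2 + X_3)^2$ (indices $0, 1, 2$ in the code), $X_1$ and $X_2$ in one block, and probabilities $(0.5, 0.25, 0.5)$, both `evaluate(reduce_poly(...), probs)` and `brute_force_expectation(...)` give $2$.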

@ -120,11 +120,11 @@ sensitive=true
\maketitle
\input{abstract}
\input{intro-rewrite-070921}%Another iteration of ICDT 2nd Round submission
%\input{intro-rewrite-070921}%Another iteration of ICDT 2nd Round submission
%\input{intro-atri}
%\input{intro-rewrite2}%ICDT 2nd Round submission
%\input{outline-intro-new}
%\input{intro-new}%ICDT 1st Round submission
\input{intro-new}%ICDT 1st Round submission
% \input{intro}--PODS submission
\input{ra-to-poly}
\input{poly-form}

@ -57,7 +57,7 @@ even hold for the expression trees. %this polynomial can be encoded in an expres
\end{minipage}
}
where, adapting the PDB instance in \Cref{fig:ex-shipping-simp}, relation $OnTime$ has $n$ tuples, one for each vertex in $\vset=[n]$, each with probability $\prob$, and $Route(\text{City}_1, \text{City}_2)$ has one tuple for each edge in $\edgeSet$ (each with probability $1$).\AH{This footnote is probably unnecessary now since we changed the example.}\footnote{Technically, $\poly_{G}^\kElem(\vct{X})$ should have variables corresponding to tuples in $Route$ as well, but since they are always present with probability $1$, we drop them. Our argument also works when all the tuples in $Route$ are present with probability $\prob$, but to simplify notation we assign probability $1$ to edges.}
Note that this implies that our hard query polynomial can be represented as an expression tree produced by a project-join query with the same probability value for each input tuple $\prob_i$.
Note that this implies that our hard query polynomial can be represented as an expression tree \AH{If we speak of an expression tree, we need to have defined this somewhere.} produced by a project-join query with the same probability value for each input tuple $\prob_i$.
\AH{The above discussion from \cref{def:qk} seems to be a bit ambiguous. I'm not sure it's entirely accurate to end with $\prob_i$.}