Added Proof for Lem 4.8 and Cor 4.11.

master
Aaron Huber 2022-03-01 11:34:16 -05:00
parent 2eed89b19e
commit 70df07e4fd
2 changed files with 27 additions and 5 deletions


@ -140,11 +140,27 @@ In particular, if $\prob_0>0$ and $\gamma<1$ are absolute constants then the abo
The restriction on $\gamma$ is satisfied by any
$1$-\abbrTIDB (where $\gamma=0$ in the equivalent $1$-\abbrBIDB of~\Cref{def:ctidb-reduct})
as well as by all three queries of the PDBench \abbrBIDB benchmark (see \Cref{app:subsec:experiment} for experimental results). Further, we can also prove the following result:
\secrev{
\begin{Lemma}
\label{lem:c-TIDB-gamma}
Given a \emph{\abbrOneBIDB} computed by the reduction of~\Cref{def:ctidb-reduct}, $\gamma\inparen{\circuit}\leq 1 - \inparen{c + 1}^{-\inparen{k-1}}$, where $k$ is the degree of the lineage polynomial represented by $\circuit$.
\end{Lemma}
\begin{proof}[Proof of~\Cref{lem:c-TIDB-gamma}]
Let $\pdb' = \inparen{\onebidbworlds{\tupset'}, \bpd}$ be the reduced \abbrOneBIDB and $\pdb = \inparen{\worlds, \pd}$ the original \abbrCTIDB.
By~\Cref{def:ctidb-reduct}, $\pdb'$ is a \abbrOneBIDB.
By~\Cref{def:one-bidb}, each block $\block_\tup$ of $\pdb'$ satisfies $\sum_{j\in\pbox{\bound}}\prob_{\tup, j}\leq 1$. If this inequality is strict, $\block_\tup$ has one additional outcome, namely that none of its tuples is present in the possible world; we denote this outcome by $\tup_0$ (with variable $X_{\tup, 0}$). Hence $\block_\tup$ has at most $c + 1$ mutually disjoint outcomes. We argue below that the case where $\tup_0$ is possible yields the worst-case $\gamma$.
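As an illustration only (the probabilities below are hypothetical, and $\prob_{\tup, 0}$ is notation introduced here for the probability of $\tup_0$), take $\bound = 2$, $\prob_{\tup, 1} = 0.3$, and $\prob_{\tup, 2} = 0.5$; the remaining mass goes to the empty outcome,
\begin{equation*}
\prob_{\tup, 0} = 1 - \sum_{j\in\pbox{\bound}}\prob_{\tup, j} = 1 - \inparen{0.3 + 0.5} = 0.2,
\end{equation*}
so $\block_\tup$ has $\bound + 1 = 3$ mutually disjoint outcomes, indexed by $j\in\pbox{0, \bound}$.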
Let $\poly'\inparen{\vct{X}}$ be an arbitrary polynomial produced by $\query\inparen{\pdb'}$, where $\vct{X} = \inparen{X_{\tup, j}}_{\tup\in\tupset', j\in\pbox{0, \bound}}$ is the set of variables in $\pdb'$. Let $m$ be an arbitrary monomial in $\poly'\inparen{\vct{X}}$ and let $v_m$ be the set of variables appearing in $m$. We define a cross term to be any monomial $m$ for which there exist a tuple $\tup$ and indices $j\neq j'\in\pbox{0, \bound}$ such that $X_{\tup, j}, X_{\tup, j'}\in v_m$. Since the outcomes of a block are mutually exclusive, the cross terms are precisely the monomials that vanish (cancel) in the reduced polynomial.
The semantics of~\Cref{fig:lin-poly-bidb-redux} show that a new monomial can only be generated by the $\join$ operator of an $\raPlus$ query, and a cross term can only be produced when the join is a self join. The largest number of monomials that a $k$-way self join of $\block_\tup$ can produce is $\inparen{\bound + 1}^k$, which occurs when all outcomes of the block join with one another and, as noted above, $\sum_{j\in\pbox{\bound}}\prob_{\tup, j} < 1$. Among the monomials $\prod_{i\in\pbox{k}} X_{\tup, j_i}$ with $\inparen{j_1, \ldots, j_k}\in\pbox{0, \bound}^k$, there are \emph{exactly} $\bound + 1$ \emph{non}-cross terms, namely $X_{\tup, j}^k$ for $j\in\pbox{0, \bound}$. Hence there are exactly $\inparen{\bound + 1}^k - \inparen{\bound + 1}$ cross terms (cancellations), which implies that $\gamma\inparen{\circuit} = 1 - \frac{\bound + 1}{\inparen{\bound + 1}^k} = 1 - \inparen{\bound + 1}^{-\inparen{k-1}}$ in this case.
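As a small sanity check (an illustrative instance, assuming every outcome of $\block_\tup$ joins with every other), take $\bound = 1$ and $k = 2$. The self join expands to
\begin{equation*}
\inparen{X_{\tup, 0} + X_{\tup, 1}}^2 = X_{\tup, 0}^2 + 2X_{\tup, 0}X_{\tup, 1} + X_{\tup, 1}^2,
\end{equation*}
whose $\inparen{\bound + 1}^k = 4$ monomials (counted with multiplicity) comprise $\bound + 1 = 2$ non-cross terms and $2$ cross terms, so $\gamma\inparen{\circuit} = 1 - \frac{2}{4} = \frac{1}{2} = 1 - \inparen{\bound + 1}^{-\inparen{k-1}}$.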
We now show that this case is indeed the worst case. First, in a self join the monomials $X_{\tup, j}^k$ always appear in the output, since every tuple joins with itself. Second, the number of cancellations is largest when every $X_{\tup, j}$ joins with every $X_{\tup, j'}$ for $j\neq j' \in \pbox{0, \bound}$. Finally, for $\bound, k \in \mathbb{N}$ we have $\bound^k - \bound \leq \inparen{\bound + 1}^k - \inparen{\bound + 1} = \sum_{i = 1}^k\binom{k}{i}\bound^i - \bound$, since $\bound^k$ is the $i = k$ term of the sum and all other terms are nonnegative; in terms of ratios, $1 - \bound^{-\inparen{k-1}}\leq 1 - \inparen{\bound + 1}^{-\inparen{k-1}}$. Hence the worst case is when the `extra' outcome $\tup_0$ exists and all tuples join, which is exactly the case above and yields the largest $\gamma\inparen{\circuit}$.
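For example, with $\bound = 2$ and $k = 3$ (values chosen only for illustration), a block without $\tup_0$ yields $\bound^k - \bound = 6$ cross terms out of $8$ monomials, i.e., $\gamma = \frac{3}{4}$, whereas a block with $\tup_0$ yields
\begin{equation*}
\inparen{\bound + 1}^k - \inparen{\bound + 1} = \sum_{i = 1}^{3}\binom{3}{i}2^i - 2 = 6 + 12 + 8 - 2 = 24
\end{equation*}
cross terms out of $27$, i.e., $\gamma = \frac{8}{9} > \frac{3}{4}$, matching $1 - \inparen{\bound + 1}^{-\inparen{k-1}} = 1 - \frac{1}{9}$.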
Since every block has at most $\bound + 1$ outcomes, the bound on $\gamma\inparen{\circuit}$ derived for a single block $\block_\tup$ carries over to all blocks of $\query\inparen{\pdb'}$: the number of blocks $\numvar$ cancels out of the ratio.%stays the same the number of blocks $\numvar$ results for one block $\block_\tup$ hold for the entire $\tupset'$, where the number of monomials is $\numvar\inparen{\bound + 1}^k$ and the number of non-cross terms is $\numvar\inparen{\bound + 1}$. Thus the multiplicative factor $\numvar$ (number of blocks) cancels out.% then the total number of monomials is the number of blocks $\numvar$ times $\bound + 1$ or $\numvar\cdot\inparen{\bound + 1}$.
\end{proof}
\qed
}
We briefly connect the runtime in \Cref{eq:approx-algo-runtime} to the algorithm outlined earlier (ignoring the dependence on $\multc{\cdot}{\cdot}$, which accounts for the cost of arithmetic operations over integers). The $\size(\circuit)$ term comes from the time taken to run \onepass once (\onepass essentially computes $\abs{\circuit}(1,\ldots, 1)$ using the natural circuit evaluation algorithm on $\circuit$). We make $\frac{\log{\frac{1}{\conf}}}{\inparen{\error'}^2\cdot(1-\gamma)^2\cdot \prob_0^{2k}}$ calls to \sampmon, each of which essentially traces $O(k)$ random sink-to-source paths in $\circuit$, all of which by definition have length at most $\depth(\circuit)$.
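A rough accounting that simply adds the two contributions named above (and again ignores $\multc{\cdot}{\cdot}$) gives a runtime of order
\begin{equation*}
O\inparen{\size(\circuit) + \frac{\log{\frac{1}{\conf}}}{\inparen{\error'}^2\cdot(1-\gamma)^2\cdot \prob_0^{2k}}\cdot k\cdot\depth(\circuit)}.
\end{equation*}
For a \abbrOneBIDB obtained via~\Cref{def:ctidb-reduct}, \Cref{lem:c-TIDB-gamma} bounds the factor $\frac{1}{(1-\gamma)^2}\leq\inparen{\bound + 1}^{2\inparen{k-1}}$, which depends only on $k$ and $\bound$.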
@ -177,12 +193,18 @@ Next, we note that the above result along with \Cref{lem:c-TIDB-gamma}
answers \Cref{prob:big-o-joint-steps} in the affirmative as follows:
\begin{Corollary}
\label{cor:approx-algo-punchline-ctidb}
Let $\query$ be an $\raPlus$ query and $\pdb$ be a \abbrCTIDB with $p_0>0$ an absolute constant (where $p_0$ is as in \Cref{cor:approx-algo-const-p}). Let $\poly(\vct{X})=\apolyqdt$ for any result tuple $\tup$ with $\deg(\poly)=k$. Then one can compute an approximation satisfying \Cref{eq:approx-algo-bound-main} in time $O_{k,|Q|,\error',\conf,\bound}\inparen{\qruntime{\optquery{\query}, \tupset, \bound}}$ (given $\query,\tupset$ and $\prob_{\tup, j}$ for each $\tup\in\tupset,~j\in\pbox{\bound}$ that defines $\bpd$).
%Let $\poly(\vct{X})$ be a \abbrBIDB-lineage polynomial corresponding to an \abbrBIDB circuit $\circuit$ that satisfies the specific conditions in \Cref{lem:val-ub}. Then one can compute an approximation satisfying \Cref{eq:approx-algo-bound-main} in time
% $O_k\left(\frac 1{\inparen{\error'}^2}\cdot\size(\circuit)\cdot \log{\frac{1}{\conf}}\right)$. % for the case when $\circuit$ satisfies the specific conditions in \Cref{lem:val-ub}.
\end{Corollary}
\secrev{
\begin{proof}[Proof of~\Cref{cor:approx-algo-punchline-ctidb}]
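The claim follows from~\Cref{def:ctidb-reduct},~\Cref{lem:c-TIDB-gamma}, and~\Cref{cor:approx-algo-punchline}: by~\Cref{def:ctidb-reduct}, $\pdb$ reduces to a \abbrOneBIDB, for which~\Cref{lem:c-TIDB-gamma} gives $\gamma\inparen{\circuit}\leq 1 - \inparen{\bound + 1}^{-\inparen{k-1}} < 1$, a constant depending only on $k$ and $\bound$; hence~\Cref{cor:approx-algo-punchline} applies.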
\end{proof}
\qed
}
%\AH{What is $\abs{\query}$? Isn't that just $k$?}
If we want to approximate the expected multiplicities of all $Z=O(n^k)$ result tuples $\tup$ simultaneously, we can apply the above result with $\conf$ replaced by $\frac{\conf}{Z}$. This increases the runtime by only a logarithmic factor.
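Concretely, if each of the $Z$ estimates fails with probability at most $\frac{\conf}{Z}$, a union bound gives
\begin{equation*}
\Pr\left[\text{some estimate fails}\right] \leq Z\cdot\frac{\conf}{Z} = \conf,
\end{equation*}
while the $\log{\frac{1}{\conf}}$ factor in the runtime becomes $\log{\frac{Z}{\conf}} = \log{\frac{1}{\conf}} + \log{Z} = \log{\frac{1}{\conf}} + O(k\log{n})$.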


@ -152,7 +152,7 @@
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Binary-BIDB Notation %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\onebidbworlds}[1]{\bigtimes_{\tup\in #1}\inset{0,\bound_\tup}}
%PDB Abbreviations
\newcommand{\abbrOneBIDB}{\text{Binary-BIDB}\xspace}