Revamped proof for Lemma 4.8.

master
Aaron Huber 2022-03-07 09:20:36 -05:00
parent 9ff8aa220c
commit 7738fd5180
1 changed files with 6 additions and 14 deletions


@ -143,26 +143,18 @@ as well as for all three queries of the PDBench \abbrBIDB benchmark (see \Cref{a
\secrev{
\begin{Lemma}
\label{lem:ctidb-gamma}
Given $\raPlus$ query $\query$ and \abbrCTIDB $\pdb$, let \circuit be the circuit computed by $\query\inparen{\tupset}$. Then, for the reduced \abbrOneBIDB $\pdb'$ there exists an equivalent circuit \circuit' obtained from $\query\inparen{\tupset'}$, such that $\gamma\inparen{\circuit'}\leq 1 - \inparen{\bound + 1}^{-\inparen{k-1}}$ with $\size\inparen{\circuit'} \leq \size\inparen{\circuit} + \numvar\cdot\inparen{2^{\inparen{\log{2\bound}}+ 1} - 1}$ and $\depth\inparen{\circuit'} = \depth\inparen{\circuit} + \ceil{\log{2\bound}}$.
Given $\raPlus$ query $\query$ and \abbrCTIDB $\pdb$, let \circuit be the circuit computed by $\query\inparen{\tupset}$. Then, for the reduced \abbrOneBIDB $\pdb'$ there exists an equivalent circuit \circuit' obtained from $\query\inparen{\tupset'}$, such that $\gamma\inparen{\circuit'}\leq 1 - \inparen{\bound + 1}^{-\inparen{k-1}}$ with $\size\inparen{\circuit'} \leq \size\inparen{\circuit} + \numvar\cdot\inparen{2^{\inparen{\ceil{\log{2\bound}}}+ 1} - 1}$ and $\depth\inparen{\circuit'} = \depth\inparen{\circuit} + \ceil{\log{2\bound}}$.
\end{Lemma}
}
\secrev{
\begin{proof}[Proof of~\Cref{lem:ctidb-gamma}]
%Let $\pdb' = \inparen{\onebidbworlds{\tupset'}, \pdb'}$ be the reduced \abbrOneBIDB and $\pdb = \inparen{\worlds, \pdb}$ the original \abbrCTIDB.
The circuit \circuit' is built from \circuit in the following manner. For each input gate $\gate_i$ with $\gate_i.\val = X_\tup$, replace $\gate_i$ with the circuit \subcircuit for $\sum_{j = 1}^\bound j\cdot X_{\tup, j}$. We argue that \circuit' is a valid circuit by the following facts. Let $\pdb = \inparen{\worlds, \pdb}$ be the original \abbrCTIDB \circuit was generated from. Then, by~\Cref{prop:ctidb-reduct} there exists a reduced $\pdb' = \inparen{\onebidbworlds{\tupset'}, \pdb'}$, from which the conversion from \circuit to \circuit' follows. Both $\polyf\inparen{\circuit}$ and $\polyf\inparen{\circuit'}$ have the same expected multiplicity since (by~\Cref{prop:ctidb-reduct}) the distributions $\bpd$ and $\bpd'$ are equivalent and each $j\cdot\worldvec'_{\tup, j} = \worldvec_\tup$ for $\worldvec'\in\inset{0, 1}^{\bound\numvar}$ and $\worldvec\in\worlds$. Finally, note that the above conversion implies the claimed size and depth bounds of the lemma.
The circuit \circuit' is built from \circuit as follows. For each input gate $\gate_i$ with $\gate_i.\val = X_\tup$, replace $\gate_i$ with the circuit \subcircuit encoding the sum $\sum_{j = 1}^\bound j\cdot X_{\tup, j}$. We argue that \circuit' is a valid circuit by the following facts. Let $\pdb = \inparen{\worlds, \bpd}$ be the original \abbrCTIDB from which \circuit was generated. Then, by~\Cref{prop:ctidb-reduct} there exists a reduced $\pdb' = \inparen{\onebidbworlds{\tupset'}, \bpd'}$, from which the conversion from \circuit to \circuit' follows. Both $\polyf\inparen{\circuit}$ and $\polyf\inparen{\circuit'}$ have the same expected multiplicity since (by~\Cref{prop:ctidb-reduct}) the distributions $\bpd$ and $\bpd'$ are equivalent and $\sum_{j = 1}^{\bound} j\cdot\worldvec'_{\tup, j} = \worldvec_\tup$ for corresponding worlds $\worldvec'\in\inset{0, 1}^{\bound\numvar}$ and $\worldvec\in\worlds$. Finally, note that because there exists $\subcircuit'\in\circuitset{\polyf\inparen{\subcircuit}}$ encoding $\sum_{j = 1}^\bound j\cdot X_{\tup, j}$ that is a \emph{balanced} binary tree, the above conversion implies the claimed size and depth bounds of the lemma.
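As an illustrative check of these bounds (a sketch assuming that $\size$ counts every gate including leaves, that $\depth$ counts the edges on a longest leaf-to-root path, and that $\log$ is base $2$), take $\bound = 2$. A balanced circuit for $1\cdot X_{\tup, 1} + 2\cdot X_{\tup, 2}$ has four leaves, two product gates, and one sum gate, so that
\[
4 + 2 + 1 = 7 = 2^{\ceil{\log{2\bound}} + 1} - 1
\qquad\text{and}\qquad
\ceil{\log{2\bound}} = \ceil{\log{4}} = 2 .
\]
More generally, under the same assumptions such a circuit has $2\bound$ leaves, $\bound$ product gates, and $\bound - 1$ sum gates, for a total of $4\bound - 1 \leq 2^{\ceil{\log{2\bound}} + 1} - 1$ gates.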
Consider list of expanded monomials $\expansion{\circuit}$ for \abbrCTIDB circuit \circuit. Let \monom be an arbitrary monomial with $\ell$ variables. Then $\monom = X_{\tup, 1}^{d_1},\ldots,X_{\tup, \ell}^{d_\ell}$ yields the set of monomials $\inparen{j_1\cdot X_{\tup, 1}^{d_1},\ldots, j_\ell\cdot X_{\tup, \ell}^{d_\ell}}_{j_1,\ldots, j_\ell \in \pbox{0, \bound}}$ in $\expansion{\circuit'}$. Observe that cancellations can only occur for each $X_{\tup_i}^{d_i}$ which implies that, considering only $X_{\tup, i}^{d_i}$, $\gamma \leq 1 - \inparen{c + 1}^{k - 1}$, since for each element in $\inset{\bigtimes_{i\in\pbox{k}, j_i\in\pbox{0, \bound}}X_{j_i}}$ there are \emph{exactly} $\bound+1$ surviving elements with $j_1=\cdots=j_k$, i.e. $X_j^k$ for each $j\in\pbox{0, \bound}$. The rest of the $\inparen{\bound + 1}^k$ cross terms cancel.
%By~\Cref{def:ctidb-reduct}, $\pdb'$ is a \abbrOneBIDB.
%By~\Cref{def:one-bidb}, a block $\block_\tup$ of $\pdb'$ has the property that $\sum_{\tup\in\tupset, j\in\pbox{\bound}}\prob_{\tup, j}\leq 1$. Then, if we consider the case of strict inequality, we have an extra possible outcome in block $\block_\tup$, the outcome when no tuple is present in a possible world. Let us denote this as $\tup_0$. Then there are at most $c + 1$ disjoint tuples in $\block_\tup$. We argue later that the case when $\tup_0$ is a possibility produces the worst case $\gamma$.
%
%Let $\poly'\inparen{\vct{X}}$ be an aribitrary polynomial produced by $\query\inparen{\pdb'}$ with $\vct{X} = \inparen{X_{\tup, j}}_{\tup\in\tupset', j\in\pbox{0, \bound}}$ the set of variables in $\pdb'$. Let $m$ be an arbitrary monomial in $\poly'\inparen{\vct{X}}$ and $v_m$ be the set of variables appearing in $m$. We define a cross term to be any monomial $m$ such that there exists $j\neq j'\in\pbox{0, \bound}$ such that $X_{\tup, j}, X_{\tup, j'}\in v_m$.
%
%The semantics of~\Cref{fig:lin-poly-bidb-redux} show that a new monomial product can only be generated by the $\join$ operator of $\raPlus$ queries. Further, a cross term may only be produced specifically when the join is a self join. The highest number of terms that can be produced by a self join of $\block_\tup$ is $\inparen{\bound + 1}^k$, the case for when all tuples join and $\sum_{\tup\in\tupset, j\in\pbox{\bound}}\prob_{\tup, \bound} < 1$ as noted above. For monomials $m\in\inset{\bigtimes_{i\in\pbox{k}, j\in\pbox{0, \bound}} X_{\tup, j_i}}$, there exist \emph{exaclty} $\inparen{\bound + 1}$ \emph{non}-cross terms, specifically $X_{\tup, j}^k$ for $j\in\pbox{0, \bound}$. Then there are exactly $\inparen{\bound + 1}^k - \inparen{\bound + 1}$ cross terms (cancellations). This implies that $\gamma\inparen{\circuit} = 1 - \frac{\inparen{\bound + 1}}{\inparen{\bound + 1}^k}$ for this case.
%
%We now show that the case above is indeed the worst case. First, given a self join, it is always the case that $X_{\tup, j}^k$ will be in the output since all tuples join with themselves. Then, the most number of cancellations occurs when we have that all $X_{\tup, j}$ joins with all $X_{\tup, j'}$ for $j\neq j' \in \pbox{0, c}$. Finally, it is the case that $\bound^k - \bound \leq \inparen{\bound + 1}^k - \inparen{\bound + 1} = \sum_{i = 1}^k\binom{k}{i}c^i - \inparen{\bound - 1}$ for $\bound, k \in \mathbb{N}$, which implies that the worst case is when we have the `extra' tuple $\tup_0$ and all tuples joining, which is exactly the case above, producing the greatest $\gamma\inparen{\circuit}$ ratio.
%
%Since the size of any block $\block$ is $\bound + 1$, it follows that $\gamma\inparen{\circuit}$ ratio for block $\block_\tup$ is the same when taken across all blocks of $\query\inparen{\pdb'}$, since the number of blocks $\numvar$ cancels out of the ratio calculations.%stays the same the number of blocks $\numvar$ results for one block $\block_\tup$ hold for the entire $\tupset'$, where the number of monomials is $\numvar\inparen{\bound + 1}^k$ and the number of non-cross terms is $\numvar\inparen{\bound + 1}$. Thus the multiplicative factor $\numvar$ (number of blocks) cancels out.% then the total number of monomials is the number of blocks $\numvar$ times $\bound + 1$ or $\numvar\cdot\inparen{\bound + 1}$.
Consider the list of expanded monomials $\expansion{\circuit}$ for \abbrCTIDB circuit \circuit. Let \monom be an arbitrary monomial whose set of variables is $\encMon = \inset{X_{\tup_1}^{d_1},\ldots,X_{\tup_\ell}^{d_\ell}}$, so that the number of variables is $\abs{\encMon} = \ell$. Then \monom yields the set of monomials $\vari{E}_\monom\inparen{\circuit'}=\inparen{j_1\cdot X_{\tup_1, j_1}^{d_1},\ldots, j_\ell\cdot X_{\tup_\ell, j_\ell}^{d_\ell}}_{j_1,\ldots, j_\ell \in \pbox{0, \bound}}$ in $\expansion{\circuit'}$. Observe that cancellations can only occur within each $X_{\tup}^{d_\tup}\in \encMon$. Consider the number of cancellations for a single $X_{\tup}^{d_\tup}$. Here $\gamma \leq 1 - \inparen{\bound + 1}^{-\inparen{d_\tup - 1}}$, since among the $\inparen{\bound + 1}^{d_\tup}$ elements of $\inset{\bigtimes_{i\in\pbox{d_\tup}, j_i\in\pbox{0, \bound}}X_{j_i}}$ there are \emph{exactly} $\bound+1$ surviving elements, namely those with $j_1=\cdots=j_{d_\tup}$, i.e., $X_j^{d_\tup}$ for each $j\in\pbox{0, \bound}$. The remaining $\inparen{\bound + 1}^{d_\tup} - \inparen{\bound + 1}$ cross terms cancel. For the whole monomial \monom, the proportions of non-cancelling terms across the $X_\tup^{d_\tup}\in\encMon$ multiply, since non-cancelling terms for $X_\tup^{d_\tup}$ can only be joined with non-cancelling terms of $X_{\tup'}^{d_{\tup'}}$. This yields $\gamma \leq 1 - \prod_{i = 1}^{\ell}\inparen{\bound +1}^{-\inparen{d_i - 1}}\leq 1 - \inparen{\bound + 1}^{-\inparen{k - 1}}$, where the last inequality uses the facts that $\sum_{i = 1}^\ell d_i \leq k$ and $\ell\geq 1$.
Since this is true for arbitrary \monom, the bound follows for $\polyf\inparen{\circuit'}$.
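As a sanity check of the per-variable count (an illustrative instance, reading $\gamma$ as the fraction of expanded terms that cancel), take $\bound = 2$ and a single variable with $d_\tup = 2$. Indexing over $j_1, j_2\in\pbox{0, 2}$ gives $\inparen{\bound + 1}^{2} = 9$ products $X_{j_1}X_{j_2}$, of which only $X_0^2$, $X_1^2$, and $X_2^2$ survive, so that
\[
\gamma = \frac{9 - 3}{9} = 1 - \inparen{\bound + 1}^{-\inparen{d_\tup - 1}} = 1 - 3^{-1} = \frac{2}{3}.
\]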
\end{proof}
\qed
}
@ -204,7 +196,7 @@ Let $\query$ be an $\raPlus$ query and $\pdb$ be a \abbrCTIDB with $p_0>0$ (wher
\end{Corollary}
\secrev{
\begin{proof}[Proof of~\Cref{cor:approx-algo-punchline-ctidb}]
The proof follows by~\Cref{def:ctidb-reduct},~\Cref{lem:ctidb-gamma}, and~\Cref{cor:approx-algo-punchline}.
The proof follows by~\Cref{lem:ctidb-gamma} and~\Cref{cor:approx-algo-punchline}.
\end{proof}
\qed
}