Reworked proof for Lem 4.8.

master
Aaron Huber 2022-04-26 09:02:09 -04:00
parent d101fc8df6
commit 2208604fac
7 changed files with 17 additions and 14 deletions

View File

@ -18,7 +18,7 @@ The slight abuse of notation seen in $\abs{\circuit}\inparen{1,\ldots,1}$ is exp
\subsection{Proof of Theorem \ref{lem:approx-alg}}\label{sec:proof-lem-approx-alg}
\input{app_approx_alg-pseudo-code}
We prove \Cref{lem:approx-alg} constructively by presenting an algorithm \approxq (\Cref{alg:mon-sam}) which has the desired runtime and computes an approximation with the desired approximation guarantee. Algorithm \approxq uses Algorithm \onepass to compute weights on the edges of a circuits. These weights are then used to sample a set of monomials of $\poly(\circuit)$ from the circuit $\circuit$ by traversing the circuit using the weights to ensure that monomials are sampled with an appropriate probability. The correctness of \approxq relies on the correctness (and runtime behavior) of auxiliary algorithms \onepass and \sampmon that we state in the following lemmas (and prove later in this part of the appendix).
We prove \Cref{lem:approx-alg} constructively by presenting an algorithm \approxq (\Cref{alg:mon-sam}) which has the desired runtime and computes an approximation with the desired approximation guarantee. Algorithm \approxq uses auxiliary algorithm \onepass to compute weights on the edges of a circuit. These weights are then used to sample a set of monomials of $\poly(\circuit)$ from the circuit $\circuit$ by traversing the circuit using the weights to ensure that monomials are sampled with an appropriate probability. The correctness of \approxq relies on the correctness (and runtime behavior) of auxiliary algorithms \onepass and \sampmon that we state in the following lemmas (and prove later in this part of the appendix).
\begin{Lemma}\label{lem:one-pass}
The $\onepass$ function completes in time:
@ -27,6 +27,7 @@ $$O\left(\size(\circuit) \cdot \multc{\log\left(\abs{\circuit(1\ldots, 1)}\right
\end{Lemma}
To prove correctness of \Cref{alg:mon-sam}, we only use the following fact that follows from the above lemma: for the modified circuit ($\circuit_{\vari{mod}}$) output by \onepass, $\circuit_{\vari{mod}}.\vari{partial}=\abs{\circuit}(1,\dots,1)$.
\AH{I don't think the word \emph{only} is needed.}
\begin{Lemma}\label{lem:sample}
The function $\sampmon$ completes in time
$$O(\log{k} \cdot k \cdot \depth(\circuit)\cdot\multc{\log\left(\abs{\circuit}(1,\ldots, 1)\right)}{\log{\size(\circuit)}})$$
@ -36,7 +37,7 @@ $$O(\log{k} \cdot k \cdot \depth(\circuit)\cdot\multc{\log\left(\abs{\circuit}(1
With the above two lemmas, we are ready to argue the following result:
\begin{Theorem}\label{lem:mon-samp}
For any $\circuit$ with
$\degree(poly(|\circuit|)) = k$, algorithm \ref{alg:mon-sam} outputs an estimate $\vari{acc}$ of $\rpoly(\prob_1,\ldots, \prob_\numvar)$ such that
$\degree(\polyf(|\circuit|)) = k$, algorithm \ref{alg:mon-sam} outputs an estimate $\vari{acc}$ of $\rpoly(\prob_1,\ldots, \prob_\numvar)$ such that
\[\probOf\left(\left|\vari{acc} - \rpoly(\prob_1,\ldots, \prob_\numvar)\right|> \error \cdot \abs{\circuit}(1,\ldots, 1)\right) \leq \conf,\]
in $O\left(\left(\size(\circuit)+\frac{\log{\frac{1}{\conf}}}{\error^2} \cdot k \cdot\log{k} \cdot \depth(\circuit)\right)\cdot \multc{\log\left(\abs{\circuit}(1,\ldots, 1)\right)}{\log{\size(\circuit)}}\right)$ time.
\end{Theorem}
@ -47,7 +48,7 @@ Before proving \Cref{lem:mon-samp}, we use it to argue the claimed runtime of ou
\begin{proof}[Proof of \Cref{lem:approx-alg}]
Set $\mathcal{E}=\approxq({\circuit}, (\prob_1,\dots,\prob_\numvar),$ $\conf, \error')$, where
\[\error' = \error \cdot \frac{\rpoly(\prob_1,\ldots, \prob_\numvar)}{\abs{{\circuit}}(1,\ldots, 1)},\]
which achieves the claimed error bound on $\mathcal{E}$ (\vari{acc}) trivially due to the assignment to $\error'$ and \cref{lem:mon-samp}, since $\error' \cdot \abs{\circuit}(1,\ldots, 1) = \error\cdot\frac{\rpoly(1,\ldots, 1)}{\abs{\circuit}(1,\ldots, 1)} \cdot \abs{\circuit}(1,\ldots, 1) = \error\cdot\rpoly(1,\ldots, 1)$.
which achieves the claimed error bound on $\mathcal{E}$ (\vari{acc}) trivially due to the assignment to $\error'$ and \cref{lem:mon-samp}, since $\error' \cdot \abs{\circuit}(1,\ldots, 1) = \error\cdot\frac{\rpoly(\prob_1,\ldots, \prob_\numvar)}{\abs{\circuit}(1,\ldots, 1)} \cdot \abs{\circuit}(1,\ldots, 1) = \error\cdot\rpoly(\prob_1,\ldots, \prob_\numvar)$.
The claim on the runtime follows from \Cref{lem:mon-samp} since
@ -88,7 +89,7 @@ Using Hoeffding's inequality, we then get:
\probOf\left(~\left| \empmean - \expct\pbox{\empmean} ~\right| \geq \error\right) \leq 2\exp{\left(-\frac{2\samplesize^2\error^2}{2^2 \samplesize}\right)} = 2\exp{\left(-\frac{\samplesize\error^2}{2 }\right)}\leq \conf,
\end{equation*}
where the last inequality dictates our choice of $\samplesize$ in \Cref{alg:mon-sam-global2}.
\AH{Why does the $\geq$ sign change to $>$?}
For the claimed probability bound of $\probOf\left(\left|\vari{acc} - \rpoly(\prob_1,\ldots, \prob_\numvar)\right|> \error \cdot \abs{\circuit}(1,\ldots, 1)\right) \leq \conf$, note that in the algorithm, \vari{acc} is exactly $\empmean \cdot \abs{\circuit}(1,\ldots, 1)$. Multiplying the rest of the terms by the additional factor $\abs{\circuit}(1,\ldots, 1)$ yields the said bound.
This concludes the proof for the first claim of theorem~\ref{lem:mon-samp}. Next, we prove the claim on the runtime.
@ -113,11 +114,13 @@ Applying this bound in the runtime bound in \Cref{lem:approx-alg} gives the firs
\subsection{Proof of~\Cref{lem:ctidb-gamma}}
\begin{proof}
The circuit \circuit' is built from \circuit in the following manner. For each input gate $\gate_i$ with $\gate_i.\val = X_\tup$, replace $\gate_i$ with the circuit \subcircuit encoding the sum $\sum_{j = 1}^\bound j\cdot X_{\tup, j}$. We argue that \circuit' is a valid circuit by the following facts. Let $\pdb = \inparen{\worlds, \bpd}$ be the original \abbrCTIDB \circuit was generated from. Then, by~\Cref{prop:ctidb-reduct} there exists a \abbrOneBIDB $\pdb' = \inparen{\onebidbworlds{\tupset'}, \bpd'}$, with $\tupset' = \inset{\intup{\tup, j}~|~\tup\in\tupset, j\in\pbox{\bound}}$, from which the conversion from \circuit to \circuit' follows. Both $\polyf\inparen{\circuit}$ and $\polyf\inparen{\circuit'}$ have the same expected multiplicity since (by~\Cref{prop:ctidb-reduct}) the distributions $\bpd$ and $\bpd'$ are equivalent and each $j\cdot\worldvec'_{\tup, j} = \worldvec_\tup$ for $\worldvec'\in\inset{0, 1}^{\bound\numvar}$ and $\worldvec\in\worlds$. Finally, note that because there exists a (sub) circuit encoding $\sum_{j = 1}^\bound j\cdot X_{\tup, j}$ that is a \emph{balanced} binary tree, the above conversion implies the claimed size and depth bounds of the lemma.
The circuit \circuit' is built from \circuit in the following manner. For each input gate $\gate_i$ with $\gate_i.\val = X_\tup$, replace $\gate_i$ with the circuit \subcircuit encoding the sum $\sum_{j = 1}^\bound j\cdot X_{\tup, j}$. We argue that \circuit' is a valid circuit by the following facts. Let $\pdb = \inparen{\worlds, \bpd}$ be the original \abbrCTIDB \circuit was generated from. Then, by~\Cref{prop:ctidb-reduct} there exists a \abbrOneBIDB $\pdb' = \inparen{\onebidbworlds{\tupset'}, \bpd'}$, with $\tupset' = \inset{\intuple{\tup, j}~|~\tup\in\tupset, j\in\pbox{\bound}}$, from which the conversion from \circuit to \circuit' follows. Both $\polyf\inparen{\circuit}$ and $\polyf\inparen{\circuit'}$ have the same expected multiplicity since (by~\Cref{prop:ctidb-reduct}) the distributions $\bpd$ and $\bpd'$ are equivalent and $\sum_{j=1}^\bound j\cdot\worldvec'_{\tup, j} = \worldvec_\tup$ for $\worldvec'\in\inset{0, 1}^{\bound\numvar}$ and $\worldvec\in\worlds$ such that $\worldvec_\tup\equiv\worldvec'_\tup$. Finally, note that because there exists a (sub) circuit encoding $\sum_{j = 1}^\bound j\cdot X_{\tup, j}$ that is a \emph{balanced} binary tree, the above conversion implies the claimed size and depth bounds of the lemma.
Next we argue the claim on $\gamma\inparen{\circuit'}$. Consider the list of expanded monomials $\expansion{\circuit}$ for \abbrCTIDB circuit \circuit. Let
$\encMon = X_{\tup_1}^{d_1},\ldots,X_{\tup_\ell}^{d_\ell}$ be an arbitrary monomial with $\ell$ variables. Then \monom yields the set of monomials $\vari{E}_\monom\inparen{\circuit'}=\inset{j_1^{d_1}\cdot X_{\tup, j_1}^{d_1}\times\cdots\times j_\ell^{d_\ell}\cdot X_{\tup, j_\ell}^{d_\ell}}_{j_1,\ldots, j_\ell \in \pbox{\bound}}$ in $\expansion{\circuit'}$. Recall that a cancellation occurs when we have a monomial \monom' such that there exists $\tup\neq\tup'$ in the same block $\block$ where variables $X_\tup, X_{\tup'}$ are in the set of variables $\encMon'$ of \monom'. Observe that cancellations can only occur for each $X_{\tup}^{d_\tup}\in \encMon$, where the expansion $\inparen{\sum_{j = 1}^\bound j\cdot X_{\tup, j}}^{d_\tup}$ represents the monomial $X_\tup^{d_\tup}$ in $\tupset'$. Consider the number of cancellations for $\inparen{\sum_{j = 1}^\bound j\cdot X_{\tup, j}}^{d_t}$. Then $\gamma \leq 1 - \bound^{d_\tup - 1}$, since
for each element in the set of cross products $\inset{\bigtimes_{i\in\pbox{d_\tup}, j_i\in\pbox{\bound}}X_{\tup, j_i}}$ there are \emph{exactly} $\bound$ surviving elements with $j_1=\cdots=j_{d_\tup}=j$, i.e. $X_{t,j}^{d_\tup}$ for each $j\in\pbox{\bound}$. The rest of the $\bound^{d_\tup}-c$ cross terms cancel. Regarding the whole monomial \monom', it is the case that the proportion of non-cancellations across each $X_\tup^{d_\tup}\in\encMon'$ multiply because non-cancelling terms for $X_\tup$ can only be joined with non-cancelling terms of $X_{\tup'}^{d_{\tup'}}\in\encMon'$ for $\tup\neq\tup'$. This then yields the fraction of cancelled monomials $\gamma\le 1 - \prod_{i = 1}^{\ell}\bound^{d_i - 1} \leq 1 - \bound^{-\inparen{k - 1}}$ where the inequalities take into account the fact that $\sum_{i = 1}^\ell d_i \leq k$.
$\encMon = X_{\tup_1}^{d_1}\cdots X_{\tup_\ell}^{d_\ell}$ be an arbitrary monomial with $\ell$ variables and let (abusing notation) $\encMon' = \inparen{\sum_{j = 1}^{\bound}j\cdot X_{\tup_1, j}}^{d_1}\cdots\inparen{\sum_{j = 1}^{\bound}j\cdot X_{\tup_\ell, j}}^{d_\ell}$. Then, for $f_\ell = \sum_{i = 1}^\ell d_i$, $\encMon$ induces the set of monomials $\inset{\prod_{i = 1}^{f_\ell} j_i\cdot X_{\tup_i, j_i}^{d_i}}_{j_i\in\pbox{\bound}}$ in the pure expansion of $\encMon'$.
%Denote the additional list elements (projecting out coefficient terms) \emph{induced} by $\monom$ as $\vari{E}_\monom\inparen{\circuit'}$. Then $\vari{E}_\monom\inparen{\circuit'}=\inset{\monom'^1~|~\encMon' \in \vari{S}}$%\inset{j_1^{d_1}\cdot X_{\tup, j_1}^{d_1}\times\cdots\times j_\ell^{d_\ell}\cdot X_{\tup, j_\ell}^{d_\ell}}_{j_1,\ldots, j_\ell \in \pbox{\bound}}$ in $\expansion{\circuit'}$.
Recall that a cancellation occurs in $\encMon'$ when there exists $\tup_{i, j}\neq\tup_{i, j'}$ in the same block $\block$ where variables $X_{\tup_i, j}, X_{\tup_i, j'}$ are in the set of variables $\monom_i'$ of $\monom_{\vari{m}_\vari{i}}\in\encMon'$. Observe that cancellations can only occur for each $X_{\tup}^{d_\tup}\in \encMon$, where the expansion $\inparen{\sum_{j = 1}^\bound j\cdot X_{\tup, j}}^{d_\tup}$ represents the monomial $X_\tup^{d_\tup}$ in $\tupset'$. Consider the number of cancellations for $\inparen{\sum_{j = 1}^\bound j\cdot X_{\tup, j}}^{d_t}$. Then $\gamma \leq 1 - \bound^{-\inparen{d_\tup - 1}}$, since
for each element in the set of cross products $\inset{\bigtimes_{i\in\pbox{d_\tup}, j_i\in\pbox{\bound}}X_{\tup, j_i}}$ there are \emph{exactly} $\bound$ surviving elements with $j_1=\cdots=j_{d_\tup}=j$, i.e. $X_{t,j}^{d_\tup}$ for each $j\in\pbox{\bound}$. The rest of the $\bound^{d_\tup}-c$ cross terms cancel. Regarding all of $\encMon'$, it is the case that the proportion of non-cancellations for each $\inparen{\sum_{j = 1}^{\bound}j\cdot X_{\tup_i, j }}^{d_i}\in\encMon'$ multiply because non-cancelling terms for $\inparen{\sum_{j = 1}^{\bound}j\cdot X_{\tup_i, j}}^{d_i}$ can only be joined with non-cancelling terms of $\inparen{\sum_{j=1}^{\bound}X_{\tup_{i'}, j}}^{d_{i'}}\in\encMon'$ for $\tup\neq\tup'$. This then yields the fraction of cancelled monomials $\gamma\le 1 - \prod_{i = 1}^{\ell}\bound^{-\inparen{d_i - 1}} \leq 1 - \bound^{-\inparen{k - 1}}$ where the inequalities take into account the fact that $f_\ell \leq k$.
Since this is true for arbitrary \monom, the bound follows for $\polyf\inparen{\circuit'}$.
\end{proof}

View File

@ -8,18 +8,18 @@ In the following definitions and examples, we use the following polynomial as an
\begin{Definition}[Pure Expansion]
The pure expansion of a polynomial $\poly$ is formed by computing all product of sums occurring in $\poly$, without combining like monomials. The pure expansion of $\poly$ generalizes \Cref{def:smb} by allowing monomials $m_i = m_j$ for $i \neq j$.
\end{Definition}
Note that similar in spirit to \Cref{def:reduced-poly-one-bidb}, $\expansion{\circuit}$ \Cref{def:expand-circuit} reduces all variable exponents $e > 1$ to $e = 1$. Further, it is true that $\expansion{\circuit}$ is the pure expansion of $\circuit$.
Note that similar in spirit to \Cref{def:reduced-poly-one-bidb}, $\expansion{\circuit}$ \Cref{def:expand-circuit} reduces all variable exponents $e > 1$ to $e = 1$. Further, it is true that $\expansion{\circuit}$ encodes the pure expansion of $\circuit$.
\begin{Example}[Example of Pure Expansion]\label{example:expr-tree-T}
Consider the factorized representation $(X+ 2Y)(2X - Y)$ of the polynomial in \Cref{eq:poly-eg}.
Its circuit $\circuit$ is illustrated in \Cref{fig:circuit}.
The pure expansion of the product is $2X^2 - XY + 4XY - 2Y^2$. As an additional example of \Cref{def:expand-circuit}, $\expansion{\circuit}=[(X, 2), (XY, -1), (XY, 4), (Y, -2)]$.
\end{Example}
$\expansion{\circuit}$ effectively\footnote{The minor difference here is that $\expansion{\circuit}$ encodes the \emph{reduced} form over the SOP pure expansion of the compressed representation, as opposed to the \abbrSMB representation} encodes the \emph{reduced} form of $\polyf\inparen{\circuit}$, decoupling each monomial into a set of variables $\monom$ and a real coefficient $\coef$.
$\expansion{\circuit}$ effectively\footnote{The minor difference here is that $\expansion{\circuit}$ encodes the \emph{reduced} form of the pure expansion of the compressed representation, as opposed to the \abbrSMB representation} encodes the \emph{reduced} form of $\polyf\inparen{\circuit}$, decoupling each monomial into a set of variables $\monom$ and a real coefficient $\coef$.
However, unlike the constraint on the input $\poly$ to compute $\rpoly$, the input circuit $\circuit$ does not need to be in \abbrSMB/SOP form.
\begin{Example}[Example for \Cref{def:positive-circuit}]\label{ex:def-pos-circ}
Using the same factorization from \Cref{example:expr-tree-T}, $\polyf(\abs{\circuit}) = (X + 2Y)(2X + Y) = 2X^2 +XY +4XY + 2Y^2 = 2X^2 + 5XY + 2Y^2$. Note that this \textit{is not} the same as the polynomial from \Cref{eq:poly-eg}. As an example of the slight abuse of notation we alluded to, $\polyf\inparen{\abs{\circuit}\inparen{1,\ldots, 1}} =2\inparen{1}^2 + 5\inparen{1}\inparen{1} + 2\inparen{1}^2 = 9$.
Using the same factorization from \Cref{example:expr-tree-T}, $\polyf(\abs{\circuit}) = (X + 2Y)(2X + Y) = 2X^2 +XY +4XY + 2Y^2 = 2X^2 + 5XY + 2Y^2$. Note that this \textit{is not} the same as the polynomial from \Cref{eq:poly-eg}. As an example of the slight abuse of notation we alluded to in~\Cref{sec:algo}, $\abs{\circuit}\inparen{1,\ldots, 1} =2\inparen{1}^2 + 5\inparen{1}\inparen{1} + 2\inparen{1}^2 = 9$.
\end{Example}
\begin{Definition}[Subcircuit]

View File

@ -23,7 +23,7 @@ $\expansion{\circuit} =
\expansion{\circuit_\linput} \circ \expansion{\circuit_\rinput} &\textbf{ if }\circuit.\type = \circplus\\
\left\{(\monom_\linput \cup \monom_\rinput, \coef_\linput \cdot \coef_\rinput) \right.\\\left.~|~(\monom_\linput, \coef_\linput) \in \expansion{\circuit_\linput}, (\monom_\rinput, \coef_\rinput) \in \expansion{\circuit_\rinput}\right\} &\textbf{ if }\circuit.\type = \circmult\\
\elist{(\emptyset, \circuit.\val)} &\textbf{ if }\circuit.\type = \tnum\\
\elist{(\{\circuit.\val\}, 1)} &\textbf{ if }\circuit.\type = \var.\\
\elist{(\circuit.\val, 1)} &\textbf{ if }\circuit.\type = \var.\\
\end{cases}
$
\end{Definition}

View File

@ -53,7 +53,7 @@ We slightly abuse notation here, denoting a world vector as $W$ rather than $\wo
\Cref{fig:lin-poly-bidb} shows the lineage construction of $\poly'\inparen{\vct{X}}$ given $\raPlus$ query $\query$ for arbitrary deterministic $\gentupset'$. Note that the semantics differ from~\Cref{fig:nxDBSemantics} only in the base case.
\begin{Proposition}[\abbrCTIDB reduction]\label{prop:ctidb-reduct}
Given \abbrCTIDB $\pdb = \inparen{\worlds, \bpd}$, let $\pdb' = \inparen{\onebidbworlds{\tupset'}, \bpd'}$ be the \emph{\abbrOneBIDB} obtained in the following manner: for each $\tup\in\tupset$, create block $\block_\tup = \inset{\intuple{\tup, j}_{j\in\pbox{\bound}}}$ of disjoint tuples, for all $j\in\pbox{\bound}$.
Given \abbrCTIDB $\pdb =$\newline $\inparen{\worlds, \bpd}$, let $\pdb' = \inparen{\onebidbworlds{\tupset'}, \bpd'}$ be the \emph{\abbrOneBIDB} obtained in the following manner: for each $\tup\in\tupset$, create block $\block_\tup = \inset{\intuple{\tup, j}_{j\in\pbox{\bound}}}$ of disjoint tuples, for all $j\in\pbox{\bound}$.
The probability distribution $\bpd'$ is the characterized by the vector $\vct{p} = \inparen{\inparen{\prob_{\tup, j}}_{\tup\in\tupset, j\in\pbox{\bound}}}$.
Then, the distributions $\mathcal{P}$ and $\mathcal{P}'$ are equivalent.
\end{Proposition}
@ -103,7 +103,7 @@ Given a polynomial $\poly'\inparen{\vct{X}}$ generated from a \abbrOneBIDB and l
Then given $\worldvec\in\inset{0,1}^{\tupset'}$ over the reduced \abbrOneBIDB of~\Cref{prop:ctidb-reduct}, the disjoint requirement and the semantics for constructing the lineage polynomial over a \abbrOneBIDB, $\poly'\inparen{\worldvec}$ is of the same structure as the reformulated polynomial $\refpoly{}\inparen{\worldvec}$ of step i) from~\Cref{def:reduced-poly}, which then implies that $\rpoly'$ is the reduced polynomial that results from step ii) of both~\Cref{def:reduced-poly} and~\Cref{def:reduced-poly-one-bidb}, and further that~\Cref{lem:tidb-reduce-poly} immediately follows for \abbrOneBIDB polynomials.
\begin{Lemma}\label{lem:bin-bidb-phi-eq-redphi}
Given any \emph{\abbrOneBIDB} $\pdb'$, $\raPlus$ query $\query$, and lineage polynomial
$\poly'\inparen{\vct{X}}=\poly'\pbox{\query,\tupset',\tup}\inparen{\vct{X}}$, it holds that $
$\poly'\inparen{\vct{X}}=\poly'\pbox{\query,\tupset',\tup}\inparen{\vct{X}}$, it holds that \newline$
\expct_{\vct{W} \sim \pdassign'}\pbox{\poly'\inparen{\vct{W}}} = \rpoly'\inparen{\probAllTup}.
$
\end{Lemma}

BIN
main.pdf

Binary file not shown.

Binary file not shown.

View File

@ -58,7 +58,7 @@ Note that the ciricuit \circuit representing $AX$ and the circuit \circuit' repr
\end{figure}
We next formally define the relationship of circuits with polynomials. While the definition assumes one sink for notational convenience, it easily generalizes to the multiple sinks case.
\begin{Definition}[$\polyf(\cdot)$]\label{def:poly-func}
$\polyf(\circuit)$ maps the sink of circuit $\circuit$ to its corresponding polynomial (in \abbrSMB). $\polyf(\cdot)$ is recursively defined on $\circuit$ as follows, with addition and multiplication following the standard interpretation for polynomials:
$\polyf(\circuit)$ maps the sink of circuit $\circuit$ to its corresponding polynomial in \abbrSMB. $\polyf(\cdot)$ is recursively defined on $\circuit$ as follows, with addition and multiplication following the standard interpretation for polynomials:
\begin{equation*}
\polyf(\circuit) = \begin{cases}
\polyf(\circuit_\lchild) + \polyf(\circuit_\rchild) &\text{ if \circuit.\type } = \circplus\\