Some changes to proof for Sample Monomial, probability bound for approximation algo.

master
Aaron Huber 2022-05-03 10:03:54 -04:00
parent ef0bc79ec8
commit 1cbcf4c927
6 changed files with 12 additions and 12 deletions

@ -25,7 +25,7 @@ The $\onepass$ function completes in time:
$$O\left(\size(\circuit) \cdot \multc{\log\left(\abs{\circuit}(1,\ldots, 1)\right)}{\log{\size(\circuit)}}\right)$$
$\onepass$ guarantees two post-conditions: First, for each subcircuit $\vari{S}$ of $\circuit$, we have that $\vari{S}.\vari{partial}$ is set to $\abs{\vari{S}}(1,\ldots, 1)$. Second, when $\vari{S}.\type = \circplus$, \subcircuit.\lwght $= \frac{\abs{\subcircuit_\linput}(1,\ldots, 1)}{\abs{\subcircuit}(1,\ldots, 1)}$ and likewise for \subcircuit.\rwght.
\end{Lemma}
To prove correctness of \Cref{alg:mon-sam}, we only use the following fact that follows from the above lemma: for the modified circuit ($\circuit_{\vari{mod}}$) output by \onepass, $\circuit_{\vari{mod}}.\vari{partial}=\abs{\circuit}(1,\dots,1)$.
To prove correctness of \Cref{alg:mon-sam}, we use the following fact that follows from the above lemma: for the modified circuit ($\circuit_{\vari{mod}}$) output by \onepass, $\circuit_{\vari{mod}}.\vari{partial}=\abs{\circuit}(1,\dots,1)$.
\AH{I don't think the word \emph{only} is needed.}
\begin{Lemma}\label{lem:sample}
@ -38,7 +38,7 @@ With the above two lemmas, we are ready to argue the following result:
\begin{Theorem}\label{lem:mon-samp}
For any $\circuit$ with
$\degree(\polyf(|\circuit|)) = k$, algorithm \ref{alg:mon-sam} outputs an estimate $\vari{acc}$ of $\rpoly(\prob_1,\ldots, \prob_\numvar)$ such that
\[\probOf\left(\left|\vari{acc} - \rpoly(\prob_1,\ldots, \prob_\numvar)\right|> \error \cdot \abs{\circuit}(1,\ldots, 1)\right) \leq \conf,\]
\[\probOf\left(\left|\vari{acc} - \rpoly(\prob_1,\ldots, \prob_\numvar)\right|\geq \error \cdot \abs{\circuit}(1,\ldots, 1)\right) \leq \conf,\]
in $O\left(\left(\size(\circuit)+\frac{\log{\frac{1}{\conf}}}{\error^2} \cdot k \cdot\log{k} \cdot \depth(\circuit)\right)\cdot \multc{\log\left(\abs{\circuit}(1,\ldots, 1)\right)}{\log{\size(\circuit)}}\right)$ time.
\end{Theorem}
@ -90,7 +90,7 @@ Using Hoeffding's inequality, we then get:
\end{equation*}
where the last inequality dictates our choice of $\samplesize$ in \Cref{alg:mon-sam-global2}.
\AH{Why does the $\geq$ sign change to $>$?}
For the claimed probability bound of $\probOf\left(\left|\vari{acc} - \rpoly(\prob_1,\ldots, \prob_\numvar)\right|> \error \cdot \abs{\circuit}(1,\ldots, 1)\right) \leq \conf$, note that in the algorithm, \vari{acc} is exactly $\empmean \cdot \abs{\circuit}(1,\ldots, 1)$. Multiplying the rest of the terms by the additional factor $\abs{\circuit}(1,\ldots, 1)$ yields the said bound.
For the claimed probability bound of $\probOf\left(\left|\vari{acc} - \rpoly(\prob_1,\ldots, \prob_\numvar)\right|\geq \error \cdot \abs{\circuit}(1,\ldots, 1)\right) \leq \conf$, note that in the algorithm, \vari{acc} is exactly $\empmean \cdot \abs{\circuit}(1,\ldots, 1)$. Multiplying the rest of the terms by the additional factor $\abs{\circuit}(1,\ldots, 1)$ yields the said bound.
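To spell out this scaling step, here is a short sketch. It assumes, as the surrounding argument suggests, that $\abs{\circuit}(1,\ldots, 1) > 0$ and that the Hoeffding bound above is stated for the deviation of $\empmean$ from $\mu = \frac{\rpoly(\prob_1,\ldots, \prob_\numvar)}{\abs{\circuit}(1,\ldots, 1)}$; the symbol $\mu$ is introduced only for this illustration. Since $\vari{acc} = \empmean \cdot \abs{\circuit}(1,\ldots, 1)$, multiplying both sides of the deviation event by $\abs{\circuit}(1,\ldots, 1)$ gives
\[\left|\empmean - \mu\right| \geq \error \iff \left|\vari{acc} - \rpoly(\prob_1,\ldots, \prob_\numvar)\right| \geq \error \cdot \abs{\circuit}(1,\ldots, 1),\]
so the two events coincide and the probability bound of $\conf$ carries over unchanged.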
This concludes the proof for the first claim of theorem~\ref{lem:mon-samp}. Next, we prove the claim on the runtime.

@ -16,21 +16,21 @@ The efficiency gains of circuits over trees is found in the capability of circui
\begin{proof}
We first need to show that $\sampmon$ samples a valid monomial $\encMon$ by sampling and returning a set of variables $\monom$, such that $(\monom, \coef)$ is in $\expansion{\circuit}$ and $\encMon$ is indeed a monomial of the $\rpoly\inparen{\vct{X}}$ encoded in \circuit. We show this via induction over the depth of \circuit.
For the base case, let the depth $d$ of $\circuit$ be $0$. We have that the single gate is either a constant $\coef$ for which by line~\ref{alg:sample-num-return} we return $\{~\}$, or we have that $\circuit.\type = \var$ and $\circuit.\val = x$, and by line~\ref{alg:sample-var-return} we return $\{x\}$. By \cref{def:expand-circuit}, both cases return a valid $\monom$ for some $(\monom, \coef)$ from $\expansion{\circuit}$, and the base case is proven.
For the base case, let the depth $d$ of $\circuit$ be $1$. We have that the single gate is either a constant $\coef$ for which by line~\ref{alg:sample-num-return} we return $\{~\}$, or we have that $\circuit.\type = \var$ and $\circuit.\val = x$, and by line~\ref{alg:sample-var-return} we return $\{x\}$. By \cref{def:expand-circuit}, both cases return a valid $\monom$ for some $(\monom, \coef)$ from $\expansion{\circuit}$, and the base case is proven.
\AH{I think it is slightly confusing to say that depth $= 0$ in view of the definition of depth in S.4. To say $k = 0$ is also strange, since, for a single join, we have that $k = 2$.}
For the inductive hypothesis, assume that for $d \leq k$ for some $k \geq 0$, that it is indeed the case that $\sampmon$ returns a valid monomial.
For the inductive hypothesis, assume that for $d \leq k$, for some $k \geq 1$, $\sampmon$ indeed returns a valid monomial.
For the inductive step, let us take a circuit $\circuit$ with $d = k + 1$. Note that each input has depth $d - 1 \leq k$, and by inductive hypothesis both of them sample a valid monomial. Then the sink can be either a $\circplus$ or $\circmult$ gate. For the case when $\circuit.\type = \circplus$, line~\ref{alg:sample-plus-bsamp} of $\sampmon$ will choose one of the inputs of the sink. By inductive hypothesis, some valid monomial is randomly sampled from each of the inputs. It follows that when $\circuit.\type = \circplus$, a valid monomial is sampled by $\sampmon$. When $\circuit.\type = \circmult$, line~\ref{alg:sample-times-union} computes the set union of the monomials returned by the two inputs of the sink, and it follows from \cref{def:expand-circuit} that $\encMon$ is a valid monomial encoded by some $(\monom, \coef)$ of $\expansion{\circuit}$.
We will next prove by induction on the depth $d$ of $\circuit$ that for $(\monom,\coef) \in \expansion{\circuit}$, $\monom$ is sampled with a probability $\frac{|\coef|}{\abs{\circuit}\polyinput{1}{1}}$.
For the base case $d = 0$, by definition~\ref{def:circuit} we know that the $\size\inparen{\circuit} = 1$ and \circuit.\type$=$ \tnum or \var. For either case, the probability of the value returned is $1$ since there is only one value to sample from. When \circuit.\val $= x$, the algorithm always return the variable set $\{x\}$. When $\circuit.\type = \tnum$, \sampmon will always return $\emptyset$.
For the base case $d = 1$, by definition~\ref{def:circuit} we know that $\size\inparen{\circuit} = 1$ and \circuit.\type$=$ \tnum or \var. For either case, the probability of the value returned is $1$ since there is only one value to sample from. When \circuit.\val $= x$, the algorithm always returns the variable set $\{x\}$. When $\circuit.\type = \tnum$, \sampmon will always return the variable set $\emptyset$.
\AH{I don't think this is technically right, since \sampmon returns a tuple of two values.}
For the inductive hypothesis, assume that for $d \leq k$ and $k \geq 0$ $\sampmon$ indeed returns $\monom$ in $(\monom, \coef)$ of $\expansion{\circuit}$ with probability $\frac{|\coef|}{\abs{\circuit}\polyinput{1}{1}}$.
For the inductive hypothesis, assume that for $d \leq k$ and $k \geq 1$, $\sampmon$ indeed returns $\monom$ in $(\monom, \coef)$ of $\expansion{\circuit}$ with probability $\frac{|\coef|}{\abs{\circuit}\polyinput{1}{1}}$.
We prove now for $d = k + 1$ the inductive step holds. It is the case that the sink of $\circuit$ has two inputs $\circuit_\linput$ and $\circuit_\rinput$. Since $\circuit_\linput$ and $\circuit_\rinput$ are both depth $d - 1 \leq k$, by inductive hypothesis, $\sampmon$ will return $\monom_\linput$ in $(\monom_\lchild, \coef_\lchild)$ of $\expansion{\circuit_\linput}$ and $\monom_\rinput$ in $(\monom_\rchild, \coef_\rchild)$ of $\expansion{\circuit_\rinput}$, from $\circuit_\linput$ and $\circuit_\rinput$ with probability $\frac{|\coef_\lchild|}{\abs{\circuit_\linput}\polyinput{1}{1}}$ and $\frac{|\coef_\rchild|}{\abs{\circuit_\rinput}\polyinput{1}{1}}$.
@ -49,7 +49,7 @@ Lastly, we show by simple induction of the depth $d$ of \circuit that \sampmon i
In the base case, $\circuit.\type = \tnum$ or $\var$. For the former by~\Cref{alg:sample-num-leaf}, \sampmon correctly returns the sign value of the gate. For the latter by~\Cref{alg:sample-var-return}, \sampmon returns the correct sign of $1$, since a variable is a neutral element, and $1$ is the multiplicative identity, whose product with another sign element will not change that sign element.
For the inductive hypothesis, we assume for a circuit of depth $d \leq k$ and $k \geq 0$ that the algorithm correctly returns the sign value of $\coef$.
For the inductive hypothesis, we assume for a circuit of depth $d \leq k$ and $k \geq 1$ that the algorithm correctly returns the sign value of $\coef$.
Similar to before, for a depth \AH{Why do we use $d = k + 1$ for the inductive cases above?}
$d \leq k + 1$, it is true that $\circuit_\linput$ and $\circuit_\rinput$ both return the correct sign of $\coef$. For the case that $\circuit.\type = \circmult$, the sign values of both inputs are multiplied, which is the correct behavior by \cref{def:expand-circuit}. When $\circuit.\type = \circplus$, only one input of $\circuit$ is sampled, and the algorithm returns the correct sign value of $\coef$ by inductive hypothesis.
@ -59,7 +59,7 @@ $d \leq k + 1$, it is true that $\circuit_\linput$ and $\circuit_\rinput$ both r
It is easy to check that except for lines~\ref{alg:sample-plus-bsamp} and~\ref{alg:sample-times-union}, all lines take $O(1)$ time. Consider an execution of \cref{alg:sample-times-union}. We note that we will be adding a given set of variables to some set at most once: since the sum of the sizes of the sets at a given level is at most $\degree(\circuit)$, each gate visited takes $O(\log{\degree(\circuit)})$ time. For \Cref{alg:sample-plus-bsamp}, note that we pick $\circuit_\linput$ with probability $\frac a{a+b}$ where $a=\circuit.\vari{Lweight}$ and $b=\circuit.\vari{Rweight}$. We can implement this step by picking a random number $r\in[a+b]$ and then checking if $r\le a$. It is easy to check that $a+b\le \abs{\circuit}(1,\dots,1)$. This means we need to add and compare $\log{\abs{\circuit}(1,\ldots, 1)}$-bit numbers, which can certainly be done in time $\multc{\log\left(\abs{\circuit}(1,\ldots, 1)\right)}{\log{\size(\circuit)}}$ (note that this is an over-estimate).
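As a quick check of the sampling step for \Cref{alg:sample-plus-bsamp}, here is a sketch under the assumption that $a$ and $b$ are positive integers and that $r$ is drawn uniformly at random from $[a+b] = \{1,\ldots,a+b\}$. Exactly $a$ of the $a+b$ equally likely values of $r$ satisfy $r \le a$, so
\[\probOf\left(r \le a\right) = \frac{a}{a+b},\]
which matches the probability with which $\circuit_\linput$ is to be picked.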
Let \cost(\circuit) (\Cref{eq:cost-sampmon}) denote an upper bound on the number of gates visited by \sampmon. Then the runtime is $O\left(\cost(\circuit)\cdot \log{\degree(\circuit)}\cdot \multc{\log\left(\abs{\circuit}(1,\ldots, 1)\right)}{\log{\size(\circuit)}}\right)$.
We now bound the number of recursive calls in $\sampmon$ by $O\left((\degree(\circuit) + 1)\right.$$\left.\cdot\right.$ $\left.\depth(\circuit)\right)$, which by the above will prove the claimed runtime.
We now bound the number of recursive calls in $\sampmon$ by $O\left((\degree(\circuit) + 1)\right.$$\left.\cdot\right.$ $\left.\depth(\circuit)\right)$, which by the above will prove the claimed runtime. The reason for this is that the number of recursive calls is exactly the number of calls to lines~\ref{alg:sample-plus-bsamp} and~\ref{alg:sample-times-union}.
Let \cost$(\cdot)$ be a function that models an upper bound on the number of gates that can be visited in the run of \sampmon. We define \cost$(\cdot)$ recursively as follows.

@ -16,8 +16,8 @@
\BOOKMARK [2][-]{subsection.4.2}{\376\377\0004\000.\0002\000\040\000O\000u\000r\000\040\000m\000a\000i\000n\000\040\000r\000e\000s\000u\000l\000t}{section.4}% 16
\BOOKMARK [1][-]{section.5}{\376\377\0005\000\040\000R\000e\000l\000a\000t\000e\000d\000\040\000W\000o\000r\000k}{}% 17
\BOOKMARK [1][-]{section.6}{\376\377\0006\000\040\000C\000o\000n\000c\000l\000u\000s\000i\000o\000n\000s\000\040\000a\000n\000d\000\040\000F\000u\000t\000u\000r\000e\000\040\000W\000o\000r\000k}{}% 18
\BOOKMARK [1][-]{section*.12}{\376\377\000A\000c\000k\000n\000o\000w\000l\000e\000d\000g\000m\000e\000n\000t\000s}{}% 19
\BOOKMARK [1][-]{section*.14}{\376\377\000R\000e\000f\000e\000r\000e\000n\000c\000e\000s}{}% 20
\BOOKMARK [1][-]{section*.11}{\376\377\000A\000c\000k\000n\000o\000w\000l\000e\000d\000g\000m\000e\000n\000t\000s}{}% 19
\BOOKMARK [1][-]{section*.13}{\376\377\000R\000e\000f\000e\000r\000e\000n\000c\000e\000s}{}% 20
\BOOKMARK [1][-]{appendix.A}{\376\377\000A\000\040\000M\000i\000s\000s\000i\000n\000g\000\040\000d\000e\000t\000a\000i\000l\000s\000\040\000f\000r\000o\000m\000\040\000S\000e\000c\000t\000i\000o\000n\000\040\0002}{}% 21
\BOOKMARK [2][-]{subsection.A.1}{\376\377\000A\000.\0001\000\040\000B\000a\000c\000k\000g\000r\000o\000u\000n\000d\000\040\000d\000e\000t\000a\000i\000l\000s\000\040\000f\000o\000r\000\040\000p\000r\000o\000o\000f\000\040\000o\000f\000\040\000p\000r\000o\000p\000:\000e\000x\000p\000e\000c\000t\000i\000o\000n\000-\000o\000f\000-\000p\000o\000l\000y\000n\000o\000m}{appendix.A}% 22
\BOOKMARK [2][-]{subsection.A.2}{\376\377\000A\000.\0002\000\040\000P\000r\000o\000o\000f\000\040\000o\000f\000\040\000p\000r\000o\000p\000:\000c\000t\000i\000d\000b\000-\000r\000e\000d\000u\000c\000t}{appendix.A}% 23

BIN
main.pdf

Binary file not shown.

Binary file not shown.

@ -46,7 +46,7 @@
\usepackage[normalem]{ulem}
\usepackage{subcaption}
\usepackage{booktabs}
\usepackage{todonotes}
\usepackage[disable]{todonotes}
\usepackage{graphicx}
\usepackage{listings}
%%%%%%%%%% SQL + proveannce listing settings