Started restructuring lemma 13 proof

2020-08-22 15:47:56 -04:00 · 2020-08-22 15:47:56 -04:00 · 455b48e9ab
parent 856be5ddff
commit 455b48e9ab
1 changed files with 27 additions and 9 deletions
--- a/approx_alg.tex
+++ b/approx_alg.tex
@@ -46,7 +46,7 @@ Denote $poly(\polytree)$ to be the function that takes as input expression tree
 \begin{Definition}[Expression Tree Set]\label{def:express-tree-set}$\expresstree{\smb}$ is the set of all possible expression trees each of whose corresponding polynomial in the standard monomial basis is $\smb$.  
 \end{Definition}

-Note that \cref{def:express-tree-set} implies that $\polytree \subseteq \expresstree{\smb}$.
+Note that \cref{def:express-tree-set} implies that $\polytree \in \expresstree{\smb}$.

 \begin{Definition}[Expanded T]\label{def:expand-tree}
 $\expandtree$ is the pure SOP expansion of $\polytree$, where non-distinct monomials are not combined.
@@ -60,6 +60,12 @@ Let $\abstree$ denote the resulting expression tree when each coefficient $c_i$

 Using the same polynomial from the above example, $poly(\abstree) = (x + 2y)(2x + y) = 2x^2 +xy +4xy + 2y^2 = 2x^2 + 5xy + 2y^2$.

+In what follows we lay the groundwork for proving the theorem below.
+
+\begin{Theorem}\label{lem:approx-alg}
+For any query polynomial $\poly(\vct{X})$, an approximation of $\rpoly(\prob_1,\ldots, \prob_n)$ can be computed in $O\left(|\poly|\cdot k \frac{\log\frac{1}{\conf}}{\error^2}\right)$ time, within $1 \pm \error$ multiplicative error with probability $\geq 1 - \conf$, where $k$ denotes the product width of $\poly$.
+\end{Theorem}
+
 \subsection{Approximating $\rpoly$}

 \subsubsection{Description}
@@ -91,13 +97,9 @@ Algorithm ~\ref{alg:mon-sam} approximates $\rpoly$ by employing some auxiliary m
 \end{algorithm}

 \subsubsection{Correctness}
-\AH{Unsure where to place the theorem.  Maybe above.  Need to write the proof for the theorem as well.}
-\begin{Theorem}\label{lem:approx-alg}
-For any query polynomial $\poly(\vct{X})$, an approximation of $\rpoly(\prob_1,\ldots, \prob_n)$ can be computed in $O\left(|\poly|\cdot k \frac{\log\frac{1}{\conf}}{\error^2}\right)$, within $1 \pm \error$ multiplicative error with probability $\geq 1 - \conf$, where $k$ denotes the product width of $\poly$.
-\end{Theorem}

 \begin{Lemma}\label{lem:mon-samp}
-Algorithm \ref{alg:mon-sam} computes $O\left(\frac{\log\frac{1}{\conf}}{\error^2}\right)$ samples, outputting an estimate of $\rpoly(\prob,\ldots, \prob)$ within a multiplicative $1 \pm \error$ error with probability $1 - \conf$.
+Algorithm \ref{alg:mon-sam} outputs an estimate of $\rpoly(\prob,\ldots, \prob)$ within an additive error of $\error\cdot\abstree(1,\ldots, 1)$ with probability at least $1 - \conf$, in $O\left(\frac{\log{\frac{1}{\conf}}}{\error^2} \cdot \text{need to finish}\right)$ time.
 \end{Lemma}

 %Before the proof, a brief summary of the sample scheme is necessary.  Regardless of the $\polytree$, note that when one samples with a weighted distribution corresponding to the coefficients in $poly(\expandtree)$, it is the same as uniformly sampling over all individual terms of the equivalent polynomial whose terms have coefficients in the set $\{-1, 1\}$, i.e. collapsed monomials are decoupled.  Following this reasoning, algorithm~\ref{alg:one-pass} computes such a weighted distribution and algorithm~\ref{alg:sample} produces samples accordingly.  As a result, from here on, we can consider our sampling scheme to be uniform.
@@ -105,14 +107,19 @@ Algorithm \ref{alg:mon-sam} computes $O\left(\frac{\log\frac{1}{\conf}}{\error^2
 %each of the $k$ product terms is sampled from individually, where the final output sample is sampled with a probability that is proportional to its coefficient in $\expandtree$.  Note, that   This is performed by \cref{alg:sample} and its correctness will be argued momentarily.  For now it suffices to note that the sampling scheme samples from each of the $k$ products in a POS using a weighted distribution equivalent to sampling uniformly over all monomials.

 \begin{proof}[Proof of Lemma \ref{lem:mon-samp}]
-The first part of the claim in lemma ~\ref{lem:mon-samp} is trivial, as evidenced in the number of iterations in the for loop.  

-Next, consider $\expandtree$ and let $c_i$ be the coefficient of the $i^{th}$ monomial and $\distinctvars_i$ be the number of distinct variables appearing in the $i^{th}$ monomial.  As will be seen, the sampling scheme samples each term $t$ in $\expandtree$ with probability $\frac{|c_i|}{\abstree(1,\ldots, 1)}$.  Now consider $\rpoly$ and note that $\coeffitem{i}$ is the value of the $i^{th}$ monomial term in $\rpoly(\prob_1,\ldots, \prob_n)$.  Let $m$ be the number of terms in $\expandtree$ and $\coeffset$ to be the set $\{c_1,\ldots, c_m\}.$  
+Consider $\expandtree$ and let $c_i$ be the coefficient of the $i^{th}$ monomial and $\distinctvars_i$ be the number of distinct variables appearing in the $i^{th}$ monomial.  As will be seen, the sampling scheme samples the $i^{th}$ term of $\expandtree$ with probability $\frac{|c_i|}{\abstree(1,\ldots, 1)}$.  Call this sampling scheme $\mathcal{S}$.  Now consider $\rpoly$ and note that $\coeffitem{i}$ is the value of the $i^{th}$ monomial term in $\rpoly(\prob_1,\ldots, \prob_n)$.  Let $m$ be the number of terms in $\expandtree$ and let $\coeffset$ be the set $\{c_1,\ldots, c_m\}.$  

 Consider now a set of $\samplesize$ random variables $\vct{\randvar}$, where each $\randvar_i$ is distributed as described above.  Then for each random variable $\randvar_i$, it is the case that $\expct\pbox{\randvar_i} = \sum_{j = 1}^{\setsize}\frac{c'_j \cdot \prob^{\distinctvars_j}}{\sum_{l = 1}^{\setsize}|c_l|} = \frac{\rpoly(\prob,\ldots, \prob)}{\abstree(1,\ldots, 1)}$.  Let $\hoeffest = \frac{1}{\samplesize}\sum_{i = 1}^{\samplesize}\randvar_i$.  It is also true that 

 \[\expct\pbox{\hoeffest} = \expct\pbox{ \frac{1}{\samplesize}\sum_{i = 1}^{\samplesize}\randvar_i} = \frac{1}{\samplesize}\sum_{i = 1}^{\samplesize}\expct\pbox{\randvar_i} = \frac{1}{\samplesize}\sum_{i = 1}^{\samplesize}\sum_{j = 1}^{\setsize}\frac{c'_j \cdot \prob^{\distinctvars_j}}{\sum_{l = 1}^{\setsize}|c_l|} = \frac{\rpoly(\prob,\ldots, \prob)}{\abstree(1,\ldots, 1)}.\]

+\begin{Lemma}\label{lem:hoeff-est}
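+Given $\samplesize$ random variables $\vct{\randvar}$ with distribution $\mathcal{S}$ over expression tree $\polytree$, the estimate $\hoeffest$ is within an additive error $\error'$ of its expectation with probability at least $1 - \conf$, provided $\samplesize$ is sufficiently large.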
+\end{Lemma}
+
+\begin{proof}[Proof of Lemma \ref{lem:hoeff-est}]
+
 Given the range $[-1, 1]$ for every $\randvar_i$ in $\vct{\randvar}$, by Hoeffding's inequality it is the case that $P\pbox{~\left| \hoeffest - \expct\pbox{\hoeffest} ~\right| \geq \error} \leq 2\exp\left(-\frac{2\samplesize^2\error^2}{2^2 \samplesize}\right)$, which is at most $\conf$ for a sufficiently large number of samples.

 Solving for the number of samples $\samplesize$ we get
@@ -127,7 +134,18 @@ Solving for the number of samples $\samplesize$ we get

 Equation \cref{eq:hoeff-1} results from computing the sum in the denominator of the exponential.  Equation \cref{eq:hoeff-2} is the result of dividing both sides by $2$.  Equation \cref{eq:hoeff-3} follows from taking the reciprocal of both sides, and noting that such an operation flips the inequality sign.  We then derive \cref{eq:hoeff-4} by taking the base $e$ log of both sides, and \cref{eq:hoeff-5} results from reducing common factors.  We arrive at the final result of \cref{eq:hoeff-6} by simply multiplying both sides by the reciprocal of the RHS fraction without the $\samplesize$ factor.

-By Hoeffding, then algorithm ~\ref{alg:mon-sam} takes the correct number of samples to obtain the desired confidence bounds.  Note that Hoeffding is assuming the sum of random variables be divided by the number of variables.  Also see that to properly estimate $\rpoly$, it is necessary to multiply by the number of monomials in $\rpoly$, i.e. $\abstree(1,\ldots, 1)$.  Therefore it is the case that $\frac{acc}{N}$ gives the estimate of one monomial, and multiplying by $\abstree(1,\ldots, 1)$ yields the estimate of $\rpoly(\prob,\ldots, \prob)$.  This concludes the proof of lemma ~\ref{lem:mon-samp}.
+By Hoeffding's inequality we obtain the number of samples necessary to achieve the claimed confidence bounds.  
+\end{proof}
+\qed
+\begin{Corollary}\label{cor:adj-err}
+There is a value of $\error'$ for which the additive bound yields a $1 \pm \error$ multiplicative error bound.
+\end{Corollary}
+\begin{proof}[Proof of Corollary \ref{cor:adj-err}]
+Since we have an additive error of $\error' \cdot \abstree(1,\ldots, 1)$, one can set $\error' = \error \cdot \frac{\rpoly(\prob,\ldots, \prob)}{\abstree(1,\ldots, 1)}$, so that the additive error equals $\error \cdot \rpoly(\prob,\ldots, \prob)$.
+\end{proof}
+\qed
+
+Note that Hoeffding's inequality bounds the sample mean, i.e. the sum of the random variables divided by the number of variables.  Since $\expct\pbox{\hoeffest} = \frac{\rpoly(\prob,\ldots, \prob)}{\abstree(1,\ldots, 1)}$, estimating $\rpoly$ requires multiplying the sample mean by the normalization factor $\abstree(1,\ldots, 1)$.  Therefore $\frac{acc}{N}$ gives the normalized estimate, and multiplying by $\abstree(1,\ldots, 1)$ yields the estimate of $\rpoly(\prob,\ldots, \prob)$.  This concludes the proof of Lemma~\ref{lem:mon-samp}.

 \end{proof}
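
Note (not part of the commit): the Hoeffding step in the proof of Lemma \ref{lem:hoeff-est} can be worked through as below. This is only a sketch under the assumptions stated in the diff (each $\randvar_i$ lies in $[-1, 1]$, so the range has width $2$); it reuses the file's macros \samplesize, \error and \conf, and it is not claimed to match the elided equations eq:hoeff-1 through eq:hoeff-6 line for line.

```latex
% Sketch only: solving the Hoeffding bound for the number of samples.
% Assumes each \randvar_i takes values in [-1, 1] (range width 2).
\begin{align*}
2\exp\left(-\frac{2\samplesize^2\error^2}{2^2\samplesize}\right) &\leq \conf \\
\exp\left(-\frac{\samplesize\error^2}{2}\right) &\leq \frac{\conf}{2} \\
\frac{\samplesize\error^2}{2} &\geq \ln\frac{2}{\conf} \\
\samplesize &\geq \frac{2\ln\frac{2}{\conf}}{\error^2}
\end{align*}
% Illustrative numbers (chosen here, not taken from the source):
% \error = 0.1 and \conf = 0.05 give
% \samplesize \geq 2\ln(40)/0.01 \approx 737.8, i.e. 738 samples.
```

The resulting bound is $O\left(\frac{\log\frac{1}{\conf}}{\error^2}\right)$, matching the sample count stated in the earlier version of Lemma \ref{lem:mon-samp}.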