Remove r.v. Y from SampleMonomial correctness proof.

Aaron Huber 2020-08-27 10:03:52 -04:00
parent 658bf5508d
commit 6966d95cb8

@@ -295,7 +295,7 @@ For every $(m,c)$ in \expandtree, $\sampmon(\etree)$ returns $m$ with probabilit
\begin{proof}[Proof of Lemma ~\ref{lem:sample}]
First, note that for any monomial sampled by algorithm ~\ref{alg:sample}, the nodes traversed form a subgraph of $\etree$ that is \textit{not} a subtree in the general case. We thus seek to prove that the subgraph traversed produces the correct probability corresponding to the monomial sampled.
- Prove by structural induction on the depth $d$ of $\etree$. For the base case $d = 0$, by definition ~\ref{def:express-tree} we know that the root has to be either a coefficient or a variable. When the root is a variable $x$, we have the fact that $P(\randvar_i = x) = 1$, the algorithm correctly returns $(\{x\}, 1 )$, upholding correctness. When the root is a coefficient, \sampmon correctly returns $sign(c_i) \times 1$.
+ Prove by structural induction on the depth $d$ of $\etree$. For the base case $d = 0$, by definition ~\ref{def:express-tree} we know that the root has to be either a coefficient or a variable. When the root is a variable $x$, we have the fact that the probability \sampmon returns $x$ is $1$, the algorithm correctly returns $(\{x\}, 1 )$, upholding correctness. When the root is a coefficient, \sampmon correctly returns $sign(c_i) \times 1$.
\AH{I don't know if I need to state why the latter statement (for the case of the root being a coefficient )is correct. I am not sure how to properly argue this either, whether is suffices to say that this follows by definition of our sampling scheme--or if there is a statistical claime, etc...}
%By definition of sampling scheme, this %For $|c_i| \leq 1$, $P(\randvar_i = c_i) = 1$, and correctness follows as the algorithm returns $sign(c_i) \times 1$. When $|c_i| \geq 2$, $P(|\randvar_i| = 1) = \frac{1}{|c_i|}$, and $sign(c_i) \times 1$ yields a properly weighted sampling for the case when $|c_i| \geq 2$.
@@ -305,7 +305,7 @@ Prove now, that when $d = k + 1$ lemma ~\ref{lem:sample} holds. It is the case
Then the root has to be either a $+$ or $\times$ node.
- Consider the case when the root is $\times$. Note that we are sampling a term from $\expandtree$. Consider the $i^{th}$ term in $\expandtree$, and call the sampled element $m$. Notice also that it is the case that $m = m_L \times m_R$, where $m_L$ is coming from $\etree_L$ and $m_R$ from $\etree_R$. Denote $\randvar_{\etree_\vari{L}}$ as the random variable sampling over $\etree_\vari{L}$. Further note that $P(\randvar_{\etree_{\vari{L}}} = m_L ) = \frac{|c_{m_L}|}{|\etree_L|(1,\ldots, 1)}$ and symmetrically for $m_R$. The final probability for the $i^{th}$ monomial sampled is then $P(\randvar_{\etree} = m) = \frac{|c_{m_L}| \cdot |c_{m_R}|}{|\etree_L|(1,\ldots, 1) \cdot |\etree_R|(1,\ldots, 1)}$. For the $i^{th}$ term in \expandtree, it is indeed the case that $|c_i| = |c_{m_L}| \cdot |c_{m_R}|$ and that $\abstree(1,\ldots, 1) = |\etree_L|(1,\ldots, 1) \cdot |\etree_R|(1,\ldots, 1)$, and therefore the $i^{th}$ monomial $m$ sampled is sampled with correct probability $\frac{|c_i|}{\abstree(1,\ldots, 1)}$.
+ Consider the case when the root is $\times$. Note that we are sampling a term from $\expandtree$. Consider $(m, c)$ in $\expandtree$, where $m$ is the sampled monomial. Notice also that it is the case that $m = m_L \times m_R$, where $m_L$ is coming from $\etree_L$ and $m_R$ from $\etree_R$. The probability that \sampmon$(\etree_{L})$ returns $m_L$ is $\frac{|c_{m_L}|}{|\etree_L|(1,\ldots, 1)}$ and symmetrically for $m_R$. The final probability for sample $m$ is then $\frac{|c_{m_L}| \cdot |c_{m_R}|}{|\etree_L|(1,\ldots, 1) \cdot |\etree_R|(1,\ldots, 1)}$. For $(m, c)$ in \expandtree, it is indeed the case that $|c_i| = |c_{m_L}| \cdot |c_{m_R}|$ and that $\abstree(1,\ldots, 1) = |\etree_L|(1,\ldots, 1) \cdot |\etree_R|(1,\ldots, 1)$, and therefore $m$ is sampled with correct probability $\frac{|c_i|}{\abstree(1,\ldots, 1)}$.
%When it is the former, algorithm ~\ref{alg:sample} will sample from $\etree_L$ and $\etree_R$ according to their computed weights in algorithm ~\ref{alg:one-pass}, and by inductive hypothesis correctness is ensured.
%the call to $WeightedSample$ over both subtrees will return either of the two subtrees with probability proportional to the distribution computed by \textsc{OnePass}, which is precisely $P(T_L) = \frac{|c_L|}{|T_L|(1,\ldots, 1) + |T_R|(1,\ldots, 1)}$ and $P(T_R) = \frac{|c_R|}{|T_L|(1,\ldots, 1) + |T_R|(1,\ldots, 1)}$. By inductive hypothesis, we know that $|c_L|$ and $|c_R|$ are correct, and combined with the fact that $|T_L|(1,\ldots, 1) + |T_R|(1,\ldots, 1) = \abstree(1,\ldots, 1)$, since the algorithm makes a call to $WeightedSample$, this then proves the inductive step for the case when the root of $\etree$ is $+$.
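% Illustrative worked example for the $\times$ case. The concrete expression is a hypothetical one chosen for this sketch and is not taken from the paper; it also assumes the two recursive calls to \sampmon are independent.
As a small sanity check of the $\times$ case, suppose (hypothetically) that $\etree_L$ expands to $\{(x, 1), (y, 2)\}$ and $\etree_R$ expands to $\{(z, 3)\}$, so that $\expandtree$ contains $(xz, 3)$ and $(yz, 6)$. Then
\[
|\etree_L|(1,\ldots, 1) = 1 + 2 = 3, \qquad |\etree_R|(1,\ldots, 1) = 3, \qquad \abstree(1,\ldots, 1) = 3 \cdot 3 = 9.
\]
Under the argument above, \sampmon$(\etree_{L})$ returns $y$ with probability $\frac{2}{3}$ and \sampmon$(\etree_{R})$ returns $z$ with probability $\frac{3}{3}$, so the monomial $yz$ would be returned with probability
\[
\frac{2 \cdot 3}{3 \cdot 3} = \frac{6}{9},
\]
which matches $\frac{|c|}{\abstree(1,\ldots, 1)}$ for $(yz, 6)$. Likewise $xz$ would be returned with probability $\frac{1 \cdot 3}{3 \cdot 3} = \frac{3}{9}$, and the two probabilities sum to $1$.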