Remove r.v. Y from SampleMonomial correctness proof.

Aaron Huber 2020-08-27 10:03:52 -04:00
parent 658bf5508d
commit 6966d95cb8


@@ -295,7 +295,7 @@ For every $(m,c)$ in \expandtree, $\sampmon(\etree)$ returns $m$ with probabilit
\begin{proof}[Proof of Lemma ~\ref{lem:sample}]
First, note that for any monomial sampled by algorithm ~\ref{alg:sample}, the nodes traversed form a subgraph of $\etree$ that is \textit{not} a subtree in the general case. We thus seek to prove that the subgraph traversed produces the correct probability corresponding to the monomial sampled.
Prove by structural induction on the depth $d$ of $\etree$. For the base case $d = 0$, by definition ~\ref{def:express-tree} we know that the root has to be either a coefficient or a variable. When the root is a variable $x$, we have the fact that $P(\randvar_i = x) = 1$, the algorithm correctly returns $(\{x\}, 1 )$, upholding correctness. When the root is a coefficient, \sampmon correctly returns $sign(c_i) \times 1$.
We prove this by structural induction on the depth $d$ of $\etree$. For the base case $d = 0$, by Definition~\ref{def:express-tree} the root must be either a coefficient or a variable. When the root is a variable $x$, the probability that \sampmon returns $x$ is $1$, and the algorithm correctly returns $(\{x\}, 1)$, upholding correctness. When the root is a coefficient $c$, \sampmon correctly returns $sign(c) \times 1$.
\AH{I don't know if I need to state why the latter statement (for the case of the root being a coefficient) is correct. I am not sure how to properly argue this either, whether it suffices to say that this follows by definition of our sampling scheme, or if there is a statistical claim, etc...}
%By definition of sampling scheme, this %For $|c_i| \leq 1$, $P(\randvar_i = c_i) = 1$, and correctness follows as the algorithm returns $sign(c_i) \times 1$. When $|c_i| \geq 2$, $P(|\randvar_i| = 1) = \frac{1}{|c_i|}$, and $sign(c_i) \times 1$ yields a properly weighted sampling for the case when $|c_i| \geq 2$.
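To make the base case concrete, here is a minimal Python sketch of the sampling scheme described above. It is not the paper's pseudocode: the function names (sample_monomial, abs_value_at_ones), the tuple encoding of the expression tree, and the use of $sign(c)$ at coefficient leaves are assumptions for illustration only. A variable leaf returns $(\{x\}, 1)$ and a coefficient leaf returns its sign, matching the base case; abs_value_at_ones plays the role of $|\etree|(1,\ldots,1)$ computed by \textsc{OnePass}.

import random

# Minimal sketch, not the paper's pseudocode: an expression tree is either a
# variable leaf (str), a coefficient leaf (int/float), or a tuple
# (op, left, right) with op in {'+', '*'}.

def abs_value_at_ones(node):
    # |T|(1,...,1): evaluate with every variable set to 1 and every
    # coefficient replaced by its absolute value (the OnePass-style weight).
    if isinstance(node, str):
        return 1
    if isinstance(node, (int, float)):
        return abs(node)
    op, left, right = node
    l, r = abs_value_at_ones(left), abs_value_at_ones(right)
    return l + r if op == '+' else l * r

def sample_monomial(node):
    # Returns (frozenset of variables, sign) for one sampled monomial.
    if isinstance(node, str):               # base case: variable x -> ({x}, 1)
        return frozenset([node]), 1
    if isinstance(node, (int, float)):      # base case: coefficient c -> (emptyset, sign(c))
        return frozenset(), 1 if node >= 0 else -1
    op, left, right = node
    if op == '*':                           # sample both subtrees and combine the results
        vars_l, sign_l = sample_monomial(left)
        vars_r, sign_r = sample_monomial(right)
        return vars_l | vars_r, sign_l * sign_r
    # '+': recurse into one subtree, chosen with probability |T_i|(1,...,1) / |T|(1,...,1)
    w_l, w_r = abs_value_at_ones(left), abs_value_at_ones(right)
    child = left if random.random() < w_l / (w_l + w_r) else right
    return sample_monomial(child)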
@@ -305,7 +305,7 @@ Prove now, that when $d = k + 1$ lemma ~\ref{lem:sample} holds. It is the case
Then the root has to be either a $+$ or $\times$ node.
Consider the case when the root is $\times$. Note that we are sampling a term from $\expandtree$. Consider the $i^{th}$ term in $\expandtree$, and call the sampled element $m$. Notice also that it is the case that $m = m_L \times m_R$, where $m_L$ is coming from $\etree_L$ and $m_R$ from $\etree_R$. Denote $\randvar_{\etree_\vari{L}}$ as the random variable sampling over $\etree_\vari{L}$. Further note that $P(\randvar_{\etree_{\vari{L}}} = m_L ) = \frac{|c_{m_L}|}{|\etree_L|(1,\ldots, 1)}$ and symmetrically for $m_R$. The final probability for the $i^{th}$ monomial sampled is then $P(\randvar_{\etree} = m) = \frac{|c_{m_L}| \cdot |c_{m_R}|}{|\etree_L|(1,\ldots, 1) \cdot |\etree_R|(1,\ldots, 1)}$. For the $i^{th}$ term in \expandtree, it is indeed the case that $|c_i| = |c_{m_L}| \cdot |c_{m_R}|$ and that $\abstree(1,\ldots, 1) = |\etree_L|(1,\ldots, 1) \cdot |\etree_R|(1,\ldots, 1)$, and therefore the $i^{th}$ monomial $m$ sampled is sampled with correct probability $\frac{|c_i|}{\abstree(1,\ldots, 1)}$.
Consider the case when the root is $\times$. Note that we are sampling a term from $\expandtree$. Consider $(m, c)$ in $\expandtree$, where $m$ is the sampled monomial. Note that $m = m_L \times m_R$, where $m_L$ comes from $\etree_L$ and $m_R$ from $\etree_R$. The probability that \sampmon$(\etree_{L})$ returns $m_L$ is $\frac{|c_{m_L}|}{|\etree_L|(1,\ldots, 1)}$, and symmetrically for $m_R$. The probability of sampling $m$ is then $\frac{|c_{m_L}| \cdot |c_{m_R}|}{|\etree_L|(1,\ldots, 1) \cdot |\etree_R|(1,\ldots, 1)}$. For $(m, c)$ in \expandtree, it is indeed the case that $|c| = |c_{m_L}| \cdot |c_{m_R}|$ and that $\abstree(1,\ldots, 1) = |\etree_L|(1,\ldots, 1) \cdot |\etree_R|(1,\ldots, 1)$, and therefore $m$ is sampled with the correct probability $\frac{|c|}{\abstree(1,\ldots, 1)}$.
%When it is the former, algorithm ~\ref{alg:sample} will sample from $\etree_L$ and $\etree_R$ according to their computed weights in algorithm ~\ref{alg:one-pass}, and by inductive hypothesis correctness is ensured.
%the call to $WeightedSample$ over both subtrees will return either of the two subtrees with probability proportional to the distribution computed by \textsc{OnePass}, which is precisely $P(T_L) = \frac{|c_L|}{|T_L|(1,\ldots, 1) + |T_R|(1,\ldots, 1)}$ and $P(T_R) = \frac{|c_R|}{|T_L|(1,\ldots, 1) + |T_R|(1,\ldots, 1)}$. By inductive hypothesis, we know that $|c_L|$ and $|c_R|$ are correct, and combined with the fact that $|T_L|(1,\ldots, 1) + |T_R|(1,\ldots, 1) = \abstree(1,\ldots, 1)$, since the algorithm makes a call to $WeightedSample$, this then proves the inductive step for the case when the root of $\etree$ is $+$.
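Using the same hypothetical sketch from above, a quick empirical check of the claimed sampling probability $\frac{|c|}{\abstree(1,\ldots, 1)}$ for both the $\times$ and $+$ cases: for the tree $(x + 2) \times (y - 3)$, which expands to $xy - 3x + 2y - 6$ with $\abstree(1,\ldots, 1) = 3 \cdot 4 = 12$, the observed frequencies should be close to $1/12$, $3/12$, $2/12$, and $6/12$. The example tree and trial count are illustrative choices, not from the paper.

from collections import Counter

tree = ('*', ('+', 'x', 2), ('+', 'y', -3))   # expands to xy - 3x + 2y - 6
trials = 12000
counts = Counter(sample_monomial(tree) for _ in range(trials))
# Expected: ({'x','y'}, +1) ~ 1/12, ({'x'}, -1) ~ 3/12, ({'y'}, +1) ~ 2/12, ((), -1) ~ 6/12
for (variables, sign), n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(sorted(variables), sign, round(n / trials, 3))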