Finished argument for run-time analysis of SampleMonomial.

This commit is contained in:
Aaron Huber 2020-08-31 11:33:15 -04:00
parent a911fdc809
commit d154da5a17
2 changed files with 8 additions and 17 deletions


@ -297,7 +297,6 @@ First, note that for any monomial sampled by algorithm ~\ref{alg:sample}, the no
We prove the lemma by structural induction on the depth $d$ of $\etree$. For the base case $d = 0$, by definition~\ref{def:express-tree} the root must be either a coefficient or a variable. When the root is a variable $x$, \sampmon returns $x$ with probability $1$, and the algorithm correctly returns $(\{x\}, 1)$. When the root is a coefficient $c_i$, \sampmon correctly returns $sign(c_i) \times 1$.
\AH{I don't know if I need to state why the latter statement (for the case of the root being a coefficient) is correct. I am also not sure how to properly argue this, i.e., whether it suffices to say that it follows by definition of our sampling scheme, or whether there is a statistical claim to be made, etc.}
%By definition of sampling scheme, this %For $|c_i| \leq 1$, $P(\randvar_i = c_i) = 1$, and correctness follows as the algorithm returns $sign(c_i) \times 1$. When $|c_i| \geq 2$, $P(|\randvar_i| = 1) = \frac{1}{|c_i|}$, and $sign(c_i) \times 1$ yields a properly weighted sampling for the case when $|c_i| \geq 2$.
For the inductive hypothesis, assume that lemma~\ref{lem:sample} holds for all expression trees of depth $d \leq k$, for some $k \geq 0$.
@ -306,9 +305,6 @@ Prove now, that when $d = k + 1$ lemma ~\ref{lem:sample} holds. It is the case
Then the root has to be either a $+$ or $\times$ node.
Consider the case when the root is $\times$. Note that we are sampling a term from $\expandtree$. Let $(m, c)$ be the sampled element of $\expandtree$, and note that $m = m_L \times m_R$, where $m_L$ comes from $\etree_L$ and $m_R$ from $\etree_R$. By the inductive hypothesis, \sampmon$(\etree_{L})$ returns $m_L$ with probability $\frac{|c_{m_L}|}{|\etree_L|(1,\ldots, 1)}$, and symmetrically for $m_R$. Since the two recursive calls are independent, the probability of sampling $m$ is $\frac{|c_{m_L}| \cdot |c_{m_R}|}{|\etree_L|(1,\ldots, 1) \cdot |\etree_R|(1,\ldots, 1)}$. For $(m, c)$ in \expandtree, it is indeed the case that $|c| = |c_{m_L}| \cdot |c_{m_R}|$ and that $\abstree(1,\ldots, 1) = |\etree_L|(1,\ldots, 1) \cdot |\etree_R|(1,\ldots, 1)$; therefore $m$ is sampled with the correct probability $\frac{|c|}{\abstree(1,\ldots, 1)}$.
%When it is the former, algorithm ~\ref{alg:sample} will sample from $\etree_L$ and $\etree_R$ according to their computed weights in algorithm ~\ref{alg:one-pass}, and by inductive hypothesis correctness is ensured.
%the call to $WeightedSample$ over both subtrees will return either of the two subtrees with probability proportional to the distribution computed by \textsc{OnePass}, which is precisely $P(T_L) = \frac{|c_L|}{|T_L|(1,\ldots, 1) + |T_R|(1,\ldots, 1)}$ and $P(T_R) = \frac{|c_R|}{|T_L|(1,\ldots, 1) + |T_R|(1,\ldots, 1)}$. By inductive hypothesis, we know that $|c_L|$ and $|c_R|$ are correct, and combined with the fact that $|T_L|(1,\ldots, 1) + |T_R|(1,\ldots, 1) = \abstree(1,\ldots, 1)$, since the algorithm makes a call to $WeightedSample$, this then proves the inductive step for the case when the root of $\etree$ is $+$.
For the case when the root is a $+$ node, \sampmon~samples monomial $m$ from one of its children. By the inductive hypothesis, $m_L$ and $m_R$ are sampled with probabilities $\frac{|c_{m_L}|}{|\etree_\vari{L}|(1,\ldots, 1)}$ and $\frac{|c_{m_R}|}{|\etree_\vari{R}|(1,\ldots, 1)}$ respectively. Assume that $m$ is sampled from $\etree_\vari{L}$; a symmetric argument holds when $m$ is sampled from $\etree_\vari{R}$. The probability that $m$ is sampled from $\etree$ is then the product of the probability that $m$ is sampled within $\etree_\vari{L}$ and the probability that $\etree_\vari{L}$ is chosen from $\etree$, that is,
\begin{align*}
@ -317,15 +313,6 @@ P(\sampmon(\etree) = m) = &P(\sampmon(\etree_\vari{L}) = m) \cdot P(SampledChild(\etree) = \etree_\vari{L})\\
= &\frac{|c_m|}{|\etree_\vari{L}|(1,\ldots, 1)} \cdot \frac{|\etree_\vari{L}|(1,\ldots, 1)}{\abstree(1,\ldots, 1)}\\
= &\frac{|c_m|}{\abstree(1,\ldots, 1)},
\end{align*}
and we obtain the desired result.
%Since algorithm ~\ref{alg:sample} is choosing $m$ from either \vari{E}($\etree_\vari{L}$) or \vari{E}($\etree_\vari{R}$), we are choosing $m$ out of $|\etree_\vari{L}|(1,\ldots, 1) + |\etree_\vari{R}|(1,\ldots, 1) = \abstree(1,\ldots, 1)$ possible monomials, thus for $m = m_L$ or $m = m_R$, it is the case that $\sampmon$ samples $m$ from $(m, c)$ of $\expandtree$ with probability of $\frac{|c|}{\abstree(1,\ldots, 1)}$, and correctness follows.
% Suppose the algorithm chooses the sample from $\etree_\vari{L}$. Then $P(\randvar = m) = \frac{|c_m|}{|\etree_\vari{L}|(1,\ldots, 1) + |\etree_\vari{R}|(1,\ldots, 1)}$. Since the algorithm is selecting between $m_i$ and $m_j$ in $\expandtree$
%, it is the case that both subtrees compose together one monomial. Here we have that the joint probability of selecting both $\etree_{\vari{L}}$ and $\etree_{\vari{R}}$ is $P(\etree_{\vari{L}} \text{ and } \etree_{\vari{R}}) = P(\etree_{\vari{L}}) \cdot \etree_{\vari{R}})$ which is the same computation made in $\onepass$, and by inductive hypothesis we have correctness.
%, thus no more sampling is necessary, and the algorithm correctly returns the product of the sample output for existing subtrees. This behavior is correct since it is the equivalent of the weights precomputed by \textsc{OnePass} for a $\times$ node, where we select both subtrees of the node with probability $\frac{|c_L| \cdot |c_R|}{|T_L|(1,\ldots, 1) + |T_R|(1,\ldots, 1)}$. This concludes the proof.
\end{proof}
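To make the case analysis above concrete, the following is a minimal Python sketch of the sampling recursion, under an assumed node layout; \texttt{abs\_weight} stands in for $|\etree|(1,\ldots,1)$, which \textsc{OnePass} would precompute and cache, and none of these names come from the paper's pseudocode.
\begin{verbatim}
from dataclasses import dataclass
from typing import Optional
import random

@dataclass
class Node:
    kind: str                      # "coef", "var", "times", or "plus"
    value: float = 0.0             # used when kind == "coef"
    name: str = ""                 # used when kind == "var"
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def abs_weight(node):
    # |etree|(1,...,1): all variables set to 1, coefficients taken in
    # absolute value. OnePass would cache these in a single traversal.
    if node.kind == "coef":
        return abs(node.value)
    if node.kind == "var":
        return 1
    wl, wr = abs_weight(node.left), abs_weight(node.right)
    return wl * wr if node.kind == "times" else wl + wr

def sample_monomial(node):
    # Returns (variable_set, sign); each monomial (m, c) of the expansion
    # is produced with probability |c| / abs_weight(root).
    if node.kind == "coef":                    # base case: coefficient leaf
        return (frozenset(), 1 if node.value >= 0 else -1)
    if node.kind == "var":                     # base case: variable leaf
        return (frozenset([node.name]), 1)
    if node.kind == "times":                   # sample both children
        ml, sl = sample_monomial(node.left)
        mr, sr = sample_monomial(node.right)
        return (ml | mr, sl * sr)
    wl, wr = abs_weight(node.left), abs_weight(node.right)   # "plus" node
    child = node.left if random.random() * (wl + wr) < wl else node.right
    return sample_monomial(child)
\end{verbatim}
With the weights cached by a single \textsc{OnePass} traversal, each $+$ node makes its choice in constant time, which is what the runtime analysis below relies on.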
@ -335,6 +322,10 @@ and we obtain the desired result.
\begin{Lemma}\label{lem:alg-sample-runtime}
For $k = deg(\etree)$, algorithm~\ref{alg:sample} runs in time $O(k \cdot depth(\etree))$.
\end{Lemma}
For any $\etree$ of degree $k$, the number of leaf nodes visited is $O(k)$, since at most $k$ variable leaves and $k$ coefficient leaves are visited.
Further, in each level of the binary tree $\etree$, $O(k)$ nodes are visited. It follows that algorithm~\ref{alg:sample} runs in $O(k \cdot depth(\etree))$ time.
\begin{proof}[Proof of Lemma ~\ref{lem:alg-sample-runtime}]
Take an arbitrary sample subgraph of expression tree $\etree$ of degree $k$ and pick an arbitrary level $i$. Let $y_i$ denote the number of $\times$ nodes on this level, and $x_i$ the total number of nodes. Since each $\times$ node visits both of its children while every other node visits at most one child, the number of nodes on level $i + 1$ is at most $x_i + y_i$; that is, the growth from level $i$ to level $i + 1$ is upper bounded by $x_{i + 1} - x_i \leq y_i$. Now, the sample subgraph is a binary tree whose only branching nodes are its $\times$ nodes, and it has at most $k$ variable leaves; accounting for coefficient leaves as well gives at most $2k$ leaves, and hence at most $2k - 1$ branching nodes in total, so any level $i$ contains $O(2k) = O(k)$ multiplication nodes.
Consequently, since $\sum_i y_i \leq 2k - 1$, every level has $x_i = O(k)$ nodes, and since $\etree$ has $depth(\etree)$ levels, algorithm~\ref{alg:sample} visits $O(k \cdot depth(\etree))$ nodes, giving the claimed running time.
\end{proof}
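As an informal sanity check on this bound, the sampler sketched after lemma~\ref{lem:sample} can be instrumented to count the nodes it touches on a single run; \texttt{count\_visits} below is a hypothetical helper reusing that sketch's assumed node layout.
\begin{verbatim}
def count_visits(node):
    # Nodes touched by one run of the sampling recursion: only "times"
    # nodes branch, so the count stays within O(k * depth(etree)).
    if node.kind in ("coef", "var"):
        return 1
    if node.kind == "times":                    # both subtrees are sampled
        return 1 + count_visits(node.left) + count_visits(node.right)
    wl, wr = abs_weight(node.left), abs_weight(node.right)
    child = node.left if random.random() * (wl + wr) < wl else node.right
    return 1 + count_visits(child)              # "plus": one subtree only
\end{verbatim}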


@ -303,7 +303,7 @@ Let $\binom{S}{t}$ denote the set of subsets in $S$ with exactly $t$ edges. In
The following function $f_k$ will be useful in our proofs.
\begin{Definition}\label{def:fk}
- Let $f_k: \binom{E_k}{3} \mapsto \binom{E_1}{\leq3}$ be defined as follows. For any $S \in \binom{E_3}{3}$, such that $S = \pbrace{(e_1, b_1), (e_2, b_2), (e_3, b_3)}$, define:
+ Let $f_k: \binom{E_k}{3} \mapsto \binom{E_1}{\leq3}$ be defined as follows. For any $S \in \binom{E_k}{3}$, such that $S = \pbrace{(e_1, b_1), (e_2, b_2), (e_3, b_3)}$, define:
\[ f_k\left(\pbrace{(e_1, b_1), (e_2, b_2), (e_3, b_3)}\right) = \pbrace{e_1, e_2, e_3}.\]
\end{Definition}
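As a quick illustration, $f_k$ simply forgets the copy index $b_i$ of each edge; the sketch below (with hypothetical string labels for edges) also shows why $|f_k(S)| \leq 3$, since duplicate underlying edges collapse.
\begin{verbatim}
def f_k(S):
    # Project a 3-edge subset of E_k, whose edges carry a copy index b,
    # onto the underlying edges of E_1 by forgetting that index.
    return {e for (e, b) in S}

# e.g. f_k({("e1", 0), ("e1", 1), ("e2", 0)}) == {"e1", "e2"}, of size 2 <= 3
\end{verbatim}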
@ -332,7 +332,7 @@ Note that $f_k$ is properly defined. For any $S \in \binom{E_k}{3}$, $|f(S)| \l
\begin{proof}[Proof of Lemma \ref{lem:3m-G2}]
\AR{TODO for {\em later}: I think the proof will be much easier to follow with figures: just drawing out $S\times \{0,1\}$ with the $(e_i,b_i)$ explicitly notated on the edges will make the proof much easier to follow.}
- Given any $S \in \binom{E_1}{\leq3}$, we consider $f_2^{-1}(S)$, which is the set of all possible sets of $3$-edge subgraphs in $S \times \{0, 1\}$ which $f_2$ maps to $S$. Then we count the number of $3$-matchings in the $3$-edge subgraphs of $\graph{2}$ in $f_2^{-1}(S)$. We start with $S \in \binom{E_1}{3}$, where $S$ is composed of the edges $e_1, e_2, e_3$ and $f_2^{-1}(S)$ is set of all $3$-edge subsets of the set $\{(e_1, 0), (e_1, 1), (e_2, 0), (e_2, 1), (e_3, 0), (e_3, 1)\}$.
+ Given any $S \in \binom{E_1}{\leq3}$, we consider $f_2^{-1}(S)$, which is the set of all possible sets of $3$-edge subgraphs in $S \times \{0, 1\}$ which $f_2$ maps to $S$. Then we count the number of $3$-matchings in the $3$-edge subgraphs of $\graph{2}$ in $f_2^{-1}(S)$. We start with $S \in \binom{E_1}{3}$, where $S$ is composed of the edges $e_1, e_2, e_3$ and $f_2^{-1}(S)$ is the set of all $3$-edge subsets of the set $\{(e_1, 0), (e_1, 1), (e_2, 0), (e_2, 1), (e_3, 0), (e_3, 1)\}$.
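Before walking through the cases, a small sketch (hypothetical edge labels again) that enumerates the candidate subsets considered in $f_2^{-1}(S)$; for $|S| = 3$ there are $\binom{6}{3} = 20$ of them to classify below.
\begin{verbatim}
from itertools import combinations

def f2_preimage_candidates(S):
    # All 3-edge subsets of S x {0, 1}: the 3-edge subgraphs of G_2
    # that f_2 maps back into S.
    labeled = [(e, b) for e in sorted(S) for b in (0, 1)]
    return [set(T) for T in combinations(labeled, 3)]

print(len(f2_preimage_candidates({"e1", "e2", "e3"})))   # -> C(6, 3) = 20
\end{verbatim}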
\begin{itemize}
\item $3$-matching ($\threedis$)