In the middle of Oliver's 091420 suggestions

master
Aaron Huber 2020-09-16 16:27:50 -04:00
parent 9b20a5f195
commit f5482e2770
5 changed files with 146 additions and 93 deletions


@ -40,7 +40,7 @@ Now, some useful definitions and notation. For illustrative purposes in the def
The degree of polynomial $\poly(\vct{X})$ is the maximum sum of the exponents of a monomial, over all monomials.
\end{Definition}
The degree of $\poly(\vct{X})$ in the above example is $2$. In this note we consider only finite degree polynomials.
The degree of $\poly(\vct{X})$ in the above example is $2$. In this paper we consider only finite degree polynomials.
\AH{We need to verify that this definition is consistent with the rest of the paper. Also, it might be useful to specify coefficients are 1?}
\begin{Definition}[Monomial]\label{def:monomial}
@ -65,35 +65,56 @@ Note that $\etree$ encodes an expression generally \textit{not} in the standard
\begin{Definition}[poly$(\cdot)$]\label{def:poly-func}
Let $poly(\etree)$ denote the function that takes an expression tree $\etree$ as input and outputs its corresponding polynomial. It is defined recursively on $\etree$ as follows, where $\etree_\lchild$ and $\etree_\rchild$ denote the left and right child of $\etree$, respectively.
\begin{align*}
&\etree.\type = +\mapsto&& \polyf(\etree_\lchild) + \polyf(\etree_\rchild)\\
&\etree.\type = \times\mapsto&& \polyf(\etree_\lchild) \cdot \polyf(\etree_\rchild)\\
&\etree.\type = \var \text{ OR } \tnum\mapsto&& \etree.\val
\end{align*}
% \begin{align*}
% &\etree.\type = +\mapsto&& \polyf(\etree_\lchild) + \polyf(\etree_\rchild)\\
% &\etree.\type = \times\mapsto&& \polyf(\etree_\lchild) \cdot \polyf(\etree_\rchild)\\
% &\etree.\type = \var \text{ OR } \tnum\mapsto&& \etree.\val
% \end{align*}
\begin{equation*}
\polyf(\etree) = \begin{cases}
\polyf(\etree_\lchild) + \polyf(\etree_\rchild) &\text{ if \etree.\type } = +\\
\polyf(\etree_\lchild) \cdot \polyf(\etree_\rchild) &\text{ if \etree.\type } = \times\\
\etree.\val &\text{ if \etree.\type } = \var \text{ OR } \tnum.
\end{cases}
\end{equation*}
\end{Definition}
\AH{1) Is $OR$ the best way to express the third case?
\AH{
\par2) Below seems like over-defining to me. Is this really necessary? The first sentence I think is \textit{enough}.}
Note that addition and multiplication follow the normal interpretation over polynomials. Specifically, when adding two monomials whose variables and respective exponents agree, the coefficients corresponding to the monomials are added and their sum is multiplied to the monomial. Multiplication here is denoted by concatenation of the monomial and coefficient. When two monomials are multiplied, the product of each corresponding coefficient is computed, and the variables in each monomial are multiplied, i.e., the exponents of like variables are added. Again we notate this by the direct product of coefficient product and all disitinct variables in the two monomials, with newly computed exponents.
Note that addition and multiplication above follow the standard interpretation over polynomials.
%Specifically, when adding two monomials whose variables and respective exponents agree, the coefficients corresponding to the monomials are added and their sum is multiplied to the monomial. Multiplication here is denoted by concatenation of the monomial and coefficient. When two monomials are multiplied, the product of each corresponding coefficient is computed, and the variables in each monomial are multiplied, i.e., the exponents of like variables are added. Again we notate this by the direct product of coefficient product and all disitinct variables in the two monomials, with newly computed exponents.
\begin{Definition}[Expression Tree Set]\label{def:express-tree-set}$\etreeset{\smb}$ is the set of all possible expression trees $\etree$, such that $poly(\etree) = \poly(\vct{X})$.
\end{Definition}
For our running example, $\etreeset{\smb} = \{2x^2 + 3xy - 2y^2, (x + 2y)(2x - y)\}$. Note that \cref{def:express-tree-set} implies that $\etree \in \etreeset{poly(\etree)}$.
\AH{Just wondering if there is a more simple way to describe ~\cref{def:expand-tree}. \par Also not sure about the notation \vari{List}.}
\begin{Definition}[Expanded T]\label{def:expand-tree}
$\expandtree{\etree}$ is the pure SOP expansion of $\etree$. The logical view of \expandtree{\etree} ~is a list of tuples $(\monom, \coef)$, where $\monom$ is of type monomial and $\coef$ is in $\mathbb{R}$, recursively defined as
\begin{align*}
&\etree.\type = + \mapsto&& \elist{\expandtree{\etree_\lchild}, \expandtree{\etree_\rchild}}\\
&\etree.\type = \times \mapsto&& \elist{\expandtree{\etree_\lchild} \otimes \expandtree{\etree_\rchild}}\\
&\etree.\type = \tnum \mapsto&& \elist{(\emptyset, \etree.\val)}\\
&\etree.\type = \var \mapsto&& \elist{(\etree.\val, 1)}
\end{align*}
such that the multiplication of two tuples %is the standard multiplication over monomials and the standard multiplication over coefficients to produce the product tuple, as in
is their direct product $(\monom_1, \coef_1) \cdot (\monom_2, \coef_2) = (\monom_1 \times \monom_2, \coef_1 \times \coef_2)$ such that monomials $\monom_1$ and $\monom_2$ are concatenated in a product operation, while the standard product operation over reals applies to $\coef_1 \times \coef_2$. The operator $\otimes$ is defined as the cross-product tuple multiplication of all such tuples returned by both $\expandtree{\etree_\lchild}$ and $\expandtree{\etree_\rchild}$.
$\expandtree{\etree}$ is the pure sum of products expansion of $\etree$. The logical view of \expandtree{\etree} ~is a list of tuples $(\monom, \coef)$, where $\monom$ is of type monomial and $\coef$ is in $\mathbb{R}$. \expandtree{\etree} has the following recursive definition.
\end{Definition}
% recursively defined as
% \begin{align*}
% &\etree.\type = + \mapsto&& \elist{\expandtree{\etree_\lchild}, \expandtree{\etree_\rchild}}\\
% &\etree.\type = \times \mapsto&& \elist{\expandtree{\etree_\lchild} \otimes \expandtree{\etree_\rchild}}\\
% &\etree.\type = \tnum \mapsto&& \elist{(\emptyset, \etree.\val)}\\
% &\etree.\type = \var \mapsto&& \elist{(\etree.\val, 1)}
% \end{align*}
\begin{align*}
\expandtree{\etree} = \begin{cases}
\expandtree{\etree_\lchild} \circ \expandtree{\etree_\rchild} &\textbf{ if }\etree.\type = +\\
\left\{(\monom_\lchild \cup \monom_\rchild, \coef_\lchild \cdot \coef_\rchild) ~|~ (\monom_\lchild, \coef_\lchild) \in \expandtree{\etree_\lchild}, (\monom_\rchild, \coef_\rchild) \in \expandtree{\etree_\rchild}\right\} &\textbf{ if }\etree.\type = \times\\
\elist{(\emptyset, \etree.\val)} &\textbf{ if }\etree.\type = \tnum\\
\elist{(\{\etree.\val\}, 1)} &\textbf{ if }\etree.\type = \var.\\
\end{cases}
\end{align*}
%where that the multiplication of two tuples %is the standard multiplication over monomials and the standard multiplication over coefficients to produce the product tuple, as in
%is their direct product $(\monom_1, \coef_1) \cdot (\monom_2, \coef_2) = (\monom_1 \cdot \monom_2, \coef_1 \times \coef_2)$ such that monomials $\monom_1$ and $\monom_2$ are concatenated in a product operation, while the standard product operation over reals applies to $\coef_1 \times \coef_2$. The product of $\expandtree{\etree_\lchild} \cdot \expandtree{\etree'_\rchild}$ is then the cross product of the multiplication of all such tuples returned to both $\expandtree{\etree_\lchild}$ and $\expandtree{\etree_\rchild}$. %The operator $\otimes$ is defined as the cross-product tuple multiplication of all such tuples returned by both $\expandtree{\etree_\lchild}$ and $\expandtree{\etree_\rchild}$.
\begin{Example}\label{example:expr-tree-T}
To illustrate \cref{def:expand-tree} with an example, consider the product $(x + 2y)(2x - y)$ and its expression tree $\etree$ in Figure~\ref{fig:expr-tree-T}. The pure expansion of the product is $2x^2 - xy + 4xy - 2y^2 = \expandtree{\etree}$, logically viewed as $[(x^2, 2), (xy, -1), (xy, 4), (y^2, -2)]$. (To be precise, note that $\etree$ would use a $+$ node to model the second factor ($\etree_\rchild$), while storing a child coefficient of $-1$ for the variable $y$. The subtree $\etree_\rchild$ would be $+(\times(2, x), \times(-1, y))$, and one can see that $\etree_\rchild$ is indeed equivalent to $(2x - y)$.)
\end{Example}
@ -148,7 +169,7 @@ Given an expression tree $\etree$ and $\vct{v} \in \mathbb{R}^\numvar$, $\etree(
In the subsequent subsections we lay the groundwork to prove the following theorem.
\begin{Theorem}\label{lem:approx-alg}
For any query polynomial $\poly(\vct{X})$, an approximation of $\rpoly(\prob_1,\ldots, \prob_\numvar)$ can be computed in $O\left(\treesize(\etree) + \frac{\log{\frac{1}{\conf}}\cdot \abs{\etree}^2(1,\ldots, 1)}{\error^2\cdot\rpoly^2(\prob_1,\ldots, \prob_\numvar)}\right)$, with $(\error,\delta)$-bounds, where $k$ denotes the degree of $\poly$.
For any query polynomial $\poly(\vct{X})$, an approximation of $\rpoly(\prob_1,\ldots, \prob_\numvar)$ can be computed in $O\left(\treesize(\etree) + \frac{\log{\frac{1}{\conf}}\cdot \abs{\etree}^2(1,\ldots, 1)}{\error^2\cdot\rpoly^2(\prob_1,\ldots, \prob_\numvar)}\right)$, with multiplicative $(\error,\delta)$-bounds, where $k$ denotes the degree of $\poly$.
\end{Theorem}
\subsection{Approximating $\rpoly$}
@ -187,23 +208,30 @@ Kindly recall that the notaion $[x, y]$ denotes the range of values between $x$
\end{algorithm}
\subsubsection{Correctness}
\begin{Theorem}\label{lem:mon-samp}
For any $\etree$ with $\degree(poly(|\etree|)) = k$, algorithm \ref{alg:mon-sam} outputs an estimate of $\rpoly(\prob_1,\ldots, \prob_\numvar)$ within an additive $\error \cdot \abs{\etree}(1,\ldots, 1)$ error with probability $1 - \conf$, in $O\left(\treesize(\etree) + \left(\frac{\log{\frac{1}{\conf}}}{\error^2} \cdot k \cdot\log{k} \cdot depth(\etree)\right)\right)$ time.
\end{Theorem}
At the conclusion of $\onepass$, $\etree.\vari{partial}$ will hold sum of all coefficients in $\expandtree{\abs{\etree}}$, i.e. $\sum\limits_{(\monom, \coef) \in \expandtree{\abs{\etree}}}\coef$. $\etree.\vari{weight}$ will hold the weighted probability that $\etree$ is sampled from from its parent $+$ node.
We state the lemmas for $\onepass$ and $\sampmon$, the auxiliary algorithms on which \cref{alg:mon-sam} relies. Their proofs follow.
\begin{Lemma}\label{lem:one-pass}
There exists an algorithm $\onepass$ which correctly computes $\abs{\vari{S}}(1,\ldots, 1)$ for each subtree $\vari{S}$ of $\etree$. For the left child $\vari{S}_\lchild$ of $\vari{S}$ such that $\vari{S}.\val = +$, it correctly computes the weighted distribution $\frac{\abs{\vari{S}_\lchild}(1,\ldots, 1)}{\abs{\vari{S}}(1,\ldots, 1)}$ and likewise for the right child. All computations are performed in one traversal in $O(size(\etree))$ time.
The $\onepass$ function completes in $O(size(\etree))$ time. After $\onepass$ returns, the following postconditions hold. First, $\abs{\vari{S}}(1,\ldots, 1)$ is correctly computed for each subtree $\vari{S}$ of $\etree$. Second, when $\vari{S}.\type = +$, the weighted distribution $\frac{\abs{\vari{S}_{\vari{child}}}(1,\ldots, 1)}{\abs{\vari{S}}(1,\ldots, 1)}$ is correctly computed for each child of $\vari{S}$.
\end{Lemma}
At the conclusion of $\onepass$, $\etree.\vari{partial}$ will hold the sum of all coefficients in $\expandtree{\abs{\etree}}$, i.e., $\sum\limits_{(\monom, \coef) \in \expandtree{\abs{\etree}}}\coef$. $\etree.\vari{weight}$ will hold the weighted probability that $\etree$ is sampled from its parent $+$ node.
\begin{Lemma}\label{lem:sample}
For every $(\monom,\coef)$ in $\vari{E}(\abs{\etree})$, $k = \degree(poly(\abs{\etree})$, there exists an algorithm $\sampmon(\etree)$ that returns $\left(\monom, sign(\coef)\right)$ with probability $\frac{|\coef|}{\abs{\etree}(1,\ldots, 1)}$ in $O(\log{k} \cdot k \cdot depth(\etree))$ time.
The function $\sampmon$ completes in $O(\log{k} \cdot k \cdot depth(\etree))$ time, where $k = \degree(poly(\abs{\etree}))$. Upon completion, with probability $\frac{|\coef|}{\abs{\etree}(1,\ldots, 1)}$, $\sampmon$ returns the sampled term $\left(\monom, sign(\coef)\right)$ from $\expandtree{\abs{\etree}}$.
\end{Lemma}
\begin{Theorem}\label{lem:mon-samp}
If the contracts for $\onepass$ and $\sampmon$ hold, then for any $\etree$ with $\degree(poly(|\etree|)) = k$, algorithm \ref{alg:mon-sam} outputs an estimate $\mathcal{X}$ of $\rpoly(\prob_1,\ldots, \prob_\numvar)$ %within an additive $\error \cdot \abs{\etree}(1,\ldots, 1)$ error with
with bound $P\left(\left|\mathcal{X} - \expct\pbox{\mathcal{X}}\right|\geq \error \cdot \abs{\etree}(1,\ldots, 1)\right) \leq \conf$, in $O\left(\treesize(\etree) + \left(\frac{\log{\frac{1}{\conf}}}{\error^2} \cdot k \cdot\log{k} \cdot depth(\etree)\right)\right)$ time.
\end{Theorem}
\begin{proof}[Proof of Theorem \ref{lem:mon-samp}]
Consider $\expandtree{\etree}{\etree}$ and let $(\monom, \coef)$ be an arbitrary tuple in $\expandtree{\etree}{\etree}$. For convenience over alphabet $\Sigma$, define $\evalmp: \left(\{\nu^a~|~\nu \in \Sigma, a \in \mathbb{N}\}, [0, 1]^\numvar\right)\mapsto \mathbb{R}$, a function that takes a monomial $\monom$ and probability vector $\vct{p}$ as input and outputs the evaluation of $\monom$ over $\vct{p}$. By ~\cref{lem:sample}, the sampling scheme samples $(\monom, \coef)$ in $\expandtree{\etree}$ with probability $\frac{|\coef|}{\abs{\etree}(1,\ldots, 1)}$. Now consider $\rpoly$ and note that
@ -243,8 +271,9 @@ By Hoeffding we obtain the number of samples necessary to acheive the claimed ad
This concludes the proof for the first claim of theorem ~\ref{lem:mon-samp}.
\paragraph{Run-time Analysis}
Note that lines ~\ref{alg:mon-sam-global1}, ~\ref{alg:mon-sam-global2}, and ~\ref{alg:mon-sam-global3} are $O(1)$ global operations. The call to $\onepass$ in line ~\ref{alg:mon-sam-onepass} by lemma ~\ref{lem:one-pass} is $O(|\etree|)$ time.
First, algorithm ~\ref{alg:mon-sam} calls \textsc{OnePass} which takes $O(|\etree|)$ time. Then for $\numsamp = \ceil{\frac{2 \log{\frac{2}{\conf}}}{\error^2}}$, the $O(1)$ assignment, product, and addition operations occur. Over the same $\numsamp$ iterations, $\sampmon$ is called, with a runtime of $O(\log{k}\cdot k \cdot depth(\etree)$ by lemma ~\ref{lem:sample}. Finally, over the same iterations, because $\degree(\polyf(\abs{\etree})) = k$, the assignment and product operations of line ~\ref{alg:mon-sam-product2} are called at most $k$ times.
Note that lines ~\ref{alg:mon-sam-global1}, ~\ref{alg:mon-sam-global2}, and ~\ref{alg:mon-sam-global3} are $O(1)$ global operations. The call to $\onepass$ in line ~\ref{alg:mon-sam-onepass} by lemma ~\ref{lem:one-pass} is $O(\treesize(\etree))$ time.
%First, algorithm ~\ref{alg:mon-sam} calls \textsc{OnePass} which takes $O(|\etree|)$ time.
Then, for $\numsamp = \ceil{\frac{2 \log{\frac{2}{\conf}}}{\error^2}}$ iterations, the $O(1)$ assignment, product, and addition operations occur. Over the same $\numsamp$ iterations, $\sampmon$ is called, with a runtime of $O(\log{k}\cdot k \cdot depth(\etree))$ by lemma ~\ref{lem:sample}. Finally, over the same iterations, because $\degree(\polyf(\abs{\etree})) = k$, the assignment and product operations of line ~\ref{alg:mon-sam-product2} are called at most $k$ times.
Thus we have $O(\treesize(\etree)) + O\left(\frac{\log{\frac{1}{\conf}}}{\error^2} \cdot \left(k + \log{k}\cdot k \cdot depth(\etree)\right)\right) = O\left(\treesize(\etree) + \left(\frac{\log{\frac{1}{\conf}}}{\error^2} \cdot \left(k \cdot\log{k} \cdot depth(\etree)\right)\right)\right)$ overall running time.
\end{proof}
@ -273,19 +302,31 @@ and the runtime then follows, thus upholding ~\cref{lem:approx-alg}.
\subsubsection{Description}
Algorithm ~\ref{alg:one-pass} satisfies the requirements of lemma ~\ref{lem:one-pass}.
$\abs{\etree}(1,\ldots, 1)$ can be defined recursively, as follows:
\begin{align*}
&\etree.\type = \times \mapsto&& \etree_\lchild \times \etree_\rchild\\
&\etree.\type = + \mapsto&& \etree_\lchild + \etree_\rchild\\
&\etree.\type = \tnum \mapsto&& |\etree.\val|\\
&\etree.\type = \var \mapsto&& 1.
\end{align*}
The evaluation of $\abs{\etree}(1,\ldots, 1)$, denoted $\eval{\cdot}_{\abs{\etree}}$, can be defined recursively as follows:
%\begin{align*}
% &\etree.\type = \times \mapsto&& \etree_\lchild \times \etree_\rchild\\
% &\etree.\type = + \mapsto&& \etree_\lchild + \etree_\rchild\\
% &\etree.\type = \tnum \mapsto&& |\etree.\val|\\
% &\etree.\type = \var \mapsto&& 1.
%\end{align*}
In the same fashion the weighted distribution can be described with the additional action at a $+$ node:
\begin{align*}
\etree.\type = + \mapsto&& \etree_\lchild + \etree_\rchild; \etree_\lchild.\vari{weight} = \frac{\etree_\lchild}{\etree_\lchild + \etree_\rchild}, \etree_\rchild.\vari{weight} = \frac{\etree_\rchild}{\etree_\lchild + \etree_\rchild}
&\eval{\etree ~|~ \etree.\type = +}_{\abs{\etree}} =&& \eval{\etree_\lchild}_{\abs{\etree}} + \eval{\etree_\rchild}_{\abs{\etree}}\\
&\eval{\etree ~|~ \etree.\type = \times}_{\abs{\etree}} = && \eval{\etree_\lchild}_{\abs{\etree}} \cdot \eval{\etree_\rchild}_{\abs{\etree}}\\
&\eval{\etree ~|~ \etree.\type = \tnum}_{\abs{\etree}} = && |\etree.\val|\\
&\eval{\etree ~|~ \etree.\type = \var}_{\abs{\etree}} = && 1
\end{align*}
In the same fashion, the weighted distribution $\eval{\cdot}_{\wght}$ can be described as above, with the following modification for $\etree.\type = +$:
%\begin{align*}
% \etree.\type = + \mapsto&& \etree_\lchild + \etree_\rchild; \etree_\lchild.\vari{weight} = \frac{\etree_\lchild}{\etree_\lchild + \etree_\rchild}, \etree_\rchild.\vari{weight} = \frac{\etree_\rchild}{\etree_\lchild + \etree_\rchild}
%\end{align*}
\begin{align*}
&\eval{\etree~|~\etree.\type = +}_{\wght} =&&\eval{\etree_\lchild}_{\abs{\etree}} + \eval{\etree_\rchild}_{\abs{\etree}}; \etree_\lchild.\wght = \frac{\eval{\etree_\lchild}_{\abs{\etree}}}{\eval{\etree_\lchild}_{\abs{\etree}} + \eval{\etree_\rchild}_{\abs{\etree}}}; \etree_\rchild.\wght = \frac{\eval{\etree_\rchild}_{\abs{\etree}}}{\eval{\etree_\lchild}_{\abs{\etree}} + \eval{\etree_\rchild}_{\abs{\etree}}}
\end{align*}
Algorithm ~\ref{alg:one-pass} essentially implements the above definitions.
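As a sanity check on these definitions (a worked sketch, not part of the algorithm), take the running example $(x + 2y)(2x - y)$ with $\etree_\rchild$ encoded as $+(\times(2, x), \times(-1, y))$ as in \cref{example:expr-tree-T}, and assume for illustration that the first factor is encoded analogously as $\etree_\lchild = +(x, \times(2, y))$. Evaluating bottom-up gives
\begin{align*}
\eval{\etree_\lchild}_{\abs{\etree}} &= 1 + 2 \cdot 1 = 3,\\
\eval{\etree_\rchild}_{\abs{\etree}} &= 2 \cdot 1 + |-1| \cdot 1 = 3,\\
\eval{\etree}_{\abs{\etree}} &= 3 \cdot 3 = 9,
\end{align*}
which agrees with the sum of the absolute values of the coefficients in the expansion $2x^2 - xy + 4xy - 2y^2$, namely $2 + 1 + 4 + 2 = 9$. At the $+$ node of the first factor, the children $x$ and $\times(2, y)$ would receive weights $\frac{1}{3}$ and $\frac{2}{3}$ respectively.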
@ -344,19 +385,19 @@ level 2/.style={sibling distance=1.25cm},
]
\Tree [.\node(root){$\boldsymbol{+}$};
\edge [wght_color] node[midway, auto= right, font=\bfseries] {$\bsym{\frac{4}{5}}$}; [.\node[highlight_color](tl){$\boldsymbol{\times}$};
\edge [wght_color] node[midway, auto= right, font=\bfseries, gray] {$\bsym{\frac{4}{5}}$}; [.\node[highlight_color](tl){$\boldsymbol{\times}$};
[.\node(s){$\bsym{+}$};
\edge[wght_color] node[pos=0.35, left, font=\bfseries]{$\bsym{\frac{1}{2}}$}; [.\node[highlight_color](sl){$\bsym{x_1}$}; ]
\edge[wght_color] node[pos=0.35, right, font=\bfseries]{$\bsym{\frac{1}{2}}$}; [.\node[highlight_color](sr){$\bsym{x_2}$}; ]
\edge[wght_color] node[pos=0.35, left, font=\bfseries, gray]{$\bsym{\frac{1}{2}}$}; [.\node[highlight_color](sl){$\bsym{x_1}$}; ]
\edge[wght_color] node[pos=0.35, right, font=\bfseries, gray]{$\bsym{\frac{1}{2}}$}; [.\node[highlight_color](sr){$\bsym{x_2}$}; ]
]
[.\node(sp){$\bsym{+}$};
\edge[wght_color] node[pos=0.35, left, font=\bfseries]{$\bsym{\frac{1}{2}}$}; [.\node[highlight_color](spl){$\bsym{x_1}$}; ]
\edge[wght_color] node[pos=0.35, right, font=\bfseries]{$\bsym{\frac{1}{2}}$}; [.\node[highlight_color](spr){$\bsym{\times}$};
\edge[wght_color] node[pos=0.35, left, font=\bfseries, gray]{$\bsym{\frac{1}{2}}$}; [.\node[highlight_color](spl){$\bsym{x_1}$}; ]
\edge[wght_color] node[pos=0.35, right, font=\bfseries, gray]{$\bsym{\frac{1}{2}}$}; [.\node[highlight_color](spr){$\bsym{\times}$};
[.$\bsym{-1}$ ] [.$\bsym{x_2}$ ]
]
]
]
\edge [wght_color] node[midway, auto=left, font=\bfseries] {$\bsym{\frac{1}{5}}$}; [.\node[highlight_color](tr){$\boldsymbol{\times}$};
\edge [wght_color] node[midway, auto=left, font=\bfseries, gray] {$\bsym{\frac{1}{5}}$}; [.\node[highlight_color](tr){$\boldsymbol{\times}$};
[.$\bsym{x_2}$
\edge [draw=none]; [.\node[draw=none]{}; ]
\edge [draw=none]; [.\node[draw=none]{}; ]


@ -107,6 +107,8 @@
\newcommand{\out}{output}%output aggregation over the output vector
\newcommand{\numocc}[2]{\#\left(#1, #2\right)}
%Graph Symbols
%Shift macro
\newcommand{\patternshift}[1]{\hspace*{-0.5mm}\raisebox{-0.35mm}{#1}\hspace*{-0.5mm} }
%Global styles
\tikzset{
default_node/.style={align=center, inner sep=0pt},
@ -117,33 +119,39 @@
edge from parent path={(\tikzparentnode) -- (\tikzchildnode)}
}
%Subgraph patterns
\newcommand{\ed}{
\newcommand{\ed}{\patternshift{
\begin{tikzpicture}[every path/.style={thick, draw}]%[baseline=0.00005cm]
%\begin{scope}[yshift=-0.1cm]
%\begin{scope}[yshift=-5cm]
\node at (0, 0)[pattern_node](bottom){};
\node [above=0.07cm of bottom, pattern_node] (top){};
\draw (top) -- (bottom);
% \node at (0, -2)[pattern_node, blue](b2){};
% \node [above=0.07cm of b2, pattern_node, blue] (t2){};
% \draw (t2) -- (b2);
%\end{scope}
\end{tikzpicture}
}
\newcommand{\twodis}{
}
\newcommand{\twodis}{\patternshift{
\begin{tikzpicture}[every path/.style={thick, draw}]
\node at (0, 0) [pattern_node] (bottom1) {};
\node[above=0.07cm of bottom1, pattern_node] (top1) {} edge (bottom1);
\node at (0.14, 0) [pattern_node] (bottom2) {};
\node [above=0.07cm of bottom2, pattern_node] (top2) {} edge (bottom2);
\end{tikzpicture}
}
\newcommand{\twopath}{
}
}
\newcommand{\twopath}{\patternshift{
\begin{tikzpicture}[every path/.style={thick, draw}]
\node at (0, 0.08) [pattern_node] (top){};
\node [below left=0.075cm and 0.05cm of top, pattern_node](left){};
\node[below right=0.075cm and 0.05cm of top, pattern_node](right){};
\node [below left=0.095cm and 0.05cm of top, pattern_node](left){};
\node[below right=0.095cm and 0.05cm of top, pattern_node](right){};
\draw (top) -- (left);
\draw (top) -- (right);
\end{tikzpicture}
}
\newcommand{\threedis}{
}
\newcommand{\threedis}{\patternshift{
\begin{tikzpicture}[every path/.style={thick, draw}]
\node at (0, 0) [pattern_node] (bottom1) {};
\node[above=0.07cm of bottom1, pattern_node] (top1) {} edge (bottom1);
@ -153,15 +161,17 @@
\node [above=0.07cm of bottom3, pattern_node] (top3) {} edge (bottom3);
\end{tikzpicture}
}
\newcommand{\tri}{
}
\newcommand{\tri}{\patternshift{
\begin{tikzpicture}[every path/.style={ thick, draw}]
\node at (0, 0.08) [pattern_node] (top){};
\node [below left=0.08cm and 0.01cm of top, pattern_node](left){} edge (top);
\node[below right=0.08cm and 0.01cm of top, pattern_node](right){} edge (top) edge (left);
\end{tikzpicture}
}
}
\newcommand{\twopathdis}{\ed~\twopath}
\newcommand{\threepath}{
\newcommand{\threepath}{\patternshift{
\begin{tikzpicture}[every path/.style={thick, draw}]
\node at (0, 0) [pattern_node] (node1a) {};
\node [above=0.07cm of node1a, pattern_node] (node1b) {} edge (node1a);
@ -170,7 +180,8 @@
\draw (node1b) -- (node3b);
\end{tikzpicture}
}
\newcommand{\oneint}{
}
\newcommand{\oneint}{\patternshift{
\begin{tikzpicture}[level/.style={sibling distance=0.14cm, level distance=0.15cm}, every path/.style={thick, draw}]
\node at (0, 0) [pattern_node] {} [grow=down]
child{node [pattern_node]{}}
@ -178,6 +189,7 @@
child{node [pattern_node] {}};
\end{tikzpicture}
}
}
\newcommand{\bsym}[1]{\boldsymbol{#1}}%b for bold; sym for symbol
%%%%%%%%%%%%%%%%%%%

BIN
main.synctex(busy) Normal file

Binary file not shown.


@ -7,12 +7,10 @@ Before proceeding, note that the following is assuming $\ti$s in the setting of
Throughout the note, we also make the following \textit{assumption}.
\begin{Assumption}
All polynomials in this note are in standard monomial basis, i.e., $\poly(\vct{X}) = \sum\limits_{\vct{d} \in \mathbb{N}^\numvar}q_d \cdot \prod\limits_{i = 1, d_i \geq 1}^{\numvar}X_i^{d_i}$.
All polynomials considered are in standard monomial basis, i.e., $\poly(\vct{X}) = \sum\limits_{\vct{d} \in \mathbb{N}^\numvar}q_d \cdot \prod\limits_{i = 1, d_i \geq 1}^{\numvar}X_i^{d_i}$, where $q_d$ is the coefficient for the monomial encoded in $\vct{d}$ and $d_i$ is the $i^{th}$ element of $\vct{d}$.
\end{Assumption}
We can think of $\poly(\vct{w})$ as a function whose input are the variables $X_1,\ldots, X_\numvar$ as in $\poly(\vct{X})$.
\begin{Definition}\label{def:qtilde}
Define $\rpoly(X_1,\ldots, X_\numvar)$ as the reduced version of $\poly(X_1,\ldots, X_\numvar)$, of the form
$\rpoly(X_1,\ldots, X_\numvar) = $
@ -20,12 +18,14 @@ $\rpoly(X_1,\ldots, X_\numvar) = $
\[\poly(X_1,\ldots, X_\numvar) \mod \wbit_1^2-\wbit_1\cdots\mod \wbit_\numvar^2 - \wbit_\numvar.\]
\end{Definition}
\begin{Example}\label{example:qtilde}
Think of, for example, $\poly(x, y) = (x + y)(x + y)$. Then the expanded derivation for $\rpoly(x, y)$ is
\begin{align*}
(&x^2 + 2xy + y^2 \mod x^2 - x) \mod y^2 - y\\
= ~&x + 2xy + y^2 \mod y^2 - y\\
= ~& x + 2xy + y
\end{align*}
\end{Example}
Intuitively, $\rpoly(\textbf{X})$ is the expanded sum of products form of $\poly(\textbf{X})$ such that if any $X_j$ term has an exponent $e > 1$, it is reduced to $1$, i.e. $X_j^e\mapsto X_j$ for any $e > 1$.
Alternatively, one can gain intuition for $\rpoly$ by thinking of $\rpoly$ as the resulting sum of product expansion of $\poly$ when $\poly$ is in a factorized form such that none of its terms have an exponent $e > 1$, if the product operator is idempotent.
@ -42,11 +42,11 @@ Follows by the construction of $\rpoly$ in \cref{def:qtilde}.
\qed
Note the following fact:
\begin{Proposition}
\[\text{For all } (\wbit_1,\ldots, \wbit_\numvar) \in \{0, 1\}^\numvar, \poly(\wbit_1,\ldots, \wbit_\numvar) = \rpoly(\wbit_1,\ldots, \wbit_\numvar).\]
\begin{Proposition}\label{proposition:q-qtilde}
\[\text{For all } (X_1,\ldots, X_\numvar) \in \{0, 1\}^\numvar, \poly(X_1,\ldots, X_\numvar) = \rpoly(X_1,\ldots, X_\numvar).\]
\end{Proposition}
\begin{proof}
\begin{proof}[Proof for Proposition ~\ref{proposition:q-qtilde}]
Note that any $\poly$ in factorized form is equivalent to its sum of product expansion. For each term in the expanded form, further note that for all $b \in \{0, 1\}$ and all $e \geq 1$, $b^e = b$.
\end{proof}
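As a quick check of Proposition~\ref{proposition:q-qtilde} (a worked sketch reusing the polynomial of \cref{example:qtilde}), take $\poly(x, y) = (x + y)(x + y)$ and $\rpoly(x, y) = x + 2xy + y$. On the four points of $\{0, 1\}^2$ the two agree:
\begin{align*}
\poly(0, 0) &= 0 = \rpoly(0, 0), & \poly(1, 0) &= 1 = \rpoly(1, 0),\\
\poly(0, 1) &= 1 = \rpoly(0, 1), & \poly(1, 1) &= 4 = \rpoly(1, 1).
\end{align*}
The agreement is specific to inputs in $\{0, 1\}$: at $(\frac{1}{2}, \frac{1}{2})$ we get $\poly = 1$ but $\rpoly = \frac{3}{2}$.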
@ -60,7 +60,7 @@ The expectation over possible worlds in $\poly$ is equal to $\rpoly(\prob_1,\ldo
\end{equation*}
\end{Lemma}
\begin{proof}
\begin{proof}[Proof for Lemma ~\ref{lem:exp-poly-rpoly}]
%Using the fact above, we need to compute \[\sum_{(\wbit_1,\ldots, \wbit_\numvar) \in \{0, 1\}}\rpoly(\wbit_1,\ldots, \wbit_\numvar)\]. We therefore argue that
%\[\sum_{(\wbit_1,\ldots, \wbit_\numvar) \in \{0, 1\}}\rpoly(\wbit_1,\ldots, \wbit_\numvar) = 2^\numvar \cdot \rpoly(\frac{1}{2},\ldots, \frac{1}{2}).\]
@ -68,7 +68,7 @@ Let $\poly$ be the generalized polynomial, i.e., the polynomial of $\numvar$ var
\[\poly(X_1,\ldots, X_\numvar) = \sum_{\vct{d} \in \{0,\ldots, B\}^\numvar}q_{\vct{d}}\cdot \prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar X_i^{d_i}.\]
Then for expectation we have
Then, assigning $\vct{w}$ to $\vct{X}$, for expectation we have
\begin{align}
\expct_{\wVec}\pbox{\poly(\wVec)} &= \sum_{\vct{d} \in \{0,\ldots, B\}^\numvar}q_{\vct{d}}\cdot \expct_{\wVec}\pbox{\prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar w_i^{d_i}}\label{p1-s1}\\
&= \sum_{\vct{d} \in \{0,\ldots, B\}^\numvar}q_{\vct{d}}\cdot \prod_{\substack{i = 1\\s.t. d_i \geq 1}}^\numvar \expct_{\wVec}\pbox{w_i^{d_i}}\label{p1-s2}\\
@ -92,20 +92,18 @@ Finally, observe \cref{p1-s5} by construction in \cref{lem:pre-poly-rpoly}, that
\qed
\end{proof}
\begin{Corollary}
If $\poly$ is given to us in a sum of monomials form, the expectation of $\poly$, i.e., $\ex{\poly}$ can be computed in $O(|\poly|)$, where $|\poly|$ denotes the total number of multiplication/addition operators.
\begin{Corollary}\label{cor:expct-sop}
If $\poly$ is given as a sum of monomials, the expectation of $\poly$, i.e., $\ex{\poly}$ can be computed in $O(|\poly|)$, where $|\poly|$ denotes the total number of multiplication/addition operators.
\end{Corollary}
\begin{proof}
\begin{proof}[Proof For Corollary ~\ref{cor:expct-sop}]
Note that \cref{lem:exp-poly-rpoly} shows that $\ex{\poly} = \rpoly(\prob_1,\ldots, \prob_\numvar)$. Therefore, if $\poly$ is already in sum of products form, one only needs to compute $\poly(\prob_1,\ldots, \prob_\numvar)$ ignoring exponent terms (note that such a polynomial is $\rpoly(\prob_1,\ldots, \prob_\numvar)$), which indeed requires $O(|\poly|)$ computations.\qed
\end{proof}
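To make \cref{cor:expct-sop} concrete, here is a small worked instance. The two-variable polynomial and the independence of $w_1$ and $w_2$ are assumptions made for illustration. Let $\poly(X_1, X_2) = X_1^2 + 2X_1X_2 + X_2^2$, with each $w_i \in \{0, 1\}$ equal to $1$ independently with probability $\prob_i$. Since $w_i^2 = w_i$,
\begin{align*}
\expct_{\wVec}\pbox{\poly(\wVec)} &= \expct_{\wVec}\pbox{w_1} + 2\,\expct_{\wVec}\pbox{w_1 w_2} + \expct_{\wVec}\pbox{w_2}\\
&= \prob_1 + 2\prob_1\prob_2 + \prob_2 = \rpoly(\prob_1, \prob_2),
\end{align*}
which is $\poly(\prob_1, \prob_2)$ evaluated with every exponent dropped to $1$. For $\prob_1 = \prob_2 = \frac{1}{2}$ this gives $\frac{3}{2}$, matching the direct average $\frac{0 + 1 + 1 + 4}{4}$ over the four possible worlds.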
\subsection{When $\poly$ is not in sum of monomials form}
We would like to argue in the general case that $\expct_{\wVec}\pbox{\poly(\wVec)}$ cannot be computed in linear time.
\AH{\Large\bf This section has been largely rearranged since Atri's last pass. There are also some changes to arguments that he hasn't made a pass over.}
We would like to argue that, in the general case, the expectation over possible worlds cannot be computed in linear time.
To this end, consider the following graph $G(V, E)$, where $|E| = \numedge$, $|V| = \numvar$, and $i, j \in [\numvar]$. Consider the query $q_E(X_1,\ldots, X_\numvar) = \sum\limits_{(i, j) \in E} X_i \cdot X_j$.
To this end, consider the following graph $G(V, E)$, where $|E| = \numedge$, $|V| = \numvar$, and $i, j \in [\numvar]$.
Before proceeding, let us list all possible edge patterns in an arbitrary $G$ consisting of $\leq 3$ distinct edges.
@ -120,20 +118,20 @@ Before proceeding, let us list all possible edge patterns in an arbitrary $G$ co
\item 3-matching ($\threedis$)--this subgraph is composed of three disjoint edges.
\end{itemize}
Let $\numocc{G}{H}$ denote the number of occurrences of subgraph $H$ in graph $G$, where, for example, $\numocc{G}{\ed}$ means the number of single edges in $G$.
Let $\numocc{G}{H}$ denote the number of occurrences of pattern $H$ in graph $G$, where, for example, $\numocc{G}{\ed}$ means the number of single edges in $G$.
For any graph $G$, the following formulas compute $\numocc{G}{H}$ for their respective subgraphs in $O(\numedge)$ time.
For any graph $G$, the following formulas compute $\numocc{G}{H}$ for their respective patterns in $O(\numedge)$ time, with $d_i$ representing the degree of vertex $i$.
\begin{align}
&\numocc{G}{\ed} = \numedge, \label{eq:1e}\\
&\numocc{G}{\twopath} = \sum_{i \in V} \binom{d_i}{2} \text{where $d_u$ is the degree of vertex $u$}\label{eq:2p}\\ &\numocc{G}{\twodis} = \sum_{(i, j) \in E}\binom{\numedge - d_i - d_j + 1}{2}\label{eq:2m}\\
&\numocc{G}{\twopath} = \sum_{i \in V} \binom{d_i}{2} \label{eq:2p}\\ &\numocc{G}{\twodis} = \sum_{(i, j) \in E}\binom{\numedge - d_i - d_j + 1}{2}\label{eq:2m}\\
&\numocc{G}{\oneint} = \sum_{i \in V} \binom{d_i}{3}\label{eq:3s}\\
&\numocc{G}{\twopathdis} + \numocc{G}{\threedis} = \sum_{(i, j) \in E} \binom{\numedge - d_i - d_j + 1}{3}\label{eq:2pd-3d}
\end{align}
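As a sanity check of these formulas, consider a small graph chosen here purely for illustration: the path on four vertices with edges $(1, 2)$, $(2, 3)$, $(3, 4)$, so that $\numedge = 3$ and $(d_1, d_2, d_3, d_4) = (1, 2, 2, 1)$. Then
\begin{align*}
\numocc{G}{\ed} &= 3,\\
\numocc{G}{\twopath} &= \binom{1}{2} + \binom{2}{2} + \binom{2}{2} + \binom{1}{2} = 2,\\
\numocc{G}{\oneint} &= 0,
\end{align*}
matching a direct count (the two $2$-paths are centered at vertices $2$ and $3$). For each edge $(i, j)$, the quantity $\numedge - d_i - d_j + 1$ is the number of edges disjoint from $(i, j)$, here $1$, $0$, and $1$. These sum to $2$, which is twice the single $2$-matching $\{(1, 2), (3, 4)\}$, because each matching is counted once from each of its two edges. By contrast, the binomial sum on the right-hand side of \cref{eq:2m} evaluates to $\binom{1}{2} + \binom{0}{2} + \binom{1}{2} = 0$ on this graph.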
A quick argument to why \cref{eq:2pd-3d} is true. Note that for each edge connecting arbitrary vertices $u$ and $v$, we can get rid of all neighbors, and choose two distinct edges. The sum over all such edge combinations is precisely then $\numocc{G}{\twopathdis} + \numocc{G}{\threedis}$.
\AH{The formula doesn't seem to work. This is where I left off. 091620 pm}
A quick argument to why \cref{eq:2m} is true. Note that for each edge connecting arbitrary vertices $i$ and $j$, we can get rid of all neighbors, and choose one distinct edge. Thus, subtracting $(d_i - 1)$ from $m$ yields the number of edges disjoint to vertex $i$ and likewise for $j$. Choand $d_j$ from $m$ and adding $2$ gives the number of edges disjoint to edge $(i, j)$. We need to add two since $d_i + d_j$ double counts the shared edge, and since we keep that edge we need to give back both the edge and its double count., since this edge itself is one of the two in the two matching. The sum over all such edge combinations is precisely then $\numocc{G}{\twopathdis} + \numocc{G}{\threedis}$.
For the following discussion, set $\poly_{G}(\vct{X}) = \left(q_E(X_1,\ldots, X_\numvar)\right)^3$.
Now consider the query $q_E(X_1,\ldots, X_\numvar) = \sum\limits_{(i, j) \in E} X_i \cdot X_j$. For the following discussion, set $\poly_{G}(\vct{X}) = \left(q_E(X_1,\ldots, X_\numvar)\right)^3$.
\begin{Lemma}\label{lem:qE3-exp}
When we expand $\poly_{G}(\vct{X}) = \left(q_E(X_1,\ldots, X_\numvar)\right)^3$ out and assign all exponents $e \geq 1$ a value of $1$, we have the following,
@ -324,7 +322,7 @@ The function $f_k$ is a mapping from every $3$-edge shape in $\graph{k}$ to its
The inverse function $f_k^{-1}: \binom{E_1}{\leq 3}\mapsto \{\binom{E_k}{3}\}^{\leq \binom{E_k}{3}}$ takes an arbitrary $\eset{1}$ of at most $3$ edges and outputs the set of all subsets of $\binom{\eset{k}}{3}$ such that each subset $s^{(k)}$ of the output set is mapped to the input set $s^{(1)}$ by $f_k$, i.e. $f_k(s^{(k)}) = s^{(1)}$.
\end{Definition}
Note, importantly, that when we discuss $f_k^{-1}$, that, although counterintuitive, each \textit{edge} present in $s^{(1)}$ must have an edge in $s^{(k)}$ that `projects` down to it. \textit{Meaning}, if $|s^{(1)}| = 3$, then it must be the case that $s^{(k)}$ be a set $\{ (e_i, b), (e_j, b), e_\ell, b) \}$ where $i \neq j \neq \ell$.
Note, importantly, that when we discuss $f_k^{-1}$, although potentially counterintuitive, each \textit{edge} present in $s^{(1)}$ must have an edge in $s^{(k)}$ that `projects' down to it. \textit{Meaning}, if $|s^{(1)}| = 3$, then each $s^{(k)}$ must be a set $\{ (e_i, b), (e_j, b), (e_\ell, b) \}$ where $i$, $j$, and $\ell$ are pairwise distinct.
\begin{Lemma}\label{lem:fk-func}
$f_k$ is a function.
@ -348,6 +346,9 @@ Note that $f_k$ is properly defined. For any $S \in \binom{E_k}{3}$, $|f(S)| \l
\begin{proof}[Proof of Lemma \ref{lem:3m-G2}]
For each edge pattern $S$, we count the number of $3$-matchings in the $3$-edge subgraphs of $\graph{2}$ in $f_2^{-1}(S)$. We start with $S \in \binom{E_1}{3}$, where $S$ is composed of the edges $e_1, e_2, e_3$ and $f_2^{-1}(S)$ is the set of all $3$-edge subsets of the set $\{(e_1, 0), (e_1, 1), (e_2, 0), (e_2, 1), (e_3, 0), (e_3, 1)\}$.
%\begin{tikzpicture}
% \node[
%\end{tikzpicture}
\begin{itemize}
\item $3$-matching ($\threedis$)
\end{itemize}

@ -32,7 +32,7 @@
An incomplete database $\idb$ is a set of deterministic databases $\db$ where each element is known as a possible world. %Since $\idb$ is modeling all the possible worlds of an uncertain database, it follows that each $\db \in \idb$ has the same named set of relations, $\{\rel_1,\ldots, \rel_n\}$ (albeit not equivalent across all instances), whose schemas $(\sch(\rel_i))$are unchanging across each $\db_j$.
Denote the schema of $\db$ as $\sch(\db)$. For the set of possible worlds, $\wSet$, i.e. the set of all $\db_i \in \idb$, define an injective mapping to the set $\{0, 1\}^M$, where for each vector $\vct{w} \in \{0, 1\}^M$ there is at most one element $\db_i \in \idb$ mapped to $\vct{w}$. When $\idb$ is a probabilistic database, $\idb$ can be viewed as a two tuple $(\wSet, \pd)$, where $\wSet$ as noted, is the set of possible worlds, and $\pd$ is a probability distribution over $\wSet$.
Denote the schema of $\db$ as $\sch(\db)$. When $\idb$ is a probabilistic database, $\idb$ can be viewed as a two-tuple $(\wSet, \pd)$, where $\wSet$, as noted, is the set of possible worlds, and $\pd$ is a probability distribution over $\wSet$.
The possible worlds semantics gives a framework for how to think about running queries over $\idb$. Given a query $\query$, $\query$ is deterministically run over each $\db \in \idb$, and the output of $\query(\idb)$ is defined as the set of results (worlds) from running $\query$ over each $\db_i \in \idb$. We write this formally as,
\[\query(\idb) = \{\query(\db) | \db \in \idb\}\]
@ -40,7 +40,7 @@ The possible worlds semantics gives a framework for how to think about running q
\subsection{Modeling and Semantics}
Define $\vct{X}$ to be the variables $X_1,\dots,X_M$. Let the set of all tuples in domain $\mathbb{D}$ be $\tset$.
Define $\vct{X}$ to be the variables $X_1,\dots,X_M$. We emphasize that formal variables do not have a fixed domain type prior to assignment. Let the set of all tuples in the domain of $\sch(\db)$ be $\tset$.
\subsubsection{K-relations}\label{subsubsec:k-rel}
@ -48,7 +48,6 @@ A K-relation~\cite{DBLP:conf/pods/GreenKT07} is a relation whose tuples are each
As noted in \cite{DBLP:conf/pods/GreenKT07}, the $\mathbb{N}[\vct{X}]$-semiring is a semiring over the set $\mathbb{N}[\vct{X}]$ of all polynomials, whose variables can then be substituted with $K$-values from other semirings, evaluating the operators with the operators of the substituted semiring, to produce varying semantics such as set, bag, and security annotations.
When used with $\mathbb B$-typed variables, an $\mathbb{N}[\vct{X}]$ relation is effectively a C-Table \cite{DBLP:conf/pods/GreenKT07}, since all first order formulas can be equivalently modeled by polynomials, where disjunction is equivalent to the addition operator and conjunction is equivalent to the multiplication operator.
Using $\mathbb B$-typed variables in an $\mathbb{N}[\vct{X}]$ relation would correspond to substituting values and operators from the $\{\mathbb{B}, \vee, \wedge, \bot, \top\}$ semiring.
Further define $\nxdb$ as an $\mathbb{N}[\vct{X}]$ database where each tuple $\tup \in \db$ is annotated with a polynomial over variables $X_1,\ldots, X_M$ for some value of $M$ that will be specified later.
@ -58,23 +57,23 @@ It has been shown in previous work that commutative semirings precisely model tr
The evaluation semantics notation $\llbracket \cdot \rrbracket = x$ simply means that the result of evaluating expression $\cdot$ is given by following the semantics $x$. Given a query $\query$, operations in $\query$ are translated into the following polynomial expressions.
\begin{align*}
&\eval{\project_A(\rel)}(\tup) = &&\sum_{\tup': \project_A(\tup) = \tup} \eval{\rel}(\tup')\\
&\eval{(\rel_1 \union \rel_2)}(\tup) = &&\eval{\rel_1}(\tup) + \eval{\rel_2}(\tup)\\
&\eval{(\rel_1 \join \rel_2)}(\tup) = &&\eval{\rel_1}(\project_{\sch(\rel_1)}(\tup)) \times \eval{\rel_2}(\project_{\sch(\rel_2)}(\tup)) \\
&\eval{\select_\theta(\rel)}(\tup) = &&\begin{cases}
&\eval{\project_A(\rel)}(\tup)&& = &&\sum_{\tup': \project_A(\tup') = \tup} \eval{\rel}(\tup')\\
&\eval{(\rel_1 \union \rel_2)}(\tup)&& = &&\eval{\rel_1}(\tup) + \eval{\rel_2}(\tup)\\
&\eval{(\rel_1 \join \rel_2)}(\tup) && = &&\eval{\rel_1}(\project_{\sch(\rel_1)}(\tup)) \times \eval{\rel_2}(\project_{\sch(\rel_2)}(\tup)) \\
&\eval{\select_\theta(\rel)}(\tup) && = &&\begin{cases}
\eval{\rel}(\tup) &\text{if }\theta(\tup) = 1\\
0 &\text{otherwise}.
\end{cases}\\
&\eval{R}(\tup) = &&\rel(\tup)
&\eval{R}(\tup) && = &&\rel(\tup)
\end{align*}
The above semantics show us how to obtain the annotation on a tuple in the result of query $\query$ from the annotations on the tuples in the input of $\query$.
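As a hedged illustration of these rules (the relations and annotations here are hypothetical, chosen only for this example), let $\rel_1(A, B)$ contain the tuples $(a, b_1)$ and $(a, b_2)$ annotated with $X_1$ and $X_2$, and let $\rel_2(B)$ contain the tuples $(b_1)$ and $(b_2)$ annotated with $X_3$ and $X_4$. For $\query = \project_A(\rel_1 \join \rel_2)$ and the output tuple $\tup = (a)$, applying the join rule and then the projection rule gives
\begin{align*}
\eval{\query}(\tup) &= \eval{(\rel_1 \join \rel_2)}((a, b_1)) + \eval{(\rel_1 \join \rel_2)}((a, b_2))\\
&= X_1 \cdot X_3 + X_2 \cdot X_4.
\end{align*}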
\subsection{Defining the Data}
For the set of possible worlds, $\wSet$, i.e. the set of all $\db_i \in \idb$, define an injective mapping to the set $\{0, 1\}^M$, where for each vector $\vct{w} \in \{0, 1\}^M$ there is at most one element $\db_i \in \idb$ mapped to $\vct{w}$.
In the general case, the binary value of $\vct{w}$ uniquely identifies a potential possible world. For example, consider the case of the Tuple Independent Database $(\ti)$ data model in which each table is a set of tuples, each of which is independent of one another, and individually occur with a specific probability $\prob_\tup$. Because of independence, a $\ti$ with $\numTup$ tuples naturally has $2^\numTup$ possible worlds, thus $\numTup = M$, and the injective mapping for each $\vct{w} \in \{0, 1\}^M$ is trivial. In the Block Independent Disjoint data model (BIDB), because of the disjoint condition on tuples within the same block, a BIDB may not have exactly $2^M$ possible worlds. Excess $\vct{w}$'s are assigned a probability of $0$.
In the general case, the binary value of $\vct{w}$ uniquely identifies a potential possible world. For example, consider the case of the Tuple Independent Database $(\ti)$ data model in which each table is a set of tuples, each of which is independent of the others and individually occurs with a specific probability $\prob_\tup$. Because of independence, a $\ti$ with $\numTup$ tuples naturally has $2^\numTup$ possible worlds, thus $\numTup = M$, and the injective mapping for each $\vct{w} \in \{0, 1\}^M$ is trivial. However, in the Block Independent Disjoint data model (BIDB), because of the disjoint condition on tuples within the same block, a BIDB may not have exactly $2^M$ possible worlds. Any $\vct{w}$ that corresponds to no possible world is assigned a probability of $0$.
Denote a random variable selecting a world according to distribution $P$ to be $\rw$. Provided that for any non-possible world $\vct{w} \in \{0, 1\}^M, \pd[\rw = \vct{w}] = 0$, then, a probability distribution over $\{0, 1\}^M$ is a distribution over $\Omega$, which we have already defined as $\pd$.
Denote a random variable selecting a world according to distribution $P$ to be $\rw$. Provided that for any non-possible world $\vct{w} \in \{0, 1\}^M, \pd[\rw = \vct{w}] = 0$, a probability distribution over $\{0, 1\}^M$ is a distribution over $\Omega$, which we have already defined as $\pd$.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%This could be a way to think of world binary vectors in the general case
@ -82,15 +81,15 @@ Denote a random variable selecting a world according to distribution $P$ to be $
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Assume a domain of $\{0, 1\}$ for each $X_i \in \vct{X}$. Since, from this point on, our discussion will involve one polynomial for an arbirtrary $\tup$, we thus abuse notation by using $\poly(\vct{X})$ to be the annotated polynomial $\llbracket\poly(\db)\rrbracket(\tup)$, where the injective mapping maps $\db$ to $\vct{X}$.
From this point on, our discussion focuses on exactly one specific tuple $\tup$. Thus, we abuse notation by using $\poly(\vct{X})$ to denote the annotated polynomial $\llbracket\poly(\db)\rrbracket(\tup)$, where each $X_i \in \vct{X}$ has domain $\{0, 1\}$ and the injective mapping maps $\db$ to $\vct{X}$.
One of the aggregates we desire to compute over the annotated polynomial is the expectation over possible worlds, denoted,
\AH{With our notation, I no longer think that $\vct{w} \sim \pd$ is necessary footer for $\expct$. We can probably just have $\expct\limits_{\vct{w}}$ instead. Do you agree?}
\AR{No. How would you state Lemma 4 without explicitly using $P$ in the definition of expectation?}
\[\expct_{\vct{\rw} \sim \pd}\pbox{\poly(\rw)} = \sum\limits_{\wVec \in \{0, 1\}^\numTup} \poly(\wVec)\cdot \pd[\rw = \vct{w}].\]
For a $\ti$, the bit-string world value $\vct{w}$ can be used as indexing to determine which tuples are present in the $\vct{w}$ world, where the $i^{th}$ bit position represents whether a tuple $\tup_i$ appears in the unique world identified by the binary value of $\vct{w}$. Denote the vector $\vct{p}$ to be a vector whose elements are the individual probabilities $\prob_i$ of each tuple $\tup_i$ such that those probabilities produce the possible worlds in D with a distribution $\pd$ over all worlds. Let $\pd^{(\vct{p})}$ represent the distribution induced by $\vct{p}$.
Above, $\poly(\vct{w})$ is used to mean the assignment of $\vct{w}$ to $\vct{X}$.
For a $\ti$, the bit-string world value $\vct{w}$ can be used as indexing to determine which tuples are present in the $\vct{w}$ world, where the $i^{th}$ bit position $(\wbit_i)$ represents whether a tuple $\tup_i$ appears in the unique world identified by the binary value of $\vct{w}$. Denote the vector $\vct{p}$ to be a vector whose elements are the individual probabilities $\prob_i$ of each tuple $\tup_i$ such that those probabilities produce the possible worlds in D with a distribution $\pd$ over all worlds. Let $\pd^{(\vct{p})}$ represent the distribution induced by $\vct{p}$.
\[\expct_{\rw\sim \pd^{(\vct{p})}}\pbox{\poly(\rw)} = \sum\limits_{\wVec \in \{0, 1\}^\numTup} \poly(\wVec)\prod_{\substack{i \in [\numTup]\\ s.t. \wElem_i = 1}}\prob_i \prod_{\substack{i \in [\numTup]\\s.t. \wElem_i = 0}}\left(1 - \prob_i\right).\]
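As a small worked instance (with a polynomial assumed purely for illustration), take $\numTup = 2$ and $\poly(\vct{X}) = X_1 + X_1 X_2$. Only the worlds $(1, 0)$ and $(1, 1)$ give a nonzero value, namely $1$ and $2$ respectively, so
\[\expct_{\rw\sim \pd^{(\vct{p})}}\pbox{\poly(\rw)} = 1 \cdot \prob_1\left(1 - \prob_2\right) + 2 \cdot \prob_1\prob_2 = \prob_1 + \prob_1\prob_2.\]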