Finished BI-->Q(TI) reduction correctness proof, the general BI blow up, and sufficient condition for linear time approximation algorithm over BI reduction.

master
Aaron Huber 2020-09-25 10:17:14 -04:00
parent 3932ca3040
commit eaba0b00af
1 changed files with 36 additions and 4 deletions

@@ -607,7 +607,7 @@ For any $\poly$ then, it is true that all coefficients in $\abs{\etree}(1,\ldots
\subsection{$\rpoly$ over $\bi$}
\AH{A general sufficient condition is the $\bi$ having a fixed block size (thus implying an increasing number of blocks for growing $\numvar$). For increasing $\numvar$, the ratio $\frac{\abs{\etree}(1,\ldots, 1)}{\rpoly(\prob_1,\ldots, \prob_\numvar)}$ can be proven to be a constant since, as $\numvar$ increases, it has to be the case that new blocks are added, and this results in a constant number of terms being cancelled out by $\rpoly$, with the rest surviving, which gives us a constant $\frac{\abs{\etree}(1,\ldots, 1)}{\rpoly(\prob_1,\ldots, \prob_\numvar)}$.
\par In the general case, with a fixed number of blocks and growing $\numvar$, all additional terms will be cancelled out by $\rpoly$, while $\abs{\etree}(1,\ldots, 1)$ grows exponentially with $\numvar$, yielding a ratio of $\frac{O(2^\numvar)}{O(1)}$ and (as will be seen) greater.}
\subsubsection{Known Reduction Result $\bi \mapsto \ti$}
@@ -686,14 +686,35 @@ For any possible world in $2^b$, notice that the WHERE clause selects the tuple
\begin{proof}[Proof of Theorem ~\ref{theorem:bi-red-ti}]
For multiple blocks in $\bipdb$, note that the above reduction to $\poly(\tipdb)$ with multiple ``blocks'' behaves the same as $\bipdb$, since the independence property of $\ti$ ensures that every tuple in the $\ti$ has the same marginal probability across all possible worlds as its tuple probability, regardless of how many tuples, and thus worlds, $\tipdb$ has. Note that this property is unchanged no matter what probabilities additional tuples in $\tipdb$ are assigned (a small numeric illustration follows the proof).
To see this, consider the following lemma.
\begin{Lemma}\label{lem:bi-red-ti-ind}
For any set of independent variables $S$, adding another distinct independent variable $y$ with probability $\prob_y$ to $S$ leaves the marginal probability of each variable $x_i \in S$ unchanged.
\AH{This may be a well-known property that I might not even need to prove, but since I am not certain, here goes.}
\end{Lemma}
\begin{proof}[Proof of Lemma ~\ref{lem:bi-red-ti-ind}]
The proof is by induction. For the base case, consider a set of one element $S = \{x\}$ with probability $\prob_x$. The set of possible outcomes is $2^S = \{\emptyset, \{x\}\}$, with $P(\emptyset) = 1 - \prob_x$ and $P(x) = \prob_x$. Now, consider $S' = \{y\}$ with $P(y) = \prob_y$ and $S \cup S' = \{x, y\}$, with the set of possible outcomes now $2^{S \cup S'} = \{\emptyset, \{x\}, \{y\}, \{x, y\}\}$. The probabilities for each world are then $P(\emptyset) = (1 - \prob_x)\cdot(1 - \prob_y)$, $P(x) = \prob_x \cdot (1 - \prob_y)$, $P(y) = (1 - \prob_x)\cdot \prob_y$, and $P(xy) = \prob_x \cdot \prob_y$. For the worlds where $x$ appears we have
\[P(x) + P(xy) = \prob_x \cdot (1 - \prob_y) + \prob_x \cdot \prob_y = \prob_x \cdot \left((1 - \prob_y) + \prob_y\right) = \prob_x \cdot 1 = \prob_x.\]
Thus, the base case is satisfied.
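As a concrete sanity check of the base case (the specific probabilities are chosen arbitrarily, for illustration only), take $\prob_x = 0.3$ and $\prob_y = 0.6$:
\[P(x) + P(xy) = 0.3 \cdot (1 - 0.6) + 0.3 \cdot 0.6 = 0.12 + 0.18 = 0.3 = \prob_x.\]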
For the inductive hypothesis, assume that for any $S$ with $\abs{S} = k$ for some $k \geq 1$, and any $S'$ with $\abs{S'} = 1$ whose element is distinct from all elements in $S$, the probability of each independent variable in $S$ is unchanged in $S \cup S'$.
For the inductive step, let us prove that for a set $S_{k + 1}$ of $\abs{S_{k + 1}} = k + 1$ elements, adding another distinct element does not change the probabilities of the independent variables in $S_{k + 1}$. Writing $S_{k + 1} = S_k \cup \{y\}$, the hypothesis gives that all probabilities in $S_k$ remain unchanged after this union. Now consider a set $S' = \{z\}$ and the union $S_{k + 1} \cup S'$. Since all variables are distinct and independent, the set of possible outcomes of $S_{k + 1} \cup S'$ is $2^{S_{k + 1} \cup S'}$ with $\abs{2^{S_{k + 1} \cup S'}} = 2^{\abs{S_{k + 1}} + \abs{S'}}$, since $\abs{S_{k + 1}} + \abs{S'} = \abs{S_{k + 1} \cup S'}$. Then, since $2^{\abs{S_{k + 1}} + \abs{S'}} = 2^{\abs{S_{k + 1}}} \cdot 2^{\abs{S'}}$ and $2^{S'} = \{\emptyset, \{z\}\}$, every outcome in the original set of outcomes appears \textit{exactly once} without $z$ and \textit{exactly once} with $z$. Thus, for any outcome $x \in 2^{S_{k + 1}}$ with probability $\prob_x$ we have $P(x) + P(xz) = \prob_x \cdot (1 - \prob_z) + \prob_x \cdot \prob_z = \prob_x \cdot \left((1 - \prob_z) + \prob_z\right) = \prob_x \cdot 1 = \prob_x$, so the probabilities remain unchanged, and, thus, the marginal probabilities of each variable in $S_{k + 1}$ across all possible outcomes remain unchanged.
\end{proof}
\qed
Repeated application of ~\cref{lem:bi-red-ti-ind} extends this result to joining two sets of distinct independent variables of sizes $\abs{S_1}, \abs{S_2} > 1$, and in particular to adding an entire ``block'' of independent variables in $\tipdb$.
Thus, by lemmas ~\ref{lem:bi-red-ti-prob}, ~\ref{lem:bi-red-ti-q}, and ~\ref{lem:bi-red-ti-ind}, the proof follows.
\end{proof}
\qed
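For a small worked illustration of the theorem (the probabilities below are chosen arbitrarily), consider a single $\bi$ block with two alternatives $a_1, a_2$ where $P(a_1) = 0.2$ and $P(a_2) = 0.3$. The reduction's probability assignment $P(x_i) = \frac{P(a_i)}{\prod_{j = 1}^{i - 1}(1 - P(x_j))}$ gives $\prob_1 = 0.2$ and $\prob_2 = \frac{0.3}{1 - 0.2} = 0.375$, and the original $\bi$ probabilities are recovered as
\[\prob_1 = 0.2 = P(a_1), \qquad \prob_2 \cdot (1 - \prob_1) = 0.375 \cdot 0.8 = 0.3 = P(a_2),\]
with neither value affected by any additional independent tuples in $\tipdb$, per ~\cref{lem:bi-red-ti-ind}.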
\subsubsection{General results for $\bi$}\label{subsubsec:bi-gen}
In general, approximating a $\bi$ using the reduction and ~\cref{alg:mon-sam} does not allow the ratio $\frac{\abs{\etree}(1,\ldots, 1)}{\rpoly(\prob_1,\ldots, \prob_\numvar)}$ to be bounded by a constant. Consider the following example.
Let the monomial $y_i = x_i \cdot \prod_{j = 1}^{i - 1}(1 - x_j)$ and let $\poly(\vct{X}) = \sum_{i = 1}^{\numvar}y_i$. Note that this query output can arise from a projection in which each tuple agrees on the projected values of the query, over a $\bi$ consisting of one block and $\numvar$ tuples.
@@ -709,3 +730,14 @@ So, then $\abs{\etree}(1,\ldots, 1) = 2^{\numvar} - 1$.
On the other hand, considering $\rpoly(\prob_1,\ldots, \prob_\numvar)$: since we are simply summing up the probabilities of one block of disjoint tuples (recall that in the reduction $P(x_i) = \frac{P(a_i)}{1\cdot\prod_{j = 1}^{i - 1}(1 - P(x_j))}$, where $P(a_i)$ is the original $\bi$ probability of alternative $a_i$), it is the case that $\rpoly(\prob_1,\ldots, \prob_\numvar) \leq 1$, and the ratio $\frac{\abs{\etree}(1,\ldots, 1)}{\rpoly(\prob_1,\ldots, \prob_\numvar)}$ in this case is exponential, i.e., $O(2^\numvar)$. Further note that setting $\poly(\vct{X}) = \sum_{i = 1}^{\numvar} y_i^k$ yields an $O(2^{\numvar \cdot k})$ bound.
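To see the blow-up concretely on a small instance (for illustration only), take $\numvar = 3$, so that $\poly(\vct{X}) = x_1 + x_2(1 - x_1) + x_3(1 - x_1)(1 - x_2)$. Summing the absolute values of the coefficients term by term gives
\[\abs{\etree}(1, 1, 1) = 1 + (1 + 1) + (1 + 1 + 1 + 1) = 7 = 2^3 - 1,\]
while $\rpoly(\prob_1, \prob_2, \prob_3) = P(a_1) + P(a_2) + P(a_3) \leq 1$, so the ratio is already at least $7$ and grows as $2^\numvar$ with each additional tuple in the block.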
\subsubsection{Sufficient Condition for $\bi$ for linear time Approximation Algorithm}
Let us introduce a sufficient condition on $\bipdb$ for a linear time approximation algorithm.
\begin{Lemma}\label{lem:bi-suf-cond}
For $\bipdb$ with fixed block size $\abs{b}$, the ratio $\frac{\abs{\etree}(1,\ldots, 1)}{\rpoly(\prob_1,\ldots, \prob_\numvar)}$ is a constant.
\end{Lemma}
\begin{proof}[Proof of Lemma ~\ref{lem:bi-suf-cond}]
For increasing $\numvar$ and fixed block size $\abs{b}$ in $\bipdb$, given the query $\poly = \sum_{i = 1}^{\numvar} y_i$ where $y_i = x_i \cdot \prod_{j = 1}^{i - 1} (1 - x_j)$ ranges over the tuples of each block, i.e., a query whose output is the maximum possible output, it has to be the case, as seen in ~\cref{subsubsec:bi-gen}, that for each block $b$, $\rpoly(\prob_{b, 1},\ldots, \prob_{b, \abs{b}}) = P(a_{b, 1}) + P(a_{b, 2}) + \cdots + P(a_{b, \abs{b}})$ for alternatives $a_{b, i}$ in $\bipdb$. As long as there exists no block in $\bipdb$ whose alternatives' probabilities sum to $0$ (which by definition of $\bi$ should be the case), we can bound $\rpoly(\prob_{b, 1},\ldots, \prob_{b, \abs{b}}) \geq \prob_0 > 0$ for some constant $\prob_0$, and summing over the $\frac{\numvar}{\abs{b}}$ blocks yields $\rpoly(\prob_1,\ldots, \prob_\numvar) \geq \prob_0 \cdot \frac{\numvar}{\abs{b}}$. Since each block contributes at most $2^{\abs{b}} - 1$ to $\abs{\etree}(1,\ldots, 1)$, we also have $\abs{\etree}(1,\ldots, 1) \leq \frac{\numvar}{\abs{b}} \cdot \left(2^{\abs{b}} - 1\right)$, and thus $\frac{\abs{\etree}(1,\ldots, 1)}{\rpoly(\prob_1,\ldots, \prob_\numvar)} \leq \frac{2^{\abs{b}} - 1}{\prob_0}$ is indeed a constant.
\end{proof}
\qed
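To make the constant concrete (the numbers here are illustrative only): for block size $\abs{b} = 2$ and a per-block lower bound $\prob_0 = \frac{1}{2}$, the bound above gives
\[\frac{\abs{\etree}(1,\ldots, 1)}{\rpoly(\prob_1,\ldots, \prob_\numvar)} \leq \frac{2^{2} - 1}{1/2} = 6,\]
independent of $\numvar$, which is the kind of constant ratio that the linear time approximation relies on.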