General and sufficient analysis for BIDB approximation without reduction.

2020-10-02 11:46:31 -04:00 · 2020-10-02 11:46:31 -04:00 · e930f5ac36
parent c1f9d280be
commit e930f5ac36
1 changed files with 15 additions and 1 deletions
--- a/approx_alg.tex
+++ b/approx_alg.tex
@ -281,6 +281,8 @@ Thus we have $O(\treesize(\etree)) + O(\frac{\log{\frac{1}{\conf}}}{\error^2} \c

 \qed

+\AH{Why did we drop the $k \cdot \log{k} \cdot depth(\etree)$ factor in what follows below?}
+
 \begin{proof}[Proof of Theorem \ref{lem:approx-alg}]
 %\begin{Corollary}\label{cor:adj-err}
 Setting $\error = \error \cdot \frac{\rpoly(\prob_1,\ldots, \prob_\numvar)}{\abs{\etree}(1,\ldots, 1)}$ achieves $1 \pm \epsilon$ multiplicative error bounds, in $O\left(\treesize(\etree) + \frac{\log{\frac{1}{\conf}}\cdot \abs{\etree}^2(1,\ldots, 1)}{\error^2\cdot\rpoly^2(\prob_1,\ldots, \prob_\numvar)}\right)$.
@ -725,6 +727,7 @@ Thus, by lemmas ~\ref{lem:bi-red-ti-prob}, ~\ref{lem:bi-red-ti-q}, and ~\ref{lem
 \qed

 \subsubsection{General results for $\bi$}\label{subsubsec:bi-gen}
+\AH{One thing I don't see in the argument below is that as $\numvar \rightarrow \infty$, we have that $\prob_0 \rightarrow 0$.}
 The general results of approximating a $\bi$ using the reduction and ~\cref{alg:mon-sam} do not allow for the ratio $\frac{\abs{\etree}(1,\ldots, 1)}{\rpoly(\prob_1,\ldots, \prob_\numvar)}$ to be a constant.  Consider the following example.

 Let monomial $y_i = P(x_i) \cdot \prod_{j = 1}^{i - 1}(1 - P(x_j))$.  Let $\poly(\vct{X}) = \sum_{i = 1}^{\numvar}y_i$.  Note that this query output can exist on a projection for which each tuple agrees on the projected values of the query in a $\bi$ consisting of one block and $\numvar$ tuples.
@ -748,6 +751,10 @@ Let us introduce a sufficient condition on $\bipdb$ for a linear time approximat
 For $\bipdb$ with fixed block size $\abs{b}$, the ratio $\frac{\abs{\etree}(1,\ldots, 1)}{\rpoly(\prob_1,\ldots, \prob_\numvar)}$ is a constant.
 \end{Lemma}

+\AH{Two observations.  
+\par
+1) I am not sure that the argument below is correct, as I think we would still get something exponential in the numerator $\abs{\etree}(1,\ldots, 1)$.
+\par2)  I \textit{think} a similar argument will hold however for the method of not using the reduction.}
 \begin{proof}[Proof of Lemma ~\ref{lem:bi-suf-cond}]
 For increasing $\numvar$ and fixed block size $\abs{b}$ in $\bipdb$ given query $\poly = \sum_{i = 1}^{\numvar} y_i$ where $y_i = x_i \cdot \prod_{j = 1}^{i - 1} (1 - x_j)$, a query whose output is the maximum possible output, it has to be the case as seen in ~\cref{subsubsec:bi-gen} that for each block $b$, $\rpoly(\prob_{b, 1},\ldots, \prob_{b, \abs{b}}) = P(a_{b, 1}) + P(a_{b, 2}) + \cdots + P(a_{b, \abs{b}})$ for alternatives $a_{b, i}$ in $\bipdb$.  As long as there exists no block in $\bipdb$ whose alternatives have probabilities summing to $0$ (which by definition of $\bi$ should be the case), we can bound $\rpoly(\prob_1,\ldots, \prob_\numvar) \geq \frac{\prob_0 \cdot \numvar}{\abs{\block}}$ for $\prob_0 > 0$, and then we have that $\frac{\abs{\etree}(1,\ldots, 1)}{\rpoly(\prob_1,\ldots, \prob_\numvar)}$ is indeed a constant.
 \end{proof}
@ -756,6 +763,8 @@ For increasing $\numvar$ and fixed block size $\abs{b}$ in $\bipdb$ given query

 Given a $\bipdb$ satisfying ~\cref{lem:bi-suf-cond}, it is the case by ~\cref{lem:approx-alg} that ~\cref{alg:mon-sam} runs in linear time.

+\AH{\Large \bf{092520 -- 100220 New material.}}
+
 \section{Algorithm ~\ref{alg:mon-sam} for $\bi$}

 We may be able to get a better run time by developing a separate approximation algorithm for the case of $\bi$.  Instead of performing the reduction from $\bi \mapsto \poly(\ti)$, we work with the original variable annotations given to each tuple alternative in $\bipdb$.  For clarity, let us assume the notation of $\bivar$ for the annotation of a tuple alternative.  The algorithm yields $0$ for any sampled monomial that cannot exist in $\bipdb$ due to the disjoint property characterizing $\bi$.  The semantics for $\rpoly$ change in this case.  $\rpoly$ not only performs the same modding function, but also sets all monomial terms to $0$ if they contain variables which appear within the same block.
@ -1029,5 +1038,10 @@ Consider the following $\bi$ table $\rel$ consisting of one block, with the foll
 Note that all of ~\cref{subfig:bi-q1-output}, ~\cref{subfig:bi-q2-output}, and ~\cref{subfig:bi-q3-output} each have a set of tuples, where each annotation has cross terms from its block, and by ~\cref{def:bi-alg-rpoly} $\rpoly$ will eliminate all tuples output in the respective queries.

 \subsubsection{When $\rpoly > 0$}
-\AH{General Case and Sufficient Condition for $\bi$ is in the previous section.}
+\par\AH{General Case and Sufficient Condition for $\bi$ and $\rpoly_{\bi}$ approx alg needs to be written.}
+\paragraph{General Case}
+Consider the query $\poly = \sum_{i = 1}^{\numvar}x_i$, analogous to a projection where all tuples match on the projected set of attributes, meaning $\tup_i[A] = \tup_j[A]$ for all $i, j \in [\numvar]$ such that $i \neq j$.  As $\numvar$ grows without bound, $\abs{\etree}(1,\ldots, 1) = \numvar$.  We assume that the sum of the probabilities of all $\numvar$ tuples in the block remains a constant $c$ as $\numvar$ grows.  Thus, we have that $\frac{\abs{\etree}(1,\ldots, 1)}{\rpoly(\vct{\prob})} = \frac{\numvar}{c}$, and this implies $O(\numvar)$ growth.
+% while $\rpoly(\vct{\prob}) \leq 1$, which implies that the ratio is linear, i.e., $\frac{\abs{\etree}(1,\ldots, 1)}{\rpoly(\vct{p})} = \frac{\numvar}{\numvar \cdot \prob_0} = \frac{1}{\prob_0}$ for $\prob_0 = min(\vct{\prob})$.  However, note that for $\numvar \rightarrow \infty$ it is the case that $\prob_0 \rightarrow 0$, and as $\numvar$ grows, so does $\frac{1}{\prob_0}$.  Intuitively, consider when $p_0 = \frac{1}{\numvar}$.  Then we know that the bound is $\frac{\numvar}{1}$ which is $O(\numvar)$.
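+
+As a sanity check on the general case above, here is a minimal worked instance, under the assumption (made here for illustration only) that the single block is uniform, i.e., $\prob_i = \frac{1}{\numvar}$ for every $i \in [\numvar]$, so that $c = 1$:
+\[
+\frac{\abs{\etree}(1,\ldots, 1)}{\rpoly(\vct{\prob})} = \frac{\numvar}{\sum_{i = 1}^{\numvar}\frac{1}{\numvar}} = \frac{\numvar}{1} = \numvar,
+\]
+which agrees with the $O(\numvar)$ growth of the ratio.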

+\paragraph{Sufficient Condition for $\bi$ to achieve linear approximation}
+Consider the same query $\poly = \sum_{i = 1}^{\numvar}x_i$, but this time conditioned on a fixed block size, which we denote $\abs{\block}$.  Then it is still the case that $\abs{\etree}(1,\ldots, 1) = \numvar$, but if we assume that the probabilities in every block sum to $1$, then $\rpoly(\vct{\prob}) = \frac{\numvar}{\abs{\block}}$, and this means that $\frac{\abs{\etree}(1,\ldots, 1)}{\rpoly(\vct{\prob})} = \frac{\numvar}{\frac{\numvar}{\abs{\block}}} = \abs{\block}$.  For the general case, when the probabilities of the alternatives in a block need not sum to $1$, we can lower bound $\rpoly(\vct{\prob})$ by $\frac{\numvar}{\abs{\block}} \cdot \prob_0$ for $\prob_0 = \min(\vct{\prob})$.  Note that in $\numvar \cdot \frac{\prob_0}{\abs{\block}}$, the factor $\frac{\prob_0}{\abs{\block}}$ is a constant provided $\prob_0$ does not shrink as $\numvar$ grows, and this gives an overall ratio of at most $\frac{\abs{\block}}{\prob_0} = O(1)$ as $\numvar$ increases.
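+
+To illustrate the fixed block size bound, here is a worked instance with assumed values (chosen for illustration, not taken from the analysis): $\abs{\block} = 2$ and $\prob_0 = \frac{1}{4}$, where $\prob_0$ is assumed to be independent of $\numvar$.  Since $\rpoly(\vct{\prob}) = \sum_{i = 1}^{\numvar}\prob_i \geq \frac{\numvar}{\abs{\block}} \cdot \prob_0 = \frac{\numvar}{8}$, we get
+\[
+\frac{\abs{\etree}(1,\ldots, 1)}{\rpoly(\vct{\prob})} \leq \frac{\numvar}{\frac{\numvar}{8}} = 8 = \frac{\abs{\block}}{\prob_0},
+\]
+a bound that does not depend on $\numvar$.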