More work on 'safe' queries over BIDB

2020-10-01 14:38:40 -04:00 · 4749bd3652
parent f53e0bd69f
commit 4749bd3652
2 changed files with 170 additions and 18 deletions
--- a/approx_alg.tex
+++ b/approx_alg.tex
@ -760,10 +760,28 @@ Given a $\bipdb$ satisfying ~\cref{lem:bi-suf-cond}, it is the case by ~\cref{le

 We may be able to get a better run time by developing a separate approximation algorithm for the case of $\bi$.  Instead of performing the reduction from $\bi \mapsto \poly(\ti)$, we decide to work with the original variable annotations given to each tuple alternative in $\bipdb$.  For clarity, let us assume the notation of $\bivar$ for the annotation of a tuple alternative.  The algorithm yields $0$ for any sampled monomial that cannot exist in $\bipdb$ due to the disjoint property characterizing $\bi$.  The semantics for $\rpoly$ change in this case.  $\rpoly$ not only performs the same modding function, but also sets all monomial terms to $0$ if they contain distinct variables which appear within the same block.

-\begin{Definition}[$\rpoly$ for $\bi$ Data Model]\label{bialg:rpoly}
-$\rpoly(\vct{X})$ over the $\bi$ data model is redefined to include the following mod operation in addition to definition ~\ref{def:qtilde}.  For every $j \neq i$, we add the operation $\mod X_{\block, i}\cdot X_{\block, j}$.  For set of blocks $\vct{b}$ and the size of block $\block$ as $\abs{\block}$,
+Before redefining $\rpoly$ in terms of the $\bi$ model, we need to define the notion of performing a mod operation with a set of polynomials.

-\[\rpoly(\vct{X}) = \poly(\vct{X}) \mod \{X_{\block, i}^2 - X_{\block, i} \st \block \in \vct{b}, i \in [\abs{\block}]\} \cup_{\block \in \vct{b}} \{X_{\block, i}X_{\block, j} \st i, j \in [\abs{\block}], i \neq j\}
+\begin{Definition}[Mod with a set of polynomials]\label{def:mod-set-poly}
+To mod a polynomial $\poly$ by a set $\vct{Z} = \{Z_1,\ldots, Z_x\}$ of polynomials, the mod operation is applied successively to $\poly$, modding out each element of $\vct{Z}$ in turn.
+\end{Definition}
+
+\begin{Example}\label{example:mod-set-poly}
+To illustrate, for $\poly = X_1^2 + X_1X_2^3$ and the set $\vct{Z} = \{X_1^2 - X_1, X_2^2 - X_2, X_1X_2\}$, we get
+
+\begin{align*}
+&X_1^2 + X_1X_2^3 \mod X_1^2 - X_1 \mod X_2^2 - X_2 \mod X_1X_2\\
+=&X_1 + X_1X_2^3 \mod X_2^2 - X_2 \mod X_1X_2\\
+=&X_1 + X_1X_2 \mod X_1X_2\\
+=&X_1
+\end{align*}
+
+\end{Example}
+
+\begin{Definition}[$\rpoly$ for $\bi$ Data Model]\label{def:bi-alg-rpoly}
+$\rpoly(\vct{X})$ over the $\bi$ data model is redefined to include the following mod operation in addition to definition ~\ref{def:qtilde}.  For every $j \neq i$, we add the operation $\mod X_{\block, i}\cdot X_{\block, j}$.  For the set of blocks $\mathcal{B}$, with $\abs{\block}$ denoting the size of block $\block$,
+
+\[\rpoly(\vct{X}) = \poly(\vct{X}) \mod \{X_{\block, i}^2 - X_{\block, i} \st \block \in \mathcal{B}, i \in [\abs{\block}]\} \cup_{\block \in \mathcal{B}} \{X_{\block, i}X_{\block, j} \st i, j \in [\abs{\block}], i \neq j\}
 % \mod X_{\block_1, 1}^2 - X_{\block_1, 1} \cdots \mod X_{\block_k, \abs{\block_k}}^2 - X_{\block_k, \abs{\block_k}} \mod X_{b_1, 1} \cdot X_{b_1, 2}\cdots \mod X_{\block_1, \abs{\block_1} -1} \cdot X_{\block, \abs{\block_1}}\cdots \mod X_{\block_k, 1} \cdot X_{\block_k, 2} \cdots \mod X_{\block_k, \abs{\block_k} - 1}\cdot X_{\block_K, \abs{\block_k}}.
 \]
 \end{Definition}
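+
+As a hedged illustration of ~\cref{def:bi-alg-rpoly} (the two blocks $b_1, b_2$ and the polynomial below are assumed purely for this sketch and do not come from a particular query), take $\poly(\vct{X}) = X_{b_1, 1}^2 + X_{b_1, 1}X_{b_1, 2} + X_{b_1, 1}X_{b_2, 1}$.  Modding by the set in ~\cref{def:bi-alg-rpoly} should give
+
+\begin{align*}
+\rpoly(\vct{X}) &= X_{b_1, 1} + 0 + X_{b_1, 1}X_{b_2, 1}\\
+&= X_{b_1, 1} + X_{b_1, 1}X_{b_2, 1},
+\end{align*}
+
+since $X_{b_1, 1}^2$ reduces to $X_{b_1, 1}$, the cross term $X_{b_1, 1}X_{b_1, 2}$ lies within block $b_1$ and is set to $0$, and $X_{b_1, 1}X_{b_2, 1}$ spans two different blocks and is left untouched.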
@ -812,23 +830,156 @@ $\rpoly(\vct{X})$ over the $\bi$ data model is redefined to include the followin
 	\end{algorithmic}
 \end{algorithm}

-We want to analyze the class of queries that are necessary to guarantee that $\frac{\abs{\etree}(1,\ldots, 1)}{\rpoly(\prob_{1},\ldots, \prob_{\numvar})}$ is $O(1)$.
+\subsection{Safe Query Class for $\bi$}
+We want to analyze the class of queries and data restrictions that are necessary to guarantee that $\frac{\abs{\etree}(1,\ldots, 1)}{\rpoly(\prob_{1},\ldots, \prob_{\numvar})}$ is $O(1)$.

-The condition that causes $\rpoly(\prob_1,\ldots, \prob_\numvar)$ to be $0$ is when all the output tuples in each block cancel each other out.  Such occurs when the annotations of each output tuple break the required $\bi$ property that tuples in the same block must be disjoint.
+First, consider the case when $\rpoly$ cancels out all terms in $\poly$, where $\poly \neq \emptyset$.  For $\rpoly$ to cancel out a tuple $\tup$, by ~\cref{def:bi-alg-rpoly} it must be the case that output tuple $\tup$ is dependent on two different tuples appearing in the same block.  For this condition to occur, it must be that the query $\poly$ contains a self join operation on a table $\rel$, from which $\tup$ has been derived.

-The observation is then the following.  In order for such a condition to occur, we must have a query that is a self-join such that the join is on two different sets of atoms for each block.  This condition can occur when inner query operations with different constraints on input table $\rel$ produce two non-intersecting sets of tuples and then perform a self join on them, such that the join condition \textit{only} holds for tuples that are members of the same block.
+Certain conditions on both the data and query must exist for all tuples $\tup$ to be cancelled out by $\rpoly$ as described above.

-There are two operators that can produce the aforementioned selectivity.  First, consider $\sigma$, where two different selection conditions $\theta$ over $\rel$ can output sets $S_{\sigma_{\theta_1}}$ and $S_{\sigma_{\theta_2}}$ where $S_{\sigma_{\theta_1}} \cap S_{\sigma_{\theta_2}} = \emptyset$.  A join over these two output can produce an ouput $\poly$ where all annotations will be disjoint and $\rpoly$ will effectively cancel them all out.  Second, consider the projection operator $\pi$, such that projections over $\rel$ which project on different attributes can output two non-intersecting sets of tuples, which when joined, again, provided that the join condition holds only for tuples appearing in the same block, output tuples will break the disjoint requirement and $\rpoly$ will cancel them out.
+For $\rpoly$ to be $0$, the data of a $\bi$ table must satisfy certain conditions.

-\begin{Example}
-Consider the following table $\rel$ with the following queries $\poly_1 = \sigma_{A = 1}(\rel)\bowtie_{A = B} \sigma_{A = 2}(\rel)$ and $\poly_2 = \pi_{A}(\rel) \bowtie_{A = B} \pi_{B}(\rel)$.  The output of both queries results in $\rpoly = 0$.
+\begin{Definition}[Data Restrictions]\label{def:bi-qtilde-data}
+Consider a $\bi$ table $\rel$.  For $\rpoly$ to potentially cancel all its terms, $\rel$ must be such that, given a self join, the join constraints remain unsatisfied for all tuple combinations $x_{\block_i, \ell} \times x_{\block_j, \ell'}$ for $i \neq j$, $\ell \in [\abs{\block_i}], \ell' \in [\abs{\block_j}]$, i.e., combinations across different blocks.  Note that this is trivially satisfied when $\rel$ is composed of just one block.  Further, it must be the case that the self join constraint is only satisfied in one or more cross term combinations $x_{\block, i} \times x_{\block, j}$ for $i \neq j$, i.e., within the same block of the input data.
+\end{Definition}
+
+To be precise, only equijoins are considered in the following definition.  Before proceeding, note that a natural self join will never result in $\rpoly$ cancelling all terms, since each tuple will necessarily join with itself, and $\rpoly$ will not mod out this case.  Also, although we use the term self join, we consider cases in which query operations over $\rel$ might be performed on each join input prior to the join operation.  While technically the inputs may not be the same set of tuples, this case must be considered, since all the tuples originate from the table $\rel$.  To this end, let $\poly_1(\rel) = S_1$ and $\poly_2(\rel) = S_2$ be the input tables to the join operation.
+\begin{Definition}[Class of Cancelling Queries]\label{def:bi-qtilde-query-class}
+When ~\cref{def:bi-qtilde-data} is satisfied, it must be that $\poly$ contains a self join that satisfies one of the following sets of characteristics.  \begin{enumerate}
+	\item When the join condition $\theta$ involves equality between matching attributes, it must be that the attributes of the join condition $\attr{\theta}$ are a strict subset of $\attr{\rel}$.  Then, to satisfy ~\cref{def:bi-qtilde-data} it must be that the join input consists of non-intersecting strict subsets of $\rel$, meaning $S_1 \cap S_2 = \emptyset$ and $S_1, S_2 \neq \emptyset$.  $\poly_1$ in ~\cref{ex:bi-tildeq-0}  illustrates this condition.
+	\item If $\theta$ involves an equality on non-matching attributes, there exist two cases.  
+	\begin{enumerate}
+		\item The first case consists of when the join inputs intersect, i.e., $S_1 \cap S_2 \neq \emptyset$ .  To satisfy ~\cref{def:bi-qtilde-data} it must be the case that no tuple can exist with agreeing values across all attributes in $\attr{\theta}$.  $\poly_3$ of ~\cref{ex:bi-tildeq-0} demonstrates this condition.
+		\item The second case consists of when $S_1 \cap S_2 = \emptyset$ and $S_1, S_2 \neq \emptyset$ in the join input, and this case does not contradict the requirements of ~\cref{def:bi-qtilde-data}.  This case is illustrated in $\poly_2$ of ~\cref{ex:bi-tildeq-0}.
+	\end{enumerate}
+\end{enumerate}% , cause $\rpoly$ to be $0$ must have the following characteristics.  First, there must be a self join.  Second, prior to the self join, there must be operations that produce non-intersecting sets of tuples for each block in $\bi$ as input to the self join operation.
+\end{Definition}
+
+Note then that the class of queries described in ~\cref{def:bi-qtilde-query-class} belongs to the set of queries containing some form of selection over a self cross product.
+%\begin{proof}[Proof of Lemma ~\ref{lem:bi-qtilde-data}]
+%\end{proof}
+%\begin{proof}[Proof of Lemma ~\ref{lem:bi-qtilde-query-class}]
+%\end{proof}
+
+
+%%%%%%%%%%%%%%%%%%%%%%%
+
+%The condition that causes $\rpoly(\prob_1,\ldots, \prob_\numvar)$ to be $0$ is when all the output tuples in each block cancel each other out.  Such occurs when the annotations of each output tuple break the required $\bi$ property that tuples in the same block must be disjoint.  This can only occur for the case when a self-join outputs tuples each of which have been joined to another tuple from its block other than itself.
+%
+%The observation is then the following.  In order for such a condition to occur, we must have a query that is a self-join such that the join is on two different sets of atoms for each block.  This condition can occur when inner query operations with different constraints on input table $\rel$ produce two non-intersecting sets of tuples and then performs a self join on them, such that the join condition \textit{only} holds for tuples that are members of the same block.
+%
+%There are two operators that can produce the aforementioned selectivity.  First, consider $\sigma$, where two different selection conditions $\theta_1$ and $\theta_2$ over $\rel$ can output sets $S_{\sigma_{\theta_1}}$ and $S_{\sigma_{\theta_2}}$ where $S_{\sigma_{\theta_1}} \cap S_{\sigma_{\theta_2}} = \emptyset$.  A join over these two outputs can produce an ouput $\poly$ where all annotations will be disjoint and $\rpoly$ will effectively cancel them all out.  Second, consider the projection operator $\pi$, such that projections over $\rel$ which project on different attributes can output two non-intersecting sets of tuples, which when joined, again, provided that the join condition holds only for tuples appearing in the same block, can output tuples all of which will break the disjoint requirement and $\rpoly$ will cancel them out.
+
+\begin{Example}\label{ex:bi-tildeq-0}
+Consider the $\bi$ table $\rel$ of ~\cref{fig:bi-ex-table}, consisting of one block, with the following queries: $\poly_1 = \sigma_{A = 1}(\rel)\bowtie_{B = B'} \sigma_{A = 2}(\rel)$, $\poly_2 = \sigma_{A = 1}(\rel)\bowtie_{A = B'} \sigma_{A = 2}(\rel)$, and $\poly_3 = \rel \bowtie_{A = B} \rel$.  While the outputs satisfy $\poly_i \neq \emptyset$, each query has $\rpoly_i = 0$.  Since $\rel$ consists of only one block, we will use single indexing over the annotations.
+\end{Example}
+
+
+\begin{figure}[ht]
+	\begin{tabular}{ c | c c c }
+		\rel & A & B & $\phi$\\
+		\hline
+		& 1 & 2 & $x_1$\\
+		& 2 & 1 & $x_2$\\
+		& 1 & 3 & $x_3$\\
+		& 3 & 1 & $x_4$\\
+	\end{tabular}
+	\caption{Example~\ref{ex:bi-tildeq-0} Table $\rel$}
+	\label{fig:bi-ex-table}
+\end{figure}
+%%%%%%%%%%Query 1 and 2
+\begin{figure}[ht]
+	\begin{subfigure}{0.2\textwidth}
+		\centering
+		\begin{tabular}{ c | c c c }
+			$\sigma_{\theta_{A = 1}}(\rel )$& A & B & $\phi$\\
+			\hline
+			& 1 & 2 & $x_1$\\
+			& 1 & 3 & $x_3$\\
+		\end{tabular}
+		\caption{$\poly_1, \poly_2$ First Selection}
+		\label{subfig:bi-q1-sigma1}
+	\end{subfigure}
+	\begin{subfigure}{0.2\textwidth}
+		\centering
+		\begin{tabular}{ c | c c c}
+			$\sigma_{\theta_{A = 2}}(\rel)$ & A & B' & $\phi$\\
+			\hline
+			& 2 & 1 & $x_2$\\
+		\end{tabular}
+		\caption{$\poly_1, \poly_2$ Second Selection}
+		\label{subfig:bi-q1-sigma2}
+	\end{subfigure}
+	\begin{subfigure}{0.25\textwidth}
+		\centering
+		\begin{tabular}{ c | c c c c c}
+			$\poly_1(\rel)$ & $A_R$ & $B_R$ & $A_{\rel'}$ & $B_{\rel'}$ & $\phi$\\
+			\hline
+			& 1 & 2 & 2 & 1 & $x_1x_2$\\
+		\end{tabular}
+		\caption{$\poly_1(\rel)$ Output}
+		\label{subfig:bi-q1-output}
+	\end{subfigure}
+	\begin{subfigure}{0.4\textwidth}
+		\centering
+		\begin{tabular}{ c | c c c c c}
+			$\poly_2(\rel)$ & $A_R$ & $B_R$ & $A_{\rel'}$ & $B_{\rel'}$ & $\phi$\\
+			\hline
+			& 1 & 2 & 2 & 1 & $x_1x_2$\\
+			& 1 & 3 & 2 & 1 & $x_2x_3$\\
+		\end{tabular}
+		\caption{$\poly_2(\rel)$ Output}
+		\label{subfig:bi-q2-output}
+	\end{subfigure}
+	\caption{$\poly_1, \poly_2(\rel)$}
+	\label{fig:bi-q1-q2}
+\end{figure}
+%%%%%%%%%%%Query 3
+\begin{figure}[ht]
+%	\begin{subfigure}{0.2\textwidth}
+%		\centering
+%		\begin{tabular}{ c | c  c }
+%			$\pi_{A}(\rel)$ & A & $\phi$\\
+%			\hline
+%			& 1 & $x_1$\\
+%			& 2 & $x_2$\\
+%			& 1 & $x_3$\\
+%			& 3 & $x_4$\\
+%		\end{tabular}
+%		\caption{$\poly_3$ First Projection}
+%		\label{subfig:bi-q3-pi1}
+%	\end{subfigure}
+%	\begin{subfigure}{0.2\textwidth}
+%		\centering
+%		\begin{tabular}{ c | c  c }
+%			$\pi_{B}(\rel)$ & B & $\phi$\\
+%			\hline
+%			& 2 & $x_1$\\
+%			& 1 & $x_2$\\
+%			& 3 & $x_3$\\
+%			& 1 & $x_4$\\
+%		\end{tabular}
+%		\caption{$\poly_3$ Second Projection}
+%		\label{subfig:bi-q3-pi2}
+%	\end{subfigure}
+	\begin{subfigure}{0.2\textwidth}
+		\centering
+		\begin{tabular}{ c | c c c c c }
+			$\poly_3(\rel)$ & A & B & $A_{\rel'}$ & $B_{\rel'}$ & $\phi$\\
+			\hline
+			& 1 & 2& 2 & 1 & $x_1x_2$\\
+			& 1 & 2 & 3 & 1 & $x_1x_4$\\
+			& 2 & 1 & 1 & 2 & $x_1x_2$\\
+			& 1 & 3 & 2 & 1 & $x_2x_3$\\
+			& 1 & 3 & 3 & 1 & $x_3x_4$\\
+			& 3 & 1 & 1 & 3 & $x_3x_4$\\
+		\end{tabular}
+		\caption{$\poly_3(\rel)$ Output}
+		\label{subfig:bi-q3-output}
+	\end{subfigure}
+	\caption{$\poly_3(\rel)$}
+	\label{fig:bi-q3}
+\end{figure}
+
+Note that in each of ~\cref{subfig:bi-q1-output}, ~\cref{subfig:bi-q2-output}, and ~\cref{subfig:bi-q3-output}, every output tuple has an annotation containing cross terms from the same block, so by ~\cref{def:bi-alg-rpoly} $\rpoly$ will eliminate all tuples output by the respective queries.
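+
+As a sketch of this cancellation for $\poly_2$ (assuming its polynomial is simply the sum of the annotations listed in ~\cref{subfig:bi-q2-output}), we would have
+
+\begin{align*}
+\poly_2 &= x_1x_2 + x_2x_3\\
+\rpoly_2 &= 0 + 0 = 0,
+\end{align*}
+
+because $x_1$, $x_2$, and $x_3$ all annotate tuples of the single block of $\rel$, so each monomial is a product of two distinct variables from the same block and is modded out by ~\cref{def:bi-alg-rpoly}.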

-\begin{tabular}{ c | c c }
-\rel & A & B\\
-\hline
-& 1 & 2\\
-& 2 & 1\\
-& 1 & 3\\
-& 3 & 1\\
-\end{tabular}
-\end{Example}
--- a/macros.tex
+++ b/macros.tex
@ -24,6 +24,7 @@
 \newcommand{\project}{\pi}
 \newcommand{\union}{\cup}
 \newcommand{\sch}{sch}
+\newcommand{\attr}[1]{attr\left(#1\right)}
 \newcommand{\rw}{\textbf{W}}%\rw for random world
 \newcommand{\graph}[1]{G^{(#1)}}
 \newcommand{\eset}[1]{S^{(#1)}} %edge set for arbitrary subgraph