
% -*- root: main.tex -*-
\section{Problem Definition}
\label{sec:prob-def}
Our work involves overcoming the exponential computation time required to compute world existence for a given tuple $\tup$ by estimating the same quantity in polynomial time to within an $(\epsilon, \delta)$ range.  We employ the technique of sketching to obtain these results.

The setting to which our work applies is as follows.  First, we are given a database $\db$.  We limit ourselves to positive queries.  A positive query $\query$ is a query composed from the following set of operators: selection ($\selection$), projection ($\projection$), join/cross-product ($\join$), and union ($\union$), abbreviated as SPJU.  Given database $\db$, a query $\query$ operates on all the rows of the tables it references.

Since our problem space involves estimating the $\kDom$ value for a given world of an incomplete/probabilistic database, we are particularly interested in the projection, union, and join operators.  Because $\selection$ only removes tuples from a query output, rather than combining or merging tuples as its counterparts do, we do not need to consider $\selection$ further.

We can picture each tuple as carrying its own annotation, which records the tuple's $\kDom$ value in each of the possible worlds.  Assuming, for example, a tuple-independent database, this annotation could be a vector $\genV$ of size $\numWorlds$, where $\numWorlds$ is the number of possible worlds.  Each index $i$ of $\genV$ holds the $\kDom$ value for the $i^{th}$ world.

In the above setting, consider the query $\query = \projection(R\join S\join T\join U)$.  The output of the 4-way join consists of the tuples that satisfy the join conditions of every $\join$ operation.  To calculate world membership, the annotation vectors of the input tuples that make up each output tuple are multiplied component-wise.  This is equivalent to taking the Hadamard product of four vectors for each tuple in the join output.  The final $\projection$ operation then sums the vectors of those join-output tuples whose projected attributes share the same value(s).
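
As a sketch of this computation, the annotation of an output tuple $\tup$ of $\query$ can be written in closed form.  The notation here is assumed for illustration only: $\genV_{x}$ denotes the annotation vector of a tuple $x$, $\odot$ denotes the Hadamard product, and $J_{\tup}$ denotes the set of combinations $(r,s,t,u)\in R\times S\times T\times U$ that satisfy the join conditions and project onto $\tup$.
\[
  \genV_{\tup} \;=\; \sum_{(r,s,t,u)\in J_{\tup}} \genV_{r}\odot\genV_{s}\odot\genV_{t}\odot\genV_{u}
\]
For a hypothetical instance with $\numWorlds = 3$ in which exactly two join combinations project onto $\tup$, with made-up annotation values, the computation would read
\[
  \underbrace{(1,0,2)\odot(1,1,1)\odot(2,1,0)\odot(1,1,1)}_{=\,(2,0,0)}
  \;+\;
  \underbrace{(0,1,1)\odot(1,1,1)\odot(1,2,1)\odot(1,1,1)}_{=\,(0,2,1)}
  \;=\; (2,2,1),
\]
so that index $i$ of the result holds the $\kDom$ value of $\tup$ in the $i^{th}$ world.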

Such a query setting generalizes to a sum-of-products operation over the tuple vectors $\genV$.