More changes @atri, @okennedy, @lordpretzel 021121 comments.

2022-02-14 12:28:41 -05:00 · 2022-02-14 12:28:41 -05:00 · 360920a8ec
parent 47fbb0da36
commit 360920a8ec
5 changed files with 72 additions and 55 deletions
--- a/circuits-model-runtime.tex
+++ b/circuits-model-runtime.tex
@@ -41,17 +41,17 @@ For these algorithms, $\jointime{R_1, \ldots, R_n}$ is linear in the {\em AGM bo
 \noindent\resizebox{1\linewidth}{!}{
 \begin{minipage}{1.0\linewidth}
  \begin{align*}
-    \qruntimenoopt{R,\db}                    & = |\db.R| &
-    \qruntimenoopt{\sigma \query, \db}       & = \qruntimenoopt{\query,\db} &
-    \qruntimenoopt{\pi \query, \db}          & = \qruntimenoopt{\query,\db} + \abs{\query(\db)}
+    \qruntimenoopt{R,\db,\bound}                    & = |\db.R| &
+    \qruntimenoopt{\sigma \query, \db,\bound}       & = \qruntimenoopt{\query,\db,\bound} &
+    \qruntimenoopt{\pi \query, \db,\bound}          & = \qruntimenoopt{\query,\db,\bound} + \abs{\query(\db)}
  \end{align*}\\[-15mm]
  \begin{align*}
-    \qruntimenoopt{\query \cup \query', \db} & = \qruntimenoopt{\query, \db} + 
-                                            \qruntimenoopt{\query', \db} +
+    \qruntimenoopt{\query \cup \query', \db,\bound} & = \qruntimenoopt{\query, \db,\bound} + 
+                                            \qruntimenoopt{\query', \db,\bound} +
                                            \abs{\query(D)}+\abs{\query'(D)} \\
-    \qruntimenoopt{\query_1 \bowtie \ldots \bowtie \query_m, \db} 
-                                        & = \qruntimenoopt{\query_1, \db} + \ldots + 
-                                            \qruntimenoopt{\query_m,\db} + 
+    \qruntimenoopt{\query_1 \bowtie \ldots \bowtie \query_m, \db,\bound} 
+                                        & = \qruntimenoopt{\query_1, \db,\bound} + \ldots + 
+                                            \qruntimenoopt{\query_m,\db,\bound} + 
                                            \jointime{\query_1(\db), \ldots, \query_m(\db)}
 \end{align*}
 \end{minipage}
@@ -63,7 +63,7 @@ We assume that full table scans are used for every base relation access. We can
 %Observe that 
 % () .\footnote{This claim can be verified by e.g. simply looking at the {\em Generic-Join} algorithm in~\cite{skew} and {\em factorize} algorithm in~\cite{factorized-db}.} It can be verified that the above cost model on the corresponding $\raPlus$ join queries correctly captures the runtime of current best known .

-Finally, \Cref{lem:circ-model-runtime} and \Cref{lem:tlc-is-the-same-as-det} show that for any $\raPlus$ query $\query$ and $\dbbase$, there exists a circuit $\circuit^*$ such that $\timeOf{\abbrStepOne}(Q,\dbbase,\circuit^*)$ and $|\circuit^*|$ are both $O(\qruntimenoopt{Q, \dbbase})$. Recall we assumed these two bounds when we moved from \Cref{prob:big-o-joint-steps} to \Cref{prob:intro-stmt}.
+Finally, \Cref{lem:circ-model-runtime} and \Cref{lem:tlc-is-the-same-as-det} show that for any $\raPlus$ query $\query$ and $\tupset$, there exists a circuit $\circuit^*$ such that $\timeOf{\abbrStepOne}(Q,\tupset,\circuit^*)$ and $|\circuit^*|$ are both $O(\qruntimenoopt{Q, \tupset,\bound})$. Recall we assumed these two bounds when we moved from \Cref{prob:big-o-joint-steps} to \Cref{prob:intro-stmt}.
 %
 %We now make a simple observation on the above cost model:
 %\begin{proposition}
--- a/intro-rewrite-070921.tex
+++ b/intro-rewrite-070921.tex
@@ -4,8 +4,8 @@

 \secrev{
 This work explores the problem of computing the expectation of a tuple's multiplicity in an important special case of bag \abbrTIDB, which we call a \abbrCTIDB.  A \abbrCTIDB,
-$\pdb = \inparen{\worlds, \bpd}$ encodes a bag of uncertain tuples such that each tuple in $\pdb$ has a multiplicity of at most $\bound$.  $\tupset$ is the set of tuples appearing across all possible worlds, and the set of all worlds is encoded in $\worlds$, which is the set of all vectors of length $\numvar=\abs{\tupset}$ such that each index corresponds to a distinct $\tup \in \tupset$ storing its multiplicity. $\bpd$ is a product distribution over the set of all worlds.  A given world $\worldvec \in\worlds$ can be interpreted such that, for each $\tup \in \tupset$, $\worldvec\pbox{\tup}$ is the multiplicity of $\tup$ in $\worldvec$.  The resulting product distribution can then be encoded as $\prob_{\tup} = \probOf\pbox{W\pbox{\tup} = j}$ (for $j \in\pbox{\bound}$), where each %distribution 
-$\tup$ is an independent random event. %for $\tup \in \tupset$.
+$\pdb = \inparen{\worlds, \bpd}$ encodes a bag of uncertain tuples such that each tuple in $\pdb$ has a multiplicity of at most $\bound$.  $\tupset$ is the set of tuples appearing across all possible worlds, and the set of all worlds is encoded in $\worlds$, which is the set of all vectors of length $\numvar=\abs{\tupset}$ such that each index corresponds to a distinct $\tup \in \tupset$ storing its multiplicity. $\bpd$ is a product distribution over the set of all worlds.  A given world $\worldvec \in\worlds$ can be interpreted such that, for each $\tup \in \tupset$, $\worldvec\pbox{\tup}$ is the multiplicity of $\tup$ in $\worldvec$.  The resulting product distribution can then be encoded as $\prob_{\tup, j} = \probOf\pbox{W\pbox{\tup} = j}$ (for $j \in\pbox{\bound}$), where each tuple-multiplicity combination $\inparen{\tup, j} \in \tupset\times\pbox{\bound}$ %distribution 
+is an independent random event. %for $\tup \in \tupset$.
 }  
 %\mypar{For a later section}
 %\sout{
@@ -18,7 +18,11 @@ In this work, since we are generally considering bag query input, we will only b
 We can formally state our problem of computing the expected multiplicity of a result tuple as:

 \begin{Problem}\label{prob:expect-mult}
-Given a \abbrCTIDB $\pdb = \inparen{\worlds, \bpd}$, $\raPlus$ query $\query$, and result tuple $\tup$, compute the expected multiplicity of $\tup$: $\expct_{\rvworld\sim\bpd}\pbox{\query\inparen{\rvworld}\inparen{\tup}}$.
+Given a \abbrCTIDB $\pdb = \inparen{\worlds, \bpd}$, $\raPlus$ query $\query$%
+\footnote{%
+A query $\query$ is an $\raPlus$ query if it is composed entirely of one or more of the positive relational operators $\inset{\select, \project, \join, \union}$.%
+}%
+, and result tuple $\tup$, compute the expected multiplicity of $\tup$: $\expct_{\rvworld\sim\bpd}\pbox{\query\inparen{\rvworld}\inparen{\tup}}$.
 \end{Problem}

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%
@@ -99,13 +103,13 @@ Given a \abbrCTIDB $\pdb = \inparen{\worlds, \bpd}$, $\raPlus$ query $\query$, a
 %	\label{fig:ctidb-red}
 %\end{figure}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-It is natural to explore computing the expected multiplicity of result tuple as this is the analog for computing the marginal probability of a tuple in a set \abbrPDB.
+It is natural to explore computing the expected multiplicity of a result tuple as this is the analog for computing the marginal probability of a tuple in a set \abbrPDB.
 In this work we will assume that $c =\bigO{1}$ since this is what typically seen in practice.
 %because of the cancellation effect of queries over a $1$-\abbrBIDB (introduced later), where, for the worst case, a self join query, we would have a factor of $\frac{1}{c^{n-1}}$ cancellations. 
 Allowing for unbounded $c$ is an interesting open problem.

 \mypar{Hardness of Set Query Semantics and Bag Query Semantics}
-Set query evaluation semantics over $1$-\abbrTIDB\xplural have been studied extensively, and the data complexity of the problem in general has been shown by Dalvi and Suicu to be \sharpphard\cite{10.1145/1265530.1265571}.  For our setting, there exists a trivial polytime algorithm to compute~\Cref{prob:expect-mult} for any query over a \abbrCTIDB due to linearity of expection by siimply computing the expectation over a `sum-of-products' representation of the query operations of $\query\inparen{\pdb}\inparen{\tup}$.  %We discuss polynomial representation and equivalence in the following subsection.  
+Set query evaluation semantics over $1$-\abbrTIDB\xplural have been studied extensively, and the data complexity of the problem in general has been shown by Dalvi and Suciu to be \sharpphard\cite{10.1145/1265530.1265571}.  For our setting, there exists a trivial polytime algorithm to compute~\Cref{prob:expect-mult} for any $\raPlus$ query over a \abbrCTIDB due to linearity of expectation by simply computing the expectation over a `sum-of-products' representation of the query operations of $\query\inparen{\pdb}\inparen{\tup}$.  %We discuss polynomial representation and equivalence in the following subsection.  
 Since we can compute~\Cref{prob:expect-mult} in polynomial time, the interesting question that we explore deals with analyzing the hardness of computing expectation using fine-grained analysis and parameterized complexity, where we are interested in the exponent of polynomial runtime.
 }

@@ -122,7 +126,10 @@ Specifically, in this work we ask if~\Cref{prob:expect-mult} can be solved in ti
 %Define $\gentupset$ to be the set of tuples appearing across all the possible worlds of a $\abbrCTIDB$, formally $\gentupset = \inset{\tup_i ~|~ \forall \worldvec \in \worlds,~\forall i \in \abs{\tupset}:~\worldvec\pbox{i} > 0}$.  When a specific $\pdb = \inparen{\worlds, \bpd}$ is being referred to, we will use $\tupset$ to denote the set of tuples.
 %\end{Definition}

-Let $T_{det}\inparen{\query, \gentupset, \bound} = \query\inparen{\gentupset}$ for arbitrary query $\query$, deterministic database $\gentupset$, and multiplicity bound $c$.  Let $\qruntime{\query, \gentupset, \bound} = \min_{\query':\query'\equiv\query}T_{det}\inparen{\query, \gentupset, \bound}$ be the optimal runtime (with some caveats; discussed in~\Cref{sec:gen}) of query $\query$ on deterministic database $\tupset$.
+Let $\qruntime{\optquery{\query},\gentupset,\bound}$ denote the runtime for query $\optquery{\query}$, deterministic database $\gentupset$, and multiplicity bound $\bound$.  Since we consider $\raPlus$ queries in which the order of operators can impact runtime, we denote the optimal query as $\optquery{\query} = \operatorname*{arg\,min}_{\query'\in\raPlus, \query'\equiv\query}\qruntime{\query', \gentupset, \bound}$.
+%let $\qruntim{\optquery{\query}, \gentupset, \bound} = \min_{\query'\in\raPlus,~\query'\equiv\query}T_{det}\inparen{\query, \gentupset, \bound}$ be the runtime for the optimally structured equivalent $\raPlus$ query $\query'$ (with some caveats; discussed in~\Cref{sec:gen}). % of query $\query$ on deterministic database $\tupset$.
+%{\newline\noindent\centerline{\Huge \textcolor{black}{Or instead$\ldots$}}}
+%\newline\noindent Let $T_{det}\inparen{\query, \gentupset, \bound}$ denote the runtime for $\raPlus$ query $\query$, deterministic database $\gentupset$, and multiplicity bound $\bound$.  Since this paper does not consider optimization schemes, we leave optimization to the reader and show that our results hold across all inputs.

 %We make this runtime concrete later on.
 %We denote by $\dbbase$ the base \abbrCTIDB table containing all possible tuples, formally as,
@@ -135,33 +142,33 @@ Let $T_{det}\inparen{\query, \gentupset, \bound} = \query\inparen{\gentupset}$ f
 \hline
 Lower bound on $\timeOf{}^*(\query,\pdb)$ & Num. $\bpd$s & Hardness Assumption\\
 \hline
-$\Omega\inparen{\inparen{\qruntime{\query, \tupset, \bound}}^{1+\eps_0}}$ for {\em some} $\eps_0>0$ & Single & Triangle Detection hypothesis\\
+$\Omega\inparen{\inparen{\qruntime{\optquery{\query}, \tupset, \bound}}^{1+\eps_0}}$ for {\em some} $\eps_0>0$ & Single & Triangle Detection hypothesis\\
 %\hline
-$\omega\inparen{\inparen{\qruntime{\query, \tupset, \bound}}^{C_0}}$ for {\em all} $C_0>0$ & Multiple &$\sharpwzero\ne\sharpwone$\\
+$\omega\inparen{\inparen{\qruntime{\optquery{\query}, \tupset, \bound}}^{C_0}}$ for {\em all} $C_0>0$ & Multiple &$\sharpwzero\ne\sharpwone$\\
 %\hline
-$\Omega\inparen{\inparen{\qruntime{\query, \tupset, \bound}}^{c_0\cdot k}}$ for {\em some} $c_0>0$ & Multiple & \Cref{conj:known-algo-kmatch}\\ %Multiple & Current $k$-matching algorithms\\
+$\Omega\inparen{\inparen{\qruntime{\optquery{\query}, \tupset, \bound}}^{c_0\cdot k}}$ for {\em some} $c_0>0$ & Multiple & \Cref{conj:known-algo-kmatch}\\ %Multiple & Current $k$-matching algorithms\\
 \hline
 \end{tabular}
-\caption{Our lower bounds for a specific hard query $\query$ parameterized by $k$. For $\pdb = \inset{\worlds, \bpd}$ those with `Multiple' in the second column need the algorithm to be able to handle multiple $\bpd$ (for a given $\tupset$). The last column states the hardness assumptions that imply the lower bounds in the first column ($\eps_o,C_0,c_0$ are constants that are independent of $k$).}
+\caption{Our lower bounds for a specific hard query $\query$ parameterized by $k$. For $\pdb = \inparen{\worlds, \bpd}$ those with `Multiple' in the second column need the algorithm to be able to handle multiple $\bpd$, i.e., probability distributions (for a given $\tupset$). The last column states the hardness assumptions that imply the lower bounds in the first column ($\eps_0,C_0,c_0$ are constants that are independent of $k$).}
 \label{tab:lbs}
 \end{table}
 \mypar{Our lower bound results}
-Our question is whether or not it is always true that $\timeOf{}^*\inparen{\query, \pdb}\leq\qruntime{\query, \tupset, \bound}$.  Unfortunately this is not the case.  
+Our question is whether or not it is always true that $\timeOf{}^*\inparen{\query, \pdb}\leq\qruntime{\optquery{\query}, \tupset, \bound}$.  Unfortunately this is not the case.  
 ~\Cref{tab:lbs} shows our results.%our lower bounds for computing~\Cref{prob:expect-mult} on \abbrCTIDB\xplural.  

-Specifically, depending on what hardness result/conjecture we assume, we get various emphatic versions of {\em no} as an answer to our question.  To make some sense of the other lower bounds in Table~\ref{tab:lbs}, we note that it is not too hard to show that $\timeOf{}^*(Q,\pdb) \le  O\inparen{\inparen{\qruntime{Q, \tupset, \bound}}^k}$, where $k$ is the join width (our notion of join width follows from~\Cref{def:degree-of-poly} and~\Cref{fig:nxDBSemantics}.) of the query $\query$ over all result tuples $\tup$ (and the parameter that defines our family of hard queries).
+Specifically, depending on what hardness result/conjecture we assume, we get various emphatic versions of {\em no} as an answer to our question.  To make some sense of the other lower bounds in Table~\ref{tab:lbs}, we note that it is not too hard to show that $\timeOf{}^*(Q,\pdb) \le  \bigO{\inparen{\qruntime{\optquery{\query}, \tupset, \bound}}^k}$, where $k$ is the join width (our notion of join width follows from~\Cref{def:degree-of-poly} and~\Cref{fig:nxDBSemantics}) of the query $\query$ over all result tuples $\tup$ (and the parameter that defines our family of hard queries).

 What our lower bound in the third row says is that one cannot get more than a polynomial improvement over essentially the trivial algorithm for~\Cref{prob:expect-mult}.
- However, this result assumes a hardness conjecture that is not as well studied as those in the first two rows of the table (see \Cref{sec:hard} for more discussion on the hardness assumptions). Further, we note that existing results already imply the claimed lower bounds if we were to replace the $\qruntime{\query, \tupset, \bound}$ by just $\numvar$ (indeed these results follow from known lower bound for deterministic query processing). Our contribution is to then identify a family of hard queries where deterministic query processing is `easy' but computing the expected multiplicities is hard. 
+ However, this result assumes a hardness conjecture that is not as well studied as those in the first two rows of the table (see \Cref{sec:hard} for more discussion on the hardness assumptions). Further, we note that existing results already imply the claimed lower bounds if we were to replace $\qruntime{\optquery{\query}, \tupset, \bound}$ by just $\numvar$ (indeed these results follow from known lower bounds for deterministic query processing). Our contribution is to then identify a family of hard queries where deterministic query processing is `easy' but computing the expected multiplicities is hard. 

-\mypar{Our upper bound results} We introduce an $(1\pm \epsilon)$-approximation algorithm that computes ~\Cref{prob:expect-mult} in time $O_\epsilon\inparen{\qruntime{\query, \tupset, \bound}}$.  This means, when we are okay with approximation, that we solve~\Cref{prob:expect-mult} in time linear in the size of the deterministic query %$\timeOf{Approx}^*\inparen{\query, \pdb}\leq\qruntime{\query,\tupset,\bound}$ (where $\timeOf{Approx}^*\inparen{\cdot}$ denotes runtime of approximation algorithm), 
+\mypar{Our upper bound results} We introduce a $(1\pm \epsilon)$-approximation algorithm that solves~\Cref{prob:expect-mult} in time $O_\epsilon\inparen{\qruntime{\optquery{\query}, \tupset, \bound}}$.  This means, when we are okay with approximation, that we solve~\Cref{prob:expect-mult} in time linear in the size of the deterministic query %$\timeOf{Approx}^*\inparen{\query, \pdb}\leq\qruntime{\optquery{\query},\tupset,\bound}$ (where $\timeOf{Approx}^*\inparen{\cdot}$ denotes runtime of approximation algorithm), 
 and bag \abbrPDB\xplural are deployable in practice.
 % In particular, we show the following upper bound results.
 %(i) We show that e.g. for a circuit representation of the lineage polynomial (more on this later), when the circuit is a tree and there is a single
 % result tuple, we also have the same runtime  (we can also handle the case of multiple result tuples\footnote{We can approximate the expected result tuple multiplicities (for all result tuples {\em simultanesouly}) with only $O(\log{Z})=O_k(\log{n})$ overhead (where $Z$ is the number of result tuples) over the runtime of a broad class of query processing algorithms (see \Cref{app:sec-cicuits}).}).
 %Further, we show that for {\em any} $\raPlus$ query on a \abbrTIDB $(1$-$\abbrTIDB)$, we also obtain linear runtime for approximation.
 % the approximation algorithm has runtime linear in the size of the compressed lineage encoding (
-In contrast, known approximation techniques (\cite{DBLP:conf/icde/OlteanuHK10,DBLP:journals/jal/KarpLM89}) in set-\abbrPDB\xplural need time $\Omega(\qruntime{\query, \tupset, \bound}^{2k})$ %, where $\circuit$ is a representation of the query operations and input to produce $\tup$; more on this shortly. 
+In contrast, known approximation techniques (\cite{DBLP:conf/icde/OlteanuHK10,DBLP:journals/jal/KarpLM89}) in set-\abbrPDB\xplural need time $\Omega(\qruntime{\optquery{\query}, \tupset, \bound}^{2k})$ %, where $\circuit$ is a representation of the query operations and input to produce $\tup$; more on this shortly. 
 (see \Cref{sec:karp-luby}).
 Further, our approximation algorithm works for a more general notion of bag \abbrPDB\xplural beyond \abbrCTIDB\xplural
 %we generalize the \abbrPDB data model considered by the approximation algorithm to a class of bag-Block Independent Disjoint Databases 
@@ -203,12 +210,12 @@ multiplicity of the polynomial $\apolyqdt$ (i.e., $\expct_{\vct{W}\sim \pdassign
 %where $\pdassign$ is the distribution induced by $\pd$ on the relevant assignments $\vct{W}$ to variables of $\apolyqdt$.
 \end{Problem}
 We note that computing \Cref{prob:expect-mult} 
-is equivalent to computing \Cref{prob:bag-pdb-poly-expected} (see \Cref{prop:expection-of-polynom}).
+is equivalent to (yields the same result as) computing \Cref{prob:bag-pdb-poly-expected} (see \Cref{prop:expection-of-polynom}).
 %In this work, we study the complexity of \Cref{prob:bag-pdb-poly-expected} for several models of probabilistic databases and various encodings of such polynomials.
 }

 \secrev{
-All of our results rely on working with a {\em reduced} form of the lineage polynomial $\poly$. In fact, it turns out that for the $1$-\abbrTIDB case, computing the expected multiplicity (over bag query semantics) is {\em exactly} the same as evaluating this reduced polynomial over the probabilities that define the $1$-\abbrTIDB.  This is also true when the query input(s) is a block independent disjoint probabilistice database (with tuple multiplicity of at most $1$), which we refer to as a $1$-\abbrBIDB. 
+All of our results rely on working with a {\em reduced} form $\inparen{\rpoly}$ of the lineage polynomial $\poly$. In fact, it turns out that for the $1$-\abbrTIDB case, computing the expected multiplicity (over bag query semantics) is {\em exactly} the same as evaluating this reduced polynomial over the probabilities that define the $1$-\abbrTIDB.  This is also true when the query input(s) is a block independent disjoint probabilistic database (with tuple multiplicity of at most $1$), which we refer to as a $1$-\abbrBIDB. 
 % For our results to be applicable to \abbrCTIDB\xplural, we introduce the following reduction.
 %\begin{Definition}
 %Any \abbrCTIDB $\pdb$, can be reduced to an equivalent $1$-\abbrBIDB $\pdb'$ in the following manner.  For each $\tup_i \in \tupset$, create a block of $\bound + 1$ disjoint \abbrBIDB tuples in $\pdb'$ such that each tuple in the newly formed block is mapped to its own boolean variable $X_{i, j}$ for $i \in \abs{D}$ and $j \in \pbox{c+1}$.  Then, given $\worldvec \in \worlds$, the equivalent world in $\pdb'$ will set each variable $X_{i, j} = 1$ for each $\worldvec\pbox{i} = j$, while $\inparen{\text{for }\ell \neq j}$ all other $X_{i, \ell} \in \vct{X}$ of $\pdb'$ are set to $0$.
@@ -229,15 +236,15 @@ The lineage polynomial for $Q_1^2$ is given by $\poly_1^2\inparen{A, B, C, E, X,
 $$
 =A^2X^2B^2 + B^2Y^2E^2 + B^2Z^2C^2 + 2AXB^2YE + 2AXB^2ZC + 2B^2YEZC.
 $$
-To compute $\expct\pbox{\poly_1^2}$ we can use linearity of expectation and push the expectation through each summand.  To keep things simple, let us focus on the monomial $\poly_1^{\inparen{ABX}^2} = A^2X^2B^2$ as the procedure is the same for all other monomials of $\poly_1^2$.  Let $\randWorld_X$ be the random variable corresponding to a lineage variable $X$. Because the distinct variables in the product are independent, we can push expectation through them yielding $\expct\pbox{\randWorld_A^2\randWorld_X^2\randWorld_B^2}=\expct\pbox{\randWorld_A^2}\expct\pbox{\randWorld_X^2}\expct\pbox{\randWorld_B^2}$.  Since $\randWorld_A, \randWorld_B\in \inset{0, 1}$ we can further derive $\expct\pbox{\randWorld_A}\expct\pbox{\randWorld_X^2}\expct\pbox{\randWorld_B}$ by the fact that for any $W\in \inset{0, 1}$, $W^2 = W$.  However, we get stuck with $\expct\pbox{\randWorld_X^2}$, since $\randWorld_X\in\inset{0, 1, 2}$ and for $\randWorld_X \gets 2$, $\randWorld_X^2 \neq \randWorld_X$.
+To compute $\expct\pbox{\poly_1^2}$ we can use linearity of expectation and push the expectation through each summand.  To keep things simple, let us focus on the monomial $\poly_1^{\inparen{ABX}^2} = A^2X^2B^2$ as the procedure is the same for all other monomials of $\poly_1^2$.  Let $\randWorld_X$ be the random variable corresponding to a lineage variable $X$. Because the distinct variables in the product are independent, we can push expectation through them yielding $\expct\pbox{\randWorld_A^2\randWorld_X^2\randWorld_B^2}=\expct\pbox{\randWorld_A^2}\expct\pbox{\randWorld_X^2}\expct\pbox{\randWorld_B^2}$.  Since $\randWorld_A, \randWorld_B\in \inset{0, 1}$ we can further derive $\expct\pbox{\randWorld_A}\expct\pbox{\randWorld_X^2}\expct\pbox{\randWorld_B}$ by the fact that for any $W\in \inset{0, 1}$, $W^2 = W$.  Observe that if $X\in\inset{0, 1}$, then we further would have $\expct\pbox{\randWorld_A}\expct\pbox{\randWorld_X}\expct\pbox{\randWorld_B} = \prob_A\cdot\prob_X\cdot\prob_B$ (denoting $\probOf\pbox{\randWorld_A = 1} = \prob_A$) $= \rpoly_1^{\inparen{ABX}^2}\inparen{\prob_A, \prob_X, \prob_B}$ (see $ii)$ of~\Cref{def:reduced-poly}).  However, in this example, we get stuck with $\expct\pbox{\randWorld_X^2}$, since $\randWorld_X\in\inset{0, 1, 2}$ and for $\randWorld_X \gets 2$, $\randWorld_X^2 \neq \randWorld_X$.

 %the expectation is $\expct\pbox{A^2X^2B^2} = A\cdot\prob_A\cdot\inparen{\sum\limits_{i \in [2]}X_i\cdot \prob_{X, i}}\cdot B\prob_B$ for $X \in \inset{0, 1, 2}$.  
 
-Denote the variables of $\poly$ to be $\vars{\poly}.$  In the \abbrCTIDB setting, $\poly\inparen{\vct{X}}$ has an equivalent reformulation $\inparen{\refpoly{}}$ that is of use to us.  Given $X_\tup \in\vars{\poly}$, by definition $X_\tup \in\inset{0,\ldots, c}$.  We can replace $X_\tup$ by $\sum_{j\in\pbox{\bound}}X_{\tup, j}$ where each $X_{\tup, j}\in\inset{0, 1}$.  Then for any $\worldvec\in\worlds$, we set $X_{\tup, j} = 1$ for $\worldvec_\tup = j$, while $X_{\tup, j'} = 0$ for all $j'\neq j\in\pbox{\bound}$.  By construction then $\poly\inparen{\vct{X}}\equiv\refpoly{}\inparen{\vct{X}}$ since for any $X_\tup\in\vars{\poly}$ we have the equality $X_\tup = j = \sum_{j\in\pbox{\bound}}jX_j$.
+Denote the variables of $\poly$ to be $\vars{\poly}.$  In the \abbrCTIDB setting, $\poly\inparen{\vct{X}}$ has an equivalent reformulation $\inparen{\refpoly{}}$ that is of use to us.  Given $X_\tup \in\vars{\poly}$, by definition $X_\tup \in\inset{0,\ldots, \bound}$.  We can replace $X_\tup$ by $\sum_{j\in\pbox{\bound}}jX_{\tup, j}$ where each $X_{\tup, j}\in\inset{0, 1}$.  Then for any $\worldvec\in\worlds$, we set $X_{\tup, j} = 1$ for $\worldvec_\tup = j$, while $X_{\tup, j'} = 0$ for all $j'\neq j\in\pbox{\bound}$.  By construction then $\poly\inparen{\vct{X}}\equiv\refpoly{}\inparen{\vct{X_R}}$ $\inparen{\vct{X_R} = \vars{\refpoly{}}}$ since for any $X_\tup\in\vars{\poly}$ we have the equality $X_\tup = \sum_{j\in\pbox{\bound}}jX_{\tup, j}$.

 Considering again our example, 
 \begin{multline*}
-\refpoly{1, }^{\inparen{ABX}^2}\inparen{A, X, B} =  \poly^{\inparen{AXB}^2}\inparen{\sum_{j_1\in\pbox{\bound}}j_1A_{j_1}, \sum_{j_2\in\pbox{\bound}}j_2X_{j_2}, \sum_{j_3\in\pbox{\bound}}j_3B_{j_3}} \\
+\refpoly{1, }^{\inparen{ABX}^2}\inparen{A, X, B} =  \poly_1^{\inparen{AXB}^2}\inparen{\sum_{j_1\in\pbox{\bound}}j_1A_{j_1}, \sum_{j_2\in\pbox{\bound}}j_2X_{j_2}, \sum_{j_3\in\pbox{\bound}}j_3B_{j_3}} \\
 = \inparen{\sum_{j_1\in\pbox{\bound}}j_1A_{j_1}}^2\inparen{\sum_{j_2\in\pbox{\bound}}j_2X_{j_2}}^2\inparen{\sum_{j_3\in\pbox{\bound}}j_3B_{j_3}}^2.
 \end{multline*}
 Since the set of multiplicities for tuple $\tup$ by nature are disjoint we can drop all cross terms and have $\refpoly{1, }^2 = \sum_{j_1, j_2, j_3 \in \pbox{\bound}}j_1^2A^2_{j_1}j_2^2X_{j_2}^2j_3^2B^2_{j_3}$. Computing  expectation we get $\expct\pbox{\refpoly{1, }^2}=\sum_{j_1,j_2,j_3\in\pbox{\bound}}j_1^2j_2^2j_3^2\expct\pbox{\randWorld_{A_{j_1}}}\expct\pbox{\randWorld_{X_{j_2}}}\expct\pbox{\randWorld_{B_{j_3}}}$, since we now have that all $\randWorld_{X_j}\in\inset{0, 1}$.
@@ -306,7 +313,7 @@ $, where $\probAllTup = \inparen{\inparen{\prob_{\tup, j}}_{\tup\in\tupset, j\in
 \secrev{
 \subsection{Our Techniques}
 \mypar{Lower Bound Proof Techniques}
-Our main hardness result shows that computing~\Cref{prob:expect-mult} is $\sharpwonehard$ for $1$-\abbrTIDB. To prove this result we show that for the same $\query_1$ from the example above, for an arbitrary `product width' $k$, the query $Q^k$ is able to encode various hard graph-counting problems (assuming $\bigO{\numvar}$ tuples rather than the $O(1)$ tuples in \Cref{fig:two-step}).
+Our main hardness result shows that computing~\Cref{prob:expect-mult} is $\sharpwonehard$ for $1$-\abbrTIDB. To prove this result we show that for the same $\query_1$ from the example above, for an arbitrary `product width' $k$, the query $Q^k$ is able to encode various hard graph-counting problems (assuming $\bigO{\numvar}$ tuples rather than the $\bigO{1}$ tuples in \Cref{fig:two-step}).
 We do so by considering an arbitrary graph $G$ (analogous to relation $\boldsymbol{R}$ of $\query$) and analyzing how the coefficients in the (univariate) polynomial $\widetilde{\poly}\left(p,\dots,p\right)$ relate to counts of subgraphs in $G$ that are isomorphic to various graphs with $k$ edges. E.g., we exploit the fact that the leading coefficient in $\poly$ corresponding to $\query^k$ is proportional to the number of $k$-matchings in $G$, a known hard problem in parameterized/fine-grained complexity literature.

 \mypar{Upper Bound Techniques}
@@ -315,20 +322,20 @@ Our negative results (\Cref{tab:lbs}) indicate that \abbrCTIDB{}s (even for $\bo
 \input{two-step-model}
 We adopt the two-step intensional model of query evaluation used in set-\abbrPDB\xplural, as illustrated in \Cref{fig:two-step}:
 (i) \termStepOne (\abbrStepOne): Given input $\tupset$ and $\query$, output every tuple $\tup$ that possibly satisfies $\query$, annotated with its lineage polynomial ($\poly(\vct{X})=\apolyqdt\inparen{\vct{X}}$);
-(ii) \termStepTwo (\abbrStepTwo): Given $\poly(\vct{X})$ for each tuple, compute $\expct\pbox{\poly(\vct{\randWorld})}$.
+(ii) \termStepTwo (\abbrStepTwo): Given $\poly(\vct{X})$ for each tuple, compute $\expct_{\vct{\randWorld}\sim\bpd}\pbox{\poly(\vct{\randWorld})}$.
 Let $\timeOf{\abbrStepOne}(Q,\tupset,\circuit)$ denote the runtime of \abbrStepOne when it outputs $\circuit$ (which is a representation of $\poly$ as an arithmetic circuit --- more on this representation shortly).
 Denote by $\timeOf{\abbrStepTwo}(\circuit, \epsilon)$ (recall $\circuit$ is the output of \abbrStepOne) the runtime of \abbrStepTwo, which we can leverage~\Cref{def:reduced-poly} and~\Cref{lem:tidb-reduce-poly} to address the next formal objective: % to formally define our objective:

 \begin{Problem}[\abbrCTIDB linear time approximation]\label{prob:big-o-joint-steps}
 Given \abbrCTIDB $\pdb$, $\raPlus$ query $\query$,
 is there a $(1\pm\epsilon)$-approximation of $\expct_{\rvworld\sim\bpd}\pbox{\query\inparen{\rvworld}\inparen{\tup}}$ for all result tuples $\tup$ where
-$\exists \circuit : \timeOf{\abbrStepOne}(Q,\tupset, \circuit) + \timeOf{\abbrStepTwo}(\circuit, \epsilon) \le O_\epsilon(\qruntime{Q, \tupset, \bound})$?
+$\exists \circuit : \timeOf{\abbrStepOne}(Q,\tupset, \circuit) + \timeOf{\abbrStepTwo}(\circuit, \epsilon) \le O_\epsilon(\qruntime{\optquery{\query}, \tupset, \bound})$?
 \end{Problem}

-We show in \Cref{sec:circuit-depth} an $O(\qruntime{Q, \tupset, \bound})$ algorithm for constructing the lineage polynomial for all result tuples of an $\raPlus$ query $\query$ (or more more precisely, a single circuit $\circuit$ with one sink per tuple representing the tuple's lineage).
+We show in \Cref{sec:circuit-depth} an $\bigO{\qruntime{\optquery{\query}, \tupset, \bound}}$ algorithm for constructing the lineage polynomial for all result tuples of an $\raPlus$ query $\query$ (or more precisely, a single circuit $\circuit$ with one sink per tuple representing the tuple's lineage).
 A key insight of this paper is that the representation of $\circuit$ matters.
-For example, if we insist that $\circuit$ represent the lineage polynomial in \abbrSMB, the answer to the above question in general is no, since then we will need $\abs{\circuit}\ge \Omega\inparen{\inparen{\qruntime{\query, \tupset, \bound}}^k}$,
-and hence, just $\timeOf{\abbrStepOne}(Q,\tupset,\circuit)$ will be too large.
+For example, if we insist that $\circuit$ represent the lineage polynomial in \abbrSMB, the answer to the above question in general is no, since then we will need $\abs{\circuit}\ge \Omega\inparen{\inparen{\qruntime{\optquery{\query}, \tupset, \bound}}^k}$,
+and hence, just $\timeOf{\abbrStepOne}(\query,\tupset,\circuit)$ will be too large.

 However, systems can directly emit compact, factorized representations of $\poly(\vct{X})$ (e.g., as a consequence of the standard projection push-down optimization~\cite{DBLP:books/daglib/0020812}).
 For example, in~\Cref{fig:two-step}, $B(Y+Z)$ is a factorized representation of the SMB-form $BY+BZ$.
@@ -337,9 +344,9 @@ Accordingly, this work uses (arithmetic) circuits\footnote{
 }
 as the representation system of $\poly(\vct{X})$.

-Given that there exists a representation $\circuit^*$ such that $\timeOf{\abbrStepOne}(\query,\tupset,\circuit^*)\le O(\qruntime{\query, \tupset, \bound})$, we can now focus on the complexity of \abbrStepTwo.
+Given that there exists a representation $\circuit^*$ such that $\timeOf{\abbrStepOne}(\query,\tupset,\circuit^*)\le \bigO{\qruntime{\optquery{\query}, \tupset, \bound}}$, we can now focus on the complexity of \abbrStepTwo.
 We can represent the factorized lineage polynomial by its corresponding arithmetic circuit $\circuit$ (whose size we denote by $|\circuit|$).
-As we also show in \Cref{sec:circuit-runtime}, this size is also bounded by $\qruntime{\query, \tupset, \bound}$ (i.e., $|\circuit^*| \le O(\qruntime{\query, \tupset, \bound})$).
+As we also show in \Cref{sec:circuit-runtime}, this size is also bounded by $\qruntime{\optquery{\query}, \tupset, \bound}$ (i.e., $|\circuit^*| \le \bigO{\qruntime{\optquery{\query}, \tupset, \bound}}$).
 Thus, the question of approximation %\Cref{prob:big-o-joint-steps} 
 can be stated as the following stronger (since~\Cref{prob:big-o-joint-steps} has access to \emph{all} equivalent \circuit representing $\query\inparen{\vct{W}}\inparen{\tup}$), but sufficient condition:
 \begin{Problem}\label{prob:intro-stmt}
--- a/macros.tex
+++ b/macros.tex
@ -141,7 +141,7 @@
 \newcommand{\tupset}{D}
 \newcommand{\gentupset}{\overline{D}}
 \newcommand{\world}{\inset{0,\ldots, c}}
-\newcommand{\worldvec}{\vct{M}}
+\newcommand{\worldvec}{\vct{W}}
 \newcommand{\worlds}{\world^\tupset}
 \newcommand{\bpd}{\mathcal{P}}%bpd for bag probability distribution
 %BIDB
@ -211,6 +211,7 @@
 %Instance Variables
 \newcommand{\prob}{p}
 \newcommand{\wElem}{w} %an element of \vct{w}
+\newcommand{\worldinst}{W}
 %Polynomial Variables
 \newcommand{\pVar}{X}%<----not used but recomment instituting this--pVar for polyVar
 \newcommand{\kElem}{k}%the kth element<---where and how are we using this?
@ -333,8 +334,9 @@
 \newcommand{\sharpwonehard}{\#{\sf W}[1]-hard\xspace}
 \newcommand{\ptime}{{\sf PTIME}\xspace}
 \newcommand{\timeOf}[1]{T_{#1}}
-\newcommand{\qruntime}[1]{T^*_{det}(#1)}
-\newcommand{\qruntimenoopt}[1]{T_{det}\inparen{#1}}
+\newcommand{\qruntime}[1]{T_{det}\inparen{#1}}
+\newcommand{\optquery}[1]{\func{OPT}\inparen{#1}}
+\newcommand{\qruntimenoopt}[1]{T_{det}\inparen{#1}}%need to get rid of this--needs to be propagated
 \newcommand{\jointime}[1]{T_{join}(#1)}
 \newcommand{\kmatchtime}{T_{match}\inparen{k, G}}

--- a/mult_distinct_p.tex
+++ b/mult_distinct_p.tex
@ -3,7 +3,7 @@
 \section{Hardness of Exact Computation}
 \label{sec:hard}
 In this section, we will prove the hardness results claimed in Table~\ref{tab:lbs} for a specific family of hard instances $(\query,\pdb)$ for \Cref{prob:bag-pdb-poly-expected} where $\pdb$ is a $1$-\abbrTIDB.
- Note that this implies hardness for \abbrCTIDB\xplural $\inparen{\bound\geq1}$, \bis and general \abbrBPDB, showing \Cref{prob:bag-pdb-poly-expected} cannot be done in $\bigO{\qruntime{\query,\tupset}}$ runtime.
+ Note that this implies hardness for \abbrCTIDB\xplural $\inparen{\bound\geq1}$, \bis and general \abbrBPDB, showing \Cref{prob:bag-pdb-poly-expected} cannot be done in $\bigO{\qruntime{\optquery{\query},\tupset,\bound}}$ runtime.
 %(and hence the equivalent \Cref{prob:bag-pdb-query-eval})
 %in the negative. 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@ -55,24 +55,24 @@ as $R_i$ for each $i \in [k]$.  The query $\query^k$ then becomes
 \begin{lstlisting}
 SELECT COUNT(*) FROM $R_1$ JOIN $R_2$ JOIN$\cdots$JOIN $R_k$
 \end{lstlisting}          
-\noindent Further, the \abbrCTIDB instance of~\Cref{fig:two-step} generalizes to one compatible to~\Cref{def:qk} as follows. Relation $T$ has $n$ tuples corresponding to each vertex for $i$ in $[n]$, each with probability $\prob_i$ and $R$ has tuples corresponding to the edges $\edgeSet$ (each with probability of $1$).\footnote{Technically, $\poly_{G}^\kElem(\vct{X})$ should have variables corresponding to tuples in $R$ as well, but since they always are present with probability $1$, we drop those. Our argument also works when all the tuples in $R$ also are present with probability $\prob$ but to simplify notation we assign probability $1$ to edges.}
+\noindent Consider again the \abbrCTIDB instance $\pdb$ of~\Cref{fig:two-step} and, for our hard instance, let $\bound = 1$.  $\pdb$ generalizes to one compatible with~\Cref{def:qk} as follows. Relation $T$ has $n$ tuples, one for each vertex $i$ in $[n]$, each with probability $\prob_i$, and $R$ has tuples corresponding to the edges $\edgeSet$ (each with probability $1$).\footnote{Technically, $\poly_{G}^\kElem(\vct{X})$ should have variables corresponding to tuples in $R$ as well, but since they are always present with probability $1$, we drop those. Our argument also works when all the tuples in $R$ are present with probability $\prob$, but to simplify notation we assign probability $1$ to edges.}
 In other words, for this instance $\tupset$ contains the set of $\numvar$ unary tuples in $T$ (which corresponds to $\vset$) and $\numedge$ binary tuples in $R$ (which corresponds to $\edgeSet$).
 Note that this implies that $\poly_{G}^\kElem$ is indeed a \abbrCTIDB-lineage polynomial. % for a \abbrTIDB \abbrPDB.
-\AH{Can the proofs generalize to $2$-\abbrTIDB, as the new updated~\Cref{fig:two-step} now is?}
+
 Next, we note that the runtime for answering $\query^k$ on deterministic database $\tupset$, as defined above, is $\bigO{\numedge}$ (i.e. deterministic query processing is `easy' for this query):
 \begin{Lemma}\label{lem:tdet-om}
 Let $\query^k$ and $\tupset$ be as defined above. Then
 % of \Cref{def:qk}, the runtime 
 $\qruntimenoopt{\query^k, \tupset}$ is $\bigO{\kElem\numedge}$.
 \end{Lemma}
-\AH{Should the above be $\qruntimenoopt{}$ or $\qruntime$?}
+\AH{Should the above be $\qruntimenoopt{}$ or $\qruntime{}$?}
 \subsection{Multiple Distinct $\prob$ Values}
 \label{sec:multiple-p}
 %Unless otherwise noted, all proofs for this section are in \Cref{app:single-mult-p}.
 We are now ready to present our main hardness result.
 %
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\AH{Note that~\Cref{def:reduced-poly} has been changed, where we compute $\rpoly$ from $\refpoly{}$.}
+
 \begin{Theorem}\label{thm:mult-p-hard-result}
 Let $\prob_0,\ldots,\prob_{2k}$ be $2k + 1$ distinct values in $(0, 1]$.  Then computing $\rpoly_G^\kElem(\prob_i,\dots,\prob_i)$ (over all $i\in [2k+1]$ for arbitrary $G=(\vset,\edgeSet)$
 %and any $(2k+1)$ distinct values $\prob_i$ ($0\le i \le 2k$)
--- a/ra-to-poly.tex
+++ b/ra-to-poly.tex
@ -7,18 +7,19 @@
 %We now introduce some terminology 
 %and develop a reduced form of lineage polynomials for a \abbrBIDB or \abbrTIDB.
 %Note that 
-\secrev{A }
- polynomial over $\vct{X}=(X_1,\dots,X_n)$ with individual degree $B <\infty$ 
+\secrev{
+A polynomial over a set of variables $\vct{S}$ with $\abs{\vct{S}}=\numedge$ and individual degree $B <\infty$ 
 is formally defined as (where $c_{\vct{d}}\in \semN$): 
 \begin{equation}
  \label{eq:sop-form}
-\poly\inparen{X_1,\dots,X_n}=\secrev{\sum_{\vct{d}\in\{0,\ldots,B\}^\tupset} c_{\vct{d}}\cdot \prod_{\tup\in\tupset} X_\tup^{d_\tup}.}
+\poly\inparen{S_1,\dots,S_\numedge}=\sum_{\vct{d}\in\{0,\ldots,B\}^{\numedge}} c_{\vct{d}}\cdot \prod_{i\in\pbox{\numedge}}S_i^{d_i}.
 \end{equation}
+}
 %where $c_{\vct{d}}\in \semN$.

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{Definition}[Standard Monomial Basis]\label{def:smb}
-The term $\prod_{\tup\in\tupset} X_\tup^{d_\tup}$ in \Cref{eq:sop-form} is a {\em monomial}. A polynomial $\poly\inparen{\vct{X}}$ is in standard monomial basis (\abbrSMB) when we keep only the terms with $c_{\vct{d}}\ne 0$ from \Cref{eq:sop-form}.
+\secrev{The term $\prod_{i\in\pbox{\numedge}} S_i^{d_i}$ }in \Cref{eq:sop-form} is a {\em monomial}. A polynomial $\poly\inparen{\vct{X}}$ is in standard monomial basis (\abbrSMB) when we keep only the terms with $c_{\vct{d}}\ne 0$ from \Cref{eq:sop-form}.
 \end{Definition}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 Unless otherwise noted, we consider all polynomials to be in \abbrSMB representation. 
@ -26,8 +27,9 @@ When it is unclear, we use $\smbOf{\poly}$ to denote the \abbrSMB form of a poly

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{Definition}[Degree]\label{def:degree-of-poly}
-The degree of polynomial $\poly(\vct{X})$ is the largest \secrev{$\norm{\vct{d}}_1$}% = \sum_{\tup\in\tupset} d_\tup$ 
-such that $c_{(d_1,\dots,d_n)}\ne 0$. % maximum sum of exponents, over all monomials in $\smbOf{\poly(\vct{X})}$.
+The degree of polynomial $\poly(\vct{X})$ is the largest \secrev{$\norm{\vct{d}}_1 = \sum_{i\in\pbox{\numedge}}d_i
+$}% = \sum_{\tup\in\tupset} d_\tup$ 
+ such that $c_{\vct{d}}\ne 0$. % maximum sum of exponents, over all monomials in $\smbOf{\poly(\vct{X})}$.
 \end{Definition}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 As an example, the degree of the polynomial $X^2+2XY^2+Y^2$ is $3$.
@ -41,13 +43,19 @@ or simply lineage polynomial), if there exists a $\raPlus$ query $\query$, \abbr


 %Following the typical representation of bags in production databases, for query inputs, we will use \abbrBPDB\xplural with multiplicities $\{0, 1\}$ (see \Cref{sec:gener-results-beyond} for more on this choice).
-\subsubsection{\abbrCTIDB\xplural and \abbrOneBIDB\xplural}
+\subsection{\abbrCTIDB\xplural and \abbrOneBIDB\xplural}
 \label{subsec:tidbs-and-bidbs}
-An \textit{incomplete database} $\Omega$ is a set of deterministic databases $\omega$ called possible worlds.
+%An \textit{incomplete database} $\Omega$ is a set of deterministic databases $\worldvec$ called possible worlds.

 \noindent\secrev{
-A \abbrCTIDB $\pdb$ is a pair $\inparen{\worlds, \bpd}$ such that $\worlds$ is an incomplete database whose set of possible worlds is the $c+1^\numvar$ tuple/multiplicity combinations across all $\tup\in\tupset$, where $\abs{\tupset} = \numvar$, $\tupset = \bigcup_{\worldvec\in\worlds,~\worldvec_{\tup}\geq 1}\tup$ is the set of possible tuples across possible worlds, and $\bpd$ is a probability distribution over $\worlds$.  
-
+A block independent database (\abbrBIDB) $\pdb'$ can be viewed as a $1$-\abbrTIDB $\pdb$ with the added flexibility that each $\tup\in\tupset$ has multiple disjoint alternatives, i.e., all $\tup \in \tupset'$ are partitioned into $m$ independent blocks with the condition that tuples $\tup \in \block_i$ for $i \in \pbox{m}$ are disjoint events.  We define next a specific construction of \abbrBIDB that is useful for our work.
+\begin{Definition}[$1$-\abbrBIDB]\label{def:one-bidb}
+Define a $1$-\abbrBIDB to be the pair $\pdb' = \inparen{\prod_{\tup\in\tupset'}\inset{0, \bound_\tup}, \bpd'},$  where $\tupset'$ is the set of possible tuples such that each $\tup \in \tupset'$ has a multiplicity domain of $\inset{0, \bound_\tup}$, with $\bound_\tup \in \mathbb{N}$.  The term $\prod_{\tup\in\tupset'}\inset{0, \bound_\tup}$ is the direct product of all such multiplicity domains.  The tuples $\tup\in\tupset'$ are further partitioned into $m$ independent blocks $\block_i,~i\in\pbox{m}$ of disjoint tuples.  $\bpd'$ is the probability distribution across all worlds $\worldvec\in\prod_{\tup\in\tupset'}\inset{0,\bound_\tup}$ such that, for any two distinct tuples $\tup,~\tup'\in\block_i$, $\probOf\pbox{\worldvec_\tup>0,~\worldvec_{\tup'}>0} = 0$.
+\end{Definition}
+%A \abbrCTIDB $\pdb$ is a pair $\inparen{\worlds, \bpd}$ such that $\worlds$ is an incomplete database whose set of possible worlds is the $c+1^\numvar$ tuple/multiplicity combinations across all $\tup\in\tupset$, where $\abs{\tupset} = \numvar$, $\tupset = \bigcup_{\worldvec\in\worlds,~\worldvec_{\tup}\geq 1}\tup$ is the set of possible tuples across possible worlds, and $\bpd$ is a probability distribution over $\worlds$.  
+\begin{Definition}[$\bound$-Block Independent Disjoint Database ($\bound$-\abbrBIDB)]\label{def:bidb}
+A $\bound$-block independent database ($\bound$-\abbrBIDB) $\pdb' = \inparen{\inset{0,\ldots,\bound}^{\tupset'}, \bpd'}$ is a probabilistic database such that the set of all worlds is encoded as the set of vectors $\worldvec\in\inset{0,\ldots,\bound}^{\abs{\tupset'}}$, where $\worldvec_\tup\leq\bound$ is the multiplicity of tuple $\tup$.  $\pdb'$ requires the set of all possible tuples $\tupset' = \bigcup_{\worldvec\in\inset{0,\ldots, \bound}^{\tupset'},~\worldvec_\tup \geq 1}\tup$ to be partitioned into $m$ independent blocks $\block_i$ ($i\in\pbox{m}$) where all tuples $\tup_{i, j}\in \block_i$ are disjoint.  $\bpd'$ is the probability distribution such that $\probOf\pbox{\worldvec} = 0$ for every $\worldvec\in\inset{0,\ldots,\bound}^{\tupset'}$ with $\worldvec_{\tup_{i, j}},\worldvec_{\tup_{i, j'}}\neq 0$ for some $j\neq j'$ in some block $\block_i$, while every other $\worldvec$ has $0<\probOf\pbox{\worldvec}\leq 1$.%bpd'$ set with the all worlds set $\worlds$ and probability distribution $\bpd'$ such that $\tupset' = \bigcup_{\worldvec\in\worlds, \worldvec_\tup \geq 1}\tup$ is the set of all possible tuples for which all $\tup\in\tupset'$ can be partitioned into $\numedge$ blocks $\block_i$ where the set of tuples $\tup_j \in \block_i$ are all disjoint, while blocks $\block_i$ are independent of one another.  Each $\tup\in\tupset'$ has a multiplicity of at most $\bound$.  $\bpd'$ is the distribution such that for any $\worldvec\in\worlds$ with $\worldvec_{\tup_{i, j}}\geq 1$ and $\worldvec_{\tup_{i, j'}}\geq 1$, $j\neq j'$ in any $\block_i$ more than one tuple present from the same block $\block_i$ has probability $\probOf\pbox{\worldvec} = 0$.
+\end{Definition}
 A block independent database (\abbrBIDB) is a related probabilistic data model $\pdb=\inparen{\Omega, \bpd}$ such that the base set of tuples $\tupset = \bigcup_{\omega\in\Omega,~\tup\in\omega}\tup$ is partitioned into a set of $\numvar$ independent blocks $\inset{\inparen{\block_\tup}_{\tup\in\pbox{\numvar}}}$ such that the set of tuples $\inset{\inparen{\tup_j}_{j\in\pbox{\abs{\block}}}}$ in block $\block_\tup$ are disjoint from one another. This construction produces the set of possible worlds $\Omega$ that consists of all unique combinations of tuples in $\tupset$ with the constraint that for any $\omega\in\Omega$, no two tuples $\tup_j, \tup_{j'}, j\neq j'$ from the same block $\block_\tup$ exist together.  A $\bound$-\abbrBIDB has the further requirement that each block has a multiplicity of at most $c$.  We present a reduction that is useful in producing our results:

 \begin{Definition}[\abbrCTIDB reduction]\label{def:ctidb-reduct}