set to bag

2021-09-20 08:54:17 -05:00 · 2021-09-20 08:54:17 -05:00 · b1ffd39f78
parent 626e86536f
commit b1ffd39f78
2 changed files with 33 additions and 13 deletions
--- a/app_set_to_bag_pdb.tex
+++ b/app_set_to_bag_pdb.tex
@@ -1,8 +1,28 @@

-\section{Generalizing Results Beyond set-TIDBs}
+\section{Generalizing Beyond Set Inputs}
 \label{sec:gener-results-beyond}

-For results for \abbrTIDBs, we assumed a model of \abbrTIDBs where each input tuple is assigned a probability of having multiplicity $1$.
+\subsection{\abbrTIDB{}s}
+\label{sec:abbrtidbs}
+
+For our results on \abbrTIDBs, we assumed a model of \abbrTIDBs where each input tuple is assigned a probability $p$ of having multiplicity $1$. That is, we assumed inputs to be sets, but interpreted queries under bag semantics. Other sensible generalizations of \abbrTIDBs from sets to bags exist.
+
+One important such generalization is to assign each input tuple $\tup$ a multiplicity $m_\tup$ and a probability $p$: with probability $p$ the tuple exists with multiplicity $m_\tup$, and otherwise it has multiplicity $0$. If the maximal multiplicity of all tuples in the \abbrTIDB is bounded by some constant, then our hardness results and approximation algorithm can be generalized by changing the construction of lineage polynomials as follows:
+
+\begin{align*}
+  \polyqdt{\rel}{\dbbase}{\tup} =&\begin{cases}
+                                           		m_\tup X_\tup & \text{if }\dbbase.\rel\inparen{\tup} = m_\tup \\
+                                           		0		 &\text{otherwise.}\end{cases}
+\end{align*}
+That is, the variable representing a tuple is multiplied by $m_\tup$ to encode the tuple's multiplicity.
+
+Yet another option would be to assign each tuple a probability distribution over multiplicities. It seems clear that our results would not extend to a model that allows arbitrary probability distributions for this purpose. However, we would like to note that the special case of a normal distribution over multiplicities can be handled as follows: we add an additional identifier attribute to each relation in the database. For a tuple $\tup$ with maximal multiplicity $m_\tup$, we create $m_\tup$ copies of $\tup$ with different identifiers. To answer a query over this encoding, we first project away the identifier attribute.
+
+\subsection{\abbrBIDB{}s}
+\label{sec:abbrbidbs}
+
+The approach described above works for \abbrBIDB{}s as well if we define the bag version of \abbrBIDB{}s to associate each tuple $\tup$ with a multiplicity $m_\tup$. Recall that we associate each tuple in a block with a unique variable. Thus, the modified lineage polynomial construction shown above can be applied to \abbrBIDB{}s too.
+


 %%% Local Variables:
--- a/intro-rewrite-070921.tex
+++ b/intro-rewrite-070921.tex
@@ -85,11 +85,11 @@ In this work, we study the complexity of \Cref{prob:bag-pdb-poly-expected} for s

 \mypar{\abbrTIDB\xplural}
 We initially focus on tuple-independent probabilistic bag-databases\footnote{See \cite{DBLP:series/synthesis/2011Suciu} for a survey of set-\abbrTIDBs; the bag encoding is analogous~\cite{DBLP:conf/pods/GreenKT07}.} (\abbrTIDB\xplural), a compressed encoding of probabilistic databases where the presence of each individual tuple (out of a total of $\numvar$ input tuples) in a possible world is modeled as an independent probabilistic event.\footnote{
-  This model is exactly the definition of \abbrTIDB{}s \cite{VS17} under classical set semantics.
-  Mirroring the implementation of bag relations in production database systems (e.g., Postgresql, DB2), tuple multiplicities are modeled by retaining copies of each tuple (up to its largest possible multiplicity).
-  % To make each duplicate tuple unique in a set-\abbrTIDB we can assign unique keys across all duplicates.
-  When the multiplicity of input tuple is bound by some constant,
-  the increased input size is negligible.\label{footnote:set-not-limit}
+  This model is exactly the definition of \abbrTIDB{}s \cite{VS17} under set semantics. Note that this is only one possible definition of \abbrTIDB{}s under bag semantics. In \Cref{sec:gener-results-beyond} we discuss alternatives and the degree to which our results extend to them.
+  % Mirroring the implementation of bag relations in production database systems (e.g., Postgresql, DB2), tuple multiplicities are modeled by retaining copies of each tuple (up to its largest possible multiplicity).
+  % % To make each duplicate tuple unique in a set-\abbrTIDB we can assign unique keys across all duplicates.
+  % When the multiplicity of input tuple is bound by some constant,
+  % the increased input size is negligible.\label{footnote:set-not-limit}
 }
 % OK: I tidied things up a touch.
 %\BG{The footnote is still a bit hard to follow I think, but I do not have a great suggestion on how to improve it.}