set to bag
parent
626e86536f
commit
b1ffd39f78
|
@ -1,8 +1,28 @@
|
|||
|
||||
\section{Generalizing Results Beyond set-TIDBs}
|
||||
\section{Generalizing Beyond Set Inputs}
|
||||
\label{sec:gener-results-beyond}
|
||||
|
||||
For results for \abbrTIDBs, we assumed a model of \abbrTIDBs where each input tuple is assigned a probability of having multiplicity $1$.
|
||||
\subsection{\abbrTIDB{}s}
|
||||
\label{sec:abbrtidbs}
|
||||
|
||||
For our results on \abbrTIDBs, we assumed a model in which each input tuple is assigned a probability $p$ of having multiplicity $1$. That is, we assumed inputs to be sets, but interpret queries under bag semantics. Other sensible generalizations of \abbrTIDBs from sets to bags exist.
|
||||
|
||||
One important such generalization is to assign each input tuple $\tup$ a multiplicity $m_\tup$ and a probability $p$: the tuple exists with multiplicity $m_\tup$ with probability $p$, and otherwise has multiplicity $0$. If the maximal multiplicity of all tuples in the \abbrTIDB is bounded by some constant, then our hardness results and approximation algorithm can be generalized by changing the construction of lineage polynomials as follows:
|
||||
|
||||
\begin{align*}
|
||||
\polyqdt{\rel}{\dbbase}{\tup} =&\begin{cases}
|
||||
m_\tup X_\tup & \text{if }\dbbase.\rel\inparen{\tup} = m_\tup \\
|
||||
0 &\text{otherwise.}\end{cases}
|
||||
\end{align*}
|
||||
That is, the variable representing a tuple is multiplied by $m_\tup$ to encode the tuple's multiplicity.
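
As a small worked instance of this construction (a sketch with hypothetical values chosen only for illustration), consider a tuple $\tup$ with $m_\tup = 3$ and $p = 0.5$. We write $\mathbb{E}$ for expectation, which is notation introduced for this example, and we assume that $X_\tup$ is a $\{0,1\}$-valued variable that takes value $1$ with probability $p$:
\begin{align*}
\polyqdt{\rel}{\dbbase}{\tup} &= 3 X_\tup, &
\mathbb{E}\left[3 X_\tup\right] &= 3 \cdot \mathbb{E}\left[X_\tup\right] = 3 \cdot 0.5 = 1.5.
\end{align*}
Under these assumptions, the expected multiplicity of $\tup$ is scaled by the constant $m_\tup$, while the number of variables in the polynomial is unchanged.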
|
||||
|
||||
Yet another option would be to assign each tuple a probability distribution over multiplicities. It seems clear that our results would not extend to a model that allows arbitrary probability distributions for this purpose. However, we note that the special case of a normal distribution over multiplicities can be handled as follows: we add an additional identifier attribute to each relation in the database. For a tuple $\tup$ with maximal multiplicity $m_\tup$, we create $m_\tup$ copies of $\tup$ with different identifiers. To answer a query over this encoding, we first project away the identifier attribute.
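
To illustrate the encoding (a sketch only; the identifiers $1,2,3$, the variables $X_{\tup,i}$, the probabilities $p_i$, and the symbol $\mathbb{E}$ for expectation are notation introduced for this example), a tuple $\tup$ with maximal multiplicity $m_\tup = 3$ is replaced by the copies $(\tup,1)$, $(\tup,2)$, and $(\tup,3)$. Assuming each copy is treated as an independent input tuple with variable $X_{\tup,i}$ and probability $p_i$, projecting away the identifier under bag semantics sums the copies:
\begin{align*}
X_{\tup,1} + X_{\tup,2} + X_{\tup,3}, \qquad
\mathbb{E}\left[X_{\tup,1} + X_{\tup,2} + X_{\tup,3}\right] = p_1 + p_2 + p_3,
\end{align*}
where the second equality follows from linearity of expectation.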
|
||||
|
||||
\subsection{\abbrBIDB{}s}
|
||||
\label{sec:abbrbidbs}
|
||||
|
||||
The approach described above works for \abbrBIDB{}s as well if we define the bag version of \abbrBIDB{}s to associate each tuple $\tup$ with a multiplicity $m_\tup$. Recall that we associate each tuple in a block with a unique variable. Thus, the modified lineage polynomial construction shown above can be applied to \abbrBIDB{}s too.
|
||||
|
||||
|
||||
|
||||
%%% Local Variables:
|
||||
|
|
|
@ -85,11 +85,11 @@ In this work, we study the complexity of \Cref{prob:bag-pdb-poly-expected} for s
|
|||
|
||||
\mypar{\abbrTIDB\xplural}
|
||||
We initially focus on tuple-independent probabilistic bag-databases\footnote{See \cite{DBLP:series/synthesis/2011Suciu} for a survey of set-\abbrTIDBs; the bag encoding is analogous~\cite{DBLP:conf/pods/GreenKT07}.} (\abbrTIDB\xplural), a compressed encoding of probabilistic databases where the presence of each individual tuple (out of a total of $\numvar$ input tuples) in a possible world is modeled as an independent probabilistic event.\footnote{
|
||||
This model is exactly the definition of \abbrTIDB{}s \cite{VS17} under classical set semantics.
|
||||
Mirroring the implementation of bag relations in production database systems (e.g., Postgresql, DB2), tuple multiplicities are modeled by retaining copies of each tuple (up to its largest possible multiplicity).
|
||||
% To make each duplicate tuple unique in a set-\abbrTIDB we can assign unique keys across all duplicates.
|
||||
When the multiplicity of input tuple is bound by some constant,
|
||||
the increased input size is negligible.\label{footnote:set-not-limit}
|
||||
This model is exactly the definition of \abbrTIDB{}s \cite{VS17} under set semantics. Note that this is only one possible definition of \abbrTIDB{}s under bag semantics. In \Cref{sec:gener-results-beyond} we discuss alternatives and to what degree our results extend to these alternatives.
|
||||
% Mirroring the implementation of bag relations in production database systems (e.g., Postgresql, DB2), tuple multiplicities are modeled by retaining copies of each tuple (up to its largest possible multiplicity).
|
||||
% % To make each duplicate tuple unique in a set-\abbrTIDB we can assign unique keys across all duplicates.
|
||||
% When the multiplicity of input tuple is bound by some constant,
|
||||
% the increased input size is negligible.\label{footnote:set-not-limit}
|
||||
}
|
||||
% OK: I tidied things up a touch.
|
||||
%\BG{The footnote is still a bit hard to follow I think, but I do not have a great suggestion on how to improve it.}
|
||||
|
|