paper-BagRelationalPDBsAreHard/app_set-to-bag-pdb.tex

33 lines
3.2 KiB
TeX
Raw Normal View History

\onecolumn
2021-09-20 09:54:17 -04:00
\section{Generalizing Beyond Set Inputs}
2021-09-20 09:24:56 -04:00
\label{sec:gener-results-beyond}
2021-09-20 09:54:17 -04:00
\subsection{\abbrTIDB{}s}
\label{sec:abbrtidbs}
2021-09-20 23:27:03 -04:00
In our definition of \abbrTIDBs (\Cref{subsec:tidbs-and-bidbs}), we assumed a model of \abbrTIDBs where each input tuple is assigned a probability $p$ of having multiplicity $1$. That is, we assumed inputs to be sets, but interpret queries under bag semantics. Other sensible generalizations of \abbrTIDBs from set semantics to bag semantics also exist.
2021-09-20 09:54:17 -04:00
2021-09-20 23:27:03 -04:00
One very natural such generalization is to assign each input tuple $\tup$ a multiplicity $m_\tup$ and probability $p$: the tuple has probability $p$ to exists with multiplicity $m_\tup$, and otherwise has multiplicity $0$. If the maximal multiplicity of all input tuples in the \abbrTIDB is bounded by some constant, then a generalization of our hardness results and approximation algorithm can be achieved by changing the construction of lineage polynomials (in \Cref{fig:nxDBSemantics}) as follows (all other cases remain the same as in \cref{fig:nxDBSemantics}):
2021-09-20 09:54:17 -04:00
\begin{align*}
\polyqdt{\rel}{\dbbase}{\tup} =&\begin{cases}
m_\tup X_\tup & \text{if }\dbbase.\rel\inparen{\tup} = m_\tup \\
0 &\text{otherwise.}\end{cases}
\end{align*}
2021-09-20 23:27:03 -04:00
That is the variable representing a tuple is multiplied by $m_\tup$ to encode the tuple's multiplicity $m_\tup$. We note that our lower bounds still hold for this model since we only need $m_\tup=1$ for all tuples $\tup$. Further, it can be argued that our proofs (as is) for approximation algorithms also work for this model. The only change is that since we now allow $m_\tup>1$ some of the constants in the runtime analysis of our algorithms change but the overall asymptotic runtime bound remains the same.
2021-09-20 09:54:17 -04:00
2021-09-20 23:27:03 -04:00
Yet another option would be to assign each tuple a probability distribution over multiplicities. It seems very unlikely that our results would extend to a model that allows arbitrary probability distributions over multiplicities (our current proof techniques definitely break down). However, we would like to note that the special case of a Poisson binomial distribution (sum of independent but not necessarily identical Bernoulli trials) over multiplicities can be handled as follows: we add an additional identifier attribute to each relation in the database. For a tuple $\tup$ with maximal multiplicity $m_\tup$, we create $m_\tup$ copies of $\tup$ with different identifiers. To answer a query over this encoding, we first project away the identifier attribute (note that as per \Cref{fig:nxDBSemantics}, in $\poly$ this would add up all the variables corresponding to the same tuple $\tup$).
2021-09-20 09:54:17 -04:00
\subsection{\abbrBIDB{}s}
\label{sec:abbrbidbs}
2021-09-20 23:27:03 -04:00
The approach described above works for \abbrBIDB\xplural as well if we define the bag version of \abbrBIDB{}s to associate each tuple $\tup$ a multiplicity $m_\tup$. Recall that we associate each tuple in a block with a unique variable. Thus, the modified lineage polynomial construction shown above can be applied for \abbrBIDB{}s too (and our approximation results also hold).
2021-09-20 09:54:17 -04:00
2021-09-20 09:24:56 -04:00
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End: