Done with pass on App A

master
Atri Rudra 2021-09-20 23:27:03 -04:00
parent 14cd63ac12
commit 1d3a255539
2 changed files with 6 additions and 6 deletions

View File

@ -5,23 +5,23 @@
\subsection{\abbrTIDB{}s}
\label{sec:abbrtidbs}
In our definition of \abbrTIDBs (\Cref{subsec:tidbs-and-bidbs}), we assumed a model of \abbrTIDBs where each input tuple is assigned a probability $p$ of having multiplicity $1$. That is, we assumed inputs to be sets, but interpreted queries under bag semantics. Other sensible generalizations of \abbrTIDBs from set semantics to bag semantics also exist.
One very natural such generalization is to assign each input tuple $\tup$ a multiplicity $m_\tup$ and a probability $p$: the tuple exists with multiplicity $m_\tup$ with probability $p$, and otherwise has multiplicity $0$. If the maximal multiplicity of all input tuples in the \abbrTIDB is bounded by some constant, then a generalization of our hardness results and approximation algorithm can be achieved by changing the construction of lineage polynomials in \Cref{fig:nxDBSemantics} as follows (all other cases remain the same as in \Cref{fig:nxDBSemantics}):
\begin{align*}
\polyqdt{\rel}{\dbbase}{\tup} =&\begin{cases}
m_\tup X_\tup & \text{if }\dbbase.\rel\inparen{\tup} = m_\tup \\
0 &\text{otherwise.}\end{cases}
\end{align*}
That is, the variable representing a tuple is multiplied by $m_\tup$ to encode the tuple's multiplicity $m_\tup$. We note that our lower bounds still hold for this model, since they only need $m_\tup=1$ for all tuples $\tup$. Further, it can be argued that our proofs for the approximation algorithms also work for this model as is. The only change is that, since we now allow $m_\tup>1$, some of the constants in the runtime analysis of our algorithms change, but the overall asymptotic runtime bound remains the same.
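As a quick illustration (a minimal sketch, assuming that, as in \Cref{fig:nxDBSemantics}, a join multiplies the lineages of the joined tuples), consider a tuple $\tup$ with $m_\tup = 2$ and probability $p$, and a query that joins $\rel$ with itself on $\tup$. The modified construction gives
\begin{align*}
\poly = \inparen{2 X_\tup} \cdot \inparen{2 X_\tup} = 4 X_\tup^2,
\end{align*}
and since $X_\tup \in \{0,1\}$ we have $X_\tup^2 = X_\tup$, so the expected multiplicity is $4p$: with probability $p$ the tuple exists with multiplicity $2$ and the self-join produces $2 \cdot 2 = 4$ output copies, and otherwise it produces none.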
Yet another option would be to assign each tuple a probability distribution over multiplicities. It seems very unlikely that our results would extend to a model that allows arbitrary probability distributions over multiplicities (our current proof techniques break down in this setting). However, we note that the special case of a Poisson binomial distribution over multiplicities (i.e., a sum of independent but not necessarily identically distributed Bernoulli trials) can be handled as follows: we add an additional identifier attribute to each relation in the database. For a tuple $\tup$ with maximal multiplicity $m_\tup$, we create $m_\tup$ copies of $\tup$ with different identifiers. To answer a query over this encoding, we first project away the identifier attribute (note that, as per \Cref{fig:nxDBSemantics}, this projection adds up all the variables in $\poly$ that correspond to the same tuple $\tup$).
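For concreteness, here is a minimal sketch of this encoding (the probabilities $p_1, p_2, p_3$ below are hypothetical): a tuple $\tup$ with maximal multiplicity $m_\tup = 3$ is replaced by the three copies $\inparen{\tup, 1}, \inparen{\tup, 2}, \inparen{\tup, 3}$, where copy $\inparen{\tup, i}$ exists independently with probability $p_i$. After projecting away the identifier attribute, the lineage of $\tup$ is
\begin{align*}
X_{\tup, 1} + X_{\tup, 2} + X_{\tup, 3},
\end{align*}
whose value follows the Poisson binomial distribution over $\{0, 1, 2, 3\}$ with parameters $p_1, p_2, p_3$, and whose expectation is $p_1 + p_2 + p_3$.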
\subsection{\abbrBIDB{}s}
\label{sec:abbrbidbs}
The approach described above works for \abbrBIDB\xplural as well if we define the bag version of \abbrBIDB{}s to associate with each tuple $\tup$ a multiplicity $m_\tup$. Recall that we associate each tuple in a block with a unique variable. Thus, the modified lineage polynomial construction shown above can be applied to \abbrBIDB{}s too (and our approximation results also hold).
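As an illustrative sketch (the multiplicities and probabilities below are hypothetical), consider a block containing two mutually exclusive tuples $\tup_1$ and $\tup_2$ with multiplicities $m_{\tup_1} = 2$ and $m_{\tup_2} = 5$ and probabilities $p_1$ and $p_2$, where $p_1 + p_2 \leq 1$. Under the modified construction, a projection that merges the two tuples into a single output tuple has lineage
\begin{align*}
2 X_{\tup_1} + 5 X_{\tup_2},
\end{align*}
and, by linearity of expectation, the expected multiplicity of that output tuple is $2 p_1 + 5 p_2$ (at most one of $X_{\tup_1}, X_{\tup_2}$ is $1$ in any possible world, since the two tuples belong to the same block).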

View File

@ -331,7 +331,7 @@ Respectively, these are: \\
%\end{proof}
\section{Higher Moments}
\label{sec:momemts}
%
We make a simple observation to conclude the presentation of our results.
So far we have only focused on the expectation of $\poly$.