Update on Overleaf.

master
Atri Rudra 2022-06-04 02:08:17 +00:00 committed by node
parent 62d3856d4c
commit 5cd5940b29
4 changed files with 47 additions and 41 deletions


@ -19,9 +19,9 @@ The term $\prod_{\tup\in S} X_\tup^{d_\tup}$ in \Cref{eq:sop-form} is a {\em mon
Unless otherwise noted, we consider all polynomials to be in \abbrSMB representation.
When it is unclear, we use $\smbOf{\genpoly}$
to denote the \abbrSMB form of a polynomial $\genpoly~\inparen{\poly}$.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
We call a polynomial $\poly\inparen{\vct{X}}$ a \emph{\abbrCTIDB-lineage polynomial} (or simply lineage polynomial), if it is clear from context that there exists an $\raPlus$ query $\query$, \abbrCTIDB $\pdb$, and result tuple $\tup$ such that $\poly\inparen{\vct{X}} = \apolyqdt\inparen{\vct{X}}.$
\subsection{\abbrOneBIDB}\label{subsec:one-bidb}
\label{subsec:tidbs-and-bidbs}
@ -36,8 +36,7 @@ Define a \emph{\abbrOneBIDB} to be the pair $\pdb' = \inparen{\bigtimes_{\tup\in
\end{cases}$
\noindent$\bpd'$ is the probability distribution across all worlds such that, given $W\in\bigtimes_{\tup \in \tupset'}\inset{0,\bound_\tup}$, $\probOf\pbox{\worldvec = W} = \prod_{\tup\in\tupset'}\prob_{\tup}(W)$.
\footnote{
We slightly abuse notation here, denoting a world vector as $W$ rather than $\worldvec$ to distinguish between the random variable and the world instance. When there is no ambiguity, we will denote a world vector as $\worldvec$.}
%\footnote{We slightly abuse notation here, denoting a world vector as $W$ rather than $\worldvec$ to distinguish between the random variable and the world instance. When there is no ambiguity, we will denote a world vector as $\worldvec$.}
\end{Definition}
Lineage polynomials for arbitrary deterministic $\gentupset'$ are constructed in a manner analogous to $1$-\abbrTIDB\xplural (see \Cref{fig:nxDBSemantics}), differing only in the base case.


@ -101,7 +101,7 @@ $\Omega\inparen{\inparen{\qruntime{\optquery{\qhard}, \tupset, \bound}}^{c_0\cdo
\hline
\end{tabular}
\savecaptionspace{
\caption{Our lower bounds for $\qhard$ parameterized by $k$ $\inparen{\Cref{sec:hard:sub:pre}}$ over $1$-TIDB $\pdb$. % = \inset{\worlds, \bpd}$.
\caption{Our lower bounds for $\qhard$ parameterized by $k$ over $1$-TIDB $\pdb$. % = \inset{\worlds, \bpd}$.
Those with `Multiple' in the second column need the algorithm to be able to handle multiple $\bpd$. See~\Cref{sec:hard} for further details.}%, i.e. probability distributions (for a given $\tupset$). The last column states the hardness assumptions that imply the lower bounds in the first column ($\eps_o,C_0,c_0$ are constants that are independent of $k$).}
\label{tab:lbs}
}{0cm}{-0.73cm}
@ -141,11 +141,11 @@ Further, our approximation algorithm works for a more general notion of bag \abb
\subsection{Polynomial Equivalence}\label{sec:intro-poly-equiv}
A common encoding of probabilistic databases (e.g., in \cite{IL84a,4497507,DBLP:conf/vldb/AgrawalBSHNSW06} and many others) annotates tuples with lineages, propositional formulas that describe the set of possible worlds that the tuple appears in. The bag semantics analog is a provenance/lineage polynomial (see~\Cref{fig:nxDBSemantics}) $\apolyqdt$~\cite{DBLP:conf/pods/GreenKT07}, a polynomial with non-zero integer coefficients and exponents, over variables $\vct{X}$ encoding input tuple multiplicities. The lineage polynomial for result tuple $t_{out}$ evaluates to $t_{out}$'s multiplicity in a given possible world when each $X_{t_{in}}$ is replaced by the multiplicity of $t_{in}$ in the possible world.
We drop $\query$, $\tupset$, and $\tup$ from $\apolyqdt$ when they are clear from the context or irrelevant to the discussion. We now specify the problem of computing the expectation of tuple multiplicity in the language of lineage polynomials (which is equivalent to \Cref{prob:bag-pdb-poly-expected}-- see \Cref{prop:expection-of-polynom}):
We drop $\query$, $\tupset$, and $\tup$ from $\apolyqdt$ when they are clear from the context or irrelevant to the discussion. We now state the problem of computing the expectation of tuple multiplicity in terms of lineage polynomials (which is equivalent to \Cref{prob:bag-pdb-poly-expected}-- see \Cref{prop:expection-of-polynom}):
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Problem}[Expected Multiplicity of Lineage Polynomials]\label{prob:bag-pdb-poly-expected}
Given an $\raPlus$ query $\query$, \abbrCTIDB $\pdb$ and result tuple $\tup$, compute the expected
multiplicity of the polynomial $\poly$ (i.e.,
Given an $\raPlus$ query $\query$, \abbrCTIDB $\pdb$ and result tuple $\tup$,
%compute the expected multiplicity of the polynomial $\poly$ (i.e.,
%for $\worldvec\in\worlds$,
compute $\expct_{\vct{W}\sim \pdassign}\pbox{\poly\inparen{\worldvec}}$).
\end{Problem}
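\noindent As a minimal illustration (our own example, not taken from the source), assume $\bound = 1$, two input tuples whose world variables $W_1, W_2\in\inset{0,1}$ are independent, and hypothetical marginal probabilities $p_1, p_2$. For the polynomial $\poly\inparen{X_1, X_2} = X_1^2X_2$, independence and the fact that $W_1^2 = W_1$ on $\inset{0,1}$ give
\begin{equation*}
\expct\pbox{\poly\inparen{W_1, W_2}} = \expct\pbox{W_1^2}\cdot\expct\pbox{W_2} = \expct\pbox{W_1}\cdot\expct\pbox{W_2} = p_1p_2.
\end{equation*}
This differs from $\poly\inparen{p_1, p_2} = p_1^2p_2$ in general, so simply evaluating $\poly$ at the marginal probabilities does not, in general, yield the expectation.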
@ -265,13 +265,13 @@ We will formalize the notions of circuits and hence, \Cref{prob:intro-stmt} in \
This is illustrated in the following example using $\query_1^2$ from earlier. To aid in presentation we again limit our focus to $\refpoly{1, }^{\inparen{ABU}^2}$, assume $\bound = 2$ for variable $U$ and $\bound = 1$ for all other variables. Let $\prob_A$ denote $\probOf\pbox{A = 1}$.
%In computing $\rpoly$, we have some cancellations to deal with:
Then we have:
\begin{footnotesize}
\begin{equation*}
\refpoly{1, }^{\inparen{ABU}^2}\inparen{\vct{X}} = A^2\inparen{U_1^2 + 4U_1U_2 + 4U_2^2}B^2 =A^2U_1^2B^2 + 4A^2U_1U_2B^2+4A^2U_2^2B^2
%
%\begin{footnotesize}
%\begin{equation*}
$\refpoly{1, }^{\inparen{ABU}^2}\inparen{\vct{X}} = A^2\inparen{U_1^2 + 4U_1U_2 + 4U_2^2}B^2 =A^2U_1^2B^2 + 4A^2U_1U_2B^2+4A^2U_2^2B^2$
%&\qquad+ 2AX_2B^2YE + 2AX_1B^2ZC + 2AX_2B^2ZC + 2B^2YEZC\\
\end{equation*}
\end{footnotesize}
%\end{equation*}
%\end{footnotesize}
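\noindent The expansion above can be checked term by term. A sketch, assuming (as the coefficients in the expansion suggest) that the variable $U$ with $\bound = 2$ is encoded as $U_1 + 2U_2$:
\begin{equation*}
\inparen{U_1 + 2U_2}^2 = U_1^2 + 2\cdot U_1\cdot 2U_2 + \inparen{2U_2}^2 = U_1^2 + 4U_1U_2 + 4U_2^2,
\end{equation*}
and multiplying through by $A^2B^2$ yields the three monomials $A^2U_1^2B^2$, $4A^2U_1U_2B^2$, and $4A^2U_2^2B^2$.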
Recall that
%\begin{footnotesize}
%\begin{equation*}


@ -7,9 +7,9 @@ Note that this implies hardness for \abbrCTIDB\xplural $\inparen{\bound\geq1}$
%; \Cref{prob:bag-pdb-poly-expected} cannot be done in $\bigO{\qruntime{\optquery{\query},\tupset,\bound}}$ runtime. The results also apply to
as well as \abbrOneBIDB. % and other \abbrPDB\xplural.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Preliminaries}\label{sec:hard:sub:pre}
%\subsection{Preliminaries}\label{sec:hard:sub:pre}
Our hardness results are based on (exactly) counting the number of (not necessarily induced) subgraphs in $G$ isomorphic to $H$. Let $\numocc{G}{H}$ denote this quantity. We can think of $H$ as being of constant size and $G$ as growing.
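\noindent For instance (an illustrative example of ours), let $G$ be the cycle on four vertices with edges $(1,2)$, $(2,3)$, $(3,4)$, $(4,1)$, and let $H_1$, $H_2$, $H_3$ (hypothetical names) denote a triangle, a path with two edges, and a pair of vertex-disjoint edges, respectively. Then
\begin{equation*}
\numocc{G}{H_1} = 0,\qquad \numocc{G}{H_2} = 4,\qquad \numocc{G}{H_3} = 2,
\end{equation*}
since $G$ has no triangle, each of the four vertices is the midpoint of exactly one two-edge path, and the only pairs of disjoint edges are $\inset{(1,2),(3,4)}$ and $\inset{(2,3),(4,1)}$.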
In particular, we will consider computing the following counts (given $G$ in its adjacency list representation): $\numocc{G}{\tri}$ (the number of triangles), $\numocc{G}{\threedis}$ (the number of $3$-matchings), and the latter's generalization $\numocc{G}{\kmatch}$ (the number of $k$-matchings). We use $\kmatchtime$ to denote the optimal runtime of computing $\numocc{G}{\kmatch}$ exactly. Our hardness results in \Cref{sec:multiple-p} are based on the following known (conditional) hardness results:
In particular, we will consider computing the following counts (given $G$ in its adjacency list representation): $\numocc{G}{\tri}$ (the number of triangles), $\numocc{G}{\threedis}$ (the number of $3$-matchings), and the latter's generalization $\numocc{G}{\kmatch}$ (the number of $k$-matchings). We use $\kmatchtime$ to denote the optimal runtime of computing $\numocc{G}{\kmatch}$ exactly. Our results in \Cref{sec:multiple-p} are based on the following known (conditional) hardness results:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Theorem}[\cite{k-match}]
@ -59,28 +59,30 @@ For any graph $G=(V,\edgeSet)$ and $\kElem\ge 1$, define
\end{Definition}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\noindent Returning to \Cref{fig:two-step}, it can be seen that $\poly_{G}^\kElem(\vct{X})$ is the lineage polynomial from query $\qhard^k$, which we define next. Recall $\query_1$ from~\Cref{sec:intro}, which is $\qhard^1$.
\noindent Returning to \Cref{fig:two-step}, it can be seen that $\poly_{G}^\kElem(\vct{X})$ is the lineage polynomial from query $\qhard^k$, which we define next.
%Let us alias
%\begin{lstlisting}
%SELECT DISTINCT 1 FROM T $t_1$, R r, T $t_2$
%WHERE $t_1$.Point = r.Point$_1$ AND $t_2$.Point = %r.Point$_2$
%\end{lstlisting}
%as $Q^1$.
The query $\qhard^k$ then becomes
%The query $\qhard^k$ then becomes
\mdfdefinestyle{underbrace}{topline=false, rightline=false, bottomline=false, leftline=false, backgroundcolor=black!15!white, innerbottommargin=0pt}
\begin{mdframed}[style=underbrace]
\begin{lstlisting}
SELECT COUNT(*) FROM $\underbrace{Q_1\text{ JOIN }Q_1\text{ JOIN}\cdots\text{JOIN }Q_1}_{k\rm\ times}$
\end{lstlisting}
\end{mdframed}
\noindent %Consider again the \abbrCTIDB instance $\pdb$ of~\Cref{fig:two-step} and, for our hard instance, let $\bound = 1$. $\pdb$ generalizes to one compatible
In the above, $\query_1$ (defined in \Cref{sec:intro}) is the same as $\qhard^1$.
%
%\noindent %Consider again the \abbrCTIDB instance $\pdb$ of~\Cref{fig:two-step} and, for our hard instance, let $\bound = 1$. $\pdb$ generalizes to one compatible
We next define the instances for $T$ and $R$ that lead to the lineage polynomial in~\Cref{def:qk}. Relation $T$ has $n$ tuples, one for each vertex $i$ in $[n]$, each with probability $\prob$, and $R$ has one tuple for each edge in $\edgeSet$ (each with probability $1$).\footnote{Technically, $\poly_{G}^\kElem(\vct{X})$ should have variables corresponding to tuples in $R$ as well, but since these tuples are always present with probability $1$, we drop them. Our argument also works when all the tuples in $R$ are present with probability $\prob$, but to simplify notation we assign probability $1$ to edges.}
In other words, this instance $\tupset$ contains the set of $\numvar$ unary tuples in $T$ (which corresponds to $\vset$) and $\numedge$ binary tuples in $R$ (which corresponds to $\edgeSet$).
Note that this implies that $\poly_{G}^\kElem$ is indeed a $1$-\abbrTIDB lineage polynomial.
Next, we note that the runtime for answering $\qhard^k$ on deterministic database $\tupset$, as defined above, is $O_k\inparen{\numedge}$ (i.e., deterministic query processing is `easy' for this query):
\begin{Lemma}\label{lem:tdet-om}
For $\qhard^k,\tupset$ defined as above,
For $\qhard^k,\tupset$ as above,
$\qruntimenoopt{\qhard^k, \tupset, \bound}$ is $O_k\inparen{\numedge}$.
\end{Lemma}


@ -3,7 +3,7 @@
\subsection{Formalizing \Cref{prob:intro-stmt}}\label{sec:expression-trees}
We focus on the problem of computing $\expct_{\worldvec\sim\pdassign}\pbox{\poly\inparen{\vct{\randWorld}}}$ from now on.%, assume implicit $\query, \tupset, \tup$, and drop them from $\apolyqdt$ (i.e., $\poly\inparen{\vct{X}}$ will denote a polynomial).
%
\Cref{prob:intro-stmt} asks whether there exists an approximation algorithm that runs in time linear in the size of a given circuit \circuit that encodes $\poly\inparen{\vct{X}}$. Recall that in this work we
represent lineage polynomials via {\em arithmetic circuits}~\cite{arith-complexity}, a standard way to represent polynomials over fields (particularly in the field of algebraic complexity) that we use for polynomials over $\mathbb N$ in the obvious way. Since we are specifically using circuits to model lineage polynomials, we can refer to these circuits as lineage circuits. However, when the meaning is clear, we will drop the term lineage and only refer to them as circuits.
@ -14,28 +14,10 @@ A circuit $\circuit$ is a Directed Acyclic Graph (DAG) with source gates (in deg
Each gate has the following members: \type, \vari{input}, %\val,
\vpartial, \degval, \vari{Lweight}, and \vari{Rweight}, where \type is the value type $\{\circplus, \circmult, \var, \tnum\}$ and \vari{input} the list of inputs. Source gates have an extra member \val for the value. $\circuit_\linput$ ($\circuit_\rinput$) denotes the left (right) input of \circuit.
\end{Definition}
When the underlying DAG is a tree (with edges pointing towards the root), we refer to the structure as an expression tree \etree. The circuits $\inparen{1}$ and $\inparen{2}$ in column $\poly$ of \Cref{fig:two-step} are both expression trees. %encode their respective polynomials in column $\poly$.
Members not described in~\Cref{def:circuit} are defined and used in the appendix proofs. %In such a case, the root of \etree is analogous to the sink of \circuit. The fields \vari{partial}, \degval, \vari{Lweight}, and \vari{Rweight} are used in the proofs of \Cref{sec:proofs-approx-alg}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Note that the circuit \circuit representing $AX$ and the circuit \circuit' representing $B\inparen{Y+Z}$ each encode a tree, with edges pointing towards the root.
The function $\polyf\inparen{\cdot}$ (\Cref{def:poly-func}) maps a circuit to its corresponding polynomial. We next define its inverse:
%of the function $\polyf(\cdot)$.% (\Cref{def:poly-func}).%, which maps a circuit to the polynomial it encodes.
\begin{Definition}[Circuit Set]\label{def:circuit-set}
$\circuitset{\polyX}$ is the set of all possible circuits $\circuit$ such that $\polyf(\circuit) = \polyX$.
\end{Definition}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Cref{fig:circuit} depicts a circuit \circuit in $\circuitset{2X^2+3XY-2Y^2}$. Light-text annotations denote the computation of $\abs{\circuit}\inparen{1, \ldots, 1}$ which we introduce in~\Cref{sec:algo}.%One can think of $\circuitset{\polyX}$ as the infinite set of circuits where for each element \circuit, $\polyf\inparen{\circuit} = \polyX$.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\medskip
\noindent We are now ready to formally state the final version of \Cref{prob:intro-stmt}.%our \textbf{main problem}.
\begin{wrapfigure}{r}{0.3\textwidth}
\begin{wrapfigure}{r}{0.2\textwidth}
%\begin{figure}[t!]
\centering
%see https://tex.stackexchange.com/questions/26846/how-to-scale-a-tikzpicture-including-texts#26852
@ -102,6 +84,29 @@ $\circuitset{\polyX}$ is the set of all possible circuits $\circuit$ such that $
%\vspace{-0.58cm}
\end{wrapfigure}
When the underlying DAG is a tree (with edges pointing towards the root), we refer to the structure as an expression tree \etree. The circuits $\inparen{1}$ and $\inparen{2}$ in column $\poly$ of \Cref{fig:two-step} are both expression trees. %encode their respective polynomials in column $\poly$.
Members not described in~\Cref{def:circuit} are defined and used in the appendix. %In such a case, the root of \etree is analogous to the sink of \circuit. The fields \vari{partial}, \degval, \vari{Lweight}, and \vari{Rweight} are used in the proofs of \Cref{sec:proofs-approx-alg}.
%Note that the circuit \circuit representing $AX$ and the circuit \circuit' representing $B\inparen{Y+Z}$ each encode a tree, with edges pointing towards the root.
The function $\polyf\inparen{\cdot}$ (\Cref{def:poly-func}) maps a circuit to its corresponding polynomial. We next define its inverse:
%of the function $\polyf(\cdot)$.% (\Cref{def:poly-func}).%, which maps a circuit to the polynomial it encodes.
\begin{Definition}[Circuit Set]\label{def:circuit-set}
$\circuitset{\polyX}$ is the set of all possible circuits $\circuit$ such that $\polyf(\circuit) = \polyX$.
\end{Definition}
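\noindent To illustrate (a sketch of ours, assuming $\polyf$ expands a circuit into the polynomial it computes in the usual way), the circuits corresponding to the expressions $\inparen{X+Y}\cdot\inparen{X+Y}$ and $X\cdot X + 2\cdot X\cdot Y + Y\cdot Y$ are structurally different, yet both compute
\begin{equation*}
\inparen{X+Y}\inparen{X+Y} = X^2 + 2XY + Y^2,
\end{equation*}
so both belong to $\circuitset{X^2+2XY+Y^2}$. A single polynomial can thus be encoded by many circuits of different sizes.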
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Cref{fig:circuit} depicts a circuit \circuit in $\circuitset{2X^2+3XY-2Y^2}$. Light-text annotations
%denote the computation of $\abs{\circuit}\inparen{1, \ldots, 1}$ which we introduce
can be ignored until~\Cref{sec:algo}.%One can think of $\circuitset{\polyX}$ as the infinite set of circuits where for each element \circuit, $\polyf\inparen{\circuit} = \polyX$.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\medskip
\noindent We are now ready to formally state the final version of \Cref{prob:intro-stmt}.%our \textbf{main problem}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%