From b567466df7e4a485afc8d0c8f420407c2f0ac29f Mon Sep 17 00:00:00 2001 From: Boris Glavic Date: Sat, 18 Sep 2021 11:38:11 -0500 Subject: [PATCH 1/2] ua citation --- app_notation-background.tex | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/app_notation-background.tex b/app_notation-background.tex index 1169db1..4aebff7 100644 --- a/app_notation-background.tex +++ b/app_notation-background.tex @@ -22,7 +22,7 @@ To justify the use of $\semNX$-databases, we need to show that we can encode any As mentioned above we will use $\semNX$-databases paired with a probability distribution as a representation system, referring to such databases as \abbrNXPDB\xplural. Given \abbrNXPDB $\pxdb$, one can think of the of $\pd$ as the probability distribution across all worlds $\inset{0, 1}^\numvar$. Denote a particular world to be $\vct{w}$. For convenience let $\assign_\vct{w}: \pxdb\rightarrow\pndb$ be a function that computes the corresponding $\semN$-\abbrPDB upon assigning all values $w_i \in \vct{w}$ to $X_i \in \vct{X}$ of $\db_{\semNX}$. Note the one-to-one correspondence between elements $\vct{w}\in\inset{0, 1}^\numvar$ to the worlds encoded by $\db_{\semNX}$ when $\vct{w}$ is assigned to $\vct{X}$ (assuming a domain of $\inset{0, 1}$ for each $X_i$). %and a probability distribution $\pd$ over assignments $\assign$ of the variables $\vct{X} = \{X_1, \ldots, X_\numvar\}$ occurring in annotations of $\idb_{\semNX}$ to $\{0,1\}$. \AH{There was a big ICDT reviewer complaint in this section, but I don't know that I think it confuses things to think of them both an assignment and/or a vector of variables.} -%Note that an assignment $\assign: \vct{X} \to \{0,1\}^\numvar$ can be represented as a vector $\vct{w} \in \{0,1\}^n$ where $\vct{w}[i]$ records the value assigned to variable $X_i$. Thus, from now on we will solely use such vectors which we refer to as \emph{world vectors} and implicitly understand them to represent assignments. 
+%Note that an assignment $\assign: \vct{X} \to \{0,1\}$ can be represented as a vector $\vct{w} \in \{0,1\}^\numvar$ where $\vct{w}[i]$ records the value assigned to variable $X_i$. Thus, from now on we will solely use such vectors, which we refer to as \emph{world vectors}, and implicitly understand them to represent assignments. We can think of $\assign_\vct{w}\inparen{\pxdb}\inparen{\tup}$ as the semiring homomorphism $\semNX \to \semN$ that applies the assignment $\vct{w}$ to all variables $\vct{X}$ of a polynomial and evaluates the resulting expression in $\semN$. \BG{explain connection to homomorphism lifting in K-relations} @@ -56,7 +56,7 @@ Importantly, as the following proposition shows, any finite $\semN$-PDB can be e \begin{proof} To prove that \abbrNXPDB\xplural are complete consider the following construction that for any $\semN$-PDB $\pdb = (\idb, \pd)$ produces an \abbrNXPDB $\pxdb = (\db_{\semNX}, \pd')$ such that $\rmod(\pxdb) = \pdb$. Let $\idb = \{D_1, \ldots, D_{\abs{\idb}}\}.$ %and let $max(D_i)$ \AH{What are we using $max(D_i)$ for?} - %denote $max_{\tup} D_i(\tup)$. + %denote $max_{\tup} D_i(\tup)$. For each world $D_i$ we create a corresponding variable $X_i$. %variables $X_{i1}$, \ldots, $X_{im}$ where $m = max(D_i)$. In $\db_{\semNX}$ we assign each tuple $\tup$ the polynomial: @@ -66,7 +66,7 @@ In $\db_{\semNX}$ we assign each tuple $\tup$ the polynomial: \] The probability distribution $\pd'$ assigns all world vectors zero probability except for $\abs{\idb}$ world vectors (representing the possible worlds) $\vct{w}_i$. All elements of $\vct{w}_i$ are zero except for the position corresponding to variables $X_{i}$ which is set to $1$. Unfolding definitions it is trivial to show that $\rmod(\pxdb) = \pdb$. Thus, \abbrNXPDB\xplural are a complete representation system.
-Since $\semNX$ is the free object in the variety of semirings, Birkhoff's HSP theorem implies that any assignment $\vct{X} \to \semN$, which includes as a special case the assignments $\assign_{\vct{w}}$ used here, uniquely extends to the semiring homomorphism alluded to above, $\assign_\vct{w}\inparen{\pxdb}\inparen{\tup}: \semNX \to \semN$. For a polynomial $\assign_\vct{w}\inparen{\pxdb}\inparen{\tup}$ substitutes variables based on $\vct{w}$ and then evaluates the resulting expression in $\semN$. For instance, consider the polynomial $\pxdb\inparen{\tup} = \poly = X + Y$ and assignment $\vct{w} := X = 0, Y=1$. We get $\assign_\vct{w}\inparen{\pxdb}\inparen{\tup} = 0 + 1 = 1$. % It is trivial to show that an assignment is a semiring homomorphism. +Since $\semNX$ is the free object in the variety of semirings, Birkhoff's HSP theorem~\cite{graetzer-08-un} implies that any assignment $\vct{X} \to \semN$, which includes as a special case the assignments $\assign_{\vct{w}}$ used here, uniquely extends to the semiring homomorphism alluded to above, $\assign_\vct{w}\inparen{\pxdb}\inparen{\tup}: \semNX \to \semN$. Concretely, $\assign_\vct{w}\inparen{\pxdb}\inparen{\tup}$ substitutes the values in $\vct{w}$ for the variables of the polynomial $\pxdb\inparen{\tup}$ and then evaluates the resulting expression in $\semN$. For instance, consider the polynomial $\pxdb\inparen{\tup} = \poly = X + Y$ and the world vector $\vct{w}$ that assigns $X = 0$ and $Y = 1$. We get $\assign_\vct{w}\inparen{\pxdb}\inparen{\tup} = 0 + 1 = 1$. % It is trivial to show that an assignment is a semiring homomorphism. Closure under $\raPlus$ queries follows from this and from \cite{DBLP:conf/pods/GreenKT07}'s Proposition 3.5, which states that semiring homomorphisms commute with queries over $\semK$-relations.
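The assignment semantics described in the first patch (substitute a world vector, then evaluate in $\semN$) can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not code from the paper. Polynomials in $\semNX$ are encoded as dictionaries from exponent vectors to natural-number coefficients, and the helper names `poly_add`, `poly_mul`, and `evaluate` are hypothetical. The usage section reproduces the $X + Y$ example and checks, over every world in $\{0,1\}^2$, that evaluation commutes with $+$ and $\times$, as a semiring homomorphism must.

```python
from itertools import product as cartesian
from math import prod

# Hypothetical encoding (not from the paper): a polynomial in N[X_1, ..., X_n]
# maps each exponent vector (one non-negative int per variable) to a
# natural-number coefficient.
Poly = dict[tuple[int, ...], int]


def poly_add(p: Poly, q: Poly) -> Poly:
    """Sum in N[X]: add the coefficients of equal monomials."""
    r = dict(p)
    for mono, c in q.items():
        r[mono] = r.get(mono, 0) + c
    return r


def poly_mul(p: Poly, q: Poly) -> Poly:
    """Product in N[X]: multiply every pair of terms and add their exponents."""
    r: Poly = {}
    for (m1, c1), (m2, c2) in cartesian(p.items(), q.items()):
        mono = tuple(a + b for a, b in zip(m1, m2))
        r[mono] = r.get(mono, 0) + c1 * c2
    return r


def evaluate(p: Poly, w: tuple[int, ...]) -> int:
    """Substitute w[i] for X_i in every monomial, then evaluate in N."""
    return sum(c * prod(wi ** e for wi, e in zip(w, mono)) for mono, c in p.items())


if __name__ == "__main__":
    x_plus_y = {(1, 0): 1, (0, 1): 1}   # X + Y
    two_xy = {(1, 1): 2}                # 2XY
    print(evaluate(x_plus_y, (0, 1)))   # X = 0, Y = 1 gives 0 + 1 = 1
    # Evaluation commutes with + and * in every world of {0, 1}^2.
    for w in cartesian((0, 1), repeat=2):
        assert evaluate(poly_add(x_plus_y, two_xy), w) == evaluate(x_plus_y, w) + evaluate(two_xy, w)
        assert evaluate(poly_mul(x_plus_y, two_xy), w) == evaluate(x_plus_y, w) * evaluate(two_xy, w)
```

The dictionary encoding keeps the example dependency-free. A symbolic algebra package would serve equally well, but it would obscure the two-step "substitute, then evaluate" reading of $\assign_\vct{w}$.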
From 3d67dbbb1b53a99856f5c2f8ba544cc8469567ab Mon Sep 17 00:00:00 2001 From: Oliver Date: Sat, 18 Sep 2021 12:49:54 -0400 Subject: [PATCH 2/2] bounding depth --- appendix.tex | 33 +++++++++++++++++++++++++++++++-- 1 file changed, 31 insertions(+), 2 deletions(-) diff --git a/appendix.tex b/appendix.tex index fd2a3c4..8f5d3ab 100644 --- a/appendix.tex +++ b/appendix.tex @@ -37,8 +37,8 @@ We require that vertices have an in-degree of at most two. Note that we can construct circuits for \bis in time linear in the time required for deterministic query processing over a possible world of the \bi under the aforementioned assumption that $\abs{\pxdb} \leq c \cdot \abs{\db}$. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -\subsubsection{Circuit size vs. runtime} -\label{sec:circuit-runtime} + +\subsection{Modeling Circuit Construction} \newcommand{\bagdbof}{\textsc{bag}(\pxdb)} @@ -184,6 +184,32 @@ As in projection, newly created vertices will have an in-degree of $k$, and a fa There are $|{Q_1} \bowtie \ldots \bowtie {Q_k}|$ such vertices, so the corrected circuit has $|V_{Q_1,\pxdb}|+\ldots+|V_{Q_k,\pxdb}|+(k-1)|{Q_1} \bowtie \ldots \bowtie {Q_k}|$ vertices. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\subsubsection{Bounding circuit depth} +\label{sec:circuit-depth} + +We first show that the depth of the circuit (\depth; \Cref{def:size-depth}) is bounded by the size of the query, up to a logarithmic factor. Denote by $|\query|$ the number of relational operators in query $\query$. + +\begin{Proposition}[Circuit depth is bounded] +\label{prop:circuit-depth} +Let $\query$ be a relational query and $\dbbase$ be a \dbbaseName. There exists a (lineage) circuit $\circuit$ encoding the lineage of all tuples $\tup \in \query(\dbbase)$ for which +$\depth(\circuit) \leq O_k(|\query|\log(n))$. +\end{Proposition} + +\begin{proof} +We show that the bound of \Cref{prop:circuit-depth} holds for the circuit constructed by \Cref{alg:lc}.
+First, observe that \Cref{alg:lc} is invoked exactly once for every relational operator or base relation in $\query$; it thus suffices to show that an invocation of \Cref{alg:lc} adds at most $O_k(\log(n))$ to the depth of any circuit produced by a recursive invocation. +Second, observe that the depth of the output exceeds the maximum depth of any input by at most one, except in the projection and join cases, where a vertex of fan-in $m$ is realized by a tree of vertices with in-degree at most two and thus adds at most $\lceil \log(m) \rceil$ to the depth. +For the join case, the number of in-edges can be no greater than the join width, which itself is bounded by $k$. The depth thus increases by at most an additive $\lceil \log(k) \rceil = O_k(1)$. +For the projection case, observe that the fan-in is bounded by $|\query'(\dbbase)|$, which is in turn bounded by $n^k$. The depth increase for any projection node is thus at most $\lceil \log(n^k)\rceil = O(k\log(n)) = O_k(\log(n))$. +\qed +\end{proof} + + + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\subsubsection{Circuit size vs. runtime} +\label{sec:circuit-runtime} + \begin{Lemma}\label{lem:circ-model-runtime} \label{lem:circuits-model-runtime} Given a \abbrNXPDB $\pxdb$ with \dbbaseName $\dbbase$, and query plan $Q$, the runtime of $Q$ over $\dbbase$ has the same or greater complexity as the size of the lineage of $Q(\pxdb)$. That is, we have $\abs{V_{Q,\pxdb}} \leq (k-1)\qruntime{Q, \dbbase}+1$, where $k$ is the maximal degree of any polynomial in $Q(\pxdb)$. @@ -248,6 +274,9 @@ The property holds for all recursive queries, and the proof holds. \qed \end{proof} +\subsubsection{Runtime of \abbrStepOne} +\label{sec:lc-runtime} + We next need to show that we can construct the circuit in time linear in the deterministic runtime. \begin{lemma}\label{lem:tlc-is-the-same-as-det} Given a query $\query$ over a \dbbaseName $\dbbase$, the runtime $\timeOf{\abbrStepOne}(\query,\dbbase,\circuit) \le O(\qruntime{\query, \dbbase})$
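The depth accounting in the proof of the circuit-depth proposition can be checked numerically with a short Python sketch. The sketch rests on an assumption of ours: the patch does not show how \Cref{alg:lc} splits a high-fan-in vertex, so here a vertex of fan-in $m$ is modeled as a balanced tree of in-degree-two vertices, merged pairwise level by level. `ceil_log2` and `combine_depth` are hypothetical helpers. The sketch confirms that such a tree adds $\lceil \log_2 m \rceil$ levels. It also confirms $\lceil \log_2(n^k) \rceil \leq k \lceil \log_2 n \rceil$, which is the step behind $O(k\log(n)) = O_k(\log(n))$ in the projection case.

```python
def ceil_log2(m: int) -> int:
    """Exact integer ceil(log2(m)) for m >= 1, avoiding floating-point error."""
    if m < 1:
        raise ValueError("fan-in must be at least 1")
    return (m - 1).bit_length()


def combine_depth(input_depths: list[int]) -> int:
    """Depth of the root when inputs of the given depths are merged pairwise,
    level by level, into a balanced tree of in-degree-2 vertices (an assumed
    construction, used here only to illustrate the depth bound)."""
    if not input_depths:
        raise ValueError("a vertex needs at least one input")
    level = list(input_depths)
    while len(level) > 1:
        # Pair up neighbours; each new vertex sits one level above its deeper child.
        merged = [max(level[i], level[i + 1]) + 1 for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            merged.append(level[-1])  # the odd input out is carried to the next level
        level = merged
    return level[0]


if __name__ == "__main__":
    # A fan-in-m vertex over depth-0 inputs becomes a tree of depth ceil(log2 m).
    for m in range(1, 1025):
        assert combine_depth([0] * m) == ceil_log2(m)
    # Projection case: fan-in at most n**k costs at most k * ceil(log2 n) levels.
    for n in range(2, 50):
        for k in range(1, 5):
            assert ceil_log2(n ** k) <= k * ceil_log2(n)
    print(combine_depth([0] * 3), ceil_log2(10 ** 3))  # prints: 2 10
```

Each merging round raises the maximum depth by at most one and halves the number of pending inputs (rounding up), so `combine_depth` never exceeds `max(input_depths) + ceil_log2(len(input_depths))`. A skewed, Huffman-style tree can do better when input depths differ, but the proof only needs this upper bound.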