Minor tweaks on appendix D.

master
Aaron Huber 2022-05-11 10:10:51 -04:00
parent 00dc258028
commit dbb14420db
7 changed files with 29 additions and 27 deletions


@ -59,7 +59,8 @@ $d \leq k + 1$, it is true that $\circuit_\linput$ and $\circuit_\rinput$ both r
It is easy to check that except for lines~\ref{alg:sample-plus-bsamp} and~\ref{alg:sample-times-union}, all lines take $O(1)$ time. Consider an execution of \cref{alg:sample-times-union}. Note that a given set of variables is added to some set at most once: since the sum of the sizes of the sets at a given level is at most $\degree(\circuit)$, each gate visited takes $O(\log{\degree(\circuit)})$ time. For \Cref{alg:sample-plus-bsamp}, note that we pick $\circuit_\linput$ with probability $\frac a{a+b}$ where $a=\circuit.\vari{Lweight}$ and $b=\circuit.\vari{Rweight}$. We can implement this step by picking a uniformly random number $r\in[a+b]$ and then checking whether $r\le a$. It is easy to check that $a+b\le \abs{\circuit}(1,\ldots,1)$. This means we need to add and compare $\log{\abs{\circuit}(1,\ldots, 1)}$-bit numbers, which can certainly be done in time $\multc{\log\left(\abs{\circuit}(1,\ldots, 1)\right)}{\log{\size(\circuit)}}$ (note that this is an over-estimate).
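The weighted choice made at a $\circplus$ gate in line~\ref{alg:sample-plus-bsamp} can be sketched in Python as follows. This is only an illustration under stated assumptions, not the paper's implementation of \sampmon: the helper names \texttt{goes\_left} and \texttt{pick\_plus\_input} are hypothetical, and the weights \texttt{a} and \texttt{b} are assumed to be non-negative integers with $a+b>0$, standing in for $\circuit.\vari{Lweight}$ and $\circuit.\vari{Rweight}$. Exactly $a$ of the $a+b$ equally likely values of $r$ satisfy $r\le a$, which gives probability $\frac{a}{a+b}$.

```python
import random


def goes_left(r: int, a: int) -> bool:
    """Decision rule for r drawn from [a+b] = {1, ..., a+b}: go left iff r <= a."""
    return r <= a


def pick_plus_input(a: int, b: int, rng: random.Random) -> str:
    """Return "L" with probability a/(a+b) and "R" otherwise.

    Hypothetical sketch: `a` and `b` stand in for C.Lweight and C.Rweight and are
    assumed to be non-negative integers with a + b > 0. Python ints have arbitrary
    precision, so weights as large as |C|(1, ..., 1) do not overflow.
    """
    r = rng.randint(1, a + b)  # uniform over {1, ..., a+b}, both ends inclusive
    return "L" if goes_left(r, a) else "R"
```

For example, with weights $a=3$ and $b=5$, the left input is returned for $r\in\{1,2,3\}$ and the right input for $r\in\{4,\ldots,8\}$.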
Let $\cost(\circuit)$ (\Cref{eq:cost-sampmon}) denote an upper bound on the number of gates visited by \sampmon. Then the runtime is $O\left(\cost(\circuit)\cdot \log{\degree(\circuit)}\cdot \multc{\log\left(\abs{\circuit}(1,\ldots, 1)\right)}{\log{\size(\circuit)}}\right)$.
\AH{We don't really justify why we can bound the number of recursive calls as we claim in what follows.}
Since at most $k = \degree\inparen{\circuit}$ nodes are visited at every level of the circuit, and each of the first $d - 1$ levels (going from the sink to the source nodes) contains at least one recursive call, we can upper-bound the number of recursive calls in $\sampmon$ by $O\left((\degree(\circuit) + 1)\cdot\depth(\circuit)\right)$, which by the above proves the claimed runtime of~\Cref{lem:sample}. %The reason for this is that the number of recursive calls is exactly the number of calls to lines~\ref{alg:sample-plus-bsamp} and~\ref{alg:sample-times-union}.
Let \cost$(\cdot)$ be a function that models an upper bound on the number of gates that can be visited in the run of \sampmon. We define \cost$(\cdot)$ recursively as follows.
@ -83,11 +84,11 @@ Note that \cref{eq:strict-upper-bound} implies the claimed runtime.
\AH{If the claimed runtime is from the first paragraph, then I don't follow.}
We prove \cref{eq:strict-upper-bound} for the number of gates traversed in \sampmon by induction on $\depth(\circuit)$. Recall the definition of degree in \cref{def:degree}.
\AH{In the following, by~\Cref{def:size-depth}, we would have that $\depth\inparen{\circuit} = 1$ \emph{technically}.}
For the base case $\depth(\circuit) = 0$, \circuit consists of a single source gate, so $\degree(\circuit) \in \inset{0, 1}$ and $\cost(\circuit) = 1$; the inequality $2\left(\degree(\circuit) + 1\right)\cdot \depth(\circuit) + 1 \geq \cost(\circuit)$ then holds trivially.
\AH{Why equality here instead of inequality? Also, it could be more obvious for why depth must be at least $1$.}
For the inductive hypothesis, assume that for some $\ell \geq 0$ the bound holds for every circuit \circuit with $0 \leq \depth(\circuit) \leq \ell$.
Now consider an arbitrary input circuit \circuit to \sampmon with $\depth(\circuit) = \ell + 1$. Since $\depth(\circuit) \geq 1$, the sink gate \circuit must have input(s), so by definition \circuit.\type $\in \{\circplus, \circmult\}$. Further, by the inductive hypothesis, the inputs $\circuit_i$ for $i \in \{\linput, \rinput\}$ of the sink gate \circuit satisfy the bound
\begin{equation}
2\left(\degree(\circuit_i) + 1\right)\cdot \depth(\circuit_i) + 1 \geq \cost(\circuit_i).\label{eq:ih-bound-cost}
\end{equation}
@ -115,15 +116,15 @@ where $\depth_{\max}$ is used to denote the maximum depth of the two input subci
Combining \Cref{eq:times-lhs-expanded} and \Cref{eq:times-middle-expanded}, we get
\begin{align}
&2\degree(\circuit_\linput)\cdot\depth_{\max} + 2\degree(\circuit_\rinput)\cdot\depth_{\max} + 4\depth_{\max} + 2\degree(\circuit_\linput) + 2\degree(\circuit_\rinput) + 5\nonumber\\
&\qquad\geq 2\degree(\circuit_\linput)\cdot\depth(\circuit_\linput) + 2\degree(\circuit_\rinput)\cdot\depth(\circuit_\rinput) + 2\depth(\circuit_\linput) + 2\depth(\circuit_\rinput) + 3.\label{eq:times-lhs-middle}
\end{align}
\Cref{eq:times-lhs-middle} always holds: since $\depth_{\max} \geq \depth(\circuit_i)$ for $i \in \{\linput, \rinput\}$, we have $2\degree(\circuit_i)\cdot\depth_{\max} \geq 2\degree(\circuit_i)\cdot\depth(\circuit_i)$ and $4\depth_{\max} \geq 2\depth(\circuit_\linput) + 2\depth(\circuit_\rinput)$, while $2\degree(\circuit_\linput) + 2\degree(\circuit_\rinput) + 5 \geq 3$ because degrees are non-negative.
%Since the following is always true,
%\begin{align*}
%&2\degree(\circuit_\linput)\cdot\depth_{\max} + 2\degree(\circuit_\rinput)\cdot\depth_{\max} + 4\depth_{\max} + 5\\
%&\qquad \geq 2\degree(\circuit_\linput)\cdot\depth(\circuit_\linput) + 2\degree(\circuit_\rinput)\cdot\depth(\circuit_\rinput) + 2\depth(\circuit_\linput) + 2\depth(\circuit_\rinput) + 3,
%\end{align*}
%then it is the case that \Cref{eq:times-lhs-middle} is \emph{always} true.
We now justify (\ref{eq:times-rhs}), which holds for the following reasons. First, \cref{eq:times-rhs}
is the result of \Cref{eq:cost-sampmon} when $\circuit.\type = \circmult$. \Cref{eq:times-middle}
@ -140,8 +141,8 @@ To prove (\ref{eq:plus-middle}), \cref{eq:plus-lhs-inequality} expands to
\begin{equation}
2\degree_{\max}\depth_{\max} + 2\degree_{\max} + 2\depth_{\max} + 2 + 1.\label{eq:plus-lhs-expanded}
\end{equation}
\AH{It seems more confusing to add an extra term in the RHS of the leftmost inequality.}
Since $\degree_{\max} \cdot \depth_{\max} \geq \degree(\circuit_i)\cdot \depth(\circuit_i)$ and $\depth_{\max} \geq \depth(\circuit_i)$, the following is an upper bound on the expansion of \cref{eq:plus-middle}:
\begin{equation}
2\degree_{\max}\depth_{\max} + 2\depth_{\max} + 2
\label{eq:plus-middle-expanded}


@ -54,7 +54,7 @@ Note that we can construct circuits for \bis in time linear in the time required
We now connect the size of a circuit (i.e., the number of vertices in the corresponding DAG)
for a given $\raPlus$ query $Q$ and \abbrNXPDB $\pxdb$ to
the runtime $\qruntime{\query,\tupset, \bound}$ of the PDB's \dbbaseName $\tupset$.
\AH{@atri: do we use $\tupset$ or $\gentupset$ here?}
We do this formally by showing that the size of the circuit is asymptotically no worse than the corresponding runtime of a large class of deterministic query processing algorithms.
@ -75,8 +75,7 @@ encodes a polynomial, realized as
We define the circuit for a $\raPlus$ query $\query$ recursively by cases as follows. In each case, let $\tuple{V_{Q_i,\pxdb}, E_{Q_i,\pxdb}, \phi_{Q_{i},\pxdb}, \ell_{Q_i,\pxdb}}$ denote the circuit for subquery $Q_i$. We implicitly include in all circuits a global zero node $v_0$ such that $\ell_{Q, \pxdb}(v_0) = 0$ for any $Q,\pxdb$.
\AH{Questions for below:\par\begin{enumerate}\item Why did we choose the name \lincirc?\item What is $\domain\inparen{\phi'}$?\item Unsure of $\ell\gets\ell\cup\inset{v_t, \rel\inparen{\tup}}$ since $\ell$ is defined as a function with a range of $\inset{+, \times}\cup\mathbb{N}\cup\vct{X}$. Since $\rel\in\tupset$ where $\tupset$ is the deterministic bounding database for $\mathbb{N}\pbox{\vct{X}}$-PDB $\mathcal{D}_{\mathbb{N}\pbox{\vct{X}}}$, then $\rel\inparen{\tup}\in\mathbb{N}\pbox{\vct{X}}\not\subseteq\inset{+, \times}\cup\mathbb{N}\cup\vct{X}$ \emph{unless} it is implicit that each tuple in a base relation is annotated with an element $X\in \vct{X}$. OR are we thinking of $\tupset$ as an $\mathbb{N}$ bounding database, which upperbounds the multiplicity of every possible tuple?\item The comment on garbage collecting for \textbf{Selection}: it seems to me, that with or without garbage collection, the upperbound would still hold.\item Are we using the notation $\project_{A}$ to mean anything different than $\project_A$?\item In pseudocode, is it okay to start using new variables without first declaring or defining them?\item The ``output'' of Algo 4 is \circuit, yet nothing is returned. Are we assuming a data structure that is modified in place, like e.g. passing a pointer to the data structure in a C-language method?\item On the surface it appears that line 23 should overwrite $V, E, \ell$ modifications in line 22, but really should be accumulating them. Is the $\gets$ symbol okay here since we have already stated that the variables are accumulators?\item Should line 37 be $i\in\pbox{m}$?\item Shouldn't the first inequality RHS of m-ary join have the $k$'s subscripted? \end{enumerate}
}
\begin{algorithm}
\caption{\lincirc$(\query, \tupset, V, E, \ell)$}
\label{alg:lc}
@ -87,25 +86,25 @@ We define the circuit for a $\raPlus$ query $\query$ recursively by cases as fol
\Ensure $\circuit = \tuple{V, E, \phi, \ell}$: a circuit encoding the lineage of each tuple in $\query(\tupset)$
\If{$\query$ is $\rel$} \Comment{\textbf{Case 1}: $\query$ is a relation atom}
\For{$t \in \tupset.\rel$}
\State $V \leftarrow V \cup \{v_t\}$; $\ell \leftarrow \ell \cup \{\inparen{v_t, \rel\inparen{\tup}}\}$ \Comment{Allocate a fresh node $v_t$; when $\rel\inparen{\tup} = \bound\cdot X_\tup$ for $\bound > 1$, we assume the algorithm generates a $3$-node circuit encoding the multiplication $\bound\cdot X_\tup$, adding the new vertices, edges, and vertex/label pairs to their respective sets.}
\State $\phi(t) \gets v_t$
\EndFor
\State\Return $\tuple{V, E, \phi, \ell}$
\ElsIf{$\query$ is $\sigma_\theta(\query_1)$} \Comment{\textbf{Case 2}: $\query$ is a Selection}
\State $\tuple{V, E, \phi', \ell} \gets \lincirc(\query_1, \tupset, V, E, \ell)$
\For{$t \in \domain(\phi')$}
\State \textbf{if }$\theta(t)$
\textbf{ then } $\phi(t) \gets \phi'(t)$
\textbf{ else } $\phi(t) \gets v_0$
\EndFor
\State\Return $\tuple{V, E, \phi, \ell}$
\ElsIf{$\query$ is $\pi_{A}(\query_1)$} \Comment{\textbf{Case 3}: $\query$ is a Projection}
\State $\tuple{V, E, \phi', \ell} \gets \lincirc(\query_1, \tupset, V, E, \ell)$
\For{$t \in \pi_{A}(\query_1(\tupset))$}
\State $V \leftarrow V \cup \{v_t\}$; $\ell \leftarrow \ell \cup \{(v_t, +)\}$\Comment{Allocate a fresh node $v_t$}
\State $\phi(t) \leftarrow v_t$
\EndFor
\For{$t \in \query_1(\tupset)$}
\State $E \leftarrow E \cup \{(\phi'(t), \phi(\pi_{A}t))\}$
\EndFor
\State Replace each node with in-degree $>2$ by an equivalent fan-in two tree
@ -189,7 +188,7 @@ For the projection case, observe that the fan-in is bounded by $|\query'(\dbbase
\label{lem:circuits-model-runtime}
Given a \abbrNXPDB $\pxdb$ with \dbbaseName $\tupset$, and an $\raPlus$ query $Q$, the size of the lineage of $Q(\pxdb)$ is asymptotically no worse than the runtime of $Q$ over $\tupset$. That is, we have $\abs{V_{Q,\pxdb}} \leq k\qruntime{Q, \tupset}+1$, where $k\ge 1$ is the maximal degree of any polynomial in $Q(\pxdb)$.
\end{Lemma}
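For intuition, Cases 1 to 3 of \lincirc (relation atoms, selection, and projection) can be sketched in Python as below. This is a hypothetical sketch, not the paper's code: the class \texttt{Circuit} and the functions \texttt{plus\_tree} and \texttt{lincirc} are illustrative names; queries are assumed to be nested tuples; each base tuple is assumed to be annotated with a pair \texttt{(bound, var)} encoding $\bound\cdot X_\tup$; and joins and unions are not covered. Unlike the pseudocode, which first allocates one $+$ node per projected tuple and afterwards replaces nodes of in-degree greater than $2$, the sketch builds the fan-in two tree of $+$ gates directly.

```python
from itertools import count


class Circuit:
    """Hypothetical container for the circuit <V, E, ell> built by the sketch."""

    def __init__(self):
        self._ids = count()
        self.label = {}        # vertex id -> '+', '*', an int constant, or a variable name
        self.edges = set()     # (input, gate) pairs, mirroring E <- E u {(phi'(t), phi(pi_A t))}
        self.v0 = self.add(0)  # the global zero node v_0

    def add(self, lab, inputs=()):
        """Allocate a fresh vertex labelled `lab` with an edge from each of `inputs`."""
        v = next(self._ids)
        self.label[v] = lab
        for u in inputs:
            self.edges.add((u, v))
        return v


def plus_tree(c, inputs):
    """Sum `inputs` using '+' gates of fan-in at most two; return the root vertex."""
    layer = list(inputs)
    if len(layer) == 1:
        return c.add("+", layer)  # one '+' node per output tuple, as in Case 3
    while len(layer) > 1:
        nxt = [c.add("+", layer[i:i + 2]) for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2:
            nxt.append(layer[-1])  # an odd leftover vertex is carried to the next layer
        layer = nxt
    return layer[0]


def lincirc(q, db, c):
    """Return phi: output tuple -> vertex of `c` encoding that tuple's lineage."""
    kind = q[0]
    if kind == "rel":  # Case 1: relation atom
        phi = {}
        for t, (bound, var) in db[q[1]].items():
            x = c.add(var)
            # bound * X_t with bound > 1 becomes a 3-node circuit (constant, variable, '*')
            phi[t] = x if bound == 1 else c.add("*", (c.add(bound), x))
        return phi
    if kind == "select":  # Case 2: sigma_theta(Q1); filtered tuples map to v_0
        _, theta, q1 = q
        sub = lincirc(q1, db, c)
        return {t: (v if theta(t) else c.v0) for t, v in sub.items()}
    if kind == "project":  # Case 3: pi_A(Q1), with A given as attribute positions
        _, attrs, q1 = q
        sub = lincirc(q1, db, c)
        groups = {}
        for t, v in sub.items():
            groups.setdefault(tuple(t[i] for i in attrs), []).append(v)
        return {t: plus_tree(c, vs) for t, vs in groups.items()}
    raise ValueError(f"query form not covered by this sketch: {kind!r}")
```

For instance, with $R$ containing the tuples $(1,a)$, $(2,a)$, $(3,b)$ annotated by $X_1$, $3\cdot X_2$, $X_3$, projecting onto the second attribute yields $8$ vertices: $v_0$, three variable nodes, the constant and $\times$ nodes for the bounded tuple, and one $+$ node per projected tuple.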
\AH{Why are the number of vertices considered to be the size of the lineage?}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{proof}
We prove by induction that $\abs{V_{Q,\pxdb} \setminus \{v_0\}} \leq k\qruntime{Q, \tupset}$. For clarity, we implicitly exclude $v_0$ in the proof below.


@ -16,8 +16,8 @@
\BOOKMARK [2][-]{subsection.4.2}{\376\377\0004\000.\0002\000\040\000O\000u\000r\000\040\000m\000a\000i\000n\000\040\000r\000e\000s\000u\000l\000t}{section.4}% 16
\BOOKMARK [1][-]{section.5}{\376\377\0005\000\040\000R\000e\000l\000a\000t\000e\000d\000\040\000W\000o\000r\000k}{}% 17
\BOOKMARK [1][-]{section.6}{\376\377\0006\000\040\000C\000o\000n\000c\000l\000u\000s\000i\000o\000n\000s\000\040\000a\000n\000d\000\040\000F\000u\000t\000u\000r\000e\000\040\000W\000o\000r\000k}{}% 18
\BOOKMARK [1][-]{section*.12}{\376\377\000A\000c\000k\000n\000o\000w\000l\000e\000d\000g\000m\000e\000n\000t\000s}{}% 19
\BOOKMARK [1][-]{section*.14}{\376\377\000R\000e\000f\000e\000r\000e\000n\000c\000e\000s}{}% 20
\BOOKMARK [1][-]{appendix.A}{\376\377\000A\000\040\000M\000i\000s\000s\000i\000n\000g\000\040\000d\000e\000t\000a\000i\000l\000s\000\040\000f\000r\000o\000m\000\040\000S\000e\000c\000t\000i\000o\000n\000\040\0002}{}% 21
\BOOKMARK [2][-]{subsection.A.1}{\376\377\000A\000.\0001\000\040\000B\000a\000c\000k\000g\000r\000o\000u\000n\000d\000\040\000d\000e\000t\000a\000i\000l\000s\000\040\000f\000o\000r\000\040\000p\000r\000o\000o\000f\000\040\000o\000f\000\040\000p\000r\000o\000p\000:\000e\000x\000p\000e\000c\000t\000i\000o\000n\000-\000o\000f\000-\000p\000o\000l\000y\000n\000o\000m}{appendix.A}% 22
\BOOKMARK [2][-]{subsection.A.2}{\376\377\000A\000.\0002\000\040\000P\000r\000o\000o\000f\000\040\000o\000f\000\040\000p\000r\000o\000p\000:\000c\000t\000i\000d\000b\000-\000r\000e\000d\000u\000c\000t}{appendix.A}% 23

BIN
main.pdf

Binary file not shown.

Binary file not shown.


@ -46,7 +46,9 @@
\usepackage[normalem]{ulem}
\usepackage{subcaption}
\usepackage{booktabs}
\usepackage%[disable]
{todonotes}
\usepackage{graphicx}
\usepackage{listings}
%%%%%%%%%% SQL + provenance listing settings