%root: main.tex
%!TEX root=./main.tex
\section{$1 \pm \epsilon$ Approximation Algorithm}\label{sec:algo}
In \Cref{sec:hard}, we showed that \Cref{prob:bag-pdb-poly-expected} cannot be solved in $\bigO{\qruntime{\optquery{\query},\tupset,\bound}}$ runtime. In light of this, we now develop an approximation algorithm that runs in time $\bigO{\qruntime{\optquery{\query},\tupset,\bound}}$. We obtain this result via circuits: our approximation algorithm runs in time $\bigO{\size(\circuit)}$ for a very broad class of circuits (thus affirming~\Cref{prob:intro-stmt}); see the discussion after \Cref{lem:val-ub} for details.
The following approximation algorithm applies to bag query semantics over both
\abbrCTIDB lineage polynomials and, in practice, general \abbrBIDB lineage polynomials; regarding the latter, we note that a $1$-\abbrTIDB is equivalent to a \abbrBIDB whose blocks all have size $1$. Our experimental results (see~\Cref{app:subsec:experiment}), which use queries from the PDBench benchmark~\cite{pdbench}, show a low $\gamma$ (see~\Cref{def:param-gamma}), supporting the claim that our bounds hold for general \abbrBIDB in practice.
Corresponding proofs and pseudocode for all formal statements and algorithms
can be found in \Cref{sec:proofs-approx-alg}.
%it is then desirable to have an algorithm to approximate the multiplicity in linear time, which is what we describe next.
\subsection{Preliminaries and some more notation}
We now introduce definitions and notation related to circuits and polynomials that we need to state our upper bound results. First, we introduce the expansion $\expansion{\circuit}$ of a circuit $\circuit$, which % encodes the reduced polynomial for $\polyf\inparen{\circuit}$ and is the basis
is used by our auxiliary algorithm \sampmon to sample monomials when computing the approximation. % (part of our approximation algorithm).
\begin{Definition}[$\expansion{\circuit}$]\label{def:expand-circuit}
For a circuit $\circuit$, we define $\expansion{\circuit}$ as a list of tuples $(\monom, \coef)$, where $\monom$ is a set of variables and $\coef \in \mathbb{Z}$.
$\expansion{\circuit}$ has the following recursive definition ($\circ$ is list concatenation).
$\expansion{\circuit} =
\begin{cases}
\expansion{\circuit_\linput} \circ \expansion{\circuit_\rinput} &\textbf{ if }\circuit.\type = \circplus\\
\left\{(\monom_\linput \cup \monom_\rinput, \coef_\linput \cdot \coef_\rinput) ~|~(\monom_\linput, \coef_\linput) \in \expansion{\circuit_\linput}, (\monom_\rinput, \coef_\rinput) \in \expansion{\circuit_\rinput}\right\} &\textbf{ if }\circuit.\type = \circmult\\
\elist{(\emptyset, \circuit.\val)} &\textbf{ if }\circuit.\type = \tnum\\
\elist{(\{\circuit.\val\}, 1)} &\textbf{ if }\circuit.\type = \var.\\
\end{cases}
$
\end{Definition}
In what follows, we denote the monomial composed of the variables in $\monom$ by $\encMon$. As an example of $\expansion{\circuit}$, consider the circuit $\circuit$ illustrated in \Cref{fig:circuit}: writing each $\monom$ as its monomial $\encMon$, we have $\expansion{\circuit} = [(X, 2), (XY, -1), (XY, 4), (Y, -2)]$. This list lets us rewrite $\rpoly$ (see \Cref{eq:tilde-Q-bi}) in a form that makes our algorithm more transparent.
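To make \Cref{def:expand-circuit} concrete, the following Python sketch computes $\expansion{\circuit}$ by direct recursion. The nested-tuple encoding of gates and the function name \texttt{expansion} are our own illustrative choices, not part of the paper. The circuit below is a hypothetical circuit for $(X+2Y)(2X-Y)$, chosen so that its expansion reproduces the list above. Materializing the list is only feasible for small examples, since $\expansion{\circuit}$ can be exponentially long in the size of $\circuit$.

```python
# Illustrative sketch; the gate encoding is our own assumption:
#   ("num", c) | ("var", name) | ("+", left, right) | ("*", left, right)

def expansion(gate):
    """Return E(C) as a list of (frozenset_of_variables, coefficient) pairs."""
    kind = gate[0]
    if kind == "+":
        # Sum gate: concatenate the two children's lists.
        return expansion(gate[1]) + expansion(gate[2])
    if kind == "*":
        # Product gate: pair every left term with every right term, taking the
        # union of the variable sets and the product of the coefficients.
        return [
            (vars_l | vars_r, coef_l * coef_r)
            for vars_l, coef_l in expansion(gate[1])
            for vars_r, coef_r in expansion(gate[2])
        ]
    if kind == "num":
        return [(frozenset(), gate[1])]
    if kind == "var":
        return [(frozenset({gate[1]}), 1)]
    raise ValueError(f"unknown gate type: {kind!r}")


# Hypothetical circuit computing (X + 2Y)(2X - Y).
X, Y = ("var", "X"), ("var", "Y")
C = ("*",
     ("+", X, ("*", ("num", 2), Y)),
     ("+", ("*", ("num", 2), X), ("*", ("num", -1), Y)))

for variables, coef in expansion(C):
    print(sorted(variables), coef)
# ['X'] 2
# ['X', 'Y'] -1
# ['X', 'Y'] 4
# ['Y'] -2
```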
\begin{Definition}[$\abs{\circuit}$]\label{def:positive-circuit}
For any circuit $\circuit$, the corresponding
{\em positive circuit}, denoted $\abs{\circuit}$, is obtained from $\circuit$ as follows: for each leaf node $\ell$ of $\circuit$ with $\ell.\type = \tnum$, replace $\ell.\val$ by $\abs{\ell.\val}$.
\end{Definition}
We will overload notation and use $\abs{\circuit}\inparen{\vct{X}}$ to mean $\polyf\inparen{\abs{\circuit}}$.
Conveniently, since the coefficients in $\expansion{\abs{\circuit}}$ are exactly the absolute values of those in $\expansion{\circuit}$, we have $\abs{\circuit}\inparen{1,\ldots,1} = \sum\limits_{\inparen{\monom, \coef} \in \expansion{\circuit}}\abs{\coef}$.
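The identity for $\abs{\circuit}\inparen{1,\ldots,1}$ can be checked on the same hypothetical circuit for $(X+2Y)(2X-Y)$ with the Python sketch below; the tuple encoding and the name \texttt{evaluate} are our own. Note that the sum runs over $\expansion{\circuit}$ before like terms are merged: the merged polynomial $2X^2+3XY-2Y^2$ has absolute coefficients summing to $7$, whereas $\abs{\circuit}(1,1)=2+1+4+2=9$.

```python
# Illustrative sketch; gates are nested tuples (our own encoding).

def evaluate(gate, assignment, positive=False):
    """Evaluate the polynomial computed by a circuit at `assignment`.

    With positive=True every constant is replaced by its absolute value,
    i.e. the positive circuit |C| is evaluated instead of C.
    """
    kind = gate[0]
    if kind == "num":
        return abs(gate[1]) if positive else gate[1]
    if kind == "var":
        return assignment[gate[1]]
    left = evaluate(gate[1], assignment, positive)
    right = evaluate(gate[2], assignment, positive)
    if kind == "+":
        return left + right
    if kind == "*":
        return left * right
    raise ValueError(f"unknown gate type: {kind!r}")


# Hypothetical circuit computing (X + 2Y)(2X - Y).
X, Y = ("var", "X"), ("var", "Y")
C = ("*",
     ("+", X, ("*", ("num", 2), Y)),
     ("+", ("*", ("num", 2), X), ("*", ("num", -1), Y)))
ones = {"X": 1, "Y": 1}

print(evaluate(C, ones, positive=True))  # 9 = |2| + |-1| + |4| + |-2|
print(evaluate(C, ones))                 # 3 = 2 + 3 - 2 (merged polynomial)
```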
\begin{Definition}[\size($\cdot$), \depth$\inparen{\cdot}$]\label{def:size-depth}
For a circuit \circuit, $\size(\circuit)$ and $\depth(\circuit)$ denote the number of gates and the number of levels of \circuit, respectively.
\end{Definition}
\begin{Definition}[$\degree(\cdot)$]\label{def:degree}\footnote{Note that the degree of $\polyf(\abs{\circuit})$ is always upper bounded by $\degree(\circuit)$, and the latter can be strictly larger (e.g., when $\circuit$ multiplies two copies of the constant $1$, we have $\degree(\circuit)=1$, but the degree of $\polyf(\abs{\circuit})$ is $0$).}
$\degree(\circuit)$ is defined recursively as follows:
\[\degree(\circuit)=
\begin{cases}
\max(\degree(\circuit_\linput),\degree(\circuit_\rinput)) & \text{ if }\circuit.\type=\circplus\\
\degree(\circuit_\linput) + \degree(\circuit_\rinput)+1 &\text{ if }\circuit.\type=\circmult\\
1 & \text{ if }\circuit.\type = \var\\
0 & \text{otherwise}.
\end{cases}
\]
\end{Definition}
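The recursive definition of $\degree(\cdot)$ translates directly into code. In the Python sketch below (gates as nested tuples, our own encoding), the footnote's example of a product of two constants $1$ receives degree $1$ even though the computed polynomial is constant.

```python
# Illustrative sketch of the recursive definition of deg(C);
# gates are nested tuples (our own encoding).

def degree(gate):
    """deg(C) exactly as defined above, including the +1 at product gates."""
    kind = gate[0]
    if kind == "+":
        return max(degree(gate[1]), degree(gate[2]))
    if kind == "*":
        return degree(gate[1]) + degree(gate[2]) + 1
    if kind == "var":
        return 1
    if kind == "num":
        return 0
    raise ValueError(f"unknown gate type: {kind!r}")


one_times_one = ("*", ("num", 1), ("num", 1))
print(degree(one_times_one))  # 1, although the polynomial 1 * 1 has degree 0
```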
Next, we use the following notation for the complexity of multiplying integers:
\begin{Definition}[$\multc{\cdot}{\cdot}$]\footnote{For arithmetic operations in the RAM model on an input of size $N$, we have $\multc{O(\log{N})}{O(\log{N})}=O(1)$. More generally, $\multc{N}{O(\log{N})}=O(N\log{N}\log\log{N})$.}
In the RAM model with a word size of $W$ bits, $\multc{M}{W}$ denotes the complexity of multiplying two integers represented with $M$ bits. (We assume that for an input of size $N$, $W=O(\log{N})$.)
\end{Definition}
Finally, to get linear runtime results, we will need to define another parameter modeling the (weighted) number of monomials in %$\poly\inparen{\vct{X}}$
$\expansion{\circuit}$
that need to be `canceled' when monomials with dependent variables are removed (\Cref{subsec:one-bidb}). %def:hen it is modded with $\mathcal{B}$ (\Cref{def:mod-set-polys}).
Let $\isInd{\cdot}$ be a boolean function that returns true if the monomial $\encMon$ is composed of independent variables (i.e., no two of its variables belong to the same block) and false otherwise; further, let $\indicator{\theta}$ denote the indicator function, which is $1$ if $\theta$ evaluates to true and $0$ otherwise.
\begin{Definition}[Parameter $\gamma$]\label{def:param-gamma}
Given a \abbrOneBIDB circuit $\circuit$, define
\[\gamma(\circuit)=\frac{\sum_{(\monom, \coef)\in \expansion{\circuit}} \abs{\coef}\cdot \indicator{\neg\isInd{\encMon}} }%\encMon\mod{\mathcal{B}}\equiv 0}}
{\abs{\circuit}(1,\ldots, 1)}.\]
\end{Definition}
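As an illustration of \Cref{def:param-gamma}, the Python sketch below computes $\gamma(\circuit)$ by materializing $\expansion{\circuit}$. The block assignment \texttt{block\_of}, a map from each variable to a block identifier, is a hypothetical input we introduce for the example; a monomial is treated as independent exactly when its variables come from pairwise distinct blocks. On the hypothetical circuit for $(X+2Y)(2X-Y)$, $\gamma=0$ if $X$ and $Y$ lie in different blocks, and $\gamma=\frac{1+4}{9}=\frac{5}{9}$ if they share a block.

```python
# Illustrative sketch; gate encoding and block assignment are our own assumptions.
from fractions import Fraction


def expansion(gate):
    """E(C) as (frozenset_of_variables, coefficient) pairs; gates are nested tuples."""
    kind = gate[0]
    if kind == "+":
        return expansion(gate[1]) + expansion(gate[2])
    if kind == "*":
        return [(vl | vr, cl * cr)
                for vl, cl in expansion(gate[1])
                for vr, cr in expansion(gate[2])]
    if kind == "num":
        return [(frozenset(), gate[1])]
    if kind == "var":
        return [(frozenset({gate[1]}), 1)]
    raise ValueError(f"unknown gate type: {kind!r}")


def is_independent(variables, block_of):
    """True iff no two variables of the monomial come from the same block."""
    blocks = [block_of[v] for v in variables]
    return len(set(blocks)) == len(blocks)


def gamma(gate, block_of):
    """Fraction of |C|(1,...,1) carried by monomials with dependent variables."""
    terms = expansion(gate)
    total = sum(abs(coef) for _, coef in terms)  # equals |C|(1,...,1)
    dependent = sum(abs(coef) for variables, coef in terms
                    if not is_independent(variables, block_of))
    return Fraction(dependent, total)


X, Y = ("var", "X"), ("var", "Y")
C = ("*",
     ("+", X, ("*", ("num", 2), Y)),
     ("+", ("*", ("num", 2), X), ("*", ("num", -1), Y)))

print(gamma(C, {"X": "b1", "Y": "b2"}))  # 0: X and Y are independent
print(gamma(C, {"X": "b1", "Y": "b1"}))  # 5/9: both XY terms cancel
```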
\subsection{Our main result}\label{sec:algo:sub:main-result}
We solve~\Cref{prob:intro-stmt} for any fixed $\epsilon > 0$ in what follows.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\mypar{Algorithm Idea}
%We prove \Cref{lem:approx-alg} by developing an
Our approximation algorithm (\approxq; pseudocode in \Cref{sec:proof-lem-approx-alg})
%with the desired runtime. This algorithm
is based on the following observation.
% The algorithm (\approxq detailed in \Cref{alg:mon-sam}) to prove \Cref{lem:approx-alg} follows from the following observation.
Given a lineage polynomial $\poly(\vct{X})=\polyf(\circuit)$ for circuit \circuit over
\abbrOneBIDB (recall that all \abbrCTIDB can be reduced to \abbrOneBIDB by~\Cref{prop:ctidb-reduct}), we have: % can exactly represent $\rpoly(\vct{X})$ as follows:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{equation}
\label{eq:tilde-Q-bi}
\rpoly\inparen{p_1,\dots,p_\numvar}=\hspace*{-1mm}\sum_{(\monom,\coef)\in \expansion{\circuit}} %\hspace*{-2mm}
\indicator{\isInd{\encMon}%\mod{\mathcal{B}}\not\equiv 0
}\cdot \coef\cdot\hspace*{-2mm}\prod_{X_i\in \monom}\hspace*{-2mm} p_i.
\end{equation}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Given the above, the algorithm is a sampling-based estimator for this sum: we sample (via \sampmon) $(\monom,\coef)\in \expansion{\circuit}$ with probability $\frac{\abs{\coef}}{\abs{\circuit}(1,\ldots, 1)}$ and compute $\vari{Y}=\indicator{\isInd{\encMon}}\cdot\operatorname{sgn}(\coef)\cdot \prod_{X_i\in \monom} p_i$, so that $\abs{\circuit}(1,\ldots, 1)\cdot\mathbb{E}\left[\vari{Y}\right]=\rpoly\inparen{p_1,\dots,p_\numvar}$. %Taking $\ceil{\frac{2 \log{\frac{2}{\conf}}}{\error^2}}$ samples
Repeating the sampling an appropriate number of times
and scaling the average of the $\vari{Y}$ values by $\abs{\circuit}(1,\ldots, 1)$ gives us our final estimate. \onepass is used to compute the sampling probabilities needed in \sampmon (details are in \Cref{sec:proofs-approx-alg}).
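The following Python sketch illustrates this estimator end to end. It is our own illustration under stated assumptions, not the \approxq, \onepass, or \sampmon pseudocode of \Cref{sec:proofs-approx-alg}. Gates are encoded as nested tuples, and the memoized function \texttt{weight} plays the role of a single bottom-up pass computing $\abs{\circuit_g}(1,\ldots,1)$ at every gate $g$. The function \texttt{sample\_monomial} descends into one child of a $+$ gate, chosen with probability proportional to the children's weights, and into both children of a $\times$ gate, so that each term $(\monom,\coef)$ of $\expansion{\circuit}$ is returned with probability $\abs{\coef}/\abs{\circuit}(1,\ldots,1)$. Each sample contributes $\operatorname{sgn}(\coef)\cdot\prod_{X_i\in\monom}p_i$ when $\encMon$ has independent variables and $0$ otherwise, and the average is rescaled by $\abs{\circuit}(1,\ldots,1)$, which makes the estimate unbiased for \Cref{eq:tilde-Q-bi}. The block assignment \texttt{block\_of\_var} and the probabilities are hypothetical inputs.

```python
# Illustrative sketch; encoding, names, and inputs are our own assumptions.
import random
from functools import lru_cache


@lru_cache(maxsize=None)
def weight(gate):
    """|C_g|(1,...,1) for the sub-circuit rooted at `gate`, memoized per gate."""
    kind = gate[0]
    if kind == "num":
        return abs(gate[1])
    if kind == "var":
        return 1
    if kind == "+":
        return weight(gate[1]) + weight(gate[2])
    if kind == "*":
        return weight(gate[1]) * weight(gate[2])
    raise ValueError(f"unknown gate type: {kind!r}")


def sample_monomial(gate, rng):
    """Return (variables, coef) from E(C) with probability |coef| / |C|(1,...,1)."""
    kind = gate[0]
    if kind == "num":
        return frozenset(), gate[1]
    if kind == "var":
        return frozenset({gate[1]}), 1
    if kind == "+":
        # Descend into ONE child, chosen proportionally to its weight.
        w_left, w_right = weight(gate[1]), weight(gate[2])
        child = gate[1] if rng.random() * (w_left + w_right) < w_left else gate[2]
        return sample_monomial(child, rng)
    if kind != "*":
        raise ValueError(f"unknown gate type: {kind!r}")
    # Product gate: descend into BOTH children and combine the two samples.
    vars_l, coef_l = sample_monomial(gate[1], rng)
    vars_r, coef_r = sample_monomial(gate[2], rng)
    return vars_l | vars_r, coef_l * coef_r


def estimate(gate, probs, block_of, num_samples, seed=0):
    """Monte Carlo estimate of the reduced polynomial evaluated at `probs`."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(num_samples):
        variables, coef = sample_monomial(gate, rng)
        blocks = [block_of[v] for v in variables]
        if len(set(blocks)) < len(blocks):
            continue  # dependent monomial: the indicator is 0, so Y = 0
        y = 1.0 if coef > 0 else -1.0  # sgn(coef)
        for v in variables:
            y *= probs[v]
        acc += y
    return weight(gate) * acc / num_samples


X, Y = ("var", "X"), ("var", "Y")
C = ("*",
     ("+", X, ("*", ("num", 2), Y)),
     ("+", ("*", ("num", 2), X), ("*", ("num", -1), Y)))
probs = {"X": 0.5, "Y": 0.5}
block_of_var = {"X": "b1", "Y": "b2"}
# Exact value: 2(0.5) - 1(0.25) + 4(0.25) - 2(0.5) = 0.75
print(estimate(C, probs, block_of_var, num_samples=100_000))  # close to 0.75
```

Sampling directly from the circuit avoids materializing $\expansion{\circuit}$, whose length can be exponential in $\size(\circuit)$.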
%%%%%%%%%%%%%%%%%%%%%%%
%The following results assume input circuit \circuit computed from an arbitrary $\raPlus$ query $\query$ and arbitrary \abbrBIDB $\pdb$. We refer to \circuit as a \abbrBIDB circuit.
%\AH{Verify that the proof for \Cref{lem:approx-alg} doesn't rely on properties of $\raPlus$ or \abbrBIDB.}
%\begin{Theorem}\label{lem:approx-alg}
%Let \circuit be an arbitrary \abbrBIDB circuit %for a UCQ over \bi
%and define $\poly(\vct{X})=\polyf(\circuit)$ and let $k=\degree(\circuit)$.
%Then an estimate $\mathcal{E}$ of $\rpoly(\prob_1,\ldots, \prob_\numvar)$ can be computed in time
%{\small
%\[O\left(\left(\size(\circuit) + \frac{\log{\frac{1}{\conf}}\cdot \abs{\circuit}^2(1,\ldots, 1)\cdot k\cdot \log{k} \cdot \depth(\circuit))}{\inparen{\error}^2\cdot\rpoly^2(\prob_1,\ldots, \prob_\numvar)}\right)\cdot\multc{\log\left(\abs{\circuit}(1,\ldots, 1)\right)}{\log\left(\size(\circuit)\right)}\right)\]
%}
%such that
%\begin{equation}
%\label{eq:approx-algo-bound}
%\probOf\left(\left|\mathcal{E} - \rpoly(\prob_1,\dots,\prob_\numvar)\right|> \error \cdot \rpoly(\prob_1,\dots,\prob_\numvar)\right) \leq \conf.
%\end{equation}
%\end{Theorem}
\mypar{Runtime analysis} We can argue the following runtime for the algorithm outlined above:
% We next present a few corollaries of \Cref{lem:approx-alg}.
\begin{Theorem}
\label{cor:approx-algo-const-p}
Let \circuit be an arbitrary \emph{\abbrOneBIDB} circuit, define $\poly(\vct{X})=\polyf(\circuit)$, let $k=\degree(\circuit)$, and let $\gamma=\gamma(\circuit)$. Further, suppose that $\prob_i\ge \prob_0$ for all $i\in[\numvar]$. Then an estimate $\mathcal{E}$ of $\rpoly(\prob_1,\ldots, \prob_\numvar)$
satisfying
\begin{equation}
\label{eq:approx-algo-bound-main}
\probOf\left(\left|\mathcal{E} - \rpoly(\prob_1,\dots,\prob_\numvar)\right|> \error' \cdot \rpoly(\prob_1,\dots,\prob_\numvar)\right) \leq \conf
\end{equation}
can be computed in time
\begin{equation}
\label{eq:approx-algo-runtime}
O\left(\left(\size(\circuit) + \frac{\log{\frac{1}{\conf}}\cdot k\cdot \log{k} \cdot \depth(\circuit)}{\inparen{\error'}^2\cdot(1-\gamma)^2\cdot \prob_0^{2k}}\right)\cdot\multc{\log\left(\abs{\circuit}(1,\ldots, 1)\right)}{\log\left(\size(\circuit)\right)}\right).
\end{equation}
In particular, if $\prob_0>0$ and $\gamma<1$ are absolute constants, then the above runtime simplifies to $O_k\left(\left(\frac 1{\inparen{\error'}^2}\cdot\size(\circuit)\cdot \log{\frac{1}{\conf}}\right)\cdot\multc{\log\left(\abs{\circuit}(1,\ldots, 1)\right)}{\log\left(\size(\circuit)\right)}\right)$.
\end{Theorem}
%\begin{Corollary}
%Given any \abbrCTIDB circuit \circuit, $\poly\inparen{\vct{X}} = \polyf\inparen{\circuit}$, for $k =\degree\inparen{\circuit}$, $\gamma\inparen{\circuit}$, and $\prob_i\ge\prob_0$ for all $i\in\pbox{\numvar}$. The results of~\Cref{cor:approx-algo-const-p} follow for estimating $\rpoly\inparen{\prob_1,\ldots, \prob_\numvar}$.
%\end{Corollary}
The restriction on $\gamma$ is satisfied by any
$1$-\abbrTIDB (where $\gamma=0$ in the equivalent $1$-\abbrBIDB of~\Cref{prop:ctidb-reduct})
as well as by all three queries of the PDBench \abbrBIDB benchmark (see \Cref{app:subsec:experiment} for experimental results). Further, we can also argue the following result
\secrev{
, where we recall from~\Cref{sec:intro} that for a \abbrCTIDB $\pdb = \inparen{\worlds, \bpd}$, $\tupset$ denotes the set of possible tuples across all possible worlds of $\pdb$.
}
\secrev{
\begin{Lemma}
\label{lem:ctidb-gamma}
Given an $\raPlus$ query $\query$ and a \abbrCTIDB $\pdb$, let \circuit be the circuit computed by $\query\inparen{\tupset}$ and let $k=\degree\inparen{\circuit}$. Then, for the reduced \abbrOneBIDB $\pdb'$, there exists an equivalent circuit $\circuit'$ obtained from $\query\inparen{\tupset'}$ such that $\gamma\inparen{\circuit'}\leq 1 - \inparen{\bound + 1}^{-\inparen{k-1}}$, $\size\inparen{\circuit'} \leq \size\inparen{\circuit} + \bigO{\numvar\bound}$, %\cdot\inparen{2^{\inparen{\ceil{\log{2\bound}}}+ 1} - 1}$
and $\depth\inparen{\circuit'} = \depth\inparen{\circuit} + \bigO{\log{\bound}}$.%\ceil{\log{2\bound}}$.
\end{Lemma}
}
\secrev{
\begin{proof}[Proof of~\Cref{lem:ctidb-gamma}]
%Let $\pdb' = \inparen{\onebidbworlds{\tupset'}, \pdb'}$ be the reduced \abbrOneBIDB and $\pdb = \inparen{\worlds, \pdb}$ the original \abbrCTIDB.
The circuit $\circuit'$ is built from \circuit as follows. For each input gate $\gate_i$ with $\gate_i.\val = X_\tup$, replace $\gate_i$ with the circuit \subcircuit encoding the sum $\sum_{j = 1}^\bound j\cdot X_{\tup, j}$. We argue that $\circuit'$ is a valid circuit by the following facts. Let $\pdb = \inparen{\worlds, \bpd}$ be the original \abbrCTIDB from which \circuit was generated. Then, by~\Cref{prop:ctidb-reduct}, there exists a \abbrOneBIDB $\pdb' = \inparen{\onebidbworlds{\tupset'}, \bpd'}$ with $\tupset' = \inset{\intup{\tup, j}~|~\tup\in\tupset, j\in\pbox{\bound}}$, from which the conversion from \circuit to $\circuit'$ follows. Both $\polyf\inparen{\circuit}$ and $\polyf\inparen{\circuit'}$ have the same expected multiplicity since (by~\Cref{prop:ctidb-reduct}) the distributions $\bpd$ and $\bpd'$ are equivalent and $\sum_{j = 1}^{\bound} j\cdot\worldvec'_{\tup, j} = \worldvec_\tup$ for every $\tup\in\tupset$ and every pair of corresponding worlds $\worldvec'\in\inset{0, 1}^{\bound\numvar}$ and $\worldvec\in\worlds$. Finally, since the sum $\sum_{j = 1}^\bound j\cdot X_{\tup, j}$ can be encoded by a (sub)circuit that is a \emph{balanced} binary tree with $\bigO{\bound}$ gates and depth $\bigO{\log{\bound}}$, the above conversion yields the claimed size and depth bounds of the lemma.
Next we argue the claim on $\gamma\inparen{\circuit'}$. Consider the list of expanded monomials $\expansion{\circuit}$ for the \abbrCTIDB circuit \circuit, and let \monom be an arbitrary monomial in it with $\encMon = X_{\tup_1}^{d_1}\cdots X_{\tup_\ell}^{d_\ell}$ for distinct tuples $\tup_1,\ldots,\tup_\ell$. In $\circuit'$, each factor $X_{\tup_i}^{d_i}$ becomes $\inparen{\sum_{j = 1}^\bound j\cdot X_{\tup_i, j}}^{d_i}$, so \monom yields in $\expansion{\circuit'}$ the monomials $\prod_{i = 1}^{\ell}\prod_{m = 1}^{d_i} j_{i, m}\cdot X_{\tup_i, j_{i, m}}$ for all choices of $j_{i, m}\in\pbox{\bound}$. Recall that a cancellation occurs when a monomial contains two distinct variables from the same block. Since the variables $X_{\tup, 1},\ldots, X_{\tup, \bound}$ form the block of $\tup$ in $\pdb'$ and distinct tuples lie in distinct blocks, cancellations can only occur within a single factor $\inparen{\sum_{j = 1}^\bound j\cdot X_{\tup_i, j}}^{d_i}$. Of the $\bound^{d_i}$ cross terms $\prod_{m = 1}^{d_i}X_{\tup_i, j_{i, m}}$ of this factor, \emph{exactly} $\bound$ survive, namely those with $j_{i, 1}=\cdots=j_{i, d_i}$, i.e., $X_{\tup_i, j}^{d_i}$ for $j\in\pbox{\bound}$; the remaining $\bound^{d_i} - \bound$ cross terms cancel. Weighting each cross term by its coefficient (as in \Cref{def:param-gamma}), the surviving fraction of this factor is $\frac{\sum_{j = 1}^\bound j^{d_i}}{\inparen{\sum_{j = 1}^\bound j}^{d_i}}\geq\bound^{-\inparen{d_i - 1}}$ by the power mean inequality. For the whole monomial \monom, the surviving fractions of the factors multiply, since non-cancelling terms for $X_{\tup_i}$ can only be joined with non-cancelling terms for $X_{\tup_{i'}}$, $i'\neq i$.
This yields a fraction of cancelled monomials of at most $1 - \prod_{i = 1}^{\ell}\bound^{-\inparen{d_i - 1}}\leq 1 - \inparen{\bound + 1}^{-\inparen{k - 1}}$, where the inequality uses $\sum_{i = 1}^\ell \inparen{d_i - 1}\leq k - 1$ (as $\sum_{i = 1}^\ell d_i \leq k$).
Since this holds for an arbitrary \monom and $\gamma\inparen{\circuit'}$ is a weighted average of these fractions, the bound follows for $\gamma\inparen{\circuit'}$.
\end{proof}
\qed
}
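The single-factor counting step in the proof above can be sanity-checked by brute force. The Python sketch below (the function name \texttt{cancelled\_fraction} is hypothetical) enumerates the cross terms of $\inparen{\sum_{j=1}^{\bound} j\cdot X_{\tup,j}}^{d}$, weights each by its coefficient as in \Cref{def:param-gamma}, and compares the cancelled fraction with $1-(\bound+1)^{-(d-1)}$ for small $\bound$ and $d$. It checks one factor on small cases only and is not a substitute for the proof.

```python
# Illustrative brute-force check; names are our own.
from fractions import Fraction
from itertools import product
from math import prod


def cancelled_fraction(c, d):
    """Coefficient-weighted fraction of the cross terms of (sum_{j=1..c} j*X_j)^d
    that cancel, when X_1, ..., X_c all belong to one block.

    The cross term X_{j_1} * ... * X_{j_d} has coefficient j_1 * ... * j_d and
    survives only if j_1 = ... = j_d (otherwise it mixes distinct variables
    from the same block).
    """
    total = surviving = 0
    for js in product(range(1, c + 1), repeat=d):
        weight = prod(js)
        total += weight
        if len(set(js)) == 1:
            surviving += weight
    return 1 - Fraction(surviving, total)


# Check the per-factor bound 1 - (c+1)^{-(d-1)} on small cases.
for c in range(1, 5):
    for d in range(1, 5):
        bound = 1 - Fraction(1, (c + 1) ** (d - 1))
        assert cancelled_fraction(c, d) <= bound, (c, d)

print(cancelled_fraction(2, 2))  # 4/9
```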
We briefly connect the runtime in \Cref{eq:approx-algo-runtime} to the algorithm outlined earlier (ignoring the $\multc{\cdot}{\cdot}$ factor, which accounts for the cost of arithmetic operations over integers). The $\size(\circuit)$ term comes from the time taken to run \onepass once (\onepass essentially computes $\abs{\circuit}(1,\ldots, 1)$ using the natural circuit evaluation algorithm on $\circuit$). We make $\frac{\log{\frac{1}{\conf}}}{\inparen{\error'}^2\cdot(1-\gamma)^2\cdot \prob_0^{2k}}$ calls to \sampmon, each of which essentially traces $O(k)$ random sink-to-source paths in $\circuit$, all of which by definition have length at most $\depth(\circuit)$.
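The number of calls to \sampmon can be traced to a standard Hoeffding bound. The following is our own sketch rather than the proof in \Cref{sec:proofs-approx-alg}, and it adds the simplifying assumptions that every coefficient in $\expansion{\circuit}$ is nonnegative and that $\prob_0\le 1$. Write $\Phi=\abs{\circuit}(1,\ldots, 1)$ (a shorthand used only here). Sampling $(\monom,\coef)\in\expansion{\circuit}$ with probability $\frac{\coef}{\Phi}$ and setting $\vari{Y}=\indicator{\isInd{\encMon}}\cdot\prod_{X_i\in\monom}\prob_i\in[0,1]$ gives $\Phi\cdot\mathbb{E}\left[\vari{Y}\right]=\rpoly(\prob_1,\ldots,\prob_\numvar)$, so Hoeffding's inequality for the average $\overline{\vari{Y}}$ of $N$ independent samples yields
\[
\probOf\left(\left|\Phi\cdot\overline{\vari{Y}}-\rpoly(\prob_1,\ldots,\prob_\numvar)\right|>\error'\cdot\rpoly(\prob_1,\ldots,\prob_\numvar)\right)\le 2\exp\left(-2N\cdot\left(\frac{\error'\cdot\rpoly(\prob_1,\ldots,\prob_\numvar)}{\Phi}\right)^2\right).
\]
Every monomial in $\expansion{\circuit}$ has at most $k$ variables, and the monomials with $\isInd{\encMon}$ true carry a $(1-\gamma)$ fraction of $\Phi$, so $\rpoly(\prob_1,\ldots,\prob_\numvar)\ge(1-\gamma)\cdot\prob_0^{k}\cdot\Phi$. Hence
\[
N=\left\lceil\frac{\ln\frac{2}{\conf}}{2\cdot\inparen{\error'}^2\cdot(1-\gamma)^2\cdot\prob_0^{2k}}\right\rceil
\]
samples suffice, which matches the number of calls to \sampmon stated above up to constant factors when, say, $\conf\le\frac{1}{2}$.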
Finally, we address the $\multc{\log\left(\abs{\circuit}(1,\ldots, 1)\right)}{\log\left(\size(\circuit)\right)}$ term in the runtime. %In \Cref{susec:proof-val-up}, we show the following:
\begin{Lemma}
\label{lem:val-ub}
For any \emph{\abbrOneBIDB} circuit $\circuit$ with $\degree(\circuit)=k$, we have
$\abs{\circuit}(1,\ldots, 1)\le 2^{2^k\cdot \depth(\circuit)}.$
Further, %under either of the following conditions:
%\begin{enumerate}
if $\circuit$ is a tree, then
%\item $\circuit$ encodes the run of the algorithm on a FAQ~\cite{DBLP:conf/pods/KhamisNR16}/AJAR~\cite{ajar} query,
%\end{enumerate}
we have $\abs{\circuit}(1,\ldots, 1)\le \size(\circuit)^{O(k)}.$
\end{Lemma}
Note that, combined with the assumption in \Cref{cor:approx-algo-const-p} that $\prob_0>0$ and $\gamma<1$ are absolute constants, the above lemma implies that the runtime there simplifies to $O_k\left(\frac 1{\inparen{\error'}^2}\cdot\size(\circuit)^2\cdot \log{\frac{1}{\conf}}\right)$ for general circuits $\circuit$. If $\circuit$ is a tree, then the runtime simplifies to $O_k\left(\frac 1{\inparen{\error'}^2}\cdot\size(\circuit)\cdot \log{\frac{1}{\conf}}\right)$, which answers \Cref{prob:intro-stmt} in the affirmative for such circuits.
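For the tree case, the simplification can be traced as follows. This is only a sketch: it combines \Cref{lem:val-ub} with the footnote to the definition of $\multc{\cdot}{\cdot}$ (taking $N=\size(\circuit)$), treats $\prob_0$ and $\gamma$ as absolute constants, and uses $\depth(\circuit)\le\size(\circuit)$:
\begin{align*}
\abs{\circuit}(1,\ldots, 1)\le \size(\circuit)^{O(k)}
&\implies \log\left(\abs{\circuit}(1,\ldots, 1)\right)\le O\left(k\cdot\log\size(\circuit)\right)\\
&\implies \multc{\log\left(\abs{\circuit}(1,\ldots, 1)\right)}{\log\left(\size(\circuit)\right)}=O_k(1).
\end{align*}
Plugging this into \Cref{eq:approx-algo-runtime} gives, for $\error'\le 1$ and $\conf\le\frac{1}{2}$,
\[
O_k\left(\size(\circuit)+\frac{\log{\frac{1}{\conf}}\cdot\depth(\circuit)}{\inparen{\error'}^2}\right)\le O_k\left(\frac{1}{\inparen{\error'}^2}\cdot\size(\circuit)\cdot\log{\frac{1}{\conf}}\right).
\]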
%\AH{Is it standard to assume that in the asymptotic notation above, $\error$ and $\delta$ are constant? Otherwise this does not uphold~\Cref{prob:intro-stmt}.}
Finally, note that by \Cref{prop:circuit-depth} and \Cref{lem:circ-model-runtime}, for any $\raPlus$ query $\query$ there exists a circuit $\circuit^*$ for $\apolyqdt$ such that $\depth(\circuit^*)\le O_{|Q|}(\log{n})$ and $\size(\circuit^*)\le O_k\inparen{\qruntime{\query, \tupset, \bound}}$. Using this along with \Cref{lem:val-ub}, \Cref{cor:approx-algo-const-p}, and the fact that $n\le \qruntime{\query, \tupset, \bound}$, we obtain the following corollary:
\begin{Corollary}
\label{cor:approx-algo-punchline}
Let $\query$ be an $\raPlus$ query and $\pdb$ be a \emph{\abbrOneBIDB} such that $p_0>0$ and $\gamma<1$ (with $p_0,\gamma$ as in \Cref{cor:approx-algo-const-p}) are absolute constants. Let $\poly(\vct{X})=\apolyqdt$ for any result tuple $\tup$ with $\deg(\poly)=k$. Then one can compute an approximation satisfying \Cref{eq:approx-algo-bound-main} in time $O_{k,|Q|,\error',\conf}\inparen{\qruntime{\optquery{\query}, \tupset, \bound}}$ (given $\query$, $\tupset$, and the probabilities $p_i$ for each $i\in [n]$ that define $\pd$).
%Let $\poly(\vct{X})$ be a \abbrBIDB-lineage polynomial correspoding to an \abbrBIDB circuit $\circuit$ that satisfies the specific conditions in \Cref{lem:val-ub}. Then one can compute an approximation satisfying \Cref{eq:approx-algo-bound-main} in time
% $O_k\left(\frac 1{\inparen{\error'}^2}\cdot\size(\circuit)\cdot \log{\frac{1}{\conf}}\right)$. % for the case when $\circuit$ satisfies the specific conditions in \Cref{lem:val-ub}.
\end{Corollary}
Next, we note that the above result, together with \Cref{lem:ctidb-gamma},
answers \Cref{prob:big-o-joint-steps} in the affirmative as follows:
\begin{Corollary}
\label{cor:approx-algo-punchline-ctidb}
Let $\query$ be an $\raPlus$ query and $\pdb$ be a \abbrCTIDB such that $p_0>0$ (with $p_0$ as in \Cref{cor:approx-algo-const-p}) is an absolute constant. Let $\poly(\vct{X})=\apolyqdt$ for any result tuple $\tup$ with $\deg(\poly)=k$. Then one can compute an approximation satisfying \Cref{eq:approx-algo-bound-main} in time $O_{k,|Q|,\error',\conf,\bound}\inparen{\qruntime{\optquery{\query}, \tupset, \bound}}$ (given $\query$, $\tupset$, and the probabilities $\prob_{\tup, j}$ for each $\tup\in\tupset,~j\in\pbox{\bound}$ that define $\bpd$).
%Let $\poly(\vct{X})$ be a \abbrBIDB-lineage polynomial correspoding to an \abbrBIDB circuit $\circuit$ that satisfies the specific conditions in \Cref{lem:val-ub}. Then one can compute an approximation satisfying \Cref{eq:approx-algo-bound-main} in time
% $O_k\left(\frac 1{\inparen{\error'}^2}\cdot\size(\circuit)\cdot \log{\frac{1}{\conf}}\right)$. % for the case when $\circuit$ satisfies the specific conditions in \Cref{lem:val-ub}.
\end{Corollary}
\secrev{
\begin{proof}[Proof of~\Cref{cor:approx-algo-punchline-ctidb}]
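By~\Cref{lem:ctidb-gamma}, the circuit $\circuit'$ for the reduced \abbrOneBIDB satisfies $\gamma\inparen{\circuit'}\leq 1 - \inparen{\bound + 1}^{-\inparen{k-1}}<1$, which is an absolute constant for fixed $k$ and $\bound$, and $\circuit'$ exceeds \circuit by only $\bigO{\numvar\bound}$ gates and $\bigO{\log{\bound}}$ levels. The claim then follows from~\Cref{cor:approx-algo-punchline}.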
\end{proof}
\qed
}
%\AH{What is $\abs{\query}$? Isn't that just $k$?}
If we want to approximate the expected multiplicities of all $Z=O(n^k)$ result tuples $\tup$ simultaneously, we just need to run the above algorithm with $\conf$ replaced by $\frac \conf Z$ and apply a union bound over the $Z$ result tuples. Since $\log{\frac{Z}{\conf}} = \log{\frac{1}{\conf}} + O(k\log{n})$, this increases the runtime by only a logarithmic factor.
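The effect of the union bound on the number of samples can be sketched in Python. The helper names are hypothetical. The Hoeffding-style count they use, $\left\lceil\ln\frac{2}{\conf}\big/\inparen{2\inparen{\error'}^2(1-\gamma)^2\prob_0^{2k}}\right\rceil$, is our own simplification that assumes nonnegative coefficients, and it is not necessarily the constant used in the paper's pseudocode. Replacing $\conf$ by $\frac{\conf}{Z}$ only adds $\ln Z$ to the numerator, i.e., it multiplies the count by at most $1+\ln Z/\ln\frac{2}{\conf}$ (up to rounding).

```python
# Illustrative sketch; helper names and the sample-count formula are assumptions.
import math


def num_samples(eps, delta, gamma, p0, k):
    """Hoeffding-style sample count for a (1 +/- eps) estimate, failure prob. delta.

    Sketch assumption: all coefficients are nonnegative, so each sample lies in
    [0, 1] and the target mean is at least (1 - gamma) * p0**k.
    """
    t = eps * (1 - gamma) * p0 ** k  # additive accuracy needed on the mean
    return math.ceil(math.log(2 / delta) / (2 * t ** 2))


def num_samples_all_outputs(eps, delta, gamma, p0, k, num_outputs):
    """Union bound: each of the num_outputs estimates uses failure prob. delta / Z."""
    return num_samples(eps, delta / num_outputs, gamma, p0, k)


single = num_samples(0.1, 0.05, 0.0, 0.9, 2)
everything = num_samples_all_outputs(0.1, 0.05, 0.0, 0.9, 2, num_outputs=10**6)
print(single, everything, everything / single)
```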
%\AR{The above Corollary needs to be improved/generalized. This is a place-holder for now.}
%In \Cref{app:proof-lem-val-ub} we argue that these conditions are very general and encompass many interesting scenarios, including query evaluation under FAQ/AJAR setup.
%\AH{AJAR reference.}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End: