Done with S5

master
Atri Rudra 2020-12-19 23:40:43 -05:00
parent abdbd0da1a
commit d7eb7d1544
1 changed files with 5 additions and 4 deletions

View File

@ -22,7 +22,7 @@ In~\Cref{sec:results-circuits} we argue why results from earlier sections also h
\subsubsection{Extending our results to lineage circuits}
\label{sec:results-circuits}
We first note that since expression trees are a special case of them, all of our hardness results in~\Cref{sec:hard} are still valid for lineage circuits.
We first note that since expression trees are a special case of linear circuits, all of our hardness results in~\Cref{sec:hard} are still valid for the latter.
Observe that \textsc{Approx}\textsc{imate}$\rpoly$ (\Cref{alg:mon-sam} in \Cref{sec:algo}) works for lineage circuits as long as the same guarantees on $\onepass$ and $\sampmon$ (\Cref{lem:one-pass} and \Cref{lem:sample} respectively) hold for lineage circuits as well.
It turns out that this is the case, simply because both algorithms rely on only one property of expression trees: that each node has two children;
@ -76,7 +76,8 @@ For the specifics on how lineage circuits are translated to represent polynomial
\subsubsection{Circuit size vs. runtime}
\label{sec:circuit-runtime}
We now connect the size of a lineage circuit (where the size of a lineage circuit is the number of vertices in the corresponding DAG\footnote{since each node has indegree at most two, this also is the same up to constants to counting the number of edges in the DAG.}) for a given SPJU query $Q$ to its $\qruntime{Q}$. We do this formally by showing that the size of the lineage circuit is asymptotically no worse than the corresponding runtime of a large class of deterministic query processing algorithms.
We now connect the size of a lineage circuit (where the size of a lineage circuit is the number of vertices in the corresponding DAG %\footnote{since each node has indegree at most two, this also is the same up to constants to counting the number of edges in the DAG.})
for a given SPJU query $Q$ to its $\qruntime{Q}$. We do this formally by showing that the size of the lineage circuit is asymptotically no worse than the corresponding runtime of a large class of deterministic query processing algorithms.
\begin{lemma}
\label{lem:circuits-model-runtime}
@ -89,7 +90,7 @@ We now have all the pieces to argue the following, which formally states that ou
Given an SPJU query $Q$ for a TIDB, we can present $(1\pm\eps)$ approximation to the expectation of each output tuple with probability at least $1-\delta$ in time $O_k\left(\frac 1{\eps^2}\cdot\qruntime{Q}\cdot \log{\frac{1}{\conf}}\cdot \log(n)\right)$.
\end{Corollary}
\begin{proof}
This follows from~\Cref{lem:circuits-model-runtime} and (the lineage circuit counterpart-- see~\Cref{sec:results-circuits} of)~\Cref{cor:approx-algo-const-p} (where the latter is used with $\delta$ being substituted\footnote{Recall that~\Cref{cor:approx-algo-const-p} is stated for a single output tuple so to get the required guarantee for all (at most $n^k$) output tuples of $Q$ we get at most $\frac \delta{n^k}$ probability of failure for each output tuple and then just a union bound over all output tuples. } with $\frac \delta{n^k}$).
This follows from~\Cref{lem:circuits-model-runtime} and (the lineage circuit counterpart-- see~\Cref{sec:results-circuits})~\Cref{cor:approx-algo-const-p} (where the latter is used with $\delta$ being substituted\footnote{Recall that~\Cref{cor:approx-algo-const-p} is stated for a single output tuple so to get the required guarantee for all (at most $n^k$) output tuples of $Q$ we get at most $\frac \delta{n^k}$ probability of failure for each output tuple and then just a union bound over all output tuples. } with $\frac \delta{n^k}$).
\end{proof}
\subsection{Higher moments}
@ -101,4 +102,4 @@ In addition, we could e.g. prove bounds of probability of the multiplicity being
While we do not have a good approximation algorithm for this problem, we can make some progress as follows:
Note that for any positive integer $m$ we can compute the expectation $\poly^m$ (since this only changes the degree of the corresponding lineage polynomial by a factor of $m$).
In other words, we can compute the $m$-th moment of the multiplicities as well allowing us to e.g. to use Chebyschev inequality or other high moment based probability bounds on the events we might be interested in.
However, we leave the question of coming up with a wider range of approximation algorithms for future work.
However, we leave the question of coming up with a more accurate approximation algorithms for future work.