Finished pass on S.3.

master
Aaron Huber 2021-09-02 17:01:17 -04:00
parent 49efdb6668
commit 950d06ccfb
2 changed files with 75 additions and 64 deletions

View File

@ -3,13 +3,12 @@
\section{Hardness of exact computation}
\label{sec:hard}
In this section, we will prove that computing $\expct\limits_{\vct{W} \sim \pd}\pbox{\poly(\vct{W})}$ exactly for a \ti-lineage polynomial $\poly(\vct{X})$ generated from a project-join query (even an expression tree representation) is \sharpwonehard. Note that this implies hardness for \bis and general $\semNX$-PDBs under bag semantics. Furthermore, we demonstrate in \Cref{sec:single-p} that the problem remains hard, even if $\probOf[X_i=1] = \prob$ for all $X_i$ and any fixed valued $\prob \in (0, 1)$ as long as certain popular hardness conjectures in fine-grained complexity hold.
In this section, we will prove that computing $\expct\limits_{\vct{W} \sim \pd}\pbox{\poly(\vct{W})}$ exactly for a \ti-lineage polynomial $\poly(\vct{X})$ generated from a project-join query (even an expression tree representation) is \sharpwonehard. Note that this implies hardness for \bis and general $\semNX$-PDBs under bag semantics. Furthermore, we demonstrate in \Cref{sec:single-p} that the problem remains hard, even if $\probOf[X_i=1] = \prob$ for all $X_i$ and any fixed valued $\prob \in (0, 1)$ as long as certain popular hardness conjectures in fine-grained complexity hold. As mentioned previously, all proofs are in the appendix.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Preliminaries}
\AH{One of the complaints with reviewers is that the term \emph{subgraph} is not precise enough.}
Our hardness results are based on (exactly) counting the number of occurrences of a subgraph $H$ in $G$. Let $\numocc{G}{H}$ denote the number of occurrences of $H$ in graph $G$. We can think of $H$ as being of constant size and $G$ as growing. %In query processing, $H$ can be viewed as the query while $G$ as the database instance.
Our hardness results are based on (exactly) counting the number of subgraphs in $G$ isomorphic to $H$. Let $\numocc{G}{H}$ denote this quantity. We can think of $H$ as being of constant size and $G$ as growing. %In query processing, $H$ can be viewed as the query while $G$ as the database instance.
In particular, we will consider the problems of computing the following counts (given $G$ as an input and its adjacency list representation): $\numocc{G}{\tri}$ (the number of triangles), $\numocc{G}{\threedis}$ (the number of $3$-matchings), and the latter's generalization $\numocc{G}{\kmatch}$ (the number of $k$-matchings). Our hardness result in \Cref{sec:multiple-p} is based on the following result:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@ -41,25 +40,25 @@ $\poly_{G}(\vct{X}) = \sum\limits_{(i, j) \in \edgeSet} X_i \cdot X_j.$
The hard polynomial for our problem will be a suitable power $k\ge 3$ of the polynomial above:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}\label{def:qk}
For any graph $G=([n],\edgeSet)$ and $\kElem\ge 1$, define
For any graph $G=(V,\edgeSet)$ and $\kElem\ge 1$, define
\[\poly_{G}^\kElem(X_1,\dots,X_n) = \left(\sum\limits_{(i, j) \in \edgeSet} X_i \cdot X_j\right)^\kElem\]
\end{Definition}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Our hardness results only need a \ti instance; We also consider the special case when all the tuple probabilities (probabilities assigned to $X_i$ by $\probAllTup$) are the same value. Note that our hardness results % do not require the general circuit representation and
even hold for the expression trees. %this polynomial can be encoded in an expression tree of size $\Theta(km)$.
%Our hardness results only need a \ti instance; We also consider the special case when all the tuple probabilities (probabilities assigned to $X_i$ by $\probAllTup$) are the same value. Note that our hardness results % do not require the general circuit representation and
%even hold for the expression trees. %this polynomial can be encoded in an expression tree of size $\Theta(km)$.
\AH{Dangling pointers needs to be fixed.}
\noindent Returning to \Cref{fig:ex-shipping-simp}, it is easy to see that $\poly_{G}^\kElem(\vct{X})$ generalizes our running example query:
\resizebox{1\linewidth}{!}{
\begin{minipage}{1.05\linewidth}
\[\poly^k_G\dlImp OnTime(C_1),Route(C_1, C_1'),OnTime(C_1'),\dots,OnTime(C_\kElem),Route(C_\kElem,C_\kElem'),OnTime(C_\kElem')\]
\end{minipage}
}
where adapting the PDB instance in \Cref{fig:ex-shipping-simp}, relation $OnTime$ has $n$ tuples corresponding to each vertex in $\vset=[n]$ each with probability $\prob$ and $Route(\text{City}_1, \text{City}_2)$ has tuples corresponding to the edges $\edgeSet$ (each with probability of $1$).\AH{This footnote is probably unnecessary now since we changed the example.}\footnote{Technically, $\poly_{G}^\kElem(\vct{X})$ should have variables corresponding to tuples in $Route$ as well, but since they always are present with probability $1$, we drop those. Our argument also works when all the tuples in $Route$ also are present with probability $\prob$ but to simplify notation we assign probability $1$ to edges.}
Note that this implies that our hard query polynomial can be represented as an expression tree \AH{If we speak of an expression tree, we need to have defined this somewhere.} produced by a project-join query with same probability value for each input tuple $\prob_i$.
\AH{The above discussion from \cref{def:qk} seems to be a bit ambiguous. I'm not sure it's entirely accurate to end with $\prob_i$.}
\noindent Returning to \cref{fig:two-step}, it is easy to see that $\poly_{G}^\kElem(\vct{X})$ generalizes our example query from the introduction:
\begin{align*}
\query^k_G \coloneqq &\inparen{\project_\emptyset\inparen{OnTime \join_{City = City_1} Route \join_{{City}_2 = City'}\rename_{City' \leftarrow City}(OnTime)}}\times_2\cdots\\
&\cdots \times_k \inparen{\project_\emptyset\inparen{OnTime \join_{City = City_1} Route \join_{{City}_2 = City'}\rename_{City' \leftarrow City}(OnTime)}}
\end{align*}
%\resizebox{1\linewidth}{!}{
%\begin{minipage}{1.05\linewidth}
%\[\poly^k_G\dlImp OnTime(C_1),Route(C_1, C_1'),OnTime(C_1'),\dots,OnTime(C_\kElem),Route(C_\kElem,C_\kElem'),OnTime(C_\kElem')\]
%\end{minipage}
%}
where adapting the PDB instance in \cref{fig:two-step}, relation $OnTime$ has $4$ tuples corresponding to each vertex $v_i$ in $\vset$ for $i$ in $[4]$, each with probability $\prob_i$ and $Route$ has tuples corresponding to the edges $\edgeSet$ (each with probability of $1$).\footnote{Technically, $\poly_{G}^\kElem(\vct{X})$ should have variables corresponding to tuples in $Route$ as well, but since they always are present with probability $1$, we drop those. Our argument also works when all the tuples in $Route$ also are present with probability $\prob$ but to simplify notation we assign probability $1$ to edges.}
Note that this implies that our hard query polynomial can be represented as an expression tree produced by a project-join query with same probability value for each input tuple $\prob_i$.
\subsection{Multiple Distinct $\prob$ Values}
\label{sec:multiple-p}
@ -74,10 +73,16 @@ Computing $\rpoly_G^\kElem(\prob_i,\dots,\prob_i)$ for arbitrary $G$ and any $(2
%
We will prove the above result by reducing from the problem of computing the number of $k$-matchings in $G$. Given the current best-known algorithm for this counting problem, our results imply that unless the state-of-the-art $k$-matching algorithms are improved, we cannot hope to solve our problem in time better than $\Omega_k\inparen{m^{k/2}}$ where $m=\abs{\edgeSet}$, which is only quadratically faster than expanding $\poly_{G}^\kElem(\vct{X})$ into its \abbrSMB form and then using \Cref{cor:expct-sop}. The approximation algorithm we present in \Cref{sec:algo} has runtime $O_k\inparen{m}$ for this query. % (since it runs in linear-time on all lineage polynomials).
\noindent The following lemma reduces the problem of counting $\kElem$-matchings in a graph to our problem (and proves \Cref{thm:mult-p-hard-result}):
\begin{Lemma}\label{lem:qEk-multi-p}
Let $\prob_0,\ldots, \prob_{2\kElem}$ be distinct values in $(0, 1]$. Then given the values $\rpoly_{G}^\kElem(\prob_i,\ldots, \prob_i)$ for $0\leq i\leq 2\kElem$, the number of $\kElem$-matchings in $G$ can be computed in $\bigO{\kElem^3}$ time.
\end{Lemma}
%%%%%%%%%%%%%%%%%%%%%%%%%%%
%NEEDS to be moved to appendix
%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\noindent The following lemma reduces the problem of counting $\kElem$-matchings in a graph to our problem (and proves \Cref{thm:mult-p-hard-result}):
%\begin{Lemma}\label{lem:qEk-multi-p}
%Let $\prob_0,\ldots, \prob_{2\kElem}$ be distinct values in $(0, 1]$. Then given the values $\rpoly_{G}^\kElem(\prob_i,\ldots, \prob_i)$ for $0\leq i\leq 2\kElem$, the number of $\kElem$-matchings in $G$ can be computed in $\bigO{\kElem^3}$ time.
%\end{Lemma}
%%%%%%%%%%%%%%%%%%%%%%%%%%%
%END move to appendix
%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Local Variables:
%%% mode: latex

View File

@ -19,48 +19,54 @@ The above shows the hardness for a very specific query polynomial but it is easy
Unlike \Cref{thm:mult-p-hard-result} the above result does not show that computing $\rpoly_{G}^3(\prob,\dots,\prob)$ for a fixed $\prob\in (0,1)$ is \sharpwonehard.
However, in \Cref{sec:algo} we show that if we are willing to compute an approximation, then this problem (and indeed solving our problem for a much more general setting) is in linear time.
%\AH{@atri needs to put in the result for triangles of $\numvar^{\frac{4}{3}}$ runtime.}
We will prove the above result by the following reduction:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Theorem}\label{th:single-p}
Fix $\prob\in (0,1)$. Let $G$ be a graph on $\numedge$ edges.
If we can compute $\rpoly_{G}^3(\prob,\dots,\prob)$ exactly in $T(\numedge)$ time, then we can exactly compute $\numocc{G}{\tri}$ %count the number of triangles, 3-paths, and 3-matchings in $G$
in $O\inparen{T(\numedge) + \numedge}$ time.
\end{Theorem}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Before we move on to the proof itself, we state the results, lemmas, and defintions that will be useful in the proof.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
To prove \cref{th:single-p}, we use the following notion.
\begin{Definition}\label{def:Gk}
For $\ell \geq 1$, let graph $\graph{\ell}$ be a graph generated from an arbitrary graph $G$, by replacing every edge $e$ of $G$ with a $\ell$-path, such that all inner vertexes of an $\ell$-path replacement edge are disjoint from all other vertexes.\footnote{Note that $G\equiv \graph{1}$.}% of any other $\ell$-path replacement edge. % in the sense that they only intersect at the original intersection endpoints as seen in $\graph{1}$.
\end{Definition}
The following result immediately implies \Cref{th:single-p}:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Lemma}\label{lem:lin-sys}
Fix $\prob\in (0,1)$. Given $\rpoly_{\graph{\ell}}^3(\prob,\dots,\prob)$ for $\ell\in [2]$, we can compute in $O(m)$ time a vector $\vct{b}\in\mathbb{R}^3$ such that
\[ \begin{pmatrix}
1 - 3p & -(3\prob^2 - \prob^3)\\
10(3\prob^2 - \prob^3) & 10(3\prob^2 - \prob^3)
\end{pmatrix}
\cdot
\begin{pmatrix}
\numocc{G}{\tri}]\\
\numocc{G}{\threedis}
\end{pmatrix}
=\vct{b},
\]
allowing us to compute $\numocc{G}{\tri}$ and $\numocc{G}{\threedis}$ in $O(1)$ time.
\end{Lemma}
%%%%%%%%%%%%%%%%%%%%%%%%%
%NEED to move to appendix
%%%%%%%%%%%%%%%%%%%%%%%%%
%%\AH{@atri needs to put in the result for triangles of $\numvar^{\frac{4}{3}}$ runtime.}
%We will prove the above result by the following reduction:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\begin{Theorem}\label{th:single-p}
%Fix $\prob\in (0,1)$. Let $G$ be a graph on $\numedge$ edges.
%If we can compute $\rpoly_{G}^3(\prob,\dots,\prob)$ exactly in $T(\numedge)$ time, then we can exactly compute $\numocc{G}{\tri}$ %count the number of triangles, 3-paths, and 3-matchings in $G$
%in $O\inparen{T(\numedge) + \numedge}$ time.
%\end{Theorem}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%
%%Before we move on to the proof itself, we state the results, lemmas, and defintions that will be useful in the proof.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%
%To prove \cref{th:single-p}, we use the following notion.
%\begin{Definition}\label{def:Gk}
%For $\ell \geq 1$, let graph $\graph{\ell}$ be a graph generated from an arbitrary graph $G$, by replacing every edge $e$ of $G$ with a $\ell$-path, such that all inner vertexes of an $\ell$-path replacement edge are disjoint from all other vertexes.\footnote{Note that $G\equiv \graph{1}$.}% of any other $\ell$-path replacement edge. % in the sense that they only intersect at the original intersection endpoints as seen in $\graph{1}$.
%\end{Definition}
%
%
%
%The following result immediately implies \Cref{th:single-p}:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%\begin{Lemma}\label{lem:lin-sys}
%Fix $\prob\in (0,1)$. Given $\rpoly_{\graph{\ell}}^3(\prob,\dots,\prob)$ for $\ell\in [2]$, we can compute in $O(m)$ time a vector $\vct{b}\in\mathbb{R}^3$ such that
%\[ \begin{pmatrix}
%1 - 3p & -(3\prob^2 - \prob^3)\\
%10(3\prob^2 - \prob^3) & 10(3\prob^2 - \prob^3)
%\end{pmatrix}
%\cdot
%\begin{pmatrix}
%\numocc{G}{\tri}]\\
%\numocc{G}{\threedis}
%\end{pmatrix}
%=\vct{b},
%\]
%allowing us to compute $\numocc{G}{\tri}$ and $\numocc{G}{\threedis}$ in $O(1)$ time.
%\end{Lemma}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%END move to appendix
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\AH{Corollary needs refinement.}
\begin{Corollary}