Finished pass on S3

master
Aaron Huber 2021-09-09 11:42:30 -04:00
parent d7b906dd41
commit aebe234c9b
3 changed files with 42 additions and 28 deletions


@ -139,4 +139,11 @@ The number of triangles in $\graph{\ell}$ for $\ell \geq 2$ will always be $0$ f
\end{proof}
\subsubsection{Proof of \Cref{lem:lin-sys}}
\input{lin_sys}
\input{lin_sys}
\subsubsection{Proof of \Cref{cor:bounds-tlc}}
\begin{proof}
We start by showing that there exists an algorithm that computes $\poly$ in $\bigO{\numedge}$ time for our hard query in \Cref{def:qk}. Assume that the edges $E$ of graph $G$ are encoded in a relation $R$. Then a simple table scan of $\rel$ iterates through the entire set of $\numedge$ edges, computing the sum in $\poly$ in $\numedge$ steps, and we can replicate this sum $k$ times to output the final $\poly$ in at most $\numedge + k = \bigO{\numedge}$ steps.
This implies that $\numedge = \Omega\inparen{\timeOf{\abbrStepOne}}$, and since the results of \Cref{thm:mult-p-hard-result} and \Cref{th:single-p-hard} are stated in terms of the number of edges $\numedge$, it follows that our lower bounds hold with respect to $\timeOf{\abbrStepOne}$.
\end{proof}


@ -3,13 +3,12 @@
\section{Hardness of exact computation}
\label{sec:hard}
In this section, we will prove that computing $\expct\limits_{\vct{W} \sim \pd}\pbox{\poly(\vct{W})}$ exactly for a \ti-lineage polynomial $\poly(\vct{X})$ generated from a project-join query (even an expression tree representation) is \sharpwonehard. Note that this implies hardness for \bis and general $\semNX$-PDBs under bag semantics. Furthermore, we demonstrate in \Cref{sec:single-p} that the problem remains hard, even if $\probOf[X_i=1] = \prob$ for all $X_i$ and any fixed valued $\prob \in (0, 1)$ as long as certain popular hardness conjectures in fine-grained complexity hold. As mentioned previously, all proofs are in the appendix.
In this section, we will prove that computing $\expct\pbox{\poly(\vct{W})}$ exactly for a \ti-lineage polynomial $\poly(\vct{X})$ generated from a project-join query (even when given as an expression tree) is \sharpwonehard. Note that this implies hardness for \bis and general \abbrBPDB, answering \Cref{prob:intro-stmt} in the negative. Furthermore, we demonstrate in \Cref{sec:single-p} that the problem remains hard even if $\probOf[X_i=1] = \prob$ for all $X_i$ and any fixed value $\prob \in (0, 1)$, as long as certain popular hardness conjectures in fine-grained complexity hold.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Preliminaries}
Our hardness results are based on (exactly) counting the number of subgraphs in $G$ isomorphic to $H$. Let $\numocc{G}{H}$ denote this quantity. We can think of $H$ as being of constant size and $G$ as growing. %In query processing, $H$ can be viewed as the query while $G$ as the database instance.
In particular, we will consider the problems of computing the following counts (given $G$ as an input and its adjacency list representation): $\numocc{G}{\tri}$ (the number of triangles), $\numocc{G}{\threedis}$ (the number of $3$-matchings), and the latter's generalization $\numocc{G}{\kmatch}$ (the number of $k$-matchings). Our hardness result in \Cref{sec:multiple-p} is based on the following result:
Our hardness results are based on (exactly) counting the number of (not necessarily induced) subgraphs in $G$ isomorphic to $H$. Let $\numocc{G}{H}$ denote this quantity. We can think of $H$ as being of constant size and $G$ as growing. %In query processing, $H$ can be viewed as the query while $G$ as the database instance.
In particular, we will consider the problems of computing the following counts (given $G$ in its adjacency list representation): $\numocc{G}{\tri}$ (the number of triangles), $\numocc{G}{\threedis}$ (the number of $3$-matchings), and the latter's generalization $\numocc{G}{\kmatch}$ (the number of $k$-matchings). Our hardness result in \Cref{sec:multiple-p} is based on the following result:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Theorem}[\cite{k-match}]
@ -25,7 +24,7 @@ Our hardness result in Section~\ref{sec:single-p} is based on the following conj
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{hypo}
\label{conj:graph}
There exists a constant $\eps_0>0$ such that given an undirected graph $G=(\vset,\edgeSet)$, computing exactly $\numocc{G}{\tri}$ cannot be done in time $o\inparen{|\edgeSet|^{1+\eps_0}}$.
There exists a constant $\eps_0>0$ such that given an undirected graph $G=(\vset,\edgeSet)$, computing $\numocc{G}{\tri}$ exactly cannot be done in time $o\inparen{|\edgeSet|^{1+\eps_0}}$.
\end{hypo}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
@ -33,8 +32,8 @@ Based on the so called {\em Triangle detection hypothesis} (cf.~\cite{triang-har
%The current best known algorithm to count the number of $3$-matchings, to
%\AR{Need to add something about 3-paths and 3-matchings as well.}
Both of our hardness results rely on a simple query polynomial encoding of the edges of a graph.
To prove our hardness result, consider a graph $G(\vset, \edgeSet)$, where $|\edgeSet| = m$, $|\vset| = \numvar$. Our query polynomial has a variable $X_i$ for every $i$ in $[\numvar]$.
Both of our hardness results rely on a simple lineage polynomial encoding of the edges of a graph.
To prove our hardness result, consider a graph $G(\vset, \edgeSet)$, where $|\edgeSet| = m$, $\vset = [\numvar]$. Our lineage polynomial has a variable $X_i$ for every $i$ in $[\numvar]$.
Consider the polynomial
$\poly_{G}(\vct{X}) = \sum\limits_{(i, j) \in \edgeSet} X_i \cdot X_j.$
The hard polynomial for our problem will be a suitable power $k\ge 3$ of the polynomial above:
@ -47,18 +46,28 @@ For any graph $G=(V,\edgeSet)$ and $\kElem\ge 1$, define
%Our hardness results only need a \ti instance; We also consider the special case when all the tuple probabilities (probabilities assigned to $X_i$ by $\probAllTup$) are the same value. Note that our hardness results % do not require the general circuit representation and
%even hold for the expression trees. %this polynomial can be encoded in an expression tree of size $\Theta(km)$.
\noindent Returning to \cref{fig:two-step}, it is easy to see that $\poly_{G}^\kElem(\vct{X})$ generalizes our example query from the introduction:
\begin{align*}
\query^k_G \coloneqq &\inparen{\project_\emptyset\inparen{OnTime \join_{City = City_1} Route \join_{{City}_2 = City'}\rename_{City' \leftarrow City}(OnTime)}}\times_2\cdots\\
&\cdots \times_k \inparen{\project_\emptyset\inparen{OnTime \join_{City = City_1} Route \join_{{City}_2 = City'}\rename_{City' \leftarrow City}(OnTime)}}
\end{align*}
\noindent Returning to \Cref{fig:two-step}, it is easy to see that $\poly_{G}^\kElem(\vct{X})$ generalizes our example query from the introduction. Let us alias
\begin{lstlisting}
SELECT 1 FROM OnTime a, Route r, OnTime b
WHERE a.city = r.city1 AND b.city = r.city2
\end{lstlisting}
as $R_i$ for each $i \in [k]$. The query then becomes
\begin{lstlisting}
SELECT 1 FROM $R_1$ JOIN $R_2$ JOIN$\cdots$JOIN $R_k$
\end{lstlisting}
%RA format for the same query
%\begin{align*}
%\query^k_G \coloneqq &\inparen{\project_\emptyset\inparen{OnTime \join_{City = City_1} Route \join_{{City}_2 = City'}\rename_{City' \leftarrow City}(OnTime)}}\times_2\cdots\\
%&\cdots \times_k \inparen{\project_\emptyset\inparen{OnTime \join_{City = City_1} Route \join_{{City}_2 = City'}\rename_{City' \leftarrow City}(OnTime)}}
%\end{align*}
%\resizebox{1\linewidth}{!}{
%\begin{minipage}{1.05\linewidth}
%\[\poly^k_G\dlImp OnTime(C_1),Route(C_1, C_1'),OnTime(C_1'),\dots,OnTime(C_\kElem),Route(C_\kElem,C_\kElem'),OnTime(C_\kElem')\]
%\end{minipage}
%}
where adapting the PDB instance in \cref{fig:two-step}, relation $OnTime$ has $4$ tuples corresponding to each vertex $v_i$ in $\vset$ for $i$ in $[4]$, each with probability $\prob_i$ and $Route$ has tuples corresponding to the edges $\edgeSet$ (each with probability of $1$).\footnote{Technically, $\poly_{G}^\kElem(\vct{X})$ should have variables corresponding to tuples in $Route$ as well, but since they always are present with probability $1$, we drop those. Our argument also works when all the tuples in $Route$ also are present with probability $\prob$ but to simplify notation we assign probability $1$ to edges.}
Note that this implies that our hard query polynomial can be represented as an expression tree produced by a project-join query with same probability value for each input tuple $\prob_i$.
where, adapting the PDB instance in \Cref{fig:two-step}, relation $OnTime$ has $4$ tuples, one for each vertex $i$ in $[4]$, each with probability $\prob_i$, and $Route$ has tuples corresponding to the edges $\edgeSet$ (each with probability $1$).\footnote{Technically, $\poly_{G}^\kElem(\vct{X})$ should have variables corresponding to tuples in $Route$ as well, but since they are always present with probability $1$, we drop those. Our argument also works when all the tuples in $Route$ are present with probability $\prob$, but to simplify notation we assign probability $1$ to edges.}
Note that this implies that our hard lineage polynomial can be represented as an expression tree produced by a project-join query with the same probability value for each input tuple $\prob_i$, and hence is indeed a lineage polynomial for a \abbrTIDB \abbrPDB.
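
\noindent As a small illustration (a sketch, not part of the hardness argument), take the path graph with $\vset = [3]$ and $\edgeSet = \inset{(1,2),(2,3)}$, and let $\kElem = 2$. We assume here that $\rpoly$ denotes the reduced form of $\poly$, in which every exponent $e > 1$ is replaced by $1$; this definition is not restated in this section. Then
\begin{align*}
\poly_{G}(\vct{X}) &= X_1X_2 + X_2X_3,\\
\poly_{G}^2(\vct{X}) &= X_1^2X_2^2 + 2X_1X_2^2X_3 + X_2^2X_3^2,\\
\rpoly_{G}^2(\vct{X}) &= X_1X_2 + 2X_1X_2X_3 + X_2X_3,\\
\rpoly_{G}^2(\prob,\dots,\prob) &= 2\prob^2 + 2\prob^3.
\end{align*}
In this example the coefficient of $\prob^2$ equals the number of edges, and the coefficient of $\prob^3$ counts the ordered pairs of distinct edges that share a vertex, which hints at why evaluating $\rpoly_{G}^\kElem$ encodes subgraph counts.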
\subsection{Multiple Distinct $\prob$ Values}
\label{sec:multiple-p}
@ -67,7 +76,9 @@ We are now ready to present our main hardness result.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Theorem}\label{thm:mult-p-hard-result}
Computing $\rpoly_G^\kElem(\prob_i,\dots,\prob_i)$ for arbitrary $G$ and any $(2k+1)$ distinct values $\prob_i$ ($0\le i \le 2k$) is \sharpwonehard (parameterization is in $k$).
Let $\prob_0,\ldots,\prob_{2k}$ be $2k + 1$ distinct values in $(0, 1]$. Then computing $\rpoly_G^\kElem(\prob_i,\dots,\prob_i)$ for arbitrary $G$
%and any $(2k+1)$ distinct values $\prob_i$ ($0\le i \le 2k$)
is \sharpwonehard (parameterization is in $k$).
\end{Theorem}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%


@ -6,18 +6,18 @@
\label{sec:single-p}
%In this discussion, let us fix $\kElem = 3$.
While \Cref{thm:mult-p-hard-result} shows that computing $\rpoly(\prob,\dots,\prob)$ in general is hard it does not rule out the possibility that one can compute this value exactly for a {\em fixed} value of $\prob$. Indeed, it is easy to check that one can compute $\rpoly(\prob,\dots,\prob)$ exactly in linear time for $\prob\in \inset{0,1}$. In this section, we show that these two are the only possibilities:
While \Cref{thm:mult-p-hard-result} shows that computing $\rpoly(\prob,\dots,\prob)$ for multiple values of $\prob$ is hard in general, it does not rule out the possibility that one can compute this value exactly for a {\em fixed} value of $\prob$. Indeed, it is easy to check that one can compute $\rpoly(\prob,\dots,\prob)$ exactly in linear time for $\prob\in \inset{0,1}$. Next we show that these two are the only possibilities:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Theorem}\label{th:single-p-hard}
Fix $\prob\in (0,1)$. Then assuming \Cref{conj:graph} is true, any algorithm that computes $\rpoly_{G}^3(\prob,\dots,\prob)$ from $G$ exactly has to run in time $\Omega\inparen{m^{1+\eps_0}}$, where $\eps_0$ is as defined in \Cref{conj:graph}.
Fix $\prob\in (0,1)$. Then assuming \Cref{conj:graph} is true, any algorithm that computes $\rpoly_{G}^3(\prob,\dots,\prob)$ for arbitrary $G = (\vset, \edgeSet)$ exactly has to run in time $\Omega\inparen{\abs{\edgeSet}^{1+\eps_0}}$, where $\eps_0$ is as defined in \Cref{conj:graph}.
\end{Theorem}
%\begin{proof}[Proof of Corollary ~\ref{th:single-p-gen-k}]
%Consider $\poly^3_{G}$ and $\poly' = 1$ such that $\poly'' = \poly^3_{G} \cdot \poly'$. By \Cref{th:single-p}, query $\poly''$ with $\kElem = 4$ has $\Omega(\numvar^{\frac{4}{3}})$ complexity.
%\end{proof}
The above shows the hardness for a very specific query polynomial but it is easy to come up with an infinite family of hard query polynomials by `embedding' $\rpoly_{G}^3$ into an infinite family of trivial query polynomials.
The above shows hardness for a very specific lineage polynomial, but it is easy to convert this into a parameterized complexity result as follows. One can come up with an infinite family of hard lineage polynomials by `embedding' $\rpoly_{G}^3$ into an infinite family of trivial lineage polynomials.
Unlike \Cref{thm:mult-p-hard-result}, the above result does not show that computing $\rpoly_{G}^3(\prob,\dots,\prob)$ for a fixed $\prob\in (0,1)$ is \sharpwonehard.
However, in \Cref{sec:algo} we show that if we are willing to compute an approximation, then this problem (and indeed solving our problem for a much more general setting) is in linear time.
However, in \Cref{sec:algo} we show that if we are willing to compute an approximation, then this problem (and indeed our problem in a much more general setting) can be solved in linear time, yielding an affirmative answer to \Cref{prob:intro-stmt}.
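
\noindent The linear-time claim for $\prob \in \inset{0,1}$ can be checked directly. The following is a sketch, assuming that $\rpoly$ denotes the reduced form of $\poly$ (every exponent $e > 1$ replaced by $1$). Every monomial of $\poly_{G}^\kElem$ contains at least one variable, and reduction does not change the value of a polynomial on inputs from $\inset{0,1}$, so
\begin{align*}
\rpoly_{G}^\kElem(0,\dots,0) &= 0, &
\rpoly_{G}^\kElem(1,\dots,1) &= \poly_{G}^\kElem(1,\dots,1) = \abs{\edgeSet}^\kElem,
\end{align*}
and both values are available after a single pass over $\edgeSet$.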
%%%%%%%%%%%%%%%%%%%%%%%%%
%NEED to move to appendix
@ -39,7 +39,7 @@ However, in \Cref{sec:algo} we show that if we are willing to compute an approxi
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%
%To prove \cref{th:single-p}, we use the following notion.
%To prove \Cref{th:single-p}, we use the following notion.
%\begin{Definition}\label{def:Gk}
%For $\ell \geq 1$, let graph $\graph{\ell}$ be a graph generated from an arbitrary graph $G$, by replacing every edge $e$ of $G$ with a $\ell$-path, such that all inner vertexes of an $\ell$-path replacement edge are disjoint from all other vertexes.\footnote{Note that $G\equiv \graph{1}$.}% of any other $\ell$-path replacement edge. % in the sense that they only intersect at the original intersection endpoints as seen in $\graph{1}$.
%\end{Definition}
@ -67,17 +67,13 @@ However, in \Cref{sec:algo} we show that if we are willing to compute an approxi
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%END move to appendix
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The bounds of \Cref{thm:mult-p-hard-result} and \Cref{th:single-p-hard} imply the following corollary.
\AH{Corollary needs refinement.}
\begin{Corollary}
The lower bounds of \cref{thm:mult-p-hard-result} and \cref{th:single-p-hard} hold with respect to $\timeOf{\abbrStepOne}$.
\begin{Corollary}\label{cor:bounds-tlc}
The lower bounds of \Cref{thm:mult-p-hard-result} and \Cref{th:single-p-hard} hold with respect to $\timeOf{\abbrStepOne}$.
\end{Corollary}
\begin{proof}
We start by showing that there exists an algorithm that computes $\poly$ in $\bigO{\numedge}$ time for our hard query in \cref{def:qk}. Assume that the edges $E$ of graph $G$ are encoded in a relation $R$. Then a simple table scan of $\rel$ will iterate through the entire set of $\numedge$ edges computing summation of $\poly$ in $\numedge$ steps, and we can replicate this sum $k$ times to output the final $\poly$, in at most $\numedge + k = \bigO{\numedge}$ steps.
This implies that $\numedge \geq \Omega\inparen{\timeOf{\abbrStepOne}}$, and since the results of \cref{thm:mult-p-hard-result} and \cref{th:single-p-hard} are in the number of edges $\numedge$, then it follows that our lower bounds hold with respect to $\timeOf{\abbrStepOne}$.
\end{proof}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%