Done with pass on S3

2021-04-09 00:02:33 -04:00 · 2577788aac
parent bfe8baf674
commit 2577788aac
2 changed files with 8 additions and 8 deletions
--- a/mult_distinct_p.tex
+++ b/mult_distinct_p.tex
@@ -3,13 +3,13 @@
 \section{Hardness of exact computation}
 \label{sec:hard}

-In this section, we will prove that computing $\expct\limits_{\vct{W} \sim \pd}\pbox{\poly(\vct{W})}$ exactly for a \ti-lineage polynomial  $\poly(\vct{X})$ generated from a project-join query is \sharpwonehard. Note that this implies hardness for \bis and general $\semNX$-PDBs. Furthermore, we demonstrate in \Cref{sec:single-p} that the problem remains hard, even if $\probOf(X_i) = \prob$ for all $X_i$ and any fixed valued $\prob \in (0, 1)$ as long as certain popular hardness conjectures in fine-grained complexity hold.
+In this section, we will prove that computing $\expct\limits_{\vct{W} \sim \pd}\pbox{\poly(\vct{W})}$ exactly for a \ti-lineage polynomial  $\poly(\vct{X})$ generated from a project-join query (even in an expression tree representation) is \sharpwonehard. Note that this implies hardness for \bis and general $\semNX$-PDBs. Furthermore, we demonstrate in \Cref{sec:single-p} that the problem remains hard, even if $\probOf[X_i=1] = \prob$ for all $X_i$ and any fixed value $\prob \in (0, 1)$, as long as certain popular hardness conjectures in fine-grained complexity hold.


 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \subsection{Preliminaries}

-Our hardness results are based on (exactly) counting the number of occurrences of a subgraph $H$ in $G$. Let $\numocc{G}{H}$ denote the number of occurrences of $H$ in graph $G$.  We can think of $H$ as being of constant size and $G$ as growing.  In query processing, $H$ can be viewed as the query while $G$ as the database instance.
+Our hardness results are based on (exactly) counting the number of occurrences of a subgraph $H$ in $G$. Let $\numocc{G}{H}$ denote the number of occurrences of $H$ in graph $G$.  We can think of $H$ as being of constant size and $G$ as growing.  %In query processing, $H$ can be viewed as the query while $G$ as the database instance.
 In particular, we will consider the problems of computing the following counts (given $G$ as an input in its adjacency list representation): $\numocc{G}{\tri}$ (the number of triangles), $\numocc{G}{\threedis}$ (the number of $3$-matchings), and the latter's generalization $\numocc{G}{\kmatch}$ (the number of $k$-matchings).  Our hardness result in \Cref{sec:multiple-p} is based on the following result:

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@@ -37,7 +37,7 @@ Based on the so called {\em Triangle detection hypothesis} (cf.~\cite{triang-har
 Both of our hardness results rely on a simple query polynomial encoding of the edges of a graph.
 To prove our hardness result, consider a graph $G(V, E)$, where $|E| = m$, $|V| = \numvar$. Our query polynomial has a variable $X_i$ for every $i$ in $[\numvar]$.
 Consider the polynomial 
-\[\poly_{G}(\vct{X}) = \sum\limits_{(i, j) \in E} X_i \cdot X_j\]
+$\poly_{G}(\vct{X}) = \sum\limits_{(i, j) \in E} X_i \cdot X_j.$
 The hard polynomial for our problem will be a suitable power $k\ge 3$ of the polynomial above:
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{Definition}\label{def:qk}
@@ -52,7 +52,7 @@ Our hardness results only need a \ti instance; We also consider the special case
 \noindent Returning to \cref{fig:ex-shipping-simp}, it is easy to see that $\poly_{G}^\kElem(\vct{X})$ generalizes our running example query:
 \[\poly^k_G:- Loc(C_1),Route(C_1, C_1'),Loc(C_1'),\dots,Loc(C_\kElem),Route(C_\kElem,C_\kElem'),Loc(C_\kElem')\]
 where adapting the PDB instance in \cref{fig:ex-shipping-simp}, relation $Loc$ has $n$ tuples corresponding to each vertex in $V=[n]$ each with probability $\prob$ and $Route(\text{City}_1, \text{City}_2)$ has tuples corresponding to the edges $E$ (each with probability of $1$).\footnote{Technically, $\poly_{G}^\kElem(\vct{X})$ should have variables corresponding to tuples in $Route$ as well, but since they always are present with probability $1$, we drop those. Our argument also works when all the tuples in $Route$ also are present with probability $\prob$ but to simplify notation we assign probability $1$ to edges.}
-Note that this implies that our hard query polynomial can be represented even as an expression tree, created from a project-join query with some probability value for each $\prob_i$; our hardness result transfers here as well.
+Note that this implies that our hard query polynomial can be represented even as an expression tree and is created from a project-join query with the same probability value for each $\prob_i$. %; our hardness result transfers here as well.
 % OK: The following (commented-out) sentence feels a bit misplaced here.
 % -- by contrast our approximation algorithm in \Cref{sec:algo} can handle lineage polynomials represented as circuits generated by union of select-project-join (SPJU) queries with potentially distinct $\prob_i$ values. % (i.e. we do not need union or select operator to derive our hardness result).

@@ -60,7 +60,7 @@ Note that this implies that our hard query polynomial can be represented even as

 \subsection{Multiple Distinct $\prob$ Values}
 \label{sec:multiple-p}
-Unless otherwise noted, all proofs for this section are in~\Cref{app:single-mult-p}.
+%Unless otherwise noted, all proofs for this section are in~\Cref{app:single-mult-p}.
 We are now ready to present our main hardness result.
 %
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
@@ -69,9 +69,9 @@ Computing $\rpoly_G^\kElem(\prob_i,\dots,\prob_i)$ for arbitrary $G$ and any $(2
 \end{Theorem}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %
-We will prove the above result by reducing from the problem of computing the number of $k$-matchings in $G$. Given the current best-known algorithm for this counting problem, our results imply that unless the state-of-the-art $k$-matching algorithms are improved, we cannot hope to solve our problem in time better than $\Omega_k\inparen{m^{k/2}}$, which is only quadratically faster than expanding $\poly_{G}^\kElem(\vct{X})$ into its \abbrSMB form and then using \Cref{cor:expct-sop}. By contrast the approximation algorithm we present in \Cref{sec:algo} has runtime $O_k\inparen{m}$ for  this query (since it runs in linear-time on all lineage polynomials).
+We will prove the above result by reducing from the problem of computing the number of $k$-matchings in $G$. Given the current best-known algorithm for this counting problem, our results imply that unless the state-of-the-art $k$-matching algorithms are improved, we cannot hope to solve our problem in time better than $\Omega_k\inparen{m^{k/2}}$, which is only quadratically faster than expanding $\poly_{G}^\kElem(\vct{X})$ into its \abbrSMB form and then using \Cref{cor:expct-sop}. By contrast the approximation algorithm we present in \Cref{sec:algo} has runtime $O_k\inparen{m}$ for  this query. % (since it runs in linear-time on all lineage polynomials).

-\noindent The following lemma reduces the problem of counting $\kElem$-matchings in a graph to our problem:
+\noindent The following lemma reduces the problem of counting $\kElem$-matchings in a graph to our problem (and proves~\Cref{thm:mult-p-hard-result}):
 \begin{Lemma}\label{lem:qEk-multi-p}
 Let $\prob_0,\ldots, \prob_{2\kElem}$ be distinct values in $(0, 1]$.  Then given the values $\rpoly_{G}^\kElem(\prob_i,\ldots, \prob_i)$ for $0\leq i\leq 2\kElem$, the number of $\kElem$-matchings in $G$ can be computed in $O\inparen{\kElem^3}$ time.
 \end{Lemma}
--- a/single_p.tex
+++ b/single_p.tex
@@ -36,6 +36,7 @@ in $O\inparen{T(\numedge) + \numedge}$ time.
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


+The following result immediately implies \Cref{th:single-p}:
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{Lemma}\label{lem:lin-sys}
 Fix $\prob\in (0,1)$. Given $\rpoly_{\graph{\ell}}^3(\prob,\dots,\prob)$ for $\ell\in [2]$, we can compute in $O(m)$ time a vector $\vct{b}\in\mathbb{R}^3$ such that
@@ -54,7 +55,6 @@ allowing us to compute $\numocc{G}{\tri}$ and $\numocc{G}{\threedis}$ in $O(1)$
 \end{Lemma}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %
-This result immediately implies \Cref{th:single-p}:
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%