Merge branch 'master' of gitlab.odin.cse.buffalo.edu:ahuber/SketchingWorlds

This commit is contained in:
Oliver Kennedy 2020-12-18 16:14:22 -05:00
commit b096ab186a
Signed by: okennedy
GPG key ID: 3E5F9B3ABD3FDB60
4 changed files with 94 additions and 50 deletions

View file

@ -2,17 +2,20 @@
\section{Hardness of exact computation}
\label{sec:hard}
\AH{The notation used here is different than in~\Cref{sec:background}, in particular~\Cref{eq:expect-q-nx}. Maybe we should decide on a notation and try to stick to it as much as possible?}
We would like to argue for a compressed version of $\poly(\vct{X})$, in general $\expct\limits_{\vct{X} \sim \pd}\pbox{\poly(\vct{X})}$ even for TIDB, cannot be computed in linear time. We will argue two flavors of such a hardness result. In Section~\ref{sec:multiple-p}, we argue that computing the expected value exactly for all query polynommials $\poly(\vct{X})$ for multiple values of $p$ is \sharpwonehard. However, this does not rule out the possibility of being able to solve the problem for a any {\em fixed} value of $p$ in linear time. In Section~\ref{sec:single-p}, we rule out even this possibility (based on some popular hardness conjectures in fine-grained complexity).
\AH{The notation used here is different than in~\Cref{sec:background}, in particular~\Cref{eq:expect-q-nx}. Maybe we should decide on a notation and try to stick to it as much as possible?}
\BG{We sometimes use $\expct_{\vct{X} \sim P}$ sometimes $\expct_{\vct{X}}$}
In this section, we will prove that computing $\expct\limits_{\vct{X} \sim \pd}\pbox{\poly(\vct{X})}$ for a \tis-lineage polynomial $\poly(\vct{X})$ generated from a project-join query is \sharpwonehard. Note that this implies hardness for \bis and general $\semNX$-PDBs. Furthermore, we demonstrate \Cref{sec:single-p} that the problem remains hard, even if $\pd(X_i) = p$ for all $X_i$ and some fixed valued $p$ as long as these conjectures hold. Finally, using popular hardness conjectures in fine-grained complexity we show that if these conjectures hold and except for the trivial choices of $p \in \{0,1\}$, the problem is hard for any given $p$.
% We would like to argue for a compressed version of $\poly(\vct{X})$, in general $\expct\limits_{\vct{X} \sim \pd}\pbox{\poly(\vct{X})}$ even for tis, cannot be computed in linear time. We will argue two flavors of such a hardness result. In Section~\ref{sec:multiple-p}, we argue that computing the expected value exactly for all query polynommials $\poly(\vct{X})$ for multiple values of $p$ is \sharpwonehard. However, this does not rule out the possibility of being able to solve the problem for a any {\em fixed} value of $p$ in linear time. In Section~\ref{sec:single-p}, we rule out even this possibility (based on some popular hardness conjectures in fine-grained complexity).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Preliminaries}
Our hardness results are based on (exactly) counting the number of occurrences of a fixed graph $H$ as a subgraph in $G$. Let $\numocc{G}{H}$ denote the number of occurrences of pattern $H$ in graph $G$. %, where, for example, $\numocc{G}{\ed}$ means the number of single edges in $G$.
In particular, we will consider the problems of computing the following counts (given $G$ as an input in its adjacency list representation): $\numocc{G}{\tri}$ (the number of triangles), $\numocc{G}{\threepath}$ (the number of $3$-paths), $\numocc{G}{\threedis}$ (the number of $3$-matchings or collection of three node disjoint edges) and its generalization $\numocc{G}{\kmatch}$ (the number of $k$-matchings or collections fo $k$ node disjoint edges).
In particular, we will consider the problems of computing the following counts (given $G$ as an input in its adjacency list representation): $\numocc{G}{\tri}$ (the number of triangles), $\numocc{G}{\threepath}$ (the number of $3$-paths), $\numocc{G}{\threedis}$ (the number of $3$-matchings or collection of three node disjoint edges) and its generalization $\numocc{G}{\kmatch}$ (the number of $k$-matchings or collections of $k$ node-disjoint edges).
Our hardness result in Section~\ref{sec:multiple-p} is based on the following hardness result:
Our hardness result in \Cref{sec:multiple-p} is based on the following hardness result:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Theorem}[\cite{k-match}]
@ -25,42 +28,50 @@ Given a positive integer $k$ and an undirected graph $G$ with no self-loops or
The above result means that we cannot hope to count the number of $k$-matchings in $G=(V,E)$ in time $f(k)\cdot |V|^{O(1)}$ for any function $f$. In fact, all known algorithms to solve this problem take time $|V|^{\Omega(k)}$.
Our hardness result in Section~\ref{sec:single-p} is based on the following conjectured hardness result:
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{hypo}
\label{conj:graph}
There exists a constant $\eps_0>0$ such that given an undirected graph $G=(V,E)$, computing exactly the values $\numocc{G}{\tri}$, $\numocc{G}{\threepath}$ and $\numocc{G}{\threedis}$ cannot be done in time $o\inparen{|E|^{1+\eps_0}}$.
\end{hypo}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
Based on the so called {\em Triangle detection hypothesis} (cf.~\cite{triang-hard}), which states that detection whether $G$ has a triangle or not takes time $\Omega\inparen{|E|^{4/3}}$, implies that in Conjecture~\ref{conj:graph} we can take $\eps_0\ge \frac 13$.
\AR{Need to add something about 3-paths and 3-matchings as well.}
Both of our hardness results use a query polynomial that is based on a simple encoding of the edges of a graph.
To prove our hardness result, consider a graph $G(V, E)$, where $|E| = m$, $|V| = \numvar$. Our query polynomial will have a variable $X_i$ for every $i$ in $[\numvar]$.
Now consider the query
Now consider the polynomial
\[\poly_{G}(\vct{X}) = \sum\limits_{(i, j) \in E} X_i \cdot X_j.\]
The hard query polynomial for our problem will be a suitable power $k\ge 3$ of the polynomial above, i.e.
The hard polynomial for our problem will be a suitable power $k\ge 3$ of the polynomial above, i.e.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}
Let $G=([n],E)$ be a graph. Then for any $\kElem\ge 1$, define
\[\poly_{G}^\kElem(X_1,\dots,X_n) = \left(\sum\limits_{(i, j) \in E} X_i \cdot X_j\right)^\kElem.\]
\end{Definition}
Our hardness results only need a TIDB instance and further, we consider the special case when all the tuple probabilities are the same value. It is not too hard to see that we can encode the above polynomial in an expression tree of size $\Theta(km)$.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Our hardness results only need a \ti instance and further, we consider the special case when all the tuple probabilities (probabilities assigned by to $X_i$ by $\vct{p}$) are the same value. It is not too hard to see that we can encode the above polynomial in an expression tree of size $\Theta(km)$.
Following up on the discussion around Example~\ref{ex:intro}, it is easy to see that $\poly_{G}^\kElem(\vct{X})$ is the query polynomial corresponding to the following query:
\[\poly^k_G:- R(A_1),E(A_1,B_1),R(B_1),\dots,R(A_\kElem),E(A_\kElem,B_\kElem),R(B_\kElem)\]
where generalizaing the PDB instance in Example~\ref{ex:intro}, relation $R$ has $n$ tuples corresponding to each vertex in $V=[n]$ each with probability $p$ and $E(A,B)$ has tuples corresponding to the edges in $E$ (each with probability of $1$).\footnote{Technically, $\poly_{G}^\kElem(\vct{X})$ should have variables corresponding to tuples in $E$ as well but since they always are present with probability $1$, we drop those. Our argument also works when all the tuples in $E$ also are present with probability $p$ but to make notation a bit simpler, we make this simplification.}
where generalizaing the PDB instance in Example~\ref{ex:intro}, relation $R$ has $n$ tuples corresponding to each vertex in $V=[n]$ each with probability $p$ and $E(A,B)$ has tuples corresponding to the edges in $E$ (each with probability of $1$).\footnote{Technically, $\poly_{G}^\kElem(\vct{X})$ should have variables corresponding to tuples in $E$ as well but since they always are present with probability $1$, we drop those. Our argument also works when all the tuples in $E$ also are present with probability $p$ but to simplify notation we assign probability $1$ to edges.}
Note that this imples that our hard query polynimial can be created from a join-project query-- by contrast our approximation algorithm in Section~\ref{sec:algo} can handle lineage polynonmials generated by union of select-project-join queries. % (i.e. we do not need union or select operator to derive our hardness result).
Note that this imples that our hard query polynomial can be created from a project-join query -- by contrast our approximation algorithm in \Cref{sec:algo} can handle lineage polynomials generated by union of select-project-join queries. % (i.e. we do not need union or select operator to derive our hardness result).
%\AR{need discussion on the `tightness' of various params. First, this is for degree 6 poly-- while things are easy for say deg 2. Second this is for any fixed p. Finally, we only need porject-join queries to get the hardness results. Also need to compare this with the generality of the approx upper bound results.}
%\AR{need discussion on the `tightness' of various params. First, this is for degree 6 poly-- while things are easy for say deg 2. Second this is for any fixed p. Finally, we only need project-join queries to get the hardness results. Also need to compare this with the generality of the approx upper bound results.}
\subsection{Multiple Distinct $\prob$ Values}
\label{sec:multiple-p}
We are now ready to present our main hardness result.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Theorem}\label{thm:mult-p-hard-result}
Computing $\rpoly_G^\kElem(\prob_i,\dots,\prob_i)$ for arbitrary $G$ and any $(2k+1)$ values $\prob_i$ ($0\le i \le 2k$) is \sharpwonehard.
\end{Theorem}
We will prove the above result by reducing the problem of computing the number of $k$-matchings in $G$. Given the current best-known algorithm for this counting problem, our results imply that unless the state of the art $k$-matching algorithms are improved, we cannot hope to have a better runtime to solve our problem in time better than $\Omega_k\inparen{m^{k/2}}$, which is only quadratically faster than expanding $\poly_{G}^\kElem(\vct{X})$ into its SOP form and then using~\Cref{cor:expct-sop}. By constrast our approximation algorithm would run in time $O_k\inparen{m}$ on this query (since it runs in linear-time on all query polynomials).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
We will prove the above result by reduction from the problem of computing the number of $k$-matchings in $G$. Given the current best-known algorithm for this counting problem, our results imply that unless the state-of-the-art $k$-matching algorithms are improved, we cannot hope to solve our problem in time better than $\Omega_k\inparen{m^{k/2}}$, which is only quadratically faster than expanding $\poly_{G}^\kElem(\vct{X})$ into its \abbrSMB form and then using \Cref{cor:expct-sop}. By contrast the approximation algorithm we present in \Cref{sec:algo} runtime is in$O_k\inparen{m}$ for this query (since it runs in linear-time on all lineage polynomials).
As mentioned earlier, we prove our hardness result by presenting a reduction from the problem of couting $\kElem$-matchings in a graph:
\begin{Lemma}\label{lem:qEk-multi-p}
@ -68,6 +79,7 @@ Let $\prob_0,\ldots, \prob_{2\kElem}$ be distinct values in $(0, 1]$. Then give
\end{Lemma}
Before we prove the above Lemma, let us use it to prove~\Cref{thm:mult-p-hard-result}:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{proof}[Proof of Theorem~\ref{thm:mult-p-hard-result}]
For the sake of contradiction, let us assume we can solve our problem in $f(\kElem)\cdot m^c$ time for some absolute constant $c$. Then given a graph $G$ we can compute the query polynomial $\rpoly_G^\kElem$ (in the obvious way) in $O(km)$ time. Then after we run our algorithm on $\rpoly_G^\kElem$, we get $\rpoly_{G}^\kElem(\prob_i,\ldots, \prob_i)$ for $0\leq i\leq 2\kElem$ in additional $f(\kElem)\cdot m^c$ time. \Cref{lem:qEk-multi-p} then computes the number of $k$-matchings in $G$ in $O(\kElem^3)$ time. Thus, overall we have an algorithm for computing the number of $k$-matchings in time
\begin{align*}
@ -75,11 +87,13 @@ For the sake of contradiction, let us assume we can solve our problem in $f(\kEl
&\le \inparen{O(\kElem^3) + f(\kElem)}\cdot m^{c+1} \\
&\le \inparen{O(\kElem^3) + f(\kElem)}\cdot n^{2c+2},
\end{align*}
which contradicts~\cref{thm:k-match-hard}.
which contradicts \Cref{thm:k-match-hard}.
\end{proof}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Finally, we are ready to prove~\Cref{lem:qEk-multi-p}:
\begin{proof}[Proof of ~\cref{lem:qEk-multi-p}]
Finally, we are ready to prove \Cref{lem:qEk-multi-p}:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{proof}[Proof of \Cref{lem:qEk-multi-p}]
%It is trivial to see that one can readily expand the exponential expression by performing the $n^\kElem$ product operations, yielding the polynomial in the sum of products form of the lemma statement. By definition $\rpoly_{G}^\kElem$ reduces all variable exponents greater than $1$ to $1$. Thus, a monomial such as $X_i^\kElem X_j^\kElem$ is $X_iX_j$ in $\rpoly_{G}^\kElem$, and the value after substitution is $p_i\cdot p_j = p^2$. Further, that the number of terms in the sum is no greater than $2\kElem + 1$, can be easily justified by the fact that each edge has two endpoints, and the most endpoints occur when we have $\kElem$ distinct edges (such a subgraph is also known as a $\kElem$-matching), with non-intersecting points, a case equivalent to $p^{2\kElem}$.
We first argue that $\rpoly_{G}^\kElem(\prob,\ldots, \prob) = \sum\limits_{i = 0}^{2\kElem} c_i \cdot \prob^i$. First, since $\poly_G(\vct{X})$ has %$\kElem$ products of monomials of
degree $2$, it follows that $\poly_G^\kElem(\vct{X})$ has degree $2\kElem$.
@ -94,7 +108,7 @@ By definition, $\rpoly_{G}^{\kElem}(\vct{X})$ sets every exponent $e > 1$ to $e
\rpoly_{G}^{\kElem}(\prob,\ldots, \prob) = \sum_{i = 0}^{2\kElem} c_i \prob^i
\end{equation*}
We note that $c_i$ is {\em exactly} the number of monomials in the SOP expansion of $\poly_{G}^{\kElem}(\vct{X})$ composed of $i$ distinct variables.%, with $\prob$ substituted for each distinct variable
We note that $c_i$ is {\em exactly} the number of monomials in the SOP\BG{\abbrSMB?} expansion of $\poly_{G}^{\kElem}(\vct{X})$ composed of $i$ distinct variables.%, with $\prob$ substituted for each distinct variable
\footnote{Since $\rpoly_G^\kElem(\vct{X})$ does not have any monomial with degree $< 2$, it is the case that $c_0 = c_1 = 0$ but for the sake of simplcity we will ignore this observation.}
Given that we then have $2\kElem + 1$ distinct values of $\rpoly_{G}^\kElem(\prob,\ldots, \prob)$ for $0\leq i\leq2\kElem$, it follows that
@ -105,7 +119,7 @@ Given that we then have $2\kElem + 1$ distinct values of $\rpoly_{G}^\kElem(\pro
We claim that $c_{2\kElem}$ is $\kElem! \cdot \numocc{G}{\kmatch}$. This can be seen intuitively by looking at the original factorized representation
\[\poly_{G}^\kElem(\vct{X}) = \sum_{\substack{(i_1, j_1),\\\cdots,\\(i_\kElem, j_\kElem) \in E}}X_{i_1}X_{j_1}\cdots X_{i_\kElem}X_{j_\kElem},\]
where across each of the $\kElem$ products, an arbitrary $\kElem$-matching can be selected $\prod_{i = 1}^\kElem \kElem = \kElem!$ times. Next, note that each $\kElem$-matching $(i_1, j_1)\ldots$ $(i_k, j_k)$ in $G$ corresponds to the monomial $\prod_{\ell = 1}^\kElem X_{i_\ell}X_{j_\ell}$ in $\poly_{G}^\kElem(\vct{X})$, with all indexes distinct. %Since each index is distinct, then each variable has an exponent $e = 1$ and this monomial survives in $\rpoly_{G}^{\kElem}(\vct{X})$ Since $\rpoly$ contains only exponents $e \leq 1$, the only degree $2\kElem$ terms that can exist in $\rpoly_{G}^\kElem$ are $\kElem$-matchings since every other monomial in $\poly_{G}^\kElem(\vct{X})$ has strictly less than $2\kElem$ distinct variables, which, as stated earlier implies that every other non-$\kElem$-matching monomial in $\rpoly_{G}^\kElem(\vct{X})$ has degree $< 2\kElem$.
Second, the only surviving monomiasl $\prod_{\ell = 1}^\kElem X_{i_\ell}X_{j_\ell}$ of degree exactly $2k$ in $\rpoly_{G}^{\kElem}(\vct{X})$ must have that all of $i_1,j_1,\dots,i_\kElem,j_\kElem$ are distinct in $\poly_{G}^{\kElem}(\vct{X})$. Then, by the last two statements, only monomials composed of $2k$ distinct variables in $\poly_{G}^{\kElem}(\vct{X})$ (and hence of degree $2\kElem$ in $\rpoly_{G}^{\kElem}(\vct{X})$) correspond to a $k$-matching in $G$.
Second, the only surviving monomials $\prod_{\ell = 1}^\kElem X_{i_\ell}X_{j_\ell}$ of degree exactly $2k$ in $\rpoly_{G}^{\kElem}(\vct{X})$ must have that all of $i_1,j_1,\dots,i_\kElem,j_\kElem$ are distinct in $\poly_{G}^{\kElem}(\vct{X})$. Then, by the last two statements, only monomials composed of $2k$ distinct variables in $\poly_{G}^{\kElem}(\vct{X})$ (and hence of degree $2\kElem$ in $\rpoly_{G}^{\kElem}(\vct{X})$) correspond to a $k$-matching in $G$.
%It has already been established above that a $\kElem$-matching ($\kmatch$) has coefficient $c_{2\kElem}$. As noted, a $\kElem$-matching occurs when there are $\kElem$ edges, $e_1, e_2,\ldots, e_\kElem$, such that all of them are disjoint, i.e., $e_1 \neq e_2 \neq \cdots \neq e_\kElem$. In all $\kElem$ factors of $\poly_{G}^\kElem(\vct{X})$ there are $k$ choices from the first factor to select an edge for a given $\kElem$ matching, $\kElem - 1$ choices in the second factor, and so on throughout all the factors, yielding $\kElem!$ duplicate terms for each $\kElem$ matching in the expansion of $\poly_{G}^\kElem(\vct{X})$.
Notice that %we have $\kElem!$ duplicates of
@ -114,11 +128,9 @@ each of the $k!$ permutations of an arbitrary monomial maps to the same distinct
%Since $c_{2k}$ is the cardinality of the multi-set of degree $2\kElem$ monomials in $\rpoly_{G}^{\kElem}(\vct{X})$, where each $\kElem$-matching in $G$ has $\kElem!$ monomial representations,
It then follows that $c_{2\kElem}= \kElem! \cdot \numocc{G}{\kmatch}$.
% and the fact that $c_{2\kElem}$ contains all monomials with degree $2\kElem$, it follows that $c_{2\kElem} = \kElem!\cdot\numocc{G}{\kmatch}$.
Thus, simply dividing $c_{2\kElem}$ by $\kElem!$ gives us $\numocc{G}{\kmatch}$, as needed. % by simply dividing $c_{2\kElem}$ by $\kElem!$.
Thus, simply dividing $c_{2\kElem}$ by $\kElem!$ gives us $\numocc{G}{\kmatch}$, as needed. \qed % by simply dividing $c_{2\kElem}$ by $\kElem!$.
\end{proof}
\qed
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

View file

@ -47,8 +47,8 @@ The degree of the running example polynomial is $2$. In this paper we consider o
% \end{Assumption}
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
We call a polynomial $\query(\vct{X})$ a \emph{\bi-lineage polynomial} (\emph{\ti-lineage polynomial}), if
\AH{Why is it required for the tuple to be n-ary? I think this slightly confuses me since we have n tuples.} there exists an n-ary $\raPlus$ query $\query$, \bi $\pxdb$ (\ti $\pxdb$), and n-ary tuple $\tup$ such that $\query(\vct{X}) = \query(\pxdb)(\tup)$. % Before proceeding, note that the following is assume that polynomials are \bis (which subsume \tis as a special case).
We call a polynomial $\query(\vct{X})$ a \emph{\bi-lineage polynomial} (\emph{\ti-lineage polynomial}, or simply lineage polynomial), if
\AH{Why is it required for the tuple to be n-ary? I think this slightly confuses me since we have n tuples.} there exists an n-ary $\raPlus$ query $\query$, \bi $\pxdb$ (\ti $\pxdb$, or $\semNX$-PDB $\pxdb$), and n-ary tuple $\tup$ such that $\query(\vct{X}) = \query(\pxdb)(\tup)$. % Before proceeding, note that the following is assume that polynomials are \bis (which subsume \tis as a special case).
Note the \tis are a special case of \bis and, thus, the following applies to \tis as well.
Recall that in a \bi $\pxdb$ with tuples $t_1, \ldots, t_n$, each input tuple $t_i$ is annotated with a unique variable $X_i$. The tuples of $\pxdb$ are partitioned into $\ell$ blocks $\block_1, \ldots, \block_\ell$ and each tuple $t_i$ is associated with a probability $\prob(\tup_i) = \pd[X_i = 1]$. Together with the assumption that blocks are assumed to be independent and tuples from the same block are disjoint events, $\prob$ and the blocks induce the probability distribution $\pd$ of $\pxdb$.
We will write a \bi-lineage polynomial $\poly(\vct{X})$ for a \bi with $\ell$ blocks as

View file

@ -199,7 +199,8 @@ We are now ready to formally state the main problem addressed in this work.
\begin{Definition}[The Expected Result Multiplicity Problem]\label{def:the-expected-multipl}
Let $\vct{X} = (X_1, \ldots, X_n)$, and $\pdb$ be an $\semNX$-PDB over $\vct{X}$ with probability distribution $\pd$ over assignments $\vct{X} \to [0,1]$, $\query$ an n-ary query, and $t$ an n-ary tuple.
The \expectProblem is defined as follows:
\AH{I think we mean $\poly(\vct{X}) = \query(\pxdb)(t)$ instead of $\poly(\vct{X}) = \query(\pdb)(t)$. I changed the following to reflect this.}
\AH{I think we mean $\poly(\vct{X}) = \query(\pxdb)(t)$ instead of $\poly(\vct{X}) = \query(\pdb)(t)$. I changed the following to reflect this.}
\BG{Correct}
\begin{itemize}
\item \textbf{Input}: Given an expression tree $\etree \in \etreeset{\smb}$ for $\poly(\vct{X}) = \query(\pxdb)(t)$
\item \textbf{Output}: $\expct_{\vct{X} \sim \pd}[\poly(\vct{X})]$

View file

@ -1,36 +1,42 @@
%root: main.tex
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Single $\prob$ value}
\label{sec:single-p}
%In this discussion, let us fix $\kElem = 3$.
While~\cref{thm:mult-p-hard-result} shows that computing $\rpoly(\prob,\dots,\prob)$ in general is hard it does not rule out the possibility that one can compute this value exactly for a {\em fixed} value of $p$. Indeed, it is easy to check that one can compute $\rpoly(\prob,\dots,\prob)$ exactly in linear time for $p\in \inset{0,1}$. In this section, we show that these two are the only possibilities:
While \Cref{thm:mult-p-hard-result} shows that computing $\rpoly(\prob,\dots,\prob)$ in general is hard it does not rule out the possibility that one can compute this value exactly for a {\em fixed} value of $p$. Indeed, it is easy to check that one can compute $\rpoly(\prob,\dots,\prob)$ exactly in linear time for $p\in \inset{0,1}$. In this section, we show that these two are the only possibilities:
\begin{Theorem}\label{cor:single-p-hard}
Fix $p\in (0,1)$. Then assuming~\cref{conj:graph} is true, then any algorithm that computes $\rpoly_{G}^3(\prob,\dots,\prob)$ exactly has to run in time $\Omega\inparen{\abs{E(G)}^{1+\eps_0}}$, where $\eps_0$ is as defined in~\cref{conj:graph}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Theorem}\label{th:single-p-hard}
Fix $p\in (0,1)$. Then assuming \Cref{conj:graph} is true, then any algorithm that computes $\rpoly_{G}^3(\prob,\dots,\prob)$ exactly has to run in time $\Omega\inparen{\abs{E(G)}^{1+\eps_0}}$, where $\eps_0$ is as defined in \Cref{conj:graph}.
\end{Theorem}
%\begin{proof}[Proof of Corollary ~\ref{cor:single-p-gen-k}]
%Consider $\poly^3_{G}$ and $\poly' = 1$ such that $\poly'' = \poly^3_{G} \cdot \poly'$. By ~\cref{th:single-p}, query $\poly''$ with $\kElem = 4$ has $\Omega(\numvar^{\frac{4}{3}})$ complexity.
%\begin{proof}[Proof of Corollary ~\ref{th:single-p-gen-k}]
%Consider $\poly^3_{G}$ and $\poly' = 1$ such that $\poly'' = \poly^3_{G} \cdot \poly'$. By \Cref{th:single-p}, query $\poly''$ with $\kElem = 4$ has $\Omega(\numvar^{\frac{4}{3}})$ complexity.
%\end{proof}
The above shows the hardness for a very specific query polynomial but it is easy to come up with an infinite family of hard query polynomials by `embedding' $\rpoly_{G}^3$ into an infinite family of trivial query polynomials. However, unlike~\cref{thm:mult-p-hard-result} the above result does not show that computing $\rpoly_{G}^3(\prob,\dots,\prob)$ for a fixed $p\in (0,1)$ is \sharpwonehard. By contrast, in~\cref{sec:algo} we show that if we are willing to compute an approximation that this problem (and indeed solving our problem for a much more general setting) is in linear time.
The above shows the hardness for a very specific query polynomial but it is easy to come up with an infinite family of hard query polynomials by `embedding' $\rpoly_{G}^3$ into an infinite family of trivial query polynomials. However, unlike \Cref{thm:mult-p-hard-result} the above result does not show that computing $\rpoly_{G}^3(\prob,\dots,\prob)$ for a fixed $p\in (0,1)$ is \sharpwonehard. By contrast, in \Cref{sec:algo} we show that if we are willing to compute an approximation that this problem (and indeed solving our problem for a much more general setting) is in linear time.
%\AH{@atri needs to put in the result for triangles of $\numvar^{\frac{4}{3}}$ runtime.}
We will prove the above result by the following reduction:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Theorem}\label{th:single-p}
Fix $p\in (0,1)$. Let $G$ be a graph on $\numedge$ edges.
If we can compute $\rpoly_{G}^3(\prob,\dots,\prob)$ exactly in $T(\numedge)$ time, then we can exactly compute $\numocc{G}{\tri}$, $\numocc{G}{\threepath}$ and $\numocc{G}{\threedis}$ %count the number of triangles, 3-paths, and 3-matchings in $G$
in $O\inparen{T(\numedge) + \numedge}$ time.
\end{Theorem}
\begin{proof}[Proof of~\cref{cor:single-p-hard}]
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{proof}[Proof of \Cref{th:single-p}]
For the sake of contradiction, let us assume that for any $G$, we can compute $\rpoly_{G}^3(\prob,\dots,\prob)$ in $o\inparen{m^{1+\eps_0}}$ time.
Let $G$ be the input graph. It is easy to see that one can compute the expression tree for $\poly_{G}^3(\vct{X})$ in $O(m)$ time. Then by~\cref{th:single-p} we can compute $\numocc{G}{\tri}$, $\numocc{G}{\threepath}$ and $\numocc{G}{\threedis}$ in further time $o\inparen{m^{1+\eps_0}}+O(m)$. Thus, the overall, reduction takes $o\inparen{m^{1+\eps_0}}+O(m)= o\inparen{m^{1+\eps_0}}$ time, which violates~\cref{conj:graph}.
Let $G$ be the input graph. It is easy to see that one can compute the expression tree for $\poly_{G}^3(\vct{X})$ in $O(m)$ time. Then by \Cref{th:single-p} we can compute $\numocc{G}{\tri}$, $\numocc{G}{\threepath}$ and $\numocc{G}{\threedis}$ in further time $o\inparen{m^{1+\eps_0}}+O(m)$. Thus, the overall, reduction takes $o\inparen{m^{1+\eps_0}}+O(m)= o\inparen{m^{1+\eps_0}}$ time, which violates \Cref{conj:graph}.
\end{proof}
\qed
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Before moving on to prove ~\cref{th:single-p}, let us state the results, lemmas and defintions that will be useful in the proof.
Before moving on to prove \Cref{th:single-p-hard}, let us state the results, lemmas and defintions that will be useful in the proof.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Preliminaries and Notation}
We need to list all possible edge patterns in an arbitrary $G$ consisting of at most three distinct edges. We have already seen $\tri,\threepath$ and $\threedis$, so here we define the remaining patterns:
@ -48,7 +54,7 @@ We need to list all possible edge patterns in an arbitrary $G$ consisting of at
%Let $\numocc{G}{H}$ denote the number of occurrences of pattern $H$ in graph $G$, where, for example, $\numocc{G}{\ed}$ means the number of single edges in $G$.
For any graph $G$, the following formulas for $\numocc{G}{H}$ for their respective patterns can be used to compute them exactly in $O(\numedge)$ time, with $d_i$ representing the degree of vertex $i$ (proofs are in~\cref{app:easy-counts}):
For any graph $G$, the following formulas for $\numocc{G}{H}$ for their respective patterns can be used to compute them exactly in $O(\numedge)$ time, with $d_i$ representing the degree of vertex $i$ (proofs are in \Cref{app:easy-counts}):
\begin{align}
&\numocc{G}{\ed} = \numedge, \label{eq:1e}\\
&\numocc{G}{\twopath} = \sum_{i \in V} \binom{d_i}{2} \label{eq:2p}\\
@ -57,9 +63,9 @@ For any graph $G$, the following formulas for $\numocc{G}{H}$ for their respecti
&\numocc{G}{\twopathdis} + 3\numocc{G}{\threedis} = \sum_{(i, j) \in E} \binom{\numedge - d_i - d_j + 1}{2}\label{eq:2pd-3d}
\end{align}
%A quick argument to why \cref{eq:2m} is true. Note that for edge $(i, j)$ connecting arbitrary vertices $i$ and $j$, finding all other edges in $G$ disjoint to $(i, j)$ is equivalent to finding all edges that are not connected to either vertex $i$ or $j$. The number of such edges is $m - d_i - d_j + 1$, where we add $1$ since edge $(i, j)$ is removed twice when subtracting both $d_i$ and $d_j$. Since the summation is iterating over all edges such that a pair $\left((i, j), (k, \ell)\right)$ will also be counted as $\left((k, \ell), (i, j)\right)$, division by $2$ then eliminates this double counting.
%A quick argument to why \Cref{eq:2m} is true. Note that for edge $(i, j)$ connecting arbitrary vertices $i$ and $j$, finding all other edges in $G$ disjoint to $(i, j)$ is equivalent to finding all edges that are not connected to either vertex $i$ or $j$. The number of such edges is $m - d_i - d_j + 1$, where we add $1$ since edge $(i, j)$ is removed twice when subtracting both $d_i$ and $d_j$. Since the summation is iterating over all edges such that a pair $\left((i, j), (k, \ell)\right)$ will also be counted as $\left((k, \ell), (i, j)\right)$, division by $2$ then eliminates this double counting.
%\cref{eq:2pd-3d} is true for similar reasons. For edge $(i, j)$, it is necessary to find two additional edges, disjoint or connected. As in ~\cref{eq:2m}, once the number of edges disjoint to $(i, j)$ have been computed, then we only need to consider all possible combinations of two edges from the set of disjoint edges, since it doesn't matter if the two edges are connected or not. Note, the factor $3$ of $\threedis$ is necessary to account for the triple counting of $3$-matchings. It is also the case that, since the two path in $\twopathdis$ is connected, that there will be no double counting by the fact that the summation automatically 'disconnects' the current edge, meaning that a two matching at the current vertex will not be counted. The sum over all such edge combinations is precisely then $\numocc{G}{\twopathdis} + 3\numocc{G}{\threedis}$.
%\Cref{eq:2pd-3d} is true for similar reasons. For edge $(i, j)$, it is necessary to find two additional edges, disjoint or connected. As in \Cref{eq:2m}, once the number of edges disjoint to $(i, j)$ have been computed, then we only need to consider all possible combinations of two edges from the set of disjoint edges, since it doesn't matter if the two edges are connected or not. Note, the factor $3$ of $\threedis$ is necessary to account for the triple counting of $3$-matchings. It is also the case that, since the two path in $\twopathdis$ is connected, that there will be no double counting by the fact that the summation automatically 'disconnects' the current edge, meaning that a two matching at the current vertex will not be counted. The sum over all such edge combinations is precisely then $\numocc{G}{\twopathdis} + 3\numocc{G}{\threedis}$.
%Original lemma proving the exact coefficient terms in qE3
@ -67,7 +73,8 @@ For any graph $G$, the following formulas for $\numocc{G}{H}$ for their respecti
\subsubsection{The proofs}
Note that $\rpoly_{G}^3(\prob,\ldots, \prob)$ as polynomial in $\prob$ has degree at most six. Next, we figure out the exact coefficients since this would be useful in our arguments:
Note that $\rpoly_{G}^3(\prob,\ldots, \prob)$ as a polynomial in $\prob$ has degree at most six. Next, we figure out the exact coefficients since this would be useful in our arguments:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Lemma}\label{lem:qE3-exp}
%When we expand $\poly_{G}^3(\vct{X})$ out and assign all exponents $e \geq 1$ a value of $1$, we have the following result,
For any $p$, we have:
@ -75,9 +82,11 @@ For any $p$, we have:
&\rpoly_{G}^3(\prob,\ldots, \prob) = \numocc{G}{\ed}\prob^2 + 6\numocc{G}{\twopath}\prob^3 + 6\numocc{G}{\twodis} + 6\numocc{G}{\tri}\prob^3\nonumber\\
&+ 6\numocc{G}{\oneint}\prob^4 + 6\numocc{G}{\threepath}\prob^4 + 6\numocc{G}{\twopathdis}\prob^5 + 6\numocc{G}{\threedis}\prob^6.\label{claim:four-one}
\end{align}
\end{Lemma}
\end{Lemma}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{proof}%[Proof of \cref{lem:qE3-exp}]
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{proof}%[Proof of \Cref{lem:qE3-exp}]
By definition we have that
\[\poly_{G}^3(\vct{X}) = \sum_{\substack{(i_1, j_1),\\ (i_2, j_2),\\ (i_3, j_3) \in E}} \prod_{\ell = 1}^{3}X_{i_\ell}X_{j_\ell}.\]
Hence $\rpoly_{G}^3(\vct{X})$ has degree six. Note that the monomial $\prod_{\ell = 1}^{3}X_{i_\ell}X_{j_\ell}$ will contribute to the coefficient of $p^i$ in $\rpoly_{G}^3(\vct{X})$, where $i$ is the number of distinct variables in the monomial.
@ -93,13 +102,18 @@ Since $e\ne e'$, this case produces the following edge patterns: $\twopath, \two
\textsc{case 3:} All $e_1,e_2$ and $e_3$ are distinct. For this case, we have $3! = 6$ permutations of $(e_1, e_2, e_3)$, each of which contribute to a different monomial in the SOP expansion of $\poly_{G}^3(\vct{X})$. This case consists of the following edge patterns: $\tri, \oneint, \threepath, \twopathdis, \threedis$, which contribute $p^3,p^4,p^4,p^5$ and $p^6$ respectively to $\rpoly_{G}^3\left(\prob,\ldots, \prob\right)$.
\end{proof}
\qed
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Since $p$ is fixed,~\cref{lem:qE3-exp} gives us one linear equation in $\numocc{G}{\tri},$ $\numocc{G}{\threepath}$ and $\numocc{G}{\threedis}$ (we can handle the other counts due to~\cref{eq:1e}-\cref{eq:2pd-3d}). However, we plan to generate two more independent linear equations in these three variables. Towards, this end we generate more graphs that are related to $G$:
Since $p$ is fixed, \Cref{lem:qE3-exp} gives us one linear equation in $\numocc{G}{\tri},$ $\numocc{G}{\threepath}$ and $\numocc{G}{\threedis}$ (we can handle the other counts due to \Cref{eq:1e}-\Cref{eq:2pd-3d}). However, we plan to generate two more independent linear equations in these three variables. Towards, this end we generate more graphs that are related to $G$:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}\label{def:Gk}
For $\ell > 1$, let graph $\graph{\ell}$ be a graph generated from an arbitrary graph $\graph{1}$, by replacing every edge $e$ of $\graph{1}$ with a $\ell$-path, such that all $\ell$-path replacement edges are disjoint. % in the sense that they only intersect at the original intersection endpoints as seen in $\graph{1}$.
\end{Definition}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Next, we relate the various sub-graph counts in $\graph{2}$ and $\graph{3}$ to their counterparts in $\graph{1}=G$.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Lemma}\label{lem:3m-G2}
The number of $3$-matchings in graph $\graph{2}$ satisfies the following identity,
\begin{align*}
@ -107,7 +121,9 @@ The number of $3$-matchings in graph $\graph{2}$ satisfies the following identit
&+ 4 \cdot \numocc{\graph{1}}{\oneint} + 4 \cdot \numocc{\graph{1}}{\threepath} + 2 \cdot \numocc{\graph{1}}{\tri}.
\end{align*}
\end{Lemma}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Lemma}\label{lem:3m-G3}
The number of 3-matchings in $\graph{3}$ satisfy the following identity,
\begin{align*}
@ -117,25 +133,32 @@ The number of 3-matchings in $\graph{3}$ satisfy the following identity,
\end{align*}
\end{Lemma}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Lemma}\label{lem:3p-G2}
The number of $3$-paths in $\graph{2}$ satisfies the following identity,
\[\numocc{\graph{2}}{\threepath} = 2 \cdot \numocc{\graph{1}}{\twopath}.\]
\end{Lemma}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Lemma}\label{lem:3p-G3}
The number of $3$-paths in $\graph{3}$ satisfies the following identity,
\[\numocc{\graph{3}}{\threepath} = \numocc{\graph{1}}{\ed} + 2 \cdot \numocc{\graph{1}}{\twopath}.\]
\end{Lemma}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Lemma}\label{lem:tri}
For $\ell > 1$, any graph $\graph{\ell}$ has the property that $\numocc{\graph{\ell}}{\tri} = 0$.
\end{Lemma}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Using the results we have obtained so far, we will prove the following reduction result:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Lemma}\label{lem:lin-sys}
%Using the identities of lemmas [\ref{lem:3m-G2}, \ref{lem:3m-G3}, \ref{lem:3p-G2}, \ref{lem:3p-G3}, \ref{lem:tri}] to compute $\numocc{G}{\threedis}, \numocc{G}{\threepath}, \numocc{G}{\tri}$ for $G \in \{\graph{2}, \graph{3}\}$, there exists a linear system $\mtrix{\rpoly}\cdot (x~y~z~)^T = \vct{b}$ which can then be solved to determine the unknown quantities of $\numocc{\graph{1}}{\threedis}, \numocc{\graph{1}}{\threepath}$, and $\numocc{\graph{1}}{\tri}$.
Fix $p\in (0,1)$. Given $\rpoly_{\graph{\ell}}^3(\prob,\dots,\prob)$ for $\ell\in [3]$, we can compute in $O(m)$ time a vector $\vct{b}\in\rel^3$ such that
@ -154,19 +177,22 @@ Fix $p\in (0,1)$. Given $\rpoly_{\graph{\ell}}^3(\prob,\dots,\prob)$ for $\ell\i
\]
from which we can compute $\numocc{G}{\tri}, \numocc{G}{\threepath}$ and $\numocc{G}{\threedis}$ in $O(1)$ time.
\end{Lemma}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Due to lack of space we defer the proof of the above results to~\Cref{subsec:proofs-struc-lemmas}.
Due to lack of space we defer the proof of the above results to \Cref{subsec:proofs-struc-lemmas}.
The above result immediately implies~\cref{th:single-p}:
\begin{proof}[Proof of~\cref{th:single-p}]
The above result immediately implies \Cref{th:single-p-hard}:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{proof}[Proof of \Cref{th:single-p-hard}]
It is easy to check that in $O(m)$ time we can compute $\graph{2}$ and $\graph{3}$ from $\graph{1}=G$ (and further note that these graphs also have $O(m)$ edges). Thus,
in time $O(T(m))$, we can compute $\rpoly_{\graph{\ell}}^3(\prob,\dots,\prob)$ for $\ell\in [3]$.~\Cref{lem:lin-sys} then completes the proof.
in time $O(T(m))$, we can compute $\rpoly_{\graph{\ell}}^3(\prob,\dots,\prob)$ for $\ell\in [3]$. \Cref{lem:lin-sys} then completes the proof.
\end{proof}
\qed
%\AH{I didn't think of a more appropriate name for $\vct{b}$, so I have just stuck with what Atri called it on chat.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% \AH{I didn't think of a more appropriate name for $\vct{b}$, so I have just stuck with what Atri called it on chat.}
%Using \cref{def:Gk} we construct graphs $\graph{2}$ and $\graph{3}$ from arbitrary graph $\graph{1}$.
%We then show that for any of the patterns $\threedis, \threepath, \tri$ which are all known to be hard to compute, we can use linear combinations in terms of $\graph{1}$ from Lemmas \ref{lem:3m-G2}, \ref{lem:3m-G3}, \ref{lem:3p-G2}, \ref{lem:3p-G3}, \ref{lem:tri} to compute $\numocc{\graph{i}}{S}$, where $i$ in $\{2, 3\}$ and $S \in \{\threedis, \threepath, \tri\}$. Then, using ~\cref{claim:four-two}, \cref{lem:qE3-exp} and \cref{lem:lin-sys}, we can combine all three linear combinations into a linear system, solving for $\numocc{\graph{1}}{S}$.
%Using \Cref{def:Gk} we construct graphs $\graph{2}$ and $\graph{3}$ from arbitrary graph $\graph{1}$.
%We then show that for any of the patterns $\threedis, \threepath, \tri$ which are all known to be hard to compute, we can use linear combinations in terms of $\graph{1}$ from Lemmas \ref{lem:3m-G2}, \ref{lem:3m-G3}, \ref{lem:3p-G2}, \ref{lem:3p-G3}, \ref{lem:tri} to compute $\numocc{\graph{i}}{S}$, where $i$ in $\{2, 3\}$ and $S \in \{\threedis, \threepath, \tri\}$. Then, using \Cref{claim:four-two}, \Cref{lem:qE3-exp} and \Cref{lem:lin-sys}, we can combine all three linear combinations into a linear system, solving for $\numocc{\graph{1}}{S}$.
%$%^&*(
@ -174,3 +200,8 @@ in time $O(T(m))$, we can compute $\rpoly_{\graph{\ell}}^3(\prob,\dots,\prob)$ f
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End: