Aaron Huber 2022-06-03 07:52:51 -04:00
commit dd3ff9b9bf
7 changed files with 125 additions and 22 deletions


@ -25,3 +25,45 @@ Using the same factorization from \Cref{example:expr-tree-T}, $\polyf(\abs{\circ
\begin{Definition}[Subcircuit]
A subcircuit of a circuit $\circuit$ is a circuit \subcircuit such that \subcircuit is a DAG \textit{subgraph} of the DAG representing \circuit. The sink of \subcircuit has exactly one gate \gate.
\end{Definition}
%%%%%%%%%
\textcolor{red}{Aaron: The stuff below needs to be integrated into this section.}
Further, we can also argue the following result.%, recalling from~\Cref{sec:intro} for \abbrCTIDB $\pdb = \inparen{\worlds, \bpd}$, where $\tupset$ is the set of possible tuples across all possible worlds of $\pdb$.
\begin{Lemma}
\label{lem:ctidb-gamma}
Given $\raPlus$ query $\query$ and \abbrCTIDB $\pdb$, let \circuit be the circuit computed by $\query\inparen{\tupset}$. Then, for the reduced \abbrOneBIDB $\pdb'$ there exists an equivalent circuit \circuit' obtained from $\query\inparen{\tupset'}$, such that $\gamma\inparen{\circuit'}\leq 1 - \bound^{-\inparen{k-1}}$ with $\size\inparen{\circuit'} \leq \size\inparen{\circuit} + \bigO{\numvar\bound}$
and $\depth\inparen{\circuit'} = \depth\inparen{\circuit} + \bigO{\log{\bound}}$.
\end{Lemma}
We briefly connect the runtime in \Cref{eq:approx-algo-runtime} to the algorithm outline earlier (where we ignore the dependence on $\multc{\cdot}{\cdot}$, which is needed to handle the cost of arithmetic operations over integers). The $\size(\circuit)$ comes from the time taken to run \onepass once (\onepass essentially computes $\abs{\circuit}(1,\ldots, 1)$ using the natural circuit evaluation algorithm on $\circuit$). We make $\frac{\log{\frac{1}{\conf}}}{\inparen{\error'}^2\cdot(1-\gamma)^2\cdot \prob_0^{2k}}$ many calls to \sampmon (each of which essentially traces $O(k)$ random sink to source paths in $\circuit$ all of which by definition have length at most $\depth(\circuit)$).
Finally, we address the $\multc{\log\left(\abs{\circuit}(1,\ldots, 1)\right)}{\log\left(\size(\circuit)\right)}$ term in the runtime.
\begin{Lemma}
\label{lem:val-ub}
For any \emph{\abbrOneBIDB} circuit $\circuit$ with $\degree(\circuit)=k$, we have
$\abs{\circuit}(1,\ldots, 1)\le 2^{2^k\cdot \depth(\circuit)}.$
Further, if $\circuit$ is a tree, then we have $\abs{\circuit}(1,\ldots, 1)\le \size(\circuit)^{O(k)}.$
\end{Lemma}
Note that the above implies that, under the assumption from \Cref{cor:approx-algo-const-p} that $\prob_0>0$ and $\gamma<1$ are absolute constants, the runtime there simplifies to $O_k\left(\frac 1{\inparen{\error'}^2}\cdot\size(\circuit)^2\cdot \log{\frac{1}{\conf}}\right)$ for general circuits $\circuit$. If $\circuit$ is a tree, then the runtime simplifies to $O_k\left(\frac 1{\inparen{\error'}^2}\cdot\size(\circuit)\cdot \log{\frac{1}{\conf}}\right)$, which then answers \Cref{prob:intro-stmt} in the affirmative for such circuits.
Finally, note that by \Cref{prop:circuit-depth} and \Cref{lem:circ-model-runtime}, for any $\raPlus$ query $\query$ there exists a circuit $\circuit^*$ for $\apolyqdt$ such that $\depth(\circuit^*)\le O_{|Q|}(\log{n})$ and $\size(\circuit^*)\le O_k\inparen{\qruntime{\query, \tupset, \bound}}$. Using this along with \Cref{lem:val-ub}, \Cref{cor:approx-algo-const-p} and the fact that $n\le \qruntime{\query, \tupset, \bound}$, we have the following corollary:
\begin{Corollary}
\label{cor:approx-algo-punchline}
Let $\query$ be an $\raPlus$ query and $\pdb$ be a \emph{\abbrOneBIDB} with $p_0>0$ and $\gamma<1$, where $p_0,\gamma$ as in \Cref{cor:approx-algo-const-p}, are absolute constants. Let $\poly(\vct{X})=\apolyqdt$ for any result tuple $\tup$ with $\deg(\poly)=k$. Then one can compute an approximation satisfying \Cref{eq:approx-algo-bound-main} in time $O_{k,|Q|,\error',\conf}\inparen{\qruntime{\optquery{\query}, \tupset, \bound}}$ (given $\query,\tupset$ and $p_i$ for each $i\in [n]$ that defines $\pd$).
\end{Corollary}
\textcolor{red}{The corollary below is a repeat of the corollary on S4}
Next, we note that the above result along with \Cref{lem:ctidb-gamma}
answers \Cref{prob:big-o-joint-steps} in the affirmative as follows:
\begin{Corollary}
\label{cor:approx-algo-punchline-ctidb}
Let $\query$ be an $\raPlus$ query and $\pdb$ be a \abbrCTIDB with $p_0>0$, where $p_0$ as in \Cref{cor:approx-algo-const-p}, is an absolute constant. Let $\poly(\vct{X})=\apolyqdt$ for any result tuple $\tup$ with $\deg(\poly)=k$. Then one can compute an approximation satisfying \Cref{eq:approx-algo-bound-main} in time $O_{k,|Q|,\error',\conf,\bound}\inparen{\qruntime{\optquery{\query}, \tupset, \bound}}$ (given $\query,\tupset$ and $\prob_{\tup, j}$ for each $\tup\in\tupset,~j\in\pbox{\bound}$ that defines $\bpd$).
\end{Corollary}
\begin{proof}[Proof of~\Cref{cor:approx-algo-punchline-ctidb}]
By~\Cref{lem:ctidb-gamma} and~\Cref{cor:approx-algo-punchline}, the proof follows.
\end{proof}
\qed


@ -3,13 +3,12 @@
\section{$1 \pm \epsilon$ Approximation Algorithm}\label{sec:algo}
We showed in~\Cref{sec:hard} that a runtime of $\bigO{\qruntime{\optquery{\query},\tupset,\bound}}$ cannot be achieved for~\Cref{prob:bag-pdb-poly-expected}. In light of this, we desire to produce an approximation algorithm that runs in time $\bigO{\qruntime{\optquery{\query},\tupset,\bound}}$. We do this by showing the result via circuits,
such that our approximation algorithm for this problem runs in $\bigO{\abs{\circuit}}$ for a very broad class of circuits, (thus affirming~\Cref{prob:intro-stmt}); see the discussion after \Cref{lem:val-ub} for more.
such that our $1\pm\epsilon$ approximation algorithm for this problem runs in $\bigO{\abs{\circuit}}$ for a very broad class of circuits (thus solving~\Cref{prob:intro-stmt}); see the discussion after \Cref{lem:val-ub} for more.
The following approximation algorithm applies to bag query semantics over both
\abbrCTIDB lineage polynomials and general \abbrBIDB lineage polynomials in practice, where for the latter we note that a $1$-\abbrTIDB is equivalently a \abbrBIDB (blocks are size $1$).
\abbrCTIDB lineage polynomials and general \abbrBIDB lineage polynomials in practice. %, where for the latter we note that a $1$-\abbrTIDB is equivalently a \abbrBIDB (blocks are size $1$).
Our experimental results (see~\Cref{app:subsec:experiment}), which use queries from the PDBench benchmark~\cite{pdbench} support the notion that our bounds hold for general \abbrBIDB in practice.
%
%
Corresponding proofs and pseudocode for all formal statements and algorithms
can be found in \Cref{sec:proofs-approx-alg}.
@ -32,6 +31,7 @@ $
\end{Definition}
Later on, we will denote the monomial composed of the variables in $\monom$ as $\encMon$. As an example of $\expansion{\circuit}$, consider $\circuit$ illustrated in \Cref{fig:circuit}. $\expansion{\circuit}$ is then $[(X, 2), (XY, -1), (XY, 4), (Y, -2)]$. This helps us redefine $\rpoly$ (see \Cref{eq:tilde-Q-bi}) in a way that makes our algorithm more transparent.
Next, we present a sequence of definitions that will be useful for our algorithm and its analysis.
\begin{Definition}[$\abs{\circuit}$]\label{def:positive-circuit}
For any circuit $\circuit$, the corresponding
{\em positive circuit}, denoted $\abs{\circuit}$, is obtained from $\circuit$ as follows. For each leaf node $\ell$ of $\circuit$ where $\ell.\type$ is $\tnum$, update $\ell.\vari{value}$ to $|\ell.\vari{value}|$.
@ -57,8 +57,9 @@ $\degree(\circuit)$ is defined recursively:
\noindent
We use the following notation for integer multiplication complexity:
\begin{Definition}[$\multc{\cdot}{\cdot}$]\footnote{We note that when doing arithmetic operations on the RAM model for input of size $N$, we have that $\multc{O(\log{N})}{O(\log{N})}=O(1)$. More generally we have $\multc{N}{O(\log{N})}=O(N\log{N}\log\log{N})$.}
In a RAM model of word size of $W$-bits, $\multc{M}{W}$ denotes the complexity of multiplying two integers represented with $M$-bits. (We will assume that for input of size $N$, $W=O(\log{N})$.)
\begin{Definition}[$\multc{\cdot}{\cdot}$]\footnote{We note that when doing arithmetic operations on the RAM model for input of size $N$, we have that $\multc{O(\log{N})}{O(\log{N})}=O(1)$.}
%More generally we have $\multc{N}{O(\log{N})}=O(N\log{N}\log\log{N})$.}
In a RAM model of word size of $W$-bits, $\multc{M}{W}$ denotes the complexity of multiplying two integers represented with $M$-bits. (For input of size $N$, we make the standard assumption $W=O(\log{N})$.)
\end{Definition}
Finally, to get linear runtime results, we will need to define another parameter modeling the (weighted) number of monomials in $\expansion{\circuit}$
@ -84,7 +85,7 @@ Given a lineage polynomial $\poly(\vct{X})=\polyf(\circuit)$ for circuit \circui
\label{eq:tilde-Q-bi}
\rpoly\inparen{p_1,\dots,p_\numvar}=\hspace*{-1mm}\sum_{(\monom,\coef)\in \expansion{\circuit}}
\indicator{\isInd{\encMon}
}\cdot \coef\cdot\hspace*{-2mm}\prod_{X_i\in \monom}\hspace*{-2mm} p_i.
}\cdot \coef\cdot\prod_{X_i\in \monom} p_i.
\end{equation}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@ -92,11 +93,11 @@ Given the above, the algorithm is a sampling based algorithm for the above sum:
to $\abs{\coef}$ and compute $\vari{Y}=\indicator{\isInd{\encMon}}
\cdot \prod_{X_i\in \monom} p_i$.
Repeating the sampling an appropriate number of times
and computing the average of $\vari{Y}$ gives us our final estimate. \onepass is used to compute the sampling probabilities needed in \sampmon (details are in \Cref{sec:proofs-approx-alg}).
and computing the average of $\vari{Y}$ gives us our final estimate. \onepass is used to compute the sampling probabilities needed in \sampmon. All algorithm details are in \Cref{sec:proofs-approx-alg}.
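
As a sanity check on the sampling scheme above, the following sketch verifies that the estimator is unbiased up to a known scaling. It assumes that \sampmon draws $(\monom,\coef)$ from $\expansion{\circuit}$ with probability $\abs{\coef}/\abs{\circuit}(1,\ldots,1)$ and that the sign of $\coef$ is reattached to $\vari{Y}$; the sign handling is an assumption made here for illustration, and the precise treatment is in \Cref{sec:proofs-approx-alg}.
\begin{align*}
\expct\pbox{\mathrm{sgn}(\coef)\cdot\vari{Y}}
&= \sum_{(\monom,\coef)\in \expansion{\circuit}} \frac{\abs{\coef}}{\abs{\circuit}(1,\ldots,1)}\cdot \mathrm{sgn}(\coef)\cdot \indicator{\isInd{\encMon}}\cdot\prod_{X_i\in \monom} p_i\\
&= \frac{\rpoly\inparen{p_1,\dots,p_\numvar}}{\abs{\circuit}(1,\ldots,1)}.
\end{align*}
Under these assumptions, multiplying the empirical average by $\abs{\circuit}(1,\ldots,1)$, the value computed by \onepass, recovers $\rpoly\inparen{p_1,\dots,p_\numvar}$ in expectation.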
%%%%%%%%%%%%%%%%%%%%%%%
\mypar{Runtime analysis} We can argue the following runtime for the algorithm outlined above:
\mypar{Runtime analysis} We can argue the following runtime for the algorithm outlined above (which solves \Cref{prob:intro-stmt}):
\begin{Theorem}
\label{cor:approx-algo-const-p}
Let \circuit be an arbitrary \emph{\abbrOneBIDB} circuit, define $\poly(\vct{X})=\polyf(\circuit)$, let $k=\degree(\circuit)$, and let $\gamma=\gamma(\circuit)$. Further let it be the case that $\prob_i\ge \prob_0$ for all $i\in[\numvar]$. Then an estimate $\mathcal{E}$ of $\rpoly(\prob_1,\ldots, \prob_\numvar)$
@ -117,7 +118,24 @@ In particular, if $\prob_0>0$ and $\gamma<1$ are absolute constants then the abo
The restriction on $\gamma$ is satisfied by any
$1$-\abbrTIDB (where $\gamma=0$ in the equivalent $1$-\abbrBIDB of~\Cref{prop:ctidb-reduct})
as well as for all three queries of the PDBench \abbrBIDB benchmark (\Cref{app:subsec:experiment}). Further, we can also argue the following result.%, recalling from~\Cref{sec:intro} for \abbrCTIDB $\pdb = \inparen{\worlds, \bpd}$, where $\tupset$ is the set of possible tuples across all possible worlds of $\pdb$.
as well as for all three queries of the PDBench \abbrBIDB benchmark (\Cref{app:subsec:experiment}).
%We prove \Cref{cor:approx-algo-punchline-ctidb} from \Cref{eq:approx-algo-runtime} via the following sequence of arguments.
Next, by \Cref{prop:circuit-depth} and \Cref{lem:circ-model-runtime}, for any $\raPlus$ query $\query$ there exists a circuit $\circuit^*$ for $\apolyqdt$ such that $\depth(\circuit^*)\le O_{|Q|}(\log{n})$ and $\size(\circuit^*)\le O_k\inparen{\qruntime{\query, \tupset, \bound}}$. Then, we note that \Cref{prop:ctidb-reduct} gives us an equivalent circuit $\circuit$ from $\circuit^*$ that is essentially the same size and has $\gamma(\circuit)\le 1-c^{-\Omega(k)}$ (\Cref{lem:ctidb-gamma}). Finally, we argue (using the fact that $\circuit^*$ has low depth) that $\abs{\circuit^*}(1,\dots,1)\le \size(\circuit^*)^{O_k(1)}$ (\Cref{lem:val-ub}).
%Next, we note that the above result %along with \Cref{lem:ctidb-gamma}
The above sequence of arguments yields the following result (which answers \Cref{prob:big-o-joint-steps} in the affirmative):
\begin{Corollary}
\label{cor:approx-algo-punchline-ctidb}
Let $\query$ be an $\raPlus$ query and $\pdb$ be a \abbrCTIDB with $p_0>0$, where $p_0$ as in \Cref{cor:approx-algo-const-p}, is an absolute constant. Let $\poly(\vct{X})=\apolyqdt$ for any result tuple $\tup$ with $\deg(\poly)=k$. Then one can compute an approximation satisfying \Cref{eq:approx-algo-bound-main} in time $O_{k,|Q|,\error',\conf,\bound}\inparen{\qruntime{\optquery{\query}, \tupset, \bound}}$ (given $\query,\tupset$ and $\prob_{\tup, j}$ for each $\tup\in\tupset,~j\in\pbox{\bound}$ that defines $\bpd$).
\end{Corollary}
If we want to approximate the expected multiplicities of all $Z=O(n^k)$ result tuples $\tup$ simultaneously, we just need to run the above result with $\conf$ replaced by $\frac \conf Z$. Note this increases the runtime by only a logarithmic factor.
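
To see why the overhead is only logarithmic, here is a short worked calculation. It assumes that the runtime depends on the confidence parameter only through a $\log{\frac{1}{\conf}}$ factor, as in the runtime discussed for \Cref{cor:approx-algo-const-p}.
\[
\log{\frac{Z}{\conf}} = \log{\frac{1}{\conf}} + \log{Z} = \log{\frac{1}{\conf}} + O(k\log{n}),
\]
and by a union bound the probability that any one of the $Z$ estimates violates its guarantee is at most $Z\cdot\frac{\conf}{Z}=\conf$.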
%%%% Commenting out the details below-- also copied over into the appendix
\iffalse
Further, we can also argue the following result.%, recalling from~\Cref{sec:intro} for \abbrCTIDB $\pdb = \inparen{\worlds, \bpd}$, where $\tupset$ is the set of possible tuples across all possible worlds of $\pdb$.
\begin{Lemma}
\label{lem:ctidb-gamma}
@ -153,8 +171,9 @@ Let $\query$ be an $\raPlus$ query and $\pdb$ be a \abbrCTIDB with $p_0>0$, wher
By~\Cref{lem:ctidb-gamma} and~\Cref{cor:approx-algo-punchline}, the proof follows.
\end{proof}
\qed
\fi


@ -43,7 +43,7 @@ We slightly abuse notation here, denoting a world vector as $W$ rather than $\wo
Lineage polynomials for arbitrary deterministic $\gentupset'$ are constructed in a manner analogous to $1$-\abbrTIDB\xplural (see \Cref{fig:nxDBSemantics}), differing only in the base case.
In a $1$-\abbrTIDB, each tuple contributes a multiplicity of 0 or 1, and $\polyqdt{\rel}{\gentupset}{\tup} = X_\tup$.
In a $c$-\abbrTIDB, each tuple $\tup\in\tupset'$ contributes its corresponding multiplicity: $\polyqdt{\rel}{\gentupset}{\tup} = c_\tup\cdot X_\tup$. These semantics are fully detailed in \Cref{fig:lin-poly-bidb}.
In a \abbrOneBIDB, each tuple $\tup\in\tupset'$ contributes its corresponding multiplicity: $\polyqdt{\rel}{\gentupset}{\tup} = c_\tup\cdot X_\tup$. These semantics are fully detailed in \Cref{fig:lin-poly-bidb}.
\abbrOneBIDB are powerful enough to encode \abbrCTIDB:
\begin{Proposition}[\abbrCTIDB reduction]\label{prop:ctidb-reduct}


@ -177,7 +177,7 @@ The simple insight to get around this issue to note that the random variables $\
%Since the set of multiplicities for tuple $\tup$ by nature are disjoint we can drop all cross terms and have $\refpoly{1, }^2 = \sum_{j_1, j_2, j_3 \in \pbox{\bound}}j_1^2A^2_{j_1}j_2^2X_{j_2}^2j_3^2B^2_{j_3}$. Since we now have that all $\randWorld_{X_j}\in\inset{0, 1}$, computing expectation yields $\expct\pbox{\refpoly{1, }^2}=\sum_{j_1,j_2,j_3\in\pbox{\bound}}j_1^2j_2^2j_3^2$ \allowbreak $\expct\pbox{\randWorld_{A_{j_1}}}\expct\pbox{\randWorld_{X_{j_2}}}\expct\pbox{\randWorld_{B_{j_3}}}$.
Given that $X$ can only have multiplicity of $1$ or $2$ but not both, we drop the monomials with the term $X_1X_2$ to get
$\refpoly{1, }^{\inparen{ABX}^2}\inparen{A, X_1, X_2 B} = A^2X_1^2B^2+2^2\cdot A^2 X_2^2B^2.$
$\refpoly{1, }^{\inparen{ABX}^2}\inparen{A, X_1, X_2, B} = A^2X_1^2B^2+2^2\cdot A^2 X_2^2B^2.$
Now that all the world vectors $(\randWorld_A,\randWorld_{X_1},\randWorld_{X_2},\randWorld_B)\in\inset{0,1}^4$, we have $\expct\pbox{\refpoly{1, }^2}=\expct\pbox{\randWorld_{A}}\expct\pbox{\randWorld_{X_1}}\expct\pbox{\randWorld_{B}}+$ \\ $4\expct\pbox{\randWorld_{A}}\expct\pbox{\randWorld_{X_2}}\expct\pbox{\randWorld_{B}}\stackrel{\text{def}}{=}\rpoly_1^2\inparen{p_A,\probOf\inparen{X=1},\probOf\inparen{X=2},p_B}$. We only made the argument for a single monomial, but by linearity of expectation the same argument applies to all monomials in $\poly_1^2$. Generalizing this argument to general $\poly$ leads us to consider its following `reduced' version:
\begin{Definition}\label{def:reduced-poly}


@ -33,7 +33,7 @@ Given positive integer $k$ and undirected graph $G=(\vset,\edgeSet)$, $\kmatcht
%We note that the above conjecture is somewhat non-standard. In particular, the best known algorithm to compute $\numocc{G}{\kmatch}$ takes time $\Omega\inparen{|V|^{k/2}}$
%(i.e. if this is the best algorithm then $c_0=\frac 14$)
%~\cite{k-match}.
The above result is saying is that (assuming ETH) one can only hope for a slightly super-polynomial improvement over the state of the art algorithm to compute $\numocc{G}{\kmatch}$.
The above result says that (assuming ETH) one can only hope for a slightly super-polynomial improvement over the trivial algorithm to compute $\numocc{G}{\kmatch}$.
%
Our hardness result in Section~\ref{sec:single-p} is based on the following conjectured hardness result:
@ -86,7 +86,7 @@ $\qruntimenoopt{\qhard^k, \tupset, \bound}$ is $O_k\inparen{\numedge}$.
\subsection{Multiple Distinct $\prob$ Values}
\label{sec:multiple-p}
We are now ready to present our main hardness result.
We are now ready to present one of our main hardness results.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@ -96,12 +96,16 @@ needs time $\bigOmega{\kmatchtime}$, if $\kmatchtime\ge \omega\inparen{\abs{\edg
\end{Theorem}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
Note that the second row of \Cref{tab:lbs} follows from %\Cref{prop:expection-of-polynom},
\Cref{thm:mult-p-hard-result}, \Cref{lem:tdet-om}, and \Cref{thm:k-match-hard} while the third row is proved by %\Cref{prop:expection-of-polynom},
\Cref{thm:mult-p-hard-result}, \Cref{lem:tdet-om}, and \Cref{conj:known-algo-kmatch}.
Note that the second (and third) row(s) of \Cref{tab:lbs} follow from %\Cref{prop:expection-of-polynom},
\Cref{thm:mult-p-hard-result}, \Cref{lem:tdet-om}, and \Cref{thm:k-match-hard} (\Cref{conj:known-algo-kmatch} resp.).
%\textcolor{red}{Need to put in a proof overview here-- Atri}
\Cref{thm:mult-p-hard-result} follows by observing that $\rpoly_G^\kElem(\prob,\dots,\prob)=\prob^{2k}\cdot \numocc{G}{\kmatch} +r(p)$, where $r(p)$ is a polynomial of degree at most $2k-1$ (with coefficients that just depend on $G$). By polynomial interpolation, knowing the values $\rpoly_G^\kElem(\prob_i,\dots,\prob_i)$ (over all $i\in [2k+1]$) allows us to compute all the coefficients, including $\numocc{G}{\kmatch}$.
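
The interpolation step can be made explicit as follows. This is only a sketch: the coefficient names $c_0,\dots,c_{2k}$ are introduced here for illustration, with $c_{2k}=\numocc{G}{\kmatch}$ as in the identity above. Writing $\rpoly_G^\kElem(\prob,\dots,\prob)=\sum_{j=0}^{2k}c_j\prob^j$, the $2k+1$ evaluations give the linear system
\[
\begin{pmatrix}
1 & \prob_1 & \cdots & \prob_1^{2k}\\
\vdots & \vdots & \ddots & \vdots\\
1 & \prob_{2k+1} & \cdots & \prob_{2k+1}^{2k}
\end{pmatrix}
\cdot
\begin{pmatrix}
c_0\\
\vdots\\
c_{2k}
\end{pmatrix}
=
\begin{pmatrix}
\rpoly_G^\kElem(\prob_1,\dots,\prob_1)\\
\vdots\\
\rpoly_G^\kElem(\prob_{2k+1},\dots,\prob_{2k+1})
\end{pmatrix}.
\]
The coefficient matrix is a Vandermonde matrix with determinant $\prod_{1\le i<j\le 2k+1}\inparen{\prob_j-\prob_i}$, which is nonzero exactly when the $\prob_i$ are pairwise distinct, so the system determines every $c_j$ uniquely.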
%while the third row is proved by %\Cref{prop:expection-of-polynom}, \Cref{thm:mult-p-hard-result}, \Cref{lem:tdet-om}, and \Cref{conj:known-algo-kmatch}.
%Since \Cref{conj:known-algo-kmatch} is non-standard, the latter hardness result should be interpreted as follows. Any substantial polynomial improvement for \Cref{prob:bag-pdb-poly-expected} (over the trivial algorithm that converts $\poly$ into SMB and then uses \Cref{cor:expct-sop} for \abbrStepTwo) would lead to an improvement over the state of the art {\em upper} bounds on $\kmatchtime$. Finally,
Note that \Cref{thm:mult-p-hard-result} requires computing the expected multiplicities over $(2k+1)$ distinct values of $p_i$, each of which corresponds to a distinct $\bpd$ (for the same $\tupset$), which explains the `Multiple' entries in the second column of the second and third rows in \Cref{tab:lbs}. Next, we argue how to get rid of this latter requirement.
\textcolor{red}{Need to put in a proof overview here-- Atri}
%%% Local Variables:
%%% mode: latex


@ -17,7 +17,7 @@ Many data models have been proposed for encoding PDBs more compactly than as set
%
Fink et al.~\cite{FH12} study aggregate queries over a probabilistic version of the extension of K-relations for aggregate queries proposed in~\cite{AD11d} (\emph{pvc-tables}) that supports bags, and has runtime complexity linear in the size of the lineage.
However, this lineage is encoded as a tree; the size (and thus the runtime) are still superlinear in $\qruntime{\query, \tupset, \bound}$.
The runtime bound is also limited to a specific class of (hierarchical) queries, suggesting the possibility of a generalization of \cite{DS12}'s dichotomy result to \abbrBPDB\xplural.
The runtime bound is also limited to a specific class of (hierarchical) queries, suggesting the possibility of a generalization of \cite{DS12}'s dichotomy result to \abbrBPDB\xplural for our problem (\cite{https://doi.org/10.48550/arxiv.2201.11524} presents a dichotomy result for a related problem).
Several techniques for approximating tuple probabilities have been proposed in related work~\cite{FH13,heuvel-19-anappdsd,DBLP:conf/icde/OlteanuHK10,DS07}, relying on Monte Carlo sampling, e.g.,~\cite{DS07}, or a branch-and-bound paradigm~\cite{DBLP:conf/icde/OlteanuHK10}.
Our approximation algorithm is also based on sampling.


@ -12,8 +12,46 @@ Fix $\prob\in (0,1)$. Then assuming \Cref{conj:graph}, any algorithm that comput
\end{Theorem}
Note that \Cref{lem:tdet-om} and \Cref{th:single-p-hard} above imply the hardness result in the first row of \Cref{tab:lbs}.
We note that \Cref{thm:k-match-hard} and \Cref{conj:known-algo-kmatch} (and the lower bounds in the second and third rows) need $k$ to be large enough (in particular, we need a family of hard queries). But the above \Cref{th:single-p-hard} (and the lower bound in first row of Table~\ref{tab:lbs}) holds for $k=3$ (and hence for a fixed query). \textcolor{red}{Need to put in a proof overview here-- Atri}
We note that \Cref{thm:k-match-hard} and \Cref{conj:known-algo-kmatch} (and the lower bounds in the second and third rows) need $k$ to be large enough (in particular, we need a family of hard queries). But the above \Cref{th:single-p-hard} (and the lower bound in first row of Table~\ref{tab:lbs}) holds for $k=3$ (and hence for a fixed query).
%\textcolor{red}{Need to put in a proof overview here-- Atri}
Unlike the proof of \Cref{thm:mult-p-hard-result}, in this case we have to pay close attention to all the coefficients of $\rpoly_{G}^3(\prob,\ldots, \prob)$:
\begin{Lemma}\label{lem:qE3-exp}
For any $\prob$, we have:
{\small
\begin{align}
&\rpoly_{G}^3(\prob,\ldots, \prob) = \numocc{G}{\ed}\prob^2 + 6\numocc{G}{\twopath}\prob^3 + 6\numocc{G}{\twodis}\prob^4 + 6\numocc{G}{\tri}\prob^3\nonumber\\
&+ 6\numocc{G}{\oneint}\prob^4 + 6\numocc{G}{\threepath}\prob^4 + 6\numocc{G}{\twopathdis}\prob^5 + 6\numocc{G}{\threedis}\prob^6.\label{claim:four-one}
\end{align}}
\end{Lemma}
%\subsubsection{Proof for \Cref{lem:qE3-exp}}
%Unlike \Cref{thm:mult-p-hard-result}, we do not have access to evaluations of $\rpoly_{G}^3(\prob,\ldots, \prob)$ for multiple values of $p$ and hence,
Since $p$ is fixed, the earlier polynomial interpolation based argument does not work anymore. Next, we use the fact that the algorithm still has to compute $\rpoly_{G}^3(\prob,\ldots, \prob)$ for {\em all} graphs $G$. We focus on the graphs $\graph{\ell}$ obtained from $G$ by replacing each edge by a path of length $\ell$ (\Cref{def:Gk}). We then show
%\begin{Definition}\label{def:Gk}
%For $\ell \geq 1$, let graph $\graph{\ell}$ be a graph generated from an arbitrary graph $G$, by replacing every edge $e$ of $G$ with an $\ell$-path, such that all inner vertices of an $\ell$-path replacement edge have degree $2$.\footnote{Note that $G\equiv \graph{1}$.}
%\end{Definition}
\begin{Lemma}\label{lem:lin-sys}
Fix $\prob\in (0,1)$. Given $\rpoly_{\graph{\ell}}^3(\prob,\dots,\prob)$ for $\ell\in [2]$, we can compute in $O(m)$ time a vector $\vct{b}\in\mathbb{R}^2$ such that
\begin{equation}
\label{eq:lin-eqs-single-p}
\begin{pmatrix}
1 - 3\prob & -(3\prob^2 - \prob^3)\\
10(3\prob^2 - \prob^3) & 10(3\prob^2 - \prob^3)
\end{pmatrix}
\cdot
\begin{pmatrix}
\numocc{G}{\tri}\\
\numocc{G}{\threedis}
\end{pmatrix}
=\vct{b},
\end{equation}
allowing us to compute $\numocc{G}{\tri}$ and $\numocc{G}{\threedis}$ in $O(1)$ time.
\end{Lemma}
Note that \cref{eq:lin-eqs-single-p} only depends on sub-graph counts on $G=\graph{1}$.
%(it is a bit technically tedious to relate the sub-graph counts of $\graph{2}$ to the corresponding ones in $G$).
It can be verified that the coefficient matrix in \cref{eq:lin-eqs-single-p} is full rank for all $p\in (0,1)$. Then by solving the linear equations in \cref{eq:lin-eqs-single-p} we can compute $\numocc{G}{\tri}$, from where \Cref{conj:graph} implies \Cref{th:single-p-hard}.
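
The full-rank claim can be checked directly, taking the matrix exactly as displayed in \cref{eq:lin-eqs-single-p}. The shorthand $q$ and the names $b_1,b_2$ for the entries of $\vct{b}$ are introduced here only for this sketch. With $q=3\prob^2-\prob^3$, the determinant is
\begin{align*}
(1-3\prob)\cdot 10q + q\cdot 10q
&= 10q\inparen{1-3\prob+3\prob^2-\prob^3}\\
&= 10\prob^2(3-\prob)(1-\prob)^3,
\end{align*}
which is nonzero for every $\prob\in(0,1)$. Solving the two equations for the first unknown then gives
\[
\numocc{G}{\tri} = \frac{10b_1+b_2}{10(1-\prob)^3}.
\]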
%%% Local Variables:
%%% mode: latex