
%root:main.tex
%!TEX root=./main.tex
\section{Hardness of exact computation}
\label{sec:hard}

In this section, we prove the hardness results claimed in Table~\ref{tab:lbs} for a specific family of hard instances $(\query,\pdb)$ for \Cref{prob:bag-pdb-poly-expected}, where $\pdb$ is a \abbrTIDB.
% that computing $\expct\pbox{\poly(\vct{W})}$ exactly for a \ti-lineage polynomial  $\poly(\vct{X})$ generated from a project-join query (even an expression tree representation) is \sharpwonehard. 
 Note that this implies hardness for \bis and general \abbrBPDB, answering \Cref{prob:bag-pdb-poly-expected} (and hence the equivalent \Cref{prob:bag-pdb-query-eval}) in the negative. 
%Furthermore, we demonstrate in \Cref{sec:single-p} that the problem remains hard, even if $\probOf[X_i=1] = \prob$ for all $X_i$ and any fixed valued $\prob \in (0, 1)$ as long as certain popular hardness conjectures in fine-grained complexity hold. 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Preliminaries}
Our hardness results are based on (exactly) counting the number of (not necessarily induced) subgraphs in $G$ isomorphic to $H$. Let $\numocc{G}{H}$ denote this quantity.  We can think of $H$ as being of constant size and $G$ as growing.  %In query processing, $H$ can be viewed as the query while $G$ as the database instance.
In particular, we will consider the problems of computing the following counts (given $G$ in its adjacency list representation): $\numocc{G}{\tri}$ (the number of triangles), $\numocc{G}{\threedis}$ (the number of $3$-matchings), and the latter's generalization $\numocc{G}{\kmatch}$ (the number of $k$-matchings).  We use $\kmatchtime$ to denote the optimal runtime of computing $\numocc{G}{\kmatch}$.  Our hardness results in \Cref{sec:multiple-p} are based on the following hardness results/conjectures:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Theorem}[\cite{k-match}]
\label{thm:k-match-hard}
Given a positive integer $k$ and an undirected graph $G=(\vset,\edgeSet)$ with no self-loops or parallel edges, the time $\kmatchtime$ to compute $\numocc{G}{\kmatch}$ exactly is $\littleomega{f(k)\cdot |\edgeSet|^c}$ for any function $f$ and fixed constant $c$ independent of $\numedge$ and $k$ (assuming $\sharpwzero\ne\sharpwone$). %counting the number of $k$-matchings in $G$ is\sharpwonehard (parameterization is in $k$).
\end{Theorem}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%The above result means that we cannot hope to count the number of $k$-matchings in $G=(\vset,\edgeSet)$ in time $f(k)\cdot |\vset|^{c}$ for any function $f$ and constant $c$ independent of $k$. 
\begin{hypo}\label{conj:known-algo-kmatch}
There exists an absolute constant $c_0>0$ such that for every $G=(\vset,\edgeSet)$, we have $\kmatchtime \ge \Omega\inparen{|\edgeSet|^{c_0\cdot k}}$.
\end{hypo}
We note that the above conjecture is somewhat non-standard. In particular, the best known algorithm to compute $\numocc{G}{\kmatch}$ takes time $\Omega\inparen{|\vset|^{k/2}}$~\cite{k-match}; since $|\edgeSet|\le |\vset|^2$, if this is the best possible algorithm then $c_0=\frac 14$. In other words, the above conjecture says that one can only hope for a polynomial improvement over the state-of-the-art algorithm to compute $\numocc{G}{\kmatch}$.
%
Our hardness result in Section~\ref{sec:single-p} is based on the following conjectured hardness result:
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{hypo}
\label{conj:graph}
There exists a constant $\eps_0>0$ such that given an undirected graph $G=(\vset,\edgeSet)$, computing $\numocc{G}{\tri}$ exactly cannot be done in time $o\inparen{|\edgeSet|^{1+\eps_0}}$.
\end{hypo}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
The so-called {\em Triangle detection hypothesis} (cf.~\cite{triang-hard}), which states that detecting whether $G$ has a triangle takes time $\Omega\inparen{|\edgeSet|^{4/3}}$, implies that in Conjecture~\ref{conj:graph} we can take $\eps_0\ge \frac 13$.
%The current best known algorithm to count the number of $3$-matchings, to
%\AR{Need to add something about 3-paths and 3-matchings as well.}

All of our hardness results rely on a simple lineage polynomial encoding of the edges of a graph.
To prove our hardness result, consider a graph $G=(\vset, \edgeSet)$, where $|\edgeSet| = m$, $\vset = [\numvar]$. Our lineage polynomial has a variable $X_i$ for every $i$ in $[\numvar]$.
Consider the polynomial
$\poly_{G}(\vct{X}) = \sum\limits_{(i, j) \in \edgeSet} X_i \cdot X_j.$
The hard polynomial for our problem will be a suitable power $k\ge 3$ of the polynomial above:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}\label{def:qk}
For any graph $G=(\vset,\edgeSet)$ and $\kElem\ge 1$, define
\[\poly_{G}^\kElem(X_1,\dots,X_n) = \left(\sum\limits_{(i, j) \in \edgeSet} X_i \cdot X_j\right)^\kElem.\]
\end{Definition}
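
For intuition, consider the following small example, which is our own illustration and not part of the reduction. Let $G$ consist of the two vertex-disjoint edges $(1,2)$ and $(3,4)$, and take $\kElem=2$. Expanding \Cref{def:qk} gives
\[\poly_{G}^2(X_1,\dots,X_4) = \left(X_1X_2 + X_3X_4\right)^2 = X_1^2X_2^2 + 2X_1X_2X_3X_4 + X_3^2X_4^2.\]
The only monomial with $2\kElem=4$ distinct variables is $X_1X_2X_3X_4$; it arises from the single $2$-matching of $G$, and its coefficient $2=2!$ counts the two orders in which the edges of this matching can be picked from the two factors.
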
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Our hardness results only need a \ti instance; We also consider the special case when all the tuple probabilities (probabilities assigned to $X_i$ by $\probAllTup$) are the same value. Note that our hardness results % do not require the general circuit representation and
%even hold for the expression trees. %this polynomial can be encoded in an expression tree of size $\Theta(km)$.

\noindent Returning to \Cref{fig:two-step}, it is easy to see that $\poly_{G}^\kElem(\vct{X})$ is the lineage polynomial corresponding to the query that generalizes our example query from \Cref{sec:intro}. Let us alias 
\begin{lstlisting}
SELECT 1 FROM OnTime a, Route r, OnTime b
WHERE a.city = r.city1 AND b.city = r.city2
\end{lstlisting}
as $R_i$ for each $i \in [k]$.  The query $\query^k$ then becomes
\begin{lstlisting}
SELECT 1 FROM $R_1$ JOIN $R_2$ JOIN$\cdots$JOIN $R_k$
\end{lstlisting}          
%RA format for the same query
%\begin{align*}
%\query^k_G \coloneqq &\inparen{\project_\emptyset\inparen{OnTime \join_{City = City_1} Route \join_{{City}_2 = City'}\rename_{City' \leftarrow City}(OnTime)}}\times_2\cdots\\
%&\cdots \times_k \inparen{\project_\emptyset\inparen{OnTime \join_{City = City_1} Route \join_{{City}_2 = City'}\rename_{City' \leftarrow City}(OnTime)}}
%\end{align*}

%\resizebox{1\linewidth}{!}{
%\begin{minipage}{1.05\linewidth}
%\[\poly^k_G\dlImp OnTime(C_1),Route(C_1, C_1'),OnTime(C_1'),\dots,OnTime(C_\kElem),Route(C_\kElem,C_\kElem'),OnTime(C_\kElem')\]
%\end{minipage}
%}
\noindent Further, the PDB instance generalizes the one in \Cref{fig:two-step} as follows. Relation $OnTime$ has $n$ tuples, one for each vertex $i$ in $[n]$, each with probability $\prob_i$, and $Route$ has one tuple for each edge in $\edgeSet$ (each with probability $1$).\footnote{Technically, $\poly_{G}^\kElem(\vct{X})$ should have variables corresponding to tuples in $Route$ as well, but since they are always present with probability $1$, we drop those. Our argument also works when all the tuples in $Route$ are present with probability $\prob$, but to simplify notation we assign probability $1$ to edges.}
In other words, for this instance $\dbbase$ contains the set of $n$ unary tuples in $OnTime$ (which corresponds to $\vset$) and $m$ binary tuples in $Route$ (which corresponds to $\edgeSet$).
Note that this implies that $\poly_{G}^\kElem$ 
%our hard lineage polynomial can be represented as an expression tree produced by a  project-join query with same probability value for each input tuple $\prob_i$, and hence
 is indeed a lineage polynomial for a \abbrTIDB \abbrPDB.

Next, we note that the runtime for \abbrStepOne with $\query^k$ and $\dbbase$ as defined above is $O(\kElem\numedge)$ (i.e., \abbrStepOne is `easy' for this query):
\begin{Lemma}\label{lem:tdet-om}
Let $\query^k$ and $\dbbase$ be as defined above. Then
% of \Cref{def:qk}, the runtime 
$\qruntime{\query^k, \dbbase}$ is $O(\kElem\numedge)$.
\end{Lemma}

%\begin{Corollary}\label{cor:at-least-kmatch}
%\end{Corollary}
%\begin{proof}[Proof of \Cref{cor:at-least-kmatch}
%\end{proof}
%
%\begin{Corollary}\label{cor:best-curr-algo}
%\end{Corollary}
%\begin{proof}[Proof of \Cref{cor:best-curr-algo}
%\end{proof}

\subsection{Multiple Distinct $\prob$ Values}
\label{sec:multiple-p}
%Unless otherwise noted, all proofs for this section are in \Cref{app:single-mult-p}.
We are now ready to present our main hardness result.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Theorem}\label{thm:mult-p-hard-result}
Let $\prob_0,\ldots,\prob_{2k}$ be $2k + 1$ distinct values in $(0, 1]$.  Then computing $\rpoly_G^\kElem(\prob_i,\dots,\prob_i)$ (over all $i\in [2k+1]$) for arbitrary $G=(\vset,\edgeSet)$
%and any $(2k+1)$ distinct values $\prob_i$ ($0\le i \le 2k$)
needs time $\bigOmega{\kmatchtime}$, assuming $\kmatchtime\ge \omega\inparen{\abs{\edgeSet}}$.
\end{Theorem}
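
The following sketch gives some intuition for why $2\kElem+1$ values appear in \Cref{thm:mult-p-hard-result}; it is not a substitute for the proof. It assumes that $\rpoly_G^\kElem$ denotes $\poly_G^\kElem$ with every exponent $e>1$ replaced by $1$, and that each undirected edge of $G$ contributes exactly one term to the sum in \Cref{def:qk}. Under these assumptions, setting every variable to the same value $\prob$ yields a univariate polynomial of degree at most $2\kElem$,
\[\rpoly_{G}^\kElem(\prob,\dots,\prob) = \sum_{j=0}^{2\kElem} c_j\cdot \prob^j, \qquad c_{2\kElem} = \kElem!\cdot\numocc{G}{\kmatch},\]
where $c_j$ counts the ordered $\kElem$-tuples of edges that together touch exactly $j$ distinct vertices. A tuple touches $2\kElem$ vertices exactly when its edges are pairwise disjoint, and every $\kElem$-matching accounts for $\kElem!$ such tuples, which gives the value of $c_{2\kElem}$. Evaluations at $2\kElem+1$ distinct points determine all of $c_0,\dots,c_{2\kElem}$ (the corresponding Vandermonde system is invertible), and hence $\numocc{G}{\kmatch}$.
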
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
Note that the second row of \Cref{tab:lbs} follows from \Cref{prop:expection-of-polynom}, \Cref{thm:mult-p-hard-result}, \Cref{lem:tdet-om}, and \Cref{thm:k-match-hard}, while the third row follows from \Cref{prop:expection-of-polynom}, \Cref{thm:mult-p-hard-result}, \Cref{lem:tdet-om}, and \Cref{conj:known-algo-kmatch}. Since \Cref{conj:known-algo-kmatch} is non-standard, the latter hardness result should be interpreted as follows: any substantial polynomial improvement for \Cref{prob:bag-pdb-poly-expected} (over the trivial algorithm that converts $\poly$ into SMB and then runs the obvious algorithm for \abbrStepTwo) would lead to an improvement over the state-of-the-art {\em upper} bounds on  $\kmatchtime$. Finally, note that \Cref{thm:mult-p-hard-result} requires computing the expected multiplicities for $(2k+1)$ distinct values of $p_i$, each of which corresponds to a distinct $\pd$ (for the same $\dbbase$); this explains the `Multiple' entry in the second column of the second and third rows of \Cref{tab:lbs}. Next, we argue how to get rid of this latter requirement.
%%%%%%%%%%%%%%%%%%%%%%%%%%%
%NEEDS to be moved to appendix
%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\noindent The following lemma reduces the problem of counting $\kElem$-matchings in a graph to our problem (and proves \Cref{thm:mult-p-hard-result}):
%\begin{Lemma}\label{lem:qEk-multi-p}
%Let $\prob_0,\ldots, \prob_{2\kElem}$ be distinct values in $(0, 1]$.  Then given the values $\rpoly_{G}^\kElem(\prob_i,\ldots, \prob_i)$ for $0\leq i\leq 2\kElem$, the number of $\kElem$-matchings in $G$ can be computed in $\bigO{\kElem^3}$ time.
%\end{Lemma}
%%%%%%%%%%%%%%%%%%%%%%%%%%%
%END move to appendix
%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End: