Update on Overleaf.

master
Atri Rudra 2022-06-10 00:27:28 +00:00 committed by node
parent ce39974c49
commit 8271735896
8 changed files with 48 additions and 42 deletions

View File

@ -3,7 +3,9 @@
We study the problem of computing, both exactly and approximately, a query result tuple's expected multiplicity for probabilistic databases under bag semantics (where each tuple is associated with a multiplicity).
Specifically, we are interested in the fine-grained complexity of this problem for \abbrCTIDB\xplural, i.e., probabilistic databases where tuples are independent probabilistic events and the multiplicity of each tuple is bounded by a constant $\bound$.
% We consider bag-\abbrTIDB\xplural where we have a bound $\bound$ on the maximum multiplicity of each tuple and tuples are independent probabilistic events (we refer to such databases as \abbrCTIDB\xplural).
Unfortunately, our results imply that computing expected multiplicities for \abbrCTIDB\xplural based on the output of deterministic query evaluation algorithms introduces super-linear overhead (under certain complexity hardness conjectures).
Unfortunately, our results imply that computing expected multiplicities for \abbrCTIDB\xplural
%based on the output of deterministic query evaluation algorithms
introduces super-linear overhead over the corresponding deterministic query evaluation algorithms (under certain complexity hardness conjectures).
% We are specifically interested in the fine-grained complexity of computing expected multiplicities and how it compares to the complexity of deterministic query evaluation algorithms --- if these complexities are comparable, it opens the door to practical deployment of probabilistic databases.
% Unfortunately, our results imply that computing expected multiplicities for \abbrCTIDB\xplural based on the results produced by such query evaluation algorithms introduces super-linear overhead (under parameterized complexity hardness assumptions/conjectures).
Next, we develop a sampling algorithm that computes a $(1 \pm \epsilon)$-approximation of the expected multiplicity of an output tuple in time linear in the runtime of the corresponding deterministic query, for any positive relational algebra ($\raPlus$) query over \abbrCTIDB\xplural and for a non-trivial subclass of block-independent databases. % (\abbrBIDB\xplural).

View File

@ -15,7 +15,7 @@ Proofs and pseudocode for all formal statements and algorithms
\subsection{Preliminaries and some more notation}
For notational convenience, in this section we will assume that \dbbaseName $\tupset'=[n]$.
We now introduce definitions and notation related to circuits and polynomials that we will need to state our upper bound results. First we introduce the expansion $\expansion{\circuit}$ of circuit $\circuit$ which
We now introduce definitions related to circuits and polynomials that we will need to state our upper bound results. First, we introduce the expansion $\expansion{\circuit}$ of circuit $\circuit$ which
is used in our auxiliary algorithm \sampmon for sampling monomials when computing the approximation.
\begin{Definition}[$\expansion{\circuit}$]\label{def:expand-circuit}
@ -72,7 +72,7 @@ Given \abbrOneBIDB circuit $\circuit$, let
{\abs{\circuit}(1,\ldots, 1)}.\]
\end{Definition}
\subsection{Our main result}\label{sec:algo:sub:main-result}
We solve~\Cref{prob:intro-stmt} for any fixed $\epsilon > 0$ in what follows.
In what follows, we solve~\Cref{prob:intro-stmt} for any fixed $\epsilon > 0$.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\mypar{Algorithm Idea}
@ -99,16 +99,16 @@ and computing the average of $\vari{Y}$ gives us our final estimate.
We illustrate \sampmon (\Cref{alg:sample}), using the circuit $\circuit$ in \Cref{fig:circuit}. As a pre-processing step, \onepass (\Cref{alg:one-pass-iter}) recursively (for each sub-circuit) computes $\abs{\circuit}\inparen{1,\ldots, 1}$ in the obvious manner.
%for any subcircuit whose sink is gate \circuit.
The \textcolor{gray}{gray} values in~\Cref{fig:circuit} represent the value $\abs{\circuit'}\inparen{1,\ldots, 1}$ for each sub-circuit $\circuit'$ rooted at the corresponding node. E.g. in the bottom right $\circmult$ (\textcolor{blue}{blue}) gate, the value is $1\times 1=1$ (where the left child has $Y=1$ and the right child has $\abs{-1}=1$).
The \textcolor{gray}{gray} values in~\Cref{fig:circuit} represent the value $\abs{\circuit'}\inparen{1,\ldots, 1}$ for each sub-circuit $\circuit'$ rooted at the corresponding node. E.g. in the bottom right $\circmult$ (\textcolor{blue}{blue}) gate, the value is $1\times 1=1$ (where the left child has $Y\gets 1$ and the right child has $\abs{-1}=1$).
%The probability of sampling the left child of the red gate is then the computed sum of its left child divided by the red gate's computed sum, $\frac{1}{3}$.% = \abs{\circuit_\linput}\inparen{1,\ldots, 1} + \abs{\circuit_\rinput}\inparen{1,\ldots, 1}$, the sum of its children's values.
% visits each gate \circuit exactly once, computing $\abs{\circuit}\inparen{1,\ldots, 1}$. %for each gate \circuit.
%If we consider the leftmost source gates, \onepass computes $\abs{\circuit}\inparen{1,\ldots,1} = 1$ for $\circuit.\val = X$ and $\abs{\circuit}\inparen{1,\ldots, 1} = 2$ for $\circuit.\val = 2$. For the leftmost $\circmult$ gate, \onepass computes $\abs{\circuit}\inparen{1,\ldots, 1} = 2\circmult 1$, i.e. $\abs{\circuit_\linput}\inparen{1,\ldots, 1} \times \abs{\circuit_\rinput}\inparen{1,\ldots, 1}$ for children $\circuit_\linput$ and $\circuit_\rinput$. A level higher, the leftmost $\circplus$ gate recursively adds the values of its two children deriving $\abs{\circuit}\inparen{1,\ldots, 1} = 2 \circplus 1$, while using the expression $\frac{\abs{\circuit_i}\inparen{1,\ldots, 1}}{\abs{\circuit}\inparen{1,\ldots, 1}}$ for $i\in\inset{\linput, \rinput}$ to simultaneously compute the weights $\frac{1}{3}$ and $\frac{2}{3}$ for its children. The final sum value is then computed in similar fashion. % yielding $\abs{\circuit}\inparen{1,\ldots, 1} = 3 \circmult 3 = 9$.
%Given the computed values of \onepass, \sampmon picks a sampling path by traversing both children of a $\circmult$ gate and randomly choosing a child from a $\circplus$ gate according to the weight $\frac{\abs{\circuit_i}\inparen{1,\ldots, 1}}{\abs{\circuit}\inparen{1,\ldots, 1}}$ for $i\in\inset{\linput, \rinput}$.
%then uses the weights provided by \onepass to randomly select a monomial from $\expansion{\circuit}$.
We now consider a partial run of \sampmon that samples $\inparen{XY, -1}$ in~\Cref{fig:circuit}. It recursively traverses \emph{both} children of the sink $\circmult$ gate. For the \textcolor{red}{red} and \textcolor{green}{green} children, which are both $\circplus$ gates, we randomly choose one of their children. Specifically \sampmon then randomly picks the right child (\textcolor{blue}{blue} $\circmult$ gate that represents $-Y$ and is computed by recursing on both children of the (\textcolor{blue}{blue} $\circmult$ gate) with probability of $\frac{1}{3}$ (where the numerator and denominator are the values computed by \onepass for the \textcolor{blue}{blue} $\circmult$ and \textcolor{green}{green} $\circplus$ gates respectively). Similarly at the left \textcolor{red}{red} $\circplus$ gate we sample the left child (representing $X$ and is computed by recursing to the leaf node $X$) with probability $\frac{1}{3}$. Note that the probability for choosing $\inparen{XY, -1}$ overall is $\frac{1}{3}\cdot \frac{1}{3}=\frac{1}{9}$, which is indeed the ratio of the coefficient of $\inparen{XY, -1}$ to the sum of all coefficients in $\abs{\circuit}$, as needed. %For the recursive call on the red gate, $\inparen{X, 1}$ is returned, while the purple gate recursively visits both children of the sampled $\circmult$ gate, returning $\inparen{Y, 1}$. %Multiplying $-1 \circmult XY$, concludes the random sampling of monomial $-XY$. Suppose \sampmon also randomly samples $X$ and $-Y$ from $\rpoly$ in a call to \approxq. To estimate $\rpoly\inparen{\vct{\prob}}$, \approxq computes $\prob_X - \prob_X\prob_Y - \prob_Y$ and scales the accumulation accordingly.
We now consider a partial run of \sampmon that samples $\inparen{XY, -1}$ in~\Cref{fig:circuit}. It recursively traverses \emph{both} children of the sink $\circmult$ gate. For the \textcolor{red}{red} and \textcolor{green}{green} children, which are both $\circplus$ gates, we randomly choose one of their children. Specifically, \sampmon picks the right child of the \textcolor{green}{green} $\circplus$ gate (i.e., the \textcolor{blue}{blue} $\circmult$ gate, which represents $-Y$ and is handled by recursing on both of its children) with probability $\frac{1}{3}$ (where the numerator and denominator are the values computed by \onepass for the \textcolor{blue}{blue} $\circmult$ and \textcolor{green}{green} $\circplus$ gates, respectively). Similarly, at the left (\textcolor{red}{red}) $\circplus$ gate we sample its left child (which represents $X$ and is handled by recursing to the leaf node $X$) with probability $\frac{1}{3}$. Note that the overall probability of choosing $\inparen{XY, -1}$ is $\frac{1}{3}\cdot \frac{1}{3}=\frac{1}{9}$, which is indeed the ratio of the coefficient of $\inparen{XY, -1}$ to the sum of all coefficients in $\abs{\circuit}$, as needed. %For the recursive call on the red gate, $\inparen{X, 1}$ is returned, while the purple gate recursively visits both children of the sampled $\circmult$ gate, returning $\inparen{Y, 1}$. %Multiplying $-1 \circmult XY$, concludes the random sampling of monomial $-XY$. Suppose \sampmon also randomly samples $X$ and $-Y$ from $\rpoly$ in a call to \approxq. To estimate $\rpoly\inparen{\vct{\prob}}$, \approxq computes $\prob_X - \prob_X\prob_Y - \prob_Y$ and scales the accumulation accordingly.
%such that a source gate \circuit has $\abs{\circuit}\inparen{1,\ldots, 1} = \circuit.\val$ when \circuit.\type $=$ \num and $\abs{\circuit}\inparen{1,\ldots,1} = 1$ otherwise. For every gate \circuit, \onepass computes $\abs{\circuit}\inparen{1,\ldots, 1}$ as seen in the lighter font of~\Cref{fig:circuit}. \onepass further weights each child $\circuit_i$ for $i\in\inset{\linput, \rinput}$, by the expression $\frac{\abs{\circuit_i}\inparen{1,\ldots, 1}}{\abs{\circuit}\inparen{1,\ldots, 1}}$. These weight are the basis for the sampling performed by \sampmon.
All algorithm details, including those for \approxq (\Cref{alg:mon-sam}) are in \Cref{sec:proofs-approx-alg}.
All algorithm details, including those for \approxq (\Cref{alg:mon-sam}), are in \Cref{sec:proofs-approx-alg}.
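For intuition only, the following is a minimal, self-contained Python sketch of the two ideas above: a bottom-up pass computing $\abs{\circuit}\inparen{1,\ldots, 1}$ and a top-down pass that samples a monomial-sign pair with probability proportional to the absolute value of its coefficient. It is \emph{not} the pseudocode of \onepass/\sampmon, and the toy circuit it uses is our own (it is not the circuit of \Cref{fig:circuit}).
\begin{verbatim}
import random

class Gate:
    def __init__(self, typ, children=(), val=None):
        self.typ, self.children, self.val = typ, list(children), val
        self.partial = None  # |C|(1,...,1) of the sub-circuit rooted here

def one_pass(g):
    # Evaluate the circuit with every variable set to 1 and every constant
    # replaced by its absolute value.
    if g.typ == 'var':
        g.partial = 1
    elif g.typ == 'num':
        g.partial = abs(g.val)
    else:
        vals = [one_pass(c) for c in g.children]
        g.partial = sum(vals) if g.typ == '+' else vals[0] * vals[1]
    return g.partial

def sample_monomial(g):
    # Return (variables, sign) of one monomial of the expansion E(C),
    # drawn with probability |coefficient| / |C|(1,...,1).
    if g.typ == 'var':
        return {g.val}, 1
    if g.typ == 'num':
        return set(), 1 if g.val >= 0 else -1
    if g.typ == '+':  # choose one child, weighted by its |.|(1,...,1) value
        lc, rc = g.children
        child = lc if random.uniform(0, g.partial) <= lc.partial else rc
        return sample_monomial(child)
    lv, ls = sample_monomial(g.children[0])  # '*': recurse on BOTH children
    rv, rs = sample_monomial(g.children[1])
    return lv | rv, ls * rs

# toy circuit (X + 2Y) * (2X + (-1)Y); |C|(1,...,1) = 3 * 3 = 9
X, Y = Gate('var', val='X'), Gate('var', val='Y')
lsum = Gate('+', [X, Gate('*', [Gate('num', val=2), Y])])
rsum = Gate('+', [Gate('*', [Gate('num', val=2), X]),
                  Gate('*', [Gate('num', val=-1), Y])])
root = Gate('*', [lsum, rsum])
print(one_pass(root))         # 9
print(sample_monomial(root))  # e.g. ({'X', 'Y'}, -1), with probability 1/9
\end{verbatim}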
%%%%%%%%%%%%%%%%%%%%%%%

View File

@ -26,7 +26,7 @@ We call a polynomial $\poly\inparen{\vct{X}}$ a \emph{\abbrCTIDB-lineage polynom
\subsection{\abbrOneBIDB}\label{subsec:one-bidb}
\label{subsec:tidbs-and-bidbs}
\noindent A block independent database \abbrBIDB $\pdb'$ models a set of worlds each of which consists of a subset of the \dbbaseName $\tupset'$, where $\tupset'$ is partitioned into $\numblock$ blocks $\block_i$ and the events $\tup\in\block_i$ and $\tup\in\block_j$ are independent for $i\ne j$. $\pdb'$ further constrains that all $\tup\in\block_i$ for all $i\in\pbox{\numblock}$ of $\tupset'$ be disjoint events.
\noindent A block independent database \abbrBIDB $\pdb'$ models a set of worlds, each of which consists of a subset of the \dbbaseName $\tupset'$, where $\tupset'$ is partitioned into $\numblock$ blocks $\block_1,\ldots,\block_\numblock$ and the random variables $\worldvec_\tup$ and $\worldvec_{\tup'}$ are independent whenever $\tup\in\block_i$ and $\tup'\in\block_j$ with $i\ne j$. $\pdb'$ further constrains that, for each $i\in\pbox{\numblock}$, the events $\inset{\worldvec_\tup \neq 0}$ for $\tup\in\block_i$ are pairwise disjoint, i.e., at most one tuple of each block is present in any possible world.
%We refer to any monomial that includes $X_\tup X_{\tup'}$ for $\tup\neq\tup'\in\block_i$ as a \emph{cancellation}.
We define next a specific construction of \abbrBIDB that is useful for our work.
@ -44,11 +44,12 @@ Define a \emph{\abbrOneBIDB} to be the pair $\pdb' = \inparen{\bigtimes_{\tup\in
Lineage polynomials for arbitrary \dbbaseName $\gentupset'$ are constructed in a manner analogous to $1$-\abbrTIDB\xplural (see \Cref{fig:nxDBSemantics}), differing only in the base case.
In a $1$-\abbrTIDB, each tuple contributes a multiplicity of 0 or 1, and $\polyqdt{\rel}{\gentupset}{\tup} = X_\tup$. %\textcolor{red}{CHANGE}
In a \abbrOneBIDB, each tuple $\tup\in\tupset'$ contributes its corresponding multiplicity: %\textcolor{red}{CHANGE}
$\polyqdt{\rel}{\gentupset}{\tup} = c_\tup\cdot X_\tup$. See \Cref{fig:lin-poly-bidb} for details.
$\polyqdt{\rel}{\gentupset}{\tup} = c_\tup\cdot X_\tup$. See \Cref{fig:lin-poly-bidb} for full details.
\abbrOneBIDB are powerful enough to encode \abbrCTIDB:
\begin{Proposition}[\abbrCTIDB reduction]\label{prop:ctidb-reduct}
Given \abbrCTIDB $\pdb =$\newline $\inparen{\worlds, \bpd}$, let $\pdb' = \inparen{\onebidbworlds{\tupset'}, \bpd'}$ be the \emph{\abbrOneBIDB} obtained in the following manner: for each $\tup\in\tupset$, create block $\block_\tup = \inset{\intuple{\tup, j}_{j\in\pbox{\bound}}}$ of disjoint tuples, for all $j\in\pbox{\bound}$ where $\bound_{\intuple{\tup, j}} = j$ for each $\intuple{\tup, j}$ in $\tupset'$.
Given \abbrCTIDB $\pdb =$\newline $\inparen{\worlds, \bpd}$, let $\pdb' = \inparen{\onebidbworlds{\tupset'}, \bpd'}$ be the \emph{\abbrOneBIDB} obtained in the following manner: for each $\tup\in\tupset$, create block $\block_\tup = \inset{\intuple{\tup, j}_{j\in\pbox{\bound}}}$ of disjoint tuples, %for all $j\in\pbox{\bound}$
where $\bound_{\intuple{\tup, j}} = j$ for each $\intuple{\tup, j}$ in $\tupset'$.
The probability distribution $\bpd'$ is characterized by the vector $\vct{p} = \inparen{\inparen{\prob_{\tup, j}}_{\tup\in\tupset, j\in\pbox{\bound}}}$.
Then, $\pdb$ and $\pdb'$ are equivalent.
\end{Proposition}
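For intuition (a single-tuple toy illustration of the reduction, not the paper's running example): for $\bound = 2$ and a tuple $\tup\in\tupset$ with $\probOf\pbox{\worldvec_\tup = j} = \prob_{\tup, j}$, the reduction creates the block $\block_\tup = \inset{\intuple{\tup, 1}, \intuple{\tup, 2}}$ with multiplicities $\bound_{\intuple{\tup,1}} = 1$ and $\bound_{\intuple{\tup,2}} = 2$ and probabilities $\prob_{\tup,1}$ and $\prob_{\tup,2}$; the world of $\pdb$ in which $\tup$ has multiplicity $2$ corresponds to the world of $\pdb'$ in which exactly $\intuple{\tup,2}$ (and no other member of $\block_\tup$) is present.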

View File

@ -64,12 +64,15 @@ For these algorithms, $\jointime{R_1, \ldots, R_n}$ is linear in the {\em AGM bo
%}\\
\noindent
Under this model, an $\raPlus$ query $\query$ evaluated over database $\gentupset$ has runtime $O\inparen{\qruntimenoopt{Q,\gentupset, \bound}}$.
Under this model, an $\raPlus$ query $\query$ evaluated over any deterministic database
%\dbbaseName
that maps each tuple in $\gentupset$ to a multiplicity in $[0,\bound]$ %database $\gentupset$
has runtime $O\inparen{\qruntimenoopt{Q,\gentupset, \bound}}$.
We assume that full table scans are used for every base relation access. We can model index scans by treating an index scan query $\sigma_\theta(R)$ as a base relation.
%Observe that
% () .\footnote{This claim can be verified by e.g. simply looking at the {\em Generic-Join} algorithm in~\cite{skew} and {\em factorize} algorithm in~\cite{factorized-db}.} It can be verified that the above cost model on the corresponding $\raPlus$ join queries correctly captures the runtime of current best known .
\Cref{lem:circ-model-runtime} and \Cref{lem:tlc-is-the-same-as-det} show that for any $\raPlus$ query $\query$ and $\tupset$, there exists a circuit $\circuit^*$ such that $\timeOf{\abbrStepOne}(Q,\tupset,\circuit^*)$ and $|\circuit^*|$ are both $O(\qruntimenoopt{\optquery{\query}, \tupset,\bound})$, as we assumed when moving from \Cref{prob:big-o-joint-steps} to \Cref{prob:intro-stmt}. Lastly, we can handle FAQs/AJAR queries and factorized databases by allowing for optimization. %, i.e. $\qruntimenoopt{\optquery{\query}, \gentupset, \bound}$.
\Cref{lem:circ-model-runtime} and \Cref{lem:tlc-is-the-same-as-det} show that for any $\raPlus$ query $\query$ and \dbbaseName $\tupset$, there exists a circuit $\circuit^*$ such that $\timeOf{\abbrStepOne}(Q,\tupset,\circuit^*)$ and $|\circuit^*|$ are both $O(\qruntimenoopt{\optquery{\query}, \tupset,\bound})$, as we assumed when moving from \Cref{prob:big-o-joint-steps} to \Cref{prob:intro-stmt}. Lastly, we can handle FAQs/AJAR queries and factorized databases by allowing for optimization. %, i.e. $\qruntimenoopt{\optquery{\query}, \gentupset, \bound}$.
%
%We now make a simple observation on the above cost model:
%\begin{proposition}

View File

@ -9,14 +9,14 @@ Any such world can be encoded as a vector (of length $\numvar=\abs{\tupset}$) fr
A given world $\worldvec \in\worlds$ can be interpreted as follows: for each $\tup \in \tupset$, $\worldvec_{\tup}$ is the multiplicity of $\tup$ in $\worldvec$.
We note that encoding a possible world as a vector, while non-standard, is equivalent to encoding it as a bag of tuples (\Cref{prop:expection-of-polynom}).
%in \Cref{subsec:expectation-of-polynom-proof}).
Given that tuple multiplicities are independent events, the probability distribution $\bpd$ can be expressed compactly by assigning each tuple a (disjoint) probability distribution over $[0,\bound]$. Let $\prob_{\tup,j}$ denote the probability that tuple $\tup$ is assigned multiplicity $j$. The probability of a world $\worldvec$ is then $\prod_{\tup \in \tupset} \prob_{\tup,j(t)}$ for $j(t) = \worldvec_{\tup}$.
Given that tuple multiplicities are independent events, the probability distribution $\bpd$ can be expressed compactly by assigning each tuple a probability distribution over $[0,\bound]$. Let $\prob_{\tup,j}$ denote the probability that tuple $\tup$ is assigned multiplicity $j$. The probability of a world $\worldvec$ is then $\prod_{\tup \in \tupset} \prob_{\tup,j(t)}$ for $j(t) = \worldvec_{\tup}$.
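For instance (a two-tuple toy instance): if $\tupset = \inset{\tup_1, \tup_2}$ and $\bound = 2$, then the world $\worldvec$ with $\worldvec_{\tup_1} = 1$ and $\worldvec_{\tup_2} = 2$ has probability $\prob_{\tup_1, 1}\cdot\prob_{\tup_2, 2}$.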
%
% Allowing for $\leq \bound$ multiplicities across all tuples gives rise to having $\leq \inparen{\bound+1}^\numvar$ possible worlds instead of the usual $2^\numvar$ possible worlds of a $1$-\abbrTIDB, which (assuming set query semantics), is the same as the traditional set \abbrTIDB.
% In this work, since we are generally considering bag query input, we will only be considering bag query semantics.
In this work, we consider queries with bag semantics over such bag probabilistic databases.
In this work, we consider \emph{queries with bag semantics} over such bag probabilistic databases.
We denote by $\query\inparen{\worldvec}\inparen{\tup}$ the multiplicity of a result tuple $\tup$ in query $\query$ over possible world $\worldvec\in\worlds$.
%
We can formally state our problem of computing the expected multiplicity: % of a result tuple as:
We now formally state our problem of computing the expected multiplicity: % of a result tuple as:
\begin{Problem}\label{prob:expect-mult}
Given \abbrCTIDB $\pdb = \inparen{\worlds, \bpd}$, $\raPlus$ query\footnote{
@ -75,12 +75,12 @@ computing the probability of an output tuple's multiplicity being bounded by giv
% Our work in contrast assumes a finite bound on the multiplicities where we simply list the finitely many probability values (and hence do not need consider a more succinct representation). Further, our work primarily looks into the fine-grained analysis of computing the expected multiplicity of an output tuple.
\mypar{Our Setup} In contrast to~\cite{https://doi.org/10.48550/arxiv.2201.11524}, we consider \abbrCTIDB\xplural, i.e., the multiplicity of input tuples is bounded by a constant $\bound$.
For this setting, % (\abbrCTIDB\xplural, i.e., the multiplicity of input tuples is bound by a constant $\bound$), however,
there exists a trivial \ptime algorithm for computing the expectation of a result tuple's multiplicity~(\Cref{prob:expect-mult}) for any $\raPlus$ query due to linearity of expectation (see~\Cref{sec:intro-poly-equiv}).
Since~\Cref{prob:expect-mult} is in \ptime, the %interesting
Then, % (\abbrCTIDB\xplural, i.e., the multiplicity of input tuples is bound by a constant $\bound$), however,
there exists a trivial \ptime algorithm for computing the expectation of a result tuple's multiplicity~(\Cref{prob:expect-mult}) for any fixed $\raPlus$ query due to linearity of expectation (see~\Cref{sec:intro-poly-equiv}).
Since the {\em data complexity} of~\Cref{prob:expect-mult} is in \ptime, %interesting
we explore the question of %the hardness of
computing expectation using fine-grained analysis and parameterized complexity, where we are interested in the exponent of polynomial runtime.\footnote{While %the authors of
\cite{https://doi.org/10.48550/arxiv.2201.11524} also observe that computing the expectation of an output tuple multiplicity is in \ptime, they do not investigate the fine-grained complexity of this problem.}
computing expectation using fine-grained and parameterized complexity, where we are interested in the exponent of polynomial runtime.\footnote{While %the authors of
\cite{https://doi.org/10.48550/arxiv.2201.11524} also observes that computing the expectation of an output tuple's multiplicity is in \ptime, it does not investigate the fine-grained complexity of this problem.}
Specifically, in this work we ask if~\Cref{prob:expect-mult} can be solved in time linear in the runtime of an analogous deterministic query, which we make more precise shortly.
If true, this opens the way for deploying \abbrCTIDB\xplural in practice. We expand on the practical implications of this problem later in the section, but for now we stress that in practice $\bound$ is indeed constant and most often $\bound=1$.
@ -93,12 +93,12 @@ Specifically, in this work we ask if~\Cref{prob:expect-mult} can be solved in ti
\centering
\begin{tabular}{|p{0.43\textwidth}|p{0.12\textwidth}|p{0.35\textwidth}|}
\hline
\textbf{Lower bound on $\timeOf{}^*(\qhard^k,\pdb)$} & \textbf{Num.} $\bpd$s
\textbf{Lower bound on $\timeOf{}^*(\qhard^k,\pdb,1)$} & \textbf{Num.} $\bpd$s
& \textbf{Hardness Assumption}\\
\hline
$\Omega\inparen{\inparen{\qruntime{\optquery{\qhard^k}, \tupset, \bound}}^{1+\eps_0}}$ for {\em some} $\eps_0>0$ & Single & Triangle Detection hypothesis\\
$\omega\inparen{\inparen{\qruntime{\optquery{\qhard^k}, \tupset, \bound}}^{C_0}}$ for {\em all} $C_0>0$ & Multiple &$\sharpwzero\ne\sharpwone$\\
$\Omega\inparen{\inparen{\qruntime{\optquery{\qhard^k}, \tupset, \bound}}^{c_0\cdot k}}$ for {\em some} $c_0>0$ & Multiple & Exponential Time Hypothesis (ETH)\\%\Cref{conj:known-algo-kmatch}\\
$\Omega\inparen{\inparen{\qruntime{\optquery{\qhard^k}, \tupset, \bound}}^{c_0\cdot k/\log{k}}}$ for {\em some} $c_0>0$ & Multiple & Exponential Time Hypothesis (ETH)\\%\Cref{conj:known-algo-kmatch}\\
\hline
\end{tabular}
\savecaptionspace{
@ -112,24 +112,24 @@ Those with `Multiple' in the second column need the algorithm to be able to hand
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\mypar{Our lower bound results}
%
Let $\qruntime{\query,\gentupset,\bound}$ (see~\Cref{sec:gen} for further details) denote the runtime for query $\query$ over any deterministic database
Let $\qruntime{\query,\gentupset,\bound}$ (see~\Cref{sec:gen} for a formal definition) denote the runtime for query $\query$ over any deterministic database
%\dbbaseName
that maps each tuple in $\gentupset$ to a multiplicity in $[0,\bound]$.
%where the maximum multiplicity of any tuple is less than or equal to $\bound$. % This paper considers $\raPlus$ queries, for which order of operations is \emph{explicit}, as opposed to other query languages, e.g. Datalog, UCQ. Thus, since order of operations affects runtime, we denote the optimized $\raPlus$ query picked by an arbitrary production system as $\optquery{\query} \approx \min_{\query'\in\raPlus, \query'\equiv\query}\qruntime{\query', \gentupset, \bound}$. Then $\qruntime{\optquery{\query}, \gentupset,\bound}$ is the runtime for the optimized query.\footnote{The upper bounds on runtime that we derive apply pointwise to any $\query \in\raPlus$, allowing us to abstract away the specific heuristics for choosing an optimized query (i.e., Any deterministic query optimization heuristic is equally useful for \abbrCTIDB queries).}\BG{Rewrite: since an optimized Q is also a Q this also applies in the case where there is a query optimizer the rewrites Q}
Our question is whether or not it is always true that for every $\query$, $\timeOf{}^*\inparen{\query, \pdb, \bound}\leq \bigO{\qruntime{\optquery{\query}, \tupset, \bound}}$. We remark that the issue of query optimization is orthogonal to this question (recall that an $\raPlus$ query explicitly encodes order of operations) since we want to answer the above question for all $\query$. \emph{Specifically, if there is an equivalent and more efficient query $\query'$, we allow both deterministic and probabilistic query processing access to $\query'$}.
Our question is whether or not it is always true that for every $\query$, $\timeOf{}^*\inparen{\query, \pdb, \bound}\leq \bigO{\qruntime{\optquery{\query}, \tupset, \bound}}$. We remark that the issue of query optimization is orthogonal to this question (recall that an $\raPlus$ query explicitly encodes order of operations) since we want to answer the above question for {\em all} $\query$. \emph{Specifically, if there is an equivalent and more efficient query $\query'$, we allow both deterministic and probabilistic query processing access to $\query'$}.
Unfortunately, the answer to the above question is no:
\Cref{tab:lbs} shows our results.
Specifically, depending on what hardness result/conjecture we assume, we get various weaker or stronger versions of {\em no} as an answer to our question. To make some sense of the lower bounds in \Cref{tab:lbs}, we note that it is not too hard to show that $\timeOf{}^*(\query,\pdb, \bound) \le \bigO{\inparen{\qruntime{\optquery{\query}, \tupset, \bound}}^k}$, where $k$ is the join width of $\query$ (our notion of join width
%follows from~\Cref{def:degree-of-poly}
is essentially the degree of the corresponding polynomial we introduce in \Cref{sec:intro-poly-equiv}).
is essentially the degree of the corresponding lineage polynomial we introduce in \Cref{sec:intro-poly-equiv}).
%% OK: Fig 1 hasn't been introduced yet
% defined in~\Cref{fig:nxDBSemantics}.)
%of the query $\query$ over all result tuples $\tup$ (and the parameter that defines our family of hard queries).
%
What our lower bound in the third row says is that for a specific family of hard queries, one cannot get more than a polynomial improvement (for fixed $k$) over essentially the trivial algorithm for~\Cref{prob:expect-mult}, assuming the Exponential Time Hypothesis (ETH)~\cite{eth}.
%However, this result assumes a hardness conjecture that is not as well studied as those in the first two rows of the table (see \Cref{sec:hard} for more discussion on the hardness assumptions).
We also note that existing results\footnote{This claim follows when we set $\query$ to the query that counts the number of $k$-cliques over database $\tupset$. Precisely the same bounds as in the three rows of~ \Cref{tab:lbs} (with $n$ replacing $\qruntime{\optquery{\query}, \tupset, \bound}$) follow from the same complexity assumptions we make: triangle detection hypothesis (by definition), $\sharpwzero\ne\sharpwone$~\cite{10.5555/645413.652181} and Strong ETH~\cite{CHEN20061346}. For the last result we can replace $k/\log{k}$ by just $k$.
We also note that existing results\footnote{This claim follows when we set $\query$ to the query that counts the number of $k$-cliques over database $\tupset$ that encodes a graph. Precisely the same bounds as in the three rows of~\Cref{tab:lbs} (with $n$ replacing $\qruntime{\optquery{\query}, \tupset, \bound}$) follow from the same complexity assumptions we make: triangle detection hypothesis (by definition), $\sharpwzero\ne\sharpwone$~\cite{10.5555/645413.652181} and Strong ETH~\cite{CHEN20061346}. For the last result we can replace $k/\log{k}$ by just $k$.
%This claim follows from known results for the problem of evaluating a query $\query$ that counts the number of $k$-cliques over database $\tupset$. Specifically, a lower bound of the form $\Omega\inparen{n^{1+\eps_0}}$ for {\em some} $\eps_0>0$ follows from the triangle detection hypothesis (this like our result is for $k=3$). Second, a lower bound of $\omega\inparen{n^{C_0}}$ for {\em all} $C_0>0$ under the assumption $\sharpwzero\ne\sharpwone$~\cite{10.5555/645413.652181}. Finally, a lower bound of $\Omega\inparen{n^{c_0\cdot k}}$ for {\em some} $c_0>0$ was shown by~\cite{CHEN20061346} (under the strong exponential time hypothesis).
}
imply the claimed lower bounds if we replace the $\qruntime{\optquery{\query}, \tupset, \bound}$ by just $\numvar = |\tupset|$.
@ -155,7 +155,7 @@ compute $\expct_{\vct{W}\sim \pdassign}\pbox{\apolyqdt\inparen{\worldvec}}$).
\end{Problem}
%We note that computing \Cref{prob:expect-mult} is equivalent (yields the same result as) to computing \Cref{prob:bag-pdb-poly-expected} (see \Cref{prop:expection-of-polynom}).
We drop $\query$, $\tupset$, and $\tup$ from $\apolyqdt$ when they are clear from the context or irrelevant to the discussion.
We drop $\query$, $\tupset$, and $\tup$ from $\apolyqdt$ when they are clear from the context or not relevant to the discussion.
All of our results rely on working with a {\em reduced} form $\rpoly$ of the lineage polynomial $\poly$. As we show, for the $1$-\abbrTIDB case, computing the expected multiplicity (over bag query semantics) is {\em exactly} the same as evaluating $\rpoly$ over the probabilities that define the $1$-\abbrTIDB.
Further, only light extensions (see \Cref{def:reduced-poly-one-bidb}) are required to support block independent disjoint probabilistic databases~\cite{DBLP:conf/icde/OlteanuHK10}. % (bag query semantics with input tuple multiplicity at most $1$). %, for which the proof of~\Cref{lem:tidb-reduce-poly} (introduced shortly) holds .
@ -172,7 +172,7 @@ The lineage polynomial for $\query_1^2$ is $\poly_1^2\inparen{A, B, C, E, U, Y,
$$
=A^2U^2B^2 + B^2Y^2E^2 + B^2Z^2C^2 + 2AUB^2YE + 2AUB^2ZC + 2B^2YEZC.
$$
To compute $\expct\pbox{\poly_1^2}$ we can use linearity of expectation and push the expectation through each summand. To keep things simple, let us focus on the monomial $\monomial{1}(A,B,U) = A^2U^2B^2$ as the procedure is the same for all other monomials of $\poly_1^2$. Let $\randWorld_U$ be the random variable corresponding to a lineage variable $U$. Because the distinct variables in the product are independent, we can push expectation through them yielding $\expct\pbox{\randWorld_A^2\randWorld_U^2\randWorld_B^2}=\expct\pbox{\randWorld_A^2}\expct\pbox{\randWorld_U^2}\expct\pbox{\randWorld_B^2}$. Since $\randWorld_A, \randWorld_B\in \inset{0, 1}$ we can simplify to $\expct\pbox{\randWorld_A}\expct\pbox{\randWorld_U^2}\expct\pbox{\randWorld_B}$ by the fact that for any $W\in \inset{0, 1}$, $W^2 = W$. Observe that if $W_U\in\inset{0, 1}$, then we further would have $\expct\pbox{\randWorld_A}\expct\pbox{\randWorld_U}\expct\pbox{\randWorld_B} = \prob_A\cdot\prob_X\cdot\prob_B$ (denoting $\probOf\pbox{\randWorld_A = 1} = \prob_A$) $= \rmonomial{1}\inparen{\prob_A, \prob_U, \prob_B}$ (see $ii)$ of~\Cref{def:reduced-poly}). However, in this example, we get stuck with $\expct\pbox{\randWorld_U^2}$, since $\randWorld_U\in\inset{0, 1, 2}$ and for $\randWorld_U \gets 2$, $\randWorld_U^2 \neq \randWorld_U$.
To compute $\expct\pbox{\poly_1^2}$ we can use linearity of expectation and push the expectation through each summand. To keep things simple, let us focus on the monomial $\monomial{1}(A,U,B) = A^2U^2B^2$ as the procedure is the same for all other monomials of $\poly_1^2$. Let $\randWorld_X$ be the random variable corresponding to a variable $X$. Because the distinct variables in the product are independent, we can push expectation through them yielding $\expct\pbox{\randWorld_A^2\randWorld_U^2\randWorld_B^2}=\expct\pbox{\randWorld_A^2}\expct\pbox{\randWorld_U^2}\expct\pbox{\randWorld_B^2}$. Since $\randWorld_A, \randWorld_B\in \inset{0, 1}$ we can simplify to $\expct\pbox{\randWorld_A}\expct\pbox{\randWorld_U^2}\expct\pbox{\randWorld_B}$ by the fact that for any $W\in \inset{0, 1}$, $W^2 = W$. Observe that if $W_U\in\inset{0, 1}$, then we further would have $\expct\pbox{\randWorld_A}\expct\pbox{\randWorld_U}\expct\pbox{\randWorld_B} = \prob_A\cdot\prob_U\cdot\prob_B = \rmonomial{1}\inparen{\prob_A, \prob_U, \prob_B}$ (denoting $\probOf\pbox{\randWorld_X = 1} = \prob_X$). However, in this example, we get stuck with $\expct\pbox{\randWorld_U^2}$, since $\randWorld_U\in\inset{0, 1, 2}$ and for $\randWorld_U \gets 2$, $\randWorld_U^2 \neq \randWorld_U$.
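Spelling this last point out: since $\randWorld_U\in\inset{0, 1, 2}$,
\[
\expct\pbox{\randWorld_U^2} = 1^2\cdot\probOf\pbox{\randWorld_U = 1} + 2^2\cdot\probOf\pbox{\randWorld_U = 2} = \probOf\pbox{\randWorld_U = 1} + 4\cdot\probOf\pbox{\randWorld_U = 2},
\]
which differs from $\expct\pbox{\randWorld_U} = \probOf\pbox{\randWorld_U = 1} + 2\cdot\probOf\pbox{\randWorld_U = 2}$ whenever $\probOf\pbox{\randWorld_U = 2} > 0$.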
The simple insight to get around this issue is to note that the random variables $\randWorld_U$ and $\randWorld_{U_1}+2\randWorld_{U_2}$ have exactly the same distribution, where $\randWorld_{U_1},\randWorld_{U_2}\in\inset{0,1}$ and $\probOf\pbox{\randWorld_{U_j} = 1} = \probOf\pbox{\randWorld_{U} = j}$. Thus, the idea is to replace the variable $U$ by $U_1+2U_2$ (where $U_j$ corresponds to the event that $U$ has multiplicity $j$) yielding% to obtain the following polynomial:
%
@ -191,7 +191,7 @@ The simple insight to get around this issue to note that the random variables $\
Given that $U$ can only have multiplicity of $1$ or $2$ but not both,
%we drop the monomials with the term $U_1U_2$ to get
%$\refpoly{1, }^{\inparen{ABU}^2}\inparen{A, U_1, U_2, B} = A^2U_1^2B^2+2^2\cdot A^2 U_2^2B^2.$
given world vectors $(\randWorld_A,\randWorld_{U_1},\randWorld_{U_2},\randWorld_A)$, we have $\expct\pbox{\randWorld_{U_1}\randWorld_{U_2}}=0$. Further, since the world vectors are Binary vectors, we have $\expct\pbox{\monomial{1,R}}=\expct\pbox{\randWorld_{A}}\expct\pbox{\randWorld_{U_1}}\expct\pbox{\randWorld_{B}}+$ \\ $4\expct\pbox{\randWorld_{A}}\expct\pbox{\randWorld_{U_2}}\expct\pbox{\randWorld_{B}}\stackrel{\text{def}}{=}\rmonomial{1}\inparen{p_A,\probOf\inparen{U=1},\probOf\inparen{U=2},p_B}$. We only did the argument for a single monomial but by linearity of expectation we can apply the same argument to all monomials in $\poly_1^2$. Generalizing this argument to arbitrary $\poly$ leads to consider its following `reduced' version:
in every world vector $(\randWorld_A,\randWorld_{U_1},\randWorld_{U_2},\randWorld_B)\in\inset{0,1}^4$ we have $\randWorld_{U_1}\randWorld_{U_2}=0$, and hence $\expct\pbox{\randWorld_{U_1}\randWorld_{U_2}}=0$. Further, since the world vectors are binary vectors, we have $\expct\pbox{\monomial{1,R}}=\expct\pbox{\randWorld_{A}}\expct\pbox{\randWorld_{U_1}}\expct\pbox{\randWorld_{B}}+$ $4\expct\pbox{\randWorld_{A}}\expct\pbox{\randWorld_{U_2}}\expct\pbox{\randWorld_{B}}\stackrel{\text{def}}{=}\rmonomial{1}\inparen{p_A,\probOf\inparen{U=1},\probOf\inparen{U=2},p_B}$. We presented the argument for a single monomial, but by linearity of expectation the same argument applies to all monomials of $\poly_1^2$. Generalizing this argument to arbitrary $\poly$ leads us to consider its following `reduced' version:
\begin{Definition}\label{def:reduced-poly}
For any polynomial $\poly\inparen{\inparen{X_\tup}_{\tup\in\tupset}}$ define the reformulated polynomial $\refpoly{}\inparen{\inparen{X_{\tup, j}}_{\tup\in\tupset, j\in\pbox{\bound}}}
@ -200,7 +200,7 @@ $ and ii) define the \emph{reduced polynomial} $\rpoly\inparen{\inparen{X_{\tup,
$ to be the polynomial resulting from converting $\refpoly{}$ into the standard monomial basis\footnote{
This is the representation, typically used in set-\abbrPDB\xplural, where the polynomial is represented as a sum of `pure' products. See \Cref{def:smb} for a formal definition.
} (\abbrSMB),
removing all monomials containing the term $X_{\tup, j}X_{\tup, j'}$ for $\tup\in\tupset, j\neq j'\in\pbox{c}$, and setting each \emph{variable}'s exponents $e > 1$ to $1$.
removing all monomials containing the term $X_{\tup, j}X_{\tup, j'}$ for any $\tup\in\tupset$, $j\neq j'\in\pbox{\bound}$, and setting each \emph{variable}'s exponent $e > 1$ to $1$.
\end{Definition}
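As a concrete (and purely illustrative) rendering of this definition, the following Python sketch applies the two reduction steps to an \abbrSMB representation; the encoding of monomials is our own choice and not the representation used by our algorithms.
\begin{verbatim}
from collections import defaultdict

def reduce_smb(monomials):
    # A monomial is (coefficient, powers), where powers maps the variable
    # X_{t,j}, encoded as the pair (t, j), to its exponent.
    reduced = []
    for coeff, powers in monomials:
        by_tuple = defaultdict(set)
        for (t, j) in powers:
            by_tuple[t].add(j)
        if any(len(js) > 1 for js in by_tuple.values()):
            continue  # monomial contains X_{t,j} * X_{t,j'} with j != j': drop it
        reduced.append((coeff, {v: 1 for v in powers}))  # cap every exponent at 1
    return reduced

# A^2*(U1^2 + 4*U1*U2 + 4*U2^2)*B^2  reduces to  A*U1*B + 4*A*U2*B
A, U1, U2, B = ('A', 1), ('U', 1), ('U', 2), ('B', 1)
print(reduce_smb([(1, {A: 2, U1: 2, B: 2}),
                  (4, {A: 2, U1: 1, U2: 1, B: 2}),
                  (4, {A: 2, U2: 2, B: 2})]))
\end{verbatim}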
%Continuing with the example\footnote{To save clutter we do not show the full expansion for variables with greatest multiplicity $= 1$ since e.g. for variable $A$, the sum of products itself evaluates to $1^2\cdot A^2 = A$.}
% $\poly_1^2\inparen{A, B, C, E, X_1, X_2, Y, Z}$ we have $\rpoly_1^2(A, B, C, E, X_1, X_2, Y, Z)=$
@ -228,13 +228,13 @@ $, where $\probAllTup = \inparen{\prob_{\tup,j}}_{\tup\in\tupset,j\in\pbox{\boun
\subsection{Our Techniques}
\mypar{Lower Bound Proof Techniques}
%Our main hardness result shows that computing~\Cref{prob:expect-mult} is $\sharpwonehard$ for $1$-\abbrTIDB.
To prove the lower bounds in \cref{tab:lbs} we show that for the same $\query_1$ from the example above, for an arbitrary `product width' $k$, the query $\qhard^k$ is able to encode various hard graph-counting problems (assuming $\bigO{\numvar}$ tuples rather than the $\bigO{1}$ tuples in \Cref{fig:two-step}).
To prove the lower bounds in \Cref{tab:lbs} we show that for the same $\query_1$ from the example above, for an arbitrary `product width' $k$, the query $\qhard^k$ is able to encode various hard graph-counting problems (assuming $\bigO{\numvar}$ tuples rather than the $\bigO{1}$ tuples in \Cref{fig:two-step}).
We do so by considering an arbitrary graph $G$ (analogous to relation $\boldsymbol{R}$ of $\query_1$) and analyzing how the coefficients of the (univariate) polynomial $\widetilde{\poly}\left(p,\dots,p\right)$ relate to counts of subgraphs in $G$ that are isomorphic to various subgraphs with $k$ edges. E.g., for the last two rows in \Cref{tab:lbs}, we exploit the fact that the coefficient corresponding to $\prob^{2k}$ in $\rpoly\inparen{\prob,\ldots,\prob}$ of $\qhard^k$ is proportional to the number of $k$-matchings in $G$;
counting $k$-matchings is a known hard problem in the parameterized/fine-grained complexity literature.
\mypar{Upper Bound Techniques}
Our negative results (\Cref{tab:lbs}) indicate that \abbrCTIDB{}s (even for $\bound=1$) cannot achieve comparable performance to deterministic databases for exact results (under complexity assumptions). In fact, under plausible hardness conjectures, one cannot (drastically) improve upon the trivial algorithm to exactly compute the expected multiplicities for $1$-\abbrTIDB\xplural. A natural followup is whether we can do better if we are willing to settle for an approximation to the expected multiplities.
Our negative results (\Cref{tab:lbs}) indicate that \abbrCTIDB{}s (even for $\bound=1$) cannot achieve comparable performance to deterministic databases for exact results (under complexity assumptions). In fact, under well-established hardness conjectures, one cannot (drastically) improve upon the trivial algorithm to exactly compute the expected multiplicities for $1$-\abbrTIDB\xplural. A natural follow-up is whether we can do better if we are willing to settle for an approximation to the expected multiplicities.
\input{two-step-model}
We adopt the two-step intensional model of query evaluation used in set-\abbrPDB\xplural, as illustrated in \Cref{fig:two-step}:
@ -242,7 +242,7 @@ We adopt the two-step intensional model of query evaluation used in set-\abbrPDB
$;
(ii) \termStepTwo (\abbrStepTwo): Given $\poly(\vct{X})$ for each tuple, compute a $(1\pm \eps)$-approximation $\expct_{\randWorld\sim\bpd}\pbox{\poly(\vct{\randWorld})}$.
Let $\timeOf{\abbrStepOne}(\query,\tupset,\circuit)$ denote the runtime of \abbrStepOne when it outputs $\circuit$ (a representation of $\poly$ as an arithmetic circuit --- more on this representation in~\Cref{sec:expression-trees}).
Denote by $\timeOf{\abbrStepTwo}(\circuit, \epsilon)$ (recall $\circuit$ is the output of \abbrStepOne) the runtime of \abbrStepTwo (when $\poly$ is input as $\circuit$). Then to answer if we can compute a $(1\pm \eps)$-approximation to the expected multiplicity, it is enough to answer the following:
Denote by $\timeOf{\abbrStepTwo}(\circuit, \epsilon)$ (recall $\circuit$ is the output of \abbrStepOne) the runtime of \abbrStepTwo when $\poly$ is input as $\circuit$. Then to answer if we can compute a $(1\pm \eps)$-approximation to the expected multiplicity, it is enough to answer the following:
%which we can leverage~\Cref{def:reduced-poly} and~\Cref{lem:tidb-reduce-poly} to address the next formal objective:
\begin{Problem}[\abbrCTIDB linear time approximation]\label{prob:big-o-joint-steps}
@ -260,17 +260,17 @@ Accordingly, this work uses (arithmetic) circuits\footnote{
}
as the representation system of $\poly(\vct{X})$, and we show in \Cref{sec:circuit-depth} an $\bigO{\qruntime{\optquery{\query}, \tupset, \bound}}$ algorithm for constructing the lineage polynomial for all result tuples of an $\raPlus$ query $\query$ (or more precisely, a circuit $\circuit$ with $\numvar$ sinks, one per output tuple).% representing the tuple's lineage).
%
Since a representation $\circuit^*$ exists where $\timeOf{\abbrStepOne}(\query,\tupset,\circuit^*)\le \bigO{\qruntime{\optquery{\query}, \tupset, \bound}}$ and
Since a representation $\circuit^*$ exists where $\timeOf{\abbrStepOne}(\query,\tupset,\circuit^*)\le \bigO{\qruntime{\optquery{\query}, \tupset, \bound}}$ and
the size of $\circuit^*$ is bounded by $\qruntime{\optquery{\query}, \tupset, \bound}$ (i.e., $|\circuit^*| \le \bigO{\qruntime{\optquery{\query}, \tupset, \bound}}$) (see~\Cref{sec:circuit-runtime}), we can focus on the complexity of \abbrStepTwo.
%Thus, the question of approximation can be stated as the following stronger (since~\Cref{prob:big-o-joint-steps} has access to \emph{all} equivalent \circuit representing $\query\inparen{\vct{W}}\inparen{\tup}$), but sufficient condition:
%Given such a $\circuit^*$,
To solve \Cref{prob:big-o-joint-steps}, it is \emph{sufficient} to solve: % the following problem:
Thus, to solve \Cref{prob:big-o-joint-steps}, it is \emph{sufficient} to solve: % the following problem:
\begin{Problem}\label{prob:intro-stmt}
Given one circuit $\circuit$ that encodes $\Phi\inparen{\vct{X}}$ for all result tuples $\tup$ (one sink per $\tup$) for \abbrCTIDB $\pdb$ and $\raPlus$ query $\query$, does there exist an algorithm that computes a $(1\pm\epsilon)$-approximation of $\expct_{\rvworld\sim\bpd}\pbox{\query\inparen{\rvworld}\inparen{\tup}}$ (for all result tuples $\tup$) in $\bigO{|\circuit|}$ time?
Given any circuit $\circuit$ that encodes $\Phi\inparen{\vct{X}}$ for all result tuples $\tup$ (one sink per $\tup$) for \abbrCTIDB $\pdb$ and $\raPlus$ query $\query$, does there exist an algorithm that computes a $(1\pm\epsilon)$-approximation of $\expct_{\rvworld\sim\bpd}\pbox{\query\inparen{\rvworld}\inparen{\tup}}$ (for all result tuples $\tup$) in $\bigO{|\circuit|}$ time?
\end{Problem}
We will formalize the notions of circuits and hence, \Cref{prob:intro-stmt} in \Cref{sec:expression-trees}. For an upper bound on approximating the expected count, it is easy to check that if all the probabilties are constant then (with an additive adjustment) $\poly\left(\prob_1,\dots, \prob_n\right)$ is a constant factor approximation of $\rpoly$ (recall \Cref{def:reduced-poly}).
This is illustrated in the following example using $\query_1^2$ from earlier. To aid in presentation we again limit our focus to $\monomial{1,R}$, assume $\bound = 2$ for variable $U$ and $\bound = 1$ for all other variables. Let $\prob_A$ denote $\probOf\pbox{A = 1}$.
We will formalize the notions of circuits, and hence \Cref{prob:intro-stmt}, in \Cref{sec:expression-trees}. For an upper bound on approximating the expected count, it is easy to check that if all the probabilities are constant then (with an additive adjustment) $\poly\left(\prob_1,\dots, \prob_n\right)$ is a constant-factor approximation of $\rpoly$ (where we assume $\tupset=[n]$).
This is illustrated in the following example using $\query_1^2$ from earlier. To aid in presentation we again limit our focus to $\monomial{1,R}(A,U,B)$ and assume $\bound = 2$ for variable $U$ and $\bound = 1$ for all other variables. Let $\prob_X$ denote $\probOf\pbox{X = 1}$.
%In computing $\rpoly$, we have some cancellations to deal with:
Then we have:
%
@ -299,13 +299,13 @@ $\monomial{1,R}\inparen{\vct{X}} = A^2\inparen{U_1^2 + 4U_1U_2 + 4U_2^2}B^2 =A^2
%\end{align*}
%\end{footnotesize}
Noting that $\rmonomial{1}\inparen{\vct{X}} = AU_1B+4AU_2B$,
If we assume that all probability values are in $[p_0,1]$ for some $p_0>0$,
if we assume that all probability values are in $[p_0,1]$ for some $p_0>0$,
%then given access to $\refpoly{1, }^{\inparen{ABX}^2}\inparen{\vct{\prob}} - 4\prob_A^2\prob_{X_1}\prob_{X_2}\prob_B^2$
we get that $\monomial{1,R}\inparen{\vct{\prob}} - 4\prob_A^2\prob_{U_1}\prob_{U_2}\prob_B^2$ is in the range $\pbox{p_0^3\cdot\rmonomial{1}\inparen{\vct{\prob}}, \rmonomial{1}\inparen{\vct{\prob}}}$.
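To spell out why: by the expression for $\monomial{1,R}$ above,
\[
\monomial{1,R}\inparen{\vct{\prob}} - 4\prob_A^2\prob_{U_1}\prob_{U_2}\prob_B^2 = \inparen{\prob_A\prob_{U_1}\prob_B}\cdot\prob_A\prob_{U_1}\prob_B + 4\inparen{\prob_A\prob_{U_2}\prob_B}\cdot\prob_A\prob_{U_2}\prob_B,
\]
and each of the factors $\prob_A\prob_{U_1}\prob_B$ and $\prob_A\prob_{U_2}\prob_B$ multiplying the corresponding term of $\rmonomial{1}\inparen{\vct{\prob}} = \prob_A\prob_{U_1}\prob_B + 4\prob_A\prob_{U_2}\prob_B$ lies in $\pbox{p_0^3, 1}$.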
%We can simulate sampling from $\refpoly{1, }^2\inparen{\vct{X}}$ by sampling monomials from $\refpoly{1, }^2$ while ignoring any samples $A^2X_1X_2B^2$.
Note, however, that this is \emph{not a tight approximation}.
In~\Cref{sec:algo} we demonstrate that a $(1\pm\epsilon)$ (multiplicative) approximation with competitive performance is achievable.
To get an $(1\pm \epsilon)$-multiplicative approximation and solve~\Cref{prob:intro-stmt}, using \circuit we uniformly sample monomials from the equivalent \abbrSMB representation of $\poly$ (without materializing the \abbrSMB representation) and `adjust' their contribution to $\widetilde{\poly}\left(\cdot\right)$.
To get an $(1\pm \epsilon)$-multiplicative approximation and solve~\Cref{prob:intro-stmt}, using \circuit we uniformly sample monomials from the equivalent \abbrSMB representation of $\poly$ (without materializing the \abbrSMB representation) and `adjust' their contribution to $\widetilde{\poly}\left(\vct{p}\right)$.
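Schematically (and suppressing the details of \approxq, in particular how sampled monomials containing two variables from the same block are discarded), if $\inparen{\vari{m}_i, \vari{c}_i}$ denotes the $i$-th monomial-coefficient pair returned by \sampmon out of $N$ samples, the resulting estimate is of the form
\[
\frac{\abs{\circuit}\inparen{1,\ldots, 1}}{N}\cdot\sum_{i=1}^{N}\mathrm{sign}\inparen{\vari{c}_i}\cdot\prod_{X\in\vari{m}_i}\prob_X.
\]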
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

View File

@ -27,7 +27,7 @@ Given positive integer $k$ and undirected graph $G=(\vset,\edgeSet)$ with no sel
%\end{hypo}
%=======
\begin{Theorem}[~\cite{10.1109/FOCS.2014.22}]\label{conj:known-algo-kmatch}
Given positive integer $k$ and undirected graph $G=(\vset,\edgeSet)$, $\kmatchtime\ge |\vset|^{\Omega\inparen{k/\log{k}}}$, assuming ETH.
Given positive integer $k$ and undirected graph $G=(\vset,\edgeSet)$, $\kmatchtime\ge |\vset|^{\Omega\inparen{k/\log{k}}}$ (assuming ETH).
\end{Theorem}
%We note that the above conjecture is somewhat non-standard. In particular, the best known algorithm to compute $\numocc{G}{\kmatch}$ takes time $\Omega\inparen{|V|^{k/2}}$

View File

@ -9,7 +9,7 @@ We focus on the problem of computing $\expct_{\worldvec\sim\pdassign}\pbox{\poly
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}[Circuit]\label{def:circuit}
A circuit $\circuit$ is a Directed Acyclic Graph (DAG) with source gates (in degree of $0$) drawn from either $\domN$ or $\vct{X} = \inparen{X_1,\ldots,X_\numvar}$ and one sink gate for each result tuple. Internal gates have binary input and are either sum ($\circplus$) or product ($\circmult$) gates.
A circuit $\circuit$ is a Directed Acyclic Graph (DAG) with source gates (in-degree $0$) drawn from either $\domN$ or $\vct{X} = \inparen{X_\tup}_{\tup\in\tupset}$ and one sink gate for each result tuple. Internal gates have two inputs each and are either sum ($\circplus$) or product ($\circmult$) gates.
%
Each gate has the following members: \type, \vari{input}, %\val,
\vpartial, \degval, \vari{Lweight}, and \vari{Rweight}, where \type is the value type $\{\circplus, \circmult, \var, \tnum\}$ and \vari{input} the list of inputs. Source gates have an extra member \val for the value. $\circuit_\linput$ ($\circuit_\rinput$) denotes the left (right) input of \circuit.
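To make the gate record concrete, here is a minimal Python sketch; the field types, defaults, and helper properties are our own assumptions and not part of the definition.
\begin{verbatim}
from dataclasses import dataclass, field
from typing import List, Optional, Union

@dataclass
class Gate:
    # members mirroring type, input, partial, degree, Lweight, Rweight, val
    type: str                                          # '+', '*', 'var', or 'num'
    input: List['Gate'] = field(default_factory=list)  # two inputs for internal gates
    val: Optional[Union[int, str]] = None              # only set for source gates
    partial: Optional[int] = None                      # |C|(1,...,1), set by the one-pass
    degree: Optional[int] = None
    Lweight: Optional[float] = None                    # sampling weights of the two inputs
    Rweight: Optional[float] = None

    @property
    def left(self):   # C_L in the paper's notation
        return self.input[0]

    @property
    def right(self):  # C_R
        return self.input[1]

# e.g. the (single-sink) circuit X * (Y + 2)
X, Y, two = Gate('var', val='X'), Gate('var', val='Y'), Gate('num', val=2)
root = Gate('*', input=[X, Gate('+', input=[Y, two])])
\end{verbatim}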

View File

@ -17,7 +17,7 @@ Let $\abs{\poly'}$ be the number of operators in $\poly'$. Then:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Corollary}\label{cor:expct-sop}
If $\poly'$ is a \abbrOneBIDB lineage polynomial already in \abbrSMB, then the expectation of $\poly$, i.e., $\expct\pbox{\poly'}$ % = \rpoly\left(\prob_1,\ldots, \prob_\numvar\right)$
If $\poly'$ is a \abbrOneBIDB lineage polynomial already in \abbrSMB, then the expectation of $\poly'$, i.e., $ \expct_{\vct{W} \sim \pdassign'}\pbox{\poly'\inparen{\vct{W}}}$ % = \rpoly\left(\prob_1,\ldots, \prob_\numvar\right)$
can be computed in $\bigO{\abs{\poly'}}$ time.
\end{Corollary}
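For intuition, the following Python sketch shows one way such a linear-time computation can look (our own rendering of the statement, not the paper's pseudocode): by linearity of expectation, each \abbrSMB monomial contributes its coefficient times the product of its variables' probabilities, unless it pairs two distinct tuples from the same block, in which case it contributes $0$.
\begin{verbatim}
def expectation_smb(monomials, prob):
    # monomials: list of (coefficient, list of variables); a variable is the
    # pair (block, tuple).  prob[(block, tuple)] = P(X_{block,tuple} = 1).
    total = 0.0
    for coeff, variables in monomials:
        vs = set(variables)                      # X^e = X for 0/1-valued X
        blocks = [b for (b, _) in vs]
        if len(blocks) != len(set(blocks)):
            continue                             # same-block tuples are disjoint: 0
        term = coeff
        for v in vs:
            term *= prob[v]
        total += term
    return total

# E[ 2*X_{1,a}*X_{2,b} - X_{1,a}*X_{1,b} ] = 2 * 0.5 * 0.4 = 0.4
mons = [(2, [(1, 'a'), (2, 'b')]), (-1, [(1, 'a'), (1, 'b')])]
print(expectation_smb(mons, {(1, 'a'): 0.5, (2, 'b'): 0.4, (1, 'b'): 0.3}))
\end{verbatim}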