Merge branch 'master' of gitlab.odin.cse.buffalo.edu:ahuber/SketchingWorlds

2020-12-20 17:51:32 -06:00 · 2020-12-20 17:51:32 -06:00 · 25753c6c7f
parent cc7c5fdb8a 51766b857c
commit 25753c6c7f
7 changed files with 47 additions and 41 deletions
--- a/circuits-model-runtime.tex
+++ b/circuits-model-runtime.tex
@ -1,20 +1,22 @@
 %!TEX root=./main.tex
 \section{Generalizations}
 \label{sec:gen}
-In this section, we consider several generalizations/corollaries of our results.
+In this section, we consider generalizations/corollaries of our results.
 In particular, in~\Cref{sec:circuits} we first consider the case when the compressed  polynomial is represented by a Directed Acyclic Graph (DAG) instead of an expression tree (\Cref{def:express-tree}) and observe that our results carry over.
 Then, we formalize our claim from \Cref{sec:intro} that a linear algorithm for our problem implies that PDB queries can be answered in the same runtime as deterministic queries under reasonable assumptions.
-Finally, in~\Cref{sec:momemts}, we observe how our results can be used to estimate moments other than the expectation.
+Finally, in~\Cref{sec:momemts}, we generalize our result for expectation to other moments.

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\subsection{Lineage circuits}
+\subsection{Lineage Circuits}
 \label{sec:circuits}

-In~\Cref{sec:semnx-as-repr}, we switched to thinking of our query results as polynomials and until now, have focused on thinking of our input as a polynomial. In particular, starting with~\Cref{sec:expression-trees} we considered these polynomials to be represented as an expression tree. However, these do not capture many of the compressed polynomial representations that we can get from query processing algorithms on bags, including the recent work on worst-case optimal join algorithms~\cite{ngo-survey,skew}, factorized databases~\cite{factorized-db}, and FAQ~\cite{DBLP:conf/pods/KhamisNR16}. Intuitively, the main reason is that an expression tree does not allow for `sharing' of intermediate results, which is crucial for these algorithms (and other query processing methods as well).
+In~\Cref{sec:semnx-as-repr}, we switched to thinking of our query results as polynomials and until now, have focused on thinking of inputs this way. 
+In particular, starting with~\Cref{sec:expression-trees} we considered these polynomials to be represented as an expression tree. 
+However, these do not capture many of the compressed polynomial representations that we can get from query processing algorithms on bags, including the recent work on worst-case optimal join algorithms~\cite{ngo-survey,skew}, factorized databases~\cite{factorized-db}, and FAQ~\cite{DBLP:conf/pods/KhamisNR16}. Intuitively, the main reason is that an expression tree does not allow for `sharing' of intermediate results, which is crucial for these algorithms (and other query processing methods as well).

 In this section, we represent query polynomials via {\em arithmetic circuits}~\cite{arith-complexity}, a standard way to represent polynomials over fields (particularly in the field of algebraic complexity) that we use for polynomials over $\mathbb N$ in the obvious way.
 We present a formal treatment of {\em lineage circuit}s in~\Cref{sec:circuits-formal}, with only a quick overview in this section.
-A lineage circuit is represented by a DAG, where each source node corresponds to either one of the input variables or a constant and the sinks correspond to output tuples.
+A lineage circuit is represented by a DAG, where each source node corresponds to either one of the input variables or a constant, and the sinks to output tuples.
 Every other node has at most two in-edges, is labeled as an addition or a multiplication node, and has no limit on its outdegree.
 Note that if we limit the outdegree to one, then we get back expression trees.

@ -36,11 +38,12 @@ For a more detailed discussion of why~\Cref{lem:approx-alg} holds for a lineage
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \subsubsection{The cost model}
 \label{sec:cost-model}
-Thus, so far our analysis of the runtime of $\onepass$ has been in terms of the size of the compressed lineage polynomial.
-We now show that this model corresponds to the behavior of a deterministic database by proving that for any union of conjunctive queries, we can construct a compressed lineage polynomial for a query $Q$ and \bi $\pxdb$ in runtime that is linear in the
-runtime that a class of deterministic algorithms take to evaluate $Q(D)$ for any world $\db$ of $\pxdb$ as long as
-there exists a constant $c$ that is independent of the number tuple in the largest world of $\pxdb$  such that $\abs{pxdb} \leq c \cdot \abs{\db}$. In practice, this is often the case because typically the blocks of a \bi represent entities where we are uncertain about their properties and in such a scenario often there are only a limited number of alternatives for each block. Note that all TIDBs trivially fulfill this condition for $c = 1$.
-That is for \bis that fulfill this restriction approximating the expectation of results of SPJU queries is only has a constant factor overhead over deterministic query processing (using one of the algorithms for which we prove the claim).
+So far our analysis of $\approxq$ has been in terms of the size of the compressed lineage polynomial.
+We now show that this model corresponds to the behavior of a deterministic database by proving that for any union of conjunctive queries, we can construct a compressed lineage polynomial for a query $Q$ and \bi $\pxdb$ of size (and in runtime) linear in the runtime of a general class of query processing algorithms for the same query $Q$ on a deterministic database $\db$.
+We assume a linear relationship between input sizes $|\pxdb|$ and $|\db|$ (i.e., $\exists c, \db \in \pxdb$ s.t. $\abs{\pxdb} \leq c \cdot \abs{\db}$).
+This is a reasonable assumption because each block of a \bi represents entities with uncertain attributes. 
+In practice there is often a limited number of alternatives for each block (e.g., which of five conflicting data sources to trust). Note that all \tis trivially fulfill this condition (i.e., $c = 1$).
+%That is for \bis that fulfill this restriction approximating the expectation of results of SPJU queries is only has a constant factor overhead over deterministic query processing (using one of the algorithms for which we prove the claim).
 % with the same complexity as it would take to evaluate the query on a deterministic \emph{bag} database of the same size as the input PDB.
 We adopt a minimalistic compute-bound model of query evaluation drawn from the worst-case optimal join literature~\cite{skew,ngo-survey}.
 \newcommand{\qruntime}[1]{\textbf{cost}(#1)}
@ -115,15 +118,15 @@ This follows from~\Cref{lem:circuits-model-runtime} and (the lineage circuit cou
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\subsection{Higher moments}
+\subsection{Higher Moments}
 \label{sec:momemts}

 We make a simple observation to conclude the presentation of our results.
-So far we have presented algorithms that approximate the expectation of $\poly$.
+So far we have focused on the expectation of $\poly$.
 In addition, we could, e.g., prove bounds on the probability of the multiplicity being at least $1$.
 While we do not have a good approximation algorithm for this problem, we can make some progress as follows:
-Note that for any positive integer $m$ we can compute the expectation $\poly^m$ (since this only changes the degree of the corresponding lineage polynomial by a factor of $m$).
-In other words, we can compute the $m$-th moment of the multiplicities as well allowing us to e.g. to use Chebyschev inequality or other high moment based probability bounds on the events we might be interested in.
+For any positive integer $m$ we can compute the expectation of $\poly^m$ (which only changes the degree of the corresponding lineage polynomial by a factor of $m$).
+In other words, we can compute the $m$-th moment of the multiplicities, allowing us, e.g., to use Chebyshev's inequality or other higher-moment-based probability bounds on the events we might be interested in.
 However, we leave the question of coming up with more accurate approximation algorithms for future work.

 %%% Local Variables:
--- a/conclusions.tex
+++ b/conclusions.tex
@ -3,11 +3,11 @@

 We have studied the problem of calculating the expectation of query polynomials over BIDBs. %random integer variables.
 This problem has a practical application in probabilistic databases over multisets, where it corresponds to calculating the expected multiplicity of a query result tuple.
-This problem has been studied extensively for sets (lineage formulas), but the bag settings has not received much attention so far.
+It has been studied extensively for sets (lineage formulas), but the bag setting has not received much attention.
 While the expectation of a polynomial can be calculated in linear time in the size of polynomials that are in SOP form, the problem is \sharpwonehard for factorized polynomials.
 We have proven this claim through a reduction from the problem of counting k-matchings.
 When only considering polynomials for result tuples of UCQs over TIDBs and BIDBs (under the assumption that there are few cancellations), we prove that it is still possible to approximate the expectation of a polynomial in linear time.
-Interesting directions for future work include development of a dichotomy for queries over bag PDBs and desgin approximation schemes for data models beyond what we consider in this paper.
+Interesting directions for future work include development of a dichotomy for queries over bag PDBs and approximations for data models beyond what we consider in this paper.
 % Furthermore, it would be interesting to see whether our approximation algorithm can be extended to support queries with negations, perhaps using circuits with monus as a representation system.

 \BG{I am not sure what interesting future work is here. Some wild guesses, if anybody agrees I'll try to flesh them out:
--- a/intro.tex
+++ b/intro.tex
@ -206,15 +206,14 @@ To see why computing this probability is hard, observe that the clauses of the d
 Conversely, in Bag-PDBs, correlations between clauses of the SOP polynomial are not problematic thanks to linearity of expectation.
 The expectation computation over the output lineage is simply the sum of expectations of each clause.
 For \Cref{ex:intro}, the expectation is simply
-{\small
-\begin{align*}
-\expct\pbox{\poly(W_a, W_b, W_c)} &= \expct\pbox{W_aW_b} + \expct\pbox{W_bW_c} + \expct\pbox{W_cW_a}\\
-\intertext{\normalsize
+\begin{equation*}
+\expct\pbox{\poly_{bag}(W_a, W_b, W_c)} = \expct\pbox{W_aW_b} + \expct\pbox{W_bW_c} + \expct\pbox{W_cW_a}
+\end{equation*}
 In this particular lineage polynomial, all variables in each product clause are independent, so we can push expectations through.
-}
-&= \expct\pbox{W_a}\expct\pbox{W_b} + \expct\pbox{W_b}\expct\pbox{W_c} + \expct\pbox{W_c}\expct\pbox{W_a}
-\end{align*}
-}
+\begin{equation*}
+= \expct\pbox{W_a}\expct\pbox{W_b} + \expct\pbox{W_b}\expct\pbox{W_c} + \expct\pbox{W_c}\expct\pbox{W_a}
+\end{equation*}
+
 Computing such expectations is indeed linear in the size of the SOP as the number of operations in the computation is \textit{exactly} the number of multiplication and addition operations of the polynomial.
 As a further interesting feature of this example, note that $\expct\pbox{W_i} = \probOf[W_i = 1]$, and so taking the same polynomial over the reals:
 \begin{multline}
@ -307,7 +306,7 @@ With $\poly^2$ as an example, we have:
 Note that the reduced polynomial is a closed form of the expected count (i.e., $\expct\pbox{\poly^2} = \rpoly(\probOf\pbox{W_a=1}, \probOf\pbox{W_b=1}, \probOf\pbox{W_c=1})$).
 Also note that the $\poly$ in~\Cref{ex:bag-vs-set} is already in reduced form.

-The reduced form of a polynomial can be obtained in a linear scan over the clauses of a SOP encoding of the polynomial.
+The reduced form of a polynomial can be obtained in a linear scan over the clauses of an SOP encoding of the polynomial.
 In prior work on lineage-based Bag-PDBs~\cite{kennedy:2010:icde:pip,DBLP:conf/vldb/AgrawalBSHNSW06,yang:2015:pvldb:lenses} where this encoding is implicitly assumed, computing the expected count is linear in the size of the encoding.
 In general however, compressed encodings of the polynomial can be exponentially smaller in $k$ for $k$-products --- the query $\poly^k$ obtained by taking the Cartesian product of $k$ copies of $\poly$ has a factorized encoding of size $6\cdot k$, while the SOP encoding is of size $2\cdot 3^k$.
 This leads us to the \textbf{central question of this paper}:
--- a/mult_distinct_p.tex
+++ b/mult_distinct_p.tex
@ -12,7 +12,7 @@ In this section, we will prove that computing $\expct\limits_{\vct{W} \sim \pd}\
 \subsection{Preliminaries}

 Our hardness results are based on (exactly) counting the number of occurrences of a fixed graph $H$ as a subgraph in $G$. Let $\numocc{G}{H}$ denote the number of occurrences of pattern $H$ in graph $G$. %, where, for example, $\numocc{G}{\ed}$ means the number of single edges in $G$.
-In particular, we will consider the problems of computing the following counts (given $G$ as an input in its adjacency list representation): $\numocc{G}{\tri}$ (the number of triangles), $\numocc{G}{\threepath}$ (the number of $3$-paths),  $\numocc{G}{\threedis}$ (the number of $3$-matchings or collection of three node-disjoint edges) and its generalization $\numocc{G}{\kmatch}$ (the number of $k$-matchings or collections of $k$ node-disjoint edges).
+In particular, we will consider the problems of computing the following counts (given $G$ as an input and its adjacency list representation): $\numocc{G}{\tri}$ (the number of triangles), $\numocc{G}{\threepath}$ (the number of $3$-paths),  $\numocc{G}{\threedis}$ (the number of $3$-matchings or collection of three node-disjoint edges) and its generalization $\numocc{G}{\kmatch}$ (the number of $k$-matchings or collections of $k$ node-disjoint edges).
 %
 Our hardness result in \Cref{sec:multiple-p} is based on the following result:

@ -35,7 +35,7 @@ There exists a constant $\eps_0>0$ such that given an undirected graph $G=(V,E)$
 \end{hypo}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %
-Based on the so called {\em Triangle detection hypothesis} (cf.~\cite{triang-hard}), which states that detection whether $G$ has a triangle or not takes time $\Omega\inparen{|E|^{4/3}}$, implies that in Conjecture~\ref{conj:graph} we can take $\eps_0\ge \frac 13$. 
+The so-called {\em Triangle detection hypothesis} (cf.~\cite{triang-hard}), which states that detecting whether $G$ has a triangle takes time $\Omega\inparen{|E|^{4/3}}$, implies that in Conjecture~\ref{conj:graph} we can take $\eps_0\ge \frac 13$.
 %The current best known algorithm to count the number of $3$-matchings, to
 %\AR{Need to add something about 3-paths and 3-matchings as well.}

@ -73,7 +73,7 @@ Computing $\rpoly_G^\kElem(\prob_i,\dots,\prob_i)$ for arbitrary $G$ and any $(2
 %
 We will prove the above result by reduction from the problem of computing the number of $k$-matchings in $G$. Given the current best-known algorithm for this counting problem, our results imply that unless the state-of-the-art $k$-matching algorithms are improved, we cannot hope to solve our problem in time better than $\Omega_k\inparen{m^{k/2}}$, which is only quadratically faster than expanding $\poly_{G}^\kElem(\vct{X})$ into its \abbrSMB form and then using \Cref{cor:expct-sop}. By contrast the approximation algorithm we present in \Cref{sec:algo} has runtime $O_k\inparen{m}$ for  this query (since it runs in linear-time on all lineage polynomials).

-Here, we present a reduction from the problem of couting $\kElem$-matchings in a graph to our problem:
+Here, we present a reduction from the problem of counting $\kElem$-matchings in a graph to our problem:
 \begin{Lemma}\label{lem:qEk-multi-p}
 Let $\prob_0,\ldots, \prob_{2\kElem}$ be distinct values in $(0, 1]$.  Then given the values $\rpoly_{G}^\kElem(\prob_i,\ldots, \prob_i)$ for $0\leq i\leq 2\kElem$, the number of $\kElem$-matchings in $G$ can be computed in $O\inparen{\kElem^3}$ time.
 \end{Lemma}
--- a/poly-form.tex
+++ b/poly-form.tex
@ -22,7 +22,7 @@ A monomial is a product of variable terms, each raised to a non-negative integer
  \[
    \sum_{i=1}^n c_i \cdot m_i
  \]
-where each $c_i$ is a positive integer and each $m_i$ is a monomial and $m_i \neq m_j$ for $i \neq j$. The \abbrSMB of a polynomial $\poly$ is $\smbOf{\poly}$.
+where each $c_i$ is an integer and each $m_i$ is a monomial and $m_i \neq m_j$ for $i \neq j$. The \abbrSMB of a polynomial $\poly$ is $\smbOf{\poly}$.
 %  fully expanded out such that no product of sums exist and where each unique monomial appears exactly once.
 \end{Definition}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@ -94,7 +94,9 @@ Given the set of BIDB variables $\inset{X_{b,i}}$, define
 \end{Definition}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %
-Intuitively, in the reduced form, all exponents $e > 1$ are reduced to $e = 1$ by $\text{mod } \mathcal T$, and all monomials with multile variables from the same block $\block$ are dropped by $\text{mod } \mathcal B$ (i.e., any world containing more than one tuple from a block has $0$ probability and can be ignored). 
+
+Intuitively, in the reduced form, all exponents $e > 1$ are reduced to $e = 1$ by $\text{mod } \mathcal T$, and all monomials with multiple variables from the same block $\block$ are dropped by $\text{mod } \mathcal B$ (i.e., any world containing more than one tuple from a block has $0$ probability and can be ignored). 
+
 For the special case of \tis, the second step is not necessary since every block contains a single tuple.
 %Alternatively, one can think of $\rpoly$ as the \abbrSMB of $\poly(\vct{X})$ when the product operator is idempotent.
 %
@ -126,7 +128,7 @@ Consider $\poly(X, Y) = (X + Y)(X + Y)$ where $X$ and $Y$ are from different blo
 %
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{Definition}[Valid Worlds]
-For probability distribution $\probDist$ and its corresponding probability mass function $\probOf$, the set of valid worlds $\eta$ is the worlds with probability value greater than $0$; i.e., for variable vector $\vct{W}$
+For probability distribution $\probDist$ and its corresponding probability mass function $\probOf$, the set of valid worlds $\eta$ consists of all the worlds with probability value greater than $0$; i.e., for variable vector $\vct{W}$
 \[
 \eta = \{\vct{w}\st \probOf[\vct{W} = \vct{w}] > 0\}
 \]
--- a/ra-to-poly.tex
+++ b/ra-to-poly.tex
@ -13,7 +13,7 @@ Denote the schema of $\db$ as $\sch(\db)$. A \textit{probabilistic database} $\p
 For a probabilistic  database $\pdb = (\idb, \pd)$,  the result of a query is the pair $(\query(\idb), \pd')$ where $\pd'$ is a probability distribution over $\query(\idb)$  that assigns to each possible query result the sum of the probabilities of the worlds that produce this answer:
 \[\forall \db \in \query(\idb): \probOf'(\db) = \sum_{\db' \in \idb: \query(\db') = \db} \probOf(\db') \]

-Note that in this work we consider multisets, i.e., each possible world is a set of multiset relations and queries are evaluated using bag semantics. We will use K-relations to model multisets. A \emph{K-relation}~\cite{DBLP:conf/pods/GreenKT07} is a relation whose tuples are annotated with elements from a commutative semiring $\semK = (\domK, \addK, \multK, \zeroK, \oneK)$.  A commutative semiring is a structure with a domain $\domK$ and associative and commutative binary operations $\addK$ and $\multK$ such that $\multK$ distributes over $\addK$, $\zeroK$ is the identity of $\addK$, $\oneK$ is the identity of $\multK$, and $\zeroK$ annihilates all elements of $\domK$ when combined by $\multK$.
+Note that in this work we consider multisets, i.e., each possible world is a set of multiset relations and queries are evaluated using bag semantics. We will use $\domK$-relations to model multisets. A \emph{$\domK$-relation}~\cite{DBLP:conf/pods/GreenKT07} is a relation whose tuples are annotated with elements from a commutative semiring $\semK = (\domK, \addK, \multK, \zeroK, \oneK)$.  A commutative semiring is a structure with a domain $\domK$ and associative and commutative binary operations $\addK$ and $\multK$ such that $\multK$ distributes over $\addK$, $\zeroK$ is the identity of $\addK$, $\oneK$ is the identity of $\multK$, and $\zeroK$ annihilates all elements of $\domK$ when combined by $\multK$.
 Let $\udom$ be a countable domain of values.
 Formally, an n-ary $\semK$-relation over $\udom$ is a function $\rel: \udom^n \to \domK$ with finite support $\support{\rel} = \{ \tup \mid \rel(\tup) \neq \zeroK \}$.
 A $\semK$-database is a set of $\semK$-relations. It will be convenient to also interpret a $\semK$-database as a function from tuples to annotations. Thus, $\rel(t)$ (resp., $\db(t)$) denotes the annotation associated by $\semK$-relation $\rel$ ($\semK$-database $\db$) to $t$.
--- a/related-work.tex
+++ b/related-work.tex
@ -4,25 +4,25 @@
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %\subsection{Probabilistic Databases}\label{sec:prob-datab}
 \textbf{Probabilistic Databases} (PDBs) have been studied predominantly for set semantics.
-A multitude of data models have been proposed for encoding a PDB more compactly than as its set of possible worlds. These include tuple-independent databases~\cite{VS17} (\tis), block-independent databases (\bis)~\cite{RS07}, and \emph{PC-tables}~\cite{GT06} pair a C-table % ~\cite{IL84a}
+Many data models have been proposed for encoding PDBs more compactly than as sets of possible worlds. 
+These include tuple-independent databases~\cite{VS17} (\tis), block-independent databases (\bis)~\cite{RS07}, and \emph{PC-tables}~\cite{GT06}, which pair a C-table % ~\cite{IL84a}
 with a probability distribution over its variables.
-This is similar to our $\semNX$-PDBs, but we use polynomials instead of Boolean expressions and only allow constants as attribute values.
+PC-tables are similar to our $\semNX$-PDBs, but use Boolean expressions instead of polynomials.
 % Tuple-independent databases (\tis) consist of a classical database where each tuple associated with a probability and tuples are treated as independent probabilistic events.
 % While unable to encode correlations directly, \tis are popular because any finite probabilistic database can be encoded as a \ti and a set of constraints that ``condition'' the \ti~\cite{VS17}.
 % Block-independent databases (\bis) generalize \tis by partitioning the input into blocks of disjoint tuples, where blocks are independent~\cite{RS07}. %,BS06
 % \emph{PC-tables}~\cite{GT06} pair a C-table % ~\cite{IL84a}
 % with probability distribution over its variables. This is similar to our $\semNX$-PDBs, except that we do not allow for variables as attribute values and instead of local conditions (propositional formulas that may contain comparisons), we associate tuples with polynomials $\semNX$.

-Approaches for probabilistic query processing (i.e., computing the marginal probability for query result tuples), fall into two broad categories.
+Approaches for probabilistic query processing (i.e., computing marginal probabilities for tuples) fall into two broad categories.
 \emph{Intensional} (or \emph{grounded}) query evaluation computes the \emph{lineage} of a tuple % (a Boolean formula encoding the provenance of the tuple)
 and then the probability of the lineage formula.
-In this paper we focus on intensional query evaluation using polynomials instead of Boolean formulas.
-It is a well-known fact that computing the marginal probability of a tuple is \sharpphard (proven through a reduction from weighted model counting~\cite{valiant-79-cenrp} %provan-83-ccccptg
-using the fact the tuple's marginal probability is the probability of a its lineage formula).
+In this paper we focus on intensional query evaluation with polynomials.
+It has been shown that computing the marginal probability of a tuple is \sharpphard~\cite{valiant-79-cenrp} (by reduction from weighted model counting).
 The second category, \emph{extensional} query evaluation, % avoids calculating the lineage.
 % This approach
 is in \ptime, but is limited to certain classes of queries.
-Dalvi et al.~\cite{DS12} proved that  a dichotomy for unions of conjunctive queries (UCQs):
+Dalvi et al.~\cite{DS12} proved a dichotomy for unions of conjunctive queries (UCQs):
 for any UCQ the probabilistic query evaluation problem is either \sharpphard (requires intensional evaluation) or \ptime (permits extensional evaluation).
 Olteanu et al.~\cite{FO16} presented dichotomies for two classes of queries with negation. % R\'e et al~\cite{RS09b} present a trichotomy for HAVING queries.
 Amarilli et al. investigated tractable classes of databases for more complex queries~\cite{AB15}. %,AB15c
@ -35,9 +35,11 @@ Fink et al.~\cite{FH12} study aggregate queries over a probabilistic version of
 % \cite{FH12} identifies a tractable class of queries involving aggregation.
 In contrast, we study a less general data model and query class, but provide a linear time approximation algorithm and provide new insights into the complexity of computing expectation (while~\cite{FH12} computes probabilities for individual output annotations).

-\textbf{Compressed Encodings} are used extensively for Boolean formulas (e.g, various types of circuits including OBDDs~\cite{jha-12-pdwm}) and polynomials (e.g.,factorizations~\cite{factorized-db}) some of which have been utilized for  probabilistic query processing, e.g.,~\cite{jha-12-pdwm}. Compact representations of Boolean formulas for which probabilities can be computed in linear time include OBDDs, SDDs, d-DNNF, and FBDD. In terms of circuits over semiring expression,~\cite{DM14c} studies circuits for absorptive semirings while~\cite{S18a} studies circuits that include negation (expressed as the monus operation of a semiring). Algebraic Decision Diagrams~\cite{bahar-93-al} (ADDs) generalize BDDs to variables with more than two values. Chen et al.~\cite{chen-10-cswssr} introduced the generalized disjunctive normal form.
+\noindent \textbf{Compressed Encodings} are used for Boolean formulas (e.g., various types of circuits including OBDDs~\cite{jha-12-pdwm}) and polynomials (e.g., factorizations~\cite{factorized-db}), some of which have been utilized for probabilistic query processing, e.g.,~\cite{jha-12-pdwm}.
+Compact representations for which probabilities can be computed in linear time include OBDDs, SDDs, d-DNNF, and FBDD. 
+\cite{DM14c} studies circuits for absorptive semirings while~\cite{S18a} studies circuits that include negation (expressed as the monus operation). Algebraic Decision Diagrams~\cite{bahar-93-al} (ADDs) generalize BDDs to variables with more than two values. Chen et al.~\cite{chen-10-cswssr} introduced the generalized disjunctive normal form.

-Additional discussion related work pertaining to fine-grained complexity appears in \Cref{sec:param-compl}.
+\noindent \Cref{sec:param-compl} covers more related work on fine-grained complexity.


 %%% Local Variables: