merged bib + wrote related work

master
Boris Glavic 2020-12-19 00:19:27 -06:00
parent cc989e88bc
commit 923cba6a34
8 changed files with 656 additions and 72 deletions

View File

@ -97,7 +97,7 @@ series = {PODS '07}
@inproceedings{BD05,
author = {Jihad Boulos and
Nilesh N. Dalvi and
Bhushan Mandhani and
@ -300,4 +300,3 @@ numpages = {12}
pages = {5--16},
year = {2017}
}

View File

@ -6,7 +6,7 @@
\AR{\textbf{Oliver/Boris:} What is missing from the intro is why would someone care about bag-PDBs in {\em practice}? This is kinda obliquely referred to in the first para but it would be good to motivate this more. The intro (rightly) focuses on the theoretical reasons to study bag PDBs but what (if any) are the practical significance of getting bag PDBs done in linear-time? Would this lead to much faster real-life PDB systems?}
Modern production databases like Postgres and Oracle use bag semantics, while research on probabilistic databases (PDBs)~\cite{DBLP:series/synthesis/2011Suciu,BD05,DBLP:conf/icde/AntovaKO07a,DBLP:conf/sigmod/SinghMMPHS08} focuses predominantly on query evaluation under set semantics.
This is not surprising, as the conventional strategy for encoding the lineage of a query result --- a key component of query evaluation in PDBs --- makes computing typical statistics like marginal probabilities or moments easy (at worst linear in the size of the lineage) for bags, and hence perhaps not worthy of research attention, but hard (at worst exponential in the size of the lineage) for sets, and hence interesting from a research perspective.
However, conventional encodings of a result's lineage are typically large, and even for Bag-PDBs, computing such statistics from lineage formulas still has a higher complexity than answering queries in a deterministic (i.e., non-probabilistic) database.
In this paper, we formally prove this limitation of PDBs, and address it by proposing an approximation algorithm that, to the best of our knowledge, is the first $(1-\epsilon)$-approximation for expectations of counts to have a runtime within a constant factor of deterministic query processing.
@ -19,10 +19,10 @@ Thus, the expectation of the multiplicity is the expectation of this polynomial.
Lineage in Set-PDBs is typically encoded in disjunctive normal form.
This representation is significantly larger than the query result sans lineage.
However, even with alternative encodings~\cite{FH13}, the limiting factor in computing marginal probabilities remains the probability computation itself, and not the lineage formula.
The corresponding lineage encoding for Bag-PDBs is a polynomial in sum of products (SOP) form --- a sum of `clauses', each of which is the product of a set of integer or variable atoms.
Thanks to linearity of expectation, computing the expectation of a count query is linear in the number of clauses in the SOP polynomial.
Unlike Set-PDBs, however, when we consider compressed representations of this polynomial, the complexity landscape becomes much more nuanced and is \textit{not} linear in general.
Such compressed representations, like Factorized Databases~\cite{10.1145/3003665.3003667,DBLP:conf/tapp/Zavodny11} or Arithmetic/Polynomial Circuits~\cite{arith-complexity}, are analogous to deterministic query optimizations (e.g., pushing down projections)~\cite{DBLP:conf/pods/KhamisNR16,10.1145/3003665.3003667}.
Thus, measuring the performance of a PDB algorithm in terms of the size of the \emph{compressed} lineage formula allows us to more closely relate the algorithm's performance to the complexity of query evaluation in a deterministic database.
@ -30,7 +30,7 @@ The initial picture is not good.
In this paper, we prove that computing expected counts is \emph{not} linear in the size of a compressed --- specifically a factorized~\cite{10.1145/3003665.3003667} --- lineage polynomial by reduction from counting $k$-matchings.
Thus, even bag PDBs do not enjoy the same computational complexity as deterministic databases.
This motivates our second goal, a linear time approximation algorithm for computing expected counts in a bag database, with complexity linear in the size of a factorized lineage formula.
As we will show, the size of the factorized
lineage formula for a query --- and by extension, our approximation algorithm --- is proportional to the complexity of evaluating the same query on a comparable deterministic database instance~\cite{DBLP:conf/pods/KhamisNR16,10.1145/3003665.3003667}.
In other words, our approximation algorithm can estimate expected multiplicities for tuples in the result of an SPJU query with a complexity comparable to deterministic query-processing.
@ -98,11 +98,11 @@ In other words, our approximation algorithm can estimate expected multiplicities
%\end{figure}
\begin{Example}\label{ex:intro}
Consider the Tuple Independent ($\ti$) Set-PDB\footnote{Our work also handles Block Independent Disjoint Databases ($\bi$)~\cite{BD05,DBLP:series/synthesis/2011Suciu}, we return to this model later.} given in \Cref{fig:intro-ex} with two input relations $R$ and $E$.
Each input tuple is assigned an annotation (attribute $\Phi$): an independent random Boolean variable ($W_i$) or the constant $\top$.
Each assignment of values to variables ($\{\;W_a,W_b,W_c\;\}\mapsto \{\;\top,\bot\;\}$) \SF{Do we need to state the meaning of $\top$ and $\bot$? Also do we want to add bag annotation to Figure 1 too since we are discussing both sets and bags later?} identifies one \emph{possible world}, a deterministic database instance that contains exactly the tuples annotated by the constant $\top$ or by a variable assigned to $\top$.
The probability of this world is the joint probability of the corresponding assignments.
For example, let $P[W_a] = P[W_b] = P[W_c] = p$ and consider the possible world where $R = \{\;\tuple{a}, \tuple{b}\;\}$.
The corresponding variable assignment is $\{\;W_a \mapsto \top, W_b \mapsto \top, W_c \mapsto \bot\;\}$, and the probability of this world is $P[W_a]\cdot P[W_b] \cdot P[\neg W_c] = p\cdot p\cdot (1-p)=p^2-p^3$.
\end{Example}
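% A hedged sanity-check sketch (not part of the paper text; all names below are
% made up): enumerating assignments to (W_a, W_b, W_c) under tuple independence
% confirms that the world with R = {<a>, <b>} has probability p^2 - p^3.
%   from math import prod, isclose
%   p = 0.3
%   def world_prob(w):                       # w = (w_a, w_b, w_c), independent variables
%       return prod(p if wi else 1 - p for wi in w)
%   # R = {<a>, <b>} corresponds to the assignment W_a = W_b = True, W_c = False
%   assert isclose(world_prob((True, True, False)), p**2 - p**3)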
@ -112,7 +112,7 @@ Without loss of generality, we assume that input relations are sets (i.e. $Dom(W
We contrast bag and set query evaluation with the following example:
\begin{Example}\label{ex:bag-vs-set}
Continuing the prior example, we are given the following Boolean (resp., count) query
$$\poly() :- R(A), E(A, B), R(B)$$
The lineage of the result in a Set-PDB (resp., Bag-PDB) is a Boolean (resp., polynomial) formula over random variables annotating the input relations (i.e., $W_a$, $W_b$, $W_c$).
Because the Boolean query has only a nullary relation, we write $Q(\cdot)$ to denote the function mapping variable assignments to a concrete value for the lineage in the corresponding possible world:
@ -138,7 +138,7 @@ P[\poly_{set}] &= \sum_{w_i \in \{\top,\bot\}} \mu(\poly_{set}(w_a, w_b, w_c))P[
\end{Example}
Note that the query of \Cref{ex:bag-vs-set} in set semantics is indeed \sharpphard, since it is non-hierarchical~\cite{10.1145/1265530.1265571}.
To see why computing this probability is hard, observe that the clauses of the disjunctive normal form Boolean lineage are neither independent nor disjoint, leading (e.g., in~\cite{FH13}) to the use of Shannon decomposition, which is at worst exponential in the size of the input.
% \begin{equation*}
% \expct\pbox{\poly(W_a, W_b, W_c)} = W_aW_b + W_a\overline{W_b}W_c + \overline{W_a}W_bW_c = 3\prob^2 - 2\prob^3
% \end{equation*}
@ -148,25 +148,25 @@ To see why computing this probability is hard, observe that the clauses of the d
%&W_aW_b \vee W_bW_c \vee W_cW_a
%= &W_a
%\end{align*}
Conversely, in Bag-PDBs, correlations between clauses of the SOP polynomial are not problematic thanks to linearity of expectation.
The expectation computation over the output lineage is simply the sum of expectations of each clause.
For \Cref{ex:intro}, the expectation is simply
{\small
\begin{align*}
\expct\pbox{\poly(W_a, W_b, W_c)} &= \expct\pbox{W_aW_b} + \expct\pbox{W_bW_c} + \expct\pbox{W_cW_a}\\
\intertext{\normalsize
In this particular lineage polynomial, all variables in each product clause are independent, so we can push expectations through.
}
&= \expct\pbox{W_a}\expct\pbox{W_b} + \expct\pbox{W_b}\expct\pbox{W_c} + \expct\pbox{W_c}\expct\pbox{W_a}
\end{align*}
}
Computing such expectations is indeed linear in the size of the SOP as the number of operations in the computation is \textit{exactly} the number of multiplication and addition operations of the polynomial.
As a further interesting feature of this example, note that $\expct\pbox{W_i} = P[W_i = 1]$, and so taking the same polynomial over the reals:
\begin{multline}
\label{eqn:can-inline-probabilities-into-polynomial}
\expct\pbox{\poly_{bag}} = P[W_a = 1]P[W_b = 1] + P[W_b = 1]P[W_c = 1]\\
+ P[W_c = 1]P[W_a = 1]\\
= \poly_{bag}(P[W_a=1], P[W_b=1], P[W_c=1])
\end{multline}
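% A hedged brute-force check of \Cref{eqn:can-inline-probabilities-into-polynomial}
% (illustrative sketch only; assumes P[W_i = 1] = p for all i):
%   from itertools import product
%   from math import prod, isclose
%   def poly_bag(wa, wb, wc):                # bag lineage polynomial from the example
%       return wa*wb + wb*wc + wc*wa
%   p = 0.3
%   expectation = sum(poly_bag(*w) * prod(p if wi else 1 - p for wi in w)
%                     for w in product((1, 0), repeat=3))
%   assert isclose(expectation, poly_bag(p, p, p))   # both equal 3 * p**2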
\begin{figure}[h!]
@ -207,12 +207,12 @@ As a further interesting feature of this example, note that $\expct\pbox{W_i} =
\end{figure}
\subsection{Superlinearity of Bag PDBs}
Moving forward, we focus exclusively on bags and drop the subscript from $\poly_{bag}$.
Consider the Cartesian product of $\poly$ with itself:
\begin{equation*}
\poly^2() := \rel(A), E(A, B), \rel(B),\; \rel(C), E(C, D), \rel(D)
\end{equation*}
For an arbitrary polynomial, it is known that there may exist equivalent compressed representations.
One such compression is the factorized polynomial~\cite{10.1145/3003665.3003667}, where the polynomial is broken up into separate factors.
For example:
{\small
@ -239,7 +239,7 @@ Observe that under the assumption that $Dom(W_i) = \{0, 1\}$, it is generally tr
This property leads us to consider another structure related to $\poly$.
% \AH{I don't know if we want to include the following statement: \par \emph{ bags are only hard with self-joins }
% \par Atri suggests a proof in the appendix regarding this claim.}
For any polynomial $\poly(\vct{X})$, we define the \emph{reduced polynomial} $\rpoly(\vct{X})$ to be the polynomial obtained by setting all exponents $e > 1$ in $\poly(\vct{X})$ to $1$.
With $\poly^2$ as an example, we have:
\begin{align*}
\rpoly^2(W_a, W_b, W_c) =&\; W_aW_b + W_bW_c + W_cW_a + 2W_aW_bW_c + 2W_aW_bW_c\\
@ -255,7 +255,7 @@ In prior work on PDBs, where this encoding is implicitly assumed, computing the
In general however, compressed encodings of the polynomial can be exponentially smaller in $k$ for $k$-products --- the query $\poly^k$ obtained by taking the Cartesian product of $k$ copies of $\poly$ has a factorized encoding of size $6\cdot k$, while the SOP encoding is of size $2\cdot 3^k$.
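% A hedged mechanical check of the reduced polynomial $\rpoly$ defined above
% (sketch only, not part of the paper; assumes independent $W_i \in \{0,1\}$
% with $P[W_i = 1] = p$): capping exponents at 1 and substituting probabilities
% reproduces the expectation of $\poly^2$.
%   import itertools, sympy as sp
%   Wa, Wb, Wc, p = sp.symbols('W_a W_b W_c p')
%   poly2 = sp.expand((Wa*Wb + Wb*Wc + Wc*Wa)**2)
%   rpoly2 = poly2.replace(lambda t: t.is_Pow, lambda t: t.base)   # set every e > 1 to 1
%   exp_poly2 = sum(poly2.subs({Wa: w[0], Wb: w[1], Wc: w[2]})
%                   * sp.Mul(*[p if wi else 1 - p for wi in w])
%                   for w in itertools.product((1, 0), repeat=3))
%   assert sp.simplify(exp_poly2 - rpoly2.subs({Wa: p, Wb: p, Wc: p})) == 0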
This leads us to the \textbf{central question of this paper}:
\begin{quote}
{\em
Is it always the case that the expectation of a UCQ in a Bag-PDB can be computed in time linear in the size of the \emph{compressed} lineage polynomial?}
\end{quote}
If the answer is yes, then it is possible for Bag-PDBs to achieve performance competitive with deterministic databases.
@ -268,8 +268,8 @@ The answer, unfortunately, is no, and an approximation algorithm is required.
% The factorized output polynomial consists of a product of three identical three-way summations, while the SOP encoding is exponential --- $3^3$ clauses to be precise.
\subsection{Overview of our results and techniques}
Concretely, in this paper:
(i) We show that conjunctive queries over a bag-$\ti$ are hard (i.e., superlinear in the size of a compressed lineage encoding) by reduction from counting the number of $k$-matchings over an arbitrary graph;
(ii) We present a $(1-\epsilon)$-approximation algorithm for bag-$\ti$s and show that its complexity is linear in the size of the compressed lineage encoding;
(iii) We generalize the approximation algorithm to bag-$\bi$s, a more general model of probabilistic data;
(iv) We further generalize our results to higher moments and to polynomial circuits, and prove that for RA+ queries, the runtime of our approximation is within a constant factor of the runtime of processing the same query deterministically.

View File

@ -340,6 +340,7 @@
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\sharpphard}{\#P-hard\xspace}
\newcommand{\sharpwonehard}{\#W[1]-hard\xspace}
\newcommand{\ptime}{PTIME\xspace}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% TERMINOLOGY AND ABBREVIATIONS

641
main.bib
View File

@ -1,50 +1,603 @@
@misc{pdbench,
howpublished = {\url{http://pdbench.sourceforge.net/}},
note = {Accessed: 2020-12-15},
title = {pdbench}
}
@article{AF18,
author = {Arab, Bahareh and Feng, Su and Glavic, Boris and Lee, Seokki and Niu, Xing and Zeng, Qitian},
journal = {IEEE Data Eng. Bull.},
number = {1},
pages = {51--62},
title = {GProM - A Swiss Army Knife for Your Provenance Needs},
volume = {41},
year = {2018}
}
@inproceedings{10.1145/1265530.1265571,
author = {Dalvi, Nilesh and Suciu, Dan},
booktitle = {PODS},
numpages = {10},
pages = {293--302},
title = {The Dichotomy of Conjunctive Queries on Probabilistic Structures},
year = {2007}
}
@inproceedings{DBLP:conf/icde/OlteanuHK10,
author = {Dan Olteanu and
Jiewen Huang and
Christoph Koch},
booktitle = {ICDE},
pages = {145--156},
title = {Approximate confidence computation in probabilistic databases},
year = {2010}
}
@book{DBLP:series/synthesis/2011Suciu,
author = {Dan Suciu and
Dan Olteanu and
Christopher Ré and
Christoph Koch},
publisher = {Morgan \& Claypool Publishers},
title = {Probabilistic Databases},
year = {2011}
}
@inproceedings{feng:2019:sigmod:uncertainty,
author = {Feng, Su and Huber, Aaron and Glavic, Boris and Kennedy, Oliver},
booktitle = {SIGMOD},
title = {Uncertainty Annotated Databases - A Lightweight Approach for Approximating Certain Answers},
year = {2019}
}
@article{FH12,
author = {Fink, Robert and Han, Larisa and Olteanu, Dan},
journal = {PVLDB},
number = {5},
pages = {490--501},
title = {Aggregation in probabilistic databases via knowledge compilation},
volume = {5},
year = {2012}
}
@inproceedings{DBLP:conf/tapp/Zavodny11,
author = {Jakub Závodný},
booktitle = {TaPP},
editor = {Peter Buneman and
Juliana Freire},
title = {On Factorisation of Provenance Polynomials},
year = {2011}
}
@inproceedings{kennedy:2010:icde:pip,
author = {Kennedy, Oliver and Koch, Christoph},
booktitle = {ICDE},
title = {PIP: A Database System for Great and Small Expectations},
year = {2010}
}
@inproceedings{DBLP:conf/icde/AntovaKO07a,
author = {Lyublena Antova and
Christoph Koch and
Dan Olteanu},
booktitle = {ICDE},
editor = {Rada Chirkova and
Asuman Dogac and
M. Tamer Özsu and
Timos K. Sellis},
pages = {1479--1480},
title = {MayBMS: Managing Incomplete Information with Probabilistic World-Set
Decompositions},
year = {2007}
}
@misc{Antova_fastand,
author = {Lyublena Antova and Thomas Jansen and Christoph Koch and Dan Olteanu},
title = {Fast and Simple Relational Processing of Uncertain Data},
year = {}
}
@inproceedings{DBLP:conf/pods/KhamisNR16,
author = {Mahmoud Abo Khamis and
Hung Q. Ngo and
Atri Rudra},
booktitle = {PODS},
pages = {13--28},
title = {FAQ: Questions Asked Frequently},
year = {2016}
}
@article{10.1145/3003665.3003667,
author = {Olteanu, Dan and Schleich, Maximilian},
journal = {SIGMOD Rec.},
number = {2},
numpages = {12},
pages = {5--16},
title = {Factorized Databases},
volume = {45},
year = {2016}
}
@article{DBLP:journals/sigmod/GuagliardoL17,
author = {Paolo Guagliardo and
Leonid Libkin},
journal = {SIGMOD Rec.},
number = {3},
pages = {5--16},
title = {Correctness of SQL Queries on Databases with Nulls},
volume = {46},
year = {2017}
}
@inproceedings{DBLP:conf/vldb/AgrawalBSHNSW06,
author = {Parag Agrawal and
Omar Benjelloun and
Anish Das Sarma and
Chris Hayworth and
Shubha U. Nabar and
Tomoe Sugihara and
Jennifer Widom},
booktitle = {VLDB},
pages = {1151--1154},
title = {Trio: A System for Data, Uncertainty, and Lineage},
year = {2006}
}
@inproceedings{k-match,
author = {Radu Curticapean},
booktitle = {Automata, Languages, and Programming - 40th International Colloquium,
ICALP 2013, Riga, Latvia, July 8-12, 2013, Proceedings, Part I},
editor = {Fedor V. Fomin and
Rusins Freivalds and
Marta Z. Kwiatkowska and
David Peleg},
pages = {352--363},
title = {Counting Matchings of Size k Is W[1]-Hard},
volume = {7965},
year = {2013}
}
@inproceedings{DBLP:conf/sigmod/SinghMMPHS08,
author = {Sarvjeet Singh and
Chris Mayfield and
Sagar Mittal and
Sunil Prabhakar and
Susanne E. Hambrusch and
Rahul Shah},
booktitle = {SIGMOD},
pages = {1239--1242},
title = {Orion 2.0: native support for uncertain data},
year = {2008}
}
@inproceedings{DBLP:conf/pods/GreenKT07,
author = {Todd J. Green and
Grigoris Karvounarakis and
Val Tannen},
booktitle = {PODS},
pages = {31--40},
title = {Provenance semirings},
year = {2007}
}
@article{factorized-db,
author = {Dan Olteanu and
Maximilian Schleich},
journal = {SIGMOD Rec.},
number = {2},
pages = {5--16},
title = {Factorized Databases},
volume = {45},
year = {2016}
}
@inproceedings{ngo-survey,
author = {Hung Q. Ngo},
booktitle = {Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles
of Database Systems, Houston, TX, USA, June 10-15, 2018},
editor = {Jan Van den Bussche and
Marcelo Arenas},
pages = {111--124},
title = {Worst-Case Optimal Join Algorithms: Techniques, Results, and Open
Problems},
year = {2018}
}
@article{skew,
author = {Hung Q. Ngo and
Christopher Ré and
Atri Rudra},
journal = {SIGMOD Rec.},
number = {4},
pages = {5--16},
title = {Skew strikes back: new developments in the theory of join algorithms},
volume = {42},
year = {2013}
}
@article{NPRR,
author = {Hung Q. Ngo and
Ely Porat and
Christopher Ré and
Atri Rudra},
journal = {J. ACM},
number = {3},
pages = {16:1--16:40},
title = {Worst-case Optimal Join Algorithms},
volume = {65},
year = {2018}
}
@book{arith-complexity,
author = {Peter Bürgisser and
Michael Clausen and
Mohammad Amin Shokrollahi},
publisher = {Springer},
title = {Algebraic complexity theory},
volume = {315},
year = {1997}
}
@inproceedings{triang-hard,
author = {Tsvi Kopelowitz and
Virginia Vassilevska Williams},
booktitle = {ICALP},
editor = {Artur Czumaj and
Anuj Dawar and
Emanuela Merelli},
pages = {74:1--74:16},
title = {Towards Optimal Set-Disjointness and Set-Intersection Data Structures},
volume = {168},
year = {2020}
}
@article{LL97,
author = {Lakshmanan, L.V.S. and Leone, N. and Ross, R. and Subrahmanian, VS},
journal = {TODS},
number = {3},
pages = {419--469},
title = {Probview: A flexible probabilistic database system},
volume = {22},
year = {1997}
}
@article{jha-13-kcmdt,
author = {Jha, Abhay and Suciu, Dan},
title = {Knowledge Compilation Meets Database Theory: Compiling Queries
To Decision Diagrams},
journal = {Theory of Computing Systems},
volume = 52,
number = 3,
pages = {403--440},
year = 2013,
publisher = {Springer},
}
@inproceedings{BS06,
author = {Omar Benjelloun and Anish Das Sarma and Alon Y. Halevy and Jennifer Widom},
booktitle = {VLDB},
pages = {953--964},
title = {ULDBs: Databases with Uncertainty and Lineage},
year = {2006}
}
@inproceedings{RS07,
author = {Ré, C. and Suciu, D.},
booktitle = {VLDB},
pages = {51--62},
title = {Materialized views in probabilistic databases: for information exchange and query optimization},
year = {2007}
}
@article{VS17,
Author = {Van den Broeck, Guy and Suciu, Dan},
Title = {Query Processing on Probabilistic Data: A Survey},
Year = {2017},
}
@incollection{GT06,
author = {Green, Todd J and Tannen, Val},
booktitle = {EDBT},
pages = {278--296},
title = {Models for incomplete and probabilistic information},
year = {2006}
}
@article{IL84a,
author = {Imieli\'nski, Tomasz and Lipski Jr, Witold},
journal = {JACM},
number = {4},
pages = {761--791},
title = {Incomplete Information in Relational Databases},
volume = {31},
year = {1984}
}
@article{DS12,
author = {Dalvi, Nilesh and Suciu, Dan},
journal = {JACM},
number = {6},
pages = {30},
title = {The dichotomy of probabilistic inference for unions of conjunctive queries},
volume = {59},
year = {2012}
}
@inproceedings{heuvel-19-anappdsd,
author = {Maarten Van den Heuvel and Peter Ivanov and Wolfgang Gatterbauer and Floris Geerts and Martin Theobald},
booktitle = {SIGMOD},
pages = {1295--1312},
title = {Anytime Approximation in Probabilistic Databases via Scaled Dissociations},
year = {2019}
}
@article{AB15,
author = {Amarilli, Antoine and Bourhis, Pierre and Senellart, Pierre},
journal = {PODS},
title = {Probabilities and provenance via tree decompositions},
year = {2015}
}
@inproceedings{OH09a,
author = {Olteanu, Dan and Huang, Jiewen},
booktitle = {SIGMOD},
pages = {389--402},
title = {Secondary-storage confidence computation for conjunctive queries with inequalities},
year = {2009}
}
@article{OS16,
author = {Olteanu, Dan and Schleich, Maximilian},
journal = {SIGMOD Record},
number = {2},
pages = {5--16},
title = {Factorized Databases},
volume = {45},
year = {2016}
}
@article{FO16,
author = {Robert Fink and Dan Olteanu},
journal = {TODS},
number = {1},
pages = {4:1--4:47},
title = {Dichotomies for Queries with Negation in Probabilistic Databases},
volume = {41},
year = {2016}
}
@article{FH13,
author = {Robert Fink and Jiewen Huang and Dan Olteanu},
journal = {VLDBJ},
number = {6},
pages = {823--848},
title = {Anytime approximation in probabilistic databases},
volume = {22},
year = {2013}
}
@inproceedings{AB15c,
author = {Antoine Amarilli and Pierre Bourhis and Pierre Senellart},
booktitle = {Automata, Languages, and Programming - 42nd International Colloquium, ICALP 2015, Kyoto, Japan, July 6-10, 2015, Proceedings, Part II},
pages = {56--68},
title = {Provenance Circuits for Trees and Treelike Instances},
year = {2015}
}
@inproceedings{kenig-13-nclexpdc,
author = {Batya Kenig and Avigdor Gal and Ofer Strichman},
booktitle = {SUM},
editor = {Weiru Liu and V. S. Subrahmanian and Jef Wijsen},
pages = {219--232},
title = {A New Class of Lineage Expressions over Probabilistic Databases Computable in P-Time},
volume = {8078},
year = {2013}
}
@inproceedings{cavallo-87-tpd,
author = {Roger Cavallo and Michael Pittarelli},
booktitle = {VLDB},
editor = {Peter M. Stocker and William Kent and Peter Hammersley},
pages = {71--81},
title = {The Theory of Probabilistic Databases},
year = {1987}
}
@inproceedings{roy-11-f,
author = {Sudeepa Roy and Vittorio Perduca and Val Tannen},
booktitle = {ICDT},
editor = {Tova Milo},
pages = {232--243},
title = {Faster query answering in probabilistic databases using read-once functions},
year = {2011}
}
@article{sen-10-ronfqevpd,
author = {Prithviraj Sen and Amol Deshpande and Lise Getoor},
journal = {PVLDB},
number = {1},
pages = {1068--1079},
title = {Read-Once Functions and Query Evaluation in Probabilistic Databases},
volume = {3},
year = {2010}
}
@article{provan-83-ccccptg,
author = {J. Scott Provan and Michael O. Ball},
journal = {SIAM J. Comput.},
number = {4},
pages = {777--788},
title = {The Complexity of Counting Cuts and of Computing the Probability That a Graph Is Connected},
volume = {12},
year = {1983}
}
@article{valiant-79-cenrp,
author = {Leslie G. Valiant},
journal = {SIAM J. Comput.},
number = {3},
pages = {410--421},
title = {The Complexity of Enumeration and Reliability Problems},
volume = {8},
year = {1979}
}
@inproceedings{AD11d,
author = {Amsterdamer, Yael and Deutch, Daniel and Tannen, Val},
booktitle = {PODS},
pages = {153--164},
title = {Provenance for Aggregate Queries},
year = {2011}
}
@article{S18a,
author = {Senellart, Pierre},
journal = {SIGMOD Record},
number = {4},
pages = {5--15},
title = {Provenance and Probabilities in Relational Databases},
volume = {46},
year = {2018}
}
@article{RS09b,
author = {Christopher Ré and Dan Suciu},
journal = {VLDBJ},
number = {5},
pages = {1091--1116},
title = {The trichotomy of HAVING queries on a probabilistic database},
volume = {18},
year = {2009}
}
@article{gatterbauer-17-dpaplinws,
author = {Wolfgang Gatterbauer and Dan Suciu},
title = {Dissociation and Propagation for Approximate Lifted Inference
With Standard Relational Database Management Systems},
journal = {{VLDB} J.},
volume = 26,
number = 1,
pages = {5--30},
year = 2017,
doi = {10.1007/s00778-016-0434-5},
url = {https://doi.org/10.1007/s00778-016-0434-5},
bibsource = {dblp computer science bibliography, https://dblp.org},
biburl = {https://dblp.org/rec/journals/vldb/GatterbauerS17.bib},
timestamp = {Sun, 02 Jun 2019 20:52:24 +0200},
}
@inproceedings{fink-11,
author = {Robert Fink and Dan Olteanu},
booktitle = {ICDT},
editor = {Tova Milo},
pages = {174--185},
title = {On the optimal approximation of queries using tractable propositional languages},
year = {2011}
}
@article{jha-12-pdwm,
author = {Abhay Kumar Jha and Dan Suciu},
title = {Probabilistic Databases With Markoviews},
journal = {Proc. {VLDB} Endow.},
volume = {5},
number = {11},
pages = {1160--1171},
year = {2012},
doi = {10.14778/2350229.2350236},
url = {https://doi.org/10.14778/2350229.2350236},
bibsource = {dblp computer science bibliography, https://dblp.org},
biburl = {https://dblp.org/rec/journals/pvldb/JhaS12.bib},
timestamp = {Sat, 25 Apr 2020 13:59:35 +0200},
}
@inproceedings{BD05,
author = {Boulos, J. and Dalvi, N. and Mandhani, B. and Mathur, S. and Re, C. and Suciu, D.},
booktitle = {SIGMOD},
pages = {891--893},
title = {MYSTIQ: a system for finding more answers by using probabilities},
year = {2005}
}
@article{DS07,
author = {Dalvi, N. and Suciu, D.},
journal = {VLDB},
number = {4},
pages = {544},
title = {Efficient query evaluation on probabilistic databases},
volume = {16},
year = {2007}
}
@inproceedings{re-07-eftqevpd,
author = {Christopher Ré and Nilesh N. Dalvi and Dan Suciu},
booktitle = {ICDE},
editor = {Rada Chirkova and Asuman Dogac and M. Tamer Özsu and Timos K. Sellis},
pages = {886--895},
title = {Efficient Top-k Query Evaluation on Probabilistic Data},
year = {2007}
}
@inproceedings{DM14c,
author = {Deutch, Daniel and Milo, Tova and Roy, Sudeepa and Tannen, Val},
booktitle = {ICDT},
pages = {201--212},
title = {Circuits for Datalog Provenance},
year = {2014}
}
@proceedings{DBLP:conf/iccad/1993,
editor = {Michael R. Lightner and
Jochen A. G. Jess},
title = {Proceedings of the 1993 IEEE/ACM International Conference on Computer-Aided
Design, 1993, Santa Clara, California, USA, November 7-11, 1993},
year = {1993}
}
@inproceedings{bahar-93-al,
author = {R. Iris Bahar and Erica A. Frohm and Charles M. Gaona and Gary
D. Hachtel and Enrico Macii and Abelardo Pardo and Fabio
Somenzi},
booktitle = {Proceedings of the 1993 IEEE/ACM International Conference on
Computer-Aided Design, 1993, Santa Clara, California, USA,
November 7-11, 1993},
pages = {188--191},
title = {Algebraic decision diagrams and their applications},
year = {1993}
}
@proceedings{DBLP:conf/uai/2013,
editor = {Ann Nicholson and
Padhraic Smyth},
title = {Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial
Intelligence, UAI 2013, Bellevue, WA, USA, August 11-15, 2013},
year = {2013}
}
@inproceedings{gogate-13-smp,
author = {Vibhav Gogate and Pedro M. Domingos},
booktitle = {Proceedings of the Twenty-Ninth Conference on Uncertainty in
Artificial Intelligence, UAI 2013, Bellevue, WA, USA, August
11-15, 2013},
title = {Structured Message Passing},
year = {2013}
}
@article{chen-10-cswssr,
author = {Hubie Chen and Martin Grohe},
journal = {J. Comput. Syst. Sci.},
number = {8},
pages = {847--860},
title = {Constraint Satisfaction With Succinctly Specified Relations},
volume = {76},
year = {2010}
}

View File

@ -178,7 +178,7 @@ sensitive=true
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\bibliographystyle{plain}
\bibliography{main}

View File

@ -108,7 +108,7 @@ By definition, $\rpoly_{G}^{\kElem}(\vct{X})$ sets every exponent $e > 1$ to $e
\rpoly_{G}^{\kElem}(\prob,\ldots, \prob) = \sum_{i = 0}^{2\kElem} c_i \prob^i
\end{equation*}
We note that $c_i$ is {\em exactly} the number of monomials in the \abbrSMB expansion of $\poly_{G}^{\kElem}(\vct{X})$ composed of $i$ distinct variables.%, with $\prob$ substituted for each distinct variable
\footnote{Since $\rpoly_G^\kElem(\vct{X})$ does not have any monomial with degree $< 2$, it is the case that $c_0 = c_1 = 0$ but for the sake of simplicity we will ignore this observation.}
Given that we then have $2\kElem + 1$ distinct values of $\rpoly_{G}^\kElem(\prob,\ldots, \prob)$ for $0\leq i\leq2\kElem$, it follows that
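% Side note (a hedged sketch, not from the paper): the $2\kElem + 1$ evaluations
% determine the coefficients $c_i$ by standard polynomial interpolation, e.g. by
% solving a Vandermonde system; all concrete values below are made-up placeholders.
%   import numpy as np
%   k = 3
%   true_c = np.array([0., 0., 3., 4., 7., 2., 5.])      # hypothetical c_0 .. c_{2k}
%   rpoly_eval = lambda p: sum(c * p**i for i, c in enumerate(true_c))  # oracle stand-in
%   points = np.arange(1.0, 2*k + 2)                      # 2k+1 distinct values of p
%   V = np.vander(points, N=2*k + 1, increasing=True)     # V[j, i] = points[j]**i
%   c = np.linalg.solve(V, np.array([rpoly_eval(p) for p in points]))
%   assert np.allclose(c, true_c)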

View File

@ -1,5 +1,33 @@
\section{Related Work}\label{sec:related-work}
In addition to work on probabilistic databases, our work has connections to work on compact representations of polynomials and relies on past work in fine-grained complexity.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Probabilistic Databases}\label{sec:prob-datab}
Probabilistic databases have been studied predominantly under set semantics.
A multitude of probabilistic data models have been proposed for encoding a probabilistic database more compactly than as its set of possible worlds. Tuple-independent databases (\tis) consist of a classical database where each tuple is associated with a probability and tuples are treated as independent probabilistic events. In spite of their inability to encode correlations, \tis have received much attention, because it was shown that any finite probabilistic database can be encoded as a \ti plus a set of constraints that ``condition'' the \ti~\cite{VS17}. Block-independent databases (\bis) generalize \tis by partitioning the input into blocks, where tuples within each block are disjoint events and blocks are independent~\cite{RS07,BS06}. \emph{PC-tables}~\cite{GT06} pair a C-table~\cite{IL84a} with a probability distribution for each of its variables. This is similar to the $\semNX$-PDBs we use here, except that we do not allow variables as attribute values, and instead of local conditions (propositional formulas that may contain comparisons) we annotate tuples with polynomials from $\semNX$.
Approaches for probabilistic query processing, i.e., computing the marginal probability of each result tuple of a query over a probabilistic database, fall into two broad categories. \emph{Intensional} (or \emph{grounded}) query evaluation approaches compute the \emph{lineage} of a tuple, a Boolean formula encoding the provenance of the tuple, and then compute the probability of the lineage formula. In this paper we also focus on intensional query evaluation, but use polynomials instead of Boolean formulas to deal with multisets. It is a well-known fact that computing the probability of a tuple in the result of a query over a probabilistic database (the \emph{marginal probability} of the tuple) is \sharpphard, which can be proven through a reduction from weighted model counting~\cite{provan-83-ccccptg,valiant-79-cenrp} using the fact that the probability of a tuple's lineage formula is equal to the marginal probability of the tuple. The second category, \emph{extensional} query evaluation, avoids calculating the lineage. This approach is in \ptime, but is limited to certain classes of queries. Dalvi and Suciu~\cite{DS12} proved a dichotomy for unions of conjunctive queries (UCQs): for any UCQ the probabilistic query evaluation problem is either \sharpphard or in \ptime. Fink and Olteanu~\cite{FO16} present dichotomies for two classes of queries with negation, and R\'e and Suciu~\cite{RS09b} present a trichotomy for HAVING queries. Amarilli et al. investigate tractable classes of databases for more complex queries~\cite{AB15,AB15c}. Another line of work studies which structural properties of lineage formulas lead to tractable cases~\cite{kenig-13-nclexpdc,roy-11-f,sen-10-ronfqevpd}.
Several techniques for approximating the probability of a query result tuple have been proposed in related work~\cite{FH13,heuvel-19-anappdsd,DBLP:conf/icde/OlteanuHK10,DS07,re-07-eftqevpd}. These approaches either rely on Monte Carlo sampling, e.g., \cite{DS07,re-07-eftqevpd}, or a branch-and-bound paradigm~\cite{DBLP:conf/icde/OlteanuHK10,fink-11}. The approximation algorithm for bag expectation we present in this work is based on sampling.
Fink et al.~\cite{FH12} study aggregate queries over a probabilistic version of the extension of K-relations for aggregate queries proposed in~\cite{AD11d} (this data model is referred to as \emph{pvc-tables}). As an extension of K-relations, this approach supports bags. Probabilities are computed using a decomposition approach~\cite{DBLP:conf/icde/OlteanuHK10} over the symbolic expressions that are used as tuple annotations and values in pvc-tables. \cite{FH12} identifies a tractable class of queries involving aggregation. In contrast, we study a less general data model and query class, but provide a linear-time approximation algorithm and new insights into the complexity of computing expectations (while \cite{FH12} computes probabilities for individual output annotations).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Compact Representations of Polynomials and Boolean Formulas}\label{sec:comp-repr-polyn}
There is a large body of work on using compact representations of Boolean formulas (e.g., various types of circuits including OBDDs~\cite{jha-12-pdwm}) and of polynomials (e.g., factorizations~\cite{OS16,DBLP:conf/tapp/Zavodny11}), some of which have been utilized for probabilistic query processing, e.g.,~\cite{jha-12-pdwm}. Compact representations of Boolean formulas for which probabilities can be computed in linear time include OBDDs, SDDs, d-DNNFs, and FBDDs. In terms of circuits over semiring expressions,~\cite{DM14c} studies circuits for absorptive semirings while~\cite{S18a} studies circuits that include negation (expressed as the monus operation of a semiring). Algebraic Decision Diagrams~\cite{bahar-93-al} (ADDs) generalize BDDs to variables with more than two values. Chen et al.~\cite{chen-10-cswssr} introduced the generalized disjunctive normal form.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Parameterized Complexity Theory}\label{sec:param-compl-theory}
In~\Cref{sec:hard}, we utilized common conjectures from fine-grained complexity theory.
\BG{ATRI: Parameterized complexity discussion}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"

View File

@ -26,15 +26,18 @@ If we can compute $\rpoly_{G}^3(\prob,\dots,\prob)$ exactly in $T(\numedge)$ tim
in $O\inparen{T(\numedge) + \numedge}$ time.
\end{Theorem}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
We now use \Cref{th:single-p} to prove \Cref{th:single-p-hard}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{proof}[Proof of \Cref{th:single-p-hard}]
For the sake of contradiction, let us assume that for any $G$, we can compute $\rpoly_{G}^3(\prob,\dots,\prob)$ in $o\inparen{m^{1+\eps_0}}$ time.
Let $G$ be the input graph. It is easy to see that one can compute the expression tree for $\poly_{G}^3(\vct{X})$ in $O(m)$ time. Then by \Cref{th:single-p} we can compute $\numocc{G}{\tri}$, $\numocc{G}{\threepath}$ and $\numocc{G}{\threedis}$ in further $o\inparen{m^{1+\eps_0}}+O(m)$ time. Thus, overall, the reduction takes $o\inparen{m^{1+\eps_0}}+O(m)= o\inparen{m^{1+\eps_0}}$ time, which violates \Cref{conj:graph}.
\end{proof}
\qed
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Before moving on to prove \Cref{th:single-p}, let us state the results, lemmas and definitions that will be useful in the proof.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Preliminaries and Notation}
@ -73,7 +76,7 @@ For any graph $G$, the following formulas for $\numocc{G}{H}$ for their respecti
\subsubsection{The proofs}
Note that $\rpoly_{G}^3(\prob,\ldots, \prob)$ as a polynomial in $\prob$ has degree at most six. Next, we figure out the exact coefficients since this would be useful in our arguments:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Lemma}\label{lem:qE3-exp}
%When we expand $\poly_{G}^3(\vct{X})$ out and assign all exponents $e \geq 1$ a value of $1$, we have the following result,
@ -99,7 +102,7 @@ Let $e_1 = (i_1, j_1), e_2 = (i_2, j_2), e_3 = (i_3, j_3)$. Notice that each ex
This implies that all $3 + 3 = 6$ combinations of two distinct edges $e$ and $e'$ contribute to the same monomial in $\rpoly_{G}^3$. % consist of the same monomial in $\rpoly$, i.e. $(e_1, e_1, e_2)$ is the same as $(e_2, e_1, e_2)$.
Since $e\ne e'$, this case produces the following edge patterns: $\twopath, \twodis$, which contribute $p^3$ and $p^4$ respectively to $\rpoly_{G}^3\left(\prob,\ldots, \prob\right)$.
\textsc{case 3:} All $e_1,e_2$ and $e_3$ are distinct. For this case, we have $3! = 6$ permutations of $(e_1, e_2, e_3)$, each of which contribute to a different monomial in the SOP (see \Cref{def:expand-tree}) expansion of $\poly_{G}^3(\vct{X})$. This case consists of the following edge patterns: $\tri, \oneint, \threepath, \twopathdis, \threedis$, which contribute $p^3,p^4,p^4,p^5$ and $p^6$ respectively to $\rpoly_{G}^3\left(\prob,\ldots, \prob\right)$.
\end{proof}
\qed
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
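% A hedged sanity check of the case analysis above (illustrative sketch only; the
% small graph is made up, and we assume $\poly_G(\vct{X}) = \sum_{(i,j)\in E} X_iX_j$
% as in this section): summing p^{#distinct vertices} over ordered edge triples
% equals the expectation of $\poly_G(\vct{X})^3$ over independent $X_i \sim$ Bernoulli(p).
%   from itertools import product
%   from math import prod, isclose
%   E = [(0, 1), (1, 2), (0, 2), (2, 3)]      # a triangle plus a pendant edge
%   n, p = 4, 0.3
%   poly_G = lambda x: sum(x[i] * x[j] for i, j in E)
%   rpoly3 = sum(p ** len({v for e in (e1, e2, e3) for v in e})
%                for e1 in E for e2 in E for e3 in E)
%   expectation = sum(poly_G(x)**3 * prod(p if xi else 1 - p for xi in x)
%                     for x in product((1, 0), repeat=n))
%   assert isclose(rpoly3, expectation)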