Notation changes started.

master
Aaron Huber 2021-06-09 12:42:26 -04:00
parent 742108ef26
commit 4fe79fd1c7
10 changed files with 334 additions and 307 deletions

Binary file not shown.

Binary file not shown.

View File

@ -140,7 +140,7 @@ level 2/.style={sibling distance=0.7cm},
\begin{algorithmic}[1]
\Require \circuit: Circuit
\Ensure \circuit: Annotated Circuit
\Ensure \vari{sum} $\in \reals$
\Ensure \vari{sum} $\in \domR$
\State $\circuit' \gets \reduce(\circuit)$
\For{\gate in \topord(\circuit')}\label{alg:one-pass-loop}\Comment{\topord($\cdot$) is the topological order of \circuit}
\If{\gate.\type $=$ \var}

View File

@ -5,8 +5,8 @@ The evaluation of $\abs{\circuit}(1,\ldots, 1)$ can be defined recursively, as f
\begin{align}
\label{eq:T-all-ones}
\abs{\circuit}(1,\ldots, 1) = \begin{cases}
\abs{\circuit_\linput}(1,\ldots, 1) \cdot \abs{\circuit_\rinput}(1,\ldots, 1) &\textbf{if }\circuit.\type = \revision{\circmult}\\
\abs{\circuit_\linput}(1,\ldots, 1) + \abs{\circuit_\rinput}(1,\ldots, 1) &\textbf{if }\circuit.\type = \revision{\circplus} \\
\abs{\circuit_\linput}(1,\ldots, 1) \cdot \abs{\circuit_\rinput}(1,\ldots, 1) &\textbf{if }\circuit.\type = \circmult\\
\abs{\circuit_\linput}(1,\ldots, 1) + \abs{\circuit_\rinput}(1,\ldots, 1) &\textbf{if }\circuit.\type = \circplus \\
|\circuit.\val| &\textbf{if }\circuit.\type = \tnum\\
1 &\textbf{if }\circuit.\type = \var.
\end{cases}

View File

@ -21,7 +21,7 @@ We now introduce useful definitions and notation related to circuits and polynom
\begin{Definition}[$\expansion{\circuit}$]\label{def:expand-circuit}
For a circuit $\circuit$, we define $\expansion{\circuit}$ as a list of tuples $(\monom, \coef)$, where $\monom$ is a set of variables and $\coef \in \reals$.
For a circuit $\circuit$, we define $\expansion{\circuit}$ as a list of tuples $(\monom, \coef)$, where $\monom$ is a set of variables and $\coef \in \domR$.
$\expansion{\circuit}$ has the following recursive definition ($\circ$ is list concatenation).
$\expansion{\circuit} =

View File

@ -6,10 +6,10 @@
A \emph{probabilistic database} $\pdb = (\idb, \pd)$ is set of deterministic databases $\idb = \{ \db_1, \ldots, \db_n\}$ called possible worlds, paired with a probability distribution $\pd$ over these worlds.
A well-studied problem in probabilistic databases is to take a query $\query$ and a probabilistic database $\pdb$, and compute the \emph{marginal probability} of a tuple $\tup$ (i.e., its probability of appearing in the result of query $\query$ over $\pdb$).
This problem is \sharpphard for set semantics, even for \emph{tuple-independent probabilistic databases}~\cite{DBLP:series/synthesis/2011Suciu} (TIDBs), which are a subclass of probabilistic databases where tuples are independent events. The dichotomy of Dalvi and Suciu~\cite{10.1145/1265530.1265571} separates the hard cases, from cases that are in \ptime for unions of conjunctive queries (UCQs).
In this work we consider bag semantics, where each tuple is associated with a multiplicity $\db_i(\tup)$ in each possible world $\db_i$ and study the analogous problem of computing the expectation of the multiplicity of a query result tuple $\tup$ (denoted $\query(\db)(t)$):
In this work we consider bag semantics, where each tuple is associated with a multiplicity $\db_i(\tup)$ in each possible world $\db_i$ and study the analogous problem of computing the expectation (where $\overline{\db}$ denotes a random variable) of the multiplicity of a query result tuple $\tup$ (denoted $\query(\db)(t)$):
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{equation}\label{eq:intro-bag-expectation}
\expct_{\overline{\mathbf{D}} \sim \probDist}[\query(\overline{\mathbf{D}})(t)] = \sum_{\db \in \idb} \query(\db)(t) \cdot \pd(\db) \hspace{2cm}\text{\textbf{(Expected Result Multiplicity)}}
\expct_{\randDB \sim \pd}[\query(\overline{D})(t)] = \sum_{\db \in \idb} \query(\db)(t) \cdot \probOf\pbox{\db} \hspace{2cm}\text{\textbf{(Expected Result Multiplicity)}}
\end{equation}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@ -130,7 +130,7 @@ A well-known result in probabilistic databases is that under set semantics, the
over random variables
($\vct{X}=(X_1,\dots,X_n)$)
that encode the existence of input tuples. Each possible world $\db$ corresponds to an assignment $\{0,1\}^\numvar$ of the variables in $\vct{X}$ to either true (the tuple exists in this world) or false (the tuple does not exist in this world). Importantly, the following holds: if the lineage formula for $t$ evaluates to true under the assignment for a world $\db$, then $\tup \in \query(\db)$.
Thus, the marginal probability of tuple $\tup$ is equal to the probability that its lineage evaluates to true (with respect to the obvious analog of probability distribution $\probDist$ defined over $\vct{X}$).
Thus, the marginal probability of tuple $\tup$ is equal to the probability that its lineage evaluates to true (with respect to the obvious analog of probability distribution $\pd$ defined over $\Omega$ and induced over $\vct{X}$).
For bag semantics, the lineage of a tuple is a polynomial over variables $\vct{X}=(X_1,\dots,X_n)$ with % \in \mathbb{N}^\numvar$ with
coefficients in the set of natural numbers $\mathbb{N}$ (an element of semiring $\mathbb{N}[\vct{X}]$).
@ -140,10 +140,10 @@ In later sections, where we focus on a single lineage polynomial, we will simply
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Example}\label{ex:intro-lineage}
Associating a lineage variable with every input tuple as shown in \Cref{fig:ex-shipping-simp}, we can compute the lineage of every result tuple as shown in \Cref{subfig:ex-shipping-simp-route}. For example, the tuple Chicago is in the result, because $L_b$ joins with both $R_b$ and $R_c$. Its lineage is $\Phi = L_b \cdot R_b + L_b \cdot R_c$. The expected multiplicity of this result tuple is calculated by summing the multiplicity of the result tuple, weighted by its probability, over all possible worlds.
Associating a lineage variable with every input tuple as shown in \Cref{fig:ex-shipping-simp}, we can compute the lineage of every result tuple as shown in \Cref{subfig:ex-shipping-simp-queries}. For example, the tuple Chicago is in the result, because $L_b$ joins with both $R_b$ and $R_c$. Its lineage is $\Phi = L_b \cdot R_b + L_b \cdot R_c$. The expected multiplicity of this result tuple is calculated by summing the multiplicity of the result tuple, weighted by its probability, over all possible worlds.
In this example, $\Phi$ is a sum of products (SOP), and so we can use linearity of expectation to solve the problem in linear time (in the size of $\Phi$).
The expectation of the sum is the sum of the expectations of monomials.
The expectation of each monomial is then computed by multiplying the probabilities of the variables (tuples) occurring in the monomial.
The expectation of each monomial is then computed by multiplying the probabilities of the variables (tuples) occurring in the monomial, since we have independence of tuples.
The expected multiplicity for Chicago is $1.0$.
\end{Example}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@ -187,7 +187,7 @@ The expectation $\expct\pbox{\Phi^2}$ then is:
\noindent If the domain of a random variable $W$ is $\{0, 1\}$, then for any $k > 0$, $\expct\pbox{W^k} = \expct\pbox{W}$, which means that $\expct\pbox{\Phi^2}$ simplifies to:
\begin{footnotesize}
\begin{equation*}
\expct\pbox{L_a^2}\expct\pbox{L_b} + \expct\pbox{L_b}\expct\pbox{L_d} + \expct\pbox{L_b}\expct\pbox{L_c} + 2\expct\pbox{L_a}\expct\pbox{L_b}\expct\pbox{L_d} + 2\expct\pbox{L_a}\expct\pbox{L_b}\expct\pbox{L_c} + 2\expct\pbox{L_b}\expct\pbox{L_d}\expct\pbox{L_c}
\expct\pbox{L_a}\expct\pbox{L_b} + \expct\pbox{L_b}\expct\pbox{L_d} + \expct\pbox{L_b}\expct\pbox{L_c} + 2\expct\pbox{L_a}\expct\pbox{L_b}\expct\pbox{L_d} + 2\expct\pbox{L_a}\expct\pbox{L_b}\expct\pbox{L_c} + 2\expct\pbox{L_b}\expct\pbox{L_d}\expct\pbox{L_c}
\end{equation*}
\end{footnotesize}
\noindent This property leads us to consider a structure related to the lineage polynomial.
@ -197,7 +197,7 @@ For any polynomial $\poly(\vct{X})$, define the \emph{reduced polynomial} $\rpol
With $\Phi^2$ as an example, we have:
\begin{align*}
\widetilde{\Phi^2}(L_a, L_b, L_c, L_d)
=&\; L_aL_b + L_bL_d + L_bW_c + 2L_aL_bL_d + 2L_aL_bL_c + 2L_bL_cL_d
=&\; L_aL_b + L_bL_d + L_bL_c + 2L_aL_bL_d + 2L_aL_bL_c + 2L_bL_cL_d
\end{align*}
It can be verified that the reduced polynomial is a closed form of the expected count (i.e., $\expct\pbox{\Phi^2} = \widetilde{\Phi^2}(\probOf\pbox{L_a=1}, \probOf\pbox{L_b=1}, \probOf\pbox{L_c=1}), \probOf\pbox{L_d=1})$). In fact, we show in \Cref{lem:exp-poly-rpoly} that this equivalence holds for {\em all} UCQs over TIDB/BIDB.

View File

@ -1,41 +1,201 @@
% -*- root: main.tex -*-
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% NOTATION
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Circuits
\newcommand{\caseheading}[1]{\smallskip \noindent \textbf{#1}.~}
%%%%%
\newcommand{\wElem}{w} %an element of \vct{w}
\newcommand{\suchthat}{\;|\;} %such that
\newcommand{\kElem}{k}%the kth element
%RA-to-Poly Notation
\newcommand{\polyinput}[2]{\left(#1,\ldots, #2\right)}
\newcommand{\numvar}{n}
\newcommand{\xplural}{s\xspace}
\xspaceaddexceptions{\xplural}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% COMMENTS
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\BG}[1]{\todo{\textbf{Boris says:$\,$} #1}}
\newcommand{\SF}[1]{\todo{\textbf{Su says:$\,$} #1}}
\newcommand{\OK}[1]{\todo[color=gray]{\textbf{Oliver says:$\,$} #1}}
\newcommand{\AH}[1]{\todo[backgroundcolor=cyan]{\textbf{Aaron says:$\,$} #1}}
\newcommand{\AR}[1]{\todo[color=green]{\textbf{Atri says:$\,$} #1}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% THEOREM LIKE ENVIRONMENTS
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%v---what is this?
\DeclareMathAlphabet{\mathbbold}{U}{bbold}{m}{n}
\newtheorem{Theorem}{Theorem}[section]
\newtheorem{Definition}[Theorem]{Definition}
\newtheorem{Lemma}[Theorem]{Lemma}
\newtheorem{Proposition}[Theorem]{Proposition}
\newtheorem{Corollary}[Theorem]{Corollary}
\newtheorem{Example}[Theorem]{Example}
\newtheorem{hypo}[Theorem]{Conjecture}%used in mult_distinct_p.tex
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Rel model
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Need to have all Rel stuff in one place
\newcommand{\tup}{t}
\newcommand{\rel}{R}
\newcommand{\prel}{\mathcal{\rel}}
\newcommand{\reli}{S}
\newcommand{\reli}{S}%<----better names?
\newcommand{\relii}{T}
\newcommand{\db}{D}
\newcommand{\idb}{\Omega}
\newcommand{\pdb}{\mathcal{D}}
\newcommand{\pxdb}{\mathbf{D}}
\newcommand{\nxdb}{D(\vct{X})}%\mathbb{N}[\vct{X}] db
\newcommand{\tset}{\mathcal{T}}%the set of tuples in a database
\newcommand{\pd}{\vct{P}}%pd for probability distribution
\newcommand{\eval}[1]{\llbracket #1 \rrbracket}%evaluation double brackets
\newcommand{\evald}[2]{\eval{{#1}}_{#2}}
\newcommand{\linsett}[3]{\Phi_{#1,#2}^{#3}}
\newcommand{\query}{Q}
\newcommand{\tset}{\mathcal{T}}%the set of tuples in a database
\newcommand{\join}{\Join}
\newcommand{\select}{\sigma}
\newcommand{\project}{\pi}
\newcommand{\union}{\cup}
\newcommand{\sch}{sch}
\newcommand{\attr}[1]{attr\left(#1\right)}
\newcommand{\rw}{\textbf{W}}%\rw for random world
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% TERMINOLOGY AND ABBREVIATIONS
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Perhaps PDB abbreviations should go here?
\newcommand{\expectProblem}{\textsc{Expected Result Multiplicity Problem}\xspace}
\newcommand{\termSMB}{standard monomial basis\xspace}
\newcommand{\abbrSMB}{SMB\xspace}%we already have this; one has to go
\newcommand{\termSOP}{sum of products\xspace}
\newcommand{\abbrSOP}{SOP}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Function Names and Typesetting %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\domain}{\func{Dom}}
\newcommand{\func}[1]{\textsc{#1}}
\newcommand{\polyf}{\func{poly}}
\newcommand{\evalmp}{\func{eval}}
\newcommand{\degree}{\func{deg}}
\newcommand{\size}{\func{size}}
\newcommand{\depth}{\func{depth}}
\newcommand{\topord}{\func{TopOrd}\xspace}
\newcommand{\smbOf}[1]{\func{\abbrSMB}\inparen{#1}}
%Verify if we need the above...
%saving \treesize for now to keep latex from breaking
\newcommand{\treesize}{\func{size}}
%I believe this is used in the algo psuedocode
\newcommand{\sign}{\func{sgn}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% SEMIRINGS
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\udom}{\mathcal{U}}
\newcommand{\domK}{K}
\newcommand{\semK}{\mathcal{K}}
\newcommand{\semN}{\mathbb{N}}
\newcommand{\semNX}{\mathbb{N}[\vct{X}]}
\newcommand{\onesymbol}{\mathbbold{1}}
\newcommand{\zerosymbol}{\mathbbold{0}}
\newcommand{\multsymb}{\otimes}
\newcommand{\addsymbol}{\oplus}
\newcommand{\addK}{\addsymbol_{\semK}}
\newcommand{\multK}{\multsymb_{\semK}}
\newcommand{\oneK}{\onesymbol_{\semK}}
\newcommand{\zeroK}{\zerosymbol_{\semK}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Incomplete DB/PDBs %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\idb}{\Omega}
\newcommand{\pd}{\mathcal{P}}%pd for probability distribution
\newcommand{\pdb}{\mathcal{D}}
\newcommand{\pxdb}{\pdb_{\semNX}}%<---changed from the origninal \mathbf{\db}
\newcommand{\nxdb}{D(\vct{X})}%\mathbb{N}[\vct{X}] db--Are we currently using this?
%BIDB
\newcommand{\block}{b}
\newcommand{\bivar}{x_{\block, i}}
%PDB Abbreviations
\newcommand{\abbrPDB}{\textnormal{PDB}\xspace}
\newcommand{\ti}{TIDB\xspace}
\newcommand{\tis}{TIDBs\xspace}
\newcommand{\bi}{BIDB\xspace}
\newcommand{\bis}{BIDBs\xspace}
%not sure if we use these; arguably the above abbrev macros should have a name change
\newcommand{\tiabb}{ti}
\newcommand{\biabb}{bi}
\newcommand{\biwset}{\idb_{\biabb}}
\newcommand{\biord}{\leq_{x_\block}}
\newcommand{\tiwset}{\idb_{\tiabb}}
\newcommand{\bipd}{\pd_{\biabb}}
\newcommand{\tipd}{\pd_{\tiabb}}
\newcommand{\bipdb}{\pdb_{\biabb}}
\newcommand{\tipdb}{\pdb_{\tiabb}}
%--------------------------------
\newcommand{\probDist}{\vct{\probOf}}%<---I don't think we need this.
\newcommand{\probAllTup}{\vct{\prob}}%<---I was using simply \vct{\prob}; decide on a convention
\newcommand{\wSet}{\Omega}%<---We have \idb, the set of possible worlds; decide on one of these
%Is this being used?
\newcommand{\pdbx}{X_{DB}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Math Symbols, Functions/Operators, Containers %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Number Sets
\newcommand{\domR}{\mathbb{R}}%Consider changing this to \domR
\newcommand{\domN}{\mathbb{N}}
%Probability, Expectation
\newcommand{\expct}{\mathop{\mathbb{E}}}%why not just call this \expect
\newcommand{\probOf}{Pr}%probability function
%Functions/Operators
\newcommand{\abs}[1]{\left|#1\right|}
\newcommand{\suchthat}{\;|\;} %such that
\newcommand{\comprehension}[2]{\left\{\;#1\;|\;#2\;\right\}}
\newcommand{\eval}[1]{\llbracket #1 \rrbracket}%evaluation double brackets
\newcommand{\evald}[2]{\eval{{#1}}_{#2}}
%Containers
\newcommand{\pbox}[1]{\left[#1\right]}%<---used for expectation
\newcommand{\pbrace}[1]{\left\{#1\right\}}
%consider replacing \pbrace with what is below
\newcommand{\inparen}[1]{\left({#1}\right)}
\newcommand{\inset}[1]{\left\{{#1}\right\}}%we already have this as \pbrace; need to pick one
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Variable, Polynomial and Vector Notation
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Instance Variables
\newcommand{\prob}{p}
\newcommand{\wElem}{w} %an element of \vct{w}
%Polynomial Variables
\newcommand{\pVar}{X}%<----not used but recomment instituting this--pVar for polyVar
\newcommand{\kElem}{k}%the kth element<---where and how are we using this?
%Random Variables
\newcommand{\randWorld}{W}
\newcommand{\randDB}{\overline{\db}}
\newcommand{\rvW}{W}%\rvW for random variable of type World
%One of these needs to go...I think...
\newcommand{\randomvar}{W}%this little guy needs a home!
%Container for Polynomial Params
\newcommand{\polyinput}[2]{\left(#1,\ldots, #2\right)}%do we still use this?
%Number of Variables--this could easily be number of tups--maybe move to Rel Model?
\newcommand{\numvar}{n}
%Vector
\newcommand{\vct}[1]{{\bf #1}}
%using \wVec for world bit vector notation<-----Is this still the case?
%Polynomial
\newcommand{\poly}{\Phi}
\newcommand{\polyX}{\poly\inparen{\pVar}}%<---let's see if this proves handy
\newcommand{\rpoly}{\widetilde{\poly}}%r for reduced as in reduced 'Q'
\newcommand{\rpolyX}{\rpoly\inparen{\pVar}}%<---if this isn't something we use much, we can get rid of it
\newcommand{\polyForTuple}{\poly_{\tup}}%do we use this?
%Do we use this?
\newcommand{\out}{output}%output aggregation over the output vector
\newcommand{\prel}{\mathcal{\rel}}%What is this?
\newcommand{\linsett}[3]{\Phi_{#1,#2}^{#3}}%Where is this used?
\newcommand{\wbit}{w}%don't think we need this one
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Graph Notation %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\graph}[1]{G^{(#1)}}
\newcommand{\numocc}[2]{\#\left(#1,#2\right)}
\newcommand{\eset}[1]{E^{(#1)}_S} %edge set for arbitrary subgraph
%I don't think we use these anymore
\newcommand{\linsys}[1]{LS(\graph{#1})}
\newcommand{\lintime}[1]{LT^{\graph{#1}}}
\newcommand{\aug}[1]{AUG^{\graph{#1}}}
@ -43,12 +203,34 @@
\newcommand{\dtrm}[1]{Det\left(#1\right)}
\newcommand{\tuple}[1]{\left<#1\right>}
\newcommand{\indicator}[1]{\onesymbol\inparen{#1}}
%----------------------------------------------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Circuit Notation
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\circuit}{\vari{C}}
\newcommand{\circuitset}[1]{\vari{CSet}\inparen{#1}}
\newcommand{\circmult}{\times}
\newcommand{\circplus}{+}
%formally \rchild and \lchild
\newcommand{\rinput}{\vari{R}}
\newcommand{\linput}{\vari{L}}
\newcommand{\inp}{\vari{input}}
\newcommand{\inputs}{\vari{inputs}}%do we use this?
\newcommand{\subcircuit}{\vari{S}}%does this clash/conflict with \coeffset?
\newcommand{\gate}{\vari{g}}
\newcommand{\lwght}{\vari{Lweight}}
\newcommand{\rwght}{\vari{Rweight}}
\newcommand{\prt}{\vari{partial}}
\newcommand{\degval}{\vari{degree}}
\newcommand{\type}{\vari{type}}
%Do we use this?
\newcommand{\subgraph}{\vari{S}_{\equivtree(\circuit)}}
%-----
\newcommand{\cost}{\func{Cost}}
\newcommand{\nullval}{NULL}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Query Classes
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\qClass}{\mathcal{Q}}
\newcommand{\raPlus}{\ensuremath{\mathcal{RA}^{+}}\xspace}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Datalog
@ -56,9 +238,24 @@
\newcommand{\dlImp}[0]{\,\ensuremath{\mathtt{{:}-}}\,}
\newcommand{\dlDontcare}{\_}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Query Classes
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\qClass}{\mathcal{Q}}
\newcommand{\raPlus}{\ensuremath{\mathcal{RA}^{+}}\xspace}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% COMPLEXITY
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\sharpphard}{\#P-hard\xspace}
\newcommand{\sharpwonehard}{\#W[1]-hard\xspace}
\newcommand{\ptime}{PTIME\xspace}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Approx Alg
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\randvar}{Y}
\newcommand{\coeffset}{S}
\newcommand{\distinctvars}{d}
@ -74,6 +271,7 @@
\newcommand{\conf}{\delta}
%Pseudo Code Notation
\newcommand{\plus}{\texttt{+}}
\newcommand{\mult}{\texttt{\times}}
@ -81,18 +279,23 @@
\newcommand{\approxq}{\algname{Approximate$\rpoly$}}
\newcommand{\onepass}{\algname{OnePass}}
\newcommand{\sampmon}{\algname{SampleMonomial}}
%I don't think we use reduce anymore
\newcommand{\reduce}{\algname{Reduce}}
\newcommand{\ceil}[1]{\left\lceil #1 \right\rceil}
\newcommand{\vari}[1]{\texttt{#1}\xspace}
\newcommand{\accum}{\vari{acc}}
\newcommand{\numsamp}{\vari{N}}
\newcommand{\numedge}{m}
\newcommand{\bivec}{\vari{b}_{\vari{vec}}}
\newcommand{\numsamp}{\vari{N}}%we have \samplesize above; we can get rid of one of these
\newcommand{\numedge}{m}%we have set size above; we can get rid of one of these
\newcommand{\bivec}{\vari{b}_{\vari{vec}}}%Section 3--proof in appendix for last theorem
%Major cleaning needed to get rid of obsolete notation like expression trees, etc.
%I don't know that we use any of the expression tree macros anymore; if we do, they would be predominantly in S 3 and 4 and their respective appendices
%expression tree T
\newcommand{\etree}{\vari{T}}
\newcommand{\stree}{\vari{S}}
\newcommand{\lchild}{\vari{L}}
\newcommand{\rchild}{\vari{R}}
%I don't think we talk of T but of C; let's update this. These should be used only in S 2 and S4
%members of T
\newcommand{\val}{\vari{val}}
\newcommand{\wght}{\vari{weight}}
@ -103,77 +306,82 @@
%%%%%%%
\renewcommand{\algorithmicrequire}{\textbf{Input:}}
\renewcommand{\algorithmicensure}{\textbf{Output:}}
\newcommand{\smb}{\poly\left(\vct{X}\right)}%smb for standard monomial basis
\newcommand{\smbOf}[1]{\textsc{SMB}(#1)}
%\newcommand{\smb}{\poly\left(\vct{X}\right)}%smb for standard monomial basis; S 2<---this command is, I believe, unnecessary
%not sure if we use this
%not sure if we use this
\newcommand{\etreeset}[1]{\vari{ET}\left(#1\right)}
%verify this
%\expandtree is a placeholder until I change other files with the new macro name \expansion
\newcommand{\expandtree}[1]{\vari{E}(#1)}
\newcommand{\expansion}[1]{\vari{E}(#1)}
%not sure if we use this; I think the only occurrence would be in the def section of S 4
\newcommand{\elist}[1]{\vari{List}\pbox{#1}}
%not sure if we use this anymore either
\newcommand{\equivtree}{\vari{EET}}
%expandtree tuple elements:
\newcommand{\monom}{\vari{v}}
\newcommand{\coef}{\vari{c}}
%----------------------------------
\newcommand{\abs}[1]{\left|#1\right|}
\newcommand{\func}[1]{\textsc{#1}}
\newcommand{\polyf}{\func{poly}}
\newcommand{\evalmp}{\func{eval}}
\newcommand{\degree}{\func{deg}}
\newcommand{\size}{\func{size}}
\newcommand{\depth}{\func{depth}}
\newcommand{\topord}{\func{TopOrd}\xspace}
%saving \treesize for now to keep latex from breaking
\newcommand{\treesize}{\func{size}}
\newcommand{\sign}{\func{sgn}}
%Random Variable
\newcommand{\randomvar}{W}
\newcommand{\domain}{\func{Dom}}
%PDBs
\newcommand{\pdbx}{X_{DB}}
\newcommand{\prob}{p}
\newcommand{\probOf}{P}
\newcommand{\probDist}{\vct{\probOf}}
\newcommand{\probAllTup}{\vct{\prob}}
\newcommand{\wSet}{\Omega}
\newcommand{\ti}{TIDB\xspace}
\newcommand{\tis}{TIDBs\xspace}
\newcommand{\bi}{BIDB\xspace}
\newcommand{\bis}{BIDBs\xspace}
\newcommand{\tiabb}{ti}
\newcommand{\biabb}{bi}
\newcommand{\biwset}{\idb_{\biabb}}
\newcommand{\block}{b}
\newcommand{\biord}{\leq_{x_\block}}
\newcommand{\tiwset}{\idb_{\tiabb}}
\newcommand{\bipd}{\pd_{\biabb}}
\newcommand{\tipd}{\pd_{\tiabb}}
\newcommand{\bipdb}{\pdb_{\biabb}}
\newcommand{\bivar}{x_{\block, i}}
\newcommand{\tipdb}{\pdb_{\tiabb}}
% REPRESENTATIONS
\newcommand{\rmod}{Mod}
\newcommand{\reprs}{\mathcal{M}}
% REPRESENTATIONS--this might be Boris' or Atri's stuff; verify if these macros are current
\newcommand{\rmod}{Mod}%mod function which transforms N[X]-DB to N-DB (S 2 and App A)
\newcommand{\reprs}{\mathcal{M}}%used to define Representation System in App A
\newcommand{\repr}{M}
%Polynomial Reformulation
\newcommand{\wbit}{w}
\newcommand{\expct}{\mathop{\mathbb{E}}}
\newcommand{\pbox}[1]{\left[#1\right]}
\newcommand{\pbrace}[1]{\left\{#1\right\}}
\newcommand{\vct}[1]{{\bf #1}}
%using \wVec for world bit vector notation
\newcommand{\poly}{Q}
\newcommand{\rpoly}{\widetilde{Q}}%r for reduced as in reduced 'Q'
\newcommand{\polyForTuple}{\poly_{\tup}}
\newcommand{\out}{output}%output aggregation over the output vector
\newcommand{\numocc}[2]{\#\left(#1,#2\right)}
%not sure about these? Perhaps in appendix B for \assign and S 5 for \support?
\newcommand{\assign}{\varphi}
\newcommand{\support}[1]{supp({#1})}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\eps}{\epsilon}%<----this is already defined as \error; need to pick one
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Forcing Layouts
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\trimfigurespacing}{\vspace*{-5mm}}
\newcommand{\mypar}[1]{\smallskip\noindent\textbf{{#1}.}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Proof/Section Headings %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Is this being used?
\newcommand{\caseheading}[1]{\smallskip \noindent \textbf{#1}.~}
%%%%%
%%%Adding stuff below so that long chain of display equatoons can be split across pages
\allowdisplaybreaks
%Macro for mult complexity
\newcommand{\multc}[2]{\overline{\mathcal{M}}\left({#1},{#2}\right)}
%consider perhaps putting the tikz code into a separate file.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Graph Symbols
% Tikz Graph Symbols
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Shift macro
\newcommand{\patternshift}[1]{\hspace*{-0.5mm}\raisebox{-0.35mm}{#1}\hspace*{-0.5mm} }
@ -264,172 +472,6 @@
\newcommand{\sg}[1]{S^{(#1)}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% COMMENTS
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\BG}[1]{\todo{\textbf{Boris says:$\,$} #1}}
\newcommand{\SF}[1]{\todo{\textbf{Su says:$\,$} #1}}
\newcommand{\OK}[1]{\todo[color=gray]{\textbf{Oliver says:$\,$} #1}}
\newcommand{\AH}[1]{\todo[backgroundcolor=cyan]{\textbf{Aaron says:$\,$} #1}}
\newcommand{\SR}[1]{\todo[backgroundcolor=white]{\textbf{Note to self:$\,$} #1}}
\newcommand{\AR}[1]{\todo[color=green]{\textbf{Atri says:$\,$} #1}}
%\newcommand{\AR}[1]{}
%\newcommand{\AH}[1]{}
%\newcommand{\SF}[1]{}
%\newcommand{\BG}[1]{}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Revision Parameters
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\revision}[1]{\color{blue}{#1}\color{black}\xspace}
\newcommand{\oldstuff}[1]{\color{gray}{#1}\color{black}\xspace}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Circuit Notation
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\circuit}{\vari{C}}
\newcommand{\circuitset}[1]{\vari{CSet}(#1)}
\newcommand{\circmult}{\times}
\newcommand{\circplus}{+}
%formally \rchild and \lchild
\newcommand{\rinput}{\vari{R}}
\newcommand{\linput}{\vari{L}}
\newcommand{\inp}{\vari{input}}
\newcommand{\inputs}{\vari{inputs}}
\newcommand{\subcircuit}{\vari{S}}
\newcommand{\gate}{\vari{g}}
\newcommand{\lwght}{\vari{Lweight}}
\newcommand{\rwght}{\vari{Rweight}}
\newcommand{\prt}{\vari{partial}}
\newcommand{\degval}{\vari{degree}}
\newcommand{\type}{\vari{type}}
\newcommand{\subgraph}{\vari{S}_{\equivtree(\circuit)}}
\newcommand{\cost}{\func{Cost}}
\newcommand{\nullval}{NULL}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Sets
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\reals}{\mathbb{R}}
%
%borrowed from Su and Boris; Ihave them here for reference purposes.
%needs to be cleaned up
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% THEOREM LIKE ENVIRONMENTS
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\DeclareMathAlphabet{\mathbbold}{U}{bbold}{m}{n}
\newtheorem{Theorem}{Theorem}[section]
\newtheorem{Definition}[Theorem]{Definition}
\newtheorem{Lemma}[Theorem]{Lemma}
\newtheorem{Proposition}[Theorem]{Proposition}
\newtheorem{Property}[Theorem]{Property}
\newtheorem{Corollary}[Theorem]{Corollary}
\newtheorem{Claim}[Theorem]{Claim}
\newtheorem{Example}[Theorem]{Example}
\newtheorem{Axiom}[Theorem]{Axiom}
\newtheorem{Question}[Theorem]{Question}
\newtheorem{Assumption}[Theorem]{Assumption}
\newtheorem{hypo}[Theorem]{Conjecture}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Rel model
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\tup}{t}
\newcommand{\comprehension}[2]{\left\{\;#1\;|\;#2\;\right\}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% SEMIRINGS
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\udom}{\mathcal{U}}
\newcommand{\semN}{\mathbb{N}}
\newcommand{\domN}{\mathbb{N}}
\newcommand{\semB}{\mathbb{B}}
\newcommand{\semNX}{\mathbb{N}[\vct{X}]}
\newcommand{\domK}{K}
\newcommand{\semK}{\mathcal{K}}
\newcommand{\support}[1]{supp({#1})}
\newcommand{\onesymbol}{\mathbbold{1}}
\newcommand{\zerosymbol}{\mathbbold{0}}
\newcommand{\multsymb}{\otimes}
\newcommand{\addsymbol}{\oplus}
\newcommand{\addormultsymbol}{\bigcdot}
\newcommand{\monsymbol}{\ominus}
\newcommand{\addK}{\addsymbol_{\semK}}
\newcommand{\multK}{\multsymb_{\semK}}
\newcommand{\genopK}{\odot_{\semK}}
\newcommand{\monK}{\monsymbol_{\semK}}
\newcommand{\oneK}{\onesymbol_{\semK}}
\newcommand{\zeroK}{\zerosymbol_{\semK}}
\newcommand{\addOf}[1]{\addsymbol_{#1}}
\newcommand{\multOf}[1]{\multsymb_{#1}}
\newcommand{\monOf}[1]{\monsymbol_{#1}}
\newcommand{\oneOf}[1]{\onesymbol_{#1}}
\newcommand{\zeroOf}[1]{\zerosymbol_{#1}}
\newcommand{\dbDomK}[1]{\mathcal{DB}_{#1}}
\newcommand{\assign}{\varphi}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% COMPLEXITY
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\sharpphard}{\#P-hard\xspace}
\newcommand{\sharpwonehard}{\#W[1]-hard\xspace}
\newcommand{\ptime}{PTIME\xspace}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% TERMINOLOGY AND ABBREVIATIONS
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\expectProblem}{\textsc{Expected Result Multiplicity Problem}\xspace}
\newcommand{\termSMB}{standard monomial basis\xspace}
\newcommand{\abbrSMB}{SMB\xspace}
\newcommand{\termSOP}{sum of products\xspace}
\newcommand{\abbrSOP}{SOP}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\eps}{\epsilon}
\newcommand{\inparen}[1]{\left({#1}\right)}
\newcommand{\inset}[1]{\left\{{#1}\right\}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Forcing Layouts
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\trimfigurespacing}{\vspace*{-5mm}}
\newcommand{\mypar}[1]{\smallskip\noindent\textbf{{#1}.}}
%%%Adding stuff below so that long chain of display equatoons can be split across pages
\allowdisplaybreaks
%Macro for mult complexity
\newcommand{\multc}[2]{\overline{\mathcal{M}}\left({#1},{#2}\right)}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"

View File

@ -6,19 +6,19 @@
We now introduce some terminology % for polynomials
and develop a reduced form (a closed form of the polynomial's expectation) for polynomials over probability distributions derived from a \bi or \ti.
%We will use $(X + Y)^2$ as a running example.
Note that a polynomial over $\vct{X}=(X_1,\dots,X_n)$ is formally defined as:
Note that a polynomial over $\vct{X}=(X_1,\dots,X_n)$ is formally defined as (with $c_\vct{i} \in \domN$):
\begin{equation}
\label{eq:sop-form}
Q(X_1,\dots,X_n)=\sum_{\vct{d}=(d_1,\dots,d_n)\in \semN^n} c_{\vct{d}}\cdot \prod_{i=1}^n X_i^{d_i}.
\poly\inparen{X_1,\dots,X_n}=\sum_{\vct{d}=(d_1,\dots,d_n)\in \semN^n} c_{\vct{d}}\cdot \prod_{i=1}^n X_i^{d_i}.
\end{equation}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}[Standard Monomial Basis]\label{def:smb}
From above, the term $\prod_{i=1}^n X_i^{d_i}$ is a {\em monomial}. A polynomial $Q(\vct{X})$ is in standard monomial basis (SMB) when we keep only the terms with $c_{\vct{i}}\ne 0$ from \Cref{eq:sop-form}.
From above, the term $\prod_{i=1}^n X_i^{d_i}$ is a {\em monomial}. A polynomial $\poly\inparen{\vct{X}}$ is in standard monomial basis (\abbrSMB) when we keep only the terms with $c_{\vct{i}}\ne 0$ from \Cref{eq:sop-form}.
\end{Definition}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
We consider \abbrSMB as the default representation of a polynomial.
We use $\smbOf{\poly}$ to denote the SMB form of a polynomial $\poly$.
We use $\smbOf{\poly}$ to denote the \abbrSMB form of a polynomial $\poly$.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}[Degree]\label{def:degree}
@ -29,21 +29,7 @@ The degree of polynomial $\poly(\vct{X})$ is the largest $\sum_{i=1}^n d_i$ such
The degree of the polynomial $X^2+2XY+Y^2$ is $2$.
Product terms in lineage arise only from join operations (\Cref{fig:nxDBSemantics}), so intuitively, the degree of a lineage polynomial is analogous to the largest number of joins in any clause of the UCQ query that created it.
In this paper we consider only finite degree polynomials.
We call a polynomial $\query(\vct{X})$ a \emph{\bi-lineage polynomial} (resp., \emph{\ti-lineage polynomial}, or simply lineage polynomial), if
%\AH{Why is it required for the tuple to be n-ary? I think this slightly confuses me since we have n tuples.}
% OK: agreed w/ AH, this can be treated as implicit
there exists a $\raPlus$ query $\query$, \bi $\pxdb$ (\ti $\pxdb$, or $\semNX$-PDB $\pxdb$), and tuple $\tup$ such that $\query(\vct{X}) = \query(\pxdb)(\tup)$. % Before proceeding, note that the following is assume that polynomials are \bis (which subsume \tis as a special case).
%\SF{Where is $\block_{i, j}$ used? Is it $X_{\block_{1, 1}}$ or $X_{\block_1, 1}$ ?}
% and the probability distribution of $\pxdb$ is uniquely determined based on a probability vector $\vct{p}$ that associates each tuple a probability
% variables are independent of each other (or disjoint if they are from the same block) and each variable $X$ is associated with a probability $\vct{p}(X) = \pd[X = 1]$. Thus, we are dealing with polynomials $\poly(\vct{X})$ that are annotations of a tuple in the result of a query $\query$ over a BIDB $\pxdb$ where $\vct{X}$ is the set of variables that occur in annotations of tuples of $\pxdb$.
% While the definition of polynomial $\poly(\vct{X})$ over a $\bi$ input doesn't change, we introduce an alternative notation which will come in handy. Given $\ell$ blocks, we write $\poly(\vct{X})$ = $\poly(X_{\block_1, 1},\ldots, X_{\block_1, \abs{\block_1}},$ $\ldots, X_{\block_\ell, \abs{\block_\ell}})$, where $\abs{\block_i}$ denotes the size of $\block_i$, and $\block_{i, j}$ denotes tuple $j$ residing in block $i$ for $j$ in $[\abs{\block_i}]$.
% The number of tuples in the $\bi$ instance can be (trivially) computed as $\numvar = \sum\limits_{i = 1}^{\ell}\abs{\block_i}$ .
We call a polynomial $\poly\inparen{\vct{X}}$ a \emph{\bi-lineage polynomial} (resp., \emph{\ti-lineage polynomial}, or simply lineage polynomial), if there exists a \AH{Which formalism? UCQ?}$\raPlus$ query $\query$, \bi $\pxdb$ (\ti $\pxdb$, or $\semNX$-PDB $\pxdb$), and tuple $\tup$ such that $\poly\inparen{\vct{X}} = \query(\pxdb)(\tup)$.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}[Modding with a set]\label{def:mod-set}
@ -126,7 +112,7 @@ Consider $\poly(X, Y) = (X + Y)(X + Y)$ where $X$ and $Y$ are from different blo
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}[Valid Worlds]
For probability distribution $\probDist$, % and its corresponding probability mass function $\probOf$,
the set of valid worlds $\eta$ consists of all the worlds with probability value greater than $0$; i.e., for variable vector $\vct{W}$
the set of valid worlds $\eta$ consists of all the worlds with probability value greater than $0$; i.e., for random world variable vector $\vct{W}$
\[
\eta = \comprehension{\vct{w}}{\probOf[\vct{W} = \vct{w}] > 0}
\]

View File

@ -11,7 +11,7 @@ We represent query polynomials via {\em arithmetic circuits}~\cite{arith-complex
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{Definition}[Circuit]\label{def:circuit}
A circuit $\circuit$ is a Directed Acyclic Graph (DAG) whose source nodes (in degree of $0$) consist of elements in either $\reals$ or $\vct{X}$. The internal nodes and (the single) sink node of $\circuit$ (corresponding to the result tuple $t$) have binary input and are either sum ($\circplus$) or product ($\circmult$) gates.
A circuit $\circuit$ is a Directed Acyclic Graph (DAG) whose source nodes (in degree of $0$) consist of elements in either $\domR$ or $\vct{X}$. The internal nodes and (the single) sink node of $\circuit$ (corresponding to the result tuple $t$) have binary input and are either sum ($\circplus$) or product ($\circmult$) gates.
%
Each node in a circuit $\circuit$ has the following members: \type, \val, \vpartial, \vari{input}, \degval and \vari{Lweight}, \vari{Rweight}, where \type is the type of value stored in the node (one of $\{\circplus, \circmult, \var, \tnum\}$, \val is the value stored (a constant or variable), and \vari{input} is the list of the nodes inputs. We use $\circuit_\linput$ to denote the left input and $\circuit_\rinput$ the right input or the sink of circuit $\circuit$.
%The member \degval holds the degree of \circuit.
@ -108,13 +108,12 @@ Denote $\polyf(\circuit)$ to be the function from circuit $\circuit$ to its corr
Note that $\circuit$ need not encode an expression in SMB. For instance, $\circuit$ could represent a compressed form of the running example, such as $(X + 2Y)(2X - Y)$, as shown in \Cref{fig:circuit}, while $\polyf(\circuit) = 2X^2+3XY-2Y^2$.
\begin{Definition}[Circuit Set]\label{def:circuit-set}
$\circuitset{\smb}$ is the set of all possible circuits $\circuit$ such that $\polyf(\circuit) = \poly(\vct{X})$.
$\circuitset{\smbOf{\polyX}}$ is the set of all possible circuits $\circuit$ such that $\polyf(\circuit) = \smbOf\polyX$.
\end{Definition}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The circuit of \Cref{fig:circuit} is an element of $\circuitset{2X^2+3XY-2Y^2}$. One can think of $\circuitset{\smb}$ as the infinite set of circuits each of which model an encoding (factorization) equal to $\polyf(\circuit)$.
%\supset \{2X^2 + 3XY - 2Y^2, (X + 2Y)(2X - Y), X(2X - Y) + 2Y(2X - Y), 2X(X + 2Y) - Y(X + 2Y)\}$.
The circuit of \Cref{fig:circuit} is an element of $\circuitset{2X^2+3XY-2Y^2}$. One can think of $\circuitset{\smbOf{\polyX}}$ as the infinite set of circuits each of which model an encoding (factorization) equal to $\polyf(\circuit)$.
Note that \Cref{def:circuit-set} implies that $\circuit \in \circuitset{\polyf(\circuit)}$.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@ -126,7 +125,7 @@ Note that \Cref{def:circuit-set} implies that $\circuit \in \circuitset{\polyf(\
Let $\vct{X} = (X_1, \ldots, X_n)$, and $\pdb$ be an $\semNX$-PDB over $\vct{X}$ with probability distribution $\pd$ over assignments $\vct{X} \to \{0,1\}$, $\query$ an n-ary query, and $t$ an n-ary tuple.
The \expectProblem is defined as follows:\\[-7mm]
\begin{center}
\textbf{Input}: A circuit $\circuit \in \circuitset{\smb}$ for $\poly(\vct{X}) = \query(\pxdb)(t)$
\textbf{Input}: A circuit $\circuit \in \circuitset{\smbOf{\polyX}}$ for $\poly(\vct{X}) = \query(\pxdb)(t)$
\hspace*{5mm}\textbf{Output}: $\expct_{\vct{W} \sim \pd}[\poly(\vct{W})]$
\end{center}
\end{Definition}

View File

@ -4,7 +4,7 @@
\section{Background and Notation}\label{sec:background}
\iffalse
\subsection{Superlinearity of Bag PDBs}\label{sec:suplin-bags}
\subsection{Superlinearity of Bag \abbrPDB\xplural}\label{sec:suplin-bags}
Moving forward, we focus exclusively on bags. For $Q()\dlImp$$OnTime(\text{City}), Route(\text{City}_1, \text{City}_2),$ $OnTime(\text{City}')$ over the bag relations of \Cref{fig:ex-shipping-simp}, consider the product query $\poly^2()\dlImp Q \times Q$.
The factorized representation of $\poly^2$ is (for simplicity we ignore the random variables of $Route$ since each variable has probability of $1$):
\begin{equation*}
@ -33,18 +33,18 @@ With $\poly^2$ as an example, we have:
It can be verified that the reduced polynomial is a closed form of the expected count (i.e., $\expct\pbox{\poly^2} = \rpoly(\probOf\pbox{L_a=1}, \probOf\pbox{L_b=1}, \probOf\pbox{L_c=1}), \probOf\pbox{L_d=1})$).
The reduced form of a lineage polynomial can be obtained but requires a linear scan over the clauses of an SOP encoding of the polynomial. Note that for a compressed representation, this scheme would require an exponential number of computations in the size of the compressed representation. In \Cref{sec:hard}, we use $\rpoly$ to prove our hardness results .
%In prior work on lineage-based Bag-PDBs~\cite{kennedy:2010:icde:pip,DBLP:conf/vldb/AgrawalBSHNSW06,yang:2015:pvldb:lenses} where this encoding is implicitly assumed, computing the expected count is linear in the size of the encoding.
%In prior work on lineage-based Bag-\abbrPDB\xplural~\cite{kennedy:2010:icde:pip,DBLP:conf/vldb/AgrawalBSHNSW06,yang:2015:pvldb:lenses} where this encoding is implicitly assumed, computing the expected count is linear in the size of the encoding.
%In general however, compressed encodings of the polynomial can be exponentially smaller in $k$ for $k$-products --- the query $\poly^k$ obtained by taking the product of $k$ copies of $\poly$ as a factorized encoding of size $6\cdot k$, while the SOP encoding is of size $2\cdot 3^k$.
%This leads us to the \textbf{central question of this paper}:
%\begin{quote}
%{\em
%Is it always the case that the expectation of a UCQ in a Bag-PDB can be computed in time linear in the size of the \textbf{compressed} lineage polynomial?}
%Is it always the case that the expectation of a UCQ in a Bag-\abbrPDB can be computed in time linear in the size of the \textbf{compressed} lineage polynomial?}
%\end{quote}
%If so, then Bag-PDBs can indeed compete with deterministic databases.
%If so, then Bag-\abbrPDB\xplural can indeed compete with deterministic databases.
%This is unfortunately not the case, and an approximation is required.
\fi
\subsection{Probabilistic Databases (PDBs)}
\subsection{Probabilistic Databases (\abbrPDB\xplural)}
An \textit{incomplete database} $\idb$ is a set of deterministic databases $\db$ called possible worlds.
Denote the schema of $\db$ as $\sch(\db)$. A \textit{probabilistic database} $\pdb$ is a pair $(\idb, \pd)$ where $\idb$ is an incomplete database and $\pd$ is a probability distribution over $\idb$. Queries over probabilistic databases are evaluated using the so-called possible world semantics. Under the possible world semantics, the result of a query $\query$ over an incomplete database $\idb$ is the set of query answers produced by evaluating $\query$ over each possible world: $\query(\idb) = \comprehension{\query(\db)}{\db \in \idb}$.
@ -53,12 +53,12 @@ For a probabilistic database $\pdb = (\idb, \pd)$, the result of a query is th
%
\[\forall \db \in \query(\idb): \pd'(\db) = \sum_{\db' \in \idb: \query(\db') = \db} \pd(\db') \]
Let $\semNX$ denote the set of polynomials over variables $\vct{X}=(X_1,\dots,X_n)$ with natural number coefficients and exponents.
Let $\semNX$ denote the set of polynomials over variables $\vct{X}=(X_1,\dots,X_\numvar)$ with natural number coefficients and exponents.
We model incomplete relations using Green et. al.'s $\semNX$-databases~\cite{DBLP:conf/pods/GreenKT07}, discussed in detail in \Cref{subsec:supp-mat-krelations}. % and summarized here.
$\semNX$-relations are functions from tuples to elements of $\semNX$, typically called annotations.
We write $R(t)$ to denote the polynomial annotating tuple $t$ in relation $R$. Note that $R(t)$ is the lineage polynomial for $t$.
Each possible world is defined by an assignment of $N$ binary values $\vct{W} \in \{0, 1\}^{\abs{\vct{X}}}$ to $\vct{X}$.
The multiplicity of $t \in R$ in this possible world, denoted $R(t)(\vct{W})$, is obtained by evaluating the polynomial annotating $t$ on $\vct{W}$.
Each possible world is defined by an assignment of $\numvar$ binary values $\vct{\wElem} \in \{0, 1\}^{\numvar}$ to $\vct{X}$.
The multiplicity of $t \in R$ in this possible world, denoted $R(t)(\vct{\wElem})$, is obtained by evaluating the polynomial annotating $t$ on $\vct{\wElem}$.
$\semNX$-relations are closed under $\raPlus$ (\Cref{fig:nxDBSemantics}).
@ -88,12 +88,12 @@ $\semNX$-relations are closed under $\raPlus$ (\Cref{fig:nxDBSemantics}).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
We will use $\semNX$-PDB $\pxdb$, defined as the tuple $(\idb_{\semNX}, \pd)$, where $\semNX$-database $\idb_{\semNX}$ is paired with probability distribution $\pd$.
We denote by $\polyForTuple$ the annotation of tuple $t$ in the result of $\query$ on an implicit $\semNX$-PDB (i.e., $\polyForTuple = \query(\pxdb)(t)$ for some $\pxdb$) and as before, interpret it as a function $\polyForTuple: \{0,1\}^{|\vct X|} \rightarrow \semN$ from vectors of variable assignments to the corresponding value of the annotating polynomial.
$\semNX$-PDBs and a function $\rmod$ (which transforms an $\semNX$-PDB to classical, or $\semN$-PDB~\cite{DBLP:conf/pods/GreenKT07,feng:2019:sigmod:uncertainty}) are both formalized in \Cref{subsec:supp-mat-background}.
We will use $\semNX$-\abbrPDB $\pxdb$, defined as the tuple $(\idb_{\semNX}, \pd)$, where $\semNX$-database $\idb_{\semNX}$ is paired with probability distribution $\pd$.
We denote by $\polyForTuple$ the annotation of tuple $t$ in the result of $\query$ on an implicit $\semNX$-\abbrPDB (i.e., $\polyForTuple = \query(\pxdb)(t)$ for some $\pxdb$) and as before, interpret it as a function $\polyForTuple: \{0,1\}^{\numvar} \rightarrow \semN$ from vectors of variable assignments to the corresponding value of the annotating polynomial.
$\semNX$-\abbrPDB\xplural and a function $\rmod$ (which transforms an $\semNX$-\abbrPDB to a classical bag-\abbrPDB, or $\semN$-\abbrPDB~\cite{DBLP:conf/pods/GreenKT07,feng:2019:sigmod:uncertainty}) are both formalized in \Cref{subsec:supp-mat-background}.
\begin{Proposition}[Expectation of polynomials]\label{prop:expection-of-polynom}
Given an $\semN$-PDB $\pdb = (\idb,\pd)$ and $\semNX$-PDB $\pxdb = (\idb_{\semNX}',\pd')$ where $\rmod(\pxdb) = \pdb$, we have:
$ \expct_{\idb \sim \pd}[\query(\idb)(t)] = \expct_{\vct{W} \sim \pd'}\pbox{\polyForTuple(\vct{W})}. $
Given an $\semN$-\abbrPDB $\pdb = (\idb,\pd)$ and $\semNX$-\abbrPDB $\pxdb = (\idb_{\semNX}',\pd')$ where $\rmod(\pxdb) = \pdb$, we have:
$ \expct_{\randDB \sim \pd}[\query(\randDB)(t)] = \expct_{\randWorld\sim \pd'}\pbox{\polyForTuple(\randWorld)}. $
\footnote{Although assumed by most prior work on set-probabilistic databases, e.g., as an obvious consequence of~\cite{IL84a}'s Theorem 7.1, we are unaware of any formal proof for bag-probabilistic databases.}
\end{Proposition}
\noindent A formal proof of \Cref{prop:expection-of-polynom} is given in \Cref{subsec:expectation-of-polynom-proof}.
@ -102,15 +102,15 @@ We focus on this problem from now on, assume an implicit result tuple, and so dr
\subsubsection{\tis and \bis}
\label{subsec:tidbs-and-bidbs}
In this paper, we focus on two popular forms of PDBs: Block-Independent (\bi) and Tuple-Independent (\ti) PDBs.
In this paper, we focus on two popular forms of \abbrPDB\xplural: Block-Independent (\bi) and Tuple-Independent (\ti) \abbrPDB\xplural.
%
A \bi $\pxdb = (\idb_{\semNX}, \pd)$ is an $\semNX$-PDB such that (i) every tuple is annotated with either $0$ (i.e., the tuple does not exist) or a unique variable $X_i$ and (ii) that the tuples $\tup$ of $\pxdb$ for which $\pxdb(\tup) \neq 0$ can be partitioned into a set of blocks such that variables from separate blocks are independent of each other and variables from the same block are disjoint events.
A \bi $\pxdb = (\idb_{\semNX}, \pd)$ is an $\semNX$-\abbrPDB such that (i) every tuple is annotated with either $0$ (i.e., the tuple does not exist) or a unique variable $X_i$ and (ii) that the tuples $\tup$ of $\pxdb$ for which $\pxdb(\tup) \neq 0$ can be partitioned into a set of blocks such that variables from separate blocks are independent of each other and variables from the same block are disjoint events.
In other words, each random variable corresponds to the event of a single tuple's presence.
%
A \emph{\ti} is a \bi where each block contains exactly one tuple.
\Cref{subsec:supp-mat-ti-bi-def} explains \tis and \bis in greater detail.
%
In a \bi (and by extension a \ti) $\pxdb$, tuples are partitioned into $\ell$ blocks $\block_1, \ldots, \block_\ell$ where tuple $t_{i,j} \in \block_i$ is associated with a probability $\prob_{\tup_{i,j}} = \pd[X_{i,j} = 1]$, and is annotated with a unique variable $X_{i,j}$.\footnote{
In a \bi (and by extension a \ti) $\pxdb$, tuples are partitioned into $\ell$ blocks $\block_1, \ldots, \block_\ell$ where tuple $t_{i,j} \in \block_i$ is associated with a probability $\prob_{\tup_{i,j}} = \probOf[X_{i,j} = 1]$, and is annotated with a unique variable $X_{i,j}$.\footnote{
Although only a single independent, $[\abs{\block_i}+1]$-valued variable is customarily used per block, we decompose it into $\abs{\block_i}$ correlated $\{0,1\}$-valued variables per block that can be used directly in polynomials (without an indicator function). For $t_j \in b_i$, the event $(X_{i,j} = 1)$ corresponds to the event $(X_i = j)$ in the customary annotation scheme.
}
Because blocks are independent and tuples from the same block are disjoint, the probabilities $\prob_{\tup_{i,j}}$ and the blocks induce the probability distribution $\pd$ of $\pxdb$.