ra

2021-09-17 13:11:40 -05:00 · 84655e15c7
parent f657c94086
commit 84655e15c7
2 changed files with 9 additions and 9 deletions
--- a/intro-rewrite-070921.tex
+++ b/intro-rewrite-070921.tex
@@ -245,7 +245,7 @@ We show in \Cref{sec:gen}
 A key insight of this paper is that the representation of $\circuit$ matters.
 For example, if we insist that $\circuit$ represent the lineage polynomial in the standard monomial basis (henceforth, \abbrSMB)\footnote{
  This is the representation, typically used in set-\abbrPDB\xplural, where the polynomial is reresented as sum of `pure' products. See \Cref{def:smb} for a formal definition.
-}, the answer to the above question in general is no, since then we will need $\abs{\circuit}\ge \Omega\inparen{\inparen{\qruntime{Q, \dbbase}}^k}$, and hence, just $\timeOf{\abbrStepOne}(Q,\dbbase,\circuit)$ will be too large.
+}, the answer to the above question in general is no, since then we will need $\abs{\circuit}\ge \Omega\inparen{\inparen{\qruntime{Q, \dbbase}}^k}$\BG{should be $|\idb |$?}, and hence, just $\timeOf{\abbrStepOne}(Q,\dbbase,\circuit)$ will be too large.

 However, systems can directly emit compact, factorized representations of $\poly(\vct{X})$ (e.g., as a consequence of the standard projection push-down optimization~\cite{DBLP:books/daglib/0020812}).
 For example, in~\Cref{fig:two-step}, $B(Y+Z)$ is a factorized representation of the SMB-form $BY+BZ$.
--- a/ra-to-poly.tex
+++ b/ra-to-poly.tex
@@ -19,10 +19,10 @@ For a probabilistic  database $\pdb = (\idb, \pd)$,  the result of a query is th
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

 %Let $\semNX$ denote the set of polynomials over variables $\vct{X}=(X_1,\dots,X_\numvar)$ with natural number coefficients and exponents.
-%We model incomplete relations using Green et. al.'s $\semNX$-databases~\cite{DBLP:conf/pods/GreenKT07}, discussed in detail in \Cref{subsec:supp-mat-krelations}. 
+%We model incomplete relations using Green et. al.'s $\semNX$-databases~\cite{DBLP:conf/pods/GreenKT07}, discussed in detail in \Cref{subsec:supp-mat-krelations}.
 % $\semNX$-databases are functions from tuples to elements of $\semNX$, typically called annotations.
-%Given an $\semNX$-database $\db$,  it is common to use $\db(\tup)$ to denote the polynomial annotating tuple $\tup$ in $\db$. 
-%%Note that based on this definition of $\rel$, $\rel(\tup)$ is the lineage polynomial for $\tup$.  
+%Given an $\semNX$-database $\db$,  it is common to use $\db(\tup)$ to denote the polynomial annotating tuple $\tup$ in $\db$.
+%%Note that based on this definition of $\rel$, $\rel(\tup)$ is the lineage polynomial for $\tup$.
 %Let $\numvar$ be the number of tuples in $\pdb$.  Then, each possible world is defined by an assignment of $\numvar$ binary values $\vct{\wElem} \in \{0, 1\}^{\numvar}$ to $\vct{X}$.
 %The multiplicity of $\tup \in \db$, denoted $\db(\tup)(\vct{\wElem})$, is obtained by evaluating the polynomial annotating $\tup$ on $\vct{\wElem}$.
 %$\semNX$-relations are closed under $\raPlus$ (\Cref{fig:nxDBSemantics}).
@@ -38,7 +38,7 @@ For a probabilistic  database $\pdb = (\idb, \pd)$,  the result of a query is th
 Recall \Cref{fig:nxDBSemantics} which depicts the semantics for constructing a lineage polynomial $\apolyqdt$ for any $\raPlus$ query.  We now make a meaningful connection between possible world semantics and world assignments on the lineage polynomial.

 \begin{Proposition}[Expectation of polynomials]\label{prop:expection-of-polynom}
-Given a \abbrBPDB $\pdb = (\idb,\pd)$ and lineage polynomial $\apolyqdt$ for aribitrary output tuple $\tup$, %$\semNX$-\abbrPDB $\pxdb = (\idb_{\semNX}',\pd')$ where $\rmod(\pxdb) = \pdb$, 
+Given a \abbrBPDB $\pdb = (\idb,\pd)$, $\raPlus$ query $\query$, and lineage polynomial $\apolyqdt$ for aribitrary output tuple $\tup$, %$\semNX$-\abbrPDB $\pxdb = (\idb_{\semNX}',\pd')$ where $\rmod(\pxdb) = \pdb$,
 we have (denoting $\randDB$ as the random variable over $\idb$):
  $ \expct_{\randDB \sim \pd}[\query(\randDB)(t)] = \expct_{\vct{\randWorld}\sim \pdassign}\pbox{\apolyqdt\inparen{\vct{\randWorld}}}. $
 \end{Proposition}
@@ -50,14 +50,14 @@ We focus on the problem of computing $\expct_\pdassign\pbox{\apolyqdt\inparen{\v
 \label{subsec:tidbs-and-bidbs}
 In this paper, we focus on two popular forms of \abbrPDB\xplural: Block-Independent (\bi) and Tuple-Independent (\ti) \abbrPDB\xplural.
 %
-A \bi $\pdb$ is a \abbrPDB with the constraint that 
-%(i) every tuple $\tup_i$ is annotated with a unique random variable $\randWorld_i \in \{0, 1\}$ and (ii) that 
+A \bi $\pdb$ is a \abbrPDB with the constraint that
+%(i) every tuple $\tup_i$ is annotated with a unique random variable $\randWorld_i \in \{0, 1\}$ and (ii) that
 the tuples in $\dbbase$ can be partitioned into a set of $\ell$ blocks such that tuples $\tup_{i, j}, \tup_{k, j'}$ from separate blocks $(i\neq k, j \in [\abs{i}], j' \in [\abs{k}])$ are independent of each other while tuples $\tup_{i, j}, \tup_{i, k}$ from the same block are disjoint events.\footnote{
  Although only a single independent, $[\abs{\block_i}+1]$-valued variable is customarily used per block~\cite{DBLP:series/synthesis/2011Suciu}, we decompose it into $\abs{\block_i}$ correlated $\{0,1\}$-valued variables per block that can be used directly in polynomials (without an indicator function).  For $t_{i, j} \in b_i$, the event $(\randWorld_{i,j} = 1)$ corresponds to the event $(\randWorld_i = j)$ in the customary annotation scheme.
-}  
+}
 Each tuple $\tup_{i, j}$ is annotated with a random variable $\randWorld_{i, j} \in \{0, 1\}$ denoting its presence in a possible world $\db$.  The probability distribution $\pd$ over $\dbbase$ is the one induced from individual tuple probabilities $\prob_{i, j}\in \vct{\prob}=\inparen{\prob_{1, 1},\ldots,\prob_{\abs{\block},\ldots,\abs{\block_{\abs{\block}}}}}$ and the conditions on the blocks.  A \abbrTIDB is a \abbrBIDB where each block has size exactly $1$.

-Instead of looking only at the possible worlds of $\pdb$, one can consider all worlds, including those that cannot exist due to disjointness.  The all worlds set can be modeled by $\vct{\randWorld}\in \{0, 1\}^\numvar$,\footnote{Here and later on in the paper, especially in \Cref{sec:algo}, we will overload notation and rename the variables as $X_1,\dots,X_n$, where $n=\sum_{i=1}^\ell \abs{b_i}$.} such that $\randWorld_k \in \vct{\randWorld}$ represents the presence of $\tup_{i, j}$ (where $k = \sum_{\ell = 1}^{i - 1} \abs{b_\ell} + j$).  We denote a probability distribution over all $\vct{\randWorld} \in \{0, 1\}^\numvar$ as $\pdassign$.  When $\pdassign$ is the one induced from each $\prob_{i, j}$ while assigning $\probOf\pbox{\vct{\randWorld}} = 0$ for any $\vct{\randWorld}$ with $\randWorld_{i, j} = \randWorld_{i, k} = 1$ for any block $i$ and $j\neq k$, we end up with a bijective mapping from $\pd$ to $\pdassign$, such that each mapping is equivalent, implying the distributions are equivalent.  
+Instead of looking only at the possible worlds of $\pdb$, one can consider all worlds, including those that cannot exist due to disjointness.  The all worlds set can be modeled by $\vct{\randWorld}\in \{0, 1\}^\numvar$,\footnote{Here and later on in the paper, especially in \Cref{sec:algo}, we will overload notation and rename the variables as $X_1,\dots,X_n$, where $n=\sum_{i=1}^\ell \abs{b_i}$.} such that $\randWorld_k \in \vct{\randWorld}$ represents the presence of $\tup_{i, j}$ (where $k = \sum_{\ell = 1}^{i - 1} \abs{b_\ell} + j$).  We denote a probability distribution over all $\vct{\randWorld} \in \{0, 1\}^\numvar$ as $\pdassign$.  When $\pdassign$ is the one induced from each $\prob_{i, j}$ while assigning $\probOf\pbox{\vct{\randWorld}} = 0$ for any $\vct{\randWorld}$ with $\randWorld_{i, j} = \randWorld_{i, k} = 1$ for any block $i$ and $j\neq k$, we end up with a bijective mapping from $\pd$ to $\pdassign$, such that each mapping is equivalent, implying the distributions are equivalent.
 %that $\forall i \in \abs{\block}, \forall j\neq k \in [\block_i] \suchthat \db\inparen{\tup_{i, j}} = 0 \vee \db\inparen{\tup_{i, k} = 0}$.In other words, each random variable corresponds to the event of a single tuple's presence.
 %A \emph{\ti} is a \bi where each block contains exactly one tuple.
 \Cref{subsec:supp-mat-ti-bi-def} explains \abbrTIDB\xplural and \abbrBIDB\xplural in greater detail.
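
The all-worlds construction in the last hunk of `ra-to-poly.tex` can be checked by brute force on a toy instance. The sketch below is only an illustration under stated assumptions: the function names `all_worlds_distribution` and `expectation`, the block probabilities, and the example polynomials are hypothetical and do not come from the paper or its code. It builds the distribution over all vectors in $\{0,1\}^n$ by giving probability 0 to any vector that sets two variables of the same block to 1, and then averages a lineage polynomial over that distribution.

```python
from itertools import product


def all_worlds_distribution(blocks):
    """Return {W: probability} for every W in {0,1}^n.

    blocks[i][j] is the (made-up) marginal probability of tuple t_{i,j}.
    Assumes each block's probabilities sum to at most 1.
    Variables are laid out block by block, matching k = sum_{l<i} |b_l| + j.
    """
    n = sum(len(b) for b in blocks)
    dist = {}
    for w in product((0, 1), repeat=n):
        prob, offset = 1.0, 0
        for b in blocks:
            bits = w[offset:offset + len(b)]
            offset += len(b)
            if sum(bits) > 1:
                # Two tuples of one block present: impossible world.
                prob = 0.0
                break
            if sum(bits) == 1:
                # Exactly one tuple of the block is present.
                prob *= b[bits.index(1)]
            else:
                # No tuple of the block is present.
                prob *= 1.0 - sum(b)
        dist[w] = prob
    return dist


def expectation(poly, dist):
    """Expected value of poly(W) when W is drawn from dist."""
    return sum(p * poly(w) for w, p in dist.items())


# Hypothetical instance: block 1 holds X1, X2; block 2 holds X3.
blocks = [[0.3, 0.5], [0.4]]
dist = all_worlds_distribution(blocks)

# Factorized polynomial (X1 + X2) * X3 and the cross-term X1 * X2.
factorized = expectation(lambda w: (w[0] + w[1]) * w[2], dist)
same_block = expectation(lambda w: w[0] * w[1], dist)
```

On this instance `factorized` comes out at approximately 0.32, which is $(0.3 + 0.5)\cdot 0.4$, and `same_block` is 0 because both variables belong to the same block. Enumeration is exponential in $n$, so this is useful only as a sanity check on small inputs.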