
%root: main.tex
%!TEX root=./main.tex
\onecolumn
\section{Query translation into polynomials}
%\AH{This section will involve the set of queries (RA+) that we are interested in, the probabilistic/incomplete models we address, and the outer aggregate functions we perform over the output \textit{annotation}
%1) RA notation
%2) DB (TIDB) notation
%3) How queries translate into polynomials
%}
\subsection{Introduction}


An incomplete database $\idb$ is a set of deterministic databases $\db_i$, each of which is known as a possible world.  Since $\idb$ models all the possible worlds of an uncertain database, each $\db_i \in \idb$ has the same named set of relations, $\{\rel_1,\ldots, \rel_n\}$ (although their contents may differ across worlds), whose schemas $\sch(\rel_i)$ are the same in every $\db_j$.  For the set of possible worlds, $\wSet$, i.e., the set of all $\db_i \in \idb$, 
\OK{It seems like you're using separate notation for $\wSet$ and $\idb$ to allow yourself to ``cheat'' below and redefine $\idb = (\wSet, \pd)$.  I would suggest that you pick one symbol to represent the set and use it consistently throughout this section.}
 define an injective mapping to the set $\{0, 1\}^M$, so that for each vector $\vct{w} \in \{0, 1\}^M$ there is at most one element $\db_i \in \idb$ mapped to $\vct{w}$.  When $\idb$ is a probabilistic database, $\idb$ can be viewed as a pair $(\wSet, \pd)$, where $\wSet$, as noted, is the set of possible worlds and $\pd$ is a probability distribution over $\wSet$.  
%Below may possibly need to be used again...we'll see.
%probability space $\left(\Omega, \mathcal{A}, P\right)$ over that set. \AR{I'm not sure why you are using the notation $\mathcal{A}$ and $P$, which you do not seem to use beyond this section. I would recommend that you only introduce a notation if you plan to use them later on.} Since the set of possible outcomes is the set of possible worlds, $\wSet$, and the set of outcomes is equivalent to the set of events, we will simplify notation and use $\left(\wSet, P\right)$ to denote the probability space of $\idb$. \AR{If you want to use $(\wSet,P)$ make sure you use the same notation in Sec 1.3 as well. If not, then use the notation from Sec 1.3 here}
\OK{It's also common to define possible worlds semantics here as well. e.g., $Q(\Omega)= \{ Q(D) | D \in \Omega \}$ }

\subsection{Modeling and Semantics}
Define $\vct{X}$ to be the variables $X_1,\dots,X_M$.  Let the set of tuples in an arbitrary $\db$ be $\tset$.
\OK{In papers written at this level of abstraction, it's conventional to use $\db$ as the set of tuples (no need for a separate $\tset$). (the alternative, and unnecessary here, convention is that a database is a set of relations)}
Further define $\nxdb$ as an $\mathbb{N}[\vct{X}]$ database, i.e., an incomplete/probabilistic database model where each tuple $\tup \in \tset$ is annotated with a polynomial over variables $X_1,\ldots, X_M$ for some value of $M$ that will be specified later.  

\OK{Suggest holding off on the definition of $\nxdb$ until you define $\mathbb{N}[\vct{X}]$-databases in the subsection below.}

\AH{The following \cref{subsubsec:k-rel} is a rough draft to convey a high level, superficial view of the K-relational database framework, specifically in the setting of $\mathbb{N}[\vct{X}]$-relation.  Definitely needs some tweaking...any advice is much appreciated.}


\subsubsection{K-relations}\label{subsubsec:k-rel}
A $K$-relation~\cite{DBLP:conf/pods/GreenKT07} is a relation in which each tuple is annotated with an element of a commutative semiring $K$, denoted $\{K, \oplus, \otimes, \mathbbold{0}, \mathbbold{1}\}$.  A commutative semiring has associative and commutative operators $\oplus$ and $\otimes$, with $\otimes$ distributing over $\oplus$, $\mathbbold{0}$ the identity of $\oplus$, $\mathbbold{1}$ the identity of $\otimes$, and $\mathbbold{0}$ annihilating every element of $K$ under $\otimes$.  The information encoded in the annotation depends on the underlying semiring of the relation.  
As noted in \cite{DBLP:conf/pods/GreenKT07}, the elements of the $\mathbb{N}[\vct{X}]$-semiring are polynomials, whose variables can be substituted with values from another semiring $K$; evaluating the resulting expression with the operators of $K$ yields varying semantics such as set, bag, and security annotations.  
\OK{The first occurrence of ``produces'' in this sentence is the wrong word.  $\mathbb{N}[\vct{X}]$ is the set of all polynomials.  There is a semiring defined over this set.}
Note that $\mathbb{N}[\vct{X}]$ databases are effectively C-tables, since all first order formulas can be expressed as polynomials: disjunction corresponds to the addition operator, conjunction corresponds to the multiplication operator, and, in Boolean semantics, negation of a variable $x$ can be written as $(1 - x)$.
\OK{lifting is not the right word here.  Suggest "When used with $\mathbb B$-typed variables, an N[X] relation is effectively a C-Table."}
  This would correspond to substituting values and operators from the $\{\mathbb{B}, \vee, \wedge, \bot, \top\}$ semiring.
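
To illustrate the substitution described above, consider the annotation $X_1X_2 + X_3$ (a hypothetical polynomial chosen only for exposition, as are the assigned values).  Substituting values from $\mathbb{N}$ and from $\mathbb{B}$ respectively gives
\begin{align*}
&X_1 = 2,\ X_2 = 3,\ X_3 = 1: &&2 \times 3 + 1 = 7,\\
&X_1 = \top,\ X_2 = \bot,\ X_3 = \top: &&(\top \wedge \bot) \vee \top = \top.
\end{align*}
Under this reading, the first line is a bag-style multiplicity and the second is a set-style membership test, both obtained from the same polynomial.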

%A nice alternative perspective
%Intuitively, one can think of $\idb$ as a parameterized database, whose abstract form maps to each deterministic $\db_i \in \idb$.

Since $\nxdb$ is a database that maps tuples to polynomials, it is customary for arbitrary table $\rel$ to be viewed as a function $\rel: \tset \mapsto \mathbb{N}[\vct{X}]$, where $\rel(\tup)$ denotes the polynomial mapped to tuple $\tup$.
\OK{Limiting the left hand side to only the tuples in D is insufficient, as queries may produce new tuples that were not in the original database.  Perhaps redefine $\tset$ as the set of all tuples?}
It has been shown in previous work that commutative semirings precisely model translations of RA+ query operations to set annotations.  Since $\nxdb$ is an $\mathbb{N}[\vct{X}]$ database, recall that we are working with the commutative semiring $\{\mathbb{N}[\vct{X}], +, \times, 0, 1\}$. 
\OK{This last sentence is largely repeating the last sentence of the prior paragraph.}

The evaluation semantics notation $\llbracket \cdot \rrbracket = x$ simply means that the result of evaluating expression $\cdot$ is given by following the semantics $x$.  Given a query $\query$, operations in $\query$ are translated into the following polynomial operations.

\begin{align*}
&\eval{\project_A(\rel)}(\tup) = &&\sum_{\tup': \project_A(\tup') = \tup} \eval{\rel}(\tup')\\
&\eval{(\rel_1 \union \rel_2)}(\tup) = &&\eval{\rel_1}(\tup) + \eval{\rel_2}(\tup)\\
&\eval{(\rel_1 \join \rel_2)}(\tup) = &&\eval{\rel_1}(\project_{\sch(\rel_1)}(\tup)) \times \eval{\rel_2}(\project_{\sch(\rel_2)}(\tup))	\\
&\eval{\select_\theta(\rel)}(\tup) = &&\begin{cases}
					\eval{\rel}(\tup)	&\text{if }\theta(\tup) = 1\\
					0		&\text{otherwise}.
				\end{cases}\\
&\eval{\rel}(\tup) = &&\rel(\tup)
\end{align*}

Each query operation corresponds to one of the two semiring operators: $\project$ and $\union$ combine the annotations of agreeing tuples with the $+$ operator in polynomial $\poly$, $\join$ combines annotations with the $\times$ operator, and $\select$ is modeled as a function that returns either $\eval{\rel}(\tup)$ or $0$ depending on the predicate $\theta$.
\OK{Translated isn't the word I'd use here.  These semantics show how to obtain the annotation on a tuple in the result of the query from the annotations on tuples in the input to the query.}
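
As a small illustration of these rules, consider the following sketch, in which the relations, attribute names, constants, and annotations are all hypothetical.  Let $\rel_1$ have schema $(A, B)$ with tuples $(a, b_1)$ and $(a, b_2)$ annotated $X_1$ and $X_2$, and let $\rel_2$ have schema $(B, C)$ with tuples $(b_1, c)$ and $(b_2, c)$ annotated $X_3$ and $X_4$.  For the query $\project_A(\rel_1 \join \rel_2)$, the rules above give
\begin{align*}
&\eval{(\rel_1 \join \rel_2)}((a, b_1, c)) = &&X_1 \times X_3\\
&\eval{(\rel_1 \join \rel_2)}((a, b_2, c)) = &&X_2 \times X_4\\
&\eval{\project_A(\rel_1 \join \rel_2)}((a)) = &&X_1X_3 + X_2X_4.
\end{align*}
The join multiplies the annotations of the two matching input tuples, and the projection sums the annotations of the two join results that agree on $A$.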


\subsection{Defining the Data}

In the general case, the binary value of $\vct{w}$ uniquely identifies a potential possible world.  For example, consider the case of the Tuple Independent Database $(\ti)$ data model, in which each table is a set of tuples, each of which is independent of the others and individually occurs with a specific probability $\prob_\tup$.  Because of independence, a $\ti$ with $\numTup$ tuples naturally has $2^\numTup$ possible worlds; thus $\numTup = M$, and each $\vct{w} \in \{0, 1\}^M$ is indeed a possible world.  However, in the Block Independent Disjoint data model, because of the disjointness condition on tuples within the same block, it is not generally the case that every element $\vct{w} \in \{0, 1\}^M$ is in fact a possible world.  
\OK{This may be nitpicking, but I don't see how this follows.  Is the implication that a BIDB may not give you exactly $2^M$ possible worlds?  If so, say that, and be clear that you can assign make the probability of some $\vct{w}$s to 0.}
Denote a random world (according to distribution $\pd$) by $\rw$.  Provided that $\pd[\rw = \vct{w}] = 0$ for every $\vct{w} \in \{0, 1\}^M$ that is not a possible world, a probability distribution over $\{0, 1\}^M$ induces a distribution over $\wSet$, which we have already defined as $\pd$.  
\OK{Denote a random variable selecting a world according to...?}
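
For instance, consider a single block containing two mutually exclusive tuples $\tup_1$ and $\tup_2$ with $M = 2$ and probabilities $\prob_1 + \prob_2 \leq 1$ (a hypothetical block, assumed only for illustration).  The vector $(1, 1)$ is then not a possible world, and a distribution over $\{0, 1\}^2$ consistent with the block is
\begin{align*}
&\pd[\rw = (1, 0)] = \prob_1, &&\pd[\rw = (0, 1)] = \prob_2,\\
&\pd[\rw = (0, 0)] = 1 - \prob_1 - \prob_2, &&\pd[\rw = (1, 1)] = 0.
\end{align*}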

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%This could be a way to think of world binary vectors in the general case
%Let $\vct{w}$ be a $\left\lceil\log_2\left(\left|\wSet\right|\right)\right\rceil = \numTup$ binary bit vector, uniquely identifying possible world $\db_i \in \idb$. 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


Assume a domain of $\{0, 1\}$ for each $X_i \in \vct{X}$.  Since, from this point on, our discussion will involve a single polynomial for an arbitrary $\tup$, we abuse notation by using $\poly(\vct{X})$ to denote the annotated polynomial $\llbracket\poly(\idb)\rrbracket(\tup)$.  

One of the aggregates we desire to compute over the annotated polynomial is the expectation, denoted,

\AH{With our notation, I no longer think that $\vct{w} \sim \pd$ is necessary footer for $\expct$.  We can probably just have $\expct\limits_{\vct{w}}$ instead.  Do you agree?}
\AR{No. How would you state Lemma 4 without explicitly using $P$ in the definition of expectation?}

\[\expct_{\rw \sim \pd}\pbox{\poly(\rw)} = \sum\limits_{\wVec \in \{0, 1\}^\numTup} \poly(\wVec)\cdot \pd[\rw = \wVec].\]

The $\ti$ model has features that we can exploit.  Since the powerset of $[\numTup]$ is exactly $\wSet$, the bit-string world value $\vct{w}$ can be used as an index to determine which tuples are present in the world $\vct{w}$.  Given an $\numTup$-sized vector $\vct{p}$, where the $i^{\text{th}}$ element, $\prob_i$, is the probability of the $i^{\text{th}}$ tuple, denote by $\pd^{(\vct{p})}$ the probability distribution $\pd$ determined by $\vct{p}$.  We can then write an equivalent expectation for the $\ti$ model,

\[\expct_{\rw\sim \pd^{(\vct{p})}}\pbox{\poly(\rw)} = \sum\limits_{\wVec \in \{0, 1\}^\numTup} \poly(\wVec)\prod_{\substack{i \in [\numTup]\\ \text{s.t. } \wElem_i = 1}}\prob_i \prod_{\substack{i \in [\numTup]\\ \text{s.t. } \wElem_i = 0}}\left(1 - \prob_i\right).\]
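
As a small worked instance of this formula (the polynomial and the choice $\numTup = 2$ are assumed purely for illustration), take $\poly(X_1, X_2) = X_1X_2 + X_1$.  Since $\poly(0, 0) = \poly(0, 1) = 0$, only the worlds $(1, 0)$ and $(1, 1)$ contribute to the sum:
\begin{align*}
\expct_{\rw\sim \pd^{(\vct{p})}}\pbox{\poly(\rw)} &= \poly(1, 0)\cdot\prob_1\left(1 - \prob_2\right) + \poly(1, 1)\cdot\prob_1\prob_2\\
&= \prob_1\left(1 - \prob_2\right) + 2\prob_1\prob_2\\
&= \prob_1 + \prob_1\prob_2.
\end{align*}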


\OK{isn't this restating the first paragraph?  Suggest maybe simplifying the paragraph from a goal-oriented perspective.  e.g., "We will use the binary value $\vec{w}$ to identify possible worlds.  If there are exactly $2^M$ possible worlds (e.g., as in a TIDB)... " (sidenote: binary value suggests exactly 2 possible values)}