Finished pass on Appendix D, F, G.

2021-09-20 16:44:12 -04:00 · 2021-09-20 16:44:12 -04:00 · 9141644476
parent ee6eb1ac2c
commit 9141644476
5 changed files with 26 additions and 26 deletions
--- a/app_samp-monom-analysis.tex
+++ b/app_samp-monom-analysis.tex
@@ -3,8 +3,8 @@

 \subsection{\sampmon Remarks}\label{subsec:sampmon-remarks}
 \input{app_sample-monomial-pseudo-code}
-We briefly describe the top-down traversal of \sampmon.  For a parent $+$ gate, the input to be visited is sampled from the weighted distribution precomputed by \onepass.
-When a parent $\times$ node is visited, both inputs are visited.
+We briefly describe the top-down traversal of \sampmon.  When $\circuit.\type = \circplus$, the input to be visited is sampled from the weighted distribution precomputed by \onepass.
+When $\circuit.\type = \circmult$, both inputs are visited.
 The algorithm computes two properties: the set of all variable leaf nodes visited, and the product of the signs of visited coefficient leaf nodes.
 %
 We assume a TreeSet data structure, which maintains sets with logarithmic-time insertion and linear-time traversal of its elements.
@@ -25,15 +25,15 @@ For the inductive step, let us take a circuit $\circuit$ with $d = k + 1$.  Note

 We next prove by induction on the depth $d$ of $\circuit$ that for every $(\monom,\coef) \in \expansion{\circuit}$, $\monom$ is sampled with probability $\frac{|\coef|}{\abs{\circuit}\polyinput{1}{1}}$.

-For the base case $d = 0$, by definition~\ref{def:circuit} we know that the \circuit consists of a single gate that has to be either a coefficient or a variable.  For either case, the probability of the value returned is $1$ since there is only one value to sample from.  When \circuit.\val $= x$, the algorithm always return the variable set $\{x\}$.  When $\circuit.\type = \tnum$, \sampmon will always return $\emptyset$.
+For the base case $d = 0$, by definition~\ref{def:circuit} we know that $\size\inparen{\circuit} = 1$ and $\circuit.\type \in \inset{\tnum, \var}$.  In either case, the probability of the value returned is $1$ since there is only one value to sample from.  When $\circuit.\type = \var$ with $\circuit.\val = x$, the algorithm always returns the variable set $\{x\}$.  When $\circuit.\type = \tnum$, \sampmon always returns $\emptyset$.

 For the inductive hypothesis, assume that for some $k \geq 0$ and every circuit of depth $d \leq k$, $\sampmon$ indeed returns $\monom$ in $(\monom, \coef)$ of $\expansion{\circuit}$ with probability $\frac{|\coef|}{\abs{\circuit}\polyinput{1}{1}}$.%bove is true.%lemma~\ref{lem:sample} is true.

 We now prove that the inductive step holds for $d = k + 1$.  In this case the sink of $\circuit$ has two inputs $\circuit_\linput$ and $\circuit_\rinput$.  Since $\circuit_\linput$ and $\circuit_\rinput$ both have depth at most $d - 1 = k$, by the inductive hypothesis $\sampmon$ returns $\monom_\lchild$ in $(\monom_\lchild, \coef_\lchild)$ of $\expansion{\circuit_\linput}$ with probability $\frac{|\coef_\lchild|}{\abs{\circuit_\linput}\polyinput{1}{1}}$, and $\monom_\rchild$ in $(\monom_\rchild, \coef_\rchild)$ of $\expansion{\circuit_\rinput}$ with probability $\frac{|\coef_\rchild|}{\abs{\circuit_\rinput}\polyinput{1}{1}}$.

-Consider the case when $\circuit.\type = \circmult$.  Note that we are sampling a term $(\monom, \coef)$ from $\expansion{\circuit}$.  It is the case that $\monom = \monom_\lchild \cup \monom_\rchild$, where $\monom_\lchild$ is coming from $\circuit_\linput$ and $\monom_\rchild$ from $\circuit_\rinput$.  The probability that \sampmon$(\circuit_{\lchild})$ returns $\monom_\lchild$ is $\frac{|\coef_{\monom_\lchild}|}{|\circuit_\linput|(1,\ldots, 1)}$ and $\frac{|\coef_{\monom_\rchild}|}{\abs{\circuit_\rinput}\polyinput{1}{1}}$ for $\monom_\rchild$.  Since both $\monom_\lchild$ and $\monom_\rchild$ are sampled with independent randomness, the final probability for sample $\monom$ is then $\frac{|\coef_{\monom_\lchild}| \cdot |\coef_{\monom_\rchild}|}{|\circuit_\linput|(1,\ldots, 1) \cdot |\circuit_\rinput|(1,\ldots, 1)}$.  For $(\monom, \coef)$ in $\expansion{\circuit}$, by \cref{def:expand-circuit} it is indeed the case that $|\coef| = |\coef_{\monom_\lchild}| \cdot |\coef_{\monom_\rchild}|$ and that (as shown in \cref{eq:T-all-ones}) $\abs{\circuit}(1,\ldots, 1) = |\circuit_\linput|(1,\ldots, 1) \cdot |\circuit_\rinput|(1,\ldots, 1)$, and therefore $\monom$ is sampled with correct probability $\frac{|\coef|}{\abs{\circuit}(1,\ldots, 1)}$.
+Consider the case when $\circuit.\type = \circmult$.  For the term $(\monom, \coef)$ of $\expansion{\circuit}$ being sampled, we have $\monom = \monom_\lchild \cup \monom_\rchild$, where $\monom_\lchild$ comes from $\circuit_\linput$ and $\monom_\rchild$ from $\circuit_\rinput$.  The probability that \sampmon$(\circuit_\linput)$ returns $\monom_\lchild$ is $\frac{|\coef_{\monom_\lchild}|}{|\circuit_\linput|(1,\ldots, 1)}$, and the probability that \sampmon$(\circuit_\rinput)$ returns $\monom_\rchild$ is $\frac{|\coef_{\monom_\rchild}|}{\abs{\circuit_\rinput}\polyinput{1}{1}}$.  Since $\monom_\lchild$ and $\monom_\rchild$ are sampled with independent randomness, the probability of sampling $\monom$ is $\frac{|\coef_{\monom_\lchild}| \cdot |\coef_{\monom_\rchild}|}{|\circuit_\linput|(1,\ldots, 1) \cdot |\circuit_\rinput|(1,\ldots, 1)}$.  For $(\monom, \coef)$ in $\expansion{\circuit}$, by \cref{def:expand-circuit} we have $|\coef| = |\coef_{\monom_\lchild}| \cdot |\coef_{\monom_\rchild}|$ and (as shown in \cref{eq:T-all-ones}) $\abs{\circuit}(1,\ldots, 1) = |\circuit_\linput|(1,\ldots, 1) \cdot |\circuit_\rinput|(1,\ldots, 1)$; therefore $\monom$ is sampled with the correct probability $\frac{|\coef|}{\abs{\circuit}(1,\ldots, 1)}$.

-For the case when $\circuit.\type = \circplus$, \sampmon ~will sample $\monom$ from one of its inputs.  By inductive hypothesis we know that any $\monom_\lchild$ in $\expansion{\circuit_\linput}$ and any $\monom_\rchild$ in $\expansion{\circuit_\rinput}$ will both be sampled with correct probability $\frac{|\coef_{\monom_\lchild}|}{\circuit_{\lchild}(1,\ldots, 1)}$ and $\frac{|\coef_{\monom_\rchild}|}{|\circuit_\rinput|(1,\ldots, 1)}$, where either $\monom_\lchild$ or $\monom_\rchild$ will equal $\monom$, depending on whether $\circuit_\linput$ or $\circuit_\rinput$ is sampled.  Assume that $\monom$ is sampled from $\circuit_\linput$, and note that a symmetric argument holds for the case when $\monom$ is sampled from $\circuit_\rinput$.  Notice also that the probability of choosing $\circuit_\linput$ from $\circuit$ is $\frac{\abs{\circuit_\linput}\polyinput{1}{1}}{\abs{\circuit_\linput}\polyinput{1}{1} + \abs{\circuit_\rinput}\polyinput{1}{1}}$ as computed by $\onepass$.  Then, since $\sampmon$ goes top-down, and each sampling choice is independent (which follows from the randomness in the root of $\circuit$ being independent from the randomness used in its subtrees), the probability for $\monom$ to be sampled from $\circuit$ is equal to the product of the probability that $\circuit_\linput$ is sampled from $\circuit$ and $\monom$ is sampled in $\circuit_\linput$, and
+For the case when $\circuit.\type = \circplus$, \sampmon ~samples $\monom$ from one of its inputs.  By the inductive hypothesis, any $\monom_\lchild$ in $\expansion{\circuit_\linput}$ and any $\monom_\rchild$ in $\expansion{\circuit_\rinput}$ are sampled with the correct probabilities $\frac{|\coef_{\monom_\lchild}|}{\abs{\circuit_\linput}(1,\ldots, 1)}$ and $\frac{|\coef_{\monom_\rchild}|}{|\circuit_\rinput|(1,\ldots, 1)}$ respectively, where either $\monom_\lchild$ or $\monom_\rchild$ equals $\monom$, depending on whether $\circuit_\linput$ or $\circuit_\rinput$ is sampled.  Assume that $\monom$ is sampled from $\circuit_\linput$; a symmetric argument holds when $\monom$ is sampled from $\circuit_\rinput$.  Notice also that the probability of choosing $\circuit_\linput$ from $\circuit$ is $\frac{\abs{\circuit_\linput}\polyinput{1}{1}}{\abs{\circuit_\linput}\polyinput{1}{1} + \abs{\circuit_\rinput}\polyinput{1}{1}}$ as computed by $\onepass$.  Then, since $\sampmon$ proceeds top-down and each sampling choice is independent (which follows from the randomness used at the root of $\circuit$ being independent of the randomness used in its subcircuits), the probability that $\monom$ is sampled from $\circuit$ equals the probability that $\circuit_\linput$ is chosen at $\circuit$ times the probability that $\monom$ is sampled in $\circuit_\linput$, and
 \begin{align*}
 &\probOf(\sampmon(\circuit) = \monom) = \\
 &\probOf(\sampmon(\circuit_\linput) = \monom) \cdot \probOf(SampledChild(\circuit) = \circuit_\linput)\\
@@ -42,7 +42,7 @@ For the case when $\circuit.\type = \circplus$, \sampmon ~will sample $\monom$ f
 \end{align*}
 and we obtain the desired result.

-It is trivial to see that \sampmon returns the correct sign value of $\coef$.  Note that the only time the sign of $\coef$ for a $(\monom, \coef)$ in $\expansion{\circuit}$ may change is when an internal gate $\circuit.\type = \circmult$.  The same behavior is mirrored in \sampmon, where the sign can o The behavior is analogous in \sampmon, and this behavior is agnostic to the particular natural number values that 
+%It is trivial to see that \sampmon returns the correct sign value of $\coef$.  Note that the only time the sign of $\coef$ for a $(\monom, \coef)$ in $\expansion{\circuit}$ may change is when an internal gate $\circuit.\type = \circmult$, which follows the property of standard multiplication of integers (specifically integers in the set $\inset{-1, 1}$).%  The same behavior is mirrored in \sampmon, where the sign can o The behavior is analogous in \sampmon, and this behavior is agnostic to the particular natural number values that 

 Lastly, we show by induction on the depth $d$ of \circuit that \sampmon indeed returns the correct sign of $\coef$ in $(\monom, \coef)$.

@@ -52,13 +52,13 @@ In the base case, $\circuit.\type = \tnum$ or $\var$.  For the former, \sampmon

 For the inductive hypothesis, we assume that for some $k \geq 0$ the algorithm returns the correct sign of $\coef$ for every circuit of depth $d \leq k$.

-Similar to before, for a depth $d \leq k + 1$, it is true that $\circuit_\linput$ and $\circuit_\rinput$ both return the correct sign of $\coef$.  For the case that $\circuit.\type = \circmult$, the sign value of both inputs are multiplied, which is the correct behavior by \cref{def:expand-circuit}.  When $\circuit.\type = \circplus$, only one input of $\circuit$ is sampled, and the algorithm simply passes on its corresponding sign value, which by \cref{def:expand-circuit} is the correct behavior, since all elements in $\expansion{\circuit}$ are products with no sums, and a $\circplus$ operation does not modify any $(\monom, \coef)$ of $\expansion{\circuit}$.
+As before, for depth $d = k + 1$, the inductive hypothesis gives that $\circuit_\linput$ and $\circuit_\rinput$ both return the correct sign of their respective coefficients.  For the case that $\circuit.\type = \circmult$, the sign values of both inputs are multiplied, which is the correct behavior by \cref{def:expand-circuit}.  When $\circuit.\type = \circplus$, only one input of $\circuit$ is sampled; since a $\circplus$ gate leaves every $(\monom, \coef)$ of its inputs' expansions unchanged (\cref{def:expand-circuit}), the algorithm returns the correct sign value of $\coef$ by the inductive hypothesis.%simply passes on its corresponding sign value, which by \cref{def:expand-circuit} is the correct behavior, since all elements in $\expansion{\circuit}$ are products with no sums, and a $\circplus$ operation does not modify any $(\monom, \coef)$ of $\expansion{\circuit}$.


 \paragraph*{Run-time Analysis}
 It is easy to check that, except for lines~\ref{alg:sample-plus-bsamp} and~\ref{alg:sample-times-union}, all lines take $O(1)$ time.  Consider an execution of \cref{alg:sample-times-union}. Each set of variables is added to some set at most once; since the sum of the sizes of the sets at a given level is at most $\degree(\circuit)$, each gate visited takes $O(\log{\degree(\circuit)})$ time.  For \Cref{alg:sample-plus-bsamp}, note that we pick $\circuit_\linput$ with probability $\frac a{a+b}$, where $a=\circuit.\vari{Lweight}$ and $b=\circuit.\vari{Rweight}$. We can implement this step by picking a uniformly random number $r\in[a+b]$ and then checking whether $r\le a$. It is easy to check that $a+b\le \abs{\circuit}(1,\dots,1)$. This means we need to add and compare $\log{\abs{\circuit}(1,\ldots, 1)}$-bit numbers, which can certainly be done in time $\multc{\log\left(\abs{\circuit}(1,\ldots, 1)\right)}{\log{\size(\circuit)}}$ (note that this is an over-estimate).
 % we have $> O(1)$ time when $\abs{\circuit}(1,\ldots, 1) > \size(\circuit)$.  when this is the case that for each sample, we have $\frac{\log{\abs{\circuit}(1,\ldots, 1)}}{\log{\size(\circuit)}}$ operations, since we need to read in and then compare numbers of of $\log{{\abs{\circuit}(1,\ldots, 1)}}$ bits.  
-Denote \cost(\circuit) (\Cref{eq:cost-sampmon}) to be an upper bound of the number of nodes visited by \sampmon.  Then the runtime is $O\left(\cost(\circuit)\cdot \log{\degree(\circuit)}\cdot \multc{\log\left(\abs{\circuit(1\ldots, 1)}\right)}{\log{\size(\circuit)}}\right)$.
+Let $\cost(\circuit)$ (\Cref{eq:cost-sampmon}) denote an upper bound on the number of gates visited by \sampmon.  Then the runtime is $O\left(\cost(\circuit)\cdot \log{\degree(\circuit)}\cdot \multc{\log\left(\abs{\circuit}(1,\ldots, 1)\right)}{\log{\size(\circuit)}}\right)$.

 We now bound the number of recursive calls in $\sampmon$ by $O\left(\left(\degree(\circuit) + 1\right)\cdot\depth(\circuit)\right)$, which by the above will prove the claimed runtime.

@@ -82,7 +82,7 @@ We prove the following inequality holds.

 Note that \cref{eq:strict-upper-bound} implies the claimed runtime.  We prove \cref{eq:strict-upper-bound} for the number of gates traversed in \sampmon using induction over $\depth(\circuit)$.  Recall how degree is defined in \cref{def:degree}.

-For the base case $\degree(\circuit) = \depth(\circuit) = 0$, $\cost(\circuit) = 1$, and it is trivial to see that the inequality $2\degree(\circuit) \cdot \depth(\circuit) + 1 \geq \cost(\circuit)$ holds.
+For the base case $\degree(\circuit) \in \inset{0, 1}$ and $\depth(\circuit) = 0$, we have $\cost(\circuit) = 1$, and the inequality $2\left(\degree(\circuit) + 1\right) \cdot \depth(\circuit) + 1 \geq \cost(\circuit)$ trivially holds.

 For the inductive hypothesis, we assume the bound holds for any circuit where $\ell \geq \depth(\circuit) \geq 0$.
 Now consider an arbitrary input circuit \circuit to \sampmon with $\depth(\circuit) = \ell + 1$.  By definition, $\circuit.\type \in \{\circplus, \circmult\}$.  Since $\depth(\circuit) \geq 1$, \circuit must have inputs.  Further, by the inductive hypothesis, the inputs $\circuit_i$ for $i \in \{\linput, \rinput\}$ of the sink gate of \circuit satisfy the bound
@@ -159,7 +159,7 @@ Putting it together we obtain the following for (\ref{eq:plus-middle}):
 &2\degree_{\max}\depth_{\max} + 2\degree_{\max} + 2\depth_{\max} + 3\nonumber\\
 &\qquad \geq 2\degree_{\max}\depth_{\max} + 2\depth_{\max} + 2, \label{eq:plus-upper-bound-final}
 \end{align}
-where it can be readily seen that the inequality stand and (\ref{eq:plus-upper-bound-final}) follows.  This proves (\ref{eq:plus-middle}).
+where the inequality holds because the left-hand side exceeds the right-hand side by $2\degree_{\max} + 1 > 0$, so (\ref{eq:plus-upper-bound-final}) follows.  This proves (\ref{eq:plus-middle}).

 As in the case $\circuit.\type = \circmult$, (\ref{eq:plus-rhs}) follows from equations (\ref{eq:cost-sampmon}) and (\ref{eq:ih-bound-cost}).

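The top-down traversal and the probability and sign arguments above can be exercised on a small example. What follows is a minimal Python sketch under stated assumptions, not the authors' implementation: the Gate record and its field names, and the functions one_pass, sample_monomial, expansion, and exact_distribution, are hypothetical stand-ins for \onepass, \sampmon, and $\expansion{\cdot}$. Monomials are represented as sorted tuples of variable names (multisets) rather than the TreeSet of the pseudo-code, coefficients are assumed to be nonzero integers, and the $\circplus$ branch uses the $r \in [a+b]$, $r \le a$ rule from the run-time analysis.

```python
import random
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional


@dataclass(eq=False)
class Gate:
    """Hypothetical gate record; type is "num", "var", "+", or "*"."""
    type: str
    val: object = None                  # nonzero int for "num", variable name for "var"
    left: Optional["Gate"] = None
    right: Optional["Gate"] = None
    lweight: int = 0                    # |C_L|(1,...,1), set by one_pass on "+" gates
    rweight: int = 0                    # |C_R|(1,...,1), set by one_pass on "+" gates


def one_pass(g: Gate) -> int:
    """Return |g|(1,...,1) and store the input weights on every "+" gate."""
    if g.type == "num":
        return abs(g.val)
    if g.type == "var":
        return 1
    a, b = one_pass(g.left), one_pass(g.right)
    if g.type == "+":
        g.lweight, g.rweight = a, b
        return a + b
    return a * b                        # "*" gate


def sample_monomial(g: Gate, rng: Optional[random.Random]) -> tuple:
    """Top-down sample; returns (sorted tuple of variables, sign in {-1, 1})."""
    if g.type == "num":
        return (), (1 if g.val > 0 else -1)
    if g.type == "var":
        return (g.val,), 1
    if g.type == "+":
        a, b = g.lweight, g.rweight
        r = rng.randint(1, a + b)       # uniform r in [a+b]; take the left input iff r <= a
        return sample_monomial(g.left if r <= a else g.right, rng)
    lv, ls = sample_monomial(g.left, rng)   # "*": visit both inputs
    rv, rs = sample_monomial(g.right, rng)
    return tuple(sorted(lv + rv)), ls * rs


def expansion(g: Gate) -> list:
    """The expansion E(g) as a list of (monomial, coefficient) pairs."""
    if g.type == "num":
        return [((), g.val)]
    if g.type == "var":
        return [((g.val,), 1)]
    if g.type == "+":
        return expansion(g.left) + expansion(g.right)
    return [(tuple(sorted(m1 + m2)), c1 * c2)
            for m1, c1 in expansion(g.left)
            for m2, c2 in expansion(g.right)]


def exact_distribution(g: Gate) -> dict:
    """Exact output distribution of sample_monomial; call one_pass(g) first."""
    if g.type in ("num", "var"):
        return {sample_monomial(g, None): Fraction(1)}   # leaves use no randomness
    out = {}
    if g.type == "+":
        total = g.lweight + g.rweight
        for child, w in ((g.left, g.lweight), (g.right, g.rweight)):
            for key, p in exact_distribution(child).items():
                out[key] = out.get(key, 0) + p * Fraction(w, total)
        return out
    for (m1, s1), p1 in exact_distribution(g.left).items():
        for (m2, s2), p2 in exact_distribution(g.right).items():
            key = (tuple(sorted(m1 + m2)), s1 * s2)
            out[key] = out.get(key, 0) + p1 * p2
    return out
```

Here exact_distribution follows the same branching rule as sample_monomial but with exact fractions, so comparing it against $|\coef| / \abs{\circuit}(1,\ldots,1)$ summed over the expansion checks both the probability claim and the sign claim on a given circuit. As sketched, one_pass does not memoize shared sub-circuits, so on DAG-shaped circuits it may revisit gates many times; caching by gate identity would avoid that.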
--- a/app_sample-monomial-pseudo-code.tex
+++ b/app_sample-monomial-pseudo-code.tex
@@ -20,9 +20,9 @@
 				\State $\vari{sgn} \gets \vari{sgn} \times \vari{s}$\label{alg:sample-times-product}
 			\EndFor
 			\State $\Return ~(\vari{vars}, \vari{sgn})$
-		\ElsIf{$\circuit.\type = numeric$}\Comment{The leaf is a coefficient}
+		\ElsIf{$\circuit.\type = \tnum$}\Comment{The leaf is a coefficient}
 			%\State $\vari{sgn} \gets \vari{sgn} \times sign(\circuit.\val)$
-			\State $\Return ~\left(\{\}, sign(\circuit.\val)\right)$\label{alg:sample-num-return}
+			\State $\Return ~\left(\{\}, \func{sgn}(\circuit.\val)\right)$\label{alg:sample-num-return}\Comment{$\func{sgn}(\cdot)$ outputs $1$ for \circuit.\val $\geq 1$ and $-1$ for \circuit.\val $\leq -1$}% $\coef\in\inset{-1, 1}$ corresponding to the sign of \circuit.\val.}
 		\ElsIf{$\circuit.\type = \var$}
 			%\State $\vari{vars} \gets \vari{vars} \; \cup \; \{\;\circuit.\val\;\}\label{alg:sample-var-union}$\Comment{Add the variable to the set}
 			\State $\Return~\left(\{\circuit.\val\}, 1\right)	$\label{alg:sample-var-return}
--- a/appendix.tex
+++ b/appendix.tex
@@ -324,13 +324,13 @@ Respectively, these are: \\

 %With \Cref{lem:circ-model-runtime,lem:tlc-is-the-same-as-det} and our upper bound results on \approxq, we now have all the pieces to argue that using our approximation algorithm,  the expected multiplicities of an $\raPlus$ query can be computed in essentially the same runtime as deterministic query processing for the same query, proving claim (iv) of the Introduction.

-\section{Proof of \Cref{cor:cost-model}}
-\begin{proof}
-This follows from \Cref{lem:circuits-model-runtime} (\Cref{sec:circuit-runtime}) and \Cref{cor:approx-algo-const-p} (where the latter is used with $\delta$ being substituted\footnote{Recall that \Cref{cor:approx-algo-const-p} is stated for a single output tuple so to get the required guarantee for all (at most $n^k$) output tuples of $Q$ we get at most $\frac \delta{n^k}$ probability of failure for each output tuple and then just a union bound over all output tuples. } with $\frac \delta{n^k}$).
-\qed
-\end{proof}
+%\section{Proof of \Cref{cor:cost-model}}
+%\begin{proof}
+%This follows from \Cref{lem:circuits-model-runtime} (\Cref{sec:circuit-runtime}) and \Cref{cor:approx-algo-const-p} (where the latter is used with $\delta$ being substituted\footnote{Recall that \Cref{cor:approx-algo-const-p} is stated for a single output tuple so to get the required guarantee for all (at most $n^k$) output tuples of $Q$ we get at most $\frac \delta{n^k}$ probability of failure for each output tuple and then just a union bound over all output tuples. } with $\frac \delta{n^k}$).
+%\qed
+%\end{proof}

-\mypar{Higher Moments}
+\section{Higher Moments}
 %\label{sec:momemts}
 %
 We make a simple observation to conclude the presentation of our results.
--- a/experiments.tex
+++ b/experiments.tex
@@ -1,26 +1,26 @@
 % root: main.tex


-Recall that by definition of $\bi$, a query result cannot be derived by a self-join between non-identical tuples belonging to the same block.  Note, that by \Cref{cor:approx-algo-const-p}, $\gamma$ must be a constant in order for \Cref{alg:mon-sam} to acheive linear time.  We would like to determine experimentally whether queries over $\bi$ instances in practice generate a constant number of cancellations or not.  Such an experiment would ideally use a database instance with queries both considered to be typical representations of what is seen in practice. 
+Recall that by definition of $\abbrBIDB$, a query result cannot be derived by a self-join between non-identical tuples belonging to the same block.  Note that by \Cref{cor:approx-algo-const-p}, $\gamma$ must be a constant in order for \Cref{alg:mon-sam} to achieve linear time.  We would like to determine experimentally whether queries over $\abbrBIDB$ instances in practice generate a constant number of cancellations or not.  Ideally, such an experiment would use a database instance and queries that are both representative of what is seen in practice.

 We ran our experiments on Windows 10 (WSL) with an Intel Core i7 2.40GHz processor and 16GB of RAM.  All experiments used the PostgreSQL 13.0 database system.

 For the data, we used the MayBMS data generator~\cite{pdbench} to randomly generate uncertain versions of TPC-H tables.  The queries computed over the database instance are $\query_1$, $\query_2$, and $\query_3$ from~\cite{Antova_fastand}, all of which are modified versions of TPC-H queries Q3, Q6, and Q7 with all aggregations dropped.

-As written, the queries disallow $\bi$ cross terms.  We first ran all queries, noting the result size for each.  Next the queries were rewritten so as not to filter out the cross terms.  The comparison of the sizes of both result sets should then suggest in one way or another whether or not there exist many cross terms in practice.  As seen, the experimental query results contain little to no cancelling terms.  \Cref{fig:experiment-bidb-cancel} shows the result sizes of the queries, where column CF is the result size when all cross terms are filtered out, column CI shows the number of output tuples when the cancelled tuples are included in the result,  and the last column is the value of $\gamma$.  The experiments show $\gamma$ to be in a range between $[0, 0.1]\%$, indicating that only a negligible or constant (compare the result sizes of $\query_1 < \query_2$ and their respective $\gamma$ values) amount of tuples are cancelled in practice when running queries over a typical $\bi$ instance.  Interestingly, only one of the three queries had tuples that violated the $\bi$ constraint.
+As written, the queries disallow $\abbrBIDB$ cross terms.  We first ran all queries, noting the result size for each.  Next, the queries were rewritten so as not to filter out the cross terms.  Comparing the sizes of the two result sets then indicates whether many cross terms arise in practice.  \Cref{fig:experiment-bidb-cancel} shows the result sizes of the queries, where column CF is the result size when all cross terms are filtered out, column CI is the number of output tuples when the cancelled tuples are included in the result, and the last column is the value of $\gamma$.  The query results contain little to no cancelling terms: $\gamma$ lies in the range $[0, 0.1]\%$, indicating that only a negligible or constant number of tuples is cancelled in practice when running queries over a typical \abbrBIDB instance (note that $\query_2$ has a larger result than $\query_1$ yet a smaller $\gamma$).  Interestingly, only one of the three queries had tuples that violated the \abbrBIDB constraint.

 To conclude, the results in \Cref{fig:experiment-bidb-cancel} show experimentally that $\gamma$ is negligible in practice for BIDB queries.  We also observe that (i) tuple presence is independent across blocks, so the corresponding probabilities (and hence $\prob_0$) are independent of the number of blocks, and (ii) \bis model uncertain attributes, so block size (and hence $\gamma$) is a function of the ``messiness'' of a dataset, rather than its size.
-Thus, we expect the corollary to hold in general.
+Thus, we expect \Cref{cor:approx-algo-const-p} to hold in general.

 \begin{figure}[ht]
 		\begin{tabular}{ c | c c c}\label{tbl:cancel}
 			Query & CF & CI & $\gamma$\\
 			\hline
-			 $\poly_1$ & $46,714$ & $46,768$ & $0.1\%$\\
-			 $\poly_2$ & $179.917$ & $179,917$ & $0\%$\\
-			 $\poly_3$ & $11,535$ & $11,535$ & $0\%$\\
+			 $\query_1$ & $46,714$ & $46,768$ & $0.1\%$\\
+			 $\query_2$ & $179,917$ & $179,917$ & $0\%$\\
+			 $\query_3$ & $11,535$ & $11,535$ & $0\%$\\
 		\end{tabular}
-	\caption{Number of Cancellations for Queries Over $\bi$.}
+	\caption{Number of Cancellations for Queries Over $\abbrBIDB$.}
 	\label{fig:experiment-bidb-cancel}
 \end{figure}

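The $\gamma$ column can be recomputed from the CF and CI columns of the table. The Python sketch below assumes, since the definition is not restated in this section, that $\gamma$ is the fraction of CI output tuples removed when cross terms are filtered out, i.e. $(\mathrm{CI} - \mathrm{CF})/\mathrm{CI}$; the names rows and gamma_percent are illustrative only.

```python
# Assumption: gamma = (CI - CF) / CI, reported as a percentage.
rows = {
    "Q1": (46_714, 46_768),     # (CF, CI) as reported in the table
    "Q2": (179_917, 179_917),
    "Q3": (11_535, 11_535),
}


def gamma_percent(cf: int, ci: int) -> float:
    """Percentage of CI output tuples that disappear once cross terms are filtered out."""
    return 100.0 * (ci - cf) / ci


for name, (cf, ci) in rows.items():
    print(f"{name}: gamma = {gamma_percent(cf, ci):.1f}%")
```

This prints 0.1% for Q1 (54 of 46,768 tuples) and 0.0% for Q2 and Q3, matching the table after rounding; using CF as the denominator instead gives the same rounded values.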
--- a/rebuttal.tex
+++ b/rebuttal.tex
@@ -9,7 +9,7 @@ We use this section to document the changes that have been made since our prior
 \subsection{Meta Review}
 \RCOMMENT{Problem definition not stated rigorously nor motivated. Discussion needed on the standard PDB approach vs your approach.}
 We rewrote \Cref{sec:intro} to specifically address this concern.  The opening paragraph precisely and formally states the query evaluation problem in \abbrBPDB\xplural.  We use a series of problem statements to clearly define the problem we are addressing as it relates to the query evaluation problem.  
- We  made the concrete problem statements more precise by more clearly formalizing $\qruntime{Q, \dbbase}$ and stating our runtime objectives relative to it \AR{Dangling notation or Cref not being able to figure out `Sections'?}(\Cref{prob:informal,prob:big-o-joint-steps,prob:intro-stmt})
+ We made the concrete problem statements more precise by formalizing $\qruntime{Q, \dbbase}$ more clearly and stating our runtime objectives relative to it (\Cref{prob:informal},~\ref{prob:big-o-joint-steps},~\ref{prob:intro-stmt}).
 %Notably, explicit discussion of provenance polynomials is limited to the proofs in the appendices.

 We have included a discussion of the standard approach; see, e.g., the paragraph \textbf{Relationship to Set-Probabilistic Query Evaluation} on page 4.