
% root: main.tex


Recall that, by the definition of $\abbrBIDB$, a query result cannot be derived by a self-join between non-identical tuples belonging to the same block.  Note that by \Cref{cor:approx-algo-const-p}, $\gamma$ must be a constant in order for \Cref{alg:mon-sam} to achieve linear time.  We would like to determine experimentally whether queries over $\abbrBIDB$ instances generate a constant number of cancellations in practice.  Ideally, such an experiment uses a database instance and queries that are both typical of what is seen in practice.

We ran our experiments under the Windows Subsystem for Linux (WSL) on Windows 10, on a machine with an Intel Core i7 2.40GHz processor and 16GB of RAM.  All experiments used the PostgreSQL 13.0 database system.

For the data, we used the MayBMS data generator~\cite{pdbench} to randomly generate uncertain versions of the TPC-H tables.  The queries computed over the database instance are $\query_1$, $\query_2$, and $\query_3$ from~\cite{Antova_fastand}, all of which are modified versions of TPC-H queries $\query_3$, $\query_6$, and $\query_7$ in which all aggregations have been dropped.

As written, the queries disallow $\abbrBIDB$ cross terms.  We first ran all queries and recorded the result size of each.  We then rewrote the queries so that they no longer filter out the cross terms.  Comparing the sizes of the two result sets indicates how many cross terms arise in practice.  \Cref{fig:experiment-bidb-cancel} shows the result sizes of the queries: column CF is the result size when all cross terms are filtered out, column CI is the number of output tuples when the cancelled tuples are included in the result, and the last column is the value of $\gamma$.  As the figure shows, the query results contain few to no cancelling terms.  The experiments show $\gamma$ to lie in the range $[0\%, 0.1\%]$, indicating that only a negligible or constant number of tuples are cancelled in practice when running queries over a typical \abbrBIDB instance (compare the result sizes of $\query_1$ and $\query_2$, of which $\query_1$ is the smaller, with their respective $\gamma$ values).  Interestingly, only one of the three queries had tuples that violated the \abbrBIDB constraint.
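
As a sanity check on the reported values, the $\gamma$ entry for $\query_1$ in \Cref{fig:experiment-bidb-cancel} can be reproduced from the two result sizes.  The following is only a sketch: it assumes that $\gamma$ is read as the fraction of the CI output tuples that are cross terms, which is our interpretation of the column for illustration rather than a restatement of the formal definition.
\[
	\gamma_{\query_1} \approx \frac{\mathrm{CI} - \mathrm{CF}}{\mathrm{CI}} = \frac{46{,}768 - 46{,}714}{46{,}768} = \frac{54}{46{,}768} \approx 0.12\%,
\]
which is consistent with the $0.1\%$ reported.  For $\query_2$ and $\query_3$ the CF and CI counts coincide, so the numerator is zero and $\gamma = 0\%$.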

To conclude, the results in \Cref{fig:experiment-bidb-cancel} show experimentally that $\gamma$ is negligible in practice for queries over $\abbrBIDB$ instances.  We also observe that (i) tuple presence is independent across blocks, so the corresponding probabilities (and hence $\prob_0$) are independent of the number of blocks, and (ii) \bis model uncertain attributes, so block size (and hence $\gamma$) is a function of the ``messiness'' of a dataset, rather than its size.
Thus, we expect \Cref{cor:approx-algo-const-p} to hold in general.

\begin{figure}[ht]
		\begin{tabular}{ c | c c c}\label{tbl:cancel}
			Query & CF & CI & $\gamma$\\
			\hline
			 $\query_1$ & $46,714$ & $46,768$ & $0.1\%$\\
			 $\query_2$ & $179,917$ & $179,917$ & $0\%$\\
			 $\query_3$ & $11,535$ & $11,535$ & $0\%$\\
		\end{tabular}
	\caption{Number of Cancellations for Queries Over $\abbrBIDB$.}
	\label{fig:experiment-bidb-cancel}
\end{figure}