Finishing CBO slides

Oliver Kennedy 2018-03-07 00:45:02 -05:00
parent 60a5169c55
commit 889440b39c


@@ -777,6 +777,179 @@
</section>
</section>
<section>
<section>
<h3>(Some) Estimation Techniques</h3>
<dl style="font-size: 80%">
<dt style="color: grey;">Guess Randomly</dt>
<dd style="color: grey;">Rules of thumb if you have no other options...</dd>
<dt style="color: grey;">Uniform Prior</dt>
<dd style="color: grey;">Use basic statistics to make a very rough guess.</dd>
<dt style="color: grey;">Sampling / History</dt>
<dd style="color: grey;">Small, Quick Sampling Runs (or prior executions of the query).</dd>
<dt style="color: grey;">Histograms</dt>
<dd style="color: grey;">Using more detailed statistics for improved guesses.</dd>
<dt style="color: blue;">Constraints</dt>
<dd style="color: blue;">Using rules about the data for improved guesses.</dd>
</dl>
</section>
<section>
<h3>Key / Unique Constraints</h3>
<pre style="margin-top: 50px;"><code class="sql">
CREATE TABLE R (
A int,
B int UNIQUE,
...
PRIMARY KEY (A)
);
</code></pre>
<p style="margin-top: 50px;">
No duplicate values in the column.
$$\texttt{COUNT(DISTINCT A)} = \texttt{COUNT(*)}$$
</p>
</section>
<section>
<h3>Foreign Key Constraints</h3>
<pre style="margin-top: 50px;"><code class="sql">
CREATE TABLE S (
B int,
...
FOREIGN KEY (B) REFERENCES R(B)
);
</code></pre>
<p style="margin-top: 50px;">
All values in the column appear in another table.
$$\pi_{attrs(S)}\left(S \bowtie_B R\right) \subseteq S$$
</p>
</section>
<section>
<h3>Functional Dependencies</h3>
<pre style="margin-top: 50px;"><code class="sql">
Not expressible in SQL
</code></pre>
<p style="margin-top: 50px;">
One set of columns uniquely determines another.<br/>
$\pi_{A}(\delta(\pi_{A, B}(R)))$ has no duplicates and...
$$\pi_{attrs(R)-B}(R) \bowtie_A \delta(\pi_{A, B}(R)) = R$$
</p>
</section>
<section>
<h3>Constraints</h3>
<h4>The Good</h4>
<ul>
<li style="font-size: 70%" class="fragment">Sanity check on your data: Inconsistent data triggers failures.</li>
<li style="font-size: 70%" class="fragment">More opportunities for query optimization.</li>
</ul>
<h4 style="margin-top: 50px;" class="fragment">The Not-So-Good</h4>
<ul>
<li style="font-size: 70%" class="fragment">Validating constraints whenever data changes is (usually) expensive.</li>
<li style="font-size: 70%" class="fragment">Inconsistent data triggers failures.</li>
</ul>
</section>
<section>
<h3>Foreign Key Constraints</h3>
<p style="margin-top: 50px;">Foreign keys are like pointers. What happens with broken pointers?</p>
</section>
<section>
<h3>Foreign Key Enforcement</h3>
<p>Foreign keys are declared with referential actions <code>ON UPDATE [X]</code> and <code>ON DELETE [X]</code> that fire when a referenced row changes. Depending on what [X] is, the constraint is enforced differently:</p>
<dl style="font-size: 80%">
<dt><code>CASCADE</code></dt>
<dd>Delete or update referencing rows as needed to avoid invalid foreign keys.</dd>
<dt><code>NO ACTION</code></dt>
<dd>Abort any transaction that ends with an invalid foreign key reference.</dd>
<dt><code>SET NULL</code></dt>
<dd>Automatically replace any invalid foreign key references with NULL.</dd>
</dl>
</section>
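<section>
<p style="font-size: 80%">A sketch of how these actions attach to a declaration, reusing the example tables <code>R</code> and <code>S</code> from the earlier slides. The particular pairing of actions is chosen only for illustration.</p>
<pre style="margin-top: 50px;"><code class="sql">
CREATE TABLE S (
B int,
FOREIGN KEY (B) REFERENCES R(B)
ON DELETE CASCADE
ON UPDATE NO ACTION
);
</code></pre>
<p style="font-size: 80%">Under this declaration, deleting a row of <code>R</code> also deletes the rows of <code>S</code> that point to it, while an update to <code>R.B</code> that would strand a row of <code>S</code> is rejected.</p>
</section>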
<section>
<p style="font-weight: bold;">
<code>CASCADE</code> and <code>NO ACTION</code> ensure that the data never has broken pointers, so
</p>
$$\pi_{attrs(S)}\left(S \bowtie_B R\right) = S$$
</section>
<section>
<h3>Functional Dependencies</h3>
<p style="margin-top: 50px;"><b>A generalization of keys:</b> One set of attributes uniquely identifies another.</p>
<ul>
<li>SS# uniquely identifies Name.</li>
<li>Employee uniquely identifies Manager.</li>
<li>Order number uniquely identifies Customer Address.</li>
</ul>
<p class="fragment">Two rows with the same As must have the same Bs.</p>
<p class="fragment" style="font-size: 80%">(but can still have identical Bs for two different As)</p>
</section>
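<section>
<p style="font-size: 80%">A small worked check of the rule on the previous slide, using a made-up relation $R(A, B, C)$ in which $A$ is assumed to determine $B$ (the rows are illustrative only):</p>
$$R = \begin{array}{ccc} A & B & C \\ \hline 1 & x & p \\ 1 & x & q \\ 2 & y & p \end{array}$$
<p style="font-size: 80%">Both rows with $A = 1$ agree on $B = x$, so the dependency holds. Splitting $B$ out into its own relation loses nothing:</p>
$$\delta(\pi_{A, B}(R)) = \{(1, x), (2, y)\} \qquad \pi_{A, C}(R) = \{(1, p), (1, q), (2, p)\}$$
<p style="font-size: 80%">Joining the two pieces on $A$ gives back exactly the three rows of $R$.</p>
</section>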
<section>
<h3>Normal Forms</h3>
<p style="margin-top: 50px;">"All functional dependencies should be keys."</p>
<p class="fragment">(Otherwise you want two separate relations)</p>
<p class="fragment">(for more details, see CSE 560)</p>
</section>
<section>
$$P(A = B) = \min\left(\frac{1}{\texttt{COUNT}(\texttt{DISTINCT } A)}, \frac{1}{\texttt{COUNT}(\texttt{DISTINCT } B)}\right)$$
</section>
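<section>
<p style="font-size: 80%">A quick worked instance of the estimate on the previous slide, with hypothetical counts picked for easy arithmetic: suppose $\texttt{COUNT}(\texttt{DISTINCT } A) = 100$ and $\texttt{COUNT}(\texttt{DISTINCT } B) = 20$.</p>
$$P(A = B) = \min\left(\frac{1}{100}, \frac{1}{20}\right) = \frac{1}{100}$$
<p style="font-size: 80%">If $A$ comes from a relation $R$ with $|R| = 1000$ and $B$ from a relation $S$ with $|S| = 500$ (again, assumed sizes), the estimated size of $\sigma_{A = B}(R \times S)$ is</p>
$$1000 \times 500 \times \frac{1}{100} = 5000$$
</section>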
<section>
<p>
$$R \bowtie_{R.A = S.B} S = \sigma_{R.A = S.B}(R \times S)$$
(and $S.B$ is a foreign key referencing $R.A$)
</p>
<p class="fragment" style="margin-top: 30px; font-size: 80%">
The (foreign) key constraint gives us two things...
$$\texttt{COUNT}(\texttt{DISTINCT } A) \approx \texttt{COUNT}(\texttt{DISTINCT } B)$$
<span style="font-size: 60%; font-weight: bold; margin: 0px;">and</span>
$$\texttt{COUNT}(\texttt{DISTINCT } A) = |R|$$
</p>
<p class="fragment" style="margin-top: 30px; font-size: 80%">
Based on the first property, the total number of rows is roughly...
$$|R| \times |S| \times \frac{1}{\texttt{COUNT}(\texttt{DISTINCT } A)}$$
</p>
<p class="fragment" style="margin-top: 30px; font-size: 80%">
Then based on the second property...
$$ = |R| \times |S| \times \frac{1}{|R|} = |S|$$
</p>
<p class="fragment" style="margin-top: 30px; font-size: 50%">(Statistics/Histograms will give you the same outcome... but constraints can be easier to propagate)</p>
</section>
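<section>
<p style="font-size: 80%">Plugging hypothetical sizes into the derivation on the previous slide (the numbers are assumptions, not measurements): $|R| = 1000$, $|S| = 50000$, and $A$ is a key of $R$, so $\texttt{COUNT}(\texttt{DISTINCT } A) = 1000$.</p>
$$|R \bowtie_{R.A = S.B} S| \approx 1000 \times 50000 \times \frac{1}{1000} = 50000 = |S|$$
<p style="font-size: 80%">Assuming the foreign key is enforced, each row of $S$ finds exactly one partner in $R$, so the estimate depends only on $|S|$ and not on how large $R$ is.</p>
</section>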
</section>
<section>
<p><b>Next class:</b> Exam Review</p>
</section>
</div></div>
<script src="../reveal.js-3.6.0/js/reveal.js"></script>