From 889440b39c186b0ddbfda1b0567528f14d6dded5 Mon Sep 17 00:00:00 2001
From: Oliver Kennedy
Date: Wed, 7 Mar 2018 00:45:02 -0500
Subject: [PATCH] Finishing CBO slides

---
 .../2018-03-05-CostBasedOptimization2.html | 173 ++++++++++++++++++
 1 file changed, 173 insertions(+)

diff --git a/slides/cse4562sp2018/2018-03-05-CostBasedOptimization2.html b/slides/cse4562sp2018/2018-03-05-CostBasedOptimization2.html
index 4d59a1d0..13ca507c 100644
--- a/slides/cse4562sp2018/2018-03-05-CostBasedOptimization2.html
+++ b/slides/cse4562sp2018/2018-03-05-CostBasedOptimization2.html
@@ -777,6 +777,179 @@
+
+
+

(Some) Estimation Techniques

+ +
+
Guess Randomly
+
Rules of thumb if you have no other options...
+ +
Uniform Prior
+
Use basic statistics to make a very rough guess.
+ +
Sampling / History
+
Small, Quick Sampling Runs (or prior executions of the query).
+ +
Histograms
+
Using more detailed statistics for improved guesses.
+ +
Constraints
+
Using rules about the data for improved guesses.
+
+
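A minimal sketch of the sampling technique listed above: estimate a predicate's selectivity from a small random sample, then scale it to the full table. The function name `estimate_selectivity`, the sample size, and the toy table are assumptions made for illustration only.

```python
import random

def estimate_selectivity(rows, predicate, sample_size=100, seed=0):
    """Estimate the fraction of rows satisfying predicate from a random sample."""
    rng = random.Random(seed)
    # Use the whole table when it is no larger than the requested sample.
    sample = rows if len(rows) <= sample_size else rng.sample(rows, sample_size)
    if not sample:
        return 0.0
    matches = sum(1 for row in sample if predicate(row))
    return matches / len(sample)

# Hypothetical table: 10,000 rows with attribute A cycling through 0..9.
table = [{"A": i % 10} for i in range(10_000)]
selectivity = estimate_selectivity(table, lambda r: r["A"] == 3)
estimated_rows = selectivity * len(table)
print(f"estimated selectivity: {selectivity:.3f}, estimated rows: {estimated_rows:.0f}")
```

The estimate only looks at 100 rows, so it is cheap but noisy; larger samples trade time for accuracy.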
+ +
+

Key / Unique Constraints

+

+            CREATE TABLE R ( 
+              A int,
+              B int UNIQUE,
+              ... 
+              PRIMARY KEY (A)
+            );
+          
+

+ No duplicate values in the column. + $$\texttt{COUNT(DISTINCT A)} = \texttt{COUNT(*)}$$ +
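The equality above can be checked on a toy instance. This is a sketch using Python's built-in `sqlite3` module; the table contents are made up, and the elided columns of `R` are simply left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A int PRIMARY KEY, B int UNIQUE)")
# Hypothetical rows, chosen only for illustration.
conn.executemany("INSERT INTO R VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])

distinct_a, distinct_b, total = conn.execute(
    "SELECT COUNT(DISTINCT A), COUNT(DISTINCT B), COUNT(*) FROM R"
).fetchone()

# A second row with A = 1 is rejected, so the equality cannot be broken.
try:
    conn.execute("INSERT INTO R VALUES (1, 40)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(distinct_a, distinct_b, total, duplicate_rejected)
```

Because the constraint guarantees the distinct count, an optimizer can use `COUNT(*)` directly without scanning the column.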

+
+ +
+

Foreign Key Constraints

+

+            CREATE TABLE S ( 
+              B int,
+              ... 
+              FOREIGN KEY (B) REFERENCES R(B)
+            );
+          
+

+ All values in the column appear in another table. + $$\pi_{attrs(S)}\left(S \bowtie_B R\right) \subseteq S$$ +
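To see why the relationship is a subset rather than an equality when nothing enforces the constraint, the sketch below deliberately loads one dangling reference and counts which rows of `S` survive the join. It uses Python's `sqlite3`; the extra column `C` and all data values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A int PRIMARY KEY, B int UNIQUE)")
# No FOREIGN KEY is declared here on purpose, so bad data can get in.
conn.execute("CREATE TABLE S (B int, C text)")
conn.executemany("INSERT INTO R VALUES (?, ?)", [(1, 10), (2, 20)])
conn.executemany(
    "INSERT INTO S VALUES (?, ?)",
    [(10, "x"), (10, "y"), (20, "z"), (99, "dangling")],
)

size_s = conn.execute("SELECT COUNT(*) FROM S").fetchone()[0]
# Rows of S that find a partner in R.
joined = conn.execute("SELECT COUNT(*) FROM S JOIN R ON S.B = R.B").fetchone()[0]
# Rows of S whose B value does not appear in R (broken references).
dangling = conn.execute(
    "SELECT COUNT(*) FROM S WHERE B IS NOT NULL AND B NOT IN (SELECT B FROM R)"
).fetchone()[0]

print(size_s, joined, dangling)
```

Every row of `S` either joins with exactly one row of `R` or is dangling, so the join can never produce more than `|S|` rows.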

+
+ +
+

Functional Dependencies

+ +

+            Not expressible in SQL
+          
+ +

+ One set of columns uniquely determines another.
+ $\pi_{A}(\delta(\pi_{A, B}(R)))$ has no duplicates and... + $$\pi_{attrs(R)-B}(R) \bowtie_A \delta(\pi_{A, B}(R)) = R$$ +
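A small sketch of the decomposition idea, assuming a hypothetical relation `R(A, B, C)` in which `A` determines `B`: split off the distinct `(A, B)` pairs, drop `B` from the rest, and join the two pieces back on `A`. The helper `as_bag` and the sample rows are illustrative, not course code.

```python
from collections import Counter

# Hypothetical relation R(A, B, C) in which A -> B holds.
R = [
    {"A": 1, "B": "x", "C": 100},
    {"A": 1, "B": "x", "C": 200},
    {"A": 2, "B": "y", "C": 300},
]

def as_bag(rows):
    """Represent a relation as a multiset of sorted (attribute, value) tuples."""
    return Counter(tuple(sorted(row.items())) for row in rows)

# delta(pi_{A,B}(R)): the distinct (A, B) pairs.
lookup = {(row["A"], row["B"]) for row in R}
# pi_A of that set has no duplicates exactly when A -> B holds.
a_values = [a for a, _ in lookup]
fd_holds = len(a_values) == len(set(a_values))

# R with the dependent column B projected away.
rest = [{k: v for k, v in row.items() if k != "B"} for row in R]

# Join the two pieces back together on A.
rejoined = [dict(row, B=b) for row in rest for a, b in lookup if row["A"] == a]
lossless = as_bag(rejoined) == as_bag(R)
print(fd_holds, lossless)
```

If the dependency did not hold, some `A` value would appear twice in `lookup` and the rejoin would produce extra rows.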

+
+ +
+

Constraints

+ +

The Good

+
+
  • Sanity check on your data: Inconsistent data triggers failures.
  • More opportunities for query optimization.
+
+ +

The Not-So Good

+
+
  • Validating constraints whenever data changes is (usually) expensive.
  • Inconsistent data triggers failures.
+
+ +
+ +
+

Foreign Key Constraints

+ +

Foreign keys are like pointers. What happens with broken pointers?

+
+ +
+

Foreign Key Enforcement

+ +

Foreign keys are declared with referential actions ON UPDATE [X] and ON DELETE [X]. Depending on what [X] is, the constraint is enforced differently:

+ +
+
CASCADE
+
Delete or update the referencing rows as needed to avoid invalid foreign keys.
+ +
NO ACTION
+
Abort any transaction that ends with an invalid foreign key reference.
+ +
SET NULL
+
Automatically replace any invalid foreign key references with NULL.
+
+
+ +
+

+ CASCADE and NO ACTION ensure that the data never has broken pointers, so +

+ $$\pi_{attrs(S)}\left(S \bowtie_B R\right) = S$$ +
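The three behaviours can be observed directly in SQLite. This is a sketch, not the course's reference implementation: the helper `delete_parent`, the column `C`, and the rows are hypothetical, and SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

def delete_parent(action):
    """Delete a referenced row of R under the given ON DELETE action; return S afterwards."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
    conn.execute("CREATE TABLE R (A int PRIMARY KEY)")
    conn.execute(f"CREATE TABLE S (B int REFERENCES R(A) ON DELETE {action}, C text)")
    conn.execute("INSERT INTO R VALUES (1), (2)")
    conn.execute("INSERT INTO S VALUES (1, 'x'), (2, 'y')")
    conn.commit()
    try:
        conn.execute("DELETE FROM R WHERE A = 1")
        conn.commit()
    except sqlite3.IntegrityError:
        # The delete would leave S.B = 1 pointing at nothing, so it is refused.
        conn.rollback()
        return "rejected"
    return conn.execute("SELECT B, C FROM S ORDER BY C").fetchall()

for action in ("CASCADE", "SET NULL", "NO ACTION"):
    print(action, delete_parent(action))
```

Under `CASCADE` and `NO ACTION` every remaining row of `S` still joins with `R`; under `SET NULL` the row survives but its `NULL` reference joins with nothing.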
+ +
+

Functional Dependencies

+ +

A generalization of keys: One set of attributes uniquely identifies another.

+ +
+
  • SS# uniquely identifies Name.
  • Employee uniquely identifies Manager.
  • Order number uniquely identifies Customer Address.
+
+ +

Two rows with the same As must have the same Bs

+

(but can still have identical Bs for two different As)
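The rule above translates into a simple check: remember the B values seen for each combination of A values and report the first conflict. The function `violates_fd` and the employee rows are hypothetical, written only to illustrate the definition.

```python
def violates_fd(rows, lhs, rhs):
    """Return the first pair of rows that agree on lhs but differ on rhs, else None."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[b] for b in rhs)
        if key in seen and seen[key][0] != value:
            return seen[key][1], row
        seen.setdefault(key, (value, row))
    return None

# Made-up data: two employees may share a manager without breaking the dependency.
employees = [
    {"Employee": "Ann", "Manager": "Cat"},
    {"Employee": "Bob", "Manager": "Cat"},
    {"Employee": "Ann", "Manager": "Cat"},
]
consistent = violates_fd(employees, ["Employee"], ["Manager"])

# Giving Bob a second, different manager breaks Employee -> Manager.
conflict = violates_fd(
    employees + [{"Employee": "Bob", "Manager": "Dan"}], ["Employee"], ["Manager"]
)
print(consistent, conflict)
```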

+
+ +
+

Normal Forms

+

"All functional dependencies should be keys."

+

(Otherwise you want two separate relations)

+

(for more details, see CSE 560)

+
+ +
+ + $$P(A = B) = \min\left(\frac{1}{\texttt{COUNT}(\texttt{DISTINCT } A)}, \frac{1}{\texttt{COUNT}(\texttt{DISTINCT } B)}\right)$$ + +
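As a sketch of how this formula feeds a cardinality estimate, the functions below compute the selectivity and multiply it by the size of the cross product. The function names and the example statistics are assumptions chosen for illustration.

```python
def equijoin_selectivity(distinct_a, distinct_b):
    """P(A = B) under the uniform assumption: min(1/distinct_a, 1/distinct_b)."""
    if distinct_a <= 0 or distinct_b <= 0:
        raise ValueError("distinct counts must be positive")
    return min(1 / distinct_a, 1 / distinct_b)

def estimate_join_size(size_r, size_s, distinct_a, distinct_b):
    """Estimated join size: |R| * |S| * P(A = B)."""
    return size_r * size_s * equijoin_selectivity(distinct_a, distinct_b)

# Hypothetical statistics: R has 1,000 rows and 1,000 distinct A values;
# S has 50,000 rows and 800 distinct B values.
estimate = estimate_join_size(size_r=1_000, size_s=50_000, distinct_a=1_000, distinct_b=800)
print(round(estimate))
```

Taking the minimum of the two fractions is the same as dividing by the larger distinct count, which is the more conservative choice.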
+
+ +

+ $$R \bowtie_{R.A = S.B} S = \sigma_{R.A = S.B}(R \times S)$$ + (and $S.B$ is a foreign key referencing $R.A$) +

+ +

+ The (foreign) key constraint gives us two things... + $$\texttt{COUNT}(\texttt{DISTINCT } A) \approx \texttt{COUNT}(\texttt{DISTINCT } B)$$ + and + $$\texttt{COUNT}(\texttt{DISTINCT } A) = |R|$$ +

+ +

+ Based on the first property, the total number of rows is roughly... + $$|R| \times |S| \times \frac{1}{\texttt{COUNT}(\texttt{DISTINCT } A)}$$ +

+ +

+ Then based on the second property... + $$ = |R| \times |S| \times \frac{1}{|R|} = |S|$$ +

+ +

(Statistics/Histograms will give you the same outcome... but constraints can be easier to propagate)
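The derivation above can be sanity-checked on synthetic data. This sketch builds a hypothetical `R` whose `A` is a key and an `S` whose `B` always refers to some `R.A`, then compares the actual join size with the estimate. All sizes and the random seed are arbitrary.

```python
import random

rng = random.Random(42)
R = [{"A": a} for a in range(100)]                     # A is a key: all values distinct
S = [{"B": rng.randrange(100)} for _ in range(1_000)]  # every B refers to some R.A

# Hash join on R.A = S.B; each S row matches exactly one R row.
r_by_a = {row["A"]: row for row in R}
joined = [(r_by_a[s["B"]], s) for s in S if s["B"] in r_by_a]

distinct_a = len({row["A"] for row in R})
estimate = len(R) * len(S) / distinct_a  # |R| * |S| * 1/COUNT(DISTINCT A)
print(len(joined), estimate)
```

Here the estimate is exact because the key and foreign key constraints hold by construction; with real statistics it would only be approximate.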

+
+
+ +
+

Next class: Exam Review

+
+