From 7f013655b43a1c3512cec53c66d5edf9bdffc76e Mon Sep 17 00:00:00 2001 From: Oliver Date: Sun, 7 Mar 2021 22:39:50 -0500 Subject: [PATCH] slides --- src/teaching/cse-562/2021sp/index.erb | 12 +- .../slide/2021-02-18-QueryAlgorithms.erb | 17 + .../2021sp/slide/2021-03-04-Indexing2.html | 2 +- .../2021sp/slide/2021-03-09-CostOpt1.erb | 555 ++++++++++++++++ .../slide/2021-03-09/EstimationXKCD.png | Bin 0 -> 22646 bytes .../2021sp/slide/2021-03-11-CostOpt2.erb | 606 ++++++++++++++++++ .../2021sp/slide/2021-03-11/JoinIssue.svg | 279 ++++++++ 7 files changed, 1466 insertions(+), 5 deletions(-) create mode 100644 src/teaching/cse-562/2021sp/slide/2021-03-09-CostOpt1.erb create mode 100644 src/teaching/cse-562/2021sp/slide/2021-03-09/EstimationXKCD.png create mode 100644 src/teaching/cse-562/2021sp/slide/2021-03-11-CostOpt2.erb create mode 100644 src/teaching/cse-562/2021sp/slide/2021-03-11/JoinIssue.svg diff --git a/src/teaching/cse-562/2021sp/index.erb b/src/teaching/cse-562/2021sp/index.erb index f06aac31..87ec2bbe 100644 --- a/src/teaching/cse-562/2021sp/index.erb +++ b/src/teaching/cse-562/2021sp/index.erb @@ -51,12 +51,16 @@ schedule: materials: slides: slide/2021-03-04-Indexing2.html - date: "Mar. 9" - topic: "Spark's Optimizer + Checkpoint 2" - due: "Checkpoint 1" - - date: "Mar. 11" topic: "Cost-Based Optimization" - - date: "Mar. 16" + due: "Checkpoint 1" + materials: + slides: slide/2021-03-09-CostOpt1.html + - date: "Mar. 11" topic: "Cost-Based Optimization (contd.)" + materials: + slides: slide/2021-03-11-CostOpt2.html + - date: "Mar. 16" + topic: "Spark's Optimizer + Checkpoint 2" - date: "Mar. 18" topic: "Distributed Queries: Challenges + Partitioning" - date: "Mar. 
23" diff --git a/src/teaching/cse-562/2021sp/slide/2021-02-18-QueryAlgorithms.erb b/src/teaching/cse-562/2021sp/slide/2021-02-18-QueryAlgorithms.erb index b38ce150..d7be63f8 100644 --- a/src/teaching/cse-562/2021sp/slide/2021-02-18-QueryAlgorithms.erb +++ b/src/teaching/cse-562/2021sp/slide/2021-02-18-QueryAlgorithms.erb @@ -17,6 +17,23 @@ textbook: "Ch. 15.1-15.5, 16.7" More similar examples with Union and Cross would also help. Might help to tighten up the time spent a little too. I had to cut out before introducing Sort-Merge Joins + + +------- + 2021 by OK: + + Applied changes above. Things went better. + + Looking at costs in terms of the "overhead" of each operator is proving to be *really* + hard for the students to grasp. I suspect it might be easier for the students to grasp + a recursive definition. + + e.g., cost(\pi(R)) = cost(R) + + This would, among other things, make the (B)NLJ cost a lot easier to specify. + + I made these changes already to 03-09-CostOpt1, so they should probably be backported here next time I teach the class. + -->
diff --git a/src/teaching/cse-562/2021sp/slide/2021-03-04-Indexing2.html b/src/teaching/cse-562/2021sp/slide/2021-03-04-Indexing2.html index c8208fd1..8b63c9c8 100644 --- a/src/teaching/cse-562/2021sp/slide/2021-03-04-Indexing2.html +++ b/src/teaching/cse-562/2021sp/slide/2021-03-04-Indexing2.html @@ -1,5 +1,5 @@ --- -template: templates/cse4562_2019_slides.erb +template: templates/cse4562_2021_slides.erb title: "Indexing (Part 2) and Views" date: March 4, 2021 textbook: "Papers and Ch. 8.1-8.2" diff --git a/src/teaching/cse-562/2021sp/slide/2021-03-09-CostOpt1.erb b/src/teaching/cse-562/2021sp/slide/2021-03-09-CostOpt1.erb new file mode 100644 index 00000000..a47dbf22 --- /dev/null +++ b/src/teaching/cse-562/2021sp/slide/2021-03-09-CostOpt1.erb @@ -0,0 +1,555 @@ +--- +template: templates/cse4562_2021_slides.erb +title: "Cost-Based Optimization" +date: March 9, 2021 +textbook: Ch. 16 +--- + + + +
+
+

General Query Optimizers

+
    +
  1. Apply blind heuristics (e.g., push down selections)
  2. Enumerate all possible execution plans (or a reasonable subset) by varying:
    • Join/Union Evaluation Order (commutativity, associativity, distributivity)
    • Algorithms for Joins, Aggregates, Sort, Distinct, and others
    • Data Access Paths
  3. Estimate the cost of each execution plan
  4. Pick the execution plan with the lowest cost
+
+
+ +
+
+

Idea 1: Run each plan

+
+ +
+ + © Paramount Pictures +
+ +
+

If we can't get the exact cost of a plan, what can we do?

+
+ +
+

Idea 2: Run each plan on a small sample of the data.

+

Idea 3: Analytically estimate the cost of a plan.

+
+ +
+

Plan Cost

+
+
+
CPU Time
+
How much time is spent processing.
+
+ +
+
# of IOs
+
How many random reads + writes go to disk.
+
+ +
+
Memory Required
+
How much memory do you need.
+
+
+
+ +
+ + Randall Munroe (cc-by-nc) +
+ +
+

Remember the Real Goals

+
    +
  1. Accurately rank the plans.
  2. Don't spend more time optimizing than you get back.
  3. Don't pick a plan that uses more memory than you have.
+
+
+ + + +
+
+

Accounting

+

Figure out the IO cost of the entire* subtree.

+ +

Only count the amount of memory added by each operator.

+ + +

* Different from earlier in the semester.

+ +
+ +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
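<table>
<tr><th>Operation</th><th>RA</th><th>Total IOs (#pages)</th><th>Memory (#tuples)</th></tr>
<tr><td>Table Scan</td><td>$R$</td><td>$\frac{|R|}{\mathcal P}$</td><td>$O(1)$</td></tr>
<tr><td>Projection</td><td>$\pi(R)$</td><td>$\textbf{io}(R)$</td><td>$O(1)$</td></tr>
<tr><td>Selection</td><td>$\sigma(R)$</td><td>$\textbf{io}(R)$</td><td>$O(1)$</td></tr>
<tr><td>Union</td><td>$R \uplus S$</td><td>$\textbf{io}(R) + \textbf{io}(S)$</td><td>$O(1)$</td></tr>
<tr><td>Sort (In-Mem)</td><td>$\tau(R)$</td><td>$\textbf{io}(R)$</td><td>$O(|R|)$</td></tr>
<tr><td>Sort (On-Disk)</td><td>$\tau(R)$</td><td>$\frac{2 \cdot \lfloor \log_{\mathcal B}(|R|) \rfloor}{\mathcal P} + \textbf{io}(R)$</td><td>$O(\mathcal B)$</td></tr>
<tr><td>(B+Tree) Index Scan</td><td>$Index(R, c)$</td><td>$\log_{\mathcal I}(|R|) + \frac{|\sigma_c(R)|}{\mathcal P}$</td><td>$O(1)$</td></tr>
<tr><td>(Hash) Index Scan</td><td>$Index(R, c)$</td><td>$1$</td><td>$O(1)$</td></tr>
</table>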
+ +
    +
  1. Tuples per Page ($\mathcal P$) – Normally defined per-schema
  2. Size of $R$ ($|R|$)
  3. Pages of Buffer ($\mathcal B$)
  4. Keys per Index Page ($\mathcal I$)
+
+
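The table above defines the IO cost recursively: each operator's total is expressed in terms of $\textbf{io}$ of its inputs. A minimal Python sketch of that recursion follows. The class names and the values chosen for $\mathcal P$ and $\mathcal I$ are assumptions made for illustration; they do not come from the course codebase.

```python
import math
from dataclasses import dataclass

# Assumed example parameters (illustrative values only).
TUPLES_PER_PAGE = 100       # P
KEYS_PER_INDEX_PAGE = 100   # I


@dataclass
class TableScan:
    size: int  # |R|, in tuples


@dataclass
class Select:
    child: object


@dataclass
class Project:
    child: object


@dataclass
class Union:
    left: object
    right: object


@dataclass
class BTreeIndexScan:
    size: int     # |R|, in tuples
    matches: int  # |sigma_c(R)|, tuples satisfying the condition


def io(plan):
    """Total page IOs for the entire subtree rooted at `plan`."""
    if isinstance(plan, TableScan):
        return plan.size / TUPLES_PER_PAGE
    if isinstance(plan, (Select, Project)):
        # Streaming operators add no IO of their own.
        return io(plan.child)
    if isinstance(plan, Union):
        return io(plan.left) + io(plan.right)
    if isinstance(plan, BTreeIndexScan):
        descent = math.log(plan.size, KEYS_PER_INDEX_PAGE)
        return descent + plan.matches / TUPLES_PER_PAGE
    raise TypeError(f"unsupported operator: {plan!r}")


plan = Union(Select(TableScan(1000)), Project(TableScan(500)))
print(io(plan))  # 10 pages + 5 pages = 15.0
```

Because the definition is recursive, adding an operator only requires one more case that combines the costs of its children.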
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
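<table>
<tr><th>Operation</th><th>RA</th><th>Total IOs (#pages)</th><th>Mem (#tuples)</th></tr>
<tr><td>Nested Loop Join (Buffer $S$ in mem)</td><td>$R \times_{mem} S$</td><td>$\textbf{io}(R)+\textbf{io}(S)$</td><td>$O(|S|)$</td></tr>
<tr><td>Block NLJ (Buffer $S$ on disk)</td><td>$R \times_{disk} S$</td><td>$\frac{|R|}{\mathcal B} \cdot \frac{|S|}{\mathcal P} + \textbf{io}(R) + \textbf{io}(S)$</td><td>$O(1)$</td></tr>
<tr><td>Block NLJ (Recompute $S$)</td><td>$R \times_{redo} S$</td><td>$\textbf{io}(R) + \frac{|R|}{\mathcal B} \cdot \textbf{io}(S)$</td><td>$O(1)$</td></tr>
<tr><td>1-Pass Hash Join</td><td>$R \bowtie_{1PH, c} S$</td><td>$\textbf{io}(R) + \textbf{io}(S)$</td><td>$O(|S|)$</td></tr>
<tr><td>2-Pass Hash Join</td><td>$R \bowtie_{2PH, c} S$</td><td>$\frac{2|R| + 2|S|}{\mathcal P} + \textbf{io}(R) + \textbf{io}(S)$</td><td>$O(1)$</td></tr>
<tr><td>Sort-Merge Join</td><td>$R \bowtie_{SM, c} S$</td><td>[Sort]</td><td>[Sort]</td></tr>
<tr><td>(Tree) Index NLJ</td><td>$R \bowtie_{INL, c} S$</td><td>$|R| \cdot (\log_{\mathcal I}(|S|) + \frac{|\sigma_c(S)|}{\mathcal P})$</td><td>$O(1)$</td></tr>
<tr><td>(Hash) Index NLJ</td><td>$R \bowtie_{INL, c} S$</td><td>$|R| \cdot 1$</td><td>$O(1)$</td></tr>
<tr><td>(In-Mem) Aggregate</td><td>$\gamma_A(R)$</td><td>$\textbf{io}(R)$</td><td>$adom(A)$</td></tr>
<tr><td>(Sort/Merge) Aggregate</td><td>$\gamma_A(R)$</td><td>[Sort]</td><td>[Sort]</td></tr>
</table>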
+ +
    +
  1. Tuples per Page ($\mathcal P$) – Normally defined per-schema
  2. Size of $R$ ($|R|$)
  3. Pages of Buffer ($\mathcal B$)
  4. Keys per Index Page ($\mathcal I$)
  5. Number of distinct values of $A$ ($adom(A)$)
+
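The join formulas above can be turned into a small ranking exercise: compute the IOs and memory of each algorithm, drop the ones that exceed the memory budget, and keep the cheapest. The sketch below is illustrative only. The `Candidate` type, the helper names, and the values of $\mathcal P$ and $\mathcal B$ are assumptions, and a memory of `1` stands in for $O(1)$.

```python
from dataclasses import dataclass

# Assumed example parameters (illustrative values only).
TUPLES_PER_PAGE = 100  # P
BUFFER_PAGES = 50      # B


@dataclass
class Candidate:
    name: str
    ios: float     # total page IOs, following the table's formulas
    memory: float  # tuples held in memory; 1 stands in for O(1)


def join_candidates(r_size, s_size, io_r, io_s):
    """Cost each join algorithm for inputs of |R| and |S| tuples."""
    p, b = TUPLES_PER_PAGE, BUFFER_PAGES
    return [
        Candidate("NLJ (S in memory)", io_r + io_s, s_size),
        Candidate("Block NLJ (S on disk)",
                  (r_size / b) * (s_size / p) + io_r + io_s, 1),
        Candidate("Block NLJ (recompute S)",
                  io_r + (r_size / b) * io_s, 1),
        Candidate("1-pass hash join", io_r + io_s, s_size),
        Candidate("2-pass hash join",
                  (2 * r_size + 2 * s_size) / p + io_r + io_s, 1),
    ]


def cheapest(candidates, memory_budget):
    """Lowest-IO candidate that fits in `memory_budget` tuples."""
    feasible = [c for c in candidates if c.memory <= memory_budget]
    return min(feasible, key=lambda c: c.ios)


# R and S are plain table scans here, so io(R) = |R| / P and io(S) = |S| / P.
candidates = join_candidates(r_size=10_000, s_size=2_000, io_r=100.0, io_s=20.0)
for c in candidates:
    print(f"{c.name}: {c.ios} IOs, {c.memory} tuples of memory")
print(cheapest(candidates, memory_budget=1_000).name)
```

With a budget of 1,000 tuples, the two algorithms that buffer all of $S$ are ruled out, so the choice falls to the cheapest of the $O(1)$-memory options.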
+ +
+ + + + + + + + + + + + + + + + + + + + + + +
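<table>
<tr><th>Symbol</th><th>Parameter</th><th>Type</th></tr>
<tr><td>$\mathcal P$</td><td>Tuples Per Page</td><td>Fixed ($\frac{|\text{page}|}{|\text{tuple}|}$)</td></tr>
<tr><td>$|R|$</td><td>Size of $R$</td><td>Precomputed$^*$ ($|R|$)</td></tr>
<tr><td>$\mathcal B$</td><td>Pages of Buffer</td><td>Configurable Parameter</td></tr>
<tr><td>$\mathcal I$</td><td>Keys per Index Page</td><td>Fixed ($\frac{|\text{page}|}{|\text{key+pointer}|}$)</td></tr>
<tr><td>$adom(A)$</td><td>Number of distinct values of $A$</td><td>Precomputed$^*$ ($|\delta_A(R)|$)</td></tr>
</table>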
+

* unless $R$ is a query

+
+ +
+ + +
+
+

Estimating IOs requires Estimating $|Q(R)|$, $|\delta_A(Q(R))|$

+
+ +
+

Cardinality Estimation

+

Unlike estimating IOs, cardinality estimation doesn't care about the algorithm, so we'll just be working with raw RA.

+ +

Also unlike estimating IOs, we care about the cardinality $|Q(R)|$ of the query as a whole, rather than the contribution of each individual operator.

+
+ +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
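<table>
<tr><th>Operator</th><th>RA</th><th>Estimated Size</th></tr>
<tr><td>Table</td><td>$R$</td><td>$|R|$</td></tr>
<tr><td>Projection</td><td>$\pi(Q)$</td><td>$|Q|$</td></tr>
<tr><td>Union</td><td>$Q_1 \uplus Q_2$</td><td>$|Q_1| + |Q_2|$</td></tr>
<tr><td>Cross Product</td><td>$Q_1 \times Q_2$</td><td>$|Q_1| \times |Q_2|$</td></tr>
<tr><td>Sort</td><td>$\tau(Q)$</td><td>$|Q|$</td></tr>
<tr><td>Limit</td><td>$\texttt{LIMIT}_N(Q)$</td><td>$N$</td></tr>
<tr><td>Selection</td><td>$\sigma_c(Q)$</td><td>$|Q| \times \texttt{SEL}(c, Q)$</td></tr>
<tr><td>Join</td><td>$Q_1 \bowtie_c Q_2$</td><td>$|Q_1| \times |Q_2| \times \texttt{SEL}(c, Q_1\times Q_2)$</td></tr>
<tr><td>Distinct</td><td>$\delta_A(Q)$</td><td>$\texttt{UNIQ}(A, Q)$</td></tr>
<tr><td>Aggregate</td><td>$\gamma_{A, B \leftarrow \Sigma}(Q)$</td><td>$\texttt{UNIQ}(A, Q)$</td></tr>
</table>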
+ +
    +
  • $\texttt{SEL}(c, Q)$: Selectivity of $c$ on $Q$, or $\frac{|\sigma_c(Q)|}{|Q|}$
  • $\texttt{UNIQ}(A, Q)$: # of distinct values of $A$ in $Q$.
+
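The size rules above apply bottom-up over the relational algebra tree. A short sketch of that recursion follows, assuming that $\texttt{SEL}$ and $\texttt{UNIQ}$ have already been estimated and are stored on the nodes. The node classes and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Table:
    size: int  # |R|


@dataclass
class Project:
    child: object


@dataclass
class Union:
    left: object
    right: object


@dataclass
class Select:
    child: object
    sel: float  # SEL(c, Q), assumed to be estimated elsewhere


@dataclass
class Join:
    left: object
    right: object
    sel: float  # SEL(c, Q1 x Q2); use 1.0 for a plain cross product


@dataclass
class Limit:
    child: object
    n: int


@dataclass
class Distinct:
    child: object
    uniq: int  # UNIQ(A, Q), assumed to be estimated elsewhere


def card(q):
    """Estimated number of tuples produced by the RA expression `q`."""
    if isinstance(q, Table):
        return q.size
    if isinstance(q, Project):
        return card(q.child)
    if isinstance(q, Union):
        return card(q.left) + card(q.right)
    if isinstance(q, Select):
        return card(q.child) * q.sel
    if isinstance(q, Join):
        return card(q.left) * card(q.right) * q.sel
    if isinstance(q, Limit):
        return q.n
    if isinstance(q, Distinct):
        return q.uniq
    raise TypeError(f"unsupported operator: {q!r}")


# Hypothetical example: a key/foreign-key join followed by a 10% filter.
students = Table(1_000)
enrolled = Table(5_000)
query = Select(Join(students, enrolled, sel=1 / 1_000), sel=0.1)
print(card(query))  # roughly 500 tuples
```

Note that no physical algorithm appears anywhere in this function: the estimate depends only on the logical operators.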
+ +
+

Cardinality Estimation

+

(The Hard Parts)

+ +
+
$\sigma_c(Q)$ (Cardinality Estimation)
+
How many tuples will a condition $c$ allow to pass?
+ +
$\delta_A(Q)$ (Distinct Values Estimation)
+
How many distinct values of attribute(s) $A$ exist?
+
+
+
+ +
+
+

Idea 1: Assume each selection filters down to 10% of the data.

+
+ +
+ +

no... really!

+ © Paramount Pictures +
+ +
+

... there are problems

+
+

Inconsistent estimation

+

$|\sigma_{c_1}(\sigma_{c_2}(R))| \neq |\sigma_{c_1 \wedge c_2}(R)|$

+
+
+

Too consistent estimation

+

$|\sigma_{id = 1}(\texttt{STUDENTS})| = |\sigma_{residence = 'NY'}(\texttt{STUDENTS})|$

+
+

... but remember that all we need is to rank plans.

+
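The inconsistency is easy to reproduce. In the sketch below, the flat 10% guess is applied once per selection operator, so two equivalent plans get different estimates. The function name and the table size are assumptions chosen for illustration.

```python
from fractions import Fraction

# The flat guess: every selection keeps 10% of its input.
DEFAULT_SELECTIVITY = Fraction(1, 10)


def estimate_size(table_size, num_selection_operators):
    """Estimated output size after stacking selection operators on a table."""
    return table_size * DEFAULT_SELECTIVITY ** num_selection_operators


R = 10_000  # assumed |R|

nested = estimate_size(R, 2)  # sigma_c1(sigma_c2(R)): two operators
merged = estimate_size(R, 1)  # sigma_{c1 AND c2}(R): one operator

print(nested)  # 100
print(merged)  # 1000
```

Both expressions return exactly the same tuples, yet the estimates differ by a factor of ten. The "too consistent" problem is the mirror image: `estimate_size(R, 1)` is the same number whether the condition is a key lookup or a filter on a common value.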
+ +
+

Many major databases (Oracle, Postgres, Teradata, etc...) use something like the 10% rule if they have nothing better.

+ + +

(The specific % varies by DBMS.)

+ +

(Teradata uses 10% for the first AND clause,
cut by another 75% for every subsequent clause)

+
+ +
+

(Some) Estimation Techniques

+ +
+
+
The 10% rule
+
Rules of thumb if you have no other options...
+
+ +
+
Uniform Prior
+
Use basic statistics to make a very rough guess.
+
+ +
+
Sampling / History
+
Small, Quick Sampling Runs (or prior executions of the query).
+
+ +
+
Histograms
+
Using more detailed statistics for improved guesses.
+
+ +
+
Constraints
+
Using rules about the data for improved guesses.
+
+
+
+
+ + + +
+ +
+

Uniform Prior

+ +

We assume that for $\sigma_c(Q)$ or $\delta_A(Q)$...

+
    +
  1. Basic statistics are known about $Q$:
    • COUNT(*)
    • COUNT(DISTINCT A) (for each A)
    • MIN(A), MAX(A) (for each numeric A)
  2. Attribute values are uniformly distributed.
  3. No inter-attribute correlations.
+

+ If necessary statistics aren't available (point 1), fall back to the 10% rule. +

+

+ If statistical assumptions (points 2, 3) aren't perfectly true, we'll still likely be getting a better estimate than the 10% rule. +

+
+ +
+

COUNT(DISTINCT A)

+

$\texttt{UNIQ}(A, \pi_{A, \ldots}(R)) = \texttt{UNIQ}(A, R)$

+

$\texttt{UNIQ}(A, \sigma(R)) \approx \texttt{UNIQ}(A, R)$

+

$\texttt{UNIQ}(A, R \times S) = \texttt{UNIQ}(A, R)$ or $\texttt{UNIQ}(A, S)$

+

$$max(\texttt{UNIQ}(A, R), \texttt{UNIQ}(A, S)) \leq\\ \texttt{UNIQ}(A, R \uplus S)\\ \leq \texttt{UNIQ}(A, R) + \texttt{UNIQ}(A, S)$$

+
+ +
+

MIN(A), MAX(A)

+

$min_A(\pi_{A, \ldots}(R)) = min_A(R)$

+

$min_A(\sigma(R)) \approx min_A(R)$

+

$min_A(R \times S) = min_A(R)$ or $min_A(S)$

+

$min_A(R \uplus S) = min(min_A(R), min_A(S))$

+
+ +
+

Estimating $\delta_A(Q)$ requires only COUNT(DISTINCT A)

+
+ +
+

Estimating Selectivity

+ +

Selectivity is a probability ($\texttt{SEL}(c, Q) = P(c)$)

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
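<table>
<tr><td>$P(A = x_1)$</td><td>$=$</td><td>$\frac{1}{\texttt{COUNT(DISTINCT A)}}$</td></tr>
<tr><td>$P(A \in (x_1, x_2, \ldots, x_N))$</td><td>$=$</td><td>$\frac{N}{\texttt{COUNT(DISTINCT A)}}$</td></tr>
<tr><td>$P(A \leq x_1)$</td><td>$=$</td><td>$\frac{x_1 - \texttt{MIN(A)}}{\texttt{MAX(A)} - \texttt{MIN(A)}}$</td></tr>
<tr><td>$P(x_1 \leq A \leq x_2)$</td><td>$=$</td><td>$\frac{x_2 - x_1}{\texttt{MAX(A)} - \texttt{MIN(A)}}$</td></tr>
<tr><td>$P(A = B)$</td><td>$=$</td><td>$\textbf{min}\left( \frac{1}{\texttt{COUNT(DISTINCT A)}}, \frac{1}{\texttt{COUNT(DISTINCT B)}} \right)$</td></tr>
<tr><td>$P(c_1 \wedge c_2)$</td><td>$=$</td><td>$P(c_1) \cdot P(c_2)$</td></tr>
<tr><td>$P(c_1 \vee c_2)$</td><td>$=$</td><td>$1 - (1 - P(c_1)) \cdot (1 - P(c_2))$</td></tr>
</table>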
+ +

(With constants $x_1$, $x_2$, ...)

+
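These rules translate almost directly into code. The sketch below assumes a hypothetical `ColumnStats` record holding the three basic statistics for one attribute. The clamping of results into $[0, 1]$ is an addition for robustness and is not part of the formulas above.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    distinct: int    # COUNT(DISTINCT A)
    minimum: float   # MIN(A)
    maximum: float   # MAX(A)


def clamp(p):
    """Keep an estimate inside [0, 1] (an addition, not in the formulas)."""
    return max(0.0, min(1.0, p))


def sel_eq_const(a):
    """P(A = x1)"""
    return 1 / a.distinct


def sel_in(a, n):
    """P(A IN (x1, ..., xN))"""
    return clamp(n / a.distinct)


def sel_le(a, x):
    """P(A <= x1)"""
    return clamp((x - a.minimum) / (a.maximum - a.minimum))


def sel_between(a, x1, x2):
    """P(x1 <= A <= x2)"""
    return clamp((x2 - x1) / (a.maximum - a.minimum))


def sel_eq_attr(a, b):
    """P(A = B)"""
    return min(1 / a.distinct, 1 / b.distinct)


def sel_and(p1, p2):
    """P(c1 AND c2), assuming c1 and c2 are independent."""
    return p1 * p2


def sel_or(p1, p2):
    """P(c1 OR c2), assuming c1 and c2 are independent."""
    return 1 - (1 - p1) * (1 - p2)


# Hypothetical statistics for an attribute A with values in [0, 100].
a = ColumnStats(distinct=50, minimum=0, maximum=100)
print(sel_eq_const(a))                         # 1 / 50
print(sel_le(a, 25))                           # a quarter of the range
print(sel_and(sel_le(a, 25), sel_eq_const(a)))
```

The conjunction and disjunction rules are where the "no inter-attribute correlations" assumption is used.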
+ +
+

Limitations

+ +
+
+
Don't always have statistics for $Q$
+
For example, $\pi_{A \leftarrow (B \cdot C)}(R)$
+
+ +
+
Don't always have clear rules for $c$
+
For example, $\sigma_{\texttt{FitsModel}(A, B, C)}(R)$
+
+ +
+
Attribute values are not always uniformly distributed.
+
For example, $|\sigma_{SPC\_COMMON = 'pin\ oak'}(T)|$ vs $|\sigma_{SPC\_COMMON = 'honeylocust'}(T)|$
+
+ +
+
Attribute values are sometimes correlated.
+
For example, $\sigma_{(stump < 5) \wedge (diam > 3)}(T)$
+
+
+

...but handles most usage patterns

+
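The correlation limitation can be seen on a toy table. The rows below are synthetic and perfectly correlated (both columns always hold the same value); they are not taken from the dataset used in class. The sketch compares the independence-based estimate with the true selectivity.

```python
# Synthetic rows of (stump, diam); both columns always hold the same value.
rows = [(i, i) for i in range(10)]


def selectivity(rows, predicate):
    """Fraction of rows satisfying `predicate`."""
    return sum(1 for r in rows if predicate(r)) / len(rows)


p_stump = selectivity(rows, lambda r: r[0] < 5)  # 5 of 10 rows
p_diam = selectivity(rows, lambda r: r[1] > 3)   # 6 of 10 rows

# Uniform-prior estimate: treat the two conditions as independent.
estimated = p_stump * p_diam

# Ground truth: evaluate the conjunction directly (only the row with value 4).
actual = selectivity(rows, lambda r: r[0] < 5 and r[1] > 3)

print(f"estimated={estimated:.2f} actual={actual:.2f}")
```

Here the estimate overshoots by a factor of three. With other correlations it can just as easily undershoot.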
+ +
+ ... more next class! +
+ +
\ No newline at end of file diff --git a/src/teaching/cse-562/2021sp/slide/2021-03-09/EstimationXKCD.png b/src/teaching/cse-562/2021sp/slide/2021-03-09/EstimationXKCD.png new file mode 100644 index 0000000000000000000000000000000000000000..f31d42c6f84105970d809f7be0bd8d3ad81ff938 Binary files /dev/null and b/src/teaching/cse-562/2021sp/slide/2021-03-09/EstimationXKCD.png differ
+
+

Remember the Real Goals

+
    +
  1. Accurately rank the plans.
  2. Don't spend more time optimizing than you get back.
  3. Don't pick a plan that uses more memory than you have.
+
+ +
+

Accounting

+

Figure out the cost of each individual operator.

+

Only count the number of IOs added by each operator.

+
+ +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Operation | RA | Total IOs (#pages) | Memory (#tuples)
Table Scan | $R$ | $\frac{|R|}{\mathcal P}$ | $O(1)$
Projection | $\pi(R)$ | $\textbf{io}(R)$ | $O(1)$
Selection | $\sigma(R)$ | $\textbf{io}(R)$ | $O(1)$
Union | $R \uplus S$ | $\textbf{io}(R) + \textbf{io}(S)$ | $O(1)$
Sort (In-Mem) | $\tau(R)$ | $\textbf{io}(R)$ | $O(|R|)$
Sort (On-Disk) | $\tau(R)$ | $\frac{2 \cdot \lfloor \log_{\mathcal B}(|R|) \rfloor}{\mathcal P} + \textbf{io}(R)$ | $O(\mathcal B)$
(B+Tree) Index Scan | $Index(R, c)$ | $\log_{\mathcal I}(|R|) + \frac{|\sigma_c(R)|}{\mathcal P}$ | $O(1)$
(Hash) Index Scan | $Index(R, c)$ | $1$ | $O(1)$
+ +
    +
  1. Tuples per Page ($\mathcal P$) – Normally defined per-schema
  2. Size of $R$ ($|R|$)
  3. Pages of Buffer ($\mathcal B$)
  4. Keys per Index Page ($\mathcal I$)
+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Operation | RA | Total IOs (#pages) | Mem (#tuples)
Nested Loop Join (Buffer $S$ in mem) | $R \times_{mem} S$ | $\textbf{io}(R)+\textbf{io}(S)$ | $O(|S|)$
Block NLJ (Buffer $S$ on disk) | $R \times_{disk} S$ | $\frac{|R|}{\mathcal B} \cdot \frac{|S|}{\mathcal P} + \textbf{io}(R) + \textbf{io}(S)$ | $O(1)$
Block NLJ (Recompute $S$) | $R \times_{redo} S$ | $\textbf{io}(R) + \frac{|R|}{\mathcal B} \cdot \textbf{io}(S)$ | $O(1)$
1-Pass Hash Join | $R \bowtie_{1PH, c} S$ | $\textbf{io}(R) + \textbf{io}(S)$ | $O(|S|)$
2-Pass Hash Join | $R \bowtie_{2PH, c} S$ | $\frac{2|R| + 2|S|}{\mathcal P} + \textbf{io}(R) + \textbf{io}(S)$ | $O(1)$
Sort-Merge Join | $R \bowtie_{SM, c} S$ | [Sort] | [Sort]
(Tree) Index NLJ | $R \bowtie_{INL, c} S$ | $|R| \cdot (\log_{\mathcal I}(|S|) + \frac{|\sigma_c(S)|}{\mathcal P})$ | $O(1)$
(Hash) Index NLJ | $R \bowtie_{INL, c} S$ | $|R| \cdot 1$ | $O(1)$
(In-Mem) Aggregate | $\gamma_A(R)$ | $0$ | $adom(A)$
(Sort/Merge) Aggregate | $\gamma_A(R)$ | [Sort] | [Sort]
+ +
    +
  1. Tuples per Page ($\mathcal P$) – Normally defined per-schema
  2. Size of $R$ ($|R|$)
  3. Pages of Buffer ($\mathcal B$)
  4. Keys per Index Page ($\mathcal I$)
  5. Number of distinct values of $A$ ($adom(A)$)
+
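A minimal Python sketch of how the formulas in the two tables compose: each function takes the IO cost of its input(s) and returns the total for the operator. The parameter values chosen for $\mathcal P$, $\mathcal B$, and $\mathcal I$, and all function names, are illustrative assumptions rather than anything defined in the slides.

```python
import math

# Assumed, illustrative parameter values (not taken from the slides).
P = 100   # tuples per page
B = 50    # pages of buffer
I = 200   # keys per index page


def io_scan(n_tuples):
    """Table scan: |R| / P pages."""
    return n_tuples / P


def io_select(io_child):
    """Selection (and projection) pass tuples through: total IOs = io(child)."""
    return io_child


def io_bnlj_redo(n_r, io_r, io_s):
    """Block NLJ (recompute S): io(R) + (|R| / B) * io(S)."""
    return io_r + (n_r / B) * io_s


def io_hash_join_2pass(n_r, n_s, io_r, io_s):
    """2-pass hash join: (2|R| + 2|S|) / P + io(R) + io(S)."""
    return (2 * n_r + 2 * n_s) / P + io_r + io_s


def io_btree_index_scan(n_r, n_matching):
    """B+Tree index scan: log_I(|R|) + |sigma_c(R)| / P."""
    return math.log(n_r, I) + n_matching / P
```

For example, with $|R| = 10000$ and $|S| = 5000$, `io_bnlj_redo(10000, io_scan(10000), io_scan(5000))` charges one scan of $R$ plus 200 rescans of $S$, while the 2-pass hash join pays a fixed partitioning cost on top of the two scans.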
+
+ +
+
+

Cardinality Estimation

+

(The Hard Parts)

+ +
+
$\sigma_c(Q)$ (Cardinality Estimation)
+
How many tuples will a condition $c$ allow to pass?
+ +
$\delta_A(Q)$ (Distinct Values Estimation)
+
How many distinct values of attribute(s) $A$ exist?
+
+
+ +
+

Remember the Real Goals

+
    +
  1. Accurately rank the plans.
  2. Don't spend more time optimizing than you get back.
+
+ +
+

(Some) Estimation Techniques

+ +
+
+
Guess Randomly
+
Rules of thumb if you have no other options...
+
+ +
+
Uniform Prior
+
Use basic statistics to make a very rough guess.
+
+ +
+
Sampling / History
+
Small, Quick Sampling Runs (or prior executions of the query).
+
+ +
+
Histograms
+
Using more detailed statistics for improved guesses.
+
+ +
+
Constraints
+
Using rules about the data for improved guesses.
+
+
+
+
+ + +
+
+

(Some) Estimation Techniques

+ +
+
Guess Randomly
+
Rules of thumb if you have no other options...
+ +
Uniform Prior
+
Use basic statistics to make a very rough guess.
+ +
Sampling / History
+
Small, Quick Sampling Runs (or prior executions of the query).
+ +
Histograms
+
Using more detailed statistics for improved guesses.
+ +
Constraints
+
Using rules about the data for improved guesses.
+
+
+ +
+

Idea 1: Pick 100 tuples at random from each input table.

+
+ +
+ +
+ +
+

The Birthday Paradox

+ +

+ Assume: $\texttt{UNIQ}(A, R) = \texttt{UNIQ}(A, S) = N$ +

+ +

+ It takes $O(\sqrt{N})$ samples from both $R$ and $S$
to get even one match. +

+
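A rough back-of-the-envelope sketch of why small samples rarely join, assuming join values are drawn uniformly and independently from $N$ distinct values on both sides (a simplifying assumption; the function names are hypothetical). With $k$ samples per side there are $k^2$ candidate pairs, each matching with probability $\frac{1}{N}$, so the expected number of matches is $\frac{k^2}{N}$.

```python
import math


def expected_matches(k, n_distinct):
    """Expected matching pairs when k tuples are sampled from each side.

    Assumes uniform, independent draws over n_distinct join values, so each
    of the k * k pairs matches with probability 1 / n_distinct.
    """
    return k * k / n_distinct


def samples_for_one_expected_match(n_distinct):
    """Smallest k with k * k >= n_distinct, i.e. roughly sqrt(N) samples."""
    return math.ceil(math.sqrt(n_distinct))
```

Under these assumptions, sampling 100 tuples per table with a million distinct join values yields about 0.01 expected matches, and roughly 1000 samples per side are needed before one match is expected.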
+ +
+

To be resumed later in the term when we talk about AQP

+
+ +
+

How DBs Do It: Instrument queries while running them.

    +
  • The first time you run a query it might be slow.
  • The second, third, fourth, etc... times it'll be fast.

+
+
+ +
+ +
+

(Some) Estimation Techniques

+ +
+
Guess Randomly
+
Rules of thumb if you have no other options...
+ +
Uniform Prior
+
Use basic statistics to make a very rough guess.
+ +
Sampling / History
+
Small, Quick Sampling Runs (or prior executions of the query).
+ +
Histograms
+
Using more detailed statistics for improved guesses.
+ +
Constraints
+
Using rules about the data for improved guesses.
+
+
+ +
+

Limitations of Uniform Prior

+ +
+
+
Don't always have statistics for $Q$
+
For example, $\pi_{A \leftarrow (B \times C)}(R)$
+
+ +
+
Don't always have clear rules for $c$
+
For example, $\sigma_{\texttt{FitsModel}(A, B, C)}(R)$
+
+ +
+
Attribute values are not always uniformly distributed.
+
For example, $|\sigma_{SPC\_COMMON = 'pin\ oak'}(T)|$ vs $|\sigma_{SPC\_COMMON = 'honeylocust'}(T)|$
+
+ +
+
Attribute values are sometimes correlated.
+
For example, $\sigma_{(stump < 5) \wedge (diam > 3)}(T)$
+
+ +
+
+ +
+

+ Ideal Case: You have some + $$f(x) = \left(\texttt{SELECT COUNT(*) WHERE A = x}\right)$$ + (and similarly for the other aggregates) +

+

+ Slightly Less Ideal Case: You have some + $$f(x) \approx \left(\texttt{SELECT COUNT(*) WHERE A = x}\right)$$ +

+
+ +
+

If this sounds like CDF-based indexing... you're right!

+ +

... but we're not going to talk about NNs today

+
+
+ +
+
+

+ Simpler/Faster Idea: Break $f(x)$ into chunks +

+
+ +
+

Example Data

+ + + + + + + + + + +
Name YearsEmployed Role
'Alice' 3 1
'Bob' 2 2
'Carol' 3 1
'Dave' 1 3
'Eve' 2 2
'Fred' 2 3
'Gwen' 4 1
'Harry' 2 3
+
+ +
+

Histograms

+ + + + + + +
YearsEmployedCOUNT
1 1
2 4
3 2
4 1
+ + + + + + +
COUNT(DISTINCT YearsEmployed) $= 4$
MIN(YearsEmployed) $= 1$
MAX(YearsEmployed) $= 4$
COUNT(*) WHERE YearsEmployed = 2 $= 4$
+
+ +
+

Histograms

+ + + + +
YearsEmployedCOUNT
1-2 5
3-4 3
+ + + + + + +
COUNT(DISTINCT YearsEmployed) $= 4$
MIN(YearsEmployed) $= 1$
MAX(YearsEmployed) $= 4$
COUNT(*) WHERE YearsEmployed = 2 $= \frac{5}{2}$
+
+ +
+

The Extreme Case

+ + + +
YearsEmployedCOUNT
1-4 8
+ + + + + + +
COUNT(DISTINCT YearsEmployed) $= 4$
MIN(YearsEmployed) $= 1$
MAX(YearsEmployed) $= 4$
COUNT(*) WHERE YearsEmployed = 2 $= \frac{8}{4}$
+
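The three histograms above can be reproduced from the YearsEmployed column of the example data. This is a small Python sketch; the helper names are hypothetical, and it assumes integer values grouped into equal-width buckets anchored at the column minimum.

```python
from collections import Counter

# YearsEmployed column from the "Example Data" slide, in row order.
YEARS_EMPLOYED = [3, 2, 3, 1, 2, 2, 4, 2]


def exact_histogram(values):
    """One bucket per distinct value."""
    return dict(sorted(Counter(values).items()))


def equi_width_histogram(values, width):
    """Group integer values into (low, high) buckets of the given width."""
    lo = min(values)
    counts = Counter((v - lo) // width for v in values)
    return {
        (lo + i * width, lo + (i + 1) * width - 1): c
        for i, c in sorted(counts.items())
    }
```

Coarser buckets need less space but force the estimator to spread each bucket's count evenly over its values, which is where the $\frac{5}{2}$ and $\frac{8}{4}$ estimates come from.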
+ +
+

More Example Data

+ + + + + + + + + + +
Value COUNT
1-10 20
11-20 0
21-30 15
31-40 30
41-50 22
51-60 63
61-70 10
71-80 10
+ + + + + + + + + + + +
SELECT … WHERE A = 33 $= \frac{1}{40-30}\cdot 30 = 3$
SELECT … WHERE A > 33 $= \frac{40-33}{40-30}\cdot 30+22$ $\;\;\;+63+10+10$ $= 126$
+
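The two estimates above can be sketched in Python as follows. The bucket list is copied from the slide; the function names are hypothetical, and the sketch assumes integer values spread uniformly within each bucket.

```python
# (low, high, count) with inclusive integer bounds, from "More Example Data".
BUCKETS = [
    (1, 10, 20), (11, 20, 0), (21, 30, 15), (31, 40, 30),
    (41, 50, 22), (51, 60, 63), (61, 70, 10), (71, 80, 10),
]


def estimate_eq(buckets, x):
    """Estimate COUNT(*) WHERE A = x: the bucket count divided by its width."""
    for low, high, count in buckets:
        if low <= x <= high:
            return count / (high - low + 1)
    return 0.0


def estimate_gt(buckets, x):
    """Estimate COUNT(*) WHERE A > x.

    Buckets entirely above x count in full; the bucket containing x
    contributes the fraction of its values that lie above x.
    """
    total = 0.0
    for low, high, count in buckets:
        if x < low:
            total += count
        elif low <= x <= high:
            total += count * (high - x) / (high - low + 1)
    return total
```

With these buckets, `estimate_eq(BUCKETS, 33)` gives 3 and `estimate_gt(BUCKETS, 33)` gives 126, matching the slide.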
+
+ +
+
+

(Some) Estimation Techniques

+ +
+
Guess Randomly
+
Rules of thumb if you have no other options...
+ +
Uniform Prior
+
Use basic statistics to make a very rough guess.
+ +
Sampling / History
+
Small, Quick Sampling Runs (or prior executions of the query).
+ +
Histograms
+
Using more detailed statistics for improved guesses.
+ +
Constraints
+
Using rules about the data for improved guesses.
+
+
+ +
+

Key / Unique Constraints

+

+      CREATE TABLE R (
+        A int,
+        B int UNIQUE,
+        ...
+        PRIMARY KEY (A)
+      );
+    
+

+ No duplicate values in the column. + $$\texttt{COUNT(DISTINCT A)} = \texttt{COUNT(*)}$$ +

+
+ +
+

Foreign Key Constraints

+

+      CREATE TABLE S (
+        B int,
+        ...
+        FOREIGN KEY (B) REFERENCES R(B)
+      );
+    
+

+ All values in the column appear in another table. + $$\pi_{attrs(S)}\left(S \bowtie_B R\right) \subseteq S$$ +

+
+ +
+

Functional Dependencies

+ +

+      Not expressible in SQL
+    
+ +

+ One set of columns uniquely determines another.
+ $\pi_{A}(\delta(\pi_{A, B}(R)))$ has no duplicates and... + $$\pi_{attrs(R)-A}(R) \bowtie_A \delta(\pi_{A, B}(R)) = R$$ +

+
+ +
+

Constraints

+ +

The Good

+
    +
  • Sanity check on your data: Inconsistent data triggers failures.
  • More opportunities for query optimization.
+ +

The Not-So-Good

+
    +
  • Validating constraints whenever data changes is (usually) expensive.
  • Inconsistent data triggers failures.
+ +
+ +
+

Foreign Key Constraints

+ +

Foreign keys are like pointers. What happens with broken pointers?

+
+ +
+

Foreign Key Enforcement

+ +

Foreign keys are declared with referential actions ON UPDATE [X] and ON DELETE [X]. Depending on what [X] is, the constraint is enforced differently:

+ +
+
CASCADE
+
Delete or update referencing rows as needed to avoid invalid foreign keys.
+ +
NO ACTION
+
Abort any transaction that ends with an invalid foreign key reference.
+ +
SET NULL
+
Automatically replace any invalid foreign key references with NULL.
+
+
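A small sketch of CASCADE enforcement using Python's built-in `sqlite3` module. The schema is a simplified stand-in for $R$ and $S$ (the `id` column and the function name are hypothetical), and the behavior shown is SQLite's, which only enforces foreign keys after `PRAGMA foreign_keys = ON`; other databases may differ in details.

```python
import sqlite3


def cascade_demo():
    """Delete a referenced row in R and report what is left in S."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off unless asked.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE R (B INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE S ("
        "  id INTEGER PRIMARY KEY,"
        "  B INTEGER REFERENCES R(B) ON DELETE CASCADE)"
    )
    conn.executemany("INSERT INTO R (B) VALUES (?)", [(1,), (2,)])
    conn.executemany(
        "INSERT INTO S (id, B) VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)]
    )

    # Deleting R.B = 1 cascades to the two S rows that reference it.
    conn.execute("DELETE FROM R WHERE B = 1")
    remaining = conn.execute("SELECT id, B FROM S ORDER BY id").fetchall()

    # Inserting a reference to a missing R row is rejected outright.
    try:
        conn.execute("INSERT INTO S (id, B) VALUES (13, 99)")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True

    conn.close()
    return remaining, rejected
```

Either way, every non-NULL `S.B` that survives still has a matching row in `R`, which is the property the optimizer relies on.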
+ +
+

+ CASCADE and NO ACTION ensure that the data never has broken pointers, so +

+ $$\pi_{attrs(S)}\left(S \bowtie_B R\right) = S$$ +
+ +
+

Functional Dependencies

+ +

A generalization of keys: One set of attributes that uniquely identifies another.

+ +
    +
  • SS# uniquely identifies Name.
  • Employee uniquely identifies Manager.
  • Order number uniquely identifies Customer Address.
+ +

Two rows with the same As must have the same Bs

+

(but can still have identical Bs for two different As)

+
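The rule "two rows with the same As must have the same Bs" can be checked directly on a table. This is a minimal Python sketch over rows represented as dictionaries; the function name and the row representation are assumptions made for illustration.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the attributes in lhs functionally determine rhs.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[b] for b in rhs)
        # The first row with this key fixes the value every later row must match.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Rows from the "Example Data" slide.
EMPLOYEES = [
    {"Name": "Alice", "YearsEmployed": 3, "Role": 1},
    {"Name": "Bob", "YearsEmployed": 2, "Role": 2},
    {"Name": "Carol", "YearsEmployed": 3, "Role": 1},
    {"Name": "Dave", "YearsEmployed": 1, "Role": 3},
    {"Name": "Eve", "YearsEmployed": 2, "Role": 2},
    {"Name": "Fred", "YearsEmployed": 2, "Role": 3},
    {"Name": "Gwen", "YearsEmployed": 4, "Role": 1},
    {"Name": "Harry", "YearsEmployed": 2, "Role": 3},
]
```

On this data Name determines Role (names are unique), but YearsEmployed does not: Bob and Fred share YearsEmployed = 2 with different Roles. Passing a check on one instance does not prove the dependency holds in general; it only shows the current data does not violate it.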
+ +
+

Normal Forms

+

"All functional dependencies should be keys."

+

(Otherwise you want two separate relations)

+

(for more details, see CSE 560)

+
+ +
+ +

+ $$P(A = B) = \min\left(\frac{1}{\texttt{COUNT}(\texttt{DISTINCT } A)}, \frac{1}{\texttt{COUNT}(\texttt{DISTINCT } B)}\right)$$ +

+ +
+
+ +

+ $$R \bowtie_{R.A = S.B} S = \sigma_{R.A = S.B}(R \times S)$$ + (and $S.B$ is a foreign key referencing $R.A$) +

+ +

+ The (foreign) key constraint gives us two things... + $$\texttt{COUNT}(\texttt{DISTINCT } A) \approx \texttt{COUNT}(\texttt{DISTINCT } B)$$ + and + $$\texttt{COUNT}(\texttt{DISTINCT } A) = |R|$$ +

+ +

+ Based on the first property the total number of rows is roughly... + $$|R| \times |S| \times \frac{1}{\texttt{COUNT}(\texttt{DISTINCT } A)}$$ +

+ +

+ Then based on the second property... + $$ = |R| \times |S| \times \frac{1}{|R|} = |S|$$ +

+ +
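The derivation above can be sketched in Python. The function names are hypothetical and the table sizes in the usage note are made up for illustration; the selectivity follows the $P(A = B)$ formula, using the fact that the minimum of two reciprocals is the reciprocal of the maximum.

```python
def equality_selectivity(distinct_a, distinct_b):
    """P(A = B) = min(1 / distinct_a, 1 / distinct_b) = 1 / max(distinct_a, distinct_b)."""
    return 1 / max(distinct_a, distinct_b)


def estimate_join_size(size_r, size_s, distinct_a, distinct_b):
    """Estimate |R join S| on R.A = S.B as |R| * |S| * P(A = B)."""
    return size_r * size_s / max(distinct_a, distinct_b)


def estimate_fk_join_size(size_r, size_s):
    """S.B is a foreign key referencing the key R.A.

    The key constraint gives COUNT(DISTINCT A) = |R|, and S.B can have at
    most that many distinct values, so the estimate collapses to |S|.
    """
    return estimate_join_size(size_r, size_s, size_r, size_r)
```

For instance, with $|R| = 100$ and $|S| = 1000$, `estimate_fk_join_size(100, 1000)` returns 1000, i.e. $|S|$, regardless of how large $R$ is.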

(Statistics/Histograms will give you the same outcome... but constraints can be easier to propagate)

+
+
+
diff --git a/src/teaching/cse-562/2021sp/slide/2021-03-11/JoinIssue.svg b/src/teaching/cse-562/2021sp/slide/2021-03-11/JoinIssue.svg
new file mode 100644
index 00000000..b84b5f70
--- /dev/null
+++ b/src/teaching/cse-562/2021sp/slide/2021-03-11/JoinIssue.svg
@@ -0,0 +1,279 @@
+ [SVG figure; text labels: σ, R, S, T, 100 Tuples, 10 Tuples, 100 Tuples, 0 Tuples, 0 Tuples]