From 7f013655b43a1c3512cec53c66d5edf9bdffc76e Mon Sep 17 00:00:00 2001
From: Oliver
Date: Sun, 7 Mar 2021 22:39:50 -0500
Subject: [PATCH] slides
---
src/teaching/cse-562/2021sp/index.erb | 12 +-
.../slide/2021-02-18-QueryAlgorithms.erb | 17 +
.../2021sp/slide/2021-03-04-Indexing2.html | 2 +-
.../2021sp/slide/2021-03-09-CostOpt1.erb | 555 ++++++++++++++++
.../slide/2021-03-09/EstimationXKCD.png | Bin 0 -> 22646 bytes
.../2021sp/slide/2021-03-11-CostOpt2.erb | 606 ++++++++++++++++++
.../2021sp/slide/2021-03-11/JoinIssue.svg | 279 ++++++++
7 files changed, 1466 insertions(+), 5 deletions(-)
create mode 100644 src/teaching/cse-562/2021sp/slide/2021-03-09-CostOpt1.erb
create mode 100644 src/teaching/cse-562/2021sp/slide/2021-03-09/EstimationXKCD.png
create mode 100644 src/teaching/cse-562/2021sp/slide/2021-03-11-CostOpt2.erb
create mode 100644 src/teaching/cse-562/2021sp/slide/2021-03-11/JoinIssue.svg
diff --git a/src/teaching/cse-562/2021sp/index.erb b/src/teaching/cse-562/2021sp/index.erb
index f06aac31..87ec2bbe 100644
--- a/src/teaching/cse-562/2021sp/index.erb
+++ b/src/teaching/cse-562/2021sp/index.erb
@@ -51,12 +51,16 @@ schedule:
materials:
slides: slide/2021-03-04-Indexing2.html
- date: "Mar. 9"
- topic: "Spark's Optimizer + Checkpoint 2"
- due: "Checkpoint 1"
- - date: "Mar. 11"
topic: "Cost-Based Optimization"
- - date: "Mar. 16"
+ due: "Checkpoint 1"
+ materials:
+ slides: slide/2021-03-09-CostOpt1.html
+ - date: "Mar. 11"
topic: "Cost-Based Optimization (contd.)"
+ materials:
+ slides: slide/2021-03-11-CostOpt2.html
+ - date: "Mar. 16"
+ topic: "Spark's Optimizer + Checkpoint 2"
- date: "Mar. 18"
topic: "Distributed Queries: Challenges + Partitioning"
- date: "Mar. 23"
diff --git a/src/teaching/cse-562/2021sp/slide/2021-02-18-QueryAlgorithms.erb b/src/teaching/cse-562/2021sp/slide/2021-02-18-QueryAlgorithms.erb
index b38ce150..d7be63f8 100644
--- a/src/teaching/cse-562/2021sp/slide/2021-02-18-QueryAlgorithms.erb
+++ b/src/teaching/cse-562/2021sp/slide/2021-02-18-QueryAlgorithms.erb
@@ -17,6 +17,23 @@ textbook: "Ch. 15.1-15.5, 16.7"
More similar examples with Union and Cross would also help.
Might help to tighten up the time spent a little too. I had to cut out before introducing Sort-Merge Joins
+
+
+-------
+ 2021 by OK:
+
+ Applied changes above. Things went better.
+
+ Looking at costs in terms of the "overhead" of each operator is proving to be *really*
+ hard for the students to grasp. I suspect it might be easier for the students to grasp
+ a recursive definition.
+
+ e.g., cost(\pi(R)) = cost(R)
+
+ This would, among other things, make the (B)NLJ cost a lot easier to specify.
+
+ I made these changes already to 03-09-CostOpt1, so they should probably be backported here next time I teach the class.
+
-->
diff --git a/src/teaching/cse-562/2021sp/slide/2021-03-04-Indexing2.html b/src/teaching/cse-562/2021sp/slide/2021-03-04-Indexing2.html
index c8208fd1..8b63c9c8 100644
--- a/src/teaching/cse-562/2021sp/slide/2021-03-04-Indexing2.html
+++ b/src/teaching/cse-562/2021sp/slide/2021-03-04-Indexing2.html
@@ -1,5 +1,5 @@
---
-template: templates/cse4562_2019_slides.erb
+template: templates/cse4562_2021_slides.erb
title: "Indexing (Part 2) and Views"
date: March 4, 2021
textbook: "Papers and Ch. 8.1-8.2"
diff --git a/src/teaching/cse-562/2021sp/slide/2021-03-09-CostOpt1.erb b/src/teaching/cse-562/2021sp/slide/2021-03-09-CostOpt1.erb
new file mode 100644
index 00000000..a47dbf22
--- /dev/null
+++ b/src/teaching/cse-562/2021sp/slide/2021-03-09-CostOpt1.erb
@@ -0,0 +1,555 @@
+---
+template: templates/cse4562_2021_slides.erb
+title: "Cost-Based Optimization"
+date: March 9, 2021
+textbook: Ch. 16
+---
+
+
+
+
+
+ General Query Optimizers
+
+ Apply blind heuristics (e.g., push down selections)
+ Enumerate all possible execution plans (or a reasonable subset) by varying:
+
+ Join/Union Evaluation Order (commutativity, associativity, distributivity)
+ Algorithms for Joins, Aggregates, Sort, Distinct, and others
+ Data Access Paths
+
+
+ Estimate the cost of each execution plan
+ Pick the execution plan with the lowest cost
+
+
+
+
+
+
+ Idea 1: Run each plan
+
+
+
+
+ © Paramount Pictures
+
+
+
+ If we can't get the exact cost of a plan, what can we do?
+
+
+
+ Idea 2: Run each plan on a small sample of the data.
+ Idea 3: Analytically estimate the cost of a plan.
+
+
+
+ Plan Cost
+
+
+
CPU Time
+ How much time is spent processing.
+
+
+
+
# of IOs
+ How many random reads + writes go to disk.
+
+
+
+
Memory Required
+ How much memory do you need.
+
+
+
+
+
+
+
+ Remember the Real Goals
+
+ Accurately rank the plans.
+ Don't spend more time optimizing than you get back.
+ Don't pick a plan that uses more memory than you have.
+
+
+
+
+
+
+
+
+ Accounting
+ Figure out the IO cost of the entire * subtree.
+
+ Only count the amount of memory added by each operator.
+
+
+ * Different from earlier in the semester.
+
+
+
+
+
+ Operation RA Total IOs (#pages) Memory (#tuples)
+
+ Table Scan
+ $R$
+ $\frac{|R|}{\mathcal P}$
+ $O(1)$
+
+
+ Projection
+ $\pi(R)$
+ $\textbf{io}(R)$
+ $O(1)$
+
+
+ Selection
+ $\sigma(R)$
+ $\textbf{io}(R)$
+ $O(1)$
+
+
+ Union
+ $R \uplus S$
+ $\textbf{io}(R) + \textbf{io}(S)$
+ $O(1)$
+
+
+ Sort (In-Mem)
+ $\tau(R)$
+ $\textbf{io}(R)$
+ $O(|R|)$
+
+
+ Sort (On-Disk)
+ $\tau(R)$
+ $\frac{2 \cdot \lfloor \log_{\mathcal B}(|R|) \rfloor}{\mathcal P} + \textbf{io}(R)$
+ $O(\mathcal B)$
+
+
+ (B+Tree) Index Scan
+ $Index(R, c)$
+ $\log_{\mathcal I}(|R|) + \frac{|\sigma_c(R)|}{\mathcal P}$
+ $O(1)$
+
+
+ (Hash) Index Scan
+ $Index(R, c)$
+ $1$
+ $O(1)$
+
+
+
+
+ Tuples per Page ($\mathcal P$) – Normally defined per-schema
+ Size of $R$ ($|R|$)
+ Pages of Buffer ($\mathcal B$)
+ Keys per Index Page ($\mathcal I$)
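The recursive accounting in the table above can be sketched in Python. This is a minimal illustration, not part of the course code: the class names (`Scan`, `Project`, `Select`, `BagUnion`) and the function `io` are hypothetical, and only the scan, projection, selection, and union rows are covered.

```python
from dataclasses import dataclass


@dataclass
class Scan:
    tuples: int  # |R|, the number of tuples in the base table


@dataclass
class Project:
    child: object


@dataclass
class Select:
    child: object


@dataclass
class BagUnion:
    left: object
    right: object


def io(plan, tuples_per_page):
    """Total page IOs for the whole subtree rooted at `plan`."""
    if isinstance(plan, Scan):
        # Table scan: |R| / P pages
        return plan.tuples / tuples_per_page
    if isinstance(plan, (Project, Select)):
        # Pipelined operators add no IOs of their own: io(R)
        return io(plan.child, tuples_per_page)
    if isinstance(plan, BagUnion):
        # Union reads both inputs once: io(R) + io(S)
        return io(plan.left, tuples_per_page) + io(plan.right, tuples_per_page)
    raise TypeError(f"unsupported operator: {plan!r}")


# Hypothetical plan: sigma(R) union pi(S), with |R| = 1000, |S| = 500, P = 100
plan = BagUnion(Select(Scan(1000)), Project(Scan(500)))
total = io(plan, tuples_per_page=100)
```

Because each case only refers to the cost of its children, adding an operator means adding one branch to `io`.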
+
+
+
+
+ Operation RA Total IOs (#pages) Mem (#tuples)
+
+ Nested Loop Join (Buffer $S$ in mem)
+ $R \times_{mem} S$
+ $\textbf{io}(R)+\textbf{io}(S)$
+ $O(|S|)$
+
+
+ Block NLJ (Buffer $S$ on disk)
+ $R \times_{disk} S$
+ $\frac{|R|}{\mathcal B} \cdot \frac{|S|}{\mathcal P} + \textbf{io}(R) + \textbf{io}(S)$
+ $O(1)$
+
+
+ Block NLJ (Recompute $S$)
+ $R \times_{redo} S$
+ $\textbf{io}(R) + \frac{|R|}{\mathcal B} \cdot \textbf{io}(S)$
+ $O(1)$
+
+
+ 1-Pass Hash Join
+ $R \bowtie_{1PH, c} S$
+ $\textbf{io}(R) + \textbf{io}(S)$
+ $O(|S|)$
+
+
+ 2-Pass Hash Join
+ $R \bowtie_{2PH, c} S$
+ $\frac{2|R| + 2|S|}{\mathcal P} + \textbf{io}(R) + \textbf{io}(S)$
+ $O(1)$
+
+
+ Sort-Merge Join
+ $R \bowtie_{SM, c} S$
+ [Sort]
+ [Sort]
+
+
+ (Tree) Index NLJ
+ $R \bowtie_{INL, c} S$
+ $|R| \cdot (\log_{\mathcal I}(|S|) + \frac{|\sigma_c(S)|}{\mathcal P})$
+ $O(1)$
+
+
+ (Hash) Index NLJ
+ $R \bowtie_{INL, c} S$
+ $|R| \cdot 1$
+ $O(1)$
+
+
+ (In-Mem) Aggregate
+ $\gamma_A(R)$
+ $\textbf{io}(R)$
+ $adom(A)$
+
+
+ (Sort/Merge) Aggregate
+ $\gamma_A(R)$
+ [Sort]
+ [Sort]
+
+
+
+
+ Tuples per Page ($\mathcal P$) – Normally defined per-schema
+ Size of $R$ ($|R|$)
+ Pages of Buffer ($\mathcal B$)
+ Keys per Index Page ($\mathcal I$)
+ Number of distinct values of $A$ ($adom(A)$)
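As a rough sketch, the join rows of the table above can be transcribed directly into Python. The function and parameter names are hypothetical: `io_r` and `io_s` stand for $\textbf{io}(R)$ and $\textbf{io}(S)$, `card_r` and `card_s` for $|R|$ and $|S|$, and `P` and `B` for $\mathcal P$ and $\mathcal B$.

```python
def io_nlj_in_memory(io_r, io_s):
    # Nested loop join with S buffered in memory: io(R) + io(S)
    return io_r + io_s


def io_bnlj_on_disk(io_r, io_s, card_r, card_s, P, B):
    # Block NLJ with S buffered on disk: |R|/B * |S|/P + io(R) + io(S)
    return (card_r / B) * (card_s / P) + io_r + io_s


def io_bnlj_recompute(io_r, io_s, card_r, B):
    # Block NLJ that recomputes S for every block of R: io(R) + |R|/B * io(S)
    return io_r + (card_r / B) * io_s


def io_two_pass_hash(io_r, io_s, card_r, card_s, P):
    # 2-pass hash join: (2|R| + 2|S|)/P + io(R) + io(S)
    return (2 * card_r + 2 * card_s) / P + io_r + io_s


# Hypothetical inputs: |R| = 1000, |S| = 500, P = 100, B = 50,
# with both inputs being plain table scans.
card_r, card_s, P, B = 1000, 500, 100, 50
io_r, io_s = card_r / P, card_s / P

costs = {
    "nlj_mem": io_nlj_in_memory(io_r, io_s),
    "bnlj_disk": io_bnlj_on_disk(io_r, io_s, card_r, card_s, P, B),
    "bnlj_redo": io_bnlj_recompute(io_r, io_s, card_r, B),
    "hash_2pass": io_two_pass_hash(io_r, io_s, card_r, card_s, P),
}
cheapest = min(costs, key=costs.get)
```

The cheapest option by IOs is not automatically the one to pick: the in-memory variants need $O(|S|)$ memory, so the memory column still has to be checked.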
+
+
+
+
+
+ Symbol Parameter Type
+
+ $\mathcal P$ Tuples Per Page
+ Fixed ($\frac{|\text{page}|}{|\text{tuple}|}$)
+
+
+ $|R|$ Size of $R$
+ Precomputed$^*$ ($|R|$)
+
+
+ $\mathcal B$ Pages of Buffer
+ Configurable Parameter
+
+
+ $\mathcal I$ Keys per Index Page
+ Fixed ($\frac{|\text{page}|}{|\text{key+pointer}|}$)
+
+
+ $adom(A)$ Number of distinct values of $A$
+ Precomputed$^*$ ($|\delta_A(R)|$)
+
+
+ * unless $R$ is a query
+
+
+
+
+
+
+
+ Estimating IOs requires Estimating $|Q(R)|$, $|\delta_A(Q(R))|$
+
+
+
+ Cardinality Estimation
+ Unlike estimating IOs, cardinality estimation doesn't care about the algorithm, so we'll just be working with raw RA.
+
+ Also unlike estimating IOs, we care about the cardinality $|Q(R)|$ of the query as a whole, rather than the contribution of each individual operator.
+
+
+
+
+
+ Operator
+ RA
+ Estimated Size
+
+
+
+ Table
+ $R$
+ $|R|$
+
+
+
+ Projection
+ $\pi(Q)$
+ $|Q|$
+
+
+
+ Union
+ $Q_1 \uplus Q_2$
+ $|Q_1| + |Q_2|$
+
+
+
+ Cross Product
+ $Q_1 \times Q_2$
+ $|Q_1| \times |Q_2|$
+
+
+
+ Sort
+ $\tau(Q)$
+ $|Q|$
+
+
+
+ Limit
+ $\texttt{LIMIT}_N(Q)$
+ $N$
+
+
+
+ Selection
+ $\sigma_c(Q)$
+ $|Q| \times \texttt{SEL}(c, Q)$
+
+
+
+ Join
+ $Q_1 \bowtie_c Q_2$
+ $|Q_1| \times |Q_2| \times \texttt{SEL}(c, Q_1\times Q_2)$
+
+
+
+ Distinct
+ $\delta_A(Q)$
+ $\texttt{UNIQ}(A, Q)$
+
+
+
+ Aggregate
+ $\gamma_{A, B \leftarrow \Sigma}(Q)$
+ $\texttt{UNIQ}(A, Q)$
+
+
+
+
+ $\texttt{SEL}(c, Q)$: Selectivity of $c$ on $Q$, or $\frac{|\sigma_c(Q)|}{|Q|}$
+ $\texttt{UNIQ}(A, Q)$: # of distinct values of $A$ in $Q$.
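The size rules in the table above compose bottom-up. A minimal Python sketch, assuming the selectivities are already known (the function names and the example numbers are hypothetical):

```python
def est_union(card_q1, card_q2):
    # |Q1 (+) Q2| = |Q1| + |Q2|
    return card_q1 + card_q2


def est_cross(card_q1, card_q2):
    # |Q1 x Q2| = |Q1| * |Q2|
    return card_q1 * card_q2


def est_select(card_q, sel):
    # |sigma_c(Q)| = |Q| * SEL(c, Q)
    return card_q * sel


def est_join(card_q1, card_q2, sel):
    # |Q1 join_c Q2| = |Q1| * |Q2| * SEL(c, Q1 x Q2)
    return est_select(est_cross(card_q1, card_q2), sel)


# Hypothetical query: sigma_c2(R join_c1 S), with |R| = 1000, |S| = 200,
# SEL(c1) = 1/200 and SEL(c2) = 0.1.
joined = est_join(1000, 200, 1 / 200)
result = est_select(joined, 0.1)
```

Projection and sort pass the input size through unchanged, so they need no function of their own here.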
+
+
+
+
+ Cardinality Estimation
+ (The Hard Parts)
+
+
+ $\sigma_c(Q)$ (Cardinality Estimation)
+ How many tuples will a condition $c$ allow to pass?
+
+ $\delta_A(Q)$ (Distinct Values Estimation)
+ How many distinct values of attribute(s) $A$ exist?
+
+
+
+
+
+
+ Idea 1: Assume each selection filters down to 10% of the data.
+
+
+
+
+ no... really!
+ © Paramount Pictures
+
+
+
+ ... there are problems
+
+
Inconsistent estimation
+
$|\sigma_{c_1}(\sigma_{c_2}(R))| \neq |\sigma_{c_1 \wedge c_2}(R)|$
+
+
+
Too consistent estimation
+
$|\sigma_{id = 1}(\texttt{STUDENTS})| = |\sigma_{residence = 'NY'}(\texttt{STUDENTS})|$
+
+ ... but remember that all we need is to rank plans.
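The inconsistency is easy to reproduce. A small sketch of the 10% rule, with a hypothetical helper name and a hypothetical table size:

```python
TEN_PERCENT = 0.1


def est_select_10pct(card):
    # Every selection is assumed to keep 10% of its input.
    return card * TEN_PERCENT


card_r = 1000  # hypothetical |R|

# sigma_c1(sigma_c2(R)): two stacked selections, each keeping 10%
nested = est_select_10pct(est_select_10pct(card_r))

# sigma_{c1 AND c2}(R): the same predicate written as one selection
merged = est_select_10pct(card_r)
```

The two estimates differ by a factor of ten even though the queries are equivalent, which is the "inconsistent estimation" problem listed above.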
+
+
+
+ Many major databases (Oracle, Postgres, Teradata, etc.) use something like the 10% rule if they have nothing better.
+
+
+ (The specific % varies by DBMS.)
+
+ (Teradata uses 10% for the first AND clause, cut by another 75% for every subsequent clause)
+
+
+
+ (Some) Estimation Techniques
+
+
+
+
The 10% rule
+ Rules of thumb if you have no other options...
+
+
+
+
Uniform Prior
+ Use basic statistics to make a very rough guess.
+
+
+
+
Sampling / History
+ Small, Quick Sampling Runs (or prior executions of the query).
+
+
+
+
Histograms
+ Using more detailed statistics for improved guesses.
+
+
+
+
Constraints
+ Using rules about the data for improved guesses.
+
+
+
+
+
+
+
+
+
+
+ Uniform Prior
+
+ We assume that for $\sigma_c(Q)$ or $\delta_A(Q)$...
+
+ Basic statistics are known about $Q$:
+ COUNT(*)
+ COUNT(DISTINCT A) (for each A)
+ MIN(A), MAX(A) (for each numeric A)
+
+ Attribute values are uniformly distributed.
+ No inter-attribute correlations.
+
+
+ If necessary statistics aren't available (point 1), fall back to the 10% rule.
+
+
+ If statistical assumptions (points 2, 3) aren't perfectly true, we'll still likely be getting a better estimate than the 10% rule.
+
+
+
+
+ COUNT(DISTINCT A)
+ $\texttt{UNIQ}(A, \pi_{A, \ldots}(R)) = \texttt{UNIQ}(A, R)$
+ $\texttt{UNIQ}(A, \sigma(R)) \approx \texttt{UNIQ}(A, R)$
+ $\texttt{UNIQ}(A, R \times S) = \texttt{UNIQ}(A, R)$ or $\texttt{UNIQ}(A, S)$
+ $$\max(\texttt{UNIQ}(A, R), \texttt{UNIQ}(A, S)) \leq\\ \texttt{UNIQ}(A, R \uplus S)\\ \leq \texttt{UNIQ}(A, R) + \texttt{UNIQ}(A, S)$$
+
+
+
+ MIN(A), MAX(A)
+ $min_A(\pi_{A, \ldots}(R)) = min_A(R)$
+ $min_A(\sigma(R)) \approx min_A(R)$
+ $min_A(R \times S) = min_A(R)$ or $min_A(S)$
+ $min_A(R \uplus S) = min(min_A(R), min_A(S))$
+
+
+
+ Estimating $\delta_A(Q)$ requires only COUNT(DISTINCT A)
+
+
+
+ Estimating Selectivity
+
+ Selectivity is a probability ($\texttt{SEL}(c, Q) = P(c)$)
+
+
+ $P(A = x_1)$
+ $=$
+ $\frac{1}{\texttt{COUNT(DISTINCT A)}}$
+
+
+
+ $P(A \in (x_1, x_2, \ldots, x_N))$
+ $=$
+ $\frac{N}{\texttt{COUNT(DISTINCT A)}}$
+
+
+
+ $P(A \leq x_1)$
+ $=$
+ $\frac{x_1 - \texttt{MIN(A)}}{\texttt{MAX(A)} - \texttt{MIN(A)}}$
+
+
+
+ $P(x_1 \leq A \leq x_2)$
+ $=$
+ $\frac{x_2 - x_1}{\texttt{MAX(A)} - \texttt{MIN(A)}}$
+
+
+
+ $P(A = B)$
+ $=$
+ $\textbf{min}\left( \frac{1}{\texttt{COUNT(DISTINCT A)}}, \frac{1}{\texttt{COUNT(DISTINCT B)}} \right)$
+
+
+
+ $P(c_1 \wedge c_2)$
+ $=$
+ $P(c_1) \cdot P(c_2)$
+
+
+
+ $P(c_1 \vee c_2)$
+ $=$
+ $1 - (1 - P(c_1)) \cdot (1 - P(c_2))$
+
+
+
+ (With constants $x_1$, $x_2$, ...)
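The selectivity rules above translate almost line for line into code. A minimal sketch under the uniform-prior assumptions; the `ColumnStats` class and function names are hypothetical, and clamping results to $[0, 1]$ is deliberately left out to mirror the formulas as written:

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    distinct: int  # COUNT(DISTINCT A)
    lo: float      # MIN(A)
    hi: float      # MAX(A)


def sel_eq_const(a):
    # P(A = x1) = 1 / COUNT(DISTINCT A)
    return 1 / a.distinct


def sel_in_list(a, n):
    # P(A in (x1, ..., xN)) = N / COUNT(DISTINCT A)
    return n / a.distinct


def sel_le(a, x1):
    # P(A <= x1) = (x1 - MIN(A)) / (MAX(A) - MIN(A))
    return (x1 - a.lo) / (a.hi - a.lo)


def sel_between(a, x1, x2):
    # P(x1 <= A <= x2) = (x2 - x1) / (MAX(A) - MIN(A))
    return (x2 - x1) / (a.hi - a.lo)


def sel_eq_attr(a, b):
    # P(A = B) = min(1 / COUNT(DISTINCT A), 1 / COUNT(DISTINCT B))
    return min(1 / a.distinct, 1 / b.distinct)


def sel_and(p1, p2):
    # Independence assumption: P(c1 AND c2) = P(c1) * P(c2)
    return p1 * p2


def sel_or(p1, p2):
    # P(c1 OR c2) = 1 - (1 - P(c1)) * (1 - P(c2))
    return 1 - (1 - p1) * (1 - p2)


# Hypothetical column: 50 distinct values spread over [0, 100]
a = ColumnStats(distinct=50, lo=0.0, hi=100.0)
p = sel_and(sel_le(a, 25.0), sel_eq_const(a))
```

Multiplying `p` by $|Q|$ gives the estimated size of the selection.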
+
+
+
+ Limitations
+
+
+
+
Don't always have statistics for $Q$
+ For example, $\pi_{A \leftarrow (B \cdot C)}(R)$
+
+
+
+
Don't always have clear rules for $c$
+ For example, $\sigma_{\texttt{FitsModel}(A, B, C)}(R)$
+
+
+
+
Attribute values are not always uniformly distributed.
+ For example, $|\sigma_{SPC\_COMMON = 'pin\ oak'}(T)|$ vs $|\sigma_{SPC\_COMMON = 'honeylocust'}(T)|$
+
+
+
+
Attribute values are sometimes correlated.
+ For example, $\sigma_{(stump < 5) \wedge (diam > 3)}(T)$
+
+
+ ...but handles most usage patterns
+
+
+
+
+
\ No newline at end of file
diff --git a/src/teaching/cse-562/2021sp/slide/2021-03-09/EstimationXKCD.png b/src/teaching/cse-562/2021sp/slide/2021-03-09/EstimationXKCD.png
new file mode 100644
index 0000000000000000000000000000000000000000..f31d42c6f84105970d809f7be0bd8d3ad81ff938
GIT binary patch
literal 22646
zcmYg$1x%bx)b#?ztrRF$eDUJ$E{hah+_kv7mjcCMafjmW?#|-w+TvQ=;phGGf5|t=
z3#nZ5{mC83A^8
zD=Dd|?2zo>6JQY(bo*rmMg^t@{x`^3TbjFB$+|=T!_30Wi7>bzg}#%YvZ$Z3
zF~|z#J$McS;eF^QKYxP1>mL(ZceU?kYNYW|yDz7NN
zTl%_OSmSGgX|Xw;+dn;U5zHQU1)EH{_}Nb33|>G*K-DwYgXRMHO5tzvw!zmN%4+c`
z@V{`S(kaYseJtF#j^EqA4f?6Wh{6owienSv1*j9585w=W3gy-HZj*Gg+asPygDb*=
zWxZtW6BkQyiqnc;)%c4y3b0B`@*hNx18#wtYI+QUq8%EK(rBOY7YGONSJQ^m+0r{X
z+6+5qjl*_7l<^(wrt_?d@N0NmBbrDP({Nf);lXQoa>KiuVY({nF=N{kk-Bdhc71qz
z4tVSi6O{{c57tHVC78e=$Fs!6#nQuKXLnOv3;NI~v7TSO6OzDL!ad8&_LYsjZqM9c
z%xto5vWj*|?Y>g84Or#^UQS1O8G)4QaYi=iyQ&_OavgEbb7b2lS~J)Puh6Z7=X_f=
z8memE&Lq-7bu@38guN+PuDc)Y44?hYbBtADWJv4rGe&|RfFZ)*IsMTiQ
zu^VEWF3)AZY*lOfgi)6HPKQJOzr1006JFPp8#gRxcbdVp2|sA
z0rg+peJ2#sb==SAHdQ{m`4#4DiOuEo}zc?Q(L={8+-$-+>R{21M
z`@>vZy3UG|$~WFcEk{pV>>ZTC@yX_+1ScTz6|g7p(8j4y6iCR1C48czQjz@+AKi&q
z=`k$FT#@23TjUqy@0F(zBO#&8xzD8Z^*EkX
zdK=%3Kj!`ZotKx__g5xlv;8?0m=}7N)qbbEBiv8d8iNebK?d!4703r~X0O@6Tw}jp
zh}uJ-Z%A|?v($!M<7-EGZe`NT)artB^2Sq|=FPl!fCunO)N
z4YZhGV9O^xRgN_7i|z7g|5W$VlG5bvZsf>{$f-P}vlL*s^x$f9CmTi7MATwvm(x!4
zdhG4~MQ|b>_@Kh|@xV><_U~id35J|buj$mZo(0IRRNXwwN2F`H$!d^xztJ|Wl?e#i
zVEXYpKDiY`pF17cr=}CnesbhH1{ZiuQ$&JmCBN?QR3{`)t^3wHu15+>dp>VqE3gm?
z{Cj)S`X^dj)X04vF@DDZ4g_i^}Y3_YJs!l>sD58C5$2l@<`>aF^wlJ(Z+VBMq20&;Nc#a@o!-DloqBp5@4e
zzNJRo5g@qvIa)Du5O#Y=>njPI;XcuqF(yXs==MNIV5*_I*d@t$ZMgYXxR30T@tU^1
zBI3)$lBg4^n}?_8v5ruyT{uiryo^*?nKYOML`JZTg}w53sxP9=4f*BjLnJZF-9?EG
z=GPq8Yu*(k+p~c)-aj*KGOb5L7J|+n&ya-mt-h0b^5d+7AapwBW6MhFzAxY^ci5r<
ze28g+7be6f7*wQu_uUn1p(fIZ{sAS32Pw;xkt3t2k^;>Q2c^W8hw@o7u&7R%~~ER?Zh;B#MWsR%sQ5+
zCv)JO3nNSn)GI4>^)PTpxwf}WJz%Ly5BOKVw6u1S6nXnNx=%%(h8K?+1R1S)S(O1X
zBqzT(CDo7>nWGFitC{eK<4{C0E0aMFX2I1zfg0kCDPM8@`j1?di9wJihhUdJ0=9o=
zy8~b&BH|xHK2fPa0$^Ixr^FRBo^r#V#Cf;Dq1ACaT+M%|LA3>D^^%B%;Qm&rL1YwS
z*_UTaCf`-XFCIMN|p;
zh2q?QeMG&FUm2%0?`5t?;_O}cNLe4IdsAU*go6#z|8Fw&?SfAm!4vG
zlzGH5geELKGANgn7Hq#!#%Srs=cFCEi@?D|nafP-54kZmfOQtOj$qqxejk}68!<@9
zL?hOw1ij!Xq4|fWxE**T>w>uwSkR0jBIio!qK~u_Lz-}mKkAH1gy7H
z;d6-alslL?T;QP3JJha@0wt6J?PQRsIH3p)6ubSgzMhovSi!d76@6yv-+%1H4c)Pt
zamh1f3Pnsc5};OC{D!*Xo<`I!_^EKgu-JD?tOEd%KU%r79TdIZ0;d>Io)4QRUT=>-
zRVcSv_9>Ifunmx$Hf7T8j;yGs0u6HO5*2trXY=LJ@d;@Ca3lk1hMUP&c_5)Oqx*Xu
zmEIsJjN84b0I!eGU#_)ORqPu`aw{i%QR-%~`7R&H#P%h!db?A0hH;l9Rjk(8+6v(2
zZO!FmplEQvT8Cm2lH8J1wDA}f&j#hG2^M#C*x_`o4hnAW7KoR0Cz$Ie
z#$U!tr?TN#wXafdBq5PX%%8`Df8>1HhjnzEiv;2U^zaDWa2BA+PCa#vq#)S^4FshJ
z#-hk1ls|J^JT^Ce4O+IVh;gUSM_xl9QV}%6!KV&JrSYe9$Ub$0@nE4@8pb9h@1N|&
zs=>TZgVly7(xD|LHQdhE$7~vVVy;&2
z^Vg9<{XDMfwPGkvw*VzYpN6I{D&6oYEn&QzeZ$ljJFtvqoKYuV6(4lQjM_Y61+2uM
zMmD=$_fKq)7wkGnENJzAWlyn|1k@+TT?MrI0bAA^tWH~}0AG{L{B+WxXo^uWa&JdY
zvM}M#kK}^V`d{0{(*4Y#bP$_D+Hq=0pq%p@%J|2GraWb!uN$QOLbm2vQjq|YiZ1Ss
z>E_HexH1&uS<_Hjze>NI`6n=2c(U_eeZXgFzJGI6sKG?7SGd_*v85taI8S57!bnyw
zYi6%R;)tJ)>r8_z@t@FZQ73rhZu9cqnJFT#n#q(sCFGRqjZ3A6K@Z)qH7=u?MUZY}
z^IY{2)6YX&S~^)s?p9BrhB58%C4Dmn!+%fclxV9XX*qd#QqK)#kUkF+*taWZp4I@8
zE!Ch-_2dW$Yf{HJoc2HIEwU(TEY)m;iYP+d7Zg*w=8q965^Ls>5DsKtqwnq-P_iwR5Of=a^rVzn&3WApXf&
zdn*ZWDG!+S*?D^vXE7
zKHPhYxqsOp68Rc3LQ3AgR-2aV=ebBYP$x*LNL?T+<|Ond%Ac!yB3b)~&ejeC4uGHJ8*7t)CV2^8IQk+Ha74f)sUsfm@fx;@m*bRR8e$SR
z#`h1%t^;RYFnzhOjv6#|V`wN}dneh+CVd)JgyB{d_xd+BoQ*^RV#(+4gmFfXA6a*Z
z1HzD=5+Cz&@|sqeRLd*HiqqiBE$Am;sb)vetuxCslHs+g{soNJkQl*zTkpjr>WfrD
zi=%$0hXAV3iMd4ru2yiWld^{Lz^R+kj|4gl+Ds84yhv;y^KXuNjr6oGZYP<)h5}uv
z#rhPpHpF>fzs7^kx>J@AQ$%Yy^+)ZFLP&sgVUK}zidOuQ@)5cg|FR;$wW*X4?LTOdhaE-_$DI3f+Q{_hq+-sm-{gljiD
zC($s(U=%M!-ii0NKU6wy2je>@iO^Fo=I%ksUPK=vk5xBai0_<~`(d?s?wAcO)J`1;
z#rm}oGbp}T6i?9T3Jo}_WMwV&_bDJ=j{Nf_M;5`;0O2F4N&LUO-`JZb9p&b+-(fuA
zMoBAIsK`55T(0tO42GGdIKwslwk?MRHOm~*^H?>#0Ag!>KQ&Wrfp!YV#L^ePZc;5}vShP}y4n|+2ZOB_|&8hw~b
zxeTB86oFO7!Y!Xl0lm;sK)^#j=^Can#htllXd!ukG9FZTOHN%wjz4hpBs45FZ5$8f
zw%Ej@cV829$y}V8P(n3(q$0fboebkXP$u|jZEK&UB%fJ-b-v>bWJKbaErs)-43|dD
zH7^^UR^9Oc?kZEcti#GV%#sQ@B!vAaf6LZddKq_PZKggK_6H^PFa-0%roK)1#z%b^
zfM#gC&P9Sv1Z2G_D-J?J+jVpscDF-mW9Aywgjm}ybczP=TLUiy!Tp+F_V339m_?=R
z3Q2$*8m$g(57z@W8Cquij&;RP_9s&jBK$ESq<@uqbd(`!G6K04F>*yjd9EYY6&uKk_*9`*YrW68my1hu#hb32AB+pR7O{k=eH|wb6M$^_L_)(?x*<
zi7ja4rryOmBl6_rrM2hmIN~Xmnt=-2(wCjo-%fuL5twgCh)o1=Lm5;2$8F`))t%YD
zmAJh0kJ(v~K4AU2XO-!|F)_IBAQ(i?M>G=)OdWejR<=Qg26CxD#ephD>7B^kfj3MQCYFC57s;35Csn^}UnA=?Y+)Ju|DslU1+7S+fdL)|H
zQ0)?z(2z49m1SCIQ#qk0bZC_CDQ)zUv-N&hqA}1XE=683C~?&NusoAlzW7V`=iegk
zv2;PVEy!pgz;LhgLxjv_b)3&Pe-X15L-4|@{n`0T4chNb#IIX7(ew>TOC*S}M7nW&*?y8mPy@Z67e9sfWW
zExw(kar~>-?ZI>XV{1dzf^7UCU=k47APsRpYEQ6iA5V@|-I2jG`envB6uy;4f
zCMUaiHc^}9XEy5Hc5L)|o^jn|L*qYm^vOF581uBR@p|oiukrcU^%*w-_dYi0wdH*E
zHhA4EJzBed@3-Qw8KE1-$CuXpde2w)xDho9FfBy>R{7nu!edgY1pn?v!3caWnMmFl
zxu}P!<$iBcc)x+t9p5Q(vA~kiO^Hm4MIRuJAcI&vFm8)Q|GizM!!@^sEQ$qVx975yPlJa_mwQ^Y7{;N;D>Jq!+-bW3`haC9j}yE9z^`tib(O)X{Sazu%h
zZUqQ;4FuTIS*-dAnt%&u0GZ`0+&h!_s2;0x+>F5&ce_e~d-8#C6dw{~NGG+7!d1wk
zRIxOXDZjue_cwJWpbwF4c91E`6t#mRBvC4W|CZir+-e;pRklD5watoWrPvzfpLBr`
zIR@vu@YJ(}?0oDdPZInW_k2P!0rmx^3aW*|Q{>Ms6rU@RU{C%)ZHu2|xlVdf+c%&}(J+2&f>s
zLEl6LS$|n5EqqyD(xLiMC0BYs3#PiMBP1+{OW*Rwnm?255cGI(mpsaI)1XdkWvNVw
z8q0ixN#`{<#fFM|xDwB%Zija@lPb#S586Tw3vx62RvkaJfkMJ*;(*+yG2A73I*@HX
zut~}^Ox*}L>hcgH=2cz5zSuqb3osmG0k$!AR?T2ld*E%sr<6!PE{J0m!*t;B-iW_!
zo(!tmy!wqj+Na`hY{QEcXi+8=zr$uEtnxXt*Al{PmJW5#dAtwmEzEOB1OwN%L2RW+
z4xy3UW#En#KF44@AkE(^=|mW=1%sul0!$oXXhmnH!EC*Rf{KYyQ%?hY=NG*?lN4B-)Nv1v^3GPA1+4qy(&7_z6g2=t*-H%~5mzi~@i5&z^F5FhP#!aD)o43k|5L3B%34E9Lo
zdGdtP^6P!DA*Su(C13cZ3j1Pq*vAs9K4GeBmkCp~sPChu8qI_Psdj%>4@Cetl9uzC(NH88}z!n?@n2M?eN?r
zBri==IJj3FM--U$AAhql@RQ3IDnZHJxj8xiEei1T_4@Rru`hk0387(8=ZFTjU=c-K
z;~D%ztIYQocv;luK4avq-uh512Na1tq5-b#)oSt
zJDcPabH$Z?e8pK!IWa7q71zcV-;6{5!uEdPtq#)fE4BNHP`@{~-=cIU7OA~ne-tc*
zLiFAsR3bdVjQn#jE(S0Mhn(_%DJp=Qu39}^hXtd9@jKo<*!SuN*3%?cPL8J9$Z@3|
z9eR>E35<`D%S#Vea_pgZR&pRk@e9{VEGpd|M<*(l)NS$Hv%Mr-!iRM7`Nooe?o~X$
zuF#c^0ML2KzlUb9L=DE<$}}JN=h#PQeVqSQ)saI?I*z$WvwJ2ad9at62)i=c&6-G*
zxB`G3zP+=vD(sl`w2QBVo`AHdF?4+ymY?}Fr5CT?^aq_U-0)rA~R9u
z?1|XtYZ+NcO)jv5j~8>e(OUMm^ZMOX{Hem_=Fw-5ZJVsMJx$^wtiXiwJ8CE@jSSnZ
z8UnQC^h3EI$@5(~Qy?K?!fi<}^!BMlLe}z1jpF8xu0XB}2(cKVD-4MZ>F`Z{o?MA~
zdNvTJm%>E<{mz;ghu7z=^vF}If43u53zGS|Iz5$wSM7x(UPibwsHE3*Sgp6yOtM;R
z?Zf)UT=<0F_)3}W-XL0|OsF6=eYdpxROlPO4AsgnCI0G?&M*RI(Z$vSf?AsEr}0cn
zV;3#C!b_<8aep@6Pc`m-JA-ddPi(jFfbSvuHXJQ<;Va_IzS0lLyr@4FOsVW4@(wV%
zdwjY;&)hW9Dn(MBi#hu!HOOK0G|K>#kLx0u@AW13kEC@Q3#9Dgcl~%FqC4r?G9~5$
z4~9`V7d;3M(TVBv6>P}*Qq#Zhl~NkHX7VF}$cI-oEIFIbPX3IkJ%UR9v6DesRtw5I
zE;fWHIiCqsaV8&O0Be30`6}Z>waA~ROV{;n_Jz0`DqZNsRc6NN8^Wz8TAk$~AhwQS
z5p~yHrLR;SHGhw}{#J|thsV7CR^=gRNp&J_QI|s2!r-3s=Oa(x44INmwoQ3RFSV=m
zAKj(smrXfdFxl~gh*r#FPx6x&B#NwyDybwb%m&+StY}Rc!M&ccVJr~Xam{0OkN67LeVIMsPGi%&P+=go!SF@P9NL6EPi9sx2M1+$!F7eYUsT9ID
zw#iTOyBPZ$orK)b`#{t>b0f57el{;D8+>L=rGO%mo*S*<^NHyowV2o)T(re63XswH
zyi9sE34REOi$acnbzBR~9)ZNaqN?Cs=cLS!M&^ki-e}*NSf&8Qz;k|b!tY(@5zP{H
z+bf|-5rvrPorMJt^y;>vB@w_R1uWfkpGVDc>f#O#s{btnF6X=Dd$?&Lev3>QC@kI)AD81`Otb>hoRcVb|QpAGyPeI6ACq}oFt`ZZWCe0TXG
z;^^`tE#fEXg5qg@rrF?|VB89&jix{h7<~W_y2|s2;y*ovCjG#q@yj8h(HfB^^pZ+r
z-a&>la}aZHsT<~29@*E|b@8l9+_Y@&kD9=qi6a5nqQ#R#f+Jyv%ca4Fx=tQ(Fr07n
z1p3(wxEb#9mMEhz7efuTX!Si8^IoJL-%)@g=%&jv4--QJ;TK$)U=@B>DS7RRf1dcX}C)t
zG2uLwpc}-6o*voGfcwy0l3t>*ik>X`$u=Z1b^?f>t^
z-5ke00zDcRv&h-o#S>+A6;*3uO~dWrL)1h|e7JJfTqw`K_-DtEwe*bJKAd3>DK`7h
zCJ}h=%nBdb$geR=l=9CB#~r#cS>9}gLr{j+{@F5nQqS%Y3Lq5QQWCh>FK)~}EimG*
z739X!(#=yaU>?XrdrAp|~awhj^ccQR~V5Usd^8H11nhU%a43^Rjp?BZSrOAJ2O
zvzKs$s^euviPIVCb6nV-je1B|zyhg`X6WUa
zT~4RiG$B?l!swd)kR76n%#&I=`rOp%H?G~jrJuLZ$GsP
zL7VG8xbyQb%MuI)Mqxt{;U<(&W$g=K3+YtyfVa&I01&x@f)A@Xz`Q~Xn$dAP>ybdK
zhhK;{x#m^Dk$r!SX>v}3n+?L$8=F*FHWgnCY|IzsF9Y#@
z>nCN(ai-2Etr+P1?{QSZzxM{*~
zA$4YEmJd$Ame2aY)D3AWCf9*BgP&@uco6sK*yAYW4FW{ElJDubBOHUbNG7^2ZZ&6w
z5Er5>4^qtG^kxaC4uu7Yn{M1&I!bbANLoou4LO!G`@*3SC-f-k5J*sDpwGnd*nNil
zw=G?QQRPPNqNCaD31#jM&2!5f4HJUJ7!lmrL+AvYidblnc|5+}IaL6*r8_q|6yGw7
z&;uF#75gjWut^|owU{3)M(`j8IKqFxL5imN-h@zv5G@&7m4~l2A2ALsZ9YEPIr@B{
zj@PG7QBY5^#kNy2h*o=K%SlUp6}MBV_znkf4gq{jOtUQH@=`Xq(hh77=QK4(JQ=NL
z)vsx-a;xLH?HNgl{~dhtG8+3B&Q--c>_&7a_A0l+q9fzm9+PVOt`M*nqy|B^Wgv=H
znH0h!e~y3PU^KRCkCyhRIx!onSnQ0X6U<~{HwumK(^w~ryUjt=
zo(9-y2g%UMJ2rp3d=xqVgnJrfK2;gE!`F0O)HiScZFVzca^%h!U-<85QTTP|ZKPK^
zWH%JiUBYyXp6~fWhVFkg9FYHm-*d>zVr+OT#eani?V!K!X+|8m0BrZ3kkIr7m8i1m
zanO5Q-&0jCkM*e9f7Ufc=G~Bi;4cvWTS^|}CJ3O4>z5AnCCcFb$ZwErDwZhy=)%0{
zNG_XyNu+WksrddOWbLA8CP@k3s{~B19HvHo&=7P4M<4gdMLa#w<*3&4CZLkNm)EQJ
zpQ}P5;=}8kIQ_`+%D71c4JLk<&y!Y6CaU^r=fjy!`J-6V_g7(2dh)&TLE%y^5{^pm
zBI?3ng)}#XGBvw3+t&fi8rHC`Ylc_je2N$@2CK?cI3hhc!61m4Ya}}7mr_Dgg9R4u
zCQA<2p}OjXH+PwWhYW+68QB2QHnEOZPT53TK9|EN)N^B@-~!VR!J+pw?A=#KN93ng
z2@EF1dDL?_1c~3B1a{xhRSviHi=GlbZLAj`9*HM!fMg1HSEI(Rr*iJnE#%A0s$c=M
zf`VJZQ3*dw;xIABw0M4N5g6T3m%0=Ca-Vpo$#^xonVPTyhNR!f^E>-N1*Ya!=GXfl
z74NkAh?1)Gd%vci^VB~c`>kwREm=zvi1@0__ybB@3}SBBtyA$_-70m9bi*$922mmj
zR-`Zr(wMT`_jQ=lv;v;U9Lp$K4*b8A?9{QRCxPFrYC=UzE^qi1(>nM1Lx1g@0@p!y;m!{f|CR*`X9Ze%n{!mp_G4ApjL&P&Yd}C3G
zDXgbs6ql;9>DyRMinLs=l%8}Xiq`8OO-N!l@Af~pY6`6UY&C@dsEI3Ruq3&MonqGd
z-CQ6S-?Hz7lJbfh5cQTWGhm>pq{{Paq1?7g3lHwfntgUZZfL{EwILRhKr1aplV`qU
z*0}fH5zZQCG^<(CEAz5#mzacZRxhhm^k(FZM8zIM!3C#TDZ;fV=&A~@vEALQq1=7Z
zlErOkyKU&s&Lx|-e)H?yD5jwOMjIXp5dkj4xzfnhDHUMo!g>%~3&=r<%MCjn_FW8O
zA6^bn-Ff@0axs*KqSDxRZDNm7-m>~FK&C=%pht&9kj`VM@Oq-wEjon$UH!8S|F^QL
ztRj<>YLuxcACcwKAE_i^=xPKIklU9-S>uWHCw#0E50GHD=KN4rZ%>2LqORlktyDpV
zg5dK0^(<^X_`LQ>BwLg5x)r;&eK+a{SLohasF;kT;umARghK}^zSFp2#u>11v
z0N^O&EeVZEff|&Z1dvF|r1{GGW@^Gs1lT390AC2QNq(57vV$85d1EHSZEFs9_^ecR
z8eXf|YolZIrFeKS=*SjgDy4N`%g`VH4YKzASL<%F7c{x87?(O=We!cU4Ce%UnLwEP
z;{hUh!y*jeYf)C*>D|~8IS_JyiN8d+tTF%d+rvs<==>a3bi9Q*)!dtUdMd%9MWp3I
z8A3H{(au9BcK{QLdGeXa>YU<0ZtbDF^pTS_Kyok}Ladg0`RFFBIS9F45TVTO|7H3}
z;{X28Gab;LSQ5kK{ZN&9P^e~v?aoD#KlcGQixMAXb=dQ?2(ko_mgjig+({i}pyY7M
zcY5F7L^@Fb2IGKtVmpa>M|_6dpv%wITgP-6f-X`3Td8l@QQQ9n%tlaGx-Eloo4!;>
z?Dx~YymCv?JN_h`Qpq~|`--n8QX%h}AWP$E0Gue(#sgh7m~CP;SlG8F5lMx0M7@N(Zf;{NKyd;2`Fv}bSs6**B1!^>xxy#Y$6XX#kE
zW`v;2ud&Bt6G}Q`)F2XAKBm(7^7H0p_K+s$xmY;>li1oULIxEti1C7eaQ{UYub6@4
zbK_?gzr+QA1Im*^++`I2N0^qq;_G~f)iQ`#rf_Ks9xAAGH6dspU;<2?{#x|s_nB_p
zQye40cSdq(CBg-eKlO@rW&dSMVK^0|1Xqcgs}fs@^~_y@-Zq_cN=dUTZ|w_A1w-xX
z>J~Y)g!W>-#!GN3K^B5}15geQ0oxECyKDJ9$7TNnCijqq0KHT4V@npsTZm1r;+#%%
zJ(ZQYO9AqxvXqAB_P^A=>j8X!+ItKl1dJBgMEeArGI)Msl=LkEzEXhf5T+94)i{14
zT9zTk?obHOz}3Vu)No~oDgv%?8b;UydWb@FdRQNu#OX`PbEtNzJlG^r58UWgMf+Ri
zYe^P-eOJkskl}3W=vb&Uwt3blp_v}?2qsh-CURqA1B#a(kMregL2Ou+xKrF+zv9x-u2KTs2Z2O}S0lJ;`)j$hVvLt6tDGm7k
zvMxJnlV(xWdUH
z6MbLIbXfF6j>rij!MZTroq`!xQ@{VBPYDhMBWh3i_9K}jL_Olg&d`Hp^2JU9!fNNL
zA_5Xy!k_*UwtOJIYF|X>y6CO)4)BaMR;IiLLMuEZ2aoUaveBy<#T5l(ax4uv2lCj0
z+mwXkj{{m+g$YWmXa!W_0~VZGYVc)ugvehN*MKzUV5nqfo$d$g01OoPh=7P*;F?rz
z1t;8f;Db#hqvr^6OZ>A7{6su^fLj-)3Pfcpp(yeaAhpj#8Zf#8h#Y=8i{Xo{fXsv_
z@f7_5n#P4fF)*MzpOka>11Q1}r=O3dw8g3JWV~|&!wC|&5}-C+@G9v62$!1*M%Uo8
zUh*aB2`H;RjwSG`spGfT*5ZpJPuhpwo8X?6?@8pOx>Zo_->Ij7EyMCGzq5Qby1NQ8
z5<&0H{Vy8n0ROm7!fKR9g;*5U;8$_T-47@doXcxz@27GqynS*H6dYw3TrXPOIxQhpE1lsp!pu>)wFFo5q0
zq-(<;S-@zs?g2-XDl_Q4gpbE@M<_2!vR<$}qdil;x8UGFO%R5tYOvfS_nGgbzUXU0
zy56@4GQ!4&czQrLxRc`p7$(Oa^?TduZ%k`*s%JSKRru<%0ys75!$nkl->c~-jYYR|
zJ)et#<3x)3WJrkudFPXJ+zeBKK%(XcR~bw|bGltTV%zvvhFhAZZv8XqF5%WzLcpn&
z9u27wt|(gxI$2%YshadShzbxS%GDiLj%;OUV>2^N7}bKKZG^oAk%GjjhQqOFvVkxI
z6qaWaKVZhBmC8@rQG{qwLzSd?2Hp@L`zVG4%P0YUC#UVY>rgHakbY2j_3G9-4_3g}
z{j6O|2Yfs}=GXW>kD?ew1@7SYf*d51{BCK*hMdhnVEs{N7{Ybs0qAZ8J`S8r$@`;{Rw*((~5qg8&E
z{|)=rZXGWnX}hMG5KzSTO$u<$Fi*#Xy3EeX!SP}4tK9r)$ByZ?QHQADZdwO?7ip%2oey?@?Z^UdLkc#$NH
zqm44^P3sU)VD(m0ryuE<9mM@LmaK*zj@Y5C_jBEZX)f6~Bv$>eWz9%l3jzl-5TZ4^VJ
z{&j1Y#c3{$Hy&g5+D1crV$fP(VSz!vYvj}HBweK3H;;#NGBD;KmQWhcd*Lz8JR!N_
z^kZv$Bzqr5E&cV`E`4WJ7{eM;hmG<IE`;r
zrw705Xd;4wGD`)p?w7bXkF3mX!p}pHZLJ#VlW=j3HM)v}@}%B47sdVfR3_)9qVkl<
zqmFBiD^QaB^>#ne%YlU7o37Raj@3;Me}yu_=DqNn;KZ0*sHI!@Gc9nk3a
z)Y#U}91eaLB~_=^mnkzbE2(k76o;LL^Kl;WIFDb?r%!GOdA@2rZhk>S?i3O#x@%P=
zz8?zklYH09QyYZeYssJk^eec}cPGz+*?7pv!U6!!?~Bn`u7K`l=<9HHpqjZS
z&iMBbGt=J%XUK_b>T-dNc}|4(zovaB0&)vQYMfha^90$c1Q=EhU{(^T8W
zrW;lq!18^rF`_JwBp{9S-aGCZnjw26vYVGGj8U>j``ISO{1_uG*QnL-&acaUxB63n
z;}F#y8~*kY%h_?={BzMfYx1N-MbgvE!&6+pdlJ2pr|TZ3*ik99`4_y6W}jbkFqKu9
z?sEuam5w5XMrPFez7}&0U~%iFGtYky{I8WRMO*<{!U!eBwJc~?B`TE`lJm&dY`h0J96!UN`#%wB!nKFOz-82tgIYZz4&8ee;A?1^l#$uR_Br{kp#SARe}cFuLLuwfTr5wC}nFpv`yv{F!d5FgFO}jYx*^p6}1uzH0qe0FS&=
z{GUCYxZt3|a$&m0W^u^Oqb;mJVR5+g9xrPH2Odm-lxQ@niP4wBpU#Te!li0jWSCk0
zJZm%6D_)206u4&5pGw~NR6x;F4%0tWFGqU-hKtTey+K2h>n-QFNzWrHEbA6YLt!sx
zI+ga)yQ8oBYH|o&JoTOtLz#7Vlf+v5r6of3kCWOS+n)VbwiQ#}O9|pJz7q#l3T~?`
z=T+iGli(lo{m~N?NYa{^>?PD!c5|7WezOsC6Jq_-SLYH(UoV|hU#^!m&~FbL0yP=}
zP)132hv~tnE7>x&78C31P7k|2A3?1aLy6@d2T9n`zwRWb8Ph+=08iwh)o_OTEy{Yn
zJrN6NB9lTNy?tp^j&b@>-#Vx!2yj1rQN9V`Yu(jf*@Ii3k@+Th$c2J7R=QD)OX639
zIkn*ZwkK|qBV=xPm)8kw`a=(W$s;TqcE2#?`OEu&<{<*nkeDL43)=}*=pf?w_*`?Q
z64Xx7k1A1$S1$A>gGPbU08d*7o1w~Jc;@)ae&2|qPY9B(6ntLkB2|YNV}HUJWvo!%
z0I(H27|ATx*8bjZ-{p<%vesdN@_xx%bei<##4-ccO|_}UDCtU~<4~rPW9F4aQT4PW
z8!hXmWXuvKaBp5F4~!YG^W3;Q^_Z8ja0^_gPm$2N+y>S^$PUCS=3rev_kSd>zMy!F
z539m#L7ze7m>+}xR%vaX8y^u7K@XpQW{s3A??g^-1}Rh4RX)U=?wO-F=v&ad^p$yU
zJTU|A?T6>uj*vSl*m|#sW`5_x+T{BxJF?4PI%qe-yT#K{)qqF?zUAd!xkvx_Ti1Qx
z%mZp=7ua~{Z|BX+Yg=Xq(oxbqvpq_5b$LIX-SF#8-=z9eL)}ZieVUOg$D4T4rsoxF
z7Pd0k|DG~5H<*7eew0YR_e_@aM~Bu&JJ}xXs|HRH|22Y0M(AnpCK@Nzj>okSXC-3}
zIb@$t72A?%l>6`8Ofm%7$Ox-Ra{gT4y8Z57+#-EFQjFi#bLt`-Tq2EWM$0gFomA!4
zjfL?VA4UUWfhiFdYSf&8unkd4vURjpj=F=>7L$_RHUioCP*s4dfG2!`4Bv{x&ki4t
zI(9yJJ#WLO1$OZ%x_DrlR=#6yUUcP0l;6~!ECjXZOMiVHjIT-ZxPbjF*HYBD$y;18
z+mm}PZiVFn=Z~b6HQ0hLWw!aF^gIc^p?F8AN548kWIYD$<0^Ihng1{obmN}yr{N#7
zC@6E4-6p?#vsorXm2ju`xwnr7`&??<{^jiCw4$rdyAH)+pRL@Py3q&aUH}2jkod({
zhB#nih!kedyQ~UmYTfn2V?{&|@Q4`EM~<0vOVeKk)nOU&OaFKRp`<#|eN8C%0dL;X
z!#dp1H=%B6if6MCX3VS5(`nV5cmT-RaJOP-&po@Nj6KQ@WmC=*se{@Gb_0t^rw)ET
zs~Dq!WL26X46#5YqZ<~YgWy)WryAY}))4>8t6St7@0WjTl|ef0LY}z3;&NximZc<;
z?_~qd>JiJd$fj-r5_Px?5x-)EQCZ{_l4ke2#v@iVN?i4}_gaFZv
zQa~DM%gb4B{$ZvKqxE93Zk-q%(nH}&uhrZ&)oSAjZRKx>!Y93Jy&W`&djL5eajep)_Q>yI=a(D|(^|oBz3p|X3`d}BkrY1(LbRC&i#5GBN@1g(nP6+
z14xUGH?#Xc#yYwi$LG)dYb%P`ym&JD7L%fwfg}PpFc1yAZjR6<@Xm;cU}^a$L)8RN
z!tnf7u&lvcc5$En!W-WeOzXe}I6N+Mr3Bpy*Qwg~+IkB0M4PNoOU^3|HO9TJF8{@{gb6V|2TY
z<3WQ+kB-+v)p&^J(P`5oBKf(nzqoy4T$@SzM{%^IFWR%0%MEjUZcs%^HT6?BgUVoN
z!R<@=^<7IY{d947Tt6^wdz>cR}>6
zhHR}sHS_TK+AeA4BDJe~vMk1m!K88C<@(U`a+4|zVvgnZ=|PS9L4}f+Xu@VUzDCYV
zKB1a9-41>w-uSiG^4p%HS4Ir}U~M^3*hY3b=ZhA?!A?NMjU4w8kEK<|2(fzehOoWy0DS_;UPO|Jn8bKc0;Y#cKxcuHXDr}G}u7c$uBN@II
zo4X5Rm=QSiZOr53dCZiQBX>{ddwxE;k`}9K#AEgYSt5IvGE*z&HF3P2YrcH>*zYPg
z$$}MFJ0S0;fC=o^PvI=XbR@n(dEDZe3+B<9(&4$;LVfyVvf6GC(M
zk4f4NEE#ZcLjHS=)JqlGrT
zo~DHx`~~OA35f|;w=^cJ$?|R;bGp|ZR2MX0;ip~a_R_M-jXB0c_0r^sLW+drW2j^I
zUUQnK($eo*A#l<`MSz^7IV;&}X!>HCh>>_U+w1b19mFD^Noxa=OD*g)k--s96bcVceZNFYS9g#8Dp>vkWK!Ti5eZnbS1cZrT5*ND|5Q@c`
z>V{s3vJes6{$BvT7eVN0Sg&K(Y6bvwt&5bY-);PiQPY;~e&BOw-`tTsy7n49YO>q%
z@rza*ymQYxG&npm`rVs10Z~ccAHAD?tR4XELj~8$g;Yv?SPD7IxEk92Pvny4T#~$p
zLX&n__S)wrTU7I+uHze+>NI|kM7pT}&=VJ;ZPmQ4iB;8w{oF;%Ht*G`Pv5?M`wW@6
z>cnlocu87(T6%g)yj-bIjSUP42zc%P_G3VBd~C>_&0{M7P#^q^--Q&Ps={FTo3M~T
z4-?eoA$do9NYX#ga2xHCRh%5FckD83)}qbFPoLlM=g_7_jP6P1WK|eTl9If((Q5uh
zGAd9At9jBZ7fd?Eo3yLdZrR1#cTexwzG}v>_La*zS(mS40V4y8lC|s9sZ*!2Q`Hvr
zTXyc$*11yEn$?|LTwN;}>Eu)^)QeRpRitF;GLF`ztJi5a^p8%}OIqk^Yip}%YG`Zg
zSzE{^O^t5{X?|sOnF2sVUrz_Xc8-(pD32$}2&3Hk`*4rTj{{Pfk!z`nenz@y)#^=>
z!Fpc<<@h@1ZMUWlYV9&`>GnU0iS_$#IrH(W1u!E$PQqZ%h|m~W%u)`+iU
zfIB}ccnYhx95rdp%Bj=F{jqG$)bZnYc!m~k$3`C3;lzVu#ts~|D~^=blC|DQC~5Q6
zW5?#(@ZGF0YY~@AoRZT|%9nVw;31^*>F+gLV3f7AhTIsCA5!@T&1N$-H6N0#^XT(W
zlEk-sW47J@2}nt*${H_~me2Y%C*Ovnv#l=c$uDMkbonRkbpNYhpu{hJ22yfUwh;9x
zasKMD1=v%YWL4@wUe2S-qQ^f6sa*vlrSa}3AVvB494;iuk6#-M_vGL5lx&CPJ&w++
z%Wr-TQkx4#$~f|8ASE{l*r$=?-tS_{m$GZwUAE0@5%T*bNVVn9Kx&elDL*1f*RR)+
z0%Br6+g_s^d>G0befgsV)gW1*z9^
z9ZPlp!?t-mwm=8jf^Q&o?I+m<5^V|^$S?R)kouAxRF{{Epk82z-E-9ffinGR+26Ys
z@KmehpMw;K75K7d1%_2fDi?V3!k=brI#s|^=kX^Y^+Btkm;J`p-LnM>fIrW%Anla@
z!k6^3kmAUK-gbMzZD*7(;NG!6(Eyy$E}w_${euJlzJ*j^0TBlF-->Cvl|QGd`BP7S
zjxfmkssALe(>L=aY|4NB9sg$QOY*x_^x)?@5KeZ>>#5Z;ekM|U*(#s6onMIN=j3y5
z!=LVK$pb2ve`572RIZ@wi`su>923_roOLPh6(4l|b9kFZeFn6y
z>N4!eKMQ~?WbVcNHX_X|iAuFh&&^(i+*6v;RsC-c$KQ<#nFxv
zOMM3Cye1Lfj>(QZWr%nS_-ucRtdQp5J8`Dwl4DMF1VYl@P
zL0QtAM(Bix2e$6X%GoLPKRUT>wGt&){4+jP%pjO#UA0h>tdDgT=C7nAU*AxuGZsL#
zC}vi~6Y}%$&W&2M>o9P{_&+)eN6uD*Cwbo@iXa2-kB$E2)K+PZ=w5!yRzx7-e_N+tKL%pe&Kr;J8|)2L4f
zN$zjmxwmtrgACs-1d3#Y4k!iyLu5Bc*Olc$ssDN{`^yDw!0iAF{30dS(nme#med7$hx&dT+^O
zYsU;{obfw9y(r3lA{WX6BtP~SQbn97R|>FbF{8&S+08-fuilKW`I8B`CJ^^LYNYVwb
zIUhCRc+DBGgA+;4&HAK)8YMsSMI#h>Wv}{zNna9ggpg#|qf2v8XYNoZxRc~V%x53u
z2y*n!PPfoN?TMkjq$6M(%nV1vw5E7OGO9K@&`Sj9JDu%;PUVH5>fQh*=Lkcp1R=rN
zC@f6V7XrRBOTf9bj)rl)w}qd5n=prDGx0Uwz(D_=xX}qcXD`~S=+5-VMv}d8E>^8n
zCdql&dKZ02Mu{quBwfT|>2^3yGRaL7s6{fY2|Q%Cx4=D;No~-hmTG*i73BcwA;hqq
z#7xe`)wt-`tl|I+kh3OR>-QR!q#f04(25!M#_5fPcmHIr;@1=XJc7UQgEmA$sBk|3
zIrpA)lI2Dnfthp^GG%0x7QIUk{uuo|d}=c`lI#UQECNI?+6(SFg|%iUd`MoywhXPZ
z5|X|u^)8?b$+s$od(RxcaM8jn<2j6lw4l?0QpYw7))C1w&t6A2fm^ASYJ?SUycG^G
z+<|lC4wzN0=2FoJ%efQ=hT6t+viMupunbb^>xWM4s^L=(7bLnBB~?Yj92$v9X8Ho>
zZVXu$`ip>J&)$B>4wxJuJ91$Z#XUT%`
zixmPhhhXvbGpl-IPK-)*DM?l~IE7v$pUWNdj=U^C0rG$l-rWC5_6I#FEYd|Bn&--X
zvKE)g^-wpm)>7BNT26vt$&$t$<;9U|^4aHs!_Uy4-H>zyjf2pMC9DMYvJyST#RUFe
zSzFUOo+SHd%mV<`idoBVFYJzbc{q||FrTinJ?%}fWV742QJ=m8sXo|Evbt4#ri5t?
z;nkpi?iw9rcnpY?9Y)hd_Yh{1^cLy2so%6!eN^O0`8hR0Mt~_<>wHETv5*=qvqXs=
zFn4NuB-=tZ0p8?tT!~MO3SW(r9FC`%8QG^Cr&!ct$efoXMtk3$BhI6ZO-W1G$)gv$0Vcp%j=)A%6}^`mR`As{Ef}fp
zm`}2bW+X{AQG09iRgiEc^A1CoZ&@R5`cak=qD0+Om__obkSo(0jd=ytNs>2I0c|y)
z)NVP8DXA?8W*=2t8>zg?5kuTlBa)mTEJkZ_hs{-U9!*xkhW$_`hH1qtq(`uUUGae)
z0RBu6%RXlVaf~~#j%06aN^_nI~t91XNTJ{qmq18$lSSH{}3M4?4_-uGJA
zb2WtIU~Hznz_Sv}4HCYWPu0avlARobNG4lpJ*ptfS(lZO<-KTyjN#~E49VmwCUN)C
zm!zj~y5BuiA?E_6+cZ^Izxe{q$k0E5&O-3VOhI0Xu{HYS?40w_4K32h(Sm3&9G6*F
z^;2ftqv1|+o<0xiWiDE1+^JNYT#G&IfVbpiU{9KkEy*uCtg)TsN+ST(-f%9UT?Z%;
z>RS9B$tGDx!q&k%*<78>;;8xlyq3L!~pGtCQ*(6%nyb=Bi6c5xp<
zUPyPf<}luJeRo~)dHP@g$*7)hNb(g%lH5^I6)1h355&t!a;*x6CXl3$AzB}Od$A&p
zvxsfZM_08=A&*Q>Cdof=o3-Fr#u!kMpE&hEjI}%lH5?+hU?;obBk2RYNz>6=)3b&?
z>~;zhdu+w2@#Ixtd1fQWx6&HW+x})g8puWHYOL7(98G66xZFFrAdDAy}JKU
z$ltQMM7`zU!@&ueIi((96vsSDs{sHOm&vGE3b{Tq566(CH0X(EJeeeOc-*6O#MQ}bOZngx!xCJswwclYp5Wz3TW@w*O}V8nlX@1BK^N*p9))+
zmDC{q$nuRRz7+C4-&OLp=i$;08)FM|OY*yr9LmMzV3NV3!$|sfd6uK9cSu2|f#dxv
zwa(gV@K$w>z%!p>9(kr+jiIN*(n^4Y`Z{Q
z{J($XXjEw2^Fklj?Z2{#y^%Js=!XEKN^`kvwP20utJ_MqHB)oQu6gv
zMc+)s{Jji(`#Cx;J|^mIRkl9XP1kQB3FcYLaZ{w)EZ
zLe|{BqzP4Z=L`JORSdf|%;;RKJDvFYzD&~5h#
zkL5bJ*>fb115mN3+Iixo7z@qQBzIyUN5Xa3iV?bKoWvyec6D4TwSWIjq}*Y8gA7aH
zN-Y2|-E|IAN%Exup#Oo4FNM=W%`NEl=GFFkz$B(wz?&&$VWEXVB%jEw8}rowxcHMy
zkR{ORD1YmO43bPKk59AAu92B*wE>J}^tO+|fUPxPRor>FJ@h)PnK)c-aCm8k0scXm
zWCy6}0XO+l@VB%BD!)*HwvoQBt$NluwBd$0LXtcTt2C1S8csea&YLK6HZ+MOKWak%
zIe8khSOITHJ}HW|62Z%(`S@E%eKu5kEOWmAI&G^%G*MMZUD+HZF`68fb&~WboFmD@
znz)#try0mi7!do~8V~rzQ0y
zoI0sFW-`nSn{zzpqb3fLoQ~sMjU}88gL`CD4A-ME`Tprp3=6!zU0pLk@t$3PY2Q6(
zf|Bj9ljLv=W@!MeRqe;2-8&hI*(7Oz*`7zPWO6BH@XH)gabHTx=_N&IluQ&%USEuL
zbg+?^(1+cHBczgw0{I;JlpCp3dn4gH42LYMbtAwbndEA0<_4H#XWqytVJUE+lW6t?
z#d#V1NZvv$B3mSwN?Z>|jh`sGn7m1MA~x_5Pr7KU(qw*y_@iHbgW=Ji(bDMD%tUyg-M-C`IL
zL$bOkEZ!LQ4QiI&MJ8D|?}K@`7do;xw#u9}KySCJiQh)*tZYa{>DkH*Edv3Sa&4PC
zV=^gi0n`2Rs)?p$Br*@hWN*~wIGiCFq+^yGsH*>&WT2k;7uhu3W~p+A^tyCtM^@^M
zkLot5yo$x-Vq8CqGJ$@d1ns^qNzTAgxvv7Hv!%O$yQ}+G5kYo}z*KB%q@5i0D29vh
zM}TmOLO+FSV5_tL?{aFL9y*k#kfXF9MTq2F+
zUb!E$ulfd5C0_zSm8`|jMdc%#s0NUI)uSSc1JG)4k9;P)>quNAzYK7yHe{nK7Ub;X
zF!^aL197fG(GsVi7^TlS7UYxNoi#j3urDSN!C{TB5^Q%{Q6T0$z&@>
z30+ZDy+~>!2zgI;qx<*}ci?NbXzsusLt*v^v4Bp9k!>CJoDXAMI2=YZLvkczIjDbQr1c|HdM~y1k
zE2C<=?8=?Rfn-8sG)U%798A(x8@Tbro7f?^7a}m%rzjyI>8(*F>4B`d?h3gBR$<()0yc-)jvv8GNj3A$)-*uHvu{?Y;~f9k6KqE
zpSkl%#^sBYq=JwWKeZ3c!bx&0uFW9zT3jIc2}ROLX$1@-Jy2gN
zAB8Bt?6{r&XlSmyzd_f`lc!||*mT2?2sQnl%^fuWE%|wm(e}b;_e3RKO0DY)Y}D+%m2E1?8zy?*TvI2Id~Gd$&WAP3-{(8~#-Q51YkS)HImkGeSu}OBDdnJI&{zB$+Yd
zVHnCV<;i@p{uOd5O8j-i65Gids2-UwQsF{6mXcS~`T=6tUM1PFd!CN4KcHt_b-nNb
zo7sQlTS4cGU-^_b%>g|}Zyv2vtmg7BBIV}3AYoFz#Wq*9{b^E2_
zNfDpRwHa>>w0uJ{V}?3lb)Dpu<0%D5p2eWKPrjAtQ)%;5T8EWLDUnhl^dRg3&1ONa4
M07*qoM6N<$f=|l0
+
+ Remember the Real Goals
+
+ Accurately rank the plans.
+ Don't spend more time optimizing than you get back.
+ Don't pick a plan that uses more memory than you have.
+
+
+
+
+ Accounting
+ Figure out the cost of each individual operator.
+ Only count the number of IOs added by each operator.
+
+
+
+
+ Operation    RA    Total IOs (#pages)    Memory (#tuples)
+ Table Scan    $R$    $\frac{|R|}{\mathcal P}$    $O(1)$
+ Projection    $\pi(R)$    $\textbf{io}(R)$    $O(1)$
+ Selection    $\sigma(R)$    $\textbf{io}(R)$    $O(1)$
+ Union    $R \uplus S$    $\textbf{io}(R) + \textbf{io}(S)$    $O(1)$
+ Sort (In-Mem)    $\tau(R)$    $\textbf{io}(R)$    $O(|R|)$
+ Sort (On-Disk)    $\tau(R)$    $\frac{2 \cdot \lfloor \log_{\mathcal B}(|R|) \rfloor}{\mathcal P} + \textbf{io}(R)$    $O(\mathcal B)$
+ (B+Tree) Index Scan    $Index(R, c)$    $\log_{\mathcal I}(|R|) + \frac{|\sigma_c(R)|}{\mathcal P}$    $O(1)$
+ (Hash) Index Scan    $Index(R, c)$    $1$    $O(1)$
+
+
+
+
+ Tuples per Page ($\mathcal P$) – Normally defined per-schema
+ Size of $R$ ($|R|$)
+ Pages of Buffer ($\mathcal B$)
+ Keys per Index Page ($\mathcal I$)
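A minimal sketch of how the single-table rows above might be evaluated, assuming costs are rounded up to whole pages and that the tuples matched by an index scan sit on contiguous pages. The function and parameter names are illustrative, not taken from the slides.

```python
import math


def scan_io(num_tuples: int, tuples_per_page: int) -> int:
    """Table scan: |R| / P pages, rounded up to whole pages (an assumption)."""
    return math.ceil(num_tuples / tuples_per_page)


def index_depth(num_tuples: int, keys_per_index_page: int) -> int:
    """Smallest d with keys_per_index_page ** d >= num_tuples, i.e. ceil(log_I |R|)."""
    depth, reach = 0, 1
    while reach < num_tuples:
        reach *= keys_per_index_page
        depth += 1
    return depth


def btree_index_scan_io(num_tuples: int, matching_tuples: int,
                        tuples_per_page: int, keys_per_index_page: int) -> int:
    """B+Tree index scan: log_I(|R|) index pages plus |sigma_c(R)| / P data pages."""
    data_pages = math.ceil(matching_tuples / tuples_per_page)
    return index_depth(num_tuples, keys_per_index_page) + data_pages


def hash_index_scan_io() -> int:
    """Hash index scan: the table above charges a single IO."""
    return 1
```

With one million tuples, 100 tuples per page, and 100 keys per index page, a full scan reads 10,000 pages, while a B+Tree lookup that matches 500 tuples reads 3 index pages plus 5 data pages.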
+
+
+
+
+ Operation    RA    Total IOs (#pages)    Mem (#tuples)
+ Nested Loop Join (Buffer $S$ in mem)    $R \times_{mem} S$    $\textbf{io}(R)+\textbf{io}(S)$    $O(|S|)$
+ Block NLJ (Buffer $S$ on disk)    $R \times_{disk} S$    $\frac{|R|}{\mathcal B} \cdot \frac{|S|}{\mathcal P} + \textbf{io}(R) + \textbf{io}(S)$    $O(1)$
+ Block NLJ (Recompute $S$)    $R \times_{redo} S$    $\textbf{io}(R) + \frac{|R|}{\mathcal B} \cdot \textbf{io}(S)$    $O(1)$
+ 1-Pass Hash Join    $R \bowtie_{1PH, c} S$    $\textbf{io}(R) + \textbf{io}(S)$    $O(|S|)$
+ 2-Pass Hash Join    $R \bowtie_{2PH, c} S$    $\frac{2|R| + 2|S|}{\mathcal P} + \textbf{io}(R) + \textbf{io}(S)$    $O(1)$
+ Sort-Merge Join    $R \bowtie_{SM, c} S$    [Sort]    [Sort]
+ (Tree) Index NLJ    $R \bowtie_{INL, c} S$    $|R| \cdot (\log_{\mathcal I}(|S|) + \frac{|\sigma_c(S)|}{\mathcal P})$    $O(1)$
+ (Hash) Index NLJ    $R \bowtie_{INL, c} S$    $|R| \cdot 1$    $O(1)$
+ (In-Mem) Aggregate    $\gamma_A(R)$    $0$    $adom(A)$
+ (Sort/Merge) Aggregate    $\gamma_A(R)$    [Sort]    [Sort]
+
+
+
+
+ Tuples per Page ($\mathcal P$) – Normally defined per-schema
+ Size of $R$ ($|R|$)
+ Pages of Buffer ($\mathcal B$)
+ Keys per Index Page ($\mathcal I$)
+ Number of distinct values of $A$ ($adom(A)$)
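As a rough illustration of "accurately rank the plans", the sketch below plugs numbers into three of the constant-memory join formulas from the table and picks the cheapest. The function names and the sample sizes are assumptions made for this example; the formulas are transcribed from the table as written.

```python
def block_nlj_disk(r, s, io_r, io_s, pages_of_buffer, tuples_per_page):
    """Block NLJ (buffer S on disk): |R|/B * |S|/P + io(R) + io(S)."""
    return (r / pages_of_buffer) * (s / tuples_per_page) + io_r + io_s


def block_nlj_redo(r, io_r, io_s, pages_of_buffer):
    """Block NLJ (recompute S): io(R) + |R|/B * io(S)."""
    return io_r + (r / pages_of_buffer) * io_s


def two_pass_hash(r, s, io_r, io_s, tuples_per_page):
    """2-pass hash join: (2|R| + 2|S|)/P + io(R) + io(S)."""
    return (2 * r + 2 * s) / tuples_per_page + io_r + io_s


def cheapest_plan(r, s, pages_of_buffer, tuples_per_page):
    """Rank the constant-memory join algorithms by estimated IOs."""
    # Assumes R and S are base tables, so io(R) = |R| / P and io(S) = |S| / P.
    io_r, io_s = r / tuples_per_page, s / tuples_per_page
    costs = {
        "block_nlj_disk": block_nlj_disk(r, s, io_r, io_s, pages_of_buffer, tuples_per_page),
        "block_nlj_redo": block_nlj_redo(r, io_r, io_s, pages_of_buffer),
        "two_pass_hash": two_pass_hash(r, s, io_r, io_s, tuples_per_page),
    }
    return min(costs, key=costs.get), costs
```

For example, `cheapest_plan(10_000, 5_000, 50, 100)` ranks the 2-pass hash join far ahead of either block nested loop variant. Only the relative order matters for choosing a plan.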
+
+
+
+
+
+
+ Cardinality Estimation
+ (The Hard Parts)
+
+
+ $\sigma_c(Q)$ (Cardinality Estimation)
+ How many tuples will a condition $c$ allow to pass?
+
+ $\delta_A(Q)$ (Distinct Values Estimation)
+ How many distinct values of attribute(s) $A$ exist?
+
+
+
+
+ Remember the Real Goals
+
+ Accurately rank the plans.
+ Don't spend more time optimizing than you get back.
+
+
+
+
+ (Some) Estimation Techniques
+
+
+
+
+ Guess Randomly
+ Rules of thumb if you have no other options...
+
+
+
+
+ Uniform Prior
+ Use basic statistics to make a very rough guess.
+
+
+
+
+ Sampling / History
+ Small, Quick Sampling Runs (or prior executions of the query).
+
+
+
+
+ Histograms
+ Using more detailed statistics for improved guesses.
+
+
+
+
+ Constraints
+ Using rules about the data for improved guesses.
+
+
+
+
+
+
+
+
+ (Some) Estimation Techniques
+
+
+ Guess Randomly
+ Rules of thumb if you have no other options...
+
+ Uniform Prior
+ Use basic statistics to make a very rough guess.
+
+ Sampling / History
+ Small, Quick Sampling Runs (or prior executions of the query).
+
+ Histograms
+ Using more detailed statistics for improved guesses.
+
+ Constraints
+ Using rules about the data for improved guesses.
+
+
+
+
+ Idea 1: Pick 100 tuples at random from each input table.
+
+
+
+
+
+ The Birthday Paradox
+
+
+ Assume: $\texttt{UNIQ}(A, R) = \texttt{UNIQ}(A, S) = N$
+
+
+
+ It takes $O(\sqrt{N})$ samples from both $R$ and $S$ to get even one match.
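The $O(\sqrt{N})$ claim can be checked with a back-of-the-envelope calculation. The sketch below assumes each sample is an independent, uniform draw from the $N$ distinct values, so any (R-sample, S-sample) pair matches with probability $1/N$; the function names are illustrative.

```python
import math


def expected_matches(samples_per_table: int, distinct_values: int) -> float:
    """Expected matching pairs: k * k pairs, each matching with probability 1/N."""
    return samples_per_table * samples_per_table / distinct_values


def samples_for_one_match(distinct_values: int) -> int:
    """Smallest k with k * k >= N, i.e. ceil(sqrt(N))."""
    return math.isqrt(distinct_values - 1) + 1 if distinct_values > 0 else 0
```

With one million distinct values, 100 samples per table yield an expected 0.01 matching pairs, which is why Idea 1 tends to produce an empty join sample; about 1,000 samples per table are needed before one match is expected.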
+
+
+
+
+ To be resumed later in the term when we talk about approximate query processing (AQP).
+
+
+
+ How DBs Do It: Instrument queries while running them.
+ The first time you run a query, it might be slow.
+ The second, third, fourth, etc. times, it'll be fast.
+
+
+
+
+
+
+
+ (Some) Estimation Techniques
+
+
+ Guess Randomly
+ Rules of thumb if you have no other options...
+
+ Uniform Prior
+ Use basic statistics to make a very rough guess.
+
+ Sampling / History
+ Small, Quick Sampling Runs (or prior executions of the query).
+
+ Histograms
+ Using more detailed statistics for improved guesses.
+
+ Constraints
+ Using rules about the data for improved guesses.
+
+
+
+
+ Limitations of Uniform Prior
+
+
+
+
+ Don't always have statistics for $Q$
+ For example, $\pi_{A \leftarrow (B \times C)}(R)$
+
+
+
+
+ Don't always have clear rules for $c$
+ For example, $\sigma_{\texttt{FitsModel}(A, B, C)}(R)$
+
+
+
+
+ Attribute values are not always uniformly distributed.
+ For example, $|\sigma_{SPC\_COMMON = 'pin\ oak'}(T)|$ vs $|\sigma_{SPC\_COMMON = 'honeylocust'}(T)|$
+
+
+
+
+ Attribute values are sometimes correlated.
+ For example, $\sigma_{(stump < 5) \wedge (diam > 3)}(T)$
+
+
+
+
+
+
+
+ Ideal Case: You have some
+ $$f(x) = \left(\texttt{SELECT COUNT(*) WHERE A = x}\right)$$
+ (and similarly for the other aggregates)
+
+
+ Slightly Less Ideal Case: You have some
+ $$f(x) \approx \left(\texttt{SELECT COUNT(*) WHERE A = x}\right)$$
+
+
+
+
+ If this sounds like CDF-based indexing... you're right!
+
+ ... but we're not going to talk about NNs today
+
+
+
+
+
+
+ Simpler/Faster Idea: Break $f(x)$ into chunks
+
+
+
+
+ Example Data
+
+ Name YearsEmployed Role
+ 'Alice' 3 1
+ 'Bob' 2 2
+ 'Carol' 3 1
+ 'Dave' 1 3
+ 'Eve' 2 2
+ 'Fred' 2 3
+ 'Gwen' 4 1
+ 'Harry' 2 3
+
+
+
+
+ Histograms
+
+ YearsEmployed COUNT
+ 1 1
+ 2 4
+ 3 2
+ 4 1
+
+
+
+ COUNT(DISTINCT YearsEmployed) $= 4$
+ MIN(YearsEmployed) $= 1$
+ MAX(YearsEmployed) $= 4$
+ COUNT(*) WHERE YearsEmployed = 2 $= 4$
+
+
+
+
+ Histograms
+
+ YearsEmployed COUNT
+ 1-2 5
+ 3-4 3
+
+
+
+ COUNT(DISTINCT YearsEmployed) $= 4$
+ MIN(YearsEmployed) $= 1$
+ MAX(YearsEmployed) $= 4$
+ COUNT(*) WHERE YearsEmployed = 2 $= \frac{5}{2}$
+
+
+
+
+ The Extreme Case
+
+ YearsEmployed COUNT
+ 1-4 8
+
+
+
+ COUNT(DISTINCT YearsEmployed) $= 4$
+ MIN(YearsEmployed) $= 1$
+ MAX(YearsEmployed) $= 4$
+ COUNT(*) WHERE YearsEmployed = 2 $= \frac{8}{4}$
+
+
+
+
+ More Example Data
+
+ Value COUNT
+ 1-10 20
+ 11-20 0
+ 21-30 15
+ 31-40 30
+ 41-50 22
+ 51-60 63
+ 61-70 10
+ 71-80 10
+
+
+
+
+ SELECT … WHERE A = 33
+ $= \frac{1}{40-30}\cdot 30 = 3$
+
+
+
+ SELECT … WHERE A > 33
+ $= \frac{40-33}{40-30}\cdot 30+22$ $\;\;\;+63+10+10$ $= 126$
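The two estimates above can be reproduced with a small sketch. It assumes integer-valued buckets with inclusive bounds and a uniform distribution inside each bucket; the `(low, high, count)` tuple layout and the function names are choices made for this example.

```python
# (low, high, count) for each bucket of the "More Example Data" histogram.
BUCKETS = [
    (1, 10, 20), (11, 20, 0), (21, 30, 15), (31, 40, 30),
    (41, 50, 22), (51, 60, 63), (61, 70, 10), (71, 80, 10),
]


def estimate_eq(buckets, value):
    """Estimate COUNT(*) WHERE A = value: the bucket count spread over its width."""
    for low, high, count in buckets:
        if low <= value <= high:
            return count / (high - low + 1)
    return 0.0


def estimate_gt(buckets, value):
    """Estimate COUNT(*) WHERE A > value."""
    total = 0.0
    for low, high, count in buckets:
        if value < low:
            total += count  # bucket lies entirely above the value
        elif value <= high:
            # partial bucket: keep the fraction of its width above the value
            total += count * (high - value) / (high - low + 1)
    return total
```

`estimate_eq(BUCKETS, 33)` gives 3.0 and `estimate_gt(BUCKETS, 33)` gives 126.0, matching the hand calculations.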
+
+
+
+
+
+
+
+ (Some) Estimation Techniques
+
+
+ Guess Randomly
+ Rules of thumb if you have no other options...
+
+ Uniform Prior
+ Use basic statistics to make a very rough guess.
+
+ Sampling / History
+ Small, Quick Sampling Runs (or prior executions of the query).
+
+ Histograms
+ Using more detailed statistics for improved guesses.
+
+ Constraints
+ Using rules about the data for improved guesses.
+
+
+
+
+ Key / Unique Constraints
+
+ CREATE TABLE R (
+ A int,
+ B int UNIQUE,
+ ...
+ PRIMARY KEY (A)
+ );
+
+
+ No duplicate values in the column.
+ $$\texttt{COUNT(DISTINCT A)} = \texttt{COUNT(*)}$$
+
+
+
+
+ Foreign Key Constraints
+
+ CREATE TABLE S (
+ B int,
+ ...
+ FOREIGN KEY (B) REFERENCES R(B)
+ );
+
+
+ All values in the column appear in another table.
+ $$\pi_{attrs(S)}\left(S \bowtie_B R\right) \subseteq S$$
+
+
+
+
+ Functional Dependencies
+
+
+ Not expressible in SQL
+
+
+
+ One set of columns uniquely determines another.
+ $\pi_{A}(\delta(\pi_{A, B}(R)))$ has no duplicates and...
+ $$\pi_{attrs(R)-A}(R) \bowtie_A \delta(\pi_{A, B}(R)) = R$$
+
+
+
+
+ Constraints
+
+ The Good
+
+ Sanity check on your data: Inconsistent data triggers failures.
+ More opportunities for query optimization.
+
+
+ The Not-So Good
+
+ Validating constraints whenever data changes is (usually) expensive.
+ Inconsistent data triggers failures.
+
+
+
+
+
+ Foreign Key Constraints
+
+ Foreign keys are like pointers. What happens with broken pointers?
+
+
+
+ Foreign Key Enforcement
+
+ Foreign keys are defined with referential actions ON UPDATE [X] and ON DELETE [X]. Depending on what [X] is, the constraint is enforced differently:
+
+
+ CASCADE
+ Update or delete referencing rows as needed to avoid invalid foreign keys.
+
+ NO ACTION
+ Abort any transaction that ends with an invalid foreign key reference.
+
+ SET NULL
+ Automatically replace any invalid foreign key references with NULL.
+
+
+
+
+
+ CASCADE and NO ACTION ensure that the data never has broken pointers, so
+
+ $$\pi_{attrs(S)}\left(S \bowtie_B R\right) = S$$
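One way to see enforcement in action is with Python's built-in `sqlite3` module. This is only a sketch: the table contents are made up, and the `PRAGMA foreign_keys = ON` line is specific to SQLite, which leaves foreign key enforcement off by default. Other systems may behave differently.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create R and S with S.B referencing R.B under ON DELETE CASCADE."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enforcement is opt-in
    conn.execute("CREATE TABLE R (A INTEGER PRIMARY KEY, B INTEGER UNIQUE)")
    conn.execute(
        "CREATE TABLE S (B INTEGER, FOREIGN KEY (B) REFERENCES R(B) ON DELETE CASCADE)"
    )
    conn.executemany("INSERT INTO R VALUES (?, ?)", [(1, 10), (2, 20)])
    conn.executemany("INSERT INTO S VALUES (?)", [(10,), (10,), (20,)])
    return conn


def dangling_insert_rejected(conn: sqlite3.Connection) -> bool:
    """Try to insert a 'broken pointer'; the constraint should reject it."""
    try:
        conn.execute("INSERT INTO S VALUES (99)")
    except sqlite3.IntegrityError:
        return True
    return False


def sizes(conn: sqlite3.Connection):
    """Return (|S|, |S join R|); the two are equal when no pointer is broken."""
    s_size = conn.execute("SELECT COUNT(*) FROM S").fetchone()[0]
    join_size = conn.execute(
        "SELECT COUNT(*) FROM S JOIN R ON S.B = R.B"
    ).fetchone()[0]
    return s_size, join_size
```

Deleting the row of `R` with `A = 1` cascades to the two `S` rows that reference `B = 10`, so the join size still equals `|S|` afterwards.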
+
+
+
+ Functional Dependencies
+
+ A generalization of keys: One set of attributes that uniquely identifies another.
+
+
+ SS# uniquely identifies Name.
+ Employee uniquely identifies Manager.
+ Order number uniquely identifies Customer Address.
+
+
+ Two rows with the same As must have the same Bs
+ (but can still have identical Bs for two different As)
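The "same As must have the same Bs" rule can be tested directly against data. The sketch below is a hypothetical checker (the name `fd_holds` and the dictionary row layout are assumptions for illustration), run against the example employee table from the earlier slide.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the attributes in lhs functionally determine those in rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[b] for b in rhs)
        # setdefault records the first value seen for this key; any later
        # row with the same key must agree with it.
        if seen.setdefault(key, value) != value:
            return False
    return True


EMPLOYEES = [
    {"Name": "Alice", "YearsEmployed": 3, "Role": 1},
    {"Name": "Bob", "YearsEmployed": 2, "Role": 2},
    {"Name": "Carol", "YearsEmployed": 3, "Role": 1},
    {"Name": "Dave", "YearsEmployed": 1, "Role": 3},
    {"Name": "Eve", "YearsEmployed": 2, "Role": 2},
    {"Name": "Fred", "YearsEmployed": 2, "Role": 3},
    {"Name": "Gwen", "YearsEmployed": 4, "Role": 1},
    {"Name": "Harry", "YearsEmployed": 2, "Role": 3},
]
```

Here `Name` determines `Role` (every name appears once), but `YearsEmployed` does not: Bob and Fred both have 2 years yet hold different roles. A check like this can only refute a dependency on the current data; it cannot prove the dependency holds for future rows.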
+
+
+
+ Normal Forms
+ "All functional dependencies should be keys."
+ (Otherwise you want two separate relations)
+ (for more details, see CSE 560)
+
+
+
+
+
+ $$P(A = B) = min\left(\frac{1}{\texttt{COUNT}(\texttt{DISTINCT } A)}, \frac{1}{\texttt{COUNT}(\texttt{DISTINCT } B)}\right)$$
+
+
+
+
+
+
+ $$R \bowtie_{R.A = S.B} S = \sigma_{R.A = S.B}(R \times S)$$
+ (and $S.B$ is a foreign key referencing $R.A$)
+
+
+
+ The (foreign) key constraint gives us two things...
+ $$\texttt{COUNT}(\texttt{DISTINCT } A) \approx \texttt{COUNT}(\texttt{DISTINCT } B)$$
+ and
+ $$\texttt{COUNT}(\texttt{DISTINCT } A) = |R|$$
+
+
+
+ Based on the first property the total number of rows is roughly...
+ $$|R| \times |S| \times \frac{1}{\texttt{COUNT}(\texttt{DISTINCT } A)}$$
+
+
+
+ Then based on the second property...
+ $$ = |R| \times |S| \times \frac{1}{|R|} = |S|$$
+
+
+ (Statistics/Histograms will give you the same outcome... but constraints can be easier to propagate)
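The derivation above can be written as a short estimator. This is a sketch under the uniform assumptions of the slides, with illustrative names; it uses the identity $min(\frac{1}{a}, \frac{1}{b}) = \frac{1}{max(a, b)}$ to avoid floating-point rounding in the example.

```python
def equijoin_size_estimate(r_size, s_size, distinct_a, distinct_b):
    """Estimate |R join S on R.A = S.B| as |R| * |S| * P(A = B)."""
    # P(A = B) = min(1/distinct_a, 1/distinct_b) = 1 / max(distinct_a, distinct_b)
    return r_size * s_size / max(distinct_a, distinct_b)


def fk_join_size_estimate(r_size, s_size):
    """S.B references the key R.A, so COUNT(DISTINCT A) = |R| and the estimate is |S|."""
    # S.B takes at most |R| distinct values, so distinct_b = r_size is a safe bound here.
    return equijoin_size_estimate(r_size, s_size, distinct_a=r_size, distinct_b=r_size)
```

For instance, with $|R| = 100$ and $|S| = 1000$, `fk_join_size_estimate(100, 1000)` returns 1000.0, which is $|S|$.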
+
+
+
diff --git a/src/teaching/cse-562/2021sp/slide/2021-03-11/JoinIssue.svg b/src/teaching/cse-562/2021sp/slide/2021-03-11/JoinIssue.svg
new file mode 100644
index 00000000..b84b5f70
--- /dev/null
+++ b/src/teaching/cse-562/2021sp/slide/2021-03-11/JoinIssue.svg
@@ -0,0 +1,279 @@
+ [Figure: JoinIssue.svg. Query plan tree with two joins (⋈, ⋈) and a selection (σ) over relations R, S, and T. Edge labels: 100 Tuples, 10 Tuples, 100 Tuples, 0 Tuples, 0 Tuples.]