From 361900d19dd10a8c585b07c20180d81676f0f744 Mon Sep 17 00:00:00 2001
From: Oliver Kennedy
Date: Sun, 4 Mar 2018 16:37:37 -0500
Subject: [PATCH] temp

---
 .../2018-03-05-CostBasedOptimization2.html | 112 +++++++++++++++++-
 1 file changed, 108 insertions(+), 4 deletions(-)

diff --git a/slides/cse4562sp2018/2018-03-05-CostBasedOptimization2.html b/slides/cse4562sp2018/2018-03-05-CostBasedOptimization2.html
index aac71888..7d001ee6 100644
--- a/slides/cse4562sp2018/2018-03-05-CostBasedOptimization2.html
+++ b/slides/cse4562sp2018/2018-03-05-CostBasedOptimization2.html
@@ -359,8 +359,8 @@

(Some) Estimation Techniques

-
Guess Randomly
-
Rules of thumb if you have no other options...
+
Guess Randomly
+
Rules of thumb if you have no other options...
Uniform Prior
Use basic statistics to make a very rough guess.
@@ -415,8 +415,8 @@
Guess Randomly
Rules of thumb if you have no other options...
-
Uniform Prior
-
Use basic statistics to make a very rough guess.
+
Uniform Prior
+
Use basic statistics to make a very rough guess.
Sampling / History
Small, Quick Sampling Runs (or prior executions of the query).
@@ -500,6 +500,110 @@

(With constants $x_1$, $x_2$, ...)

+ +
+

Limitations

+ +
+
+
Don't always have statistics for $Q$
+
For example, $\pi_{A \leftarrow (B \times C)}(R)$
+
+ +
+
Don't always have clear rules for $c$
+
For example, $\sigma_{\texttt{FitsModel}(A, B, C)}(R)$
+
+ +
+
Attribute values are not always uniformly distributed.
+
For example, $|\sigma_{SPC\_COMMON = 'pin\ oak'}(T)|$ vs $|\sigma_{SPC\_COMMON = 'honeylocust'}(T)|$
+
+ +
+
Attribute values are sometimes correlated.
+
For example, $\sigma_{(stump < 5) \wedge (diam > 3)}(T)$
+
+ +
+
+ +
+
+

(Some) Estimation Techniques

+ +
+
Guess Randomly
+
Rules of thumb if you have no other options...
+ +
Uniform Prior
+
Use basic statistics to make a very rough guess.
+ +
Sampling / History
+
Small, Quick Sampling Runs (or prior executions of the query).
+ +
Histograms
+
Using more detailed statistics for improved guesses.
+ +
Constraints
+
Using rules about the data for improved guesses.
+
+
+ +
+

Idea 1: Pick 100 tuples at random from each input table.

+
+
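The sampling idea on this slide ("pick 100 tuples at random") could be sketched roughly as follows. This is a minimal illustration, not the course's implementation: the function names `estimate_selectivity` and `estimate_cardinality`, the list-of-dicts table representation, and the `seed` parameter are all assumptions made for the example.

```python
import random


def estimate_selectivity(table, predicate, sample_size=100, seed=None):
    """Estimate the fraction of rows satisfying `predicate`.

    `table` is assumed to be a list of rows (hypothetical representation);
    a uniform random sample of up to `sample_size` rows is inspected.
    """
    rng = random.Random(seed)
    k = min(sample_size, len(table))
    if k == 0:
        return 0.0  # an empty table gives no evidence; report zero
    sample = rng.sample(table, k)  # sampling without replacement
    matches = sum(1 for row in sample if predicate(row))
    return matches / k


def estimate_cardinality(table, predicate, sample_size=100, seed=None):
    """Scale the sampled selectivity up to the full table size."""
    return estimate_selectivity(table, predicate, sample_size, seed) * len(table)
```

With a table smaller than the sample size the estimate is exact, since every row is inspected. For larger tables the estimate varies from run to run, and rare values may be missed by the sample entirely.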
+ +
+
+

Limitations

+ +
+
+
Don't always have statistics for $Q$
+
For example, $\pi_{A \leftarrow (B \times C)}(R)$
+
+ +
+
Don't always have clear rules for $c$
+
For example, $\sigma_{\texttt{FitsModel}(A, B, C)}(R)$
+
+ +
+
Attribute values are not always uniformly distributed.
+
For example, $|\sigma_{SPC\_COMMON = 'pin\ oak'}(T)|$ vs $|\sigma_{SPC\_COMMON = 'honeylocust'}(T)|$
+
+ +
+
Attribute values are sometimes correlated.
+
For example, $\sigma_{(stump < 5) \wedge (diam > 3)}(T)$
+
+ +
+
+ +
+

(Some) Estimation Techniques

+ +
+
Guess Randomly
+
Rules of thumb if you have no other options...
+ +
Uniform Prior
+
Use basic statistics to make a very rough guess.
+ +
Sampling / History
+
Small, Quick Sampling Runs (or prior executions of the query).
+ +
Histograms
+
Using more detailed statistics for improved guesses.
+ +
Constraints
+
Using rules about the data for improved guesses.
+
+
+
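The correlated-attributes limitation listed above can be illustrated with a small sketch. Rough estimators commonly multiply per-predicate selectivities, which assumes the predicates are independent. The rows below are synthetic and invented purely for illustration (they are not the tree dataset from the slides); only the column names `stump` and `diam` come from the example query, and the helper names are hypothetical.

```python
def selectivity(rows, predicate):
    """Exact fraction of rows satisfying `predicate`."""
    return sum(1 for r in rows if predicate(r)) / len(rows)


def independent_estimate(rows, predicates):
    """Multiply per-predicate selectivities (independence assumption)."""
    estimate = 1.0
    for p in predicates:
        estimate *= selectivity(rows, p)
    return estimate


# Synthetic data (an assumption for this sketch): living trees have no
# stump and a positive diameter; stumps have a stump size and diameter 0.
living = [{"stump": 0, "diam": 1 + i % 10} for i in range(80)]
stumps = [{"stump": 1 + i % 10, "diam": 0} for i in range(20)]
rows = living + stumps


def small_stump(r):
    return r["stump"] < 5


def wide_trunk(r):
    return r["diam"] > 3


guess = independent_estimate(rows, [small_stump, wide_trunk])
actual = selectivity(rows, lambda r: small_stump(r) and wide_trunk(r))
print(f"independence estimate: {guess:.4f}, actual: {actual:.4f}")
```

On this synthetic data the two predicates are strongly correlated (every row with `diam > 3` also has `stump < 5`), so the product of the individual selectivities underestimates the true fraction.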