From cee46ab57b86de23ba9743fa7e68fcaed417e44f Mon Sep 17 00:00:00 2001 From: Oliver Kennedy Date: Mon, 5 Feb 2018 10:39:46 -0500 Subject: [PATCH] Finshing slides --- .../2018-01-31-SQL+Physical.html | 14 +- .../cse4562sp2018/2018-02-05-RA-Basics.html | 267 +++++++++++++++++- 2 files changed, 273 insertions(+), 8 deletions(-) diff --git a/slides/cse4562sp2018/2018-01-31-SQL+Physical.html b/slides/cse4562sp2018/2018-01-31-SQL+Physical.html index 5845f396..e7237f29 100644 --- a/slides/cse4562sp2018/2018-01-31-SQL+Physical.html +++ b/slides/cse4562sp2018/2018-01-31-SQL+Physical.html @@ -126,13 +126,13 @@

What are the ID, Common Name, and Borough of Trees in Brooklyn?

TREE_ID   SPC_COMMON           BORONAME
204026    'honeylocust'        'Brooklyn'
204337    'honeylocust'        'Brooklyn'
189565    'American linden'    'Brooklyn'
192755    'London planetree'   'Brooklyn'
189465    'London planetree'   'Brooklyn'
... and 177287 more
diff --git a/slides/cse4562sp2018/2018-02-05-RA-Basics.html b/slides/cse4562sp2018/2018-02-05-RA-Basics.html index 320fc696..2fda5e39 100644 --- a/slides/cse4562sp2018/2018-02-05-RA-Basics.html +++ b/slides/cse4562sp2018/2018-02-05-RA-Basics.html @@ -29,6 +29,8 @@ document.getElementsByTagName( 'head' )[0].appendChild( link ); + + @@ -298,11 +300,274 @@ For each week:

First we focus on sets and bags.


Selection ($\sigma_{c}$)


Delete rows that fail the condition $c$.

$$\sigma_{(BORONAME = \texttt{'Brooklyn'})} \textbf{Trees}$$

TREE_ID   SPC_COMMON           BORONAME     ...
204026    'honeylocust'        'Brooklyn'   ...
204337    'honeylocust'        'Brooklyn'   ...
189565    'American linden'    'Brooklyn'   ...
192755    'London planetree'   'Brooklyn'   ...
189465    'London planetree'   'Brooklyn'   ...
... and 177287 more
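Selection can be sketched in Python by treating a relation as a list of rows, each a dict from attribute name to value. This is an illustration under stated assumptions, not the course's reference code: the function name `select` is hypothetical, the first two rows come from the table above, and the Manhattan row is made up so that the condition rejects something.

```python
from typing import Any, Callable

Row = dict[str, Any]


def select(condition: Callable[[Row], bool], relation: list[Row]) -> list[Row]:
    """sigma_c: keep every row for which the condition holds, drop the rest."""
    return [row for row in relation if condition(row)]


trees = [
    {"TREE_ID": 204026, "SPC_COMMON": "honeylocust", "BORONAME": "Brooklyn"},
    {"TREE_ID": 189565, "SPC_COMMON": "American linden", "BORONAME": "Brooklyn"},
    # Hypothetical row, added only so the condition has something to reject.
    {"TREE_ID": 999999, "SPC_COMMON": "pin oak", "BORONAME": "Manhattan"},
]

brooklyn = select(lambda row: row["BORONAME"] == "Brooklyn", trees)
print([row["TREE_ID"] for row in brooklyn])  # [204026, 189565]
```

Selection never changes the schema: each surviving row keeps all of its attributes.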

Projection ($\pi_{A}$)


Delete attributes not in the projection list $A$.

$$\pi_{BORONAME}(\textbf{Trees})$$

BORONAME
Queens
Brooklyn
Manhattan
Bronx
Staten Island

Only 5 results... not 683788?


Set and bag projection are different: set projection eliminates duplicate rows, while bag projection keeps one output row per input row
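A minimal Python sketch of the difference, assuming rows are dicts and that a projected row is a tuple of the kept values. The helper names `bag_project` and `set_project` and the Manhattan row are hypothetical illustrations, not taken from the slides.

```python
from collections import Counter

trees = [
    {"TREE_ID": 204026, "SPC_COMMON": "honeylocust", "BORONAME": "Brooklyn"},
    {"TREE_ID": 204337, "SPC_COMMON": "honeylocust", "BORONAME": "Brooklyn"},
    {"TREE_ID": 189565, "SPC_COMMON": "American linden", "BORONAME": "Brooklyn"},
    # Hypothetical row from another borough.
    {"TREE_ID": 999999, "SPC_COMMON": "pin oak", "BORONAME": "Manhattan"},
]


def bag_project(attrs, relation):
    """Bag projection: one output tuple per input row, so duplicates survive."""
    return [tuple(row[a] for a in attrs) for row in relation]


def set_project(attrs, relation):
    """Set projection: bag projection followed by duplicate elimination."""
    return set(bag_project(attrs, relation))


print(Counter(bag_project(["BORONAME"], trees))[("Brooklyn",)])  # 3
print(sorted(set_project(["BORONAME"], trees)))  # [('Brooklyn',), ('Manhattan',)]
```

Over the full Trees table, the bag version would return one row per tree, while the set version returns one row per distinct borough, which is where the 5 results come from.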


Reminder: Queries are Relations

What are these queries' schemas?

$$\pi_{TREE\_ID, SPC\_COMMON, BORONAME} \textbf{Trees}$$

$$\sigma_{(BORONAME = \texttt{'Brooklyn'})} \textbf{Trees}$$

$$\sigma_{(BORONAME = \texttt{'Brooklyn'})}(\pi_{TREE\_ID, SPC\_COMMON, BORONAME} \textbf{Trees})$$

Union ($\cup$)

Takes two relations that are union-compatible...

(Both relations have the same number of attributes, with the same types in corresponding positions)

... and returns all tuples appearing in either relation

$$(\sigma_{(BORONAME=\texttt{'Brooklyn'})} \textbf{Trees}) \cup (\sigma_{(BORONAME=\texttt{'Manhattan'})} \textbf{Trees})$$

We use $\uplus$ if we explicitly mean bag union
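To make the set/bag distinction concrete, here is a small Python sketch that models a bag as a `collections.Counter` (tuple mapped to its multiplicity). The single-attribute relations `r` and `s` are hypothetical, and being single-attribute they are trivially union-compatible.

```python
from collections import Counter

# Hypothetical one-attribute relations (species names); r contains a duplicate.
r = ["honeylocust", "honeylocust", "American linden"]
s = ["honeylocust", "pin oak"]


def set_union(r, s):
    """R ∪ S: every tuple that appears in either input, exactly once."""
    return set(r) | set(s)


def bag_union(r, s):
    """R ⊎ S: multiplicities add (m copies in R and n copies in S give m + n)."""
    return Counter(r) + Counter(s)


print(sorted(set_union(r, s)))         # ['American linden', 'honeylocust', 'pin oak']
print(bag_union(r, s)["honeylocust"])  # 3
```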


Intersection ($\cap$)


Return all tuples appearing in both
of two union-compatible relations

$$\pi_{SPC\_COMMON}(\sigma_{(BORONAME=\texttt{'Brooklyn'})} \textbf{Trees}) \\ ~~~~~~~~~\cap \pi_{SPC\_COMMON}(\sigma_{(BORONAME=\texttt{'Manhattan'})} \textbf{Trees})$$

What is this query asking?
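A hedged Python sketch of intersection over single-attribute relations, showing the set version and one common bag version (multiplicity is the minimum of the two counts, via `Counter`'s `&` operator). The species lists are hypothetical stand-ins for the two projected selections.

```python
from collections import Counter

# Hypothetical stand-ins for the projected species lists of two boroughs.
brooklyn_species = ["honeylocust", "honeylocust", "American linden", "London planetree"]
manhattan_species = ["honeylocust", "London planetree", "pin oak"]


def set_intersection(r, s):
    """R ∩ S under set semantics: tuples present in both inputs, once each."""
    return set(r) & set(s)


def bag_intersection(r, s):
    """Bag intersection: m copies in R and n copies in S give min(m, n) copies."""
    return Counter(r) & Counter(s)


print(sorted(set_intersection(brooklyn_species, manhattan_species)))
# ['London planetree', 'honeylocust']
print(bag_intersection(brooklyn_species, manhattan_species)["honeylocust"])  # 1
```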


Set Difference ($-$)


Return all tuples appearing in the first, but not the second
of two union-compatible relations

$$\pi_{SPC\_COMMON}(\sigma_{(BORONAME=\texttt{'Brooklyn'})} \textbf{Trees}) \\ ~~~~~~~~~- \pi_{SPC\_COMMON}(\sigma_{(BORONAME=\texttt{'Manhattan'})} \textbf{Trees})$$

What is this query asking?
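Hypothetical species lists can also illustrate difference. Under set semantics a tuple is dropped if it appears in the second input at all; in the bag sketch below (using `Counter`, whose subtraction keeps only positive counts), each copy in the second input cancels one copy in the first.

```python
from collections import Counter

# Hypothetical stand-ins for the projected species lists of two boroughs.
brooklyn_species = ["honeylocust", "honeylocust", "American linden", "London planetree"]
manhattan_species = ["honeylocust", "London planetree", "pin oak"]


def set_difference(r, s):
    """R - S under set semantics: tuples of R that never appear in S."""
    return set(r) - set(s)


def bag_difference(r, s):
    """Bag difference: m copies in R and n copies in S give max(m - n, 0) copies."""
    return Counter(r) - Counter(s)


print(sorted(set_difference(brooklyn_species, manhattan_species)))  # ['American linden']
print(sorted(bag_difference(brooklyn_species, manhattan_species).elements()))
# ['American linden', 'honeylocust']
```

Note that `honeylocust` survives the bag difference (2 − 1 = 1 copy) but not the set difference.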


Union, Intersection, Set Difference


What is the schema of the result of any of these operators?


Cross (Cartesian) Product ($\times$)


Create all pairs of tuples.

$$\pi_{SPC\_COMMON,\ BORONAME} (\textbf{Trees}) \times \pi_{SPC\_COMMON,\ AVG\_HEIGHT} (\textbf{TreeInfo})$$

SPC_COMMON     AVG_HEIGHT
cedar elm      60
lacebark elm   45
... and more

SPC_COMMON           BORONAME      SPC_COMMON     AVG_HEIGHT
'honeylocust'        'Brooklyn'    cedar elm      60
'honeylocust'        'Brooklyn'    cedar elm      60
'American linden'    'Brooklyn'    cedar elm      60
'London planetree'   'Manhattan'   cedar elm      60
'London planetree'   'Manhattan'   cedar elm      60
...
'honeylocust'        'Brooklyn'    lacebark elm   45
'honeylocust'        'Brooklyn'    lacebark elm   45
'American linden'    'Brooklyn'    lacebark elm   45
'London planetree'   'Manhattan'   lacebark elm   45
'London planetree'   'Manhattan'   lacebark elm   45
... and more
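A minimal Python sketch of the cross product, representing rows as positional tuples so that the two `SPC_COMMON` columns can coexist. The sample rows are a hypothetical subset of the two projected inputs above, and `cross` is an illustrative name.

```python
from itertools import product

# Hypothetical subsets of pi_{SPC_COMMON, BORONAME}(Trees) and
# pi_{SPC_COMMON, AVG_HEIGHT}(TreeInfo); rows are positional tuples.
trees = [("honeylocust", "Brooklyn"), ("London planetree", "Manhattan")]
tree_info = [("cedar elm", 60), ("lacebark elm", 45)]


def cross(r, s):
    """R x S: concatenate every tuple of R with every tuple of S."""
    return [t + u for t, u in product(r, s)]


result = cross(trees, tree_info)
print(len(result))  # 4, i.e. len(trees) * len(tree_info)
print(result[0])    # ('honeylocust', 'Brooklyn', 'cedar elm', 60)
```

The output lists pairs in a different order than the table above; row order carries no meaning in relational algebra.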

Cross (Cartesian) Product ($\times$)

$$\pi_{SPC\_COMMON,\ BORONAME} (\textbf{Trees}) \times \pi_{SPC\_COMMON,\ AVG\_HEIGHT} (\textbf{TreeInfo})$$

What is the schema of the resulting relation?

The relation has a naming conflict
(two attributes named SPC_COMMON)

Renaming ($\rho$)

$$\rho_{TNAME,\ BORO,\ INAME,\ HEIGHT}\left( \pi_{SPC\_COMMON,\ BORONAME} (\textbf{Trees}) \times \pi_{SPC\_COMMON,\ AVG\_HEIGHT} (\textbf{TreeInfo})\right)$$

What is the schema of the resulting relation?

When writing cross-products on the board,
I will use implicit renaming (prefixing attributes with a relation alias, e.g., T.SPC_COMMON and TI.SPC_COMMON)
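One way to see why renaming is needed, sketched in Python under the assumption that a schema is just an ordered list of attribute names (`cross_schema` and `rename` are hypothetical helpers): the schema of a cross product concatenates the inputs' schemas, so `SPC_COMMON` appears twice until $\rho$ assigns fresh names position by position.

```python
def cross_schema(r_schema, s_schema):
    """Schema of R x S: the attributes of R followed by those of S."""
    return r_schema + s_schema


def rename(new_names, schema):
    """rho: replace attribute names by position; the arity must not change."""
    if len(new_names) != len(schema):
        raise ValueError("rename needs exactly one new name per attribute")
    return list(new_names)


combined = cross_schema(["SPC_COMMON", "BORONAME"], ["SPC_COMMON", "AVG_HEIGHT"])
print(combined)  # ['SPC_COMMON', 'BORONAME', 'SPC_COMMON', 'AVG_HEIGHT']
renamed = rename(["TNAME", "BORO", "INAME", "HEIGHT"], combined)
print(len(set(renamed)) == len(renamed))  # True: no duplicate names remain
```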

Join ($\bowtie_c$)

Pair tuples according to a condition $c$.

$$\pi_{SPC\_COMMON,\ BORONAME} (\textbf{Trees}) \bowtie_{T.SPC\_COMMON = TI.SPC\_COMMON} \pi_{SPC\_COMMON,\ AVG\_HEIGHT} (\textbf{TreeInfo})$$

Identical to...
$$\sigma_{T.SPC\_COMMON = TI.SPC\_COMMON}\left(\pi_{SPC\_COMMON,\ BORONAME} (\textbf{Trees}) \times \pi_{SPC\_COMMON,\ AVG\_HEIGHT} (\textbf{TreeInfo})\right)$$

$$R \bowtie_c S \equiv \sigma_c(R \times S)$$
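The equivalence $R \bowtie_c S \equiv \sigma_c(R \times S)$ can be executed literally. In this Python sketch, rows are dicts whose keys carry the aliases `T` and `TI` (standing in for the implicit renaming used in the condition); the data and helper names are hypothetical.

```python
from itertools import product

# Hypothetical rows; keys are prefixed with the aliases T and TI.
trees = [
    {"T.SPC_COMMON": "cedar elm", "T.BORONAME": "Queens"},
    {"T.SPC_COMMON": "honeylocust", "T.BORONAME": "Brooklyn"},
]
tree_info = [
    {"TI.SPC_COMMON": "cedar elm", "TI.AVG_HEIGHT": 60},
    {"TI.SPC_COMMON": "lacebark elm", "TI.AVG_HEIGHT": 45},
]


def cross(r, s):
    """R x S: merge every pair of rows (the aliases keep the keys distinct)."""
    return [{**t, **u} for t, u in product(r, s)]


def select(condition, relation):
    """sigma_c: keep the rows that satisfy the condition."""
    return [row for row in relation if condition(row)]


def join(condition, r, s):
    """R join_c S, defined literally as sigma_c(R x S)."""
    return select(condition, cross(r, s))


def same_species(row):
    return row["T.SPC_COMMON"] == row["TI.SPC_COMMON"]


matches = join(same_species, trees, tree_info)
print(len(matches), matches[0]["T.BORONAME"], matches[0]["TI.AVG_HEIGHT"])  # 1 Queens 60
```

Evaluating a join this way builds all |R|·|S| pairs before filtering; it mirrors the definition, not how a database engine would typically evaluate an equi-join.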

Join Shorthands

Equi-joins are joins with only equality tests in the condition.

Join on attribute(s)
$R \bowtie_{A} S \equiv R \bowtie_{R.A = S.A} S$
Same values on the listed attributes

Natural Join
$R \bowtie S \equiv R \bowtie_{attrs(R) \cap attrs(S)} S$
Same values on all shared attributes
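A hedged Python sketch of the natural-join shorthand, assuming rows are dicts keyed by plain attribute names and that every row of a relation has the same keys. The shared attributes are computed as $attrs(R) \cap attrs(S)$; because the merged row is a dict, each shared attribute appears once in the output, which is the usual convention for natural join (the shorthand above does not spell out the output schema). `natural_join` and the sample data are hypothetical.

```python
def natural_join(r, s):
    """R ⋈ S: pair rows that agree on every shared attribute, then merge them."""
    if not r or not s:
        return []
    # attrs(R) ∩ attrs(S); assumes all rows of a relation share one schema.
    shared = set(r[0]) & set(s[0])
    return [
        {**t, **u}
        for t in r
        for u in s
        if all(t[a] == u[a] for a in shared)
    ]


# Hypothetical data.
trees = [
    {"SPC_COMMON": "cedar elm", "BORONAME": "Queens"},
    {"SPC_COMMON": "honeylocust", "BORONAME": "Brooklyn"},
]
tree_info = [
    {"SPC_COMMON": "cedar elm", "AVG_HEIGHT": 60},
    {"SPC_COMMON": "lacebark elm", "AVG_HEIGHT": 45},
]

print(natural_join(trees, tree_info))
# [{'SPC_COMMON': 'cedar elm', 'BORONAME': 'Queens', 'AVG_HEIGHT': 60}]
```

If the two relations share no attributes, `all` over an empty set is true and the sketch degenerates to a cross product, just as a natural join does.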

Which operators can create duplicates?


(Which operators behave differently in Set- and Bag-RA?)

Operator        Symbol      Duplicates?
Selection       $\sigma$    No
Projection      $\pi$       Yes
Cross-product   $\times$    No
Set-difference  $-$         No
Union           $\cup$      Yes
Join            $\bowtie$   No

Group Work


Find the BORONAMEs of all boroughs that have trees of a species with an average height below 45 feet

SPC_COMMON     AVG_HEIGHT
cedar elm      60
lacebark elm   45
... and more

SPC_COMMON           BORONAME
'honeylocust'        'Brooklyn'
'honeylocust'        'Brooklyn'
'American linden'    'Brooklyn'
'London planetree'   'Manhattan'
'London planetree'   'Manhattan'
... and more

$$\pi_{BORONAME}(\sigma_{AVG\_HEIGHT < 45}(\textbf{Trees}\bowtie\textbf{TreeInfo}))$$

$$\pi_{BORONAME}(\textbf{Trees}\bowtie\sigma_{AVG\_HEIGHT < 45}(\textbf{TreeInfo}))$$
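The two answers differ only in where the selection sits. A small Python check, with hypothetical data and helper names, suggests why they agree: `AVG_HEIGHT` comes only from TreeInfo, so filtering TreeInfo before the join removes exactly the joined rows that the later selection would have removed.

```python
def natural_join(r, s):
    """Pair rows that agree on all shared attributes and merge them."""
    if not r or not s:
        return []
    shared = set(r[0]) & set(s[0])
    return [{**t, **u} for t in r for u in s if all(t[a] == u[a] for a in shared)]


def select(condition, relation):
    """sigma_c: keep the rows that satisfy the condition."""
    return [row for row in relation if condition(row)]


def set_project(attrs, relation):
    """pi_A under set semantics."""
    return {tuple(row[a] for a in attrs) for row in relation}


# Hypothetical data; heights are illustrative only.
trees = [
    {"SPC_COMMON": "honeylocust", "BORONAME": "Brooklyn"},
    {"SPC_COMMON": "lacebark elm", "BORONAME": "Queens"},
    {"SPC_COMMON": "crabapple", "BORONAME": "Bronx"},
]
tree_info = [
    {"SPC_COMMON": "cedar elm", "AVG_HEIGHT": 60},
    {"SPC_COMMON": "lacebark elm", "AVG_HEIGHT": 45},
    {"SPC_COMMON": "crabapple", "AVG_HEIGHT": 20},
]


def is_short(row):
    return row["AVG_HEIGHT"] < 45


# pi_BORONAME(sigma_{AVG_HEIGHT < 45}(Trees ⋈ TreeInfo))
plan_a = set_project(["BORONAME"], select(is_short, natural_join(trees, tree_info)))
# pi_BORONAME(Trees ⋈ sigma_{AVG_HEIGHT < 45}(TreeInfo))
plan_b = set_project(["BORONAME"], natural_join(trees, select(is_short, tree_info)))
print(plan_a == plan_b, sorted(plan_a))  # True [('Bronx',)]
```

The lacebark elm row (exactly 45) is excluded by the strict `<` in both plans, as in the query.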

Division ($/$)


Not typically supported as a primitive operator,
but useful for expressing queries like:


Find species that appear in all boroughs

$$\pi_{SPC\_COMMON,\ BORONAME}(\textbf{Trees}) \;\;/\;\;\pi_{BORONAME}(\textbf{Trees})$$
(using set relational algebra)

$$R / S \equiv \{\; \left<\vec t\right> \;|\; \forall \left<\vec s\right> \in S, \left< \vec t \vec s \right> \in R \;\}$$

BORO SPC_COMMON
Brooklyn honeylocust
Brooklyn American linden
Brooklyn London planetree
Manhattan honeylocust
Manhattan American linden
Manhattan pin oak
Queens honeylocust
Queens American linden
Bronx honeylocust
/ { honeylocust } = Brooklyn, Manhattan, Queens, Bronx
/ { honeylocust, American linden } = Brooklyn, Manhattan, Queens
/ { honeylocust, American linden, pin oak } = Manhattan
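The division examples above can be checked with a short Python sketch. It assumes set semantics, represents the dividend as a set of `(BORO, SPC_COMMON)` pairs copied from the table, and draws candidate boroughs from the dividend itself (a restriction the definition leaves implicit); `divide` is a hypothetical helper name.

```python
def divide(r, s):
    """R / S: every t such that (t, v) is in R for every v in S.

    r is a set of (t, v) pairs and s a set of v values. Candidate t values
    are taken from r, so an empty s returns every t that appears in r.
    """
    candidates = {t for t, _ in r}
    return {t for t in candidates if all((t, v) in r for v in s)}


boro_species = {
    ("Brooklyn", "honeylocust"), ("Brooklyn", "American linden"),
    ("Brooklyn", "London planetree"),
    ("Manhattan", "honeylocust"), ("Manhattan", "American linden"),
    ("Manhattan", "pin oak"),
    ("Queens", "honeylocust"), ("Queens", "American linden"),
    ("Bronx", "honeylocust"),
}

print(sorted(divide(boro_species, {"honeylocust"})))
# ['Bronx', 'Brooklyn', 'Manhattan', 'Queens']
print(sorted(divide(boro_species, {"honeylocust", "American linden"})))
# ['Brooklyn', 'Manhattan', 'Queens']
print(sorted(divide(boro_species, {"honeylocust", "American linden", "pin oak"})))
# ['Manhattan']
```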

Group Work


If time permits: Implement division using other operators.


Relational Algebra


A simple way to think about and work with computations over collections.


… simple → easy to evaluate


… simple → easy to optimize


Next time: Optimizing RA
