From 2b0c02c14fe94298410ff9bdfb4ed71b3f57aef0 Mon Sep 17 00:00:00 2001 From: Oliver Date: Tue, 5 Nov 2019 02:34:37 -0500 Subject: [PATCH] Slides and a few teaching pages updates --- src/teaching/cse-662/2019fa/index.md | 19 +- .../slide/2019-11-05-DifferentialDataflow.erb | 369 +++++++ .../slide/graphics/2019-11-04-2dIteration.svg | 963 ++++++++++++++++++ .../graphics/2019-11-04-DataflowBasic.svg | 184 ++++ .../graphics/2019-11-04-DataflowExpanded.svg | 603 +++++++++++ .../graphics/2019-11-04-FixpointBasic.svg | 339 ++++++ .../graphics/2019-11-04-FixpointExpanded.svg | 525 ++++++++++ src/teaching/index.erb | 3 +- 8 files changed, 2994 insertions(+), 11 deletions(-) create mode 100644 src/teaching/cse-662/2019fa/slide/2019-11-05-DifferentialDataflow.erb create mode 100644 src/teaching/cse-662/2019fa/slide/graphics/2019-11-04-2dIteration.svg create mode 100644 src/teaching/cse-662/2019fa/slide/graphics/2019-11-04-DataflowBasic.svg create mode 100644 src/teaching/cse-662/2019fa/slide/graphics/2019-11-04-DataflowExpanded.svg create mode 100644 src/teaching/cse-662/2019fa/slide/graphics/2019-11-04-FixpointBasic.svg create mode 100644 src/teaching/cse-662/2019fa/slide/graphics/2019-11-04-FixpointExpanded.svg diff --git a/src/teaching/cse-662/2019fa/index.md b/src/teaching/cse-662/2019fa/index.md index 64dfdc0f..b248706f 100644 --- a/src/teaching/cse-662/2019fa/index.md +++ b/src/teaching/cse-662/2019fa/index.md @@ -3,8 +3,6 @@ title: CSE 662 - Languages and Runtimes for Big Data - Fall 2019 paper_ideas: - name: Adaptive Functional Programming url: https://www.cs.cmu.edu/~guyb/papers/popl02.pdf - - name: Differential Dataflow - url: http://cidrdb.org/cidr2013/Papers/CIDR13_Paper111.pdf - name: Interactive Checks for Coordination Avoidance (next year) url: http://www.vldb.org/pvldb/vol12/p14-whittaker.pdf details: Next steps of Bloom @@ -102,14 +100,17 @@ After the taking the course, students should be able to: * **Oct 3** - SkyServer on MonetDB presented by SQLatin 
([reading](https://ieeexplore-ieee-org.gate.lib.buffalo.edu/abstract/document/4274958/) | slides) * **Oct 8** - Software Transactional Memory ([reading](https://dl.acm.org/citation.cfm?id=1378582)) * **Oct 10** - Skyserver (continued) -* **Oct 15** - NoDB / RAW ([paper 1](https://dl-acm-org.gate.lib.buffalo.edu/citation.cfm?id=2213864) | [paper 2](http://www.vldb.org/pvldb/vol7/p1119-karpathiotakis.pdf)) +* **Oct 15** - NoDB / RAW ([reading 1](https://dl-acm-org.gate.lib.buffalo.edu/citation.cfm?id=2213864) | [reading 2](http://www.vldb.org/pvldb/vol7/p1119-karpathiotakis.pdf)) * **Oct 17** - Group Presentations -* **Oct 22** - Legorithmics ([paper](https://infoscience.epfl.ch/record/186017/files/main-final.pdf) | [slides](slide/2019-10-22-Legorithmics.html)) -* **Oct 24** - Streaming ([paper](https://www.cs.cornell.edu/johannes/papers/2007/2007-CIDR-Cayuga.pdf) | [slides](slide/2019-10-24-Cayuga.html)) -* **Oct 29** - Scan Sharing ([paper](https://dl-acm-org.gate.lib.buffalo.edu/citation.cfm?id=1687707) | [slides](slide/2019-10-28-ThreadsToData.erb)) -* **Oct 31** - Group Presentations -* **Nov 4** - Differential Dataflow ([paper](http://cidrdb.org/cidr2013/Papers/CIDR13_Paper111.pdf)) -* **Nov 6** - Group Presentations +* **Oct 22** - Legorithmics ([reading](https://infoscience.epfl.ch/record/186017/files/main-final.pdf) | [slides](slide/2019-10-22-Legorithmics.html)) +* **Oct 24** - Streaming ([reading](https://www.cs.cornell.edu/johannes/papers/2007/2007-CIDR-Cayuga.pdf) | [slides](slide/2019-10-24-Cayuga.html)) +* **Oct 29** - Scan Sharing ([reading](https://dl-acm-org.gate.lib.buffalo.edu/citation.cfm?id=1687707) | [slides](slide/2019-10-28-ScanSharing.html)) +* **Oct 31** - Group Presentations (SQLatin / Lannisters) +* **Nov 4** - Differential Dataflow ([reading](http://cidrdb.org/cidr2013/Papers/CIDR13_Paper111.pdf) | [slides](slide/2019-11-05-DifferentialDataflow.html)) +* **Nov 6** - Group Presentations (Alpha Nebula / Komlan) +* **Nov 11** - Online 
Aggregation / Ripple Joins ([reading 1](https://dl-acm-org.gate.lib.buffalo.edu/citation.cfm?id=253291) | [reading 2](https://dl-acm-org.gate.lib.buffalo.edu/citation.cfm?id=304208)) +* **Nov 13** - Group Presentations (SQLatin / Lannisters) +* **Dec 6** - CSE Demo Day [Friday] --- diff --git a/src/teaching/cse-662/2019fa/slide/2019-11-05-DifferentialDataflow.erb b/src/teaching/cse-662/2019fa/slide/2019-11-05-DifferentialDataflow.erb new file mode 100644 index 00000000..84891464 --- /dev/null +++ b/src/teaching/cse-662/2019fa/slide/2019-11-05-DifferentialDataflow.erb @@ -0,0 +1,369 @@ +--- +template: templates/cse662_2019_slides.erb +title: Differential Dataflow +date: November 5 +--- + + +
+
+

Recap: AGCA / DBToaster

+ +

Think of relations as functions from tuples to multiplicities

+ $$R(t) \rightarrow \text{multiplicity}$$ +
    +
  • Negative multiplicities are allowed
  • +
  • All possible tuples have multiplicities
  • +
  • Only a finite number of tuples have non-zero multiplicities
  • +
+
+ +
+

Recap: AGCA / DBToaster

+
+
Join
+
$[R \bowtie S](t_1 \bowtie t_2) = [R](t_1) \times [S](t_2)$
+ +
Union
+
$[R \uplus S](t) = [R](t) + [S](t)$
+ +
Projection
+
$[\pi R](t) = \sum_{t'} [R](t \bowtie t')$
+ +
Selection
+
$[\sigma_{\phi} R](t) = [R](t) \times \phi(t)$
+
+
+ +
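The four operator definitions above can be sketched directly as functions over multiplicity maps. This is only an illustrative sketch, not DBToaster's implementation: the representation (a tuple is a `frozenset` of `(attribute, value)` pairs, a relation is a `dict` from tuples to integer multiplicities) and the helper names `rel`, `union`, `join`, `project`, and `select` are assumptions made for this example.

```python
from collections import defaultdict


def _nonzero(r):
    """Keep only tuples with non-zero multiplicity (finite support)."""
    return {t: m for t, m in r.items() if m != 0}


def rel(rows):
    """Build a relation from (row_dict, multiplicity) pairs."""
    out = defaultdict(int)
    for row, mult in rows:
        out[frozenset(row.items())] += mult
    return _nonzero(out)


def union(r, s):
    """[R (+) S](t) = [R](t) + [S](t); negative multiplicities cancel."""
    out = defaultdict(int)
    for source in (r, s):
        for t, m in source.items():
            out[t] += m
    return _nonzero(out)


def join(r, s):
    """[R join S](t1 join t2) = [R](t1) * [S](t2) for tuples that agree
    on every shared attribute."""
    out = defaultdict(int)
    for t1, m1 in r.items():
        for t2, m2 in s.items():
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out[frozenset({**d1, **d2}.items())] += m1 * m2
    return _nonzero(out)


def project(r, attrs):
    """[pi R](t) sums the multiplicities of all tuples that extend t."""
    out = defaultdict(int)
    for t, m in r.items():
        out[frozenset((a, v) for a, v in t if a in attrs)] += m
    return _nonzero(out)


def select(r, phi):
    """[sigma_phi R](t) = [R](t) * phi(t), with phi(t) read as 0 or 1."""
    return {t: m for t, m in r.items() if phi(dict(t))}
```

For instance, joining a tuple of multiplicity 2 with a matching tuple of multiplicity 3 yields a single tuple of multiplicity 6, and a union with the same tuple at multiplicity -2 cancels it to the empty relation.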
+

Recap: AGCA / DBToaster

+ +

+ Given $f(R)$, $R$, and $\delta R$, compute $f(R \uplus \delta R)$ as $f(R) \uplus f'(R, \delta R)$ +

+

+ Recursive implementation by cases. +

+

+ Base case: $\delta(R) = \delta^+R - \delta^-R$ +

+
+ +
+

Recap: AGCA / DBToaster

+ +
+
Join
+
$\delta(R \bowtie S) = (\delta(R) \bowtie S) \uplus (R \bowtie \delta(S))$
 $\uplus (\delta(R) \bowtie \delta(S))$
+ +
Union
+
$\delta(R \uplus S) = \delta(R) \uplus \delta(S)$
+ +
Projection
+
$\delta(\pi R) = \pi(\delta(R))$
+ +
Selection
+
$\delta(\sigma R) = \sigma(\delta(R))$
+
+ +
+
+ +
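As a sanity check on the join rule above, the following sketch compares full recomputation of $(R \uplus \delta R) \bowtie (S \uplus \delta S)$ against $(R \bowtie S) \uplus \delta(R \bowtie S)$ on a tiny example. The fixed schemas `R(a, b)` and `S(b, c)`, the sample data, and the helper names are assumptions chosen for illustration.

```python
def union(*relations):
    """Bag union: add multiplicities and drop tuples that cancel to zero."""
    out = {}
    for r in relations:
        for t, m in r.items():
            out[t] = out.get(t, 0) + m
    return {t: m for t, m in out.items() if m != 0}


def join(r, s):
    """Join R(a, b) with S(b, c) on b, multiplying multiplicities."""
    out = {}
    for (a, b), m1 in r.items():
        for (b2, c), m2 in s.items():
            if b == b2:
                out[(a, b, c)] = out.get((a, b, c), 0) + m1 * m2
    return {t: m for t, m in out.items() if m != 0}


def delta_join(r, dr, s, ds):
    """delta(R join S) = (dR join S) + (R join dS) + (dR join dS)."""
    return union(join(dr, s), join(r, ds), join(dr, ds))


R = {(1, 10): 1, (2, 20): 1}
S = {(10, "x"): 1, (20, "y"): 2}
dR = {(3, 10): 1, (2, 20): -1}   # one insertion, one deletion
dS = {(10, "z"): 1}              # one insertion

recomputed = join(union(R, dR), union(S, dS))
incremental = union(join(R, S), delta_join(R, dR, S, dS))
assert recomputed == incremental
```

The deletion in `dR` shows why negative multiplicities matter: `dR join S` contributes `(2, 20, "y")` with multiplicity -2, which cancels the old result tuple.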
+
+

Dataflow Systems

+

For example...

+
    +
  • Hadoop/MapReduce
  • +
  • Dryad
  • +
  • GraphLab
  • +
  • Spark
  • +
+
+ +
+

Dataflow Systems

+

A graph of deterministic (usually idempotent) operators.

+

Edges between operators representing data flows.

+

(Simple example: Relational Algebra tree)

+
+ +
+

Dataflow System Challenges

+
+
Partitioning
+
How to avoid each instance of an operator needing every record.
+ +
Scheduling
+
Where/When should each operator execute to minimize latency/data transfer.
+
+
+ +
+ +
+ +
+ +
+ +
+ +
+
+

Loops in Dataflow Systems

+ +

Example: Connected Components

+ +
    +
  1. Assign each node an identifier (label).
  2. +
  3. Propagate identifiers along edges.
  4. +
  5. Each node gets the smallest label from itself or peers.
  6. +
  7. Repeat from 2 until no node's label changes
  8. +
+
+ +
+ $$Q := min_{label}\big((Q \bowtie E) \uplus L\big)$$ +
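The steps above can be sketched as a naive fixpoint loop that recomputes every label in every round. This is a minimal single-machine sketch, assuming undirected edges and comparable node identifiers used as initial labels; the function name `connected_components` is hypothetical.

```python
def connected_components(nodes, edges):
    """Naive fixpoint: recompute every node's label each round."""
    neighbors = {n: set() for n in nodes}
    for u, v in edges:              # treat edges as undirected
        neighbors[u].add(v)
        neighbors[v].add(u)

    labels = {n: n for n in nodes}  # step 1: each node labels itself
    while True:
        # steps 2-3: every node takes the smallest label among itself
        # and its peers, i.e. min((Q join E) union Q)
        new_labels = {
            n: min([labels[n]] + [labels[m] for m in neighbors[n]])
            for n in nodes
        }
        if new_labels == labels:    # step 4: stop at the fixpoint
            return labels
        labels = new_labels


labels = connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])
```

Every round touches every node and every edge, even when only a few labels are still changing.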
+ +
+ +
+ +
+

... but from the scheduler's perspective ...

+
+ +
+ +
+
+ +
+
+

Or in RA

+ +

+ $$Q_0 := L$$ +

+

+ $$Q_1 := min\big((Q_0 \bowtie E) \uplus Q_0\big)$$ +

+

+ $$Q_2 := min\big((Q_1 \bowtie E) \uplus Q_1\big)$$ +

+

+ $$Q_3 := min\big((Q_2 \bowtie E) \uplus Q_2\big)$$ +

+

+ ... +

+

+ (until $Q_i = Q_{i-1}$) +

+
+ +
+

Expensive!

+
+
+ +
+
+

The IVM View

+ +

+ $$\delta Q_0 := Q_0 = L$$ +

+

+ $$\delta Q_1 := Q_1 - Q_0 \approx min(\delta Q_0 \bowtie E)$$ +

+

+ $$\delta Q_2 := Q_2 - Q_1 \approx min(\delta Q_1 \bowtie E)$$ +

+

+ $$\delta Q_3 := Q_3 - Q_2 \approx min(\delta Q_2 \bowtie E)$$ +

+

+ ... +

+

+ (until $\delta Q_i = \emptyset$) +

+
+ +
+

+ $$\delta Q_{i+1} = min(Q_{i} \bowtie E \cup Q_{i}) - Q_{i}$$ +

+

+ $$ \approx min(Q_{i} \bowtie E) - Q_{i}$$ +

+

+ $$ = min((Q_{i-1} \cup \delta Q_{i}) \bowtie E) - Q_{i}$$ +

+

+ $$ = min((Q_{i-2} \cup \delta Q_{i-1} \cup \delta Q_{i}) \bowtie E) - Q_{i}$$ +

+

+ $$ = min\big( (\sum_{i' \leq i} \delta Q_{i'}) \bowtie E\big) - Q_{i}$$ +

+

+ Note: $\delta Q_{i'} \bowtie E \subseteq Q_{i} \;\;\; \forall i' < i$ +

+

+ $$ = min\big( \delta Q_{i} \bowtie E\big) - Q_{i}$$ +

+
+ +
+ $$\delta Q_{i+1} \approx min\big( \delta Q_{i} \bowtie E\big) - Q_{i}$$ +
+
+ +
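The recurrence $\delta Q_{i+1} \approx min(\delta Q_i \bowtie E) - Q_i$ can be sketched as a loop that only propagates labels that changed in the previous round. This is an illustrative single-machine sketch under the same assumptions as before (undirected edges, node identifiers as initial labels); the function name is hypothetical.

```python
def connected_components_delta(nodes, edges):
    """Delta-driven fixpoint: only changed labels are joined with E."""
    neighbors = {n: set() for n in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    labels = {n: n for n in nodes}   # Q_0 = L
    delta = dict(labels)             # delta Q_0 = L
    while delta:                     # until delta Q_i is empty
        # min(delta Q_i join E): smallest label offered to each neighbor
        candidates = {}
        for n, lab in delta.items():
            for m in neighbors[n]:
                candidates[m] = min(lab, candidates.get(m, lab))
        # "- Q_i": keep only labels that improve on what is already known
        delta = {m: lab for m, lab in candidates.items() if lab < labels[m]}
        labels.update(delta)
    return labels


labels = connected_components_delta([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])
```

On this input the first round offers labels from all five nodes, the second from three, and the third from one, so the work per round shrinks as the computation converges.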
+
+

Now what if we want to modify E?

+
+ +
+

Recall

+ +

+ $$Q_{0,0} := L$$ +

+

+ $$Q_{1,0} := min\big((Q_{0,0} \bowtie E) \uplus Q_{0,0}\big)$$ +

+

+ $$Q_{2,0} := min\big((Q_{1,0} \bowtie E) \uplus Q_{1,0}\big)$$ +

+

+ $$Q_{3,0} := min\big((Q_{2,0} \bowtie E) \uplus Q_{2,0}\big)$$ +

+

+ ... +

+

+ (until $Q_{i,0} = Q_{i-1,0}$) +

+
+ +
+ +
+ +
+

+ $$Q_{0,1} := L$$ +

+

+ $$Q_{1,1} := min\big((Q_{0,1} \bowtie (E \uplus \delta E_1)) \uplus Q_{0,1}\big)$$ +

+

+ $$Q_{2,1} := min\big((Q_{1,1} \bowtie (E \uplus \delta E_1)) \uplus Q_{1,1}\big)$$ +

+

+ $$Q_{3,1} := min\big((Q_{2,1} \bowtie (E \uplus \delta E_1)) \uplus Q_{2,1}\big)$$ +

+

+ ... +

+

+ (until $Q_{i,1} = Q_{i-1,1}$) +

+
+ +
+

+ $$Q_{0,2} := L$$ +

+

+ $$Q_{1,2} := min\big((Q_{0,2} \bowtie (E \uplus \delta E_1 \uplus \delta E_2)) \uplus Q_{0,2}\big)$$ +

+

+ $$Q_{2,2} := min\big((Q_{1,2} \bowtie (E \uplus \delta E_1 \uplus \delta E_2)) \uplus Q_{1,2}\big)$$ +

+

+ $$Q_{3,2} := min\big((Q_{2,2} \bowtie (E \uplus \delta E_1 \uplus \delta E_2)) \uplus Q_{2,2}\big)$$ +

+

+ ... +

+

+ (until $Q_{i,2} = Q_{i-1,2}$) +

+
+ +
+

+ $$Q_{i+1, j} := min\big((Q_{i,j} \bowtie (\sum_{j' \leq j} \delta E_{j'})) \uplus Q_{i,j}\big)$$ +

+

+ observe that ... + $$Q_{i+1, j-1} := min\big((Q_{i,j-1} \bowtie (\sum_{j' < j} \delta E_{j'})) \uplus Q_{i,j-1}\big)$$ +

+

+ so... + $$Q_{i+1, j} := min\big((Q_{i,j-1} \bowtie E_j) \ldots$$ +

+

+ $$\uplus Q_{i, j-1} \uplus Q_{i,j} \ldots$$ +

+

+ $$\uplus (Q_{i,j} - Q_{i, j-1})\bowtie(\sum_{j' \leq j} \delta E_{j'}) \big)$$ +

+
+ +
+

+ Let $\delta Q_{i,j}$ be all newly introduced values relative to all predecessors. +

+ $$\delta Q_{i,j} = Q_{i,j} - Q_{i-1,j} - Q_{i, j-1}$$ +
+ +
+

+ $$Q_{i,j} := min\big((Q_{i-1,j-1} \bowtie E_j) \uplus Q_{i-1, j-1} \uplus Q_{i-1,j} \uplus (Q_{i-1,j} - Q_{i-1, j-1})\bowtie(\sum_{j' \leq j} \delta E_{j'}) \big)$$ +

+

+ $$\delta Q_{i,j} = min\big((Q_{i-1,j-1} \bowtie E_j) \uplus (Q_{i-1,j} - Q_{i-1, j-1})\bowtie(\sum_{j' \leq j} \delta E_{j'}) \big)$$ +

+

+ $$\delta Q_{i,j} = min\big((\delta Q_{i,j-1} \bowtie E_j) \uplus (Q_{i-1,j} - Q_{i-1, j-1})\bowtie(\sum_{j' \leq j} \delta E_{j'}) \big)$$ +

+

+ $$\delta Q_{i,j} = min\big((\delta Q_{i,j-1} \bowtie E_j) \uplus (\delta Q_{i-1,j}\bowtie\sum_{j' \leq j} \delta E_{j'}) \big)$$ +

+

+ Much cheaper to evaluate!
+ (but requires keeping around all $\delta Q_{i,j}$) +

+
+
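To give a feel for reusing earlier work when $E$ changes, here is a deliberately simplified sketch. It is not the two-dimensional $\delta Q_{i,j}$ scheme derived above: it handles edge insertions only, keeps just the latest fixpoint instead of all $\delta Q_{i,j}$, and restarts delta propagation from the endpoints of the new edges. Deletions, which can force labels to increase, are exactly the case that needs the retained differences. The function names `propagate` and `insert_edges` are hypothetical.

```python
def propagate(labels, neighbors, delta):
    """Push changed labels along edges until no label improves."""
    while delta:
        candidates = {}
        for n, lab in delta.items():
            for m in neighbors[n]:
                candidates[m] = min(lab, candidates.get(m, lab))
        delta = {m: lab for m, lab in candidates.items() if lab < labels[m]}
        labels.update(delta)
    return labels


def insert_edges(labels, neighbors, new_edges):
    """Insertion-only update: seed the delta with endpoints of new edges
    instead of recomputing the fixpoint from scratch."""
    seed = {}
    for u, v in new_edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
        seed[u] = labels[u]
        seed[v] = labels[v]
    return propagate(labels, neighbors, seed)


nodes = [1, 2, 3, 4, 5]
neighbors = {n: set() for n in nodes}
for u, v in [(1, 2), (2, 3), (4, 5)]:
    neighbors[u].add(v)
    neighbors[v].add(u)

# Initial fixpoint: two components, labelled 1 and 4.
labels = propagate({n: n for n in nodes}, neighbors, {n: n for n in nodes})
before = dict(labels)

# Adding the edge (3, 4) merges the components; only nodes 4 and 5 change.
after = insert_edges(labels, neighbors, [(3, 4)])
```

The update visits only the neighborhood of the inserted edge and the nodes whose labels actually change, rather than every node in the graph.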
+ + \ No newline at end of file diff --git a/src/teaching/cse-662/2019fa/slide/graphics/2019-11-04-2dIteration.svg b/src/teaching/cse-662/2019fa/slide/graphics/2019-11-04-2dIteration.svg new file mode 100644 index 00000000..76efd7db --- /dev/null +++ b/src/teaching/cse-662/2019fa/slide/graphics/2019-11-04-2dIteration.svg @@ -0,0 +1,963 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + Step the fixpoint iteration + + + + + + Q0,0 + + + + + Q1,0 + + + + + + + Q2,0 + + + + + + + Q3,0 + + + + + + Add/Remove Some Edges + + + + + + Q0,1 + + + + + + Q1,1 + + + + + + + Q2,1 + + + + + + + Q3,1 + + + + + + + + + Q0,2 + + + + + + Q1,2 + + + + + + + Q2,2 + + + + + + + Q3,2 + + + + + + + + + Q0,3 + + + + + + Q1,3 + + + + + + + Q2,3 + + + + + + + Q3,3 + + + + + + + + + + + + + + + + + + diff --git a/src/teaching/cse-662/2019fa/slide/graphics/2019-11-04-DataflowBasic.svg b/src/teaching/cse-662/2019fa/slide/graphics/2019-11-04-DataflowBasic.svg new file mode 100644 index 00000000..ac7828a7 --- /dev/null +++ b/src/teaching/cse-662/2019fa/slide/graphics/2019-11-04-DataflowBasic.svg @@ -0,0 +1,184 @@ + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + + + + + + + + + + + diff --git a/src/teaching/cse-662/2019fa/slide/graphics/2019-11-04-DataflowExpanded.svg b/src/teaching/cse-662/2019fa/slide/graphics/2019-11-04-DataflowExpanded.svg new file mode 100644 index 00000000..cad763b9 --- /dev/null +++ b/src/teaching/cse-662/2019fa/slide/graphics/2019-11-04-DataflowExpanded.svg @@ -0,0 +1,603 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/src/teaching/cse-662/2019fa/slide/graphics/2019-11-04-FixpointBasic.svg 
b/src/teaching/cse-662/2019fa/slide/graphics/2019-11-04-FixpointBasic.svg new file mode 100644 index 00000000..18f3ab93 --- /dev/null +++ b/src/teaching/cse-662/2019fa/slide/graphics/2019-11-04-FixpointBasic.svg @@ -0,0 +1,339 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + fixpoint + + + + + + + + + + + + + + + L + + + + + E + + + diff --git a/src/teaching/cse-662/2019fa/slide/graphics/2019-11-04-FixpointExpanded.svg b/src/teaching/cse-662/2019fa/slide/graphics/2019-11-04-FixpointExpanded.svg new file mode 100644 index 00000000..befe0969 --- /dev/null +++ b/src/teaching/cse-662/2019fa/slide/graphics/2019-11-04-FixpointExpanded.svg @@ -0,0 +1,525 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + + + E + + + + + + + + + + + + + + + + + + E + + + + + + + + + + + + + + + + + + E + + + + + + + + + + + + + + + L + + ... + + diff --git a/src/teaching/index.erb b/src/teaching/index.erb index bad4f7b4..ad8e2d15 100644 --- a/src/teaching/index.erb +++ b/src/teaching/index.erb @@ -11,7 +11,6 @@ title: Courses