diff --git a/src/teaching/cse-662/2019fa/index.md b/src/teaching/cse-662/2019fa/index.md index 50af0353..70f29d54 100644 --- a/src/teaching/cse-662/2019fa/index.md +++ b/src/teaching/cse-662/2019fa/index.md @@ -104,8 +104,8 @@ After the taking the course, students should be able to: * **Oct 10** - Skyserver (continued) * **Oct 15** - NoDB / RAW ([paper 1](https://dl-acm-org.gate.lib.buffalo.edu/citation.cfm?id=2213864) | [paper 2](http://www.vldb.org/pvldb/vol7/p1119-karpathiotakis.pdf)) * **Oct 17** - Group Presentations -* **Oct 22** - Legorithmics ([paper](https://infoscience.epfl.ch/record/186017/files/main-final.pdf)) -* **Oct 24** - Streaming ([paper](https://www.cs.cornell.edu/johannes/papers/2007/2007-CIDR-Cayuga.pdf)) +* **Oct 22** - Legorithmics ([paper](https://infoscience.epfl.ch/record/186017/files/main-final.pdf) | [slides](slides/2019-10-22-Legorithmics.html)) +* **Oct 24** - Streaming ([paper](https://www.cs.cornell.edu/johannes/papers/2007/2007-CIDR-Cayuga.pdf) | [slides](slides/2019-10-24-Cayuga.html)) * **Oct 29** - Scan Sharing ([paper](https://dl-acm-org.gate.lib.buffalo.edu/citation.cfm?id=1687707)) * **Oct 31** - Group Presentations * **Nov 4** - Differential Dataflow ([paper](http://cidrdb.org/cidr2013/Papers/CIDR13_Paper111.pdf)) diff --git a/src/teaching/cse-662/2019fa/slide/2019-10-24-Cayuga.erb b/src/teaching/cse-662/2019fa/slide/2019-10-24-Cayuga.erb index 0d62715c..e4103411 100644 --- a/src/teaching/cse-662/2019fa/slide/2019-10-24-Cayuga.erb +++ b/src/teaching/cse-662/2019fa/slide/2019-10-24-Cayuga.erb @@ -3,3 +3,795 @@ template: templates/cse662_2019_slides.erb title: Cayuga date: October 24 --- + +
+
+

Non-Standard Database Workloads

+
+
Stock Markets
+
Alert me when a stock reverses a downward trend.
+
Manufacturing IoT
+
Alert me when two adjacent process steps both signal non-critical errors.
+
Cloud Computing
+
Alert me when the number of errors is more than twice as high as the 2-week average.
+
+
+ +
+ + + + + + + +
Classical DB
+
+ +
+ + + + + + + + + + + + + + + + + + + + + + + + + + +
These Problems | Classical DB
Expressive Queries | Expressive Queries
Changing Data | Static Data | 🗶
Static Queries | Ad-Hoc Queries
Latency: Msec | Latency: Sec/Min | 🗶
+
+ + +
+ + + + + + + + + +
+
🗶
+
Classical DB | Publish/Subscribe
+
+ +
+ + + + + + + + + + + + + + + + + + + + + + + + + + +
These Problems | Pub/Sub
Expressive Queries | Filter Queries | 🗶
Changing Data | Changing Data
Static Queries | Static Queries
Latency: Msec | Latency: Msec
+ +
+ +
+

Trivial

+

+

Expressive

+
+ +
+

Trivial

+

+

Performant Expressiveness

+

+

Expressive

+
+ +
+

Cayuga

+
+
+
Language
+
Maximize Expressiveness w/o Compromising Performance
+
+ +
+
Compiler
+
Emit a tight, optimized program representation
+
+ +
+
Runtime
+
Necessary support for concurrent, asynchronous execution
+
+
+
+ +
+ +
+
+

Language

+ +

Start with something familiar

+
+ +
+
+
Projection, Selection, Union
+
Single-pass operators: Easy to do efficiently
+
Join
+
Multi-pass operator: Will need to revisit
+
Aggregate
+
Single-pass operator: Probably ok
+
Blocking operator: Not ok
+
+
+ +
+

Projection

+

+                          SELECT A, B, C, ... 
+                          FROM [Query]
+    
+ +

Emit tuples emitted by [Query] with only columns A, B, C

+
+ +
+

Selection

+

+                    FILTER { [Condition] } [Query]
+    
+

Emit only tuples emitted by [Query] that pass [Condition]

+
+ +
+

Union

+

+                        [Query1] UNION [Query2]
+    
+

Emit any tuples emitted by either [Query1] or [Query2]

+
+
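The three single-pass operators above can be sketched as Python generators. This is only an illustration of why they are cheap on a stream (each event is touched once and nothing is buffered); the `Event` dictionary layout, the `Time` column, and the function names are assumptions, not part of Cayuga.

```python
import heapq
from typing import Callable, Dict, Iterable, Iterator, List

Event = Dict[str, object]  # hypothetical event layout: column name -> value


def project(columns: List[str], query: Iterable[Event]) -> Iterator[Event]:
    """SELECT columns FROM query: keep only the named columns of each event."""
    for event in query:
        yield {column: event[column] for column in columns}


def select(condition: Callable[[Event], bool], query: Iterable[Event]) -> Iterator[Event]:
    """FILTER { condition } query: pass through only events that satisfy condition."""
    for event in query:
        if condition(event):
            yield event


def union(query1: Iterable[Event], query2: Iterable[Event]) -> Iterator[Event]:
    """query1 UNION query2: interleave two time-ordered streams by their Time column."""
    return heapq.merge(query1, query2, key=lambda event: event["Time"])
```

Because each operator is a generator, they compose the same way the query syntax nests, e.g. `project(["Name"], select(cond, stream))`.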
+ +
+
+

Join

+ +
    +
  1. $O(N^2)$ complexity doesn't work when $N = \infty$
  2. Storage requirements grow infinitely
  3. Work per tuple grows with every insertion
+ +

How to fix?

+
+ +
+
+
+
RHS tuple has to arrive after LHS tuple
+
Storage requirement scales only with the number of pending LHS tuples
+
+
+
Each LHS tuple joins at most one RHS tuple
+
$O(N^2) \rightarrow O(N)$
+
Better chance of work staying constant
+
+
+
+ +
+

Join (Next)

+

+                 [Query1] NEXT { [Condition] } [Query2]
+    
+
    +
  1. For each tuple emitted by [Query1],
  2. wait until [Query2] emits a tuple that passes [Condition]
  3. and emit the cartesian product of the tuples
+
+
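A minimal sketch of the NEXT semantics described above, assuming events are delivered one at a time through two hypothetical callbacks (`on_left` for [Query1], `on_right` for [Query2]). The class and method names are illustrative only. Each LHS tuple is stored until the first later RHS tuple passes the condition, so storage is bounded by the number of unmatched LHS tuples.

```python
class Next:
    """[Query1] NEXT { condition } [Query2], one event at a time."""

    def __init__(self, condition):
        self.condition = condition  # condition(lhs, rhs) -> bool
        self.pending = []           # LHS tuples still waiting for a match

    def on_left(self, lhs):
        """A tuple arrived from [Query1]: remember it."""
        self.pending.append(lhs)

    def on_right(self, rhs):
        """A tuple arrived from [Query2]: emit and retire every LHS tuple it matches."""
        emitted, still_pending = [], []
        for lhs in self.pending:
            if self.condition(lhs, rhs):
                emitted.append((lhs, rhs))   # each LHS tuple joins at most once
            else:
                still_pending.append(lhs)    # keep waiting
        self.pending = still_pending
        return emitted
```

An LHS tuple leaves `pending` as soon as it matches, which is what turns the $O(N^2)$ join into roughly constant work per pending tuple.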
+ +
+
+

Aggregate

+ +

+ Blocking operators are not ok. Need semantics that allow tuples to be emitted sooner. +

+ +
+ +
+

Group-by...ish Aggregates

+ +
+ +
+

Aggregate (Fold)

+

+      [Query1] FOLD { [Condition1], [Condition2], [Agg] } [Query2]
+    
+
    +
  1. For each tuple emitted by [Query1]
  2. Wait until [Query2] emits a tuple that passes [Condition1]
  3. Update [Agg]
  4. Emit the cartesian product of the [Query1] tuple, the first [Query2] tuple, and the [Agg] value
  5. If the [Query2] tuple ALSO passes [Condition2] repeat from 2
+
+ +
+

Analogous to...

+

+      [Query1] NEXT { [Condition1] } [Query2]
+               NEXT { [Condition1] } [Query2] 
+               NEXT { [Condition1] } [Query2] 
+               NEXT { [Condition1] } [Query2] 
+               ... until [Condition2] fails
+    
+
+
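The FOLD steps can be sketched in the same event-at-a-time style. This is a rough model under stated assumptions: the constructor arguments (`group_condition`, `continue_condition`, `update`, `initial`) are hypothetical names, and [Condition2] is evaluated against the aggregate from the previous iteration, which is one plausible reading of the slide rather than a confirmed detail of Cayuga.

```python
class Fold:
    """[Query1] FOLD { Condition1, Condition2, Agg } [Query2], one event at a time."""

    def __init__(self, group_condition, continue_condition, update, initial):
        self.group_condition = group_condition        # Condition1: (lhs, rhs) -> bool
        self.continue_condition = continue_condition  # Condition2: (lhs, agg, rhs) -> bool
        self.update = update                          # Agg: (agg, rhs) -> new agg
        self.initial = initial                        # lhs -> starting agg value
        self.pending = []                             # (lhs, agg) pairs still folding

    def on_left(self, lhs):
        """Step 1: a [Query1] tuple starts a new fold."""
        self.pending.append((lhs, self.initial(lhs)))

    def on_right(self, rhs):
        emitted, still_pending = [], []
        for lhs, agg in self.pending:
            if not self.group_condition(lhs, rhs):
                still_pending.append((lhs, agg))      # step 2: keep waiting
                continue
            keep_going = self.continue_condition(lhs, agg, rhs)
            agg = self.update(agg, rhs)               # step 3: update the aggregate
            emitted.append((lhs, rhs, agg))           # step 4: emit without blocking
            if keep_going:
                still_pending.append((lhs, agg))      # step 5: repeat from step 2
        self.pending = still_pending
        return emitted
```

Unlike a blocking GROUP BY, every iteration emits a result immediately, so downstream operators never wait for the fold to finish.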
+ +
+ +
+

Cayuga

+
+
+
Language
+
Maximize Expressiveness w/o Compromising Performance
+
+ +
+
Compiler
+
Emit a tight, optimized program representation
+
+ +
+
Runtime
+
Necessary support for concurrent, asynchronous execution
+
+
+
+ +
+

Deterministic Finite Automata

+ +

Model a program by a directed graph

+ + +
+ +
+

Deterministic Finite Automata

+ +

The program accepts an input: A string.

+
    +
  1. Start at the start state.
  2. Find the transition edge corresponding to the next character and follow it.
  3. Repeat from 2 until the end state or the end of the string
  4. Accept the string if the final state is the end state
+
+ +
+ +

/Hi+!/ ↣ + "Hi!" + "OHiiiiii!" + "Ha!"

+
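The steps above can be sketched as a short loop. The transition function below is one possible encoding of a DFA that searches for `/Hi+!/` anywhere in the input; the state names follow the slides, but the exact edges are an assumption and may differ from the figure.

```python
def step(state: str, ch: str) -> str:
    """Follow the single transition edge for character ch out of state."""
    if state == "S3":
        return "S3"                      # end state: absorb the rest of the input
    if state == "S1" and ch == "i":
        return "S2"                      # saw "Hi"
    if state == "S2" and ch == "i":
        return "S2"                      # saw "Hii...i"
    if state == "S2" and ch == "!":
        return "S3"                      # saw "Hi+!"
    return "S1" if ch == "H" else "Start"  # restart the match


def dfa_accepts(text: str) -> bool:
    state = "Start"                      # 1. start at the start state
    for ch in text:                      # 3. repeat until end state or end of string
        state = step(state, ch)          # 2. follow the matching edge
        if state == "S3":
            break
    return state == "S3"                 # 4. accept if we stopped in the end state
```

With this encoding, `"Hi!"` and `"OHiiiiii!"` are accepted and `"Ha!"` is rejected. Exactly one state is live at any time, so the work per character is constant.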
+ +
+

Deterministic Finite Automata

+ + + +

... but what if we don't know which edge to take?

+
+ +
+

Nondeterministic Finite Automata

+ +

The program state is a set of active states

+ +
    +
  1. Start in state $\{\texttt{start}\}$
  2. Initialize the next state to $\{\}$
  3. For each active state, follow each transition edge with a matching letter and add the destination to the active states in the next step
  4. Replace the current state with the next state.
  5. Repeat from 2 until the end state is active or there are no active states
  6. Accept the string if the end state is active
+
+ +
+ +

/Ha?i!+/ ↣ + "Hi!" + "OHai!" + "HaHai!" + "HiHaH!" +

+
+ +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Letter | Start | $S_1$ | $S_2$ | $S_3$ | End
H
i
H
a
H
!
+
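A minimal sketch of the set-of-active-states simulation, using an assumed NFA for `/Ha?i!+/`. The edge table below is a guess at a reasonable encoding (a self-loop on `Start` lets a match begin anywhere); it is not necessarily identical to the figure.

```python
ANY = None  # wildcard label: matches every letter

# Hypothetical edge list: state -> [(label, destination), ...]
TRANSITIONS = {
    "Start": [(ANY, "Start"), ("H", "S1")],
    "S1": [("a", "S2"), ("i", "S3")],
    "S2": [("i", "S3")],
    "S3": [("!", "End")],
}


def nfa_accepts(text: str) -> bool:
    active = {"Start"}                           # 1. start in {start}
    for letter in text:
        next_active = set()                      # 2. next state starts empty
        for state in active:                     # 3. follow every matching edge
            for label, destination in TRANSITIONS.get(state, []):
                if label is ANY or label == letter:
                    next_active.add(destination)
        active = next_active                     # 4. swap in the next state
        if "End" in active or not active:        # 5. stop early if done or stuck
            break
    return "End" in active                       # 6. accept if End is active
```

Under this encoding `"Hi!"`, `"OHai!"`, and `"HaHai!"` are accepted, while `"HiHaH!"` is rejected because the final `!` arrives when only `Start` is active.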
+ +
+

Nondeterministic Finite Automata

+ +

NDFAs can be compiled down to DFAs

+
+
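One standard way to do this compilation is the subset construction: every DFA state is a set of NDFA states. The sketch below assumes the NDFA is given as a nested dictionary and that the alphabet is finite and known; both are illustrative choices, and the toy automaton in the usage note is not from the slides.

```python
from collections import deque


def determinize(nfa, start, alphabet):
    """Subset construction.

    nfa: dict mapping state -> dict mapping symbol -> set of destination states.
    Returns a dict mapping frozenset-of-states -> dict symbol -> frozenset-of-states.
    """
    start_set = frozenset([start])
    dfa = {}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        if current in dfa:
            continue                      # already expanded this set of states
        dfa[current] = {}
        for symbol in alphabet:
            # The DFA successor is the union of all NDFA successors.
            target = frozenset(
                destination
                for state in current
                for destination in nfa.get(state, {}).get(symbol, ())
            )
            dfa[current][symbol] = target
            if target not in dfa:
                queue.append(target)
    return dfa
```

For a toy NDFA that looks for the substring `ab` (`{0: {"a": {0, 1}, "b": {0}}, 1: {"b": {2}}}`), the construction yields the three DFA states `{0}`, `{0, 1}`, and `{0, 2}`. In the worst case the number of DFA states is exponential in the number of NDFA states.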
+ +
+ +
+

Cayuga

+
+
+
Language
+
Maximize Expressiveness w/o Compromising Performance
+
+ +
+
Compiler
+
Emit a tight, optimized program representation
+
+ +
+
Runtime
+
Necessary support for concurrent, asynchronous execution
+
+
+
+ +
+

Cayuga Automata

+

Each node of the NDFA is a relation.

+

Each transition of the NDFA is a join condition + projection

+
+ +
+

Example

+ +
    +
  1. Look for high-volume (more than 10,000) trades
  2. When one happens, check if it's followed by a 10 minute sequence of trades with dropping prices
  3. Wait for the stock to rally (5% higher than its lowest point) and alert me
+
+ +
+

+  SELECT Name, MaxPrice, MinPrice, Price AS FinalPrice
+      -- Only consider aggregates spanning 10 minutes or more
+  FROM FILTER { dur ≥ 10 min } (
+    ( 
+      -- Trigger aggregate when a Stock w/ Volume > 10000 sells
+      SELECT Name, Price_1 AS MaxPrice, Price AS MinPrice
+      FROM FILTER { Volume > 10000 } Stock
+    ) FOLD { 
+        $2.Name = $.Name,   -- Grouping Condition
+        $2.Price < $.Price  -- Continue Condition
+    } Stock -- Fold over any stock
+  ) NEXT { 
+      -- Find the next upturn after a 10 minute descending run
+      $2.Name = $1.Name AND $2.Price > 1.05 * $1.MinPrice
+  } Stock
+    
+
+ +
+ + +
+ +
+

+      CREATE TABLE A(
+        Name_l STRING,    -- From LHS
+        MaxPrice DECIMAL, -- From LHS
+        MinPrice DECIMAL, -- From LHS
+        Name_r STRING,    -- From RHS
+        Price DECIMAL,    -- From RHS
+        Start INT,        -- From LHS
+        End INT           -- From RHS
+      )
+    
+ +

+      CREATE TABLE B(
+        Name STRING,      
+        MaxPrice DECIMAL, 
+        MinPrice DECIMAL, 
+        Price DECIMAL,
+        Start INT,
+        End INT
+      )
+    
+
+ +
+ + +
+ +
+ + +
Name | Price | Volume | Time
+ + + + + + + + +
State A || State B || Emitted
Name_l | MinPrice | Name_r | Price || Name | MinPrice | Price ||
+
+ +
+ + + +
Name | Price | Volume | Time
IBM | 90 | 15,000 | 9:10
+ + + + + + + + + + + +
State A || State B || Emitted
Name_l | MinPrice | Name_r | Price || Name | MinPrice | Price ||
IBM | 90 | IBM | 90 || ||
+
+ +
+ + + + +
Name | Price | Volume | Time
IBM | 90 | 15,000 | 9:10
IBM | 85 | 7,000 | 9:15
+ + + + + + + + + + + +
State A || State B || Emitted
Name_l | MinPrice | Name_r | Price || Name | MinPrice | Price ||
IBM | 90 | IBM | 85 || ||
+
+ +
+ + + + + +
Name | Price | Volume | Time
IBM | 90 | 15,000 | 9:10
IBM | 85 | 7,000 | 9:15
Dell | 40 | 11,000 | 9:17
+ + + + + + + + + + + + + + +
State A || State B || Emitted
Name_l | MinPrice | Name_r | Price || Name | MinPrice | Price ||
IBM | 90 | IBM | 85 || ||
Dell | 40 | Dell | 40 || ||
+
+ +
+ + + + + + +
Name | Price | Volume | Time
IBM | 90 | 15,000 | 9:10
IBM | 85 | 7,000 | 9:15
Dell | 40 | 11,000 | 9:17
IBM | 81 | 8,000 | 9:21
+ + + + + + + + + + + + + + + +
State A || State B || Emitted
Name_l | MinPrice | Name_r | Price || Name | MinPrice | Price ||
IBM | 90 | IBM | 81 || IBM | 90 | 81 ||
Dell | 40 | Dell | 40 || ||
+
+ +
+ + + + + + + +
Name | Price | Volume | Time
IBM | 90 | 15,000 | 9:10
IBM | 85 | 7,000 | 9:15
Dell | 40 | 11,000 | 9:17
IBM | 81 | 8,000 | 9:21
MSFT | 25 | 6,000 | 9:23
+ + + + + + + + + + + + + + + +
State A || State B || Emitted
Name_l | MinPrice | Name_r | Price || Name | MinPrice | Price ||
IBM | 90 | IBM | 81 || IBM | 90 | 81 ||
Dell | 40 | Dell | 40 || ||
+
+ +
+ + + + + + + + +
Name | Price | Volume | Time
IBM | 90 | 15,000 | 9:10
IBM | 85 | 7,000 | 9:15
Dell | 40 | 11,000 | 9:17
IBM | 81 | 8,000 | 9:21
MSFT | 25 | 6,000 | 9:23
IBM | 91 | 9,000 | 9:24
+ + + + + + + + + + + + + + + + +
State A || State B || Emitted
Name_l | MinPrice | Name_r | Price || Name | MinPrice | Price ||
IBM | 90 | IBM | 81 || IBM!
Dell | 40 | Dell | 40 || ||
+
+
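The trace above can be replayed with a small sketch of the three-state automaton. Everything here is an approximation under stated assumptions: the third column of the event table is treated as the trade volume, times are minutes since midnight, an instance with no matching edge is dropped, and the `Run` and `StockAutomaton` names are hypothetical. It is not Cayuga's actual runtime.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Run:
    """One automaton instance: a falling-price run for a single stock."""
    name: str
    max_price: float
    min_price: float
    start: int  # minutes since midnight
    end: int


class StockAutomaton:
    def __init__(self):
        self.state_a = []  # runs still being folded
        self.state_b = []  # runs of 10+ minutes waiting for a rally

    def on_event(self, name, price, volume, time):
        alerts, next_a, next_b = [], [], []
        # State B: NEXT { same name AND price > 1.05 * MinPrice }
        for run in self.state_b:
            if run.name == name and price > 1.05 * run.min_price:
                alerts.append((run.name, run.max_price, run.min_price, price))
            else:
                next_b.append(run)               # keep waiting
        # State A: FOLD { same name, price still falling }
        for run in self.state_a:
            if run.name != name:
                next_a.append(run)               # unrelated stock: no change
            elif price < run.min_price:
                longer = replace(run, min_price=price, end=time)
                next_a.append(longer)
                if longer.end - longer.start >= 10:
                    next_b.append(longer)        # FILTER { dur >= 10 min }
            # otherwise the price did not drop, so the run is dropped
        # Start: FILTER { Volume > 10000 } opens a new run
        if volume > 10000:
            next_a.append(Run(name, price, price, time, time))
        self.state_a, self.state_b = next_a, next_b
        return alerts
```

Feeding the six events from the trace (IBM at 9:10, 9:15, 9:21, 9:24; Dell at 9:17; MSFT at 9:23) produces no alert until the last event, which reports IBM with a maximum of 90, a minimum of 81, and a final price of 91.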
+ +
+
+

Cayuga

+
+
+
Language
+
Maximize Expressiveness w/o Compromising Performance
+
+ +
+
Compiler
+
Emit a tight, optimized program representation
+
+ +
+
Runtime
+
Necessary support for concurrent, asynchronous execution
+
+
+
+ +
+

Challenges

+
+
Asynchronous Arrival
+
Updates may arrive out of order
+
Threading
+
Make sure each thread sees a consistent view of the state
+
Shallow Copies
+
Need to keep track of which threads are using which state
+
Relational State
+
Lots of work for each event!
+
String Comparisons
+
Expensive!
+
+
+ +
+

Asynchronous Arrival

+ +

Simple Solution: Add a delay to event processing to buffer for out-of-order arrival.

+
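A minimal sketch of such a delay buffer, assuming each event carries a numeric timestamp and that no event arrives more than `delay` time units late. The `ReorderBuffer` class is hypothetical; it holds events in a min-heap and releases them in timestamp order once they are older than the newest timestamp seen minus the delay.

```python
import heapq


class ReorderBuffer:
    def __init__(self, delay):
        self.delay = delay
        self.heap = []        # (timestamp, arrival order, event)
        self.latest = None    # newest timestamp seen so far
        self.arrivals = 0     # tie-breaker so events are never compared directly

    def push(self, timestamp, event):
        """Buffer one event and return the events that are now safe to process."""
        heapq.heappush(self.heap, (timestamp, self.arrivals, event))
        self.arrivals += 1
        if self.latest is None or timestamp > self.latest:
            self.latest = timestamp
        ready = []
        # Anything at or before (latest - delay) can no longer be overtaken.
        while self.heap and self.heap[0][0] <= self.latest - self.delay:
            released_at, _, released = heapq.heappop(self.heap)
            ready.append((released_at, released))
        return ready
```

The trade-off is explicit: a larger `delay` tolerates more disorder but adds that much latency to every alert.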
+ +
+

Threading

+ +

Mostly Simple Solution: Process one event in parallel to build a new state, swap in the new state, repeat.

+
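The build-then-swap idea can be sketched with an immutable snapshot and a thread pool. This is an illustration only: `advance` and `lower_price` are hypothetical helpers, instances are plain `(name, price)` tuples, and Python threads are used for clarity rather than speed.

```python
from concurrent.futures import ThreadPoolExecutor


def lower_price(instance, event):
    """Example transition: keep unrelated instances, update on a lower price, else drop."""
    name, price = instance
    event_name, event_price = event
    if name != event_name:
        return instance
    if event_price < price:
        return (name, event_price)
    return None  # no matching edge: the instance is dropped


def advance(state, event, transition, pool):
    """Build the next state from an immutable snapshot of the current one."""
    def step(instance):
        return transition(instance, event)

    results = pool.map(step, state)  # all instances handled in parallel
    return tuple(result for result in results if result is not None)
```

Workers only read the old tuple and never mutate it, so every thread sees the same snapshot; the caller swaps in the result with a single assignment such as `state = advance(state, event, lower_price, pool)`.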
+ +
+

Shallow Copies

+ +

Not so Simple Solution: Add an epoch-based garbage collector to detect when an object falls out of scope.

+ +

(Reference counting creates points of contention on every refcount update)

+
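The bookkeeping behind epoch-based reclamation can be modeled in a few lines. This is a simplified, single-threaded model with hypothetical names (`EpochReclaimer`, `enter`, `leave`, `retire`, `collect`); a real implementation would use per-thread epoch slots and atomic operations instead of a shared dictionary.

```python
class EpochReclaimer:
    def __init__(self):
        self.global_epoch = 0
        self.pinned = {}    # reader id -> epoch at which it started reading
        self.retired = []   # (epoch in which the object was unlinked, object)

    def enter(self, reader):
        """A reader starts working and pins the current epoch."""
        self.pinned[reader] = self.global_epoch

    def leave(self, reader):
        """The reader is done and holds no references any more."""
        del self.pinned[reader]

    def retire(self, obj):
        """Unlink obj from the shared state; readers pinned so far may still hold it."""
        self.retired.append((self.global_epoch, obj))
        self.global_epoch += 1

    def collect(self):
        """Free every object retired before the oldest epoch still pinned."""
        oldest = min(self.pinned.values(), default=self.global_epoch)
        freed = [obj for epoch, obj in self.retired if epoch < oldest]
        self.retired = [(epoch, obj) for epoch, obj in self.retired if epoch >= oldest]
        return freed
```

Readers touch only their own pinned epoch, so there is no shared counter per object to fight over, which is the contention problem with reference counting noted above.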
+ +
+

Relational State

+ +

Simple Solution: Index the states to make it easier to discover which states a new event interacts with.

+
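A minimal sketch of such an index, assuming every transition in a state has an equality predicate on `Name` (as in the stock example). The `IndexedState` class is hypothetical; it keeps a hash index from name to instances so that an event only visits the instances it could possibly match.

```python
from collections import defaultdict


class IndexedState:
    """Automaton instances for one state, hash-indexed on the Name column."""

    def __init__(self):
        self.by_name = defaultdict(list)

    def add(self, instance):
        self.by_name[instance["Name"]].append(instance)

    def candidates(self, event):
        """Instances whose equality predicate on Name can match this event."""
        return self.by_name.get(event["Name"], [])
```

With thousands of stocks in flight, an IBM trade then touches only the IBM instances instead of scanning the whole relation.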
+ +
+

String Comparisons

+ +

Simple Solution: Build a dictionary of strings (can be done asynchronously while the event is waiting to be processed).

+
+
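Dictionary encoding can be sketched as follows. The `StringDictionary` class is a hypothetical helper: each distinct string is assigned a small integer once, when the event is ingested, so predicates such as `$2.Name = $1.Name` become integer comparisons.

```python
class StringDictionary:
    def __init__(self):
        self.codes = {}     # string -> integer code
        self.strings = []   # integer code -> string

    def encode(self, text):
        """Return the code for text, assigning a fresh one on first sight."""
        code = self.codes.get(text)
        if code is None:
            code = len(self.strings)
            self.codes[text] = code
            self.strings.append(text)
        return code

    def decode(self, code):
        """Recover the original string, e.g. when formatting an alert."""
        return self.strings[code]
```

Two names are equal exactly when their codes are equal, so the per-event hot path never compares characters.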
\ No newline at end of file diff --git a/src/teaching/cse-662/2019fa/slide/graphics/2019-10-24-CayugaExample.svg b/src/teaching/cse-662/2019fa/slide/graphics/2019-10-24-CayugaExample.svg new file mode 100644 index 00000000..d42f22aa --- /dev/null +++ b/src/teaching/cse-662/2019fa/slide/graphics/2019-10-24-CayugaExample.svg @@ -0,0 +1,474 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + + + + + + + Start + A + B + + + + e.volume >10,000 + e.Name = A.Name ANDe.END - A.START ≥ 10 min + e.Name = B.Name ANDe.price > 1.05 * B.MinPrice + e.Name ≠ B.Name + e.Name ≠ A.Name + e.name = A.Name ANDe.Price < A.Price + + + diff --git a/src/teaching/cse-662/2019fa/slide/graphics/2019-10-24-CayugaUnannotated.svg b/src/teaching/cse-662/2019fa/slide/graphics/2019-10-24-CayugaUnannotated.svg new file mode 100644 index 00000000..75ecceb2 --- /dev/null +++ b/src/teaching/cse-662/2019fa/slide/graphics/2019-10-24-CayugaUnannotated.svg @@ -0,0 +1,372 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + + + + + + + Start + A + B + + + + diff --git a/src/teaching/cse-662/2019fa/slide/graphics/2019-10-24-CayugaUpdates.svg b/src/teaching/cse-662/2019fa/slide/graphics/2019-10-24-CayugaUpdates.svg new file mode 100644 index 00000000..e034c2d3 --- /dev/null +++ b/src/teaching/cse-662/2019fa/slide/graphics/2019-10-24-CayugaUpdates.svg @@ -0,0 +1,464 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + + + + + + + Start + A + B + + + + MinPrice, MaxPrice ← e.PriceName_l, Name_r ← e.Name + Name ← Name_l + No Change + No Change + No Change + Price ← e.Price + + + diff --git a/src/teaching/cse-662/2019fa/slide/graphics/2019-10-24-DFAExample.svg b/src/teaching/cse-662/2019fa/slide/graphics/2019-10-24-DFAExample.svg new file mode 100644 index 
00000000..5b2601b7 --- /dev/null +++ b/src/teaching/cse-662/2019fa/slide/graphics/2019-10-24-DFAExample.svg @@ -0,0 +1,480 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + + i + + + + H + not H + + + + not i + anything + i + + not i or ! + ! + + + + Start + S1 + S2 + S3 + + diff --git a/src/teaching/cse-662/2019fa/slide/graphics/2019-10-24-NDFAExample.svg b/src/teaching/cse-662/2019fa/slide/graphics/2019-10-24-NDFAExample.svg new file mode 100644 index 00000000..a641c394 --- /dev/null +++ b/src/teaching/cse-662/2019fa/slide/graphics/2019-10-24-NDFAExample.svg @@ -0,0 +1,430 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + i + + + + H + anything + + + a + + ! + + + + Start + S1 + S2 + S3 + H + + diff --git a/src/teaching/cse-662/2019fa/slide/graphics/2019-10-24-PubSub.svg b/src/teaching/cse-662/2019fa/slide/graphics/2019-10-24-PubSub.svg new file mode 100644 index 00000000..0117ad8c --- /dev/null +++ b/src/teaching/cse-662/2019fa/slide/graphics/2019-10-24-PubSub.svg @@ -0,0 +1,2979 @@ + + + + + + + + + + image/svg+xml + + + + + + + + + + + + + + + + + + + + + diff --git a/src/teaching/cse-662/2019fa/slide/graphics/DB.png b/src/teaching/cse-662/2019fa/slide/graphics/DB.png new file mode 100644 index 00000000..90f993ef Binary files /dev/null and b/src/teaching/cse-662/2019fa/slide/graphics/DB.png differ