notes

2019-09-11 23:16:31 -04:00 · 2019-09-11 23:16:31 -04:00 · 97addf7418
parent 660e54468a
commit 97addf7418
1 changed files with 124 additions and 0 deletions
--- a/src/teaching/cse-662/2019fa/slide/2019-09-12-DBToaster.erb
+++ b/src/teaching/cse-662/2019fa/slide/2019-09-12-DBToaster.erb
@@ -0,0 +1,124 @@
+=== Programming with Collections === 
+
+- Datalog Recap
+  - Propositional Calculus (0th order logic)
+    - Facts (P), Basic operations: (Not, And, Or), Implication
+    - Example Facts: AliceWentToTheStore, BobWentToTheStore,
+      AliceWentToHome
+    - Implication:
+      -  (if P and Q then R)
+      - === R \or \not P \or \not Q
+      - === “Horn Clause”
+  - 1st order logic
+    - Goal: Quantification 
+    - Challenge: need a way to enumerate classes/sets of facts
+    - Groups of facts: WentTo(Store, Alice), WentTo(Store, Bob),
+      WentTo(Home, Alice)
+    - For all facts in a group (\forall) a property holds
+    - There exists a fact in a group (\exists) such that a property
+      holds
+    - New way to discuss implication:
+      - Given one or more facts P(X,Y), Q(X), …., “infer” a new
+        fact R(Y)
+      - If WentTo(X, Y) and ShoppingAt(X) then ShoppingDone(Y)
+      - \forall Y: If (\exists X: P(X,Y) and Q(X)) then R(Y)
+      - \forall Y, \forall X: \not P(X,Y) \or \not Q(X) \or R(Y)
+      - Which elements of R must be true?
+      - INSERT INTO R SELECT DISTINCT Y FROM P NATURAL JOIN Q
+    - Datalog
+      - R(Y) :- P(X,Y), Q(X)
+      - Head, Body
+      - Find values of Y for which R is true?
+        - Find a value of X for which P(X,Y) and Q(X) are true
+      - What about R(Y, Z) :- P(X,Y), Q(X)
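
The rule R(Y) :- P(X,Y), Q(X) can be read as "join P and Q on X, then project onto Y". A minimal sketch of that reading, assuming facts are stored as Python sets of tuples (the fact values and the helper name `derive_r` are illustrative, not part of the notes):

```python
# Hypothetical example facts, mirroring WentTo(Store, Alice) etc. above.
P = {("store", "alice"), ("store", "bob"), ("home", "alice")}  # P(X, Y)
Q = {("store",)}                                               # Q(X)

def derive_r(p, q):
    """Evaluate R(Y) :- P(X, Y), Q(X): join on X, then project onto Y."""
    q_values = {x for (x,) in q}
    # Y is in R if some X exists with P(X, Y) and Q(X) both true.
    return {(y,) for (x, y) in p if x in q_values}

R = derive_r(P, Q)
print(sorted(R))
```

Because the result is a set, each Y appears once no matter how many X values witness it, which matches the existential reading of X in the body.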
+- Alternative View:
+  - R(Y) is a function
+    - Dom => Bool
+  - Given Y, find a value of X for which P(X,Y) and Q(X)
+    evaluate to true.
+  - Support
+    - Support: The set of values of Y for which R(Y)
+      evaluates to true
+    - Finite Support: The support set has a finite size
+    - If P(X,Y), Q(X) have finite support, so does R(Y)
+    - actually, we can do a bit better… to be discussed
+      shortly
+  - Natural consequence:
+    - R(Y,Z) is true for any value of Z as long as R(Y)
+      would be true.
+    - R(Y,Z) has an infinite support!
+    - Z is “unsafe” or “unbound”
+    - Y is “bound” or “safe”
+  - Safety and Support
+    - Assume we have a S(Y,Z) with finite support.
+    - R(Y,Z) does not have finite support
+    - What about ( S(Y,Z) and R(Y,Z) )
+    - Interestingly enough, this actually does have finite
+      support: Because Z is safe in S, it does not need to be
+      safe in R.
+    - In general, a variable is safe in a conjunction of terms
+      IFF it is safe in at least one of the terms.
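
The safety rule above amounts to a simple syntactic check: a head variable is safe if at least one finite-support term in the conjunction mentions it. A rough sketch, assuming a rule is given as a tuple of head variables plus a list of (relation name, variables) pairs, and that every listed relation has finite support (the representation and the name `unsafe_head_variables` are assumptions for illustration):

```python
def unsafe_head_variables(head_vars, body_atoms):
    """Return the head variables that no body atom binds.

    body_atoms is a list of (relation_name, variables) pairs; every
    relation listed is assumed to have finite support.
    """
    bound = set()
    for _name, variables in body_atoms:
        bound.update(variables)
    return [v for v in head_vars if v not in bound]

body = [("P", ("X", "Y")), ("Q", ("X",))]
print(unsafe_head_variables(("Y",), body))       # R(Y)
print(unsafe_head_variables(("Y", "Z"), body))   # R(Y, Z)
# Conjoining with S(Y, Z), which has finite support, binds Z as well.
print(unsafe_head_variables(("Y", "Z"), body + [("S", ("Y", "Z"))]))
```
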
+- What else can you do with this idea?
+  - Functions F(X) -> ???
+  - What about other functions?
+    - F(X,Y) -> true if (X < Y)
+  - How about functions to natural numbers?
+    - Simple way to express Bags!
+    - R(X) -> N = number of instances of X in R
+      (multiplicity)
+    - Leads to some interesting math:
+      - R(X) U S(X) === R(X) + S(X)
+      - R(X) |><| S(X) === R(X) * S(X)
+    - Q(X) :- R(X,Y) * S(X, Y) === Aggregation: Count 
+      Group-By X of R(X,Y) |><| S(X,Y)
+    - SELECT X, COUNT(*) FROM R NATURAL JOIN S GROUP BY X
+  - How about real numbers?
+    - R(X) -> Multiplicity
+    - F(X) -> {X}
+    - Q() :- SUM[X] ( R(X) * {X} )
+      - SELECT SUM(X) FROM R
+    - Q() :- SUM[X] ( R(X,Y) * S(Y,Z) * {Z < 10} * {X} )
+      - SELECT SUM(R.X) FROM R NATURAL JOIN S WHERE S.Z < 10
+    - OR: Aggregate { Start with R, Join with S, Filter on
+      Z < 10, Multiply multiplicity by X }
+    - Sequence of transformations, each modifying the
+      output of the last
+      - Pipelining technique sometimes referred to as a
+        “Monad”
+  - Key insight: Operations on the values commute through operations 
+    on the relation functions.  Q(F(X)) = F(Q(X))
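
The "relation as a function to numbers" view above can be tried out directly. The sketch below assumes a bag is a `collections.Counter` mapping each tuple to its multiplicity; the helper names (`bag_union`, `bag_join`, `count_group_by_x`, `sum_x`) and the sample data are illustrative only:

```python
from collections import Counter

def bag_union(r, s):
    """R(X) U S(X) === R(X) + S(X): add multiplicities."""
    return r + s

def bag_join(r, s):
    """R(X) |><| S(X) === R(X) * S(X): multiply multiplicities."""
    return Counter({t: r[t] * s[t] for t in r if t in s})

def count_group_by_x(r, s):
    """Q(X) :- R(X,Y) * S(X,Y): sum the multiplicity products over Y."""
    q = Counter()
    for (x, y), m in r.items():
        q[x] += m * s.get((x, y), 0)
    return +q  # unary plus drops entries whose count is zero

def sum_x(r):
    """Q() :- SUM[X] ( R(X) * {X} ): weight each value by its multiplicity."""
    return sum(m * x for (x,), m in r.items())

R1 = Counter({(1,): 2, (3,): 1})
S1 = Counter({(1,): 1, (5,): 4})
print(bag_union(R1, S1), bag_join(R1, S1), sum_x(R1))
```

Tuples absent from a bag simply have multiplicity 0, which is why the join only needs to visit tuples present on both sides.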
+- Updates (Given R, S, T, ... -> R', S', T', ...)
+  - Insert-only: 
+    - Given \Delta R s.t. R' = R U \Delta R
+    - Given Q(R, S, T, ...), derive 
+      \Delta_R Q(R, S, T, ..., \Delta R) s.t. 
+      Q(R', S, T, ...) = Q(R, S, T, ...) U \Delta_R Q(R, S, T, ..., \Delta R)
+    - For single-row inserts, can sometimes reduce to non-collection operations.
+  - Deletion: Allow value domain to be negative
+    - Now R U \Delta R with negative-valued \Delta R entries represents deletions
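
To make the delta idea concrete, here is a small sketch for one specific query, assumed for illustration: Q(R, S) counts the rows of R(A,B) joined with S(B). Relations are plain dicts from tuple (or key) to a signed multiplicity, so a negative entry in the delta stands for a deletion. All names here are hypothetical:

```python
def join_count(r, s):
    """Q(R, S): total multiplicity of R(A, B) joined with S(B)."""
    return sum(m * s.get(b, 0) for (a, b), m in r.items())

def apply_delta(r, delta):
    """R' = R U delta R, adding signed multiplicities and dropping zeros."""
    out = dict(r)
    for t, m in delta.items():
        out[t] = out.get(t, 0) + m
        if out[t] == 0:
            del out[t]
    return out

def delta_join_count(delta_r, s):
    """Delta_R Q: this query is linear in R, so evaluate it on delta R alone."""
    return join_count(delta_r, s)

R = {("x", 1): 1, ("y", 2): 2}
S = {1: 3, 2: 1}
delta_R = {("z", 1): 1, ("y", 2): -1}   # one insertion, one deletion

old = join_count(R, S)
new = join_count(apply_delta(R, delta_R), S)
print(old, delta_join_count(delta_R, S), new)
```

The delta touches only the changed rows, so its cost depends on the size of the update rather than on the size of R.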
+- Recursion
+  - Q(A) = R(A,B) x S(B,C) x T(C,D)
+  - \Delta_R[A',B'] Q(A') = S(B',C) x T(C, D)
+  - MaterializedQ[A'] += S(B',C) x T(C, D)
+    - But what if we had a materialized version of the RHS?
+    - Only care about B', can project away C, D
+  - MaterializedQ[A'] += MaterializedDRQ[B']
+    - Requires maintaining DRQ under \Delta S, \Delta T
+  - Full lattice of tables
+  - Slight optimization: Factorize cross-products.
+  - Recursive view maintenance breaks in several cases:
+    - Step functions (e.g., COUNT WHERE SUM(X) > 5, or EXISTS) or non-algebraic aggregates (AVERAGE)
+      - Can't "update".  Need to recompute.
+        - Solution: Stage the computation:
+            1. Update everything that can be updated,
+            2. Compute the updated version,
+            3. Propagate updates and repeat from 1
+        - Outline the Lift operator + Semantics
+    - High treewidth queries. (e.g., dense graphical models)
+      - Bad tradeoff between making updates cheaper and requiring more 
+        updates.  Dense relations/high treewidth queries explode the 
+        number of rows required.  
+      - Alternative approach: Pick one (or more) join path based on update workload
+    - Semiring queries (Min/Max)
+      - No efficient deletions.
+  -
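
The recursive scheme for Q(A) = R(A,B) x S(B,C) x T(C,D) can be sketched as a small class that keeps MaterializedQ plus the auxiliary views needed to update it. This is an illustrative sketch under assumptions, not DBToaster's actual generated code: the class name, the choice of auxiliary views (`drq`, `t_c`, and two index maps), and the `m` multiplicity parameter (negative for deletions) are all made up for the example.

```python
from collections import defaultdict

class RecursiveView:
    """Maintains Q[a] = sum over b, c, d of R(a,b) * S(b,c) * T(c,d)."""

    def __init__(self):
        self.q = defaultdict(int)      # MaterializedQ[a]
        self.drq = defaultdict(int)    # DRQ[b] = sum over c, d of S(b,c) * T(c,d)
        self.t_c = defaultdict(int)    # sum over d of T(c,d), with D projected away
        self.r_by_b = defaultdict(lambda: defaultdict(int))  # b -> {a: R(a,b)}
        self.s_by_c = defaultdict(lambda: defaultdict(int))  # c -> {b: S(b,c)}

    def update_r(self, a, b, m=1):
        # MaterializedQ[a] += MaterializedDRQ[b]: one lookup, no join.
        self.q[a] += m * self.drq[b]
        self.r_by_b[b][a] += m

    def update_s(self, b, c, m=1):
        delta = m * self.t_c[c]        # change to DRQ[b] caused by the new S row
        self.drq[b] += delta
        for a, r_mult in self.r_by_b[b].items():
            self.q[a] += r_mult * delta
        self.s_by_c[c][b] += m

    def update_t(self, c, d, m=1):
        self.t_c[c] += m
        for b, s_mult in self.s_by_c[c].items():
            delta = m * s_mult         # change to DRQ[b] caused by the new T row
            self.drq[b] += delta
            for a, r_mult in self.r_by_b[b].items():
                self.q[a] += r_mult * delta

view = RecursiveView()
view.update_t(10, "d1")
view.update_t(10, "d2")
view.update_s(1, 10)
view.update_r("a", 1)
view.update_r("b", 2)
view.update_s(2, 10)
view.update_t(11, "d3")
view.update_s(1, 11)
print(dict(view.q))
```

Inserts into R become constant-time lookups, at the price of extra work and extra state whenever S or T changes; that is the tradeoff the high-treewidth bullet above refers to.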