notes
This commit is contained in:
parent
660e54468a
commit
97addf7418
124
src/teaching/cse-662/2019fa/slide/2019-09-12-DBToaster.erb
@ -0,0 +1,124 @@
=== Programming with Collections ===

- Datalog Recap
- Propositional Calculus (0th order logic)
- Facts (P), Basic operations: (Not, And, Or), Implication
- Example Facts: AliceWentToTheStore, BobWentToTheStore, AliceWentToHome
- Implication:
- (if P and Q then R)
- === R \or \not P \or \not Q
- === “Horn Clause”
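The equivalence between the implication and its clause form can be checked by brute force over all truth assignments. A minimal sketch; the function names are made up for illustration:

```python
from itertools import product

def implication(p: bool, q: bool, r: bool) -> bool:
    """'if P and Q then R', read as a material implication."""
    return (not (p and q)) or r

def horn_clause(p: bool, q: bool, r: bool) -> bool:
    """The clause form from the notes: R or (not P) or (not Q)."""
    return r or (not p) or (not q)

# Compare the two forms on all 8 truth assignments.
equivalent = all(
    implication(p, q, r) == horn_clause(p, q, r)
    for p, q, r in product([False, True], repeat=3)
)
print(equivalent)  # prints True
```

The only assignment that falsifies either form is P and Q true with R false.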
- 1st order logic
- Goal: Quantification
- Challenge: need a way to enumerate classes/sets of facts
- Groups of facts: WentTo(Store, Alice), WentTo(Store, Bob), WentTo(Home, Alice)
- For all facts in a group (\forall) a property holds
- There exists a fact in a group (\exists) such that a property holds
- New way to discuss implication:
- Given one or more facts P(X,Y), Q(X), …, “infer” a new fact R(Y)
- If WentTo(X, Y) and ShoppingAt(X) then ShoppingDone(Y)
- \forall Y: If (\exists X: P(X,Y) and Q(X)) then R(Y)
- \forall Y, \forall X: \not P(X,Y) \or \not Q(X) \or R(Y)
- Which elements of R must be true?
- SELECT Y FROM P NATURAL JOIN Q INTO R
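The "join, then project" reading of the rule can be sketched over sets of tuples. The WentTo facts are the ones listed above; the single ShoppingAt fact and the function name are assumptions added for illustration:

```python
# Facts as sets of tuples. Argument order follows the notes: WentTo(place, person).
went_to = {("Store", "Alice"), ("Store", "Bob"), ("Home", "Alice")}
shopping_at = {("Store",)}  # assumed fact, not from the notes

def infer_shopping_done(went_to, shopping_at):
    """ShoppingDone(Y) holds if some X has WentTo(X, Y) and ShoppingAt(X)."""
    return {
        (y,)
        for (x, y) in went_to       # scan P(X, Y)
        if (x,) in shopping_at      # join with Q(X) on X
    }                               # keep only Y (projection)

shopping_done = infer_shopping_done(went_to, shopping_at)
print(sorted(shopping_done))  # [('Alice',), ('Bob',)]
```

Because the result is a set, Alice appears once even if several places would justify the inference.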
- Datalog
- R(Y) -= P(X,Y), Q(X)
- Head: R(Y); Body: P(X,Y), Q(X)
- Find values of Y for which R is true?
- Find a value of X for which P(X,Y) and Q(X) are true
- What about R(Y, Z) -= P(X,Y), Q(X)?
- Alternative View:
- R(Y) is a function
- Dom => Bool
- Given Y, find a value of X for which P(X,Y) and Q(X) evaluate to true.
- Support
- Support: The set of values of Y for which R(Y) evaluates to true
- Finite Support: The support set has a finite size
- If P(X,Y), Q(X) have finite support, so does R(Y)
- actually, we can do a bit better… to be discussed shortly
- Natural consequence:
- R(Y,Z) is true for any value of Z as long as R(Y) would be true.
- R(Y,Z) has an infinite support!
- Z is “unsafe” or “unbound”
- Y is “bound” or “safe”
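The function view and the unbound variable can be illustrated with plain predicates. A sketch on made-up sample data; `R2` stands in for R(Y,Z) and is a name chosen here, not one from the notes:

```python
P = {(1, "a"), (2, "b"), (3, "c")}   # P(X, Y), sample data
Q = {(1,), (3,)}                      # Q(X), sample data

def R(y) -> bool:
    """R(Y) as a function Dom -> Bool: true if some X satisfies P(X, Y) and Q(X)."""
    return any((x,) in Q for (x, y2) in P if y2 == y)

def R2(y, z) -> bool:
    """R(Y, Z) with the same body: Z never appears in the body, so it is unconstrained."""
    return R(y)

# Every Y in the support of R occurs in some P tuple, so scanning P enumerates it.
support_R = {y for (_, y) in P if R(y)}
print(sorted(support_R))  # ['a', 'c']

# R2 accepts arbitrary Z, so no finite scan can list its support.
print(R2("a", 0), R2("a", "anything"), R2("b", 0))  # True True False
```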
- Safety and Support
- Assume we have a S(Y,Z) with finite support.
- R(Y,Z) does not have finite support
- What about ( S(Y,Z) and R(Y,Z) )?
- Interestingly enough, this actually does have finite support: Because Z is safe in S, it does not need to be safe in R.
- In general, a variable is safe in a conjunction of terms IFF it is safe in at least one of the terms.
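One way to read the safety rule operationally: enumerate the term in which the variable is safe, and use the unsafe term only as a test. A sketch on made-up sample data, with a hypothetical helper `conjunction_support`:

```python
P = {(1, "a"), (2, "b"), (3, "c")}       # P(X, Y), sample data
Q = {(1,), (3,)}                          # Q(X), sample data
S = {("a", 10), ("a", 20), ("b", 30)}     # S(Y, Z), finite support

def R(y, z) -> bool:
    """R(Y, Z) with body P(X, Y), Q(X). Z is unsafe: any z is accepted."""
    return any((x,) in Q for (x, y2) in P if y2 == y)

def conjunction_support(S, R):
    """Support of (S(Y, Z) and R(Y, Z)): enumerate the safe side, test the unsafe side."""
    return {(y, z) for (y, z) in S if R(y, z)}

print(sorted(conjunction_support(S, R)))  # [('a', 10), ('a', 20)]
```

The result can never be larger than S, which is why the conjunction has finite support even though R alone does not.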
- What else can you do with this idea?
- Functions F(X) -> ???
- What about other functions?
- F(X,Y) -> true if (X < Y)
- How about functions to natural numbers?
- Simple way to express Bags!
- R(X) -> N = number of instances of X in R (multiplicity)
- Leads to some interesting math:
- R(X) U S(X) === R(X) + S(X)
- R(X) |><| S(X) === R(X) * S(X)
- Q(X) -= R(X,Y) * S(X, Y) === Aggregation: Count Group-By X of R(X,Y) |><| S(X,Y)
- SELECT X, COUNT(*) FROM R NATURAL JOIN S GROUP BY X
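The bag arithmetic above maps directly onto `collections.Counter`, which returns 0 for missing keys. A minimal sketch with made-up data; the function names are illustrative:

```python
from collections import Counter

# A bag maps each tuple to its multiplicity; missing tuples map to 0.
R = Counter({("a", 1): 2, ("b", 1): 1})   # R(X, Y), sample data
S = Counter({("a", 1): 3, ("b", 2): 1})   # S(X, Y), sample data

def bag_union(r, s):
    """R U S: multiplicities add."""
    return r + s

def natural_join_same_schema(r, s):
    """R |><| S when both share all columns: multiplicities multiply."""
    return Counter({t: r[t] * s[t] for t in r if s[t] > 0})

def count_group_by_x(r, s):
    """Q(X) = sum over Y of R(X, Y) * S(X, Y)."""
    q = Counter()
    for (x, _y), mult in natural_join_same_schema(r, s).items():
        q[x] += mult
    return q

print(bag_union(R, S)[("a", 1)])      # 5
print(count_group_by_x(R, S)["a"])    # 6
```

Two copies of ("a", 1) in R joined with three copies in S give six joined rows, which is what COUNT(*) grouped by X would report for X = "a".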
- How about real numbers?
- R(X) -> Multiplicity
- F(X) -> {X}
- Q() -= SUM[X] ( R(X) * {X} )
- SELECT SUM(X) FROM R
- Q() -= SUM[X] ( R(X,Y) * S(Y,Z) * {Z < 10} * {X} )
- SELECT SUM(R.X) FROM R NATURAL JOIN S WHERE S.Z < 10
- OR: Aggregate { Start with R, Join with S, Filter on Z < 10, Multiply multiplicity by X }
- Sequence of transformations, each modifying the output of the last
- Pipelining technique sometimes referred to as a “Monad”
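The SUM query can be read as the pipeline described above, where the filter and the value are both just extra factors on the multiplicity. A sketch with made-up data, assuming bags are dicts from tuples to multiplicities; `q_sum` is a name chosen here:

```python
# Bags as dicts from tuples to multiplicities (sample data).
R = {(5, "p"): 2, (7, "q"): 1}                  # R(X, Y)
S = {("p", 3): 1, ("q", 12): 4, ("q", 8): 1}    # S(Y, Z)

def q_sum(R, S):
    """Q() = SUM( R(X,Y) * S(Y,Z) * {Z < 10} * {X} )."""
    total = 0
    for (x, y), r_mult in R.items():              # start with R
        for (y2, z), s_mult in S.items():         # join with S on Y
            if y2 != y:
                continue
            keep = 1 if z < 10 else 0             # {Z < 10}: filter as a 0/1 factor
            total += r_mult * s_mult * keep * x   # {X}: scale the multiplicity by the value
    return total

print(q_sum(R, S))  # 17
```

Here the two copies of (5, "p") contribute 10, the row ("q", 12) is zeroed out by the filter, and (7, "q") joined with ("q", 8) contributes 7.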
- Key insight: Operations on the values commute through operations on the relation functions. Q(F(X)) = F(Q(X))
- Updates (Given R, S, T, ... -> R', S', T', ...)
- Insert-only:
- Given \Delta R s.t. R' = R U \Delta R
- Given Q(R, S, T, ...), derive \Delta_R Q(R, S, T, ..., \Delta R) s.t. Q(R', S, T, ...) = Q(R, S, T, ...) U \Delta_R Q(R, S, T, ..., \Delta R)
- For single-row inserts, can sometimes reduce to non-collection operations.
- Deletion: Allow value domain to be negative
- Now R U \Delta R with negative-valued \Delta R entries represents deletions
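The delta identity can be checked on a small example. This sketch assumes a hypothetical query Q(A) = sum over B of R(A,B) * S(B), chosen here because it is linear in R, and uses plain dicts so that multiplicities may go negative:

```python
from collections import defaultdict

def bag_add(r, s):
    """R U S on signed multiplicities; entries that cancel to 0 are dropped."""
    out = defaultdict(int)
    for bag in (r, s):
        for t, mult in bag.items():
            out[t] += mult
    return {t: mult for t, mult in out.items() if mult != 0}

def query(R, S):
    """Q(A) = sum over B of R(A, B) * S(B)."""
    out = defaultdict(int)
    for (a, b), mult in R.items():
        out[(a,)] += mult * S.get((b,), 0)
    return {t: mult for t, mult in out.items() if mult != 0}

def delta_query(delta_R, S):
    """Delta_R Q: the query is linear in R, so the delta is Q evaluated on Delta R."""
    return query(delta_R, S)

R = {("a", 1): 1, ("a", 2): 1, ("b", 2): 1}
S = {(1,): 2, (2,): 3}
delta_R = {("b", 1): 1, ("a", 2): -1}   # insert ("b", 1), delete ("a", 2)

recomputed = query(bag_add(R, delta_R), S)
incremental = bag_add(query(R, S), delta_query(delta_R, S))
print(recomputed == incremental)  # prints True
```

The deletion needs no special case: the -1 entry flows through the same multiplication and addition as an insert.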
- Recursion
- Q(A) = R(A,B) x S(B,C) x T(C,D)
- \Delta_R[A',B'] Q(A') = S(B',C) x T(C, D)
- MaterializedQ[A'] += S(B',C) x T(C, D)
- But what if we had a materialized version of the RHS?
- Only care about B', can project away C, D
- MaterializedQ[A'] += MaterializedDRQ[B']
- Requires maintaining DRQ: \Delta S, \Delta T
- Full lattice of tables
- Slight optimization: Factorize cross-products.
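A rough sketch of the recursive scheme for the three-way join above. To stay short it makes a simplifying assumption that is not in the notes: T is loaded once and never changes, so only inserts into R and S are handled. The class and attribute names are hypothetical; `ST` plays the role of MaterializedDRQ.

```python
from collections import defaultdict

class ChainJoinView:
    """Maintains Q(A) = sum over B, C, D of R(A,B) * S(B,C) * T(C,D) under single-row inserts."""

    def __init__(self, T_rows):
        self.Q = defaultdict(int)      # materialized result, keyed by A
        self.ST = defaultdict(int)     # auxiliary view: sum over C, D of S(B,C) * T(C,D), keyed by B
        self.T_c = defaultdict(int)    # auxiliary view: sum over D of T(C,D), keyed by C
        self.R_by_b = defaultdict(lambda: defaultdict(int))  # R indexed by B: B -> {A: multiplicity}
        for (c, _d) in T_rows:
            self.T_c[c] += 1

    def insert_R(self, a, b):
        # Delta_R Q(a) = ST[b]: one lookup instead of a two-way join.
        self.Q[a] += self.ST[b]
        self.R_by_b[b][a] += 1

    def insert_S(self, b, c):
        delta = self.T_c[c]            # Delta_S ST[b] = T_c[c]
        self.ST[b] += delta
        for a, mult in self.R_by_b[b].items():
            self.Q[a] += mult * delta  # Delta_S Q(a) = R(a, b) * T_c[c]

def recompute(R_rows, S_rows, T_rows):
    """Reference implementation: evaluate the three-way join from scratch."""
    Q = defaultdict(int)
    for (a, b) in R_rows:
        for (b2, c) in S_rows:
            for (c2, _d) in T_rows:
                if b == b2 and c == c2:
                    Q[a] += 1
    return dict(Q)

T_rows = [(1, "x"), (1, "y"), (2, "z")]
view = ChainJoinView(T_rows)
R_rows, S_rows = [], []
updates = [("S", ("b1", 1)), ("R", ("a1", "b1")), ("R", ("a2", "b1")),
           ("S", ("b1", 2)), ("R", ("a1", "b2")), ("S", ("b2", 2))]
for kind, row in updates:
    if kind == "R":
        R_rows.append(row)
        view.insert_R(*row)
    else:
        S_rows.append(row)
        view.insert_S(*row)

maintained = {a: mult for a, mult in view.Q.items() if mult != 0}
print(maintained == recompute(R_rows, S_rows, T_rows))  # prints True
```

Supporting inserts into T as well would need further views (for example one over R joined with S), which is where the full lattice of tables comes from.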
- Recursive view maintenance breaks in several cases:
- Step functions (e.g., COUNT WHERE SUM(X) > 5, or EXISTS) or non-algebraic aggs (AVERAGE)
- Can't "update". Need to recompute.
- Solution: Stage:
1. Update everything that can be updated,
2. Compute updated version,
3. Propagate updates and repeat from 1
- Outline the Lift operator + Semantics
- High treewidth queries. (e.g., dense graphical models)
- Bad tradeoff between making updates cheaper and requiring more updates. Dense relations/high treewidth queries explode the number of rows required.
- Alternative approach: Pick one (or more) join path based on update workload
- Semiring queries (Min/Max)
- No efficient deletions.
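The asymmetry for Min/Max can be seen in a small sketch. `MinView` is a hypothetical helper, and it deliberately keeps no extra state such as a sorted index, which is the setting the bullet above refers to:

```python
class MinView:
    """Maintains MIN(X) over a bag of values (illustrative helper, not from the notes)."""

    def __init__(self):
        self.values = []          # base data, kept because deletions may need it
        self.current_min = None

    def insert(self, x):
        self.values.append(x)
        # Inserts are incremental: one comparison against the old aggregate suffices.
        if self.current_min is None or x < self.current_min:
            self.current_min = x

    def delete(self, x):
        self.values.remove(x)
        # min has no inverse: if the minimum itself is removed, the old
        # aggregate says nothing about the next-smallest value, so rescan.
        if x == self.current_min:
            self.current_min = min(self.values) if self.values else None

view = MinView()
for x in [5, 3, 8]:
    view.insert(x)
view.delete(3)
print(view.current_min)  # 5
```

With SUM or COUNT, a deletion is just a negative delta; here the delete of the current minimum forces a pass over the base data.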