Oliver Kennedy 2019-09-11 23:16:31 -04:00
parent 660e54468a
commit 97addf7418
Signed by: okennedy
GPG key ID: 3E5F9B3ABD3FDB60

@ -0,0 +1,124 @@
=== Programming with Collections ===
- Datalog Recap
- Propositional Calculus (0th order logic)
- Facts (P), Basic operations: (Not, And, Or), Implication
- Example Facts: AliceWentToTheStore, BobWentToTheStore,
AliceWentToHome
- Implication:
- (if P and Q then R)
- === R \or \not P \or \not Q
- === “Horn Clause”
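The equivalence between the implication and its clause form can be checked exhaustively over all truth assignments. A minimal Python sketch; the function names `implication` and `horn_clause` are illustrative and not from the notes:

```python
from itertools import product

def implication(p: bool, q: bool, r: bool) -> bool:
    # "if P and Q then R" fails only when P and Q hold but R does not.
    return not (p and q and not r)

def horn_clause(p: bool, q: bool, r: bool) -> bool:
    # Clause form from the notes: R or (not P) or (not Q).
    return r or (not p) or (not q)

# Compare the two forms on all 8 truth assignments.
equivalent = all(
    implication(p, q, r) == horn_clause(p, q, r)
    for p, q, r in product([False, True], repeat=3)
)
print(equivalent)  # True
```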
- 1st order logic
- Goal: Quantification
- Challenge, need way to enumerate classes/sets of facts
- Groups of facts: WentTo(Store, Alice), WentTo(Store, Bob),
WentTo(Home, Alice)
- For all facts in a group (\forall) a property holds
- There exists a fact in a group (\exists) such that a property
holds
- New way to discuss implication:
- Given one or more facts P(X,Y), Q(X), …., “infer” a new
fact R(Y)
- If WentTo(X, Y) and ShoppingAt(X) then ShoppingDone(Y)
- \forall Y: If (\exists X: P(X,Y) and Q(X)) then R(Y)
- \forall Y, \forall X: \not P(X,Y) \or \not Q(X) \or R(Y)
- Which elements of R must be true?
- SELECT Y FROM P NATURAL JOIN Q INTO R
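As a rough illustration of "which elements of R must be true", the inference can be evaluated as a join over sets of facts. The sketch below reuses the WentTo facts from above; the single ShoppingAt(Store) fact and the function name `infer_shopping_done` are assumptions made for the example:

```python
# P(X, Y) = WentTo(X, Y), taken from the example facts in the notes.
went_to = {("Store", "Alice"), ("Store", "Bob"), ("Home", "Alice")}
# Q(X) = ShoppingAt(X); this one fact is assumed for illustration.
shopping_at = {"Store"}

def infer_shopping_done(p, q):
    # R(Y) holds when some X satisfies both P(X, Y) and Q(X),
    # i.e. project Y out of the natural join of P and Q.
    return {y for (x, y) in p if x in q}

shopping_done = infer_shopping_done(went_to, shopping_at)
print(sorted(shopping_done))  # ['Alice', 'Bob']
```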
- Datalog
- R(Y) :- P(X,Y), Q(X)
- Head: R(Y); Body: P(X,Y), Q(X)
- Find values of Y for which R is true?
- Find a value of X for which P(X,Y) and Q(X) are true
- What about R(Y, Z) :- P(X,Y), Q(X)
- Alternative View:
- R(Y) is a function
- Dom => Bool
- Given Y, find a value of X for which P(X,Y) and Q(X)
evaluate to true.
- Support
- Support: The set of values of Y for which R(Y)
evaluates to true
- Finite Support: The support set has a finite size
- If P(X,Y), Q(X) have finite support, so does R(Y)
- actually, we can do a bit better… to be discussed
shortly
- Natural consequence:
- R(Y,Z) is true for any value of Z as long as R(Y)
would be true.
- R(Y,Z) has an infinite support!
- Z is “unsafe” or “unbound”
- Y is “bound” or “safe”
- Safety and Support
- Assume we have a S(Y,Z) with finite support.
- R(Y,Z) does not have finite support
- What about ( S(Y,Z) and R(Y,Z) )
- Interestingly enough, this actually does have finite
support: Because Z is safe in S, it does not need to be
safe in R.
- In general, a variable is safe in a conjunction of terms
IFF it is safe in at least one of the terms.
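The safety rule for conjunctions can be sketched as a small check. This is a simplified model, not a full Datalog safety analysis: each term is assumed to declare up front which of its variables it binds to finitely many values, and the `Term` class and function names are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    name: str
    variables: frozenset  # every variable the term mentions
    safe: frozenset       # the subset the term binds to finitely many values

def safe_in_conjunction(terms):
    # A variable is safe in the conjunction iff it is safe in at least one term.
    result = set()
    for term in terms:
        result |= term.safe
    return result

def has_finite_support(terms):
    # Assumption of this sketch: the conjunction has finite support
    # when every variable it mentions is safe.
    mentioned = set()
    for term in terms:
        mentioned |= term.variables
    return mentioned <= safe_in_conjunction(terms)

# R(Y,Z): Y is safe, Z is not. S(Y,Z): both are safe.
r = Term("R", frozenset({"Y", "Z"}), frozenset({"Y"}))
s = Term("S", frozenset({"Y", "Z"}), frozenset({"Y", "Z"}))
print(has_finite_support([r]))     # False
print(has_finite_support([r, s]))  # True
```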
- What else can you do with this idea?
- Functions F(X) -> ???
- What about other functions?
- F(X,Y) -> true if (X < Y)
- How about to natural numbers?
- Simple way to express Bags!
- R(X) -> N = number of instances of X in R
(multiplicity)
- Leads to some interesting math:
- R(X) U S(X) === R(X) + S(X)
- R(X) |><| S(X) === R(X) * S(X)
- Q(X) :- R(X,Y) * S(X, Y) === Aggregation: Count
Group-By X of R(X,Y) |><| S(X,Y)
- SELECT X, COUNT(*) FROM R NATURAL JOIN S GROUP BY X
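The bag arithmetic above can be tried out with multiplicity maps. A minimal sketch using Python's `collections.Counter`, with made-up sample data and hypothetical function names; it assumes non-negative multiplicities:

```python
from collections import Counter

def bag_union(r, s):
    # R(X) U S(X) === R(X) + S(X): multiplicities add.
    return r + s

def bag_join(r, s):
    # R(X) |><| S(X) === R(X) * S(X): multiplicities multiply.
    return Counter({x: r[x] * s[x] for x in r if x in s})

def count_group_by_x(r_xy, s_xy):
    # Q(X) :- R(X,Y) * S(X,Y): sum the products over Y for each X.
    q = Counter()
    for (x, y), mult in r_xy.items():
        if (x, y) in s_xy:
            q[x] += mult * s_xy[(x, y)]
    return q

r = Counter({"a": 2, "b": 1})
s = Counter({"a": 1, "c": 3})
print(sorted(bag_union(r, s).items()))  # [('a', 3), ('b', 1), ('c', 3)]
print(sorted(bag_join(r, s).items()))   # [('a', 2)]

r_xy = Counter({(1, "u"): 2, (1, "v"): 1, (2, "u"): 1})
s_xy = Counter({(1, "u"): 3, (1, "v"): 1, (3, "u"): 5})
print(sorted(count_group_by_x(r_xy, s_xy).items()))  # [(1, 7)]
```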
- How about real numbers?
- R(X) -> Multiplicity
- F(X) -> {X}
- Q() :- SUM[X] ( R(X) * {X} )
- SELECT SUM(X) FROM R
- Q() :- SUM[X] ( R(X,Y) * S(Y,Z) * {Z < 10} * {X} )
- SELECT SUM(R.X) FROM R NATURAL JOIN S WHERE S.Z < 10
- OR: Aggregate { Start with R, Join with S, Filter on
Z < 10, Multiply multiplicity by X }
- Sequence of transformations, each modifying the
output of the last
- Pipelining technique sometimes referred to as a
“Monad”
- Key insight: Operations on the values commute through operations
on the relation functions. Q(F(X)) = F(Q(X))
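The two SUM queries above can be read as "multiply everything together, then add up". A small sketch under the assumption that relations are dictionaries from tuples to multiplicities; the sample data and function names are invented for illustration:

```python
def sum_x(r):
    # Q() :- SUM[X]( R(X) * {X} ): each value X counts once per copy.
    return sum(mult * x for x, mult in r.items())

def sum_x_filtered(r_xy, s_yz):
    # Q() :- SUM( R(X,Y) * S(Y,Z) * {Z < 10} * {X} )
    # Start with R, join with S, filter on Z < 10, weight by X.
    total = 0
    for (x, y), mult_r in r_xy.items():
        for (y2, z), mult_s in s_yz.items():
            if y == y2:
                indicator = 1 if z < 10 else 0  # {Z < 10} as 0 or 1
                total += mult_r * mult_s * indicator * x
    return total

r = {3: 2, 5: 1}  # two copies of 3, one copy of 5
print(sum_x(r))   # 11

r_xy = {(3, "a"): 2, (5, "b"): 1}
s_yz = {("a", 4): 1, ("a", 20): 1, ("b", 7): 2}
print(sum_x_filtered(r_xy, s_yz))  # 16
```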
- Updates (Given R, S, T, ... -> R', S', T', ...)
- Insert-only:
- Given \Delta R s.t. R' = R U \Delta R
- Given Q(R, S, T, ...), derive
\Delta_R Q(R, S, T, ..., \Delta R) s.t.
Q(R', S, T, ...) = Q(R, S, T, ...) U \Delta_R Q(R, S, T, ..., \Delta R)
- For single-row inserts, can sometimes reduce to non-collection operations.
- Deletion: Allow value domain to be negative
- Now R U \Delta R with negative-valued \Delta R entries represents deletions
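A sketch of the delta idea for a single join Q(X) = R(X) * S(X), assuming relations are dictionaries from values to integer multiplicities and that a negative multiplicity in \Delta R encodes a deletion. The helper names `add`, `join`, and `delta_r_join` are hypothetical:

```python
def add(r, delta):
    # R U \Delta R: add multiplicities and drop entries that cancel to zero.
    out = dict(r)
    for key, mult in delta.items():
        out[key] = out.get(key, 0) + mult
        if out[key] == 0:
            del out[key]
    return out

def join(r, s):
    # Q(X) = R(X) * S(X)
    return {x: r[x] * s[x] for x in r if x in s}

def delta_r_join(delta_r, s):
    # \Delta_R Q = \Delta R(X) * S(X); S is unchanged by the update.
    return join(delta_r, s)

r = {"a": 2, "b": 1}
s = {"a": 3, "b": 4}
delta_r = {"a": 1, "b": -1}  # insert one "a", delete one "b"

recomputed = join(add(r, delta_r), s)
incremental = add(join(r, s), delta_r_join(delta_r, s))
print(recomputed)   # {'a': 9}
print(incremental)  # {'a': 9}
```

The incremental path only touches the rows mentioned in \Delta R, which is the point of deriving the delta query.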
- Recursion
- Q(A) = R(A,B) x S(B,C) x T(C,D)
- \Delta_R[A',B'] Q(A') = S(B',C) x T(C, D)
- MaterializedQ[A'] += S(B',C) x T(C, D)
- But what if we had a materialized version of the RHS?
- Only care about B', can project away C, D
- MaterializedQ[A'] += MaterializedDRQ[B']
- Requires maintaining DRQ: \Delta S, \Delta T
- Full lattice of tables
- Slight optimization: Factorize cross-products.
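One possible reading of the recursive scheme, as a sketch: keep Q[A] materialized together with the auxiliary view DRQ[B] = SUM over C, D of S(B,C) * T(C,D), plus one more level (T projected onto C) so that DRQ can itself be maintained. The class, its fields, and the sample updates are all assumptions for illustration; the inner scans stand in for index lookups:

```python
from collections import defaultdict

class MaintainedQ:
    """Q(A) = SUM over B, C, D of R(A,B) * S(B,C) * T(C,D)."""

    def __init__(self):
        self.r = defaultdict(int)       # (a, b) -> multiplicity
        self.s = defaultdict(int)       # (b, c) -> multiplicity
        self.t = defaultdict(int)       # (c, d) -> multiplicity
        self.t_by_c = defaultdict(int)  # c -> SUM over D of T(c, D)
        self.drq = defaultdict(int)     # b -> SUM over C, D of S(b,C) * T(C,D)
        self.q = defaultdict(int)       # a -> Q(a)

    def _propagate_to_q(self, b, change):
        # A change to DRQ[b] reaches Q through every R(a, b).
        for (a, b2), mult_r in self.r.items():
            if b2 == b:
                self.q[a] += mult_r * change

    def update_r(self, a, b, mult=1):
        # MaterializedQ[A'] += MaterializedDRQ[B']
        self.r[(a, b)] += mult
        self.q[a] += mult * self.drq[b]

    def update_s(self, b, c, mult=1):
        self.s[(b, c)] += mult
        change = mult * self.t_by_c[c]
        self.drq[b] += change
        self._propagate_to_q(b, change)

    def update_t(self, c, d, mult=1):
        self.t[(c, d)] += mult
        self.t_by_c[c] += mult
        for (b, c2), mult_s in list(self.s.items()):
            if c2 == c:
                change = mult_s * mult
                self.drq[b] += change
                self._propagate_to_q(b, change)

    def recompute(self):
        # Reference answer computed from scratch, for comparison only.
        q = defaultdict(int)
        for (a, b), mr in self.r.items():
            for (b2, c), ms in self.s.items():
                for (c2, d), mt in self.t.items():
                    if b == b2 and c == c2:
                        q[a] += mr * ms * mt
        return {k: v for k, v in q.items() if v != 0}

view = MaintainedQ()
view.update_t(1, "d1")
view.update_s("b", 1)
view.update_r("a", "b")
view.update_t(1, "d2")
view.update_r("a2", "b")
view.update_s("b", 2)
view.update_t(2, "d3")
maintained = {k: v for k, v in view.q.items() if v != 0}
print(maintained == view.recompute())  # True
```

Passing a negative `mult` models a deletion, in line with the signed multiplicities described under Updates.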
- Recursive view maintenance breaks in several cases:
- Step functions (e.g., COUNT WHERE SUM(X) > 5, or EXISTS) or non-algebraic aggs (AVERAGE)
- Can't "update". Need to recompute.
- Solution: Stage the computation:
1. Update everything that can be updated,
2. Compute the updated version,
3. Propagate updates and repeat from 1
- Outline the Lift operator + Semantics
- High treewidth queries. (e.g., dense graphical models)
- Bad tradeoff between making updates cheaper and requiring more
updates. Dense relations/high treewidth queries explode the
number of rows required.
- Alternative approach: Pick one (or more) join path based on update workload
- Semiring queries (Min/Max)
- No efficient deletions.
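A small sketch of why MIN resists deletions: min has no inverse, so removing the current minimum cannot be "subtracted" from the materialized value and forces a look back at the base data. The `MaintainedMin` class is hypothetical, and rescanning a list is only the simplest possible fallback:

```python
class MaintainedMin:
    def __init__(self):
        self.values = []     # base data, kept so deletions can recompute
        self.current = None  # materialized MIN, or None when empty

    def insert(self, value):
        # Inserts are cheap: compare against the materialized minimum.
        self.values.append(value)
        if self.current is None or value < self.current:
            self.current = value

    def delete(self, value):
        # Deleting the current minimum cannot be undone algebraically,
        # so this sketch falls back to rescanning the base data.
        self.values.remove(value)
        if value == self.current:
            self.current = min(self.values) if self.values else None

m = MaintainedMin()
for v in (5, 3, 8):
    m.insert(v)
print(m.current)  # 3
m.delete(3)
print(m.current)  # 5
```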