110 lines
9.2 KiB
HTML
110 lines
9.2 KiB
HTML
---
|
||
title: "The Viewlet Transform (Part 2: The Naive Viewlet Transform)"
|
||
author: Oliver Kennedy
|
||
---
|
||
Last week, I introduced you to how deltas work in AGCA. To recap
|
||
<pre>∂T(A,B,C,…) = {+/- 1} * (A ^= {X}) * (A ^= {Y}) * (A ^= {Z}) * …</pre>
|
||
<pre>∂S(…) = {0}</pre>
|
||
<pre>∂(Q1 + Q2) = ∂Q1 + ∂Q2</pre>
|
||
<pre>∂(Q1 * Q2) = (Q1 * ∂Q2) + (∂Q1 * Q2) + (∂Q1 * ∂Q2)</pre>
|
||
<pre>∂(AggSum([…], Q)) = AggSum([…], ∂Q)</pre>
|
||
<pre>∂({…}) = {0}</pre>
|
||
<pre>∂(V ^= {…}) = {0} (for the simplified Lift operation only)</pre>
|
||
Now, we've been down in the nitty gritty of AGCA for a while now. Let's pop our heads up for a moment to remember where we're going with all of this.
|
||
|
||
We have a query (let's call it Q) and we want to be able to incrementally maintain it. That is, we want to store a copy of the results of evaluating Q on a database (stored on disk, in memory, anywhere really), and <strong>every time the database changes in some way, we want to update the stored results to match</strong>.
|
||
<h3>Applying the Delta Transform</h3>
|
||
That's fairly easy to do somewhat efficiently if we have these deltas. Let's say we have the query
|
||
<pre>Q := AggSum([], R(A,B) * S(B,C) * A)</pre>
|
||
or in SQL
|
||
<pre>SELECT SUM(A) AS Q FROM R, S WHERE R.B = S.B;</pre>
|
||
Let's say R contains
|
||
<pre><span style="text-decoration: underline;">___A__B______#__</span></pre>
|
||
<pre> < 1, 1 > -> 1</pre>
|
||
<pre> < 1, 2 > -> 1</pre>
|
||
<pre> < 2, 2 > -> 1</pre>
|
||
and S contains
|
||
<pre><span style="text-decoration: underline;">___B__C______#__</span></pre>
|
||
<pre> < 1, 1 > -> 2</pre>
|
||
<pre> < 2, 2 > -> 1</pre>
|
||
For this data, Q = 5.
|
||
|
||
Let's say we insert a new row: S(2,1). We could certainly re-evaluate the entire query from scratch and discover that the new result is Q = 8, but this would be pretty inefficient. Even in the best case, where everything fits in memory, re-evaluating the join requires O(|R| + |S|) work. That's where the deltas come in. The delta of Q tells us how the query results change with respect to a change in the table. So let's take the delta of Q with respect to an insertion of tuple <@Y,@Z> into S.
|
||
<pre>∂Q := ∂(AggSum([], R(A,B) * S(B,C) * {A})</pre>
|
||
<pre> := AggSum([], ∂(R(A,B) * S(B,C) * {A}))</pre>
|
||
<pre> := AggSum([], R(A,B) * ∂(S(B,C) * {A})</pre>
|
||
<pre> + ∂R(A,B) * (S(B,C) * {A})</pre>
|
||
<pre> + ∂R(A,B) * ∂(S(B,C) * {A}))</pre>
|
||
<pre> := AggSum([], R(A,B) * ∂(S(B,C) * {A)</pre>
|
||
<pre> + {0} * (S(B,C) * {A})</pre>
|
||
<pre> + {0} * ∂(S(B,C) * {A}))</pre>
|
||
<pre> := AggSum([], R(A,B) * ∂(S(B,C) * {A}))</pre>
|
||
<pre> := AggSum([], R(A,B) * (S(B,C) * ∂{A}</pre>
|
||
<pre> + ∂S(B,C) * {A}</pre>
|
||
<pre> + ∂S(B,C) * ∂{A}))</pre>
|
||
<pre> := AggSum([], R(A,B) * (S(B,C) * {0}</pre>
|
||
<pre> + ((B ^= {@Y}) * (C ^= {@Z}) * {A})</pre>
|
||
<pre> + ∂S(B,C) * {0}))</pre>
|
||
<pre> := AggSum([], R(A,B) * (B ^= {@Y}) * (C ^= {@Z}) * {A})</pre>
|
||
I'm not going to get into the details of optimizing AGCA expressions (yet), but trust me for now that the following (simpler) query is equivalent
|
||
<pre>∂Q := AggSum([], R(A,@Y) * {A})</pre>
|
||
or in SQL
|
||
<pre>SELECT SUM(A) FROM R WHERE R.B = @Y;</pre>
|
||
Note, by the way, that @Y is a parameter to this delta query. When you evaluate a delta query (for example our delta for insertions into S), these parameters take their value from the tuple being modified (so when you insert <2,1> into S, then @Y = 2). That said, @Y is just a normal variable/column. There's nothing special about it (other than the @ in the name).
|
||
|
||
The delta query tells us how the query results change. If we insert <2,1> into S, then we evaluate the delta query for insertions into S (∂Q above), setting @Y to 2, and @Z to 1.
|
||
<pre>AggSum([], R(A, 2) * {A})</pre>
|
||
… which, for our initial dataset above, gives us ∂Q = 3. To figure out what Q will give us on the modified database (after inserting <2,1> into S), we just add ∂Q to our initial result (5 + 3 = 8).
|
||
<h3>Parameters and AggSums</h3>
|
||
I keep saying that parameters just normal variables, and that there's nothing special about them.
|
||
|
||
That's mostly true. I actually oversimplified a bit on the delta rules.
|
||
|
||
We want these parameters to be visible from the outside so that evaluating ∂Q for a specific insertion (or deletion) essentially amounts to selecting a single row from the output of ∂Q. In other words, the AggSums need to be rewritten slightly so that the parameters appear in the group-by variables (where appropriate). That is, the correct delta with respect to an insertion into S : +S(@Y,@Z) is
|
||
<pre style="margin: 8px;">AggSum([@Y], R(A, @Y) * {A})</pre>
|
||
or in SQL
|
||
<pre style="margin: 8px;">SELECT R.B, SUM(R.A) FROM R GROUP BY R.B;</pre>
|
||
That little hiccup out of the way, let's get to the actual viewlet transform
|
||
<h3>Auxiliary Views</h3>
|
||
The delta query is an improvement over evaluating the entire query from scratch. For this particular example though, we still need to scan over multiple rows of R (even if it is only a small subset of R). We can do even better.
|
||
|
||
Right now, every time the database changes we update Q with ∂Q.
|
||
<pre>ON +S(@X, @Y)</pre>
|
||
<pre> Q += AggSum([@Y], R(A,@Y) * {A})</pre>
|
||
But recall that for any AGCA query Q, ∂Q is just an ordinary, simple, unexceptional AGCA query (no funny business is introduced by the ∂). If we can store ('materialize' to use the technical term) the results of Q, what's to stop us from storing the results of ∂Q? Nothing!
|
||
|
||
Let's say we had another view materialized (let's call it M_S), this time with a group by variable:
|
||
<pre style="margin: 8px;">M_S[Y] := AggSum([Y], R(A, Y) * {A})</pre>
|
||
For the initial dataset above, this would contain
|
||
<pre><span style="text-decoration: underline;">___Y______#__</span></pre>
|
||
<pre> < 1 > -> 1</pre>
|
||
<pre> < 2 > -> 3</pre>
|
||
This view can help us substantially when we need to update Q after an insertion into S. Expressing this update as a trigger:
|
||
<pre>ON +S(@Y, @Z)</pre>
|
||
<pre> Q += M_S[@Y]</pre>
|
||
In other words, to update Q, we just need to look up one row of M_S. The update can be done in constant time! That said, we now have an extra view that we need to maintain. Fortunately, M_S is simpler than Q, and has only one table, in this case R. Whenever R changes, we need to update M_S. Since M_S is defined in terms of a normal, ordinary AGCA expression, we update it in exactly the same way that we update Q, using the delta of M_S. For an insertion of <@X,@Y> into R, this would be:
|
||
<pre>∂M_S[Y] := AggSum([Y], R(A,Y) * {A})</pre>
|
||
<pre> := AggSum([@X, @Y, Y], (A ^= {@X}) * (Y ^= {@Y}) * {A})</pre>
|
||
<pre> := (Y ^= {@Y}) * {@X}</pre>
|
||
Or, expressed as a trigger
|
||
<pre>ON +R(@X, @Y)</pre>
|
||
<pre> M_S[Y] += (Y ^= {@Y}) * {@X}</pre>
|
||
This can be simplified a bit, since the update operation only produces one row
|
||
<pre>ON +R(@X, @Y)</pre>
|
||
<pre> M_S[@Y] += {@X}</pre>
|
||
(also a constant time operation)
|
||
|
||
<span class="Apple-style-span" style="font-size: 14px; font-weight: bold;">Recursive Deltas</span>
|
||
|
||
What's happening here is that we're saving the results of Q and maintaining them with (several instantiations of) ∂Q. The key idea of the viewlet transform is that we can also save the results of ∂Q and maintain them with (several instantiations of) ∂(∂Q). This process repeats recursively, giving us ∂(∂(∂Q)), ∂(∂(∂(∂Q))), and so on.
|
||
|
||
Every time we add another ∂, another table drops out, making it the delta query simpler. After enough repetitions, we end up with a query that doesn't depend on the database at all (e.g., ∂M_S above). At this point, we can stop, since the query can be evaluated in constant time.
|
||
|
||
This is the viewlet transform. <strong>Start by materializing the original query, and then alternate computing its delta(s), and recursively materializing the delta(s)</strong>.
|
||
|
||
I've obscured the issue a bit by not subscripting my ∂s, remember that each delta is taken with respect to a particular event. Q has four deltas: for both insertion and deletion of both R and S (∂<span class="Apple-style-span" style="font-size: 9px;">+R</span>, ∂<span class="Apple-style-span" style="font-size: 9px;">-R</span>, ∂<span class="Apple-style-span" style="font-size: 9px;">+S</span>, ∂<span class="Apple-style-span" style="font-size: 9px;">-S</span>). Similarly, M_S has two deltas: for both the insertion and deletion of R and S.
|
||
|
||
The viewlet transform of Q produces five views: one for Q, and one for each of the first-tier deltas (the second-tier deltas are all constants).
|
||
|
||
As it turns out, we can be even more efficient than that! Furthermore, the procedure I describe above runs into problems with special tables (as a bit of a teaser for this, take a close look at the delta of Q with respect to insertions into R). This <em>naive</em> viewlet transform is insufficient. Next week, I'll start discussing some changes to the naive viewlet transform that make it more practical and efficient.
|