Website/src/rants/2012-07-29-The_Viewlet_Tran...

---
title: "The Viewlet Transform (Part 2: The Naive Viewlet Transform)"
author: Oliver Kennedy
---
Last week, I introduced you to how deltas work in AGCA.  To recap
<pre>∂T(A,B,C,…)       = {+/- 1} * (A ^= {X}) * (A ^= {Y}) * (A ^= {Z}) * …</pre>
<pre>∂S(…)             = {0}</pre>
<pre>∂(Q1 + Q2)        = ∂Q1 + ∂Q2</pre>
<pre>∂(Q1 * Q2)        = (Q1 * ∂Q2) + (∂Q1 * Q2) + (∂Q1 * ∂Q2)</pre>
<pre>∂(AggSum([…], Q)) = AggSum([…], ∂Q)</pre>
<pre>∂({…})            = {0}</pre>
<pre>∂(V ^= {…})       = {0} (for the simplified Lift operation only)</pre>
Now, we've been down in the nitty gritty of AGCA for a while now.  Let's pop our heads up for a moment to remember where we're going with all of this.

We have a query (let's call it Q) and we want to be able to incrementally maintain it.  That is, we want to store a copy of the results of evaluating Q on a database (stored on disk, in memory, anywhere really), and <strong>every time the database changes in some way, we want to update the stored results to match</strong>.
<h3>Applying the Delta Transform</h3>
That's fairly easy to do somewhat efficiently if we have these deltas.  Let's say we have the query
<pre>Q := AggSum([], R(A,B) * S(B,C) * A)</pre>
or in SQL
<pre>SELECT SUM(A) AS Q FROM R, S WHERE R.B = S.B;</pre>
Let's say R contains
<pre><span style="text-decoration: underline;">___A__B______#__</span></pre>
<pre> &lt; 1, 1 &gt; -&gt; 1</pre>
<pre> &lt; 1, 2 &gt; -&gt; 1</pre>
<pre> &lt; 2, 2 &gt; -&gt; 1</pre>
and S contains
<pre><span style="text-decoration: underline;">___B__C______#__</span></pre>
<pre> &lt; 1, 1 &gt; -&gt; 2</pre>
<pre> &lt; 2, 2 &gt; -&gt; 1</pre>
For this data, Q = 5.

Let's say we insert a new row: S(2,1).  We could certainly re-evaluate the entire query from scratch and discover that the new result is Q = 8, but this would be pretty inefficient.  Even in the best case, where everything fits in memory, re-evaluating the join requires O(|R| + |S|) work.  That's where the deltas come in.  The delta of Q tells us how the query results change with respect to a change in the table.  So let's take the delta of Q with respect to an insertion of tuple &lt;@Y,@Z&gt; into S.
<pre>∂Q := ∂(AggSum([], R(A,B) * S(B,C) * {A})</pre>
<pre>   := AggSum([], ∂(R(A,B) * S(B,C) * {A}))</pre>
<pre>   := AggSum([], R(A,B) * ∂(S(B,C) * {A})</pre>
<pre>                 + ∂R(A,B) * (S(B,C) * {A})</pre>
<pre>                 + ∂R(A,B) * ∂(S(B,C) * {A}))</pre>
<pre>   := AggSum([], R(A,B) * ∂(S(B,C) * {A)</pre>
<pre>                 + {0} * (S(B,C) * {A})</pre>
<pre>                 + {0} * ∂(S(B,C) * {A}))</pre>
<pre>   := AggSum([], R(A,B) * ∂(S(B,C) * {A}))</pre>
<pre>   := AggSum([], R(A,B) * (S(B,C) * ∂{A}</pre>
<pre>                           + ∂S(B,C) * {A}</pre>
<pre>                           + ∂S(B,C) * ∂{A}))</pre>
<pre>   := AggSum([], R(A,B) * (S(B,C) * {0}</pre>
<pre>                           + ((B ^= {@Y}) * (C ^= {@Z}) * {A})</pre>
<pre>                           + ∂S(B,C) * {0}))</pre>
<pre>   := AggSum([], R(A,B) * (B ^= {@Y}) * (C ^= {@Z}) * {A})</pre>
I'm not going to get into the details of optimizing AGCA expressions (yet), but trust me for now that the following (simpler) query is equivalent
<pre>∂Q := AggSum([], R(A,@Y) * {A})</pre>
or in SQL
<pre>SELECT SUM(A) FROM R WHERE R.B = @Y;</pre>
Note, by the way, that @Y is a parameter to this delta query.  When you evaluate a delta query (for example our delta for insertions into S), these parameters take their value from the tuple being modified (so when you insert &lt;2,1&gt; into S, then @Y = 2).  That said, @Y is just a normal variable/column.  There's nothing special about it (other than the @ in the name).

The delta query tells us how the query results change.  If we insert &lt;2,1&gt; into S, then we evaluate the delta query for insertions into S (∂Q above), setting @Y to 2, and @Z to 1.
<pre>AggSum([], R(A, 2) * {A})</pre>
… which, for our initial dataset above, gives us ∂Q = 3.  To figure out what Q will give us on the modified database (after inserting &lt;2,1&gt; into S), we just add ∂Q to our initial result (5 + 3 = 8).
<h3>Parameters and AggSums</h3>
I keep saying that parameters just normal variables, and that there's nothing special about them.

That's mostly true.  I actually oversimplified a bit on the delta rules.

We want these parameters to be visible from the outside so that evaluating ∂Q for a specific insertion (or deletion) essentially amounts to selecting a single row from the output of ∂Q.  In other words, the AggSums need to be rewritten slightly so that the parameters appear in the group-by variables (where appropriate).  That is, the correct delta with respect to an insertion into S : +S(@Y,@Z) is
<pre style="margin: 8px;">AggSum([@Y], R(A, @Y) * {A})</pre>
or in SQL
<pre style="margin: 8px;">SELECT R.B, SUM(R.A) FROM R GROUP BY R.B;</pre>
That little hiccup out of the way, let's get to the actual viewlet transform
<h3>Auxiliary Views</h3>
The delta query is an improvement over evaluating the entire query from scratch.  For this particular example though, we still need to scan over multiple rows of R (even if it is only a small subset of R).  We can do even better.

Right now, every time the database changes we update Q with ∂Q.
<pre>ON +S(@X, @Y)</pre>
<pre>  Q += AggSum([@Y], R(A,@Y) * {A})</pre>
But recall that for any AGCA query Q, ∂Q is just an ordinary, simple, unexceptional AGCA query (no funny business is introduced by the ∂).  If we can store ('materialize' to use the technical term) the results of Q, what's to stop us from storing the results of ∂Q?  Nothing!

Let's say we had another view materialized (let's call it M_S), this time with a group by variable:
<pre style="margin: 8px;">M_S[Y] := AggSum([Y], R(A, Y) * {A})</pre>
For the initial dataset above, this would contain
<pre><span style="text-decoration: underline;">___Y______#__</span></pre>
<pre> &lt; 1 &gt; -&gt; 1</pre>
<pre> &lt; 2 &gt; -&gt; 3</pre>
This view can help us substantially when we need to update Q after an insertion into S.  Expressing this update as a trigger:
<pre>ON +S(@Y, @Z)</pre>
<pre>  Q += M_S[@Y]</pre>
In other words, to update Q, we just need to look up one row of M_S.  The update can be done in constant time!  That said, we now have an extra view that we need to maintain.  Fortunately, M_S is simpler than Q, and has only one table, in this case R.  Whenever R changes, we need to update M_S.  Since M_S is defined in terms of a normal, ordinary AGCA expression, we update it in exactly the same way that we update Q, using the delta of M_S.  For an insertion of &lt;@X,@Y&gt; into R, this would be:
<pre>∂M_S[Y] := AggSum([Y], R(A,Y) * {A})</pre>
<pre>        := AggSum([@X, @Y, Y], (A ^= {@X}) * (Y ^= {@Y}) * {A})</pre>
<pre>        := (Y ^= {@Y}) * {@X}</pre>
Or, expressed as a trigger
<pre>ON +R(@X, @Y)</pre>
<pre>  M_S[Y] += (Y ^= {@Y}) * {@X}</pre>
This can be simplified a bit, since the update operation only produces one row
<pre>ON +R(@X, @Y)</pre>
<pre>  M_S[@Y] += {@X}</pre>
(also a constant time operation)

<span class="Apple-style-span" style="font-size: 14px; font-weight: bold;">Recursive Deltas</span>

What's happening here is that we're saving the results of Q and maintaining them with (several instantiations of) ∂Q.  The key idea of the viewlet transform is that we can also save the results of ∂Q and maintain them with (several instantiations of) ∂(∂Q).  This process repeats recursively, giving us ∂(∂(∂Q)), ∂(∂(∂(∂Q))), and so on.

Every time we add another ∂, another table drops out, making it the delta query simpler.  After enough repetitions, we end up with a query that doesn't depend on the database at all (e.g., ∂M_S above).  At this point, we can stop, since the query can be evaluated in constant time.

This is the viewlet transform.  <strong>Start by materializing the original query, and then alternate computing its delta(s), and recursively materializing the delta(s)</strong>.

I've obscured the issue a bit by not subscripting my ∂s, remember that each delta is taken with respect to a particular event.  Q has four deltas: for both insertion and deletion of both R and S (∂<span class="Apple-style-span" style="font-size: 9px;">+R</span>, ∂<span class="Apple-style-span" style="font-size: 9px;">-R</span>, ∂<span class="Apple-style-span" style="font-size: 9px;">+S</span>, ∂<span class="Apple-style-span" style="font-size: 9px;">-S</span>).  Similarly, M_S has two deltas: for both the insertion and deletion of R and S.

The viewlet transform of Q produces five views: one for Q, and one for each of the first-tier deltas (the second-tier deltas are all constants).

As it turns out, we can be even more efficient than that!  Furthermore, the procedure I describe above runs into problems with special tables (as a bit of a teaser for this, take a close look at the delta of Q with respect to insertions into R).  This <em>naive</em> viewlet transform is insufficient.  Next week, I'll start discussing some changes to the naive viewlet transform that make it more practical and efficient.