Oliver's rewrites.

2023-07-20 14:58:38 -04:00 · 2023-07-20 14:58:38 -04:00 · 6b6bcc7cf0
parent e632147da8
commit 6b6bcc7cf0
5 changed files with 74 additions and 67 deletions
--- a/sections/experiments.tex
+++ b/sections/experiments.tex
@ -4,14 +4,20 @@

 \begin{figure}
  \includegraphics[width=\columnwidth]{figures/tpch.pdf}
+  \vspace*{-3mm}
  \trimmedcaption{Compile times on TPC-H using a set of 7 rules.}
  \label{fig:results}
 \end{figure}

-Evaluation was performed on a 3.5 GHz AMD Ryzen 9 5950X 16-Core CPU, with Linux 6.2.6, OpenJDK 11.0.19, Scala 2.12.15.
+Evaluation was performed on Linux 6.2.6 on a 3.5 GHz AMD Ryzen 9 5950X 16-Core CPU with 64GB RAM.  
+The Astral runtime was implemented in Scala 2.12.15 running on OpenJDK 11.0.19.
 Results shown are averaged over 10 runs, with 4 discarded burn-in trials to trigger JIT.
 We implemented the Astral compiler, as well as its work sharing optimization as a rewriter for Spark 3.4.1.
-We evaluated Astral's work sharing optimization by manually translating 7 rewrite rules (selected to be relevant to the 22 queries of the TPC-H workload) into Astral: 
+We evaluated Astral's work sharing optimization by manually translating 7 rewrite rules (selected to be relevant to the 22 queries of the TPC-H workload) into Astral-compatible match syntax\footnote{
+  \url{
+    https://git.odin.cse.buffalo.edu/Astral/astral-compiler/src/branch/main/astral/catalyst/src/com/astraldb/catalyst/Catalyst.scala
+  }
+}: 
 \textbf{PushProjectionThroughUnion}, 
 \textbf{PushProjectionThroughLimit}, 
 \textbf{ReorderJoin}, 
--- a/sections/introduction.tex
+++ b/sections/introduction.tex
@ -18,8 +18,7 @@ reframes common compilation and analysis tasks as database
 operations. \systemlang unifies existing database optimizations (e.g.,
 work sharing from streaming systems) with existing compiler tricks
 (e.g., Tree Toasting~\cite{balakrishnan:2021:sigmod:treetoaster}),
-laying the groundwork for a creating truly scalable, `declarative' compiler
-that leverages a wide array of data processing techniques from the database community.
+laying the groundwork for creating a `declarative' compiler that scales to massive codebases by leveraging techniques from the database community.

 \paragraph{Production Rules}
 Compiler transformations and optimizations are often expressed in
@ -62,14 +61,14 @@ As the optimizer rewrites segments of the tree, these pre-computed sets (i.e., `

 \paragraph{Work Sharing}
 In this paper, we focus on an orthogonal optimization strategy: Work Sharing~\cite{DBLP:journals/ieeecc/KremienKM93} from stream processing.
-When two queries with a common sub-plan are registered with stream processor, the common sub-plan is only executed once.
+When two queries with a common sub-plan are registered with a stream processor, the common sub-plan is only executed once.
 Similarly, we explore a work sharing optimization where pattern matching predicates common to multiple rules are merged.

 \paragraph{\systemlang}
 Although we focus on one specific optimization in this paper, we emphasize that compiling pattern matching down to a query language opens up a range of further optimization opportunities, including 
 (i) cost-based optimization of evaluation strategies, 
 (ii) parallelization for exploration of large search optimization spaces, and 
-(iii) differential dataflow~\cite{DBLP:conf/cidr/McSherryMII13} for incremental `live' compilation.
+(iii) incremental fix-point view maintenance (e.g.,~\cite{DBLP:conf/cidr/McSherryMII13}) for incremental `live' compilation and execution.

 \paragraph{Case Study: Apache Spark}
 We explore \systemlang in the context of Apache Spark's Catalyst query optimizer.
@ -77,14 +76,7 @@ We explore this \systemlang in the context of Apache Spark's Catalyst query opti
  A similar figure appears in \cite{balakrishnan:2021:sigmod:treetoaster}.  \Cref{fig:sparkBreakdown} has been updated to account for improvements to the optimizer in Spark version 3.2
 }. 
 At least a quarter of its time is spent iterating over trees (`Search'), and a further quarter is spent on bookkeeping (`Fixpoint Loop').
-Both of these are both strong candidates for database-style optimizations.
-
-For this paper, we translated 7 rules from the Catalyst optimizer into ASTral-compatible match syntax\footnote{
-  \url{
-    https://git.odin.cse.buffalo.edu/Astral/astral-compiler/src/branch/main/astral/catalyst/src/com/astraldb/catalyst/Catalyst.scala
-  }
-}.
-We use this fragment to evaluate our optimizations on the 22 TPC-H benchmark queries.
+Both of these are strong candidates for elimination through database-inspired optimizations.

 \subsection{Contributions}

--- a/sections/macros.tex
+++ b/sections/macros.tex
@ -6,7 +6,7 @@

 %%%%%%%%%%%%%%%%%%%%%%% Basic Logic %%%%%%%%%%%%%%%%

-\newcommand{\production}[2]{\frac{#2}{#1}}
+\newcommand{\production}[2]{#1 \rightarrow #2}

 \newcommand{\inbrackets}[1]{\left[#1\right]}
 \newcommand{\inset}[1]{\left\{#1\right\}}
--- a/sections/query_evaluation.tex
+++ b/sections/query_evaluation.tex
@ -2,7 +2,7 @@
 \section{Evaluating \systemlang}
 \label{sec:queryEvaluation}

-A typical optimizer is typically defined as a collection of rules of the form $\tuple{\matcher_i, \expression_i} \in \matcherdom \times \expressiondom$\footnote{
+A typical optimizer is defined as a collection of rules of the form $\tuple{\matcher_i, \expression_i} \in \matcherdom \times \expressiondom$\footnote{
  A typical optimizer has many `batches' of such rules, applied in sequence.
  As the generalization to multiple batches is straightforward, we assume only one batch.
 }.  
@ -73,14 +73,15 @@ The expand operator is similar to the Unnest operator in nested relational algeb

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \subsection{Selecting an Execution Plan}
+\label{sec:executionPlan}
 % \DB{Should we call this section differently? Evaluation Workflow or Rewrite Workflow? In my head its conflicting with the physical plan thats evaluated. Ignore if its just me.}
 Starting with a join over atomic relations allows us to explore the space of evaluation plans by leveraging the associativity and commutativity of join.
 \Cref{alg:makePlan} outlines a simple, greedy strategy for eliminating joins.  
 Specifically, given a query of the form: $\atom_1 \bowtie \ldots \bowtie \atom_n = \rewritematcher{\var}{\matcher}$, the return value of $\textsc{MakePlan}(\inset{\atom_1, \ldots, \atom_n}, \db(\var))$ is equivalent to $\query{\matcher}(\db)$\footnote{
  Matchers with disjunctions rewrite to queries with unions; A more robust optimization strategy is likely possible, but for the purposes of this paper we rely on distributivity to produce a set of union-free queries, each individually passed to \textsc{MakePlan}.
-}
+}.
 $\textsc{MakePlan}$ proceeds in three steps: 
-(i) $\textsc{EnumerateCandidates}$ selects atoms that are candidates for a join elimination rewrite according to the relevant constraint on $\schemaOf{Q}$ (listed with the rewrites).
+(i) $\textsc{EnumerateCandidates}$ selects atoms that are candidates for a join elimination rewrite according to the relevant constraint on $\schemaOf{Q}$ (see the rewrites, above).
 (ii) $\textsc{PickCandidate}$ selects one candidate from the set of rewrites.  We discuss this step in greater depth below.
 (iii) $\textsc{Rewrite}$ applies the selected join elimination rewrite.
 In summary, the key challenge of selecting an execution plan is (greedily) selecting an order in which to resolve the atoms.
@ -115,7 +116,7 @@ We leave a more thorough cost-based optimizer to future work, and for now adopt
 \subsection{Merging Plans}

 An optimizer is simultaneously interested in matching multiple patterns.
-We find an appropriate optimization opportunity in stream processing systems (e.g., Aurora/Borealis\cite{DBLP:conf/cidr/AbadiABCCHLMRRTXZ05}), where multiple simultaneous streams are rewritten to share overlapping computations~\cite{DBLP:journals/ieeecc/KremienKM93}.  
+We find an appropriate optimization opportunity in stream processing systems (e.g., Aurora/Borealis~\cite{DBLP:conf/cidr/AbadiABCCHLMRRTXZ05,DBLP:books/sp/16/CetintemelAABBCHMMRRSTXZ16}), where multiple simultaneous streams are rewritten to share overlapping computations~\cite{DBLP:journals/ieeecc/KremienKM93}.  
 \Cref{alg:makeSharedPlan} generalizes \Cref{alg:makePlan} to detect and leverage such opportunities, rewriting multiple atom sets in parallel.

 \begin{algorithm}[t]
@ -219,11 +220,11 @@ In \cite{balakrishnan:2021:sigmod:treetoaster}, we contrasted two different stor
 In one, each node of a subtree was stored as a tuple in a relation, with one relation per AST node type (i.e., $\nodelabel$).
 In the other, we superimposed relational semantics over an existing AST.
 Both approaches had competitive performance, but the relational representation had significant storage overheads.
-
+%
 We implement the latter approach for \systemlang, and summarize key features of it here.
 A naive realization of this representation stores AST nodes and variable-sized constants as heap-allocated objects.
 AST nodes are stored as a tuple of their type and the fields, with fields containing fixed-size constants inlined, and child AST nodes stored by reference.

 Enumerating the subtrees is trivial on this structure, requiring only a series of pointer traversals.
 Node expansion is likewise viable on this naive representation, as variables holding AST nodes necessarily store references to the node and its fields on the heap.
-Subtree replacement requires only maintaining a lookup table of parent nodes, and in the case of Spark, an indirection layer~\cite{balakrishnan:2019:dbpl:fluid} to work around immutability constraints.
+%Subtree replacement requires only maintaining a lookup table of parent nodes, and in the case of Spark, an indirection layer~\cite{balakrishnan:2019:dbpl:fluid} to work around immutability constraints.
--- a/sections/specification.tex
+++ b/sections/specification.tex
@ -5,12 +5,13 @@
 We begin by formalizing pattern match semantics, as typically used in the implementation of compilers to realize the pattern part of production rules.
 Match patterns in most functional languages (e.g., Scala, OCaml) can be expressed through these semantics, while compilers implemented in imperative languages (e.g., Orca) typically invent analogous constructs.

-For simplicity of presentation, we assume that constants are drawn from a domain $\constantdom$ that includes primitive values ($\prim$) and domain of abstract syntax tree nodes $\nodelabel(\ldots)$:
+For simplicity of presentation, we assume that constants are drawn from a domain $\constantdom$ that includes primitive values ($\prim$) and abstract syntax tree nodes $\nodelabel(\ldots)$:
 $$\constantdom : \nodelabel(\constantdom, \ldots \constantdom) \oroption \prim$$
-We assume that the primitive value domain includes at least boolean primitives ($\prim$) true ($\top$) and false ($\bot$).
-We refer to $\nodelabel$ as the AST node type.  
+We assume that the primitive value domain includes at least boolean values true ($\top$) and false ($\bot$).
+We refer to $\nodelabel$ as the AST node type, or label.  
 For simplicity, we abstract primitive-valued expressions $\expression \in \expressiondom$ as functions ($\expressiondom : \scopedom \rightarrow \constantdom$) that map from a scope to a constant.
-We model scopes $\scope in \scopedom$ as maps of variable bindings ($\scopedom: \vardom \rightarrow \constantdom$), where $\vardom$ is the set of all variable names.
+We write $\varsOf{\expression}$ to mean all scope variables referenced by $\expression$.
+We model scopes $\scope \in \scopedom$ as maps of variable bindings ($\scopedom: \vardom \rightarrow \constantdom$), where $\vardom$ is the set of all variable names.
 Unbound variables in a scope return an undefined, \texttt{null} value.

 \begin{figure}
@ -33,12 +34,14 @@ Unbound variables in a scope return a undefined, \texttt{null} value.

 \begin{example} 
  \Cref{fig:exampleAST} illustrates a simple abstract syntax tree with three node types (i.e., $\nodelabel \in \inset{\textbf{Filter}, \textbf{Project}, \textbf{Table}}$).
-  The first child of each node type is a primitive valued constant\footnote{Note that we treat collection types like lists as primitive types}, while the second child of the Filter and Project nodes are both ASTs.
+  The first child of each node type is a primitive-valued constant\footnote{Note that we treat collection types like lists as primitive types.}, while the second child of both the Filter and Project nodes is an AST node.
 \end{example}

 We define a language of match patterns as follows.
-The core of the language is a pattern that matches AST nodes and checks its children against sub-patterns, and a pattern that matches anything.
-The pattern matching language also includes basic boolean operators, as well as a set of rules that exist to manipulate or use the scope, matching based on the result of expression evaluation, matching an element of the scope against another pattern, or binding the result of expression evaluation or a sub-pattern into the scope.
+An AST node pattern $\ell(\matcher, \ldots, \matcher)$ applies a set of patterns to its children.
+The wildcard pattern $\matchany$ matches anything.
+The assignment operator $\var \leftarrow \cdot$ assigns variables.
+An arbitrary boolean-valued expression may be evaluated against bound variables as a pattern, and the operator $\passToMatcher$ applies a pattern to a bound variable.  The remaining patterns are simple boolean operations.
 \begin{align*}
  \matcherdom :=& \nodelabel(\matcherdom, \ldots, \matcherdom) 
                  \oroption \matchany 
@ -46,17 +49,17 @@ The pattern matching language also includes basic boolean operators, as well as
                  \oroption \matcherdom \wedge \matcherdom
                  \oroption \matcherdom \vee \matcherdom\\
                & \oroption \expressiondom
-                  \oroption \vardom @ \matcherdom
+                  \oroption \vardom \passToMatcher \matcherdom
                  \oroption \vardom \gets \expressiondom
                  \oroption \vardom \gets \matcherdom
 \end{align*}

-Where it is clear to do so, for a variable $\var \in \vardom$, we overload $\var$ to mean the matcher $\var \gets \matchany$.  
+Where it is clear to do so, we use variables $\var \in \vardom$ to mean a variable binding match pattern (i.e., the pattern $\var \gets \matchany$).  

 \begin{example}
  \label{ex:pattern}
-  With $\inset{\texttt{cond}, \texttt{tgt}, \texttt{child}} \subset \vardom$, the match pattern for the select pushdown optimization from the introduction is:
-  $\textbf{Filter}( \texttt{cond} , \textbf{Project}( \texttt{tgt}, \texttt{child} ))$.  Note the similarity to the match pattern in the introduction.
+  With $\texttt{cond}, \texttt{tgt}, \texttt{child} \in \vardom$, the match pattern for the select pushdown optimization from the introduction is:
+  $\textbf{Filter}( \texttt{cond} , \textbf{Project}( \texttt{tgt}, \texttt{child} ))$.  Note the similarity to the Scala match pattern syntax in the introduction.
 \end{example}


@ -74,10 +77,10 @@ We mark scope updates by $\scope[\var \backslash \constant]$ to mean $\scope$ wi

 \begin{example}
  The pattern from our running example can be equivalently stated as:
-  $$\textbf{Filter}( \texttt{cond}, \texttt{p} ) \wedge \texttt{p} \passToMatcher \textbf{Project}( \texttt{tgt}, \texttt{child} )$$
+  $$\textbf{Filter}( \texttt{cond}, \texttt{p} ) \wedge \left( \texttt{p} \passToMatcher \textbf{Project}( \texttt{tgt}, \texttt{child} ) \right)$$
  If the \textbf{Filter} node is not matched, the conjunction shortcuts.  
  If it is matched, the right half of the conjunction is evaluated with \texttt{cond} and \texttt{p} bound (in the scope) to the \textbf{Filter} node's children.
-  The $@$ operator evaluates the \textbf{Project} matcher on the constant bound to the variable \texttt{p} (i.e., $\scope(\texttt{p})$), and the rest proceeds as before.
+  The $\passToMatcher$ operator evaluates the \textbf{Project} matcher on the constant bound to the variable \texttt{p} (i.e., $\scope(\texttt{p})$), and the rest proceeds as before.
 \end{example}


@ -109,7 +112,7 @@ We mark scope updates by $\scope[\var \backslash \constant]$ to mean $\scope$ wi
        \scope & \textbf{if } \expression(\scope) = \top\\
        \nullresult & \textbf{otherwise}
      \end{cases}\\
-    \evalmatcher{\var @ \matcher}(\constant)(\scope)
+    \evalmatcher{\var \passToMatcher \matcher}(\constant)(\scope)
      & = \evalmatcher{\matcher}(\scope(\var))(\scope)\\
    \evalmatcher{\var \leftarrow \expression}(\constant)(\scope)
      & = \scope[\var \backslash \expression(\scope)]\\
@ -117,19 +120,19 @@ We mark scope updates by $\scope[\var \backslash \constant]$ to mean $\scope$ wi
      & = \evalmatcher{\matcher}(\constant)(\scope[\var \backslash \constant])
  \end{align*}
  \vspace*{-3mm}
-  \trimmedcaption{Operational semantics for match patterns.}
+  \trimmedcaption{Semantics for match patterns.}
  \label{fig:evalMatcherSemantics}
 \end{figure}

 \begin{example}
  \label{ex:sideEffects}
-  When \texttt{tgt} of a \textbf{Project} operator has side-effects or other non-deterministic behavior, systems like Spark will not apply the selection push-down optimization.
-  Let $\textsc{det}$ be an externally provided function that determines whether a target is deterministic.  
+  When a \textbf{Project} operator has side-effects or other non-deterministic behavior, systems like Spark will not apply the selection push-down optimization.
+  Let $\textsc{det}$ be an externally provided function that determines whether a \textbf{Project} is deterministic.  
  Recall that we do not explicitly define expression evaluation semantics, so let $\textsc{det}(\texttt{tgt})$ be an expression that applies $\textsc{det}$ to the variable \texttt{tgt} in the scope.  
   We can then write a `safe' version of the selection push-down pattern as:
  $$\textbf{Filter}(\texttt{cond}, \textbf{Project}(\texttt{tgt}, \texttt{child})) \wedge \textsc{det}(\texttt{tgt})$$
  If the left-half of the conjunction succeeds, the variables \texttt{cond}, \texttt{tgt}, and \texttt{child} will be bound in the scope.  
-  The right-half succeeds if \textsc{det} returns true on the value bound to \texttt{tgt}.
+  The right-half succeeds if the expression $\textsc{det}(\texttt{tgt})$ evaluates to true on the resulting scope.
 \label{eg:MatchPattern}
 \end{example}

@ -139,10 +142,10 @@ We mark scope updates by $\scope[\var \backslash \constant]$ to mean $\scope$ wi
 \subsection{Application of Match Patterns}

 In general, we are interested in match patterns applied to entire ASTs.  
-Let $\db \in \constantdom$ denote an abstract syntax tree instance\footnote{
-  We note that while $\db$ may also be a constant, this case is uninteresting.
+Let $\db \in \constantdom$ be an abstract syntax tree instance\footnote{
+  While $\db$ may also be a constant, this case is not usually interesting.
 }.
-Let the subtrees of $\db$ be defined as:
+We define the subtrees of $\db$ as:
 $$\subtreesOf{\db} = \inset{\db} \cup \begin{cases}
  \bigcup_{i} \subtreesOf{\constant_i} & \textbf{if } \db = \nodelabel(\constant_1, \ldots, \constant_n) \\
  \emptyset & \textbf{otherwise}
@ -150,7 +153,7 @@ $$\subtreesOf{\db} = \inset{\db} \cup \begin{cases}
 \begin{example}
  \label{ex:subtrees}
  The subtrees of the AST in \Cref{fig:exampleAST} are $\textbf{Filter}(\ldots)$, $\texttt{'X>3'}$, $\textbf{Project}(\ldots)$, $\texttt{['X', 'Y']}$, $\textbf{Table}(\ldots)$, and $\texttt{'R'}$.  
-  While this list includes everything, including primitive values, moving forward we will consider only AST nodes like $\textbf{Filter}(\ldots)$ as subtrees.
+  Note that this list includes everything, including primitive values.
 \end{example}

 For a matcher $\matcher \in \matcherdom$, we define $\query{\matcher}(\db)$ as a search over every subtree of $\db$;
@ -216,7 +219,7 @@ $$\query{\matcher}(\db) = \comprehension{

 \subsection{\systemlang}

-We next demonstrate that any match pattern in the language defined above may be rewritten into a simplified ``flat'' form based on relational algebra that we call \systemlang.
+We next demonstrate that any match pattern in the language defined above may be rewritten into a simplified ``flat'' form, based on relational algebra, that we call \systemlang.
 Specifically, we extend positive relational algebra with a set of task-specific relational \textbf{match atoms} $\atom \in \atomdom$ as follows:
 $$\atomdom := 
            \inbrackets{\vardom = \nodelabel(\vardom, \ldots, \vardom)}
@ -226,10 +229,10 @@ $$\atomdom :=
  \oroption \bot
  \oroption \db(\vardom)$$

-Formal semantics for match atoms are defined in \Cref{fig:atomSemantics}.
+Semantics for match atoms are defined in \Cref{fig:atomSemantics}.
 We note that several atom types have infinite cardinalities; we return to this point shortly.
 To summarize, 
-(i)~a \textbf{Match Atom} ($\inbrackets{\var = \nodelabel(\var_1, \ldots, \var_n)}$) defines a (infinite) relation, with schema $\inset{\var, \var_1, \ldots, \var_n}$, consisting of every possible AST node, with the node in attribute $\var$, and the remaining attributes assigned to the node's fields.
+(i)~a \textbf{Match Atom} ($\inbrackets{\var = \nodelabel(\var_1, \ldots, \var_n)}$) defines a (infinite) relation, with schema $\inset{\var, \var_1, \ldots, \var_n}$, consisting of every possible AST node, with the node in attribute $\var$, and the node's fields as the remaining attributes.
 (ii)~a \textbf{Binding Atom} ($\inbrackets{\var = \expression}$) defines a (infinite) relation, with schema $\inset{\var} \cup \varsOf{\expression}$, of possible assignments to $\varsOf{\expression}$ and the result of evaluating $\expression$ on them;
 (iii)~a \textbf{Test Atom} ($\inbrackets{\expression}$) defines the (infinite) relation, with schema $\varsOf{\expression}$, of all assignments to $\varsOf{\expression}$ on which $\expression$ evaluates to true;
 (iv)~a \textbf{Universal Atom} ($\top$) defines the relation, with a nullary schema, consisting of a single tuple; 
@ -258,7 +261,7 @@ We write $\schemaOf{\atom}$ for the schema of $\atom$.
    \rewritematcher{\var}{\expression}
      & =  \inbrackets{\expression}\\
    %
-    \rewritematcher{\var}{\var' @ \matcher}
+    \rewritematcher{\var}{\var' \passToMatcher \matcher}
      & =  \rewritematcher{\var'}{\matcher}\\
    %
    \rewritematcher{\var}{\var' \leftarrow \expression}
@ -267,28 +270,45 @@ We write $\schemaOf{\atom}$ for the schema of $\atom$.
    \rewritematcher{\var}{\var' \leftarrow \matcher}
      & =  \rewritematcher{\var}{\matcher} \bowtie \inbrackets{\var' = \var}
  \end{align*}
-  \trimmedcaption{Reducing match patterns to \systemlang; Each $\genvar$ denotes a freshly allocated variable name}
+  \trimmedcaption{Rewriting match patterns to \systemlang; Each $\genvar$ is a freshly allocated variable name.}
  \label{fig:reductionToFOL}
 \end{figure}

+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\paragraph{Query Safety}
+
+While some match atoms are infinite-cardinality relations, we want queries to produce only finite outputs. 
+This concept is typically captured by the \emph{safety} property: a relation or query is safe (even if one of its component relations is unsafe) if it returns a finite set of results.
+If a relation is finite, we know that its attributes have a finite domain and call the relation safe.
+If an infinite relation is joined with a finite relation, the result is safe if the keys of the infinite relation participate in the join.
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\paragraph{Rewriting Match Patterns}
+
+
 We next address the problem of rewriting `match' style pattern queries into \systemlang.
 \Cref{fig:reductionToFOL} defines the rewrite rule $\rewritematcher{\var}{\matcher}$ which translates a match pattern $\matcher \in \matcherdom$ into a join over match atoms.  
-With $\var$ as a unique variable name, $\query{\matcher}(\db) \equiv \db(\var) \bowtie \rewritematcher{\var}{\matcher}$.
+We specifically target joins to benefit from the commutativity and associativity of the join operator when picking an execution strategy in \Cref{sec:executionPlan}.
+With $\var$ as a unique variable name, $\query{\matcher}(\db) \equiv \db(\var) \bowtie \rewritematcher{\var}{\matcher}$.  
+The result is guaranteed to be safe.

 \begin{example}
  \label{ex:rewrite}
-  Continuing our running example, and recalling that a bare variable $\var$ is shorthand for $\var \leftarrow \matchany$:
+  Continuing our running example, and recalling that a bare variable $\var$ is shorthand for $\var \leftarrow \matchany$, we expand $\query{m}(\db) =$
  {\footnotesize\begin{align*}
-    \query{m}(\db) & = \db(r) \bowtie \rewritematcher{r}{\textbf{Filter}(\texttt{cond}, \textbf{Project}(\texttt{tgt}, \texttt{child}))}\\
-                 & = \db(r) \bowtie \inbrackets{r = \textbf{Filter}(a, b)} \bowtie \rewritematcher{a}{\texttt{cond} \leftarrow \matchany} \bowtie \rewritematcher{b}{\textbf{Project}(\texttt{tgt}, \texttt{child})}\\
-                 & = \db(r) \bowtie \inbrackets{r = \textbf{Filter}(a, b)} \bowtie \top \bowtie \inbrackets{\texttt{cond} = a} \bowtie \rewritematcher{b}{\textbf{Project}(\texttt{tgt}, \texttt{child})}\\
+    &\; \db(r) \bowtie \rewritematcher{r}{\textbf{Filter}(\texttt{cond}, \textbf{Project}(\texttt{tgt}, \texttt{child}))}\\
+                 = &\; \db(r) \bowtie \inbrackets{r = \textbf{Filter}(a, b)} \bowtie \rewritematcher{a}{\texttt{cond} \leftarrow \matchany} \bowtie \rewritematcher{b}{\textbf{Project}(\texttt{tgt}, \texttt{child})}\\
+                 = &\; \db(r) \bowtie \inbrackets{r = \textbf{Filter}(a, b)} \bowtie \top \bowtie \inbrackets{\texttt{cond} = a} \bowtie \rewritematcher{b}{\textbf{Project}(\texttt{tgt}, \texttt{child})}\\
  \end{align*}}\\[-8mm]
  \textbf{Project} is expanded similarly to \textbf{Filter}.  
  The atom $\inbrackets{r = \textbf{Filter}(a, b)}$ has schema $\inset{r, a, b}$ and is defined for every triple where $r = \textbf{Filter}(a, b)$.  
  Because $r$ is a key for this relation, observe that the query $\db(r) \bowtie \inbrackets{r = \textbf{Filter}(a, b)}$ computes the (finite) set of subtrees of $\db$ that are \textbf{Filter}-typed AST nodes (with attributes $r$, $a$, $b$ taking the values of the node and its two children, respectively).
 \end{example}

-Note the following relational algebra equivalences
+\noindent Note the following relational algebra equivalences
 $$ 
 Q \bowtie \top \equiv Q
  \hspace{10mm} %%%%%%%%%%%%%%%%%%%%
@ -296,17 +316,5 @@ Q \bowtie \bot \equiv \bot
  \hspace{10mm} %%%%%%%%%%%%%%%%%%%%
 \db(\var)\bowtie\db(\var) \equiv \db(\var)
 $$
-The first two equivalences follow from the relations $\top$ and $\bot$ being the identity and annihilator values for $\bowtie$ respectively~\cite{DBLP:conf/pods/GreenKT07}.
+The first two equivalences follow from the relations $\top$ and $\bot$ being the identity and annihilator values for $\bowtie$ respectively.
 The third follows from the idempotency of natural join on keyed relations.
-
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-\paragraph{Query Safety}
-
-As we note above, match atoms are infinite-cardinality relations, but naturally, we want queries to produce only finite outputs. 
-This concept is typically captured by the notion of \emph{safety}: a query is safe if it is guaranteed to return a finite set of results.
-This property is derived iteratively: If a relation is finite, we know that its attributes have a finite domain and call the attributes safe.
-If all of the key attributes of a relation (even an infinite one) are safe, then only a finite number of records in the relation can possibly participate in a join and we can call the relation and all of its attributes safe.
-A query is safe when all of its relations are safe.
-The rewrite $\rewritematcher{\var}{\matcher}$ guarantees safety if: (i) $\var$ is safe, and (ii) any attributes referenced by expressions in $\matcher$ are safe.
-
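
The match-pattern semantics touched by the sections/specification.tex hunks above can be illustrated with a small sketch. This is not the Astral implementation (which the diff says is written in Scala); it is a minimal Python model, assuming that a failed match is represented by None, that a node pattern threads the scope through its children from left to right, and that conjunction passes the left-hand scope to the right-hand pattern. All function names (wildcard, bind, node, conj, check, pass_to, subtrees, query) are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """An AST node: a label plus a tuple of children (nodes or primitives)."""
    label: str
    children: tuple


# A matcher is a function (constant, scope) -> scope, or None when the match fails.

def wildcard():
    # Matches anything and leaves the scope unchanged.
    return lambda c, scope: scope


def bind(var, m=None):
    # var <- m: bind the current constant to var, then evaluate m on it.
    # A bare variable is shorthand for var <- wildcard.
    inner = m if m is not None else wildcard()
    return lambda c, scope: inner(c, {**scope, var: c})


def node(label, *patterns):
    # label(m1, ..., mn): check the label and arity, then match each child in order.
    def run(c, scope):
        if not (isinstance(c, Node) and c.label == label
                and len(c.children) == len(patterns)):
            return None
        for p, child in zip(patterns, c.children):
            scope = p(child, scope)
            if scope is None:
                return None
        return scope
    return run


def conj(left, right):
    # m1 AND m2: short-circuits when the left side fails.
    def run(c, scope):
        bound = left(c, scope)
        return None if bound is None else right(c, bound)
    return run


def check(expr):
    # Expression pattern: succeeds only if expr(scope) evaluates to True.
    return lambda c, scope: scope if expr(scope) is True else None


def pass_to(var, m):
    # Evaluate m on the constant bound to var rather than on the current constant.
    return lambda c, scope: m(scope.get(var), scope)


def subtrees(db):
    # The tree itself plus, recursively, the subtrees of every child.
    yield db
    if isinstance(db, Node):
        for child in db.children:
            yield from subtrees(child)


def query(m, db):
    # Search every subtree of db and collect the scopes of successful matches.
    results = []
    for t in subtrees(db):
        scope = m(t, {})
        if scope is not None:
            results.append(scope)
    return results


# The example AST: Filter('X>3', Project(['X', 'Y'], Table('R'))).
tree = Node("Filter", ("X>3", Node("Project", (("X", "Y"), Node("Table", ("R",))))))

# The nested pattern and its equivalent conjunctive form from the running example.
nested = node("Filter", bind("cond"), node("Project", bind("tgt"), bind("child")))
split = conj(node("Filter", bind("cond"), bind("p")),
             pass_to("p", node("Project", bind("tgt"), bind("child"))))
```

Under these assumptions, query(nested, tree) and query(split, tree) each return a single scope; the two agree on cond, tgt, and child, and the second additionally binds p to the Project node.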