Category Theory for DB-ies

(The Math of "Things" and "Stuff")

People always say that SQL is "Declarative"

Why is it OK to replace the first query with the second?


            SELECT * FROM R WHERE A = 1 AND B = 2
          

            SELECT * FROM (SELECT * FROM R WHERE A = 1) WHERE B = 2
          

(and how is it the same as computing $x + 1 + 2$?)

Addition is Cool

Commutative $$a + b = b + a$$

Associative $$(a + b) + c = a + (b + c)$$

Neutral Element $$0 + a = a$$

Addition is Cool


            public int add(int a, int b) { ... }

            add(a, b)         == add(b, a)

            add(add(a, b), c) == add(a, add(b, c))

            add(a, 0)         == a
          

Math is about Patterns

Commutativity | Associativity | Neutral Element
$a + b = b + a$ | $(a + b) + c = a + (b + c)$ | $0$
$a \texttt{ AND } b = b \texttt{ AND } a$ | $(a \texttt{ AND } b) \texttt{ AND } c = a \texttt{ AND } (b \texttt{ AND } c)$ | TRUE
$a \texttt{ OR } b = b \texttt{ OR } a$ | $(a \texttt{ OR } b) \texttt{ OR } c = a \texttt{ OR } (b \texttt{ OR } c)$ | FALSE
$A \cup B = B \cup A$ | $(A \cup B) \cup C = A \cup (B \cup C)$ | $\emptyset$
$min(A, B) = min(B, A)$ | $min(min(A, B), C) = min(A, min(B, C))$ | $\infty$
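The three laws in this table can be spot-checked mechanically. The following is a minimal sketch, not a proof: it only tests a handful of sample values. The class name `LawCheck` and the helper `obeysLaws` are made up for illustration, and `Integer.MAX_VALUE` stands in for $\infty$ as the neutral element of min.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

class LawCheck {
    // Hypothetical helper: true if `op` is commutative and associative on
    // every combination of samples and `neutral` leaves each sample unchanged.
    static <K> boolean obeysLaws(BinaryOperator<K> op, K neutral, List<K> samples) {
        for (K a : samples) {
            if (!op.apply(neutral, a).equals(a)) return false;               // neutral element
            for (K b : samples) {
                if (!op.apply(a, b).equals(op.apply(b, a))) return false;    // commutativity
                for (K c : samples) {
                    K left = op.apply(op.apply(a, b), c);
                    K right = op.apply(a, op.apply(b, c));
                    if (!left.equals(right)) return false;                   // associativity
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> ints = Arrays.asList(-3, 0, 1, 7);
        List<Boolean> bools = Arrays.asList(true, false);
        System.out.println(obeysLaws(Integer::sum, 0, ints));                 // true
        System.out.println(obeysLaws((a, b) -> a && b, true, bools));         // true
        System.out.println(obeysLaws((a, b) -> a || b, false, bools));        // true
        System.out.println(obeysLaws(Integer::min, Integer.MAX_VALUE, ints)); // true
        System.out.println(obeysLaws((a, b) -> a - b, 0, ints));              // false
    }
}
```

Subtraction is included as a counterexample: it fails the neutral-element check as soon as `0 - a` is compared with `a` for a nonzero sample.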

Mathematicians call this a commutative monoid (these slides loosely say "Group"; a true group also requires inverses)


            interface Group<K> {
              // The neutral element: add(getZero(), a) == a
              K getZero();

              // The combining operation (commutative and associative)
              K add(K a, K b);
            }
          

$$\left<\; K,\; +,\; 0\;\right>$$

$$\left<\; \mathbb R,\; +,\; 0\;\right>$$ $$\left<\; \mathbb N,\; +,\; 0\;\right>$$ $$\left<\; \mathbb B,\; \texttt{AND},\; \texttt{TRUE}\;\right>$$ $$\left<\; \mathbb B,\; \texttt{OR},\; \texttt{FALSE}\;\right>$$ $$\left<\; Set,\; \cup,\; \emptyset\;\right>$$ $$\left<\; Multiset,\; \uplus,\; \emptyset\;\right>$$ $$\left<\; \mathbb Z^{-\infty},\; max,\; -\infty\;\right>$$
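One payoff of naming the pattern is that code written against the `Group<K>` interface works for every instance listed above. The sketch below assumes a hypothetical `fold` helper (not part of the slides) that combines a list of elements starting from the neutral element; the two instances `INT_SUM` and `BOOL_AND` are likewise illustrative names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class GroupFold {
    // Same shape as the Group<K> interface on the slide.
    interface Group<K> {
        K getZero();
        K add(K a, K b);
    }

    // Hypothetical helper: combine all items, starting from the neutral element.
    static <K> K fold(Group<K> g, List<K> items) {
        K acc = g.getZero();
        for (K item : items) {
            acc = g.add(acc, item);
        }
        return acc;
    }

    // <N, +, 0>
    static final Group<Integer> INT_SUM = new Group<Integer>() {
        public Integer getZero() { return 0; }
        public Integer add(Integer a, Integer b) { return a + b; }
    };

    // <B, AND, TRUE>
    static final Group<Boolean> BOOL_AND = new Group<Boolean>() {
        public Boolean getZero() { return true; }
        public Boolean add(Boolean a, Boolean b) { return a && b; }
    };

    public static void main(String[] args) {
        System.out.println(fold(INT_SUM, Arrays.asList(1, 2, 3)));             // 6
        System.out.println(fold(BOOL_AND, Arrays.asList(true, false)));        // false
        System.out.println(fold(INT_SUM, Collections.<Integer>emptyList()));   // 0
    }
}
```

Folding an empty list returns the neutral element, which is the reason the structure needs one at all.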

Fun With Groups

$$\left<\; Set,\; \cup,\; \emptyset\;\right>$$ $$\left<\; \mathbb Z^{-\infty},\; max,\; -\infty\;\right>$$


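    import java.util.Collections;
    import java.util.HashSet;
    import java.util.Set;

    class SetGroup implements Group<Set<Integer>> {
      // Empty Set union with anything is a no-op
      public Set<Integer> getZero() { return Collections.emptySet(); }

      // Compute Set Union (copy first, so neither input is modified)
      public Set<Integer> add(Set<Integer> a, Set<Integer> b) {
        Set<Integer> tmp = new HashSet<Integer>(a);
        tmp.addAll(b);
        return tmp;
      }
    }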
          

    class IntMaxGroup implements Group<Integer> {
      // Max(x, -infinity) = x; Integer.MIN_VALUE stands in for -infinity
      public Integer getZero() { return Integer.MIN_VALUE; }

      // Compute Max of two elements
      public Integer add(Integer a, Integer b) { return Integer.max(a, b); }
    }
          

            // Compute the maximum value in a Set
            public int SetMax(Set<Integer> set) {
              int ret = Integer.MIN_VALUE;
              for (int i : set) {
                ret = Integer.max(i, ret);
              }
              return ret;
            }
          

$$\texttt{SetMax} : SetGroup \rightarrow IntMaxGroup$$

There's something really cool about SetMax

$$\texttt{SetMax}(A + B) \equiv \texttt{SetMax}(A) + \texttt{SetMax}(B)$$

Mathematicians call this a homomorphism

Homo: "Same"
Morphism: "Shape" (in practice, a function that preserves structure)
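To make the homomorphism property concrete, here is a small sketch that compares both sides of $\texttt{SetMax}(A + B) \equiv \texttt{SetMax}(A) + \texttt{SetMax}(B)$ on sample data, where $+$ is set union on the left and max on the right. The class name `HomomorphismCheck` and the sample sets are assumptions for illustration; `Integer.MIN_VALUE` stands in for $-\infty$ as on the slides.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class HomomorphismCheck {
    // "+" on sets: union, computed on a copy so the inputs stay untouched.
    static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> out = new HashSet<>(a);
        out.addAll(b);
        return out;
    }

    // SetMax from the slides: the empty set maps to Integer.MIN_VALUE.
    static int setMax(Set<Integer> set) {
        int ret = Integer.MIN_VALUE;
        for (int i : set) {
            ret = Integer.max(i, ret);
        }
        return ret;
    }

    public static void main(String[] args) {
        Set<Integer> a = new HashSet<>(Arrays.asList(3, 9, 4));
        Set<Integer> b = new HashSet<>(Arrays.asList(7, 1));

        int combineThenMap = setMax(union(a, b));                // SetMax(A + B)
        int mapThenCombine = Integer.max(setMax(a), setMax(b));  // SetMax(A) + SetMax(B)
        System.out.println(combineThenMap + " " + mapThenCombine);  // 9 9

        // The neutral element of one structure maps to the neutral element of the other.
        System.out.println(setMax(new HashSet<>()) == Integer.MIN_VALUE);  // true
    }
}
```

Because both orders agree, each input can be reduced on its own and only the small results need to be combined.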

So what?


            SELECT MAX(x) FROM (SELECT * FROM R UNION SELECT * FROM S)
          

            MAX( (SELECT MAX(x) FROM R), (SELECT MAX(x) FROM S) )
          

Without Homomorphisms

(OpenClipArt.org)

With Homomorphisms

(OpenClipArt.org)

Let's dig a little deeper

Multiplication is also Cool!

Commutative $$a \times b = b \times a$$

Associative $$(a \times b) \times c = a \times (b \times c)$$

Neutral Element $$1 \times a = a$$

Combine for epic coolness!

Distributive $$a \times (b + c) = a \times b + a \times c$$

Sphere of Annihilation $$0 \times a = 0$$

More Patterns!

$K$ | $+$ | $\times$ | $0$ | $1$ |
$\mathbb N$ | $+$ | $\times$ | $0$ | $1$ | Natural Number Arithmetic
$\mathbb B$ | $\vee$ | $\wedge$ | F | T | Boolean Algebra
Tables | $\cup$ | $\bowtie$ | $\emptyset$ | $\left<\right>$ | SQL

Mathematicians call this a Semiring
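In the style of the earlier interface, a semiring could be sketched as two operations with their two neutral elements. The names `Semiring`, `NATURALS`, `BOOLEANS`, `distributes`, and `annihilates` below are illustrative assumptions, not taken from the slides; the two helper methods check the distributive and annihilation laws for specific values.

```java
class SemiringSketch {
    // <K, +, x, 0, 1>
    interface Semiring<K> {
        K zero();
        K one();
        K plus(K a, K b);
        K times(K a, K b);
    }

    // <N, +, x, 0, 1>: natural number arithmetic
    static final Semiring<Integer> NATURALS = new Semiring<Integer>() {
        public Integer zero() { return 0; }
        public Integer one() { return 1; }
        public Integer plus(Integer a, Integer b) { return a + b; }
        public Integer times(Integer a, Integer b) { return a * b; }
    };

    // <B, OR, AND, F, T>: boolean algebra
    static final Semiring<Boolean> BOOLEANS = new Semiring<Boolean>() {
        public Boolean zero() { return false; }
        public Boolean one() { return true; }
        public Boolean plus(Boolean a, Boolean b) { return a || b; }
        public Boolean times(Boolean a, Boolean b) { return a && b; }
    };

    // a x (b + c) == a x b + a x c
    static <K> boolean distributes(Semiring<K> s, K a, K b, K c) {
        K left = s.times(a, s.plus(b, c));
        K right = s.plus(s.times(a, b), s.times(a, c));
        return left.equals(right);
    }

    // 0 x a == 0
    static <K> boolean annihilates(Semiring<K> s, K a) {
        return s.times(s.zero(), a).equals(s.zero());
    }

    public static void main(String[] args) {
        System.out.println(distributes(NATURALS, 2, 3, 4));          // true: 2*(3+4) == 2*3 + 2*4
        System.out.println(distributes(BOOLEANS, true, false, true)); // true
        System.out.println(annihilates(NATURALS, 5));                 // true
        System.out.println(annihilates(BOOLEANS, true));              // true
    }
}
```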

$Students$|S_ID Name
|111 Alice
|222 Bob
|333 Carol
$Courses$|S_ID Course
|111 CSE-562
|111 CSE-521
|222 CSE-562

How many courses is each student taking?

$\pi_{S\_ID}(Courses)$|S_ID
|111
|111
|222
SELECT S_ID FROM COURSES

How many courses is each student taking?

$\pi_{S\_ID}(Courses)$|S_ID #
|111 $\rightarrow$2
|222 $\rightarrow$1
|* $\rightarrow$0
SELECT S_ID, COUNT(*) FROM COURSES GROUP BY S_ID

How many courses is each student taking?

$\pi_{Name}(Courses \bowtie Students)$|Name #
|Alice $\rightarrow$2
|Bob $\rightarrow$1
|* $\rightarrow$0
SELECT Name, COUNT(*) FROM COURSES NATURAL JOIN STUDENTS GROUP BY Name
$$\left<\mathbb B, \vee, \wedge, F, T\right>$$ Set Databases (SELECT DISTINCT)
$$\left<\mathbb N, +, \times, 0, 1\right>$$ Multiset Databases (Normal SQL)
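The annotated tables above can be imitated with a map from tuples to semiring values. The sketch below is one possible reading, with made-up names (`CourseCounts`, `projectSid`, `projectSidDistinct`): every row of $Courses$ starts with annotation $1$ (or T), and rows that collapse to the same `S_ID` under projection have their annotations combined with the semiring's $+$.

```java
import java.util.Map;
import java.util.TreeMap;

class CourseCounts {
    // Rows of Courses from the slides: {S_ID, Course}
    static final String[][] COURSES = {
        {"111", "CSE-562"},
        {"111", "CSE-521"},
        {"222", "CSE-562"},
    };

    // Projection onto S_ID over <N, +, x, 0, 1>: annotations are added.
    static Map<String, Integer> projectSid(String[][] courses) {
        Map<String, Integer> out = new TreeMap<>();
        for (String[] row : courses) {
            out.merge(row[0], 1, Integer::sum);
        }
        return out;
    }

    // The same projection over <B, OR, AND, F, T>: annotations are OR-ed.
    static Map<String, Boolean> projectSidDistinct(String[][] courses) {
        Map<String, Boolean> out = new TreeMap<>();
        for (String[] row : courses) {
            out.merge(row[0], true, Boolean::logicalOr);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = projectSid(COURSES);
        System.out.println(counts);                         // {111=2, 222=1}
        System.out.println(counts.getOrDefault("333", 0));  // 0, the "* -> 0" row
        System.out.println(projectSidDistinct(COURSES));    // {111=true, 222=true}
    }
}
```

Swapping the semiring is the only difference between the multiset answer (a count per student) and the set answer (whether the student appears at all).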

Other Applications: Provenance, Permissions, Differential Privacy

Database Operation | Semiring Operation
Union | $+$
Join | $\times$
Aggregation | $+$

                    SELECT * FROM R WHERE A = 1
          

$$\sigma_{A = 1}(t \rightarrow \#) = \begin{cases} \# & \text{if }t.A = 1 \\ 0 & \text{otherwise}\end{cases}$$


        SELECT * FROM (SELECT * FROM R WHERE A = 1) WHERE B = 2
          

$$\sigma_{B = 2}(\sigma_{A = 1}(t \rightarrow \#)) = \begin{cases} \sigma_{A = 1}(t \rightarrow \#) & \text{if }t.B = 2 \\ 0 & \text{otherwise}\end{cases}$$


        SELECT * FROM (SELECT * FROM R WHERE A = 1) WHERE B = 2
          

$$\sigma_{B = 2}(\sigma_{A = 1}(t \rightarrow \#)) = \begin{cases} \# & \text{if }t.A = 1\text{ and }t.B = 2 \\ 0 & \text{if }t.A = 1\text{ and }t.B \neq 2 \\ 0 & \text{otherwise}\end{cases}$$


        SELECT * FROM (SELECT * FROM R WHERE A = 1) WHERE B = 2
          

$$\sigma_{B = 2}(\sigma_{A = 1}(t \rightarrow \#)) = \begin{cases} \# & \text{if }t.A = 1\text{ and }t.B = 2 \\ 0 & \text{otherwise}\end{cases}$$


                SELECT * FROM R WHERE A = 1 AND B = 2
          

$$\sigma_{A = 1 \wedge B = 2}(t \rightarrow \#) = \begin{cases} \# & \text{if }t.A = 1\text{ and }t.B = 2 \\ 0 & \text{otherwise}\end{cases}$$
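The derivation above can be replayed on a small annotated relation. This is a sketch under assumptions: tuples are modeled as `(A, B)` lists, the annotation is a multiplicity, and the names `SelectionFusion`, `select`, and `sampleR` (along with the sample data) are invented for illustration. Tuples that fail the predicate keep an explicit annotation of 0, mirroring the case analysis.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class SelectionFusion {
    // sigma_theta(t -> #): keep the annotation if theta(t) holds, otherwise 0.
    static Map<List<Integer>, Integer> select(Map<List<Integer>, Integer> rel,
                                              Predicate<List<Integer>> theta) {
        Map<List<Integer>, Integer> out = new HashMap<>();
        for (Map.Entry<List<Integer>, Integer> e : rel.entrySet()) {
            out.put(e.getKey(), theta.test(e.getKey()) ? e.getValue() : 0);
        }
        return out;
    }

    // Hypothetical sample relation R: tuple (A, B) -> multiplicity.
    static Map<List<Integer>, Integer> sampleR() {
        Map<List<Integer>, Integer> r = new HashMap<>();
        r.put(Arrays.asList(1, 2), 3);  // (A=1, B=2) appears three times
        r.put(Arrays.asList(1, 5), 1);
        r.put(Arrays.asList(4, 2), 2);
        return r;
    }

    public static void main(String[] args) {
        Predicate<List<Integer>> aIs1 = t -> t.get(0) == 1;
        Predicate<List<Integer>> bIs2 = t -> t.get(1) == 2;
        Map<List<Integer>, Integer> r = sampleR();

        // SELECT * FROM (SELECT * FROM R WHERE A = 1) WHERE B = 2
        Map<List<Integer>, Integer> nested = select(select(r, aIs1), bIs2);
        // SELECT * FROM R WHERE A = 1 AND B = 2
        Map<List<Integer>, Integer> fused = select(r, aIs1.and(bIs2));

        System.out.println(nested.equals(fused));            // true
        System.out.println(fused.get(Arrays.asList(1, 2)));  // 3
    }
}
```

Both plans assign every tuple the same annotation, which is the sense in which the optimizer may swap one query for the other.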

Other Resources

How to Bake Pi
Category Theory for Programmers
https://bartoszmilewski.com/2014/10/28/category-theory-for-programmers-the-preface/