People always say that SQL is "Declarative"
SELECT * FROM R WHERE A = 1 AND B = 2
SELECT * FROM (SELECT * FROM R WHERE A = 1) WHERE B = 2
(and how is it the same as computing $x + 1 + 2$?)
Commutative $$a + b = b + a$$
Associative $$(a + b) + c = a + (b + c)$$
Neutral Element $$0 + a = a$$
public int add(int a, int b) { ... }
add(a, b) == add(b, a)
add(add(a, b), c) == add(a, add(b, c))
add(a, 0) == a
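The three identities above can be spot-checked mechanically. The following is a minimal sketch, assuming `add` is plain `int` addition; the class `AddLaws` and the method `lawsHold` are hypothetical names introduced only for illustration.

```java
import java.util.List;

// Hypothetical helper class, not part of the original slides.
final class AddLaws {
    // Assumption: add is ordinary int addition.
    static int add(int a, int b) { return a + b; }

    // Checks the three laws on every combination of the sample values.
    static boolean lawsHold(List<Integer> samples) {
        for (int a : samples) {
            if (add(a, 0) != a) return false;                 // neutral element
            for (int b : samples) {
                if (add(a, b) != add(b, a)) return false;     // commutative
                for (int c : samples) {
                    // associative
                    if (add(add(a, b), c) != add(a, add(b, c))) return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(lawsHold(List.of(-3, 0, 1, 7, 42)));
    }
}
```

A check like this only samples a few inputs; it illustrates the laws rather than proving them.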
Commutativity | Associativity | Neutral Element |
---|---|---|
$a + b = b + a$ | $(a + b) + c = a + (b + c)$ | 0 |
$a \texttt{ AND } b = b \texttt{ AND } a$ | $(a \texttt{ AND } b) \texttt{ AND } c = a \texttt{ AND } (b \texttt{ AND } c)$ | TRUE |
$a \texttt{ OR } b = b \texttt{ OR } a$ | $(a \texttt{ OR } b) \texttt{ OR } c = a \texttt{ OR } (b \texttt{ OR } c)$ | FALSE |
$A \cup B = B \cup A$ | $(A \cup B) \cup C = A \cup (B \cup C)$ | $\emptyset$ |
$min(A, B) = min(B, A)$ | $min(min(A, B), C) = min(A, min(B, C))$ | $\infty$ |
Mathematicians call this a commutative monoid (loosely, a Group; a true group also requires inverses)
interface Group<K> {
  // The neutral element: add(getZero(), a) == a
  K getZero();
  // The commutative, associative combining operation
  K add(K a, K b);
}
$$\left<\; K,\; +,\; 0\;\right>$$
$$\left<\; Set,\; \cup,\; \emptyset\;\right>$$ $$\left<\; \mathbb Z,\; max,\; -\infty\;\right>$$
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class SetGroup implements Group<Set<Integer>> {
  // The empty set unioned with anything is a no-op
  public Set<Integer> getZero() { return Collections.emptySet(); }
  // Compute the set union
  public Set<Integer> add(Set<Integer> a, Set<Integer> b) {
    Set<Integer> tmp = new HashSet<>(a);
    tmp.addAll(b);
    return tmp;
  }
}
class IntMaxGroup implements Group<Integer> {
  // max(x, -infinity) = x; Integer.MIN_VALUE stands in for -infinity
  public Integer getZero() { return Integer.MIN_VALUE; }
  // Compute the max of two elements
  public Integer add(Integer a, Integer b) { return Integer.max(a, b); }
}
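One payoff of the interface is that a single loop can aggregate over any such structure. The sketch below is self-contained, so it repeats the interface and both implementations in compilable form; the class `Fold` and its `fold` method are hypothetical names, not part of the original slides.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

interface Group<K> {
    K getZero();
    K add(K a, K b);
}

final class SetGroup implements Group<Set<Integer>> {
    public Set<Integer> getZero() { return new HashSet<>(); }
    public Set<Integer> add(Set<Integer> a, Set<Integer> b) {
        Set<Integer> tmp = new HashSet<>(a);
        tmp.addAll(b);
        return tmp;
    }
}

final class IntMaxGroup implements Group<Integer> {
    public Integer getZero() { return Integer.MIN_VALUE; }
    public Integer add(Integer a, Integer b) { return Math.max(a, b); }
}

// Hypothetical helper: combines all items with the structure's add,
// starting from its neutral element.
final class Fold {
    static <K> K fold(Group<K> g, List<K> items) {
        K acc = g.getZero();
        for (K item : items) {
            acc = g.add(acc, item);
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(fold(new IntMaxGroup(), List.of(3, 9, 4)));
        System.out.println(fold(new SetGroup(), List.of(Set.of(1, 2), Set.of(2, 3))));
    }
}
```

Because `add` is associative and commutative, the items could be folded in any order or in independent chunks without changing the result.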
// Compute the maximum value in a Set
public int SetMax(Set<Integer> set) {
  // Start from the neutral element of max
  int ret = Integer.MIN_VALUE;
  for (int i : set) {
    ret = Integer.max(i, ret);
  }
  return ret;
}
$$\texttt{SetMax} : SetGroup \rightarrow IntMaxGroup$$
There's something really cool about SetMax: $$\texttt{SetMax}(A \cup B) = max(\texttt{SetMax}(A),\; \texttt{SetMax}(B))$$
Mathematicians call this a homomorphism
SELECT MAX(x) FROM (SELECT * FROM R UNION SELECT * FROM S)
MAX( (SELECT MAX(x) FROM R), (SELECT MAX(x) FROM S) )
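The equivalence of the two queries can be illustrated in Java: taking the max of a union gives the same answer as taking the max of the two per-relation maxima. This is a minimal sketch; the class name `SetMaxHomomorphism`, the helper `union`, and the sample contents of `r` and `s` are assumptions made up for the example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class, not part of the original slides.
final class SetMaxHomomorphism {
    // Same logic as SetMax: fold max over the set, starting at the neutral element.
    static int setMax(Set<Integer> set) {
        int ret = Integer.MIN_VALUE;
        for (int i : set) {
            ret = Math.max(i, ret);
        }
        return ret;
    }

    static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> tmp = new HashSet<>(a);
        tmp.addAll(b);
        return tmp;
    }

    public static void main(String[] args) {
        Set<Integer> r = Set.of(1, 5, 3);   // made-up values of R.x
        Set<Integer> s = Set.of(4, 8);      // made-up values of S.x
        int viaUnion = setMax(union(r, s));                  // MAX over R UNION S
        int viaParts = Math.max(setMax(r), setMax(s));       // MAX of the two MAXes
        System.out.println(viaUnion + " " + viaParts);       // prints "8 8"
    }
}
```

The second form never materializes the union, which is the kind of rewrite a query optimizer can exploit.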
Let's dig a little deeper
Commutative $$a \times b = b \times a$$
Associative $$(a \times b) \times c = a \times (b \times c)$$
Neutral Element $$1 \times a = a$$
Distributive $$a \times (b + c) = a \times b + a \times c$$
Sphere of Annihilation $$0 \times a = 0$$
$K$ | $+$ | $\times$ | $0$ | $1$ | |
---|---|---|---|---|---|
$\mathbb N$ | $+$ | $\times$ | $0$ | $1$ | Natural Number Arithmetic |
$\mathbb B$ | $\vee$ | $\wedge$ | F | T | Boolean Algebra |
Tables | $\cup$ | $\bowtie$ | $\emptyset$ | $\left<\right>$ | SQL |
Mathematicians call this a Semiring
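In the same spirit as the earlier `Group<K>` interface, a semiring can be sketched as an interface with two operations and two neutral elements. The names `Semiring`, `NatSemiring`, `BoolSemiring`, and `SemiringLaws` below are hypothetical, chosen only for this illustration.

```java
// Hypothetical interface mirroring <K, +, x, 0, 1>.
interface Semiring<K> {
    K zero();
    K one();
    K plus(K a, K b);
    K times(K a, K b);
}

// Natural number arithmetic (ints stand in for naturals; overflow is ignored).
final class NatSemiring implements Semiring<Integer> {
    public Integer zero() { return 0; }
    public Integer one() { return 1; }
    public Integer plus(Integer a, Integer b) { return a + b; }
    public Integer times(Integer a, Integer b) { return a * b; }
}

// Boolean algebra: OR plays the role of +, AND the role of x.
final class BoolSemiring implements Semiring<Boolean> {
    public Boolean zero() { return false; }
    public Boolean one() { return true; }
    public Boolean plus(Boolean a, Boolean b) { return a || b; }
    public Boolean times(Boolean a, Boolean b) { return a && b; }
}

final class SemiringLaws {
    // a x (b + c) == a x b + a x c
    static <K> boolean distributes(Semiring<K> s, K a, K b, K c) {
        return s.times(a, s.plus(b, c)).equals(s.plus(s.times(a, b), s.times(a, c)));
    }

    // 0 x a == 0
    static <K> boolean annihilates(Semiring<K> s, K a) {
        return s.times(s.zero(), a).equals(s.zero());
    }

    public static void main(String[] args) {
        System.out.println(distributes(new NatSemiring(), 2, 3, 4));
        System.out.println(annihilates(new BoolSemiring(), true));
    }
}
```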
How many courses is each student taking?
$\pi_{S\_ID}(Courses)$

S_ID |
---|
111 |
111 |
222 |
SELECT S_ID FROM COURSES
How many courses is each student taking?
$\pi_{S\_ID}(Courses)$

S_ID | | # |
---|---|---|
111 | $\rightarrow$ | 2 |
222 | $\rightarrow$ | 1 |
* | $\rightarrow$ | 0 |
SELECT S_ID, COUNT(*) FROM COURSES GROUP BY S_ID
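The annotated table can be read as a map from each tuple to a count, with every tuple not listed mapping to 0. A minimal sketch, assuming the Courses rows have already been reduced to their S_ID values (111, 111, 222 as in the table); `CountProjection`, `projectSid`, and `annotation` are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class, not part of the original slides.
final class CountProjection {
    // Each input row contributes an annotation of 1; duplicates are summed with +.
    static Map<Integer, Integer> projectSid(List<Integer> sids) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int sid : sids) {
            counts.merge(sid, 1, Integer::sum);
        }
        return counts;
    }

    // Tuples that are absent from the map implicitly carry the annotation 0.
    static int annotation(Map<Integer, Integer> rel, int sid) {
        return rel.getOrDefault(sid, 0);
    }

    public static void main(String[] args) {
        Map<Integer, Integer> counts = projectSid(List.of(111, 111, 222));
        System.out.println(counts);                  // prints {111=2, 222=1}
        System.out.println(annotation(counts, 333)); // prints 0
    }
}
```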
How many courses is each student taking?
$\pi_{Name}(Courses \bowtie Students)$

S_ID | | # |
---|---|---|
111 | $\rightarrow$ | 2 |
222 | $\rightarrow$ | 1 |
* | $\rightarrow$ | 0 |
SELECT Name, COUNT(*) FROM COURSES NATURAL JOIN STUDENTS GROUP BY Name
$$\left<\mathbb B, \vee, \wedge, F, T\right>$$ | Set Databases (SELECT DISTINCT) |
$$\left<\mathbb N, +, \times, 0, 1\right>$$ | Multiset Databases (Normal SQL) |
Other Applications: Provenance, Permissions, Differential Privacy
Database Operation | Semiring Operation |
---|---|
Union | $+$ |
Join | $\times$ |
Aggregation | $+$ |
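The correspondence in this table can be sketched for multiset relations, where each tuple maps to a count in $\mathbb N$. As a simplifying assumption, the relations below have a single column, so a tuple is just an `Integer` and the join is on that column; `BagOps` and its methods are hypothetical names, and the sample data is made up.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class, not part of the original slides.
final class BagOps {
    // Union: annotations of the same tuple are combined with +.
    static Map<Integer, Integer> union(Map<Integer, Integer> r, Map<Integer, Integer> s) {
        Map<Integer, Integer> out = new TreeMap<>(r);
        s.forEach((k, v) -> out.merge(k, v, Integer::sum));
        return out;
    }

    // Join on the single shared column: annotations are combined with x.
    // Tuples missing from either side have annotation 0, so they drop out.
    static Map<Integer, Integer> join(Map<Integer, Integer> r, Map<Integer, Integer> s) {
        Map<Integer, Integer> out = new TreeMap<>();
        r.forEach((k, v) -> {
            Integer w = s.get(k);
            if (w != null) {
                out.put(k, v * w);
            }
        });
        return out;
    }

    // Aggregation (COUNT(*)): all annotations are combined with +.
    static int count(Map<Integer, Integer> r) {
        int total = 0;
        for (int n : r.values()) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> r = Map.of(111, 2, 222, 1);
        Map<Integer, Integer> s = Map.of(111, 1, 333, 1);
        System.out.println(union(r, s)); // prints {111=3, 222=1, 333=1}
        System.out.println(join(r, s));  // prints {111=2}
        System.out.println(count(r));    // prints 3
    }
}
```

Swapping the integer operations for OR and AND on booleans would give the set-semantics version of the same operators.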
SELECT * FROM R WHERE A = 1
$$\sigma_{A = 1}(t \rightarrow \#) = \begin{cases} \# & \text{if }t.A = 1 \\ 0 & \text{otherwise}\end{cases}$$
SELECT * FROM (SELECT * FROM R WHERE A = 1) WHERE B = 2
$$\sigma_{B = 2}(\sigma_{A = 1}(t \rightarrow \#)) = \begin{cases} \sigma_{A = 1}(t \rightarrow \#) & \text{if }t.B = 2 \\ 0 & \text{otherwise}\end{cases}$$
SELECT * FROM (SELECT * FROM R WHERE A = 1) WHERE B = 2
$$\sigma_{B = 2}(\sigma_{A = 1}(t \rightarrow \#)) = \begin{cases} \# & \text{if }t.A = 1\text{ and }t.B = 2 \\ 0 & \text{if }t.A = 1\text{ and }t.B \neq 2 \\ 0 & \text{otherwise}\end{cases}$$
SELECT * FROM (SELECT * FROM R WHERE A = 1) WHERE B = 2
$$\sigma_{B = 2}(\sigma_{A = 1}(t \rightarrow \#)) = \begin{cases} \# & \text{if }t.A = 1\text{ and }t.B = 2 \\ 0 & \text{otherwise}\end{cases}$$
SELECT * FROM R WHERE A = 1 AND B = 2
$$\sigma_{A = 1 \wedge B = 2}(t \rightarrow \#) = \begin{cases} \# & \text{if }t.A = 1\text{ and }t.B = 2 \\ 0 & \text{otherwise}\end{cases}$$
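The derivation can also be checked on concrete data: applying the two selections one after the other yields the same annotated relation as applying the combined condition once. This is a sketch under assumptions: a tuple is modeled as a two-element list `[A, B]`, the relation `r` holds made-up rows, and `SelectionDemo` and `select` are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical demo class, not part of the original slides.
final class SelectionDemo {
    // Selection keeps a tuple's annotation if the condition holds and
    // maps it to 0 otherwise; zero-annotated tuples are simply left out.
    static Map<List<Integer>, Integer> select(
            Map<List<Integer>, Integer> rel, Predicate<List<Integer>> cond) {
        Map<List<Integer>, Integer> out = new HashMap<>();
        rel.forEach((t, n) -> {
            if (cond.test(t)) {
                out.put(t, n);
            }
        });
        return out;
    }

    public static void main(String[] args) {
        // Made-up relation R: tuple [A, B] -> multiplicity.
        Map<List<Integer>, Integer> r = Map.of(
                List.of(1, 2), 3,
                List.of(1, 5), 1,
                List.of(7, 2), 2);
        Predicate<List<Integer>> aIs1 = t -> t.get(0) == 1;
        Predicate<List<Integer>> bIs2 = t -> t.get(1) == 2;

        Map<List<Integer>, Integer> nested = select(select(r, aIs1), bIs2);
        Map<List<Integer>, Integer> flat = select(r, aIs1.and(bIs2));

        System.out.println(nested);              // prints {[1, 2]=3}
        System.out.println(nested.equals(flat)); // prints true
    }
}
```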