\AR{Aaron: Please rewrite this section more generally. That is, instead of assuming that $h_i$ and $s_i$ are specifically defined as linear functions, keep $h_i:W\to[B]$ and $s_i:W\to\{-1,1\}$ as generic functions and abstract out the properties we want from them, namely (1) $h_i$ is pairwise independent, (2) $s_i$ is $4$-wise independent, and (3) given any $\buck\in\{0,1\}^b$, we can compute the following quantity (or an approximation of it) in $\mathrm{poly}(N)$ time:
From my discussions with the folks here at the workshop, requirement (3) seems to be new for $k$-wise independent hash functions, and we should highlight this definition too. Once things have been defined this way, you can state the definition of $h_i$ as you have stated below. In the next section, however, it would be good to state the algorithm only in terms of these more general properties of the hash functions. Once you have made this change, I can make a more careful pass over this section and the next.}
Starting with the latter term $\gIJ=\sum\limits_{\wVecPrime\in\pw}\polP{\wVecPrime}$, the definition of the image of $\pol$ together with the associativity and commutativity of addition lets us split the sum into two terms according to the value of $\polP{\wVecPrime}$. Setting $T_1 = \sum\limits_{\substack{\wVecPrime\in\pw\st\\
\polP{\wVecPrime} = 0}} 1$ and $T_2 = \sum\limits_{\substack{\wVecPrime\in\pw\st\\
\polP{\wVecPrime} = 1}} -1$, and fixing $\buck\in\{0,1\}^\lenB$ (with $\lenB = \log\sketchCols$) to a specific value, gives a system of linear equations for each term. It is a standard result that a consistent system of linear equations in $\numTup$ unknowns with coefficient matrix $\matrixH'$ over a finite field $\kDom$ has exactly $|\kDom|^{\numTup - \operatorname{rank}(\matrixH')}$ solutions, while an inconsistent system has none. For $\kDom = \mathbb{B}$, this gives an exact calculation of both terms,
where $\jpbit{y}$ denotes the polarity bit of the bucket identifier $\buck$, namely its entry $\buck(b)\in\{0,1\}$ at position $\lenB$. For each bucket $\buck$, we therefore want to compute the following quantity in $\mathrm{poly}(N)$ time, or an approximation thereof:
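The rank-based count used above for $T_1$ and $T_2$ can be illustrated with a short Python sketch. It runs Gaussian elimination over $\mathbb{B} = \mathrm{GF}(2)$ on the coefficient matrix $\matrixH'$ augmented with the right-hand side fixed by $\buck$, and returns $2^{\numTup - \operatorname{rank}(\matrixH')}$ when the system is consistent and $0$ otherwise; a brute-force enumeration serves as a reference. The function names \texttt{gf2\_solve\_count} and \texttt{brute\_force\_count} and the list-of-bits encoding are illustrative assumptions, not part of the construction above.

```python
from itertools import product


def gf2_solve_count(H, y):
    """Count x in {0,1}^n with H x = y over GF(2).

    H: list of m rows, each a list of n bits (hypothetical encoding).
    y: list of m bits, the right-hand side.
    Returns 2**(n - rank(H)) if the system is consistent, else 0.
    """
    m, n = len(H), len(H[0])
    # Encode each augmented row [H_i | y_i] as an int: bit 0 holds y_i and
    # column j of H sits at bit n - j, so XOR adds two rows over GF(2).
    rows = [int("".join(map(str, H[i] + [y[i]])), 2) for i in range(m)]
    rank = 0
    for col in range(n):
        bit = 1 << (n - col)
        pivot = next((r for r in range(rank, m) if rows[r] & bit), None)
        if pivot is None:
            continue  # free column: contributes one factor of 2
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(m):
            if r != rank and rows[r] & bit:
                rows[r] ^= rows[rank]
        rank += 1
    # Rows below the pivots are zero on H; a leftover 1 encodes 0 = 1.
    if any(rows[rank:]):
        return 0
    return 2 ** (n - rank)


def brute_force_count(H, y):
    """Reference count by enumerating all 2**n candidate vectors x."""
    n = len(H[0])
    return sum(
        all(sum(h * x for h, x in zip(row, xs)) % 2 == yi
            for row, yi in zip(H, y))
        for xs in product((0, 1), repeat=n)
    )


# The third row is the GF(2) sum of the first two, so rank(H) = 2 and n = 4.
H = [[1, 0, 1, 1],
     [0, 1, 1, 0],
     [1, 1, 0, 1]]
print(gf2_solve_count(H, [1, 0, 1]), brute_force_count(H, [1, 0, 1]))
print(gf2_solve_count(H, [1, 0, 0]), brute_force_count(H, [1, 0, 0]))
```

For a matrix with $m$ rows, the elimination performs $O(m\numTup)$ row XORs, in contrast to the $2^{\numTup}$ candidates enumerated by the brute-force reference.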
Setting $T_3=\sum\limits_{\wVec\in\pw\st\polP{\wVec}=0}\kMapParam{\wVec}$ and $T_4=\sum\limits_{\wVec\in\pw\st\polP{\wVec}=1}\kMapParam{\wVec}$ gives, for a fixed $\buck$, an exact calculation of each term:
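Unlike $T_1$ and $T_2$, the terms $T_3$ and $T_4$ weight each world by $\kMapParam{\wVec}$ rather than counting it. The Python sketch below shows one possible way to evaluate such weighted sums exactly, under two hypothetical assumptions: $\kMapParam{\wVec}$ factorizes over tuples as $\prod_j p_j^{w_j}(1-p_j)^{1-w_j}$, and the bucket and polarity constraints are $m$ linear checks $\matrixH'\wVec = y$ over $\mathbb{B}$. It uses a standard Fourier expansion over $\mathrm{GF}(2)^m$, swapped in here for illustration and not necessarily the method developed in this paper: $\mathbf{1}[\matrixH'\wVec = y] = 2^{-m}\sum_{u\in\{0,1\}^m}(-1)^{u^\top(\matrixH'\wVec + y)}$, and for independent bits $\mathbb{E}[(-1)^{a^\top\wVec}] = \prod_{j: a_j = 1}(1-2p_j)$. This costs $O(2^m m\numTup)$ operations, which is polynomial as long as $2^m$ is. The weights $p_j$, the symbols $m$, $y$, $u$, $a$, and all function names are illustrative assumptions.

```python
from itertools import product
from math import prod


def dot2(u, v):
    """Inner product of two bit vectors over GF(2)."""
    return sum(a & b for a, b in zip(u, v)) % 2


def weight(w, p):
    """Hypothetical product-form weight standing in for the world weight."""
    return prod(pj if wj else 1 - pj for wj, pj in zip(w, p))


def mass_brute(H, y, p):
    """Sum of weight(w) over all w in {0,1}^n with H w = y (exponential in n)."""
    return sum(
        weight(w, p)
        for w in product((0, 1), repeat=len(p))
        if all(dot2(row, w) == yi for row, yi in zip(H, y))
    )


def mass_fourier(H, y, p):
    """Same sum via 1[Hw = y] = 2^-m * sum_u (-1)^(u.(Hw + y)); O(2^m m n)."""
    m, n = len(H), len(p)
    total = 0.0
    for u in product((0, 1), repeat=m):
        # a = u^T H over GF(2), so that u.(H w) = a.w.
        a = [sum(u[i] & H[i][j] for i in range(m)) % 2 for j in range(n)]
        # Independent bits: E[(-1)^(a.w)] = prod over a_j = 1 of (1 - 2 p_j).
        bias = prod(1 - 2 * p[j] for j in range(n) if a[j])
        total += (-1) ** dot2(u, y) * bias
    return total / 2 ** m


# Hypothetical bucket checks (first two rows) plus one polarity check (last row).
H = [[1, 0, 1, 1],
     [0, 1, 1, 0],
     [1, 1, 0, 0]]
p = [0.1, 0.4, 0.7, 0.25]
bucket = [1, 0]
t3 = mass_fourier(H, bucket + [0], p)  # polarity bit 0
t4 = mass_fourier(H, bucket + [1], p)  # polarity bit 1
print(round(t3, 6), round(mass_brute(H, bucket + [0], p), 6))
print(round(t4, 6), round(mass_brute(H, bucket + [1], p), 6))
```

Under these assumptions, \texttt{t3} and \texttt{t4} play the roles of $T_3$ and $T_4$. Setting every $p_j = 1/2$ weights all worlds equally, in which case the result is the rank-based solution count $2^{\numTup-\operatorname{rank}(\matrixH')}$ (or $0$) divided by $2^{\numTup}$.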
As with worlds, each bucket identifier can be viewed as a binary vector, which, as detailed above, has length $\lenB$. Analogously, we define the hash matrix $\matrixH$ as the matrix of $\lenB$ precomputed hash vectors $\hVec\in\{0,1\}^\numTup$; formally,