Score:0

# Proving pairwise independence of a set of hash functions

A collection of hash functions $$H=\{h:\{0,1\}^n \to \{0,1\}^m\}$$ is pairwise independent if for every $$x_1 \neq x_2 \in \{0,1\}^n$$ and $$y_1, y_2 \in \{0,1\}^m$$:

$$\Pr_{h \leftarrow H}[h(x_1)=y_1 \wedge h(x_2)=y_2]=\frac{1}{2^{2m}}$$

Given a finite field $$\mathbb{F}$$ of size $$2^n$$ I've been able to prove for the set of hash functions: $$\{h_{a,b}: \mathbb{F}\to \mathbb{F}\}_{a,b \in \mathbb{F}}$$ where: $$h_{a,b}=ax+b$$ that it is pairwise independent.

I am now asked to extend this for the case where the domain and range are not the same - - the former has size $$2^n$$ and the latter $$2^m$$.

I thought of the following as the extension, but I'm not sure whether it would work: choose $$a,b \in \mathbb{F}_{2^m}$$. The idea is that I need to have an inverse in the field in order to uniquely solve for $$a,b$$ (e.g. for $$ax_1+b=y_1$$ I want to solve for $$a,b$$ uniquely by finding $$a=(y_1-b)x_1^{-1}$$ and similarly for $$b$$ (a system of two equations mod $$\mathbb{F}_{2^m}$$) so that I get the aforementioned probability). A corollary would mean making sure $$x_1, x_2$$ are in the field, i.e. there are $$a$$ and $$b$$ such that $$x_1=(y_1-b)a^{-1}$$. Since $$y_1 \in \mathbb{F}_{2^m}$$ then it would be sufficient that also $$a,b \in \mathbb{F}_{2^m}$$.

I'm quite unsure whether this is correct. Any advice would be greatly appreciated.

Your definition of pairwise independence says something *for all* $x_1,x_2 \in \{0,1\}^n$. So shouldn't the same condition also hold for all $x_1,x_2$ from a *subset* of $\{0,1\}^n$, too?
@Mikero I'm not entirely sure what you mean. Do you mind clarifying please? As far as my proof goes, $x_1,x_2$ are arbitrary elements in $\{0,1\}^n$.
I'm saying that there is not really anything to do if input length $<$ output length -- i.e., the input space can be viewed as a subset of the output space. In that case, the existing analysis just works.
@Mikero Yes, that's OK, but what about the case where input length > output length? The aforementioned solution should (unless I'm mistaken) capture both cases, no?
No, you need to do something else for input > output. But I'm not sure how much of a hint to give since I'm not sure whether this is homework.
@Mikero It is indeed homework so I would prefer a more instructive hint of how to think about this over a blatant solution, but I think I might have an idea why my current solution fails for the case input > output -- just as I needed $x_i \in \{0,1\}^m$ I also need $x_i$ to be **all** elements in $\{0,1\}^n$, whereas my solution is limited to a subset of them. If that's the case, I'm not sure how to 'expand' this to all of $\{0,1\}^n$.
@Mikero Think I got it for the case: input ($\mathbb{Z}_{2n}$) > output ($\mathbb{Z}_{2m}$): we still select $\{a,b\}$ from input, but the analysis goes as such: If $(a,b)$ is a solution, w.l.g the solution with the smallest $a$ such that it is also an element in $\mathbb{Z}^{2m}$, then there are $2^{n-m}$ additional solutions $(a_i, b)$ ($i \in 2^{n-m}$) - one for each multiple of $2^m$. Since the same goes for holding $a_i$ constant ($(a_i,b_j)$ where $j \in 2^{n-m}$) there are $2^{2(n-m)}$ solutions out of $2^{2n}$ $(a,b)$ pairs, yielding $\frac{2^{2(n-m)}}{2^{2n}}=2^{-2m}$. Correct?
What you describe sounds like $h(x) = ((ax+b) \bmod 2^{m}) \bmod 2^n$. If I've understood correctly then: (1) integers mod $2^n$ are not a field, so the pairwise independence analysis doesn't work (can't multiply by an inverse); (2) this is the same as $a (x \bmod 2^n) + b$ and so if $x, x'$ are congruent mod $2^n$ then they are a collision in $h$ (no matter what $a,b$ are).
I sit in a Tesla and translated this thread with Ai: