Score:0

Finding collisions of polynomial rolling hashes

ru flag

A polynomial hash defines a hash as $H = c_1a^{k-1} + c_2a^{k-2} ... + c_ka^0$, all modulo $2^n$ (that is, in $GF(2^n)$).

For brevity, let $c$ be a $k$ dimensional vector (encapsulating all the individual $c_n$ values).

Are there particular values for $c$ that make the probability of collisions between two randomly chosen $a$ greater than $k/2^n$?

I would argue that there are not. For $H(c, a)$ equals evaluating a polynomial (of degree at most $k$) at $a$. Thus, $H_c(x)$ defines a polynomial with degree at most $k$. Let $H_{c,a}(x) = H_c(x) - H_c(a)$; the zeroes of $H_{c,a}$ are precisely the points where $H_c(x) = H_c(a)$, and there can be no more than $k$ such zeroes. Thus, for any non-zero $c$, two randomly chosen $a,a'$ have $H(a) = H(a')$ with probability $\le k/2^n $.

However, this crypto CTF challenge seems to argue certain $c$ do produce collisions, and this solution explains and demonstrates it (unfortunately most of the explanation is in Chinese). Where is my mistake?

kodlu avatar
sa flag
Modulo $2^n$ is operations in the ring $\mathbb{Z}_{2^n}$ which is NOT the same as $GF(2^n)$.
Score:2
sa flag

This ring has zero divisors so the answer is different than over fields.

Let $H(a)-H(a')=c_1 a^{k-1}+c_2 a^{k-2}+\cdots+c_k,$ and let $j$ be the largest nonnegative integer such that $2^j$ divides $gcd(c_1,\ldots,c_k).$

Claim: Let $j$ be as above, then the polynomial $H(a)-H(a')$ can have $k\times 2^{j}$ roots, leading to a collision probability of $$\frac{k}{2^{n-j}}.$$

Proof: If the coefficients of the difference polynomial have a gcd divisible by $2^j$ then all values of the polynomial are in the subset (which is an ideal) $$2^j \mathbb{Z}_{2^n}=\{2^j u: u \in \mathbb{Z}_{2^n}\}.$$ This means that the difference polynomial is of the form $2^j g(a)$ for some polynomial with gcd equal to 1. Therefore, it is enough for $g(a)$ to take values in $2^{n-j}\mathbb{Z}_{2^n}$ for $2^j g(a)$ to take on the value zero. This means that each zero of $g(a)$ is duplicated $2^j$ times to be a zero of the difference polynomial so the probability that the difference polynomial takes on the value zero is now $$ \frac{k 2^j}{2^n}=\frac{k}{2^{n-j}}. $$

Example from [Magma Calculator][1] of a degree $k=2$ polynomial, which has 2 roots and one where $j=2,$ which has $k 2^j=8$ roots.

code:

Z2to6:=IntegerRing(2^6); Z2to6;
R<a>:=PolynomialRing(Z2to6); R;
{* Z2to6!(a^2+63*a): a in Z2to6 *};
{* Z2to6!(4*(a^2+63*a)): a in Z2to6 *};}

output:

Univariate Polynomial Ring in a over IntegerRing(64)
{* 0^^2, 2^^2, 4^^2, 6^^2, 8^^2, 10^^2, 12^^2, 14^^2, 16^^2, 18^^2, 20^^2,
22^^2, 24^^2, 26^^2, 28^^2, 30^^2, 32^^2, 34^^2, 36^^2, 38^^2, 40^^2, 42^^2,
44^^2, 46^^2, 48^^2, 50^^2, 52^^2, 54^^2, 56^^2, 58^^2, 60^^2, 62^^2 *}`
{* 0^^8, 8^^8, 16^^8, 24^^8, 32^^8, 40^^8, 48^^8, 56^^8 *}```

The second polynomial $4(a^2+63a)$ has a gcd of 4 thus it has 8 roots not 2.

The magma list notation 0^^8 means the element 0 appears 8 times in the list.




  [1]: http://magma.maths.usyd.edu.au/calc/
ru flag
Fascinating. Can you explain the math a bit more. It seems my mistake was confusing the Zn ring for the GF field, and, since the Zn ring has zero divisors, polynomials can have greater degree than I expected. Your equations remind me of gcd / Euclidean algorithm, but it would be very helpful if you could elaborate.
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.