PRF collision search for input smaller than output

fgrieu

4/19/24, 10:14 AM

Assume a given pseudo-random function $H:\{0,1\}^a\mapsto\{0,1\}^b$ with $b\in[104,256]$ and $b/2<a<b$. We want to exhibit a collision if there is one, which has probability $>63\%$.

We are ready to perform $2^{b/2+1}$ evaluations of $H$ or slightly more, distributed among several search units. But we don't have $2^{b/2}b$ bits of memory, especially accessible by each search unit. How can we practically organize a search?

Because $a<b$, I fail to adapt techniques based on iterating $H$ and cycle detection, like in Paul C. van Oorschot and Michael J. Wiener, Parallel Collision Search with Cryptanalytic Applications, in Journal of Cryptology, 1999.

Update: Samuel Neves points in comment that the problem has been studied, and can make use of §4.2 in the above reference. Still that warrants explanation: how that's done, expected cost as a function of $a$, $b$ and other parameters, like available RAM and number of search units; and reasonable parameterization.

171

1 + 1

pseudo-random-function

collision-attack

Samuel Neves

4/21/24, 9:39 AM

See Appendix A of the [Equihash paper](https://eprint.iacr.org/2015/946) or Appendix B of [Dinur](https://eprint.iacr.org/2018/575). This problem is reducible to the "golden collision" problem from [Oorschot-Wiener](https://people.scs.carleton.ca/~paulv/papers/JoC97.pdf) Section 4.2.

Score:2

Crypto

Daniel S

4/29/24, 7:01 PM

It's possible to go down some deep rabbit holes involving rainbow tables on this one, so I'll give a broad brush overview and hope that you will prod me with questions.

As noted in the comments, the basic idea is to apply a conversion $H'=f\circ H$ to make this a $\{0,1\}^a\top\{0,1\}^a$ problem. If $f:\{0,1\}^b\to\{0,1\}^a$ is relatively even then each $a$ bit output has $2^{b-a}$ corresponding inputs, and a collision $H'(x_1)=H'(x_2)$ has $f\circ H(x_1)=f\circ H(x_2)$ and $H(x_1)=H(x_2)$ with probability $2^{a-b}$. Thus if we generate $2^{b-a}$ collisions on $H'$ (or on a family of $H'$ functions), then we expect to encounter a collision on $H$.

One neat way to generate $f$ with uniform outputs is to use a linear function. We can generate a random binary $a\times b$ matrix $M$ with rank $a$ and define $f$ to be the map $\mathbf t\mapsto M\mathbf t$. Note that gives a large family of possible $H'$ functions. This family includes the obvious truncation function and all of the "selection of a subsets of $a$ out of $b$ bits" functions.

The simplest case is the single processor $O(1)$ memory version, where we use Pollard rho or a basic Pollard lambda/kangaroo to generate a single collision in $H'$ using $\approx \sqrt\pi2^{a/2-1/2})$ evaluations of $H'$. On failure, we can try another $H'$ and our expected work is $2^{b-a}$ versions of Pollard each of which takes $\approx\sqrt\pi 2^{a/2-1/2})$ evaluations of $H'$ for total work $\approx \sqrt\pi 2^{b-a/2-1/2}$ function evaluations.

With more memory we can produce more collisions per $H'$ with less work (with increasing memory, we can perhaps produce $k$ collisions with $O(\sqrt k2^{a/2})$ work). In this case we can probably find a collision in $2^{b-a}/k$ versions of your $H'$ search for total work $2^{b-a/2}/\sqrt k$ evaluations of $H'$. In particular, following the (flawed) heuristic analysis of section 4.2 of the van Oorschot and Wiener paper, using $O(w)$ memory we require $2^{b-a}$ collisions which might be producible using $\approx \sqrt{8\times2^a/w}$ function evaluations for work $\approx 2^{b-a/2+3/2}/\sqrt w$.

Note that the "rigorous optimisation" of section 4.2 is an empirical observation rather than a mathematical derivation.

+ 0

Elon Musk

I sit in a Tesla and translated this thread with Ai:

EN: PRF collision search for input smaller than output

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.