Score:2

Are SHA-1 hash collisions harder to find when files are big?


Was it just a coincidence, or did the two "shattered" PDF files with identical SHA-1 hashes have to be small (412 KB) to make the collision attack easier?

kelalaka
Related: [SHA-1 collision for strings of two symbols](https://crypto.stackexchange.com/q/66584/18298)
Score:3

No, in fact it was a demonstration of how powerful the attack was. Larger files would give the attacker more input bits to manipulate and control. The file size would also be trivial to increase by appending identical data to both files, the same property that length extension attacks rely on.

Subsequent work, the SHAmbles ("SHA-1 is a Shambles") attack, was even more powerful: it constructs chosen-prefix collisions, in which the two files may start with arbitrary, and arbitrarily large, different data.

kelalaka
It is not actually about larger files; it is about the degrees of freedom in the file format.
oqdn
But since large files take more time to hash, wouldn't this raise the difficulty of finding collisions?
Daniel S
Large files will linearly increase the cost of a single hash function evaluation, but the control of more bits gives the attacker the potential to reduce the number of overall evaluations needed.
Score:1

It is definitely easier to find short collisions. But 412 KiB isn't small, it's extremely large in this context.

The two files differ only in bytes 0xC0 through 0x13F (192 through 319), so it is more accurate to say that they found a 320-byte collision, or a 128-byte collision, depending on how you look at it.
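
To check a claim like this yourself, a small helper that reports the first and last differing offsets of two equal-length byte strings is enough. The sketch below is only illustrative: `differing_range` is a hypothetical name, and the inputs are synthetic stand-ins (all-zero buffers with two flipped bytes), not the actual PDFs.

```python
from typing import Optional, Tuple


def differing_range(a: bytes, b: bytes) -> Optional[Tuple[int, int]]:
    """Return (first, last) offsets at which a and b differ, or None if equal."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return (diffs[0], diffs[-1]) if diffs else None


# Synthetic stand-ins for the two files: identical except at two offsets.
left = bytes(400)
right = bytearray(left)
right[0xC0] ^= 1   # first differing byte, offset 192
right[0x13F] ^= 1  # last differing byte, offset 319
span = differing_range(left, bytes(right))
print(span)  # (192, 319)
```

To run it on the real files, read each one with `open(path, "rb").read()` and pass the two results in; the paths are whatever you saved the downloads as.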

The two 320-byte files not only have the same SHA-1 digest, but leave the SHA-1 algorithm in the same internal state. As a result, you can append a suffix of any length to those two prefixes and the two resulting files will also have equal SHA-1 digests.
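
The suffix property can be illustrated without the real colliding prefixes. The sketch below is a toy, not SHA-1: it builds a hypothetical Merkle-Damgård hash (`toy_hash`) with a deliberately tiny 16-bit state so that a brute-force search finds two colliding blocks almost instantly, then checks that appending the same suffix to both keeps the digests equal. The block size, state size, padding, and function names are all assumptions made for the illustration.

```python
import hashlib
from itertools import count

BLOCK = 8  # toy block size in bytes (SHA-1 uses 64)
STATE = 2  # toy state size in bytes, tiny so collisions are cheap to find


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: truncated SHA-256 of state || block."""
    return hashlib.sha256(state + block).digest()[:STATE]


def toy_hash(msg: bytes) -> bytes:
    """Toy Merkle-Damgard hash: pad with 0x80, zeros, and a 4-byte length."""
    padded = msg + b"\x80"
    padded += b"\x00" * (-(len(padded) + 4) % BLOCK)
    padded += len(msg).to_bytes(4, "big")
    state = b"\x00" * STATE  # fixed initial value
    for i in range(0, len(padded), BLOCK):
        state = compress(state, padded[i:i + BLOCK])
    return state


def find_colliding_blocks():
    """Brute-force two distinct blocks that yield the same internal state."""
    iv = b"\x00" * STATE
    seen = {}
    for n in count():
        block = n.to_bytes(BLOCK, "big")
        state = compress(iv, block)
        if state in seen:
            return seen[state], block
        seen[state] = block


a, b = find_colliding_blocks()
suffixes = [b"", b"x", b"any suffix of any length" * 100]
# Same internal state after the first block, so every common suffix collides.
results = [toy_hash(a + s) == toy_hash(b + s) for s in suffixes]
print(a != b, results)
```

The search here is a plain birthday search over a 16-bit state; finding the real SHA-1 prefixes required a far more sophisticated attack, but once the internal states match, the suffix step works the same way.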

It's possible, if you're careful, to come up with a suffix that results in two valid PDF files that display two different JPEG images, which is what they did. This is totally separate from the process of finding the colliding prefixes. If you didn't know how to break SHA-1, and were given these two prefixes by an NSA mole, you could still construct visibly different PDF files from them, just by using information from the publicly available PDF and JPEG standards.
