Score:2

Crypto

Are SHA-1 hash collisions harder to find when files are big?

oqdn

7/12/23, 10:37 AM

Was that just a coincidence, or did the two "SHAttered" PDF files with identical SHA-1 hashes have to be small (412 KB) to make the collision attack easier?

100

sha-1

collision-resistance

kelalaka

7/12/23, 2:35 PM

Related: [SHA-1 collision for strings of two symbols](https://crypto.stackexchange.com/q/66584/18298)

Score:3

Crypto

Daniel S

7/12/23, 12:47 PM

No; in fact, it was a demonstration of how powerful the attack was. Larger files would give the attacker more input bits to manipulate and control. The size of the files would also be trivial to increase by appending the same data to both, in the manner of a length extension attack.

Subsequent work in the SHAmbles attack was even more powerful: it constructs chosen-prefix collisions, where the two colliding files can begin with arbitrary, different start values, which could be arbitrarily large amounts of different data.

kelalaka

7/12/23, 2:10 PM

It is not actually about larger files; it is about the degrees of freedom in the file format.

oqdn

7/14/23, 6:39 PM

But since large files take more time to hash, wouldn't this raise the difficulty of finding collisions?

Daniel S

7/15/23, 2:10 PM

Large files will linearly increase the cost of a single hash function evaluation, but control over more bits gives the attacker the potential to reduce the overall number of evaluations needed.
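A rough sketch of the cost arithmetic, assuming the standard SHA-1 padding rule (one 0x80 byte, zero fill, then an 8-byte length field, padded to a multiple of 64 bytes). It counts how many 64-byte blocks SHA-1 processes for a message, and uses Python's `hashlib` `copy()` to show that a long shared prefix only has to be hashed once when many different continuations are tried. The function names are hypothetical, and this illustrates the cost structure only, not the collision attack itself.

```python
import hashlib


def sha1_block_count(n_bytes: int) -> int:
    """Number of 64-byte blocks SHA-1 processes for an n-byte message.

    Padding adds at least 9 bytes (0x80 plus an 8-byte length), so the
    padded length is the smallest multiple of 64 that is >= n + 9.
    """
    return (n_bytes + 9 + 63) // 64


def try_suffixes(prefix: bytes, candidates: list) -> list:
    """Hash prefix + c for each candidate c, hashing the shared prefix only once.

    hashlib's copy() clones the internal state after the prefix, so the
    prefix cost is paid a single time rather than once per candidate.
    """
    base = hashlib.sha1()
    base.update(prefix)
    digests = []
    for c in candidates:
        h = base.copy()  # resume from the saved post-prefix state
        h.update(c)
        digests.append(h.hexdigest())
    return digests
```

For example, `sha1_block_count(55)` is 1 while `sha1_block_count(56)` is 2, because 56 + 9 bytes no longer fit in one block. In `try_suffixes`, the work per candidate grows with the candidate's length rather than with the prefix's length.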

Score:1

Crypto

benrg

7/14/23, 2:49 AM

It is definitely easier to find short collisions. But 412 KiB isn't small; it's extremely large in this context.

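The two files differ only in bytes 0xC0 through 0x13F (192 through 319), which are exactly two 64-byte SHA-1 message blocks. So it is more accurate to say that they found a 320-byte collision (a shared 192-byte prefix followed by the two differing blocks) or a 128-byte collision (just the differing blocks), depending on how you look at it.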
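The offset arithmetic can be checked with a short sketch, assuming SHA-1's 64-byte block size. The helper names `differing_range` and `block_span` are hypothetical: one finds the first and last differing byte of two equal-length inputs, the other maps that byte range to message-block indices.

```python
from typing import Optional, Tuple


def differing_range(a: bytes, b: bytes) -> Optional[Tuple[int, int]]:
    """Return (first, last) offsets where equal-length a and b differ, or None."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if not diffs:
        return None
    return diffs[0], diffs[-1]


def block_span(first: int, last: int, block_size: int = 64) -> range:
    """Indices of the message blocks (64 bytes each for SHA-1) containing the differences."""
    return range(first // block_size, last // block_size + 1)
```

With the offsets from the answer, `block_span(0xC0, 0x13F)` covers blocks 3 and 4: three identical blocks (bytes 0 to 191) precede two differing blocks that end at byte 320. Reading the two PDFs (hypothetically named `shattered-1.pdf` and `shattered-2.pdf`) in binary mode and passing their contents to `differing_range` should, per the answer, report (192, 319).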

The two 320-byte files not only have the same SHA-1 digest but also leave the SHA-1 algorithm in the same internal state. As a result, you can append a suffix of any length to those two prefixes and the two resulting files will also have equal SHA-1 digests.
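The "same internal state" property is a consequence of SHA-1's Merkle-Damgård structure, and it can be seen on a toy hash. The sketch below is entirely hypothetical: a 16-bit chaining value, 8-byte blocks, a compression function built from truncated SHA-256 purely for deterministic mixing, and SHA-1-style padding. Because the state is tiny, a one-block collision in the chaining value can be brute-forced; after that, any common suffix keeps the digests equal.

```python
import hashlib

BLOCK = 8    # toy block size in bytes (hypothetical; SHA-1 uses 64)
IV = 0x1234  # toy 16-bit initial chaining value (hypothetical)


def toy_compress(state: int, block: bytes) -> int:
    """Insecure toy compression function with a 16-bit chaining value.

    Truncated SHA-256 is used only as a deterministic mixer; SHA-1's
    real compression function has a 160-bit state.
    """
    digest = hashlib.sha256(state.to_bytes(2, "big") + block).digest()
    return int.from_bytes(digest[:2], "big")


def toy_hash(msg: bytes) -> int:
    """Merkle-Damgard iteration with SHA-1-style padding (0x80, zeros, 8-byte bit length)."""
    padded = msg + b"\x80"
    padded += b"\x00" * (-(len(padded) + 8) % BLOCK)
    padded += (8 * len(msg)).to_bytes(8, "big")
    state = IV
    for i in range(0, len(padded), BLOCK):
        state = toy_compress(state, padded[i:i + BLOCK])
    return state


def find_block_collision() -> tuple:
    """Brute-force two distinct one-block messages that reach the same chaining value."""
    seen = {}
    for i in range(2**16 + 1):  # pigeonhole: a repeat must occur within 65537 tries
        block = i.to_bytes(BLOCK, "big")
        cv = toy_compress(IV, block)
        if cv in seen:
            return seen[cv], block
        seen[cv] = block
    raise AssertionError("unreachable by the pigeonhole principle")


if __name__ == "__main__":
    m1, m2 = find_block_collision()
    print("colliding blocks:", m1.hex(), m2.hex())
    for suffix in (b"", b"any common suffix", b"x" * 1000):
        # Same state after block 1, identical later blocks => same digest.
        print(len(suffix), toy_hash(m1 + suffix) == toy_hash(m2 + suffix))
```

The argument needs the two colliding messages to have equal length and to end on a block boundary, as the 320-byte (five-block) SHAttered prefixes do. Brute force works here only because the toy state is 16 bits; the real attack relied on cryptanalysis of SHA-1's compression function.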

It's possible, if you're careful, to come up with a suffix that turns the two prefixes into two valid PDF files that display two different JPEG images, which is what they did. This is entirely separate from the process of finding the colliding prefixes. If you didn't know how to break SHA-1 and were given these two prefixes by an NSA mole, you could still construct visibly different PDF files from them, using only information from the publicly available PDF and JPEG standards.
