Score:0

Does google's Shattered paper(The first collision for full SHA-1) mean creating a new file with the same hash as the original file?

es flag

I have a source data A and a hash H(A) of this A. Is it possible by google's shattered docs to create a new data B that outputs this H(A)?

Method1,2

++ I understood that the content of the paper is to complement two message blocks so that the final CV is the same. The method I thought of is whether M2(2) of "Dummy File" can be properly found so that CV2 of "Original File" and "DummyFile" are output the same as in Method1 in the photo, or make only CV2 the same as in Method2 through find new M1(2), M2(2).

A 64-byte message block is meaningless data. Therefore, the file does not have to open normally.

The format of the file is not limited to PDF.

kelalaka avatar
in flag
Does this answer your question? [Does "Shattered" actually show SHA-1-signed certificates are "unsafe"?](https://crypto.stackexchange.com/questions/60640/does-shattered-actually-show-sha-1-signed-certificates-are-unsafe) It is a collision attack, what you described is a secondary pre-image attack.
Daniel S avatar
ru flag
No. What you are describing would be a [second pre-image attack](https://en.wikipedia.org/wiki/Preimage_attack) attack which is much harder than a basic collision attack where both data sources are under the control of the attacker. AFAIK SHA1 second pre-image attack will take roughly $2^{160}$ hash evaluations, which is thoroughly infeasible.
es flag
@DanielS I added a little more content and uploaded it. I would appreciate it if you could answer my question.
kelalaka avatar
in flag
Could you read the dupe or https://shattered.io/ ? The PDF files have some flexible parts that can carry arbitrary data without breaking the format. All is explained in the paper. too.
Daniel S avatar
ru flag
I disagree that this is a dupe to the certificates question. None of the answers to the certificates question mention anything about the distinction between a collision and a second preimage attack. I have voted to keep this open.
kelalaka avatar
in flag
@DanielS The certificate needs hash, so it is over the dupe. The answer of Squeamish Ossifrage is already covered that it collision attack, I think that we don't need an answer to teach the difference of second pre-image and collision attack. here another [The tricky issue](https://crypto.stackexchange.com/q/48289/18298), [another](https://crypto.stackexchange.com/a/16587/18298), [another](https://crypto.stackexchange.com/q/44502/18298)
Score:3
ru flag

No. What you described would be a second pre-image attack, whereas the google team produced a collision attack. In the SHAttered "The first full collision on SHA1" paper by Mark Stevens et al. where those states are quoted in your method 1, the authors were first able to simultaneously select $M_1^{(1)}$ and $M_1^{(2)}$ making slight perturbations to one or the other until a close pair of values $CV_1^{(1)}$ and $CV_1^{(2)}$ were produced. In your scenario $M_1^{(1)}$ would be fixed and only $M_1^{(2)}$ could be adjusted making it much harder to find close $CV_1$ values. Likewise, in the second step the researchers were able to perturb both $M_2^{(1)}$ and $M_2^{(2)}$ to find the full collision, but in your scenario an attacker would only be able to adjust $M_2^{(2)}$.

The difference in difficulty of these challenge is such that finding a second pre-image for SHA1 is still likely to take roughly $2^{160}$ hash function evaluations. It is akin to the difference between the probability of finding two people in the room with same birthday and the probability of finding someone in a room with the same birthday as you.

More recent work ("SHA1 is a shambles" by Leurent and Peyrin) shows that it is possible for an attacker to take an arbitrary file and append data that they control both to your file and theirs in such a way as to produce the same SHA1 value for both files. This is known as a chosen prefix-collision and is still stronger evidence that SHA1 needs to be deprecated.

NB: Your table of byte values labels both left and right as $M_1^{(2)}$ and I've reverted to the notation of the SHAttered paper. I'm not sure of the source of your second table.

fgrieu avatar
ng flag
<[Dr. Spock](https://en.wikipedia.org/wiki/Spock) mode>This answer's "No" is making untold assumptions on A that are not explicit in the question. That's proven by this [A](https://shattered.io/static/shattered-1.pdf) and the matching [B](https://shattered.io/static/shattered-2.pdf). </> Exactly how much assumptions must be made to make this "No" correct is not obvious, and that's how I've chosen to read the question.
Daniel S avatar
ru flag
@fgrieu Possibly you're right. For my part, I felt that the generic second pre-image problem was implicit in the words "new" and "original". In any event I hope that between us we have managed to say something that is useful to the questioner.
Score:2
ng flag

The Shattered attack finds a different file/message B that has the same SHA-1 hash as a file/message A, but only if some section of the data in A has a characteristic that won't happen by chance. For most file formats/data semantic, that still allows attack by adversaries that can't alter the meaning/appearance/effect of A, but can slightly influence the binary content of A.


I have a source data A and a hash H(A) of this A. Is it possible by google's shattered docs to create a new data B that outputs this H(A)?

The answer depends on the data A, which itself depends on how A was obtained or produced, which the question does not state or allow to guess.

  1. Yes, without computational effort, if A starts with one of two particular 320-byte values that the Shattered attack gave. We can just replace one with the other to form B with the same SHA-1 hash. This can (among other things) be used to effortlessly make two different valid PDF files with slightly different byte content and entirely different visual appearance.
  2. No, if A is less than about 125 bytes, when sticking to the method in the Shattered paper, even with their computational effort. But we must lower that limit to about 19 bytes if we change the method and accept an increase of the computational effort by a modest factor (about 250 thousands) to about $2^{81}$ SHA-1 hashes.
  3. Yes, with the computational effort of Shattered, if an adversary can choose the first 128 bytes of A. That's still the Shattered attack, but it requires sizable work. We can lower that limit to 64 bytes is we accept the increase in computational effort of the previous point.
  4. Yes, as an extension of 3, if the adversary knows the first $n$ bytes of A and can choose the next $128+(-n\bmod 64)$ bytes. And we must lower that to $64+(-n\bmod 64)$ bytes with the increased computational effort. This can be further extended to the adversary choosing about 20 bytes worth of information in any the first $64\,f$ bytes of A with knowledge of that section of A, and a further increase in work by a factor no larger than $f$.
  5. Yes, as a consequence of 4, if A is prepared in a way under control of an adversary, e.g. if A is a PDF document, PNG image, CD/DVD "ISO" image, perhaps executable file prepared using tools crafted by an adversary. For digital certificates, see this question.
  6. No, if A passes the test provided by the Shattered authors and we stick to their attack method. But their test can't guard against other attacks of comparable cost, and no test can guard against attacks with modestly increased cost.
  7. No, if there is at least say 64 bits of entropy in the first 64 bytes of A that are unknown to an adversary at a time when the adversary can influence A, for any feasible computational effort. That includes random A, and digital certificates issued by certification authorities taking the simple precaution of using randomized serial numbers at start of the certificate.

I want to make sure that the format of the file is not limited to PDF.

The Shattered attack and extension are not limited to PDF, see 3/4/5. It can be carried (with some work) with any file format without some internal redundancy check; that is, most. Further, prefixes analogs to 1 (which is specialized for inexpensive forgery of PDF) can be devised for many formats including JPEG, PNG, GIF, MP4, JPEG2000, Portable Excutable format, and many more, with sizable but feasible work to be done only once. The prefixes of 1 even work with some existing file formats, e.g. audio and video for players that skip over what they do not recognize (since the question does not ask that the different A and B behave differently; that would be hard to achieve with the prefixes of 1 for something not PDF, and a standard player not crafted for the purpose).

If in doubt: better safe than sorry, assume that yes the attack is possible, and use an unbroken and significantly wider hash than SHA-1


the (principle) of the paper is to complement two message blocks

Not exactly. It's complemented some few bits in two consecutive message blocks (64 byte each), which content is chosen (with sizable but feasible work) as a function of the hash state before processing these two message block. Therefore the content of these two messages blocks in A is not arbitrary, it must match these painfully computed 128 bytes. Such match won't happen by accident (e.g. for random A), it requires some control on A, and knowledge of the state of the hash before hashing these two blocks. Such knowledge can be obtained by knowing the part of message A before these two blocks of A one must choose to make the attack work.

An attack that works for arbitrary A would be a (second) preimage attack. Such attack is not known for SHA-1.

mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.