Score:1

Hash function collision importance

ng flag

Suppose a collision has been found in a certain hash function, such that H(x1) = H(x2)

However, x1 and x2 are both a seemingly 'random' collection of bits which do not convey a coherent message, and cannot be interperted in a coherent way.

Does this collision make the hash function H not secure? if so how can it be exploited, even if the known collision doesn't convey a coherent message? thanks

kelalaka avatar
in flag
Have you ever heard the [SHA1 shattered](https://shattered.io/) or [corkami of MD5](https://github.com/corkami/collisions)
kelalaka avatar
in flag
Some real examples: [1) What are other good attack examples that use the hash collision?](https://crypto.stackexchange.com/q/87104/18298) $\\$ [2) Does it matter if I publish only publish good or bad MD5 hashes after recovering from a hack?](https://crypto.stackexchange.com/q/70056/18298)
Score:4
us flag

I prefer to think of cryptography as infrastructure. We should strive to develop infrastructure that minimizes the number of usage caveats. It's not cryptography's job to define what are "meaningful" messages in your application. Can you look at a specific collision $m_1, m_2$ and say with certainty that no application of a hash function will ever assign meaning to these $m_1$ and $m_2$?

Which hash function would you rather use, one with a guarantee "it's hard to find any collisions" or one with a guarantee "it's hard to find a collision, except sometimes among strings which are JPGs of gerbils and gzip files of Shakespeare"? I would not want to drive a car with a warning sticker that said "car may explode if you are driving 88.1 mph with the left turn signal on and the radio tuned to 88.1 FM", even if that warning was correctly narrow and I would never do those 3 things at the same time.

That's why cryptographic security definitions consider a collision to be any two messages that satisfy $H(x_1) = H(x_2)$, a signature forgery on any message is an attack, encryptions of any plaintext should look indistinguishable, etc. If you want your cryptography to be used, make sure you strive for a security guarantee that puts everyone at ease.

A second practical reason to be concerned about an "unstructured" collision in $H$, is that when one is found, it is often a matter of time before the techniques are extended to find "structured" collisions. For example, finding structured collisions in a random function (using the classic Yuval collision attack) is approximately the same difficulty as finding unstructured collisions (using standard brute force and birthday bound).

Score:0
in flag

H would be unsecure iff when given x1 it would be possible to algorithmically derive a different x2 with the same hash. As the the hash space is much smaller than the data space there will always be potential collisions, the question is can we find them.

us flag
Sounds like a definition of second-preimage resistance, but not collision resistance?
Score:0
ru flag

Even if the collision is completely unstructured and arose from two completely independent evaluations, it would still be a significant concern for a modern hash function. In cryptographic constructions, we regularly assume that the output of a hash function is distributed uniformly at random. For hash functions such as SHA256 there is no proof for this, and we do not know for example even if all outputs are possible.

Now consider that even for SHA256 (probably the most evaluated hash function of all time), I'd still estimate that fewer than $2^{90}$ outputs have been calculated (and most of those values discarded). For a hash function that produces 256-bits outputs uniformly at random to produce a collision in $2^{90}$ outputs is about a $2^{-77}$ event. We would have to conclude that either SHA256 is does not behave as we hoped or that we are incredibly unlucky (and no-one is that unlucky).

Such an event, if it occurred, should be taken as evidence that SHA256 has considerably less collision entropy than we have been assuming and that with significantly less computational power than we had been assuming, people would be able to collide interesting data values simply by appending random blocks. In such a hypothetical event, it should be taken as a sign that something is badly wrong with the hash function but that we don't know why.

mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.