Score:4

Why is a too fast hash function not secure?

cn flag

I understand why we need hash functions to be fast enough for processing but slow enough for security. But I do not get why a very fast hash function can cause a collision. My guess is that a very fast hash function produces a small number of bits as output so that means a higher probability of collision. Could someone correct me?

kelalaka avatar
in flag
Where did you get "a very fast hash function can cause a collision."? What kind of hash function are you talking about? What kind of application do you consider? Do you know that password hash algorithms don't need collision resistance?... What metric do you use for too fast?
Seif Ashraf avatar
cn flag
check this: https://youtu.be/b4b8ktEV4Bg?t=279
kelalaka avatar
in flag
I see, based on the arguments of the video, I [wrote a response to them](https://crypto.stackexchange.com/a/95430/18298)
Score:22
ng flag

In most applications, it's desirable that a hash function is fast, and it would not be a problem if it was ultra fast (say, as fast as input in cache memory can reach a CPU) if that's not at the expense of security. This includes usage in signature, encryption padding, construction of block ciphers.

In application doing key stretching, including access control by password, or deriving a symmetric encryption key from a passphrase, it is desirable to have slow hash functions. More precisely, a parameter should control the duration, and the computational cost per hash should not be much lower for adversaries than it is for legitimate users¹. In such applications, when we manage to slow down the hash by a factor of $c$ for adversaries, we get the equivalent of $\log_2(c)$ extra bits on entropy in the password/passphrase input. For example, when we increase the cost by a thousand (for $c\approx 2^{10}$ ), with respect to password cracking attacks we get the security equivalent of appending three random decimal digits to the password, without the need to remember them.

That speed consideration is largely distinct from resistance to collisions (that is, to the practical impossibility to exhibit two messages with the same hash). For resistance to collisions to hold, it's necessary (not sufficient)

  • That the hash is wide enough, because there are generic, efficiently distributed attacks² that find collisions in a $n$-bit hash with about $2^{n/2+2}$ hashes. Thus any fast hash with a result smaller than 192-bit (give or take 32) is susceptible to collision (for adversaries with means comparable to that used by some bitcoin miners). When we make the hash slower, that increases the computational cost to find a collision. Increasing the cost by a factor $c$ allows to narrow the hash by roughly $2\log_2(c)$ bit from the standpoint of collision resistance (caveat: for very fast hashes that may apply only for $c$ past some small threshold). For example, if instead of SHA-224 we used PBKDF2-HMAC-SHA-256 with 184-bit output and six hundred thousand iterations (for $c\approx 2^{20}$ ), we'd get comparable security against collision resistance.
  • That the hash is not so fast that's by oversimplification of it's internals, enabling different, specific attacks. As an extreme, exclusive-OR of blocks of input as wide as the output is an extremely fast hash, but it's trivially not collision-resistant. There are attacks against MD5, SHA(-0) and SHA-1, arguably because their design put too much emphasis on speed. And there are attacks³ against SHA-256 if we reduce the number of rounds to 31 (down from 64).

¹ The best way we have to achieve this is to make the hash memory-hard, that is it's evaluation should require many memory accesses in a large amount of memory dedicated to that usage during the whole evaluation. Such functions include the modern Argon2, scrypt, and to some degree the obsolescent bcrypt, but not the straight bad PBKDF2 (which is disastrous because it's low RAM footprint gives a huge edge to adversaries with ASICs, FPGAs, or GPUs, compared to most users which use a CPU).

² See Paul C. van Oorschot and Michael J. Wiener's Parallel Collision Search with Cryptanalytic Applications, in Journal of Cryptology, 1999.

³ See Florian Mendel, Tomislav Nad and Martin Schläffer's Improving Local Collisions: New Attacks on Reduced SHA-256, in proceedings of Eurocrypt 2013

za flag
```it would not be a problem if it was ultra fast.``` - [blake3 says hi](https://github.com/BLAKE3-team/BLAKE3/blob/master/media/speed.svg)
kelalaka avatar
in flag
@hanshenrik Blake3 is a parallel hash. A real comparison can only be made with ParallelHash [which is derived from SHA3](https://csrc.nist.gov/projects/hash-functions).
za flag
@kelalaka blake3 supports parallelization yes, but that benchmark is from running blake3 with just a single thread, no parallelization in that benchmark! (if they were using parallelization, blake3 would run even faster than what's displayed in that chart)
Score:9
jp flag

The currently accepted answer is good, but I'd like to add something. "security" is a broad term in this context, and you should ask what threat you want to defend against. In most discussions "collision" is all about finding a different input with an identical output. But for the password case, the "fast hash" is problematic just for plain old brute forcing. With a fast hash, it's easy for an attacker to dictionary attack many possible inputs, looking for a match to the hash.

It's not a case of "fast hash == too few bits of output" as OP asked. It's just that if the hash is too fast, it's easy too find the original input (not a collision), that makes it insecure.

Maarten Bodewes avatar
in flag
I'm not agreeing with this. In e.g. password hashing, all you are interested in is the speed *difference* between the server / intended user and adversary. However, that's fixed by making sure you use a fast implementation and by making sure that there is a significant work factor / iteration count. That, however, has little to do with the speed of the used cryptographically secure hash function. This issue could be addressed by making the difference between cryptographically secure hash function and password hash more explicit.
Score:3
id flag

Given a hash function that is tuneable for "speed", (iterations, perhaps), increasing the speed at which it hashes does not create more collisions. The security issue with a hash that is too fast is that given X amount of time, a hash that is faster will generate a greater quantity of outputs, so an attacker has a higher chance of finding a collision.

Score:2
in flag

The OP got the idea from a YouTube post that argues around the MD5.

Here is the simple argument if anyone watches the video.

The speed doesn't imply that a hash function is insecure, the design makes it secure.

Over the years, the researcher find weaknesses in the design of the MD5 and improved over time. Once the MD5 was released, 1992, the attacks set out, and in 2010 Xie and Dengguo Feng announced the first published single-block (512-bit) MD5 collision. Of course, the improvement of the CPU and their cycles are helping the attack faster, however, the main contribution is the attack methodologies. Consider that, the generic collision attack ( birthday attack) requires $2^{64}$ MD5 hashes with a success probability of 1/2. Even today, no ordinary CPU can make $2^{64}$ in a reasonable time. On the other hand, we have now instant collision for MD5, see corkami.

On the other hand, the Blake2 is a very fast hash function, faster than MD5 and SHA-1, and doesn't have the length extension attack since it uses the HAIFA structure, fixes of MD design. It is faster than MD5 and secure at least in a foreseeable feature. Its output size is secure enough for massive parallel attacks or even for Grover's quantum search algorithm. The Blake series now have BLAKE3, it is a parallel hash and very fast.

What important about speed and its relation to security is the digest size. The generic attacks for pre-image attack, secondary pre-image attack, and the generic collision are $2^n$,$2^n$, and $2^{n/2}$, respectively. If you have the 256 or 512-bit output you are fine about the generic attacks. Keep in mind that the short input space is a problem for pre-image attacks and HMAC is adviced.


PS: password hashing ( no collision resistance is required), where we want controllable speed, memory hardness, and thread amount of the password hash function. One should read the canonical question from our venerable site, information security.. Also, Argon2 paper, the 2015 password hashing competition winner, is a good choice to read.


Final note: One should carefully define their problem so that they can achieve a secure design. There is no real security if you don't know your risks.

Score:0
my flag

Let's consider the use of a hash to protect a password, and take a couple of examples.

  • First case: we have a hash function which takes 1 second to execute on a run-of-the-mill computer. If the attacker has access to 1 million computers (e.g. via a botnet), they can do 1 million attempts per second, 86 billion attempts per day. But an 8-character password using upper, lower, digits and symbols is about 50 bits of entropy, so at current performance it will still take on average about 18 years to crack that password, even with that massive botnet (of course in 18 years things will probably speed up quite a lot so it'll be shorter than that, but still). Worst case it will take 36 years.

  • Second case: we have a hash function which takes 1 microsecond to execute. Same attacker using the same million-computer botnet can now do a million more attempts per second. It now takes 1125 seconds (less than 20 minutes) to try each and everyone of the about 2^50 possible 8-character passwords. So on average about half the time.

If you think a botnet with one million computers is fantasy, think again.

So it's not a matter of generating more collisions (for any decent hash function, this is directly linked to the size of the output, not the speed of the hash), but a matter of being able to find collisions (or actually, the original input, but you don't necessarily know that), defeating the purpose of the hash.

mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.