Score:3

Will Hashing Multiple Times Be More, Less, Or Similarly Secure As Hashing Once

br flag

Will Hashing Multiple Times Be More, Less, Or Similarly Secure As Hashing Once?

Fleshing out this question:

  • I saw this claim:
  • To me, it would appear that what you are attempting to do in rehashing is to slow down a brute force attack.
  • Although the birthday problem could apply to multiple hashing, it seems it would apply equally to the first example (which the person labeled "don't do this") and to the second example (which the person felt more comfortable with).
  • In either example, once a collision is found, rehashing at that point is moot (since it will reproduce the same results for the hash and its collision).
  • The specific scenario for this question is: are the person's examples that I have given in my question above correct in their assertion - that example 2 will make collisions, probabilistically speaking, happen less often?

What I am looking for in an answer:

  • This is not my field of expertise (which is why I am asking it here, where it is your expertise). Please feel free to correct my misunderstandings and problematic assumptions.
  • I am interested in a detailed technical explanation, as long as it is accompanied by a layman's explanation of the steps and particularly of the conclusion (with the emphasis being on the layman's explanation).
  • I am asking theoretically about all one-way hash functions (if there are distinctions between broad classes of hashing algorithms, please feel free to point those out in general terms, while understanding that the implications of your answer would practically be applied on my part to a hashing algorithm similar to SHA-2, SHA-3, or bcrypt).
  • I have looked through 10 different crypto.stackexchange.com questions and answers that would seem very similar to this question, but either they dealt with specific use cases that shaped the answer to that use case, or they gave a theoretical proof without explaining in layman's terms what the proof was actually doing or giving a layman's conclusion to their answer.
    • For the purposes of this question, assume my calculus is not up to par, nor am I generally well-read on the mathematical variables used in cryptography, but assume that I can follow algebra and can dust off my calculus if you tell me what the variables stand for (or link to their explanations).

Thank you in advance for your help with this question.

jp flag
It depends on what you're using the hash for. Hashing passwords (and deriving keys from low-entropy secrets) is a lot different from most other applications, and has very different security considerations.
kelalaka
in flag
What is your aim? What do you want to achieve? What questions and answers don't satisfy you?
Score:5
ng flag

are the person's examples that I have given in my question above correct in their assertion - that example 2 will make collisions, probabilistically speaking, happen less often?

Several assertions are made there:

One assertion is that the function 1 of password + salt yielding hash defined by

hash = sha512(password + salt); 
for (i = 0; i < 1000; i++) {
    hash = sha512(hash); // <-- Do NOT do this!
}

is more likely to collide than the function 2 of password + salt yielding hash defined by

hash = sha512(password + salt);
for (i = 0; i < 1000; i++) {
    hash = sha512(hash + password + salt);
}

That's correct in theory, and would be observable in practice for a narrow hash, e.g. 80-bit.
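
To make that concrete, here is a minimal Python sketch (not part of the quoted claim; the 16-bit truncation, the scheme_1/scheme_2 names, and the 1000 test passwords are illustrative assumptions) that uses a deliberately narrow hash and counts distinct outputs:

import hashlib

def narrow_hash(data: bytes) -> bytes:
    # SHA-512 truncated to 16 bits, standing in for the "narrow hash" mentioned above
    return hashlib.sha512(data).digest()[:2]

def scheme_1(password: bytes, salt: bytes, rounds: int = 1000) -> bytes:
    h = narrow_hash(password + salt)
    for _ in range(rounds):
        h = narrow_hash(h)                    # iterate a fixed function, as in function 1
    return h

def scheme_2(password: bytes, salt: bytes, rounds: int = 1000) -> bytes:
    h = narrow_hash(password + salt)
    for _ in range(rounds):
        h = narrow_hash(h + password + salt)  # iterated function depends on the input, as in function 2
    return h

salt = b"fixed-salt"
passwords = [("pw%d" % i).encode() for i in range(1000)]
for name, scheme in (("scheme 1", scheme_1), ("scheme 2", scheme_2)):
    outputs = {scheme(p, salt) for p in passwords}
    print(name, "distinct outputs for 1000 distinct passwords:", len(outputs))

Scheme 1 collapses to a small set of outputs while scheme 2 stays close to 1000; with a full 512-bit output the same collapse happens in principle but is far too small to observe, which is why this matters only for a narrow hash.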

One way to see it is that

  • the output set of a fixed random function $H$ iterated $n$ times tends to narrow as $n$ grows (argument: it can never increase when $n$ does, and some narrowing results from collisions of the iterated hash, here SHA-512);
  • the probability of collision of a random function for two random distinct inputs is the inverse of the size of its output set.

It's thus understandable (even though iterating a random function $n$ times may not quite yield a random function over what remains of the output set) that the probability of collision of $H^n(x)$ for two random distinct $x$ increases with $n$.

That limits the collision-resistance of function 1, whose loop iterates $H: h\mapsto\operatorname{SHA-512}(h)$. Whereas in 2, the function iterated changes when the input password + salt changes, thus the output sets shrinking with $n$ depend on that input. Under a model of the hash as a random function, 2 also is a random function that can potentially reach the full set, and its probability of collision for different inputs password + salt does not increase as $n$ does.
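
As a rough numerical illustration of the first point above (my toy experiment, not part of the answer): truncate SHA-512 to 16 bits so that the whole output set fits in memory, then watch the image of the iterated function shrink:

import hashlib

def toy_hash(x: int) -> int:
    # toy 16-bit hash: SHA-512 of the 2-byte encoding, truncated to 16 bits
    return int.from_bytes(hashlib.sha512(x.to_bytes(2, "big")).digest()[:2], "big")

image = set(range(2 ** 16))  # the full 16-bit output set
for n in range(1, 65):
    image = {toy_hash(x) for x in image}  # image of the n-times-iterated function
    if n in (1, 2, 4, 8, 16, 32, 64):
        print("after %2d iterations the image keeps %5d of 65536 values" % (n, len(image)))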


It's also asserted that for this reason we should in practice use 2 rather than 1, and that's incorrect for a number of reasons:

  • At least for any hash that's collision resistant, as SHA-512 is, we need not worry about collisions or cycles at all.
  • In the context (password hashing application), collision-resistance of the overall iterated function is not an issue. Preimage resistance is. And even for SHA-1, which is not collision resistant, and any realistic $n$, we don't need to worry that breaking 1 would require sizably fewer evaluations of the hash than breaking 2 would, including with realistic precomputation.

An advantage of 2 is: it's slightly more difficult to implement in hardware, because the hardware needs to use password and salt at each iteration, which thus must be stored. ASICs to accelerate 2 will thus be more complex than for 1. That's the only reason I see that 2 is less bad than 1 in practice. It still is bad, because $n=1000$ iterations is not adequately slow facing ASICs or GPUs doing password search; and because the whole thing is not putting memory to use, thus losing the extra protection against brute force password search that modern memory-hard password hashes (like scrypt or Argon2) give.
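
For contrast, a minimal sketch of handing the job to a memory-hard KDF, here scrypt from Python's standard hashlib; the cost parameters (n=2**14, r=8, p=1, roughly 16 MiB of memory) and the 16-byte salt are illustrative choices, not parameters taken from this answer:

import hashlib
import hmac
import os

SCRYPT_PARAMS = dict(n=2 ** 14, r=8, p=1, maxmem=2 ** 26, dklen=32)  # illustrative only

def hash_password(password: str) -> tuple:
    # returns (salt, tag); the salt is random per password, as usual for password hashing
    salt = os.urandom(16)
    tag = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return salt, tag

def verify_password(password: str, salt: bytes, tag: bytes) -> bool:
    candidate = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, tag)  # constant-time comparison

salt, tag = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, tag))  # True
print(verify_password("wrong guess", salt, tag))                   # False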


Will hashing multiple times be more secure than hashing once

This depends primarily on the objective, which dictates the hash to use:

  • If we are hashing for the purpose of key stretching, that is typically when hashing a password or passphrase, yes: security against brute force search grows about linearly with the number of rounds. But again, we should use a memory-hard password hash. For equal time spent, even the obsolete bcrypt will give more security than constructions purely iterating a hash, like PBKDF2 or the functions 1 and 2, which are simply not recommendable in an era of ASICs, FPGAs and GPUs for password cracking.
  • If we are hashing more than once to reinforce some cryptographic hash function with a defect (like its collision resistance being broken), possibly yes. E.g. function 2 (with password + salt replaced by the message to hash) would fix all known security defects of MD5 or SHA-1 besides their output width, even if we make only one extra round rather than 1000. However that's at a huge price in performance and usability, because we need to hash the message twice, thus in some cases store it.
  • For other cryptographic purposes including signature and key derivation from a wide key using a modern hash, no. We can use SHA-2 or SHA-3 (to name those in the question) for these purposes. Iterated hashing is not necessary, and can even reduce collision resistance slightly (if performed as in 1), with one exception: if the hash has the length-extension property (like SHA-2 has, but not SHA-3), and if that's undesirable, it's reasonable to re-hash the output of the hash once, which does not require storing the message twice (a short sketch of this follows the list).
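
For that last point, a short sketch of re-hashing only the digest (the function name is illustrative); the message is consumed a single time, so it need not be stored twice:

import hashlib

def sha512_then_rehash(message: bytes) -> bytes:
    # hash the message once, then re-hash only the 64-byte digest;
    # the outer call hides the internal state, which removes SHA-2's length-extension property
    return hashlib.sha512(hashlib.sha512(message).digest()).digest()

print(sha512_then_rehash(b"example message").hex())
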
Score:1
us flag

Here is an intuitive (versus rigorous) explanation:

A hash function appears random, but it doesn't actually add any entropy to the result. In fact, it removes entropy by the property that many inputs may result in the same output. (Only a one-to-one and onto function could preserve entropy.) Therefore, each application of the hash to its own output removes entropy until it converges to a repeating cycle (finite group) of order possibly much smaller than the field size. I am unaware of any proof of bounds on the order of such a group for LSR hashes.
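
To put a number on the cycle intuition with something small enough to run (my toy example, not from the answer: SHA-512 truncated to 16 bits stands in for the hash, and the starting point is arbitrary):

import hashlib

def toy_hash(x: int) -> int:
    # toy 16-bit hash: SHA-512 of the 2-byte encoding, truncated to 16 bits
    return int.from_bytes(hashlib.sha512(x.to_bytes(2, "big")).digest()[:2], "big")

x = 12345        # arbitrary starting point
first_seen = {}
step = 0
while x not in first_seen:
    first_seen[x] = step
    x = toy_hash(x)
    step += 1
tail = first_seen[x]
print("fell into a cycle after", tail, "steps; the cycle length is", step - tail,
      "in a space of 65536 values")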

The second example refreshes the entropy with each application by adding the password and salt. Because it is a different operation, it has the effect of switching to a different cycle (coset) each round.

So, working backwards, we start with an intuition about the hash function as a set of many cycles rather than a fully-randomized map. An ideal hash function would be one large cycle, and in that function, the first example would be fine.

If we were to draw a very rough analogy to a simpler algebraic field, we might consider that we are looking at a composite field composed of many cosets of the kernel group defined by application of the hash function, whereas only a prime field would have maximal order.

And one more thing, because I have been mixing the concepts of cycles and entropy loss. The LSR has a limited capacity for entropy, and loses some portion of the excess in each round. That means that the union of all the cycles that any input could end up in composes a much smaller field than the input space itself. Putting these concepts together, repeated application slowly drops all inputs into a small number of small cycles, meaning that you will greatly increase the chance of collision.

To state the obvious, a greater chance of collision means a greater chance of guessing an input that will hash to the same value as the password, and thus grant access. Remember that the password itself is irrelevant, just the ability to provide a value that hashes to same value as the password.

Hopefully someone can show this with more rigor using theorems of abstract algebra, and properties of LSR hash groups. Looking forward to it. Thanks for the question.
