> Is the example that I have given in my question above correct in its assertion, namely that example 2 will, probabilistically speaking, make collisions happen less often?
Several assertions are made there:
One assertion is that function 1 of `password + salt` yielding `hash`, defined by

```
hash = sha512(password + salt);
for (i = 0; i < 1000; i++) {
    hash = sha512(hash); // <-- Do NOT do this!
}
```
is more likely to collide than function 2 of `password + salt` yielding `hash`, defined by

```
hash = sha512(password + salt);
for (i = 0; i < 1000; i++) {
    hash = sha512(hash + password + salt);
}
```
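For concreteness, the two constructions can be rendered as a sketch in Python using the standard `hashlib` module; the function names `function1` and `function2` are mine, mirroring the pseudocode above:

```python
import hashlib

def function1(password: bytes, salt: bytes, n: int = 1000) -> bytes:
    # hash = sha512(password + salt), then iterate SHA-512 on the digest alone
    h = hashlib.sha512(password + salt).digest()
    for _ in range(n):
        h = hashlib.sha512(h).digest()  # <-- the problematic pure iteration
    return h

def function2(password: bytes, salt: bytes, n: int = 1000) -> bytes:
    # same start, but password and salt are mixed back in at every round
    h = hashlib.sha512(password + salt).digest()
    for _ in range(n):
        h = hashlib.sha512(h + password + salt).digest()
    return h
```

Both return a 64-byte digest; they differ only in whether the loop feeds `password + salt` back into each round.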
That's correct in theory, and would be observable in practice for a narrow hash, e.g. 80-bit.
One way to see it is that
- the output set of a fixed random function $H$ iterated $n$ times tends to narrow as $n$ grows (argument: it can never increase when $n$ does, and some narrowing results from collisions of the iterated hash, here SHA-512);
- the probability of collision of a random function for two random distinct inputs is the inverse of the size of its output set.
It's thus understandable (even though iterating a random function $n$ times may not quite yield a random function over what remains of the output set) that the probability of collision of $H^n(x)$ for two random distinct $x$ increases with $n$.
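The shrinking output set can be observed experimentally with a toy narrow hash; below I model one (my choice, for illustration) as SHA-512 truncated to 2 bytes, i.e. a 16-bit function, and track how many distinct values survive each iteration:

```python
import hashlib

def narrow(x: bytes) -> bytes:
    # 16-bit toy hash: SHA-512 truncated to its first 2 bytes
    return hashlib.sha512(x).digest()[:2]

# Start from all 2^16 possible 2-byte values and repeatedly apply the hash,
# recording the size of the image set after each iteration.
points = {bytes([a, b]) for a in range(256) for b in range(256)}
sizes = [len(points)]
for _ in range(8):
    points = {narrow(p) for p in points}
    sizes.append(len(points))
print(sizes)  # the image size only shrinks as the iteration count grows
```

Since each step maps a set through a function, the image size can never grow, and collisions of the truncated hash make it strictly shrink at first, matching the argument above.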
That limits the collision-resistance of function 1, whose loop iterates $H: h\mapsto\operatorname{SHA-512}(h)$. Whereas in function 2, the function iterated changes when the input `password + salt` changes, thus the output sets shrinking with $n$ depend on that input. Under a model of the hash as a random function, function 2 also is a random function that can potentially reach the full set, and its probability of collision for different inputs `password + salt` does not increase as $n$ does.
It's also asserted that for this reason we should in practice use 2 rather than 1, and that's incorrect for a number of reasons:
- At least for any hash that's collision resistant, as SHA-512 is, we need not worry about collisions or cycles at all.
- In the context (password hashing application), collision-resistance of the overall iterated function is not an issue. Preimage resistance is. And even for SHA-1, which is not collision resistant, and any realistic $n$, we don't need to worry that breaking 1 would require sizably fewer evaluations of the hash than breaking 2 would, including with realistic precomputation.
An advantage of 2 is that it's slightly more difficult to implement in hardware, because the hardware needs to use the password and salt at each iteration, which thus must be stored. ASICs to accelerate 2 will thus be more complex than for 1. That's the only reason I see 2 is less bad than 1 in practice. It still is bad, because $n=1000$ iterations is not adequately slow facing ASICs or GPUs doing password search; and because the whole thing is not putting memory to use, thus losing the extra protection against brute force password search that modern memory-hard password hashes (like scrypt or Argon2) give.
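For comparison, a memory-hard password hash such as scrypt is available directly in Python's `hashlib`; the cost parameters below (`n=2**14, r=8, p=1`, about 16 MiB of memory) are illustrative values I picked for an interactive-login setting, not a recommendation:

```python
import hashlib
import os

password = b"correct horse battery staple"
salt = os.urandom(16)  # random per-password salt, stored alongside the hash

# scrypt cost parameters: n (CPU/memory cost), r (block size), p (parallelism).
# Memory use is roughly 128 * n * r bytes, here about 16 MiB, which is what
# defeats cheap massively-parallel ASIC/GPU search.
key = hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=32)
print(len(key))  # 32
```

Unlike functions 1 and 2, the attacker's cost here is dominated by memory, not just hash evaluations.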
> Will hashing multiple times be more secure than hashing once?
This depends primarily on the objective, which dictates the hash to use:
- If we are hashing for the purpose of key stretching, that is typically when hashing a password or passphrase, yes: security against brute force search grows about linearly with the number of rounds. But again, we should use a memory-hard password hash. For equal time spent, even the obsolete bcrypt will give more security than constructions purely iterating a hash, like PBKDF2 or the functions 1 and 2, which are plainly not recommendable in an era of ASICs, FPGAs and GPUs for password cracking.
- If we are hashing more than once to reinforce some cryptographic hash function with a defect (e.g. its collision resistance is broken), possibly yes. E.g. function 2 (with `password + salt` replaced by the message to hash) would fix all known security defects of MD5 or SHA-1 besides their output width, even if we make only one extra round rather than 1000. However, that comes at a huge price in performance and usability, because we need to hash the message twice, and thus in some cases store it.
- For other cryptographic purposes, including signature and key derivation from a wide key using a modern hash, no. We can use SHA-2 or SHA-3 (to name those in the question) for these purposes. Iterated hashing is not necessary, and can even reduce collision resistance slightly (if performed as in 1), with one exception: if the hash has the length-extension property (like SHA-2 has, but SHA-3 does not), and if that's undesirable, it's reasonable to re-hash the output of the hash once (which does not require storing the message twice).
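The re-hash-once trick from the last point can be sketched as follows (the function name is mine): only the inner hash processes the possibly long message, so nothing needs to be stored or read twice, yet an attacker who sees the final digest no longer knows the internal state needed for a length-extension attack on the inner hash:

```python
import hashlib

def sha512_le_resistant(message: bytes) -> bytes:
    # Hash the message once, then hash the 64-byte digest once more.
    # The outer call hides the inner hash's final chaining state,
    # which is what a length-extension attack on SHA-2 relies on.
    inner = hashlib.sha512(message).digest()
    return hashlib.sha512(inner).digest()
```

With SHA-3 this extra step is unnecessary, since SHA-3 does not have the length-extension property in the first place.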