Score:3

Composition of cryptographic hash functions

tr flag

I've stumbled upon many opinions, so I ask about them myself.

Let $H(x)$ and $F(x)$ be hash functions.

  • is $H(F(x))$ or $F(H(x))$ safer than $H(x)$ or $F(x)$
  • is $H(F(x))$ or $F(H(x))$ safe when $H(x)$ or $F(x)$ becomes vulnerable
  • is $H(H(x))$ safer than $H(x)$

Background

Doing my own research, I found statements suggesting that combining H and F is a common practice to stay safe when one of them becomes unsafe, but there were mixed opinions when it comes to whether such combination is as safe as MAX(H, F) or MIN(H, F) in terms of safety, almost nobody suggested than safety will be higher or lower than this.

I also found out that some algorithms involving passwords use SHA256d, which is essentially SHA256(SHA256(x)), and that in theory using more rounds would increase hash safety, which is not so far away from combining the hash with it self.

I also have some personal thoughts on this, originating from school math. It seems obvious to me that if f:A -> B and greatly |A| > |B| and g:B -> C and greatly |B| > |C| then g(f(x)) should have much more collisions than f or g. But also a fact that we combine two complex functions should be somewhat harder to "break".

So, my own intuition is that g(f(x)) has more collisions, but it's harder to find a method to find a collision on purpose and in the case of functions like SHA or BLAKE the increase in collisions is not a problem, but the security benefit is worth it.

poncho avatar
my flag
In practice, if we want to rely on multiple hash functions (and are looking for collision or second-preimage resistance), we go with $F(x) | H(x)$ (where $|$ is concatenation); it is straight-forward to show that (assuming $F$ or $H$ has a fixed length output) the result is collision/second-preimage resistant if either $F$ or $H$ is collision/second-preimage resistant
kelalaka avatar
in flag
All for this? [How to securely hash passwords?](https://security.stackexchange.com/questions/211/how-to-securely-hash-passwords)
Kuba Chrabański avatar
tr flag
I believe you still try to make sense from my questions as a whole, when there is none. I have a few thing to do, completely separate, which I broke down into small sub-questions, which I then recomposed as a few bigger questions
Kuba Chrabański avatar
tr flag
I ask about specific cases to build some intuition, which can, in the future, reduce the set of solutions to test .
kelalaka avatar
in flag
Cryptography is not a divide, solve, and combine especially for beginners. The devil in details and combining two secure can make insecure designs. I'm simply saying that stop a while take your time and define your actual problem including what you found there and here in the question post.
Score:7
ru flag

Composition is not the best way to combine hash functions, but the risk is a question of what security properties you require.

For collision resistance, finding a collision on $H(F(X))$ is no harder than finding a collision of $F(X)$ and so if $F$ is collision compromised, then so is $H(F(X))$. To see this, let $m_1$ and $m_2$ be to two distinct inputs such that $F(m_1)=F(m_2)$ then because $H$ is a function we have $H(F(m_1))=H(F(m_2))$. In the particular case $F=H$, $H(H(X))$ is no more collision resistant than $H(X)$.

Similarly, if $H(X)$ is collision compromised and $F(X)$ is pre-image compromised (extremely compromised so that pre-images take less than birthday work), we can collision compromise $H(F(X))$ by finding a collision on $H$ and then finding pre-images for both of the colliding messages.

Notice the difference in inner and outer functions. We are currently able to find messages $m_1$ and $m_2$ such that $\mathrm{SHA3}(\mathrm{SHA1}(m_1))=\mathrm{SHA3}(\mathrm{SHA1}(m_2))$, but we cannot find $m_3$ and $m_4$ such that $\mathrm{SHA1}(\mathrm{SHA3}(m_3))=\mathrm{SHA1}(\mathrm{SHA3}(m_4))$.

Similarly for second pre-image resistance if the inner function is second pre-image compromised, then so is the composition.

For full pre-image resistance, we can certainly find a full pre-image for the composition if we can find full pre-images for both $H$ and $F$. Pre-image compromise of a single hash function is insufficient: consider a linear hash $L(X)$ which is trivially pre-image broken. For a target $Y$, we can find neither $X$ such that $\mathrm{SHA3}(L(X))=Y$ nor $L(\mathrm{SHA3}(X))=Y$.

Score:6
cn flag

General wisdom

As a general bit of wisdom, combining cryptographic primitives is rarely a straight win. It can strengthen some properties if done right, but weakens other properties. So it should only be done if it strengthens the properties you care about and doesn't weaken the properties you care about.

combining H and F is a common practice to stay safe when one of them becomes unsafe, but there were mixed opinions when it comes to whether such combination is as safe as MAX(H, F) or MIN(H, F) in terms of safety, almost nobody suggested than safety will be higher or lower than this.

You found mixed opinions because it depends how you do the combination. (That, or because people were uninformed. It happens.)

I also have some personal thoughts on this, originating from school math. It seems obvious to me that if f:A -> B and greatly |A| > |B| and g:B -> C and greatly |B| > |C| then g(f(x)) should have much more collisions than f or g.

Counting collisions is irrelevant. What matters is the difficulty of finding them. MD5(x)||SHA1(x) is a 288-bit hash and probably has fewer collisions on 257-bit strings than SHA256(x) — in fact it's plausible that MD5(x)||SHA1(x) has no collisions at all on 257-bit strings, whereas SHA256(x) must have collisions by the pigeonhole principle. But we know how to find collisions on MD5(x)||SHA1(x) for slightly longer strings in long-but-doable time (it's only a little more expensive than SHA1 collisions), whereas we don't know how to find SHA256 collisions at all.

But also a fact that we combine two complex functions should be somewhat harder to "break".

No, this is not a fact. It entirely depends on what you're combining and how.

Composing two hashes

Regarding hashes and collision resistance, composing two hashes reduces the security. In other words, $H \circ F$ is less collision-resistant than $H$ or $F$ on its own.

It's easy to see that if $F$ has a collision, then so does $H \circ F$: if $F(x_1 = F(x_2)$ with $x_1 \ne x_2$ then $(H \circ F)(x_1) = H(F(x_1)) = H(F(x_2)) = (H \circ F)(x_2)$. This alone means that composing two hashes is, at best, useless if what you care about is collision resistance: you might as well just use $F$.

The collision resistance of $H \circ F$ can be better than that of $H$ alone. If $H$ has a collision $H(y_1) = H(y_2)$ with $y_1 \ne y_2$, to use this fact to find a collision for $H \circ F$, you need to find a preimage for both $y_1$ and $y_2$. So you're confident that $F$ is collision-resistant and preimage-resistant, then $H \circ F$ can be safer than $H$ in terms of collision resistance. However, since it isn't safer than $F$, the only reason to use this construction is if it improves some other property.

If $H \circ F$ has a collision, i.e. if $H(F(x_1)) = H(F(x_2))$ with $x_1 \ne x_2$, then either $F(x_1) = F(x_2)$ and this is a collision for $F$, or $F(x_1) \ne F(x_2)$ and this is a collision for $H$. So the composition is no worse than the worst of the two, for collision resistance.

Composing two hashes can improve preimage resistance. To take advantage of the structure of $H \circ F$ when looking for a preimage, you have to both find a preimage through $H$ and then a preimage of that through $F$. However, no cryptographic hash function that I would qualify as having been mainstream has ever had preimage resistance broken, so it's not a practical concern.

Preimage resistance is a concern for password hashing functions. But password hashing functions are a completely different kind of cryptographic primitive from “ordinary” hash functions. They have different parameters and different security objectives. Collision resistance is not relevant to password hashing. Composing two password hashing functions can make sense, but you need to be careful: it definitely is possible to do it wrong.

Composing two hashes can also improve security properties other than the properties that define a cryptographic hash function. Hashes are often used as random oracles, and commonly used hashes (such as the SHA-2 family) are known to be imperfect as random oracles due to a length extension attack. Composing two hashes, even the same function, as in SHA256d(x) = SHA256(SHA256(x)), eliminates the length extension attack and does not introduce another known weakness on its own. (“On its own” is important: if you mix SHA256d and SHA256 on the same data, the results can be disastrous.)

Concatenating two hashes

Another simple way to combine two hashes is to concatenate them: $H(x) || F(x)$. This is a definite improvement in terms of collision resistance: if there's a collision for $H || F$, it's also a collision for $H$ and $F$. Concatenation also improves second-preimage resistance: if you know $H(x_1) || F(x_1)$ and you want to find $x_2 \ne x_1$ such that $H(x_1) || F(x_1) = H(x_2) || F(x_2)$, you need to find a second preimage for both $F$ and $H$. For both collision resistance and second-preimage resistance, the concatenation is at least as strong as the weaker of the two, and possibly stronger.

An example where concatenation was used in a real-world protocol is older versions of the SSL/TLS protocol, up to version 1.1. They used MD5(x)||SHA1(x) for the handshake signature, where the essential security concern is second-preimage resistance. TLS 1.2 replaced this by a single, customizable hash function (typically SHA-256 or SHA-384): the additional complexity wasn't worth it (and there was no sensible popular hash function to combine with SHA-256 at the time anyway).

Concatenation is not an unambiguous win. For example, it reduces first-preimage resistance to the weaker of the two functions: if you know $H(x) || F(x)$ then you can find $x$ if you know how to do this either via $H(x)$ or via $F(x)$.

Xoring two hashes

Yet another way to combine two hashes is to xor them: $H(x) \oplus F(x)$. The security properties of the result depend on the choice of functions. This has an obvious potential for going horribly wrong: the special case $H = F$ results in output that's all-bits-zero. On the other hand, if $H$ and $F$ are independent, then $H \oplus F$ is at least as good as the stronger of the two as a random oracle. I'm not sure off the top of my head if there's a sensible condition that would give any guarantee on collision resistance.

Kuba Chrabański avatar
tr flag
"Preimage resistance is a concern for password hashing functions." That's what I expected, okay, but does second preimage resistance matter here? I believe not, at least not in a straight way as: "lack of preimage resistance implies no second preimage resistance" and it applies in one direction
Gilles 'SO- stop being evil' avatar
cn flag
@KubaChrabański Indeed, for a password, second preimage resistance is just as irrelevant as collision resistance. If the adversary knows one preimage, his job is done.
Kuba Chrabański avatar
tr flag
Okay, thank you, very good and comprehensive answer!
Score:4
in flag

The target security of cryptographic hash functions like SHAx, BLAKE, etc. is collision resistance. The classical generic collision attack cost is $\mathcal{O}(2^{n/2})$-time due to the birthday attack.

Now, look at your hash compositions in a little detail;

  • is $H(F(x))$ or $F(H(x))$ safer than $H(x)$ or $F(x)$
  • Length extension attack;

Double hashing is proposed by Bruce Schneier to mitigate the length extension attack of the MD-based hash functions. Length extension attack is performed against prefix-secret $H(secret||message)$. Double hashing is designed as $H(H(x))$ and Bitcoin uses SHA256d to have slow mining. Your case is different, however, this attack only works if you want to use this hash to construct a MAC with prefix-secret construction. In double hashing, the attackers cannot extend the internal hash, so it is safe against length extension even the single hashes are not. It is, however, time-consuming, use SHA3, SHA-512/256, BLAKE2, Shake that are already secure in this manner even for the possible cryptographic quantum computer.

Any cryptographic hash function with length extension attack is not Random Oracle. Therefore, SHA3 and BLAKE2 are more close to random oracles.

  • **

  • Collision

resistance**

If the internal hashes $H$ or $F$ is not collision-resistant then the combined hashes $H(F(x))$ or $F(H(x))$ will not be collision-resistant. Given $x,y$ with $x\neq y$, if $H(x)=H(y)$ then $F((H(x))=F(H(y))$ is collision for the composition, and similarly for the $H(F(\cdot))$ case.

If you want to mitigate the collision attack of the internal hash function the simple answer; don't use it.

  • Pre-image resistance

    The first and second pre-image resistance have the same problem, skipped here.

  • is $H(F(x))$ or $F(H(x))$ safe when $H(x)$ or $F(x)$ becomes vulnerable

When $F(x)$ has a collision then $H(F(x))$ has a trivial collision answered in the first part.

Let $F(x)$ secure ( no collision, no length-extension, no-pre-image) but not $H(x)$ then what about $H(F(x))$? This is a bit tricky, to use the collision attack of $H$ we need to find pre-images of $F$ so this is more secure.

  • is $H(H(x))$ safer than $H(x)$

A little yes; $H(H(x))$ is secure against length extension attack. If there is a collision of $H$, $H(x)=H(y)$ with $x \neq y$ then it is a collision for the double $H(H(x)) = H(H(y))$, too.

I also found out that some algorithms involving passwords use SHA256d, which is essentially SHA256(SHA256(x)), and that in theory using more rounds would increase hash safety, which is not so far away from combining the hash with it self.

Whoever uses SHA256d for password hashing they don't know anything about them. Password hashing requires time, memory, and thread consumption in order to mitigate massive searches with CPU, GPU, or ASIC.

SHA256d is used in Bitcoin to have slow mining. Any newly designed system for password hashing should use at least Scrypt or better Argon2 and the old systems should be migrated as soon as possible.


For an academic detailed result on hash combiners start reading from

And, even they can generate multi-collisions;

mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.