Score:2

How strong if I combine two hash functions, such as MD5(SHA256(input))?

in flag

If I try to do MD5(SHA256(input)), what is the strength of this so-called double hashing approach?

Is it as strong as SHA256, or as strong as MD5, or as strong as SHA256 + MD5?

This is not a homework question btw, I am asking because of a real issue in my project. By right, I only need to do SHA256(input) on the input, and store it in a column in one MySQL table. But my practical concern is: SHA256 is of length 64 when being base64 encoded, but MD5 is of length 32 when being base64 encoded. I need an index on this column. Would MySQL perform better when I use MD5(SHA256(input)) because it is shorter (either because of a more condensed B+ tree or basically smaller means less storage)?


Edit by myself: try to think about what do we mean by "strong" here? Is it pre-image resistance, or second-preimage resistance?

us flag
Think about some other hash function much worse at collisions than MD5 -- say Hash(X) = 1... what would the security of Hash(SHA256(X)) be? The same logic applies to this question.
fgrieu avatar
ng flag
A SHA-256 (resp. MD5) hash is _not_ of length 64 (resp. 32) when base-64 encoded. Using binary as search key has no _good_ IT reason to be an issue, and has reasons to be faster. Since security goals of the overall system are not stated, we can't tell if they are met. I reopened the Q, but really: why would this be considered, with the broken MD5? Why not a modern truncated hash?
kelalaka avatar
in flag
[RIPEMD](https://en.wikipedia.org/wiki/RIPEMD) is better to be used than MD5, which Bitcoin already uses RIPEMD-160
kelalaka avatar
in flag
What is the total data size? Why the collision is important for you? Why can't you use PRP that guarantees uniqueness provided that the data are unique? Your target is not clear to provide a full answer... And the B+ tree part is off-topic here.
dave_thompson_085 avatar
cn flag
Aside from the fact storing truncated SHA256 is as good cryptographically, if you want you can store full SHA256 and have MySQL _index_ only a prefix of it; see the manual. But that's not cryptography and is offtopic.
Score:3
ng flag

We study a hash $H$ defined by $H(m):=H_2(H_1(m))$, with SHA-256 for $H_1$ and MD5 for $H_2$.

The collision-resistance of MD5 is badly broken, therefore the generic argument¹ that collision resistance of $H$ is at least the weakest of the collision resistance of $H_1$ and $H_2$ leads us nowhere.

The generic argument² that first preimage resistance of $H$ is at least that of $H_2$ works inasmuch as we trust the preimage resistance of MD5; which is questioned (see this), but not broken in practice.

The only thing I can think of about second preimage resistance of $H$ is that it can't³ be better than second preimage resistance of $H_1$. That gives no security assurance, and since second preimage resistance of SHA-256 is believed good that does not give an attack either.


From an applied standpoint (which is what the question seems to be about since it mentions a concrete application), I don't see that any of the existing cryptanalytic attacks on MD5 or SHA-256 are relevant.

  • MD5's collision resistance is broken. But the short MD5 input size (the 256-bit output size of SHA-256 is only half an input block of MD5), and the heavy constraint that this input is a hash, are each enough to make existing MD5 collision attacks inapplicable (restricting to those much better than brute force).
  • Both SHA-256 and MD5 have an undesirable length extension property, but the fixed length of the MD5 input blocks that for MD5's property. And the re-hashing of the SHA-256 output to half size blocks that for SHA-256's.

Thus the only cryptographically relevant attacks I see are the ones there remains for ideal hashes $H_1$ and $H_2$ with their respective output width. Since $H_2$ is much narrower than $H_1$, brute force attacks are limited by the width of $H_2$ and speed considerations. In particular:

  • The question's compound hash is only 128-bit wide, thus collisions can be exhibited with roughly $2^{66}$ evaluations (of both SHA-256 and MD5) using a standard distributed collision search. I can imagine situations where that low collision resistance is an issue (and can't tell here, that's application-dependent).
  • The compound hash is not usable for key stretching, as would be needed for passwords, for it's much too fast and not memory hard. There's no indication key stretching is a requirement, though.

That compound hash lacks a clear security argument, is not very fast especially for short input, and is bad from at least a public relations perspective because MD5 is broken. As far as I can tell, it's in practice about as good as any mildly fast 128-bit hash can be on all technical standpoints, including unsatisfactory on collision-resistance (but there's no certainty that collision-resistance is a requirement in the use case). I see no reason to use that compound hash rather than something more modern and safer than MD5, including some (possibly truncated) SHA-2, or SHA-3's SHAKE128 with it's tunable output size, or Blake2b (or Blake3 after the dust settles).


¹ Hypothesize a collision for $H$, that is $(m,m')$ with $m\ne m'$ and $H(m)=H(m')$. We can compute $w=H_1(m)$ and $w'=H_1(m')$. If $w=w'$, we have a collision for $H_1$. Otherwise, we have $H_2(w)=H_2(w')$ with $w\ne w'$, thus a collision for $H_2$. Thus we have broken collision resistance for $H_1$ or $H_2$.

² Hypothesize an algorithm finding first preimage for $H$, that is given $h$ finding $m$ such that $H(m)=h$ with non-negligible probability and feasible effort. We can compute $w=H_1(m)$ and it's such that $h=H_2(w)$. Thus we have built a method finding first preimage for $H_2$ with the same non-negligible probability of success and essentially the same effort as the hypothesized algorithm.

³ An algorithm finding second preimage for $H_1$, that is given $m$ finding $m'\ne m$ such that $H_1(m)=H_1(m')$, also finds second preimage for $H$.

mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.