Score:2

# How strong if I combine two hash functions, such as MD5(SHA256(input))?

If I try to do `MD5(SHA256(input))`, what is the strength of this so-called double hashing approach?

Is it as strong as SHA256, or as strong as MD5, or as strong as SHA256 + MD5?

This is not a homework question btw, I am asking because of a real issue in my project. By right, I only need to do `SHA256(input)` on the input, and store it in a column in one MySQL table. But my practical concern is: SHA256 is of length 64 when being base64 encoded, but MD5 is of length 32 when being base64 encoded. I need an index on this column. Would MySQL perform better when I use MD5(SHA256(input)) because it is shorter (either because of a more condensed B+ tree or basically smaller means less storage)?

Edit by myself: try to think about what do we mean by "strong" here? Is it pre-image resistance, or second-preimage resistance?

Think about some other hash function much worse at collisions than MD5 -- say Hash(X) = 1... what would the security of Hash(SHA256(X)) be? The same logic applies to this question.
A SHA-256 (resp. MD5) hash is _not_ of length 64 (resp. 32) when base-64 encoded. Using binary as search key has no _good_ IT reason to be an issue, and has reasons to be faster. Since security goals of the overall system are not stated, we can't tell if they are met. I reopened the Q, but really: why would this be considered, with the broken MD5? Why not a modern truncated hash?
[RIPEMD](https://en.wikipedia.org/wiki/RIPEMD) is better to be used than MD5, which Bitcoin already uses RIPEMD-160
What is the total data size? Why the collision is important for you? Why can't you use PRP that guarantees uniqueness provided that the data are unique? Your target is not clear to provide a full answer... And the B+ tree part is off-topic here.
Aside from the fact storing truncated SHA256 is as good cryptographically, if you want you can store full SHA256 and have MySQL _index_ only a prefix of it; see the manual. But that's not cryptography and is offtopic.
Score:3

We study a hash $$H$$ defined by $$H(m):=H_2(H_1(m))$$, with SHA-256 for $$H_1$$ and MD5 for $$H_2$$.

The collision-resistance of MD5 is badly broken, therefore the generic argument¹ that collision resistance of $$H$$ is at least the weakest of the collision resistance of $$H_1$$ and $$H_2$$ leads us nowhere.

The generic argument² that first preimage resistance of $$H$$ is at least that of $$H_2$$ works inasmuch as we trust the preimage resistance of MD5; which is questioned (see this), but not broken in practice.

The only thing I can think of about second preimage resistance of $$H$$ is that it can't³ be better than second preimage resistance of $$H_1$$. That gives no security assurance, and since second preimage resistance of SHA-256 is believed good that does not give an attack either.

From an applied standpoint (which is what the question seems to be about since it mentions a concrete application), I don't see that any of the existing cryptanalytic attacks on MD5 or SHA-256 are relevant.

• MD5's collision resistance is broken. But the short MD5 input size (the 256-bit output size of SHA-256 is only half an input block of MD5), and the heavy constraint that this input is a hash, are each enough to make existing MD5 collision attacks inapplicable (restricting to those much better than brute force).
• Both SHA-256 and MD5 have an undesirable length extension property, but the fixed length of the MD5 input blocks that for MD5's property. And the re-hashing of the SHA-256 output to half size blocks that for SHA-256's.

Thus the only cryptographically relevant attacks I see are the ones there remains for ideal hashes $$H_1$$ and $$H_2$$ with their respective output width. Since $$H_2$$ is much narrower than $$H_1$$, brute force attacks are limited by the width of $$H_2$$ and speed considerations. In particular:

• The question's compound hash is only 128-bit wide, thus collisions can be exhibited with roughly $$2^{66}$$ evaluations (of both SHA-256 and MD5) using a standard distributed collision search. I can imagine situations where that low collision resistance is an issue (and can't tell here, that's application-dependent).
• The compound hash is not usable for key stretching, as would be needed for passwords, for it's much too fast and not memory hard. There's no indication key stretching is a requirement, though.

That compound hash lacks a clear security argument, is not very fast especially for short input, and is bad from at least a public relations perspective because MD5 is broken. As far as I can tell, it's in practice about as good as any mildly fast 128-bit hash can be on all technical standpoints, including unsatisfactory on collision-resistance (but there's no certainty that collision-resistance is a requirement in the use case). I see no reason to use that compound hash rather than something more modern and safer than MD5, including some (possibly truncated) SHA-2, or SHA-3's SHAKE128 with it's tunable output size, or Blake2b (or Blake3 after the dust settles).

¹ Hypothesize a collision for $$H$$, that is $$(m,m')$$ with $$m\ne m'$$ and $$H(m)=H(m')$$. We can compute $$w=H_1(m)$$ and $$w'=H_1(m')$$. If $$w=w'$$, we have a collision for $$H_1$$. Otherwise, we have $$H_2(w)=H_2(w')$$ with $$w\ne w'$$, thus a collision for $$H_2$$. Thus we have broken collision resistance for $$H_1$$ or $$H_2$$.

² Hypothesize an algorithm finding first preimage for $$H$$, that is given $$h$$ finding $$m$$ such that $$H(m)=h$$ with non-negligible probability and feasible effort. We can compute $$w=H_1(m)$$ and it's such that $$h=H_2(w)$$. Thus we have built a method finding first preimage for $$H_2$$ with the same non-negligible probability of success and essentially the same effort as the hypothesized algorithm.

³ An algorithm finding second preimage for $$H_1$$, that is given $$m$$ finding $$m'\ne m$$ such that $$H_1(m)=H_1(m')$$, also finds second preimage for $$H$$.