It's my understanding that MD5 is still resistant to preimage attacks in the general case, but that an attacker may still launch a second preimage attack if an innocent file is "unlucky" enough to be a viable collision candidate.
- What are the odds of a uniformly random file of length $512n$ bits being a candidate for an MD5 collision attack?
- Is it possible to detect current state-of-the-art attack candidacy from just $H(m)$, or would I need to have access to $m$ to tell?
I was inspired to ask this question after reading that GitHub keeps SHA-1 in use in public-facing production services despite the demonstrated existence of collision attacks:
> The recent attack uses special techniques to exploit weaknesses in the SHA-1 algorithm that find a collision in much less [than $2^{160}$] time. These techniques leave a pattern in the bytes which can be detected when computing the SHA-1 of either half of a colliding pair.
>
> GitHub.com now performs this detection for each SHA-1 it computes, and aborts the operation if there is evidence that the object is half of a colliding pair. That prevents attackers from using GitHub to convince a project to accept the “innocent” half of their collision, as well as preventing them from hosting the malicious half.
>
> The possibility of false positives can be neglected as the probability is smaller than $2^{-90}$.
The "when computing" suggests (alongside my amateur analysis of the code) that $m$ is required to detect the SHAttered attack, but I'm unclear whether MD5 collisions have the same detection requirements, or if they could be predicted solely from the digest.
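To put the quoted $2^{-90}$ false-positive bound in perspective, here is a rough back-of-the-envelope sketch. The workload of $2^{40}$ hashed objects is purely my own assumption for illustration, not a figure from GitHub, and the helper name `expected_false_positives` is hypothetical. It treats $2^{-90}$ as a per-hash upper bound and multiplies by the number of hashes, which upper-bounds the expected number of false positives.

```python
from fractions import Fraction

# Per-hash false-positive probability, taken as an upper bound from the quote.
P_FALSE_POSITIVE = Fraction(1, 2**90)


def expected_false_positives(num_hashes: int) -> Fraction:
    """Upper bound on the expected number of false positives over num_hashes hashes."""
    return num_hashes * P_FALSE_POSITIVE


# Assumed (hypothetical) workload: 2**40 hashed objects, roughly a trillion.
bound = expected_false_positives(2**40)
print(bound == Fraction(1, 2**50))
```

Under that assumption the bound works out to $2^{-50}$, so even at that scale a false positive would be extremely unlikely. This says nothing about whether MD5 detection would enjoy a similar bound.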