It might be a huge problem in your case, since MD5 is vulnerable to identical-prefix collisions:
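
| | Prefix | Collision blocks | Suffix |
|---|---|---|---|
| File A | identical prefix | free part of file A | identical suffix |
| File B | identical prefix | free part of file B | identical suffix |

The two files collide at the end of the free parts; the rest of both files is the same, so the MD5 hashes stay equal.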
Although finding MD5 collisions is very easy today when the attacker can control a block in the middle of the file, the probability drops if the files are not arbitrary, since that leaves the attack fewer candidate blocks for the collision.
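To see why an identical suffix preserves the collision, here is a minimal sketch of a *toy* Merkle–Damgård hash (the construction MD5 uses), not MD5 itself. It assumes a hypothetical compression function with a deliberately tiny 16-bit state (built from truncated SHA-256) so that a collision can be brute-forced in well under a second; the names `compress`, `chain`, `toy_hash` and `collide_after` are illustrative only.

```python
import hashlib

BLOCK = 8          # toy block size in bytes; real MD5 uses 64-byte (512-bit) blocks
IV = b"\x00\x00"   # toy initial chaining value

def compress(state: bytes, block: bytes) -> bytes:
    # Hypothetical toy compression function with a deliberately tiny 16-bit state
    return hashlib.sha256(state + block).digest()[:2]

def chain(state: bytes, data: bytes) -> bytes:
    # Feed whole blocks of data through the compression function
    assert len(data) % BLOCK == 0
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

def toy_hash(msg: bytes) -> bytes:
    # Merkle-Damgard padding: 0x80, zeros, then the 8-byte message length
    padded = msg + b"\x80"
    padded += b"\x00" * (-(len(padded) + 8) % BLOCK)
    padded += len(msg).to_bytes(8, "big")
    return chain(IV, padded)

def collide_after(prefix: bytes) -> tuple[bytes, bytes]:
    # Brute-force two different blocks that reach the same chaining value
    # after `prefix` (pigeonhole: at most 2**16 + 1 tries for a 16-bit state)
    start = chain(IV, prefix)
    seen = {}
    counter = 0
    while True:
        block = counter.to_bytes(BLOCK, "big")
        state = compress(start, block)
        if state in seen:
            return seen[state], block
        seen[state] = block
        counter += 1

prefix = b"identical prefix"              # 16 bytes = 2 toy blocks
block_a, block_b = collide_after(prefix)  # the "free parts"
suffix = b"identical suffix of any length"
file_a = prefix + block_a + suffix
file_b = prefix + block_b + suffix
print(file_a != file_b)                      # True
print(toy_hash(file_a) == toy_hash(file_b))  # True
```

Once the chaining values match after the collision blocks, every later block (suffix and padding, which depends only on the equal lengths) is processed identically, so the final digests match no matter what suffix follows.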
In your case there is no attacker; you are worried about uncontrolled (accidental) collisions. In version control systems (VCSs), files with many edits fall into exactly this collision pattern: same prefix, some changed parts, and an identical suffix. Your main problem will be deciding which part to test: only the second block (MD5 uses 512-bit blocks), only the third block, the second and third blocks together, and so on.
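As a rough illustration of "which part to test", the sketch below locates the 64-byte MD5 blocks that differ between two versions of a file and checks whether the change has the identical-prefix shape (same length, one contiguous run of changed blocks). Both helpers, `differing_blocks` and `fits_identical_prefix_pattern`, are hypothetical and only a heuristic filter, not a collision test.

```python
MD5_BLOCK = 64  # MD5 processes 512-bit (64-byte) blocks

def differing_blocks(old: bytes, new: bytes) -> list[int]:
    # Indices of the 64-byte blocks whose contents differ between the versions
    nblocks = (max(len(old), len(new)) + MD5_BLOCK - 1) // MD5_BLOCK
    return [
        i for i in range(nblocks)
        if old[i * MD5_BLOCK:(i + 1) * MD5_BLOCK] != new[i * MD5_BLOCK:(i + 1) * MD5_BLOCK]
    ]

def fits_identical_prefix_pattern(old: bytes, new: bytes) -> bool:
    # Same length (so padding matches) and the changed blocks form one contiguous run
    if len(old) != len(new) or old == new:
        return False
    diff = differing_blocks(old, new)
    return diff == list(range(diff[0], diff[-1] + 1))

old = bytes(MD5_BLOCK * 4)                          # four zero blocks
new = old[:64] + b"\x01" * 64 + old[128:]           # edit only the second block
print(differing_blocks(old, new))                   # [1]
print(fits_identical_prefix_pattern(old, new))      # True
print(fits_identical_prefix_pattern(old, b"x" + old))  # False: insertion shifts all blocks
```

Note that an insertion or deletion shifts every later block boundary, so only same-length, in-place edits line up with MD5's block structure this way.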
Why bother with MD5 plus a secondary check when better and faster alternatives exist?
- BLAKE2 used to be the fastest option; now there is BLAKE3, which is even faster. BLAKE2 is roughly 2 times and BLAKE3 roughly 9 times faster than MD5. BLAKE2b with 512-bit output gives $2^{256}$ generic collision resistance; BLAKE3 targets a 128-bit security level ($2^{128}$ collision resistance) even with longer outputs. Either way, creating a collision is computationally infeasible.
- SHA-512, which runs at almost the same speed as MD5 and offers $2^{256}$ generic collision resistance, something MD5 cannot match by any means.
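A minimal sketch of switching to one of these alternatives, assuming Python's standard `hashlib`, which ships BLAKE2b and SHA-512 but not BLAKE3 (BLAKE3 would need a third-party package). The helper name `fingerprint` and its parameters are hypothetical.

```python
import hashlib
import io

def fingerprint(stream, algo: str = "blake2b", chunk_size: int = 1 << 16) -> str:
    # Stream a binary file-like object through a modern hash; return the hex digest
    if algo == "blake2b":
        h = hashlib.blake2b(digest_size=64)  # 512-bit output
    elif algo == "sha512":
        h = hashlib.sha512()
    else:
        raise ValueError(f"unsupported algorithm: {algo}")
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()

data = b"file contents " * 10_000
print(fingerprint(io.BytesIO(data)))            # 128 hex chars (512 bits)
print(fingerprint(io.BytesIO(data), "sha512"))
```

For a real file, open it in binary mode and pass the handle, e.g. `with open(path, "rb") as f: fingerprint(f)`. Reading in chunks keeps memory use constant for large files.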
The conclusion of Corkami:

> Kill MD5!
> Unless you actively check for malformations or collisions blocks in files, don't use MD5!
> It's not a cryptographic hash, it's a toy function!