Score:3

Quickest way to find MD5 collision

es flag

I'm trying to find a MD5 hash collision between 2 numbers such that one is prime and the other is composite (at most 1024-bit). I'm using fastcoll with random prefixes for each iteration.

For this I wrote this script:

import subprocess
from Crypto.Util.number import bytes_to_long, isPrime
import string
import random

won = False

N = 10

while not won:
    # Run the fastcoll executable to generate the two messages
    pfx = ''.join(random.choices(string.ascii_uppercase + string.digits, k=N))
    file = open("prefix", "w")
    file.write(pfx)
    file.close()

    subprocess.run(["./fastcoll_v1.0.0.5.exe", "-p", "prefix", "-o", "msg1.bin", "msg2.bin"])

    # Convert the messages to integers
    with open("msg1.bin", "rb") as f:
        msg1_int = bytes_to_long(f.read())
    if(msg1_int.bit_length() > 1024):
        continue
    with open("msg2.bin", "rb") as f:
        msg2_int = bytes_to_long(f.read())
    if(msg2_int.bit_length() > 1024):
        continue
    
    print(msg1_int)
    print(isPrime(msg1_int))
    print(msg2_int)
    print(isPrime(msg2_int))

    # Check if one message is prime and the other is composite
    if isPrime(msg1_int) and not isPrime(msg2_int):
        won = True
    elif isPrime(msg2_int) and not isPrime(msg1_int):
        won = True

However, I'm not sure this is the most efficient way so can someone suggest improvements? Any help would be appreciated.

Maarten Bodewes avatar
in flag
Generating the collision and then checking if the result is prime seems the only way, so in that sense the algorithm should be good. Of course, normally you'd simply compile an executable yourself and run everything in memory, without calls to external applications.
infinite-blank- avatar
es flag
@MaartenBodewes is there a way of restricting fastcoll to 1024 bit pre-images for collisions?
fgrieu avatar
ng flag
@infinite-blank: no. fastcoll is not a pre-image attack. We don't know any practical one for MD5. The general method of generating collisions until one matches the prime/composite criteria seems workable: probability that an integer near $n$ is prime is about $1/\ln(n)$ and that's not overly small. The only speedup I can think of is to modify fastcoll to restrict to odd values.
infinite-blank- avatar
es flag
@fgrieu how do I restrict it to odd values? + I had another idea, we can use the fact that if md5(x) ==md5(y) then md5(x+z) == md5(y+z). So if we find a collision, we can append 1 bits to the pre images until one of them becomes prime. Is this feasible?
fgrieu avatar
ng flag
Yes, for all colliding messages $x$ and $y$ of identical length multiple of 64 bytes that we know how to build, and all suffixes $z$, $\operatorname{MD5}(x)=\operatorname{MD5}(y)\implies\operatorname{MD5}(x\mathbin\|z)=\operatorname{MD5}(y\mathbin\|z)$, and yes that leads to a massive speedup (though if you want others to replicate your results, I recommend to keep what's hashed byte-aligned, since bit-aware MD5 implementations are a rarity). Congratulations. I recommend that you make an answer out of that.
Score:3
es flag

Using the length extension attack of MD5 as I discussed in the comments, I was able to achieve this.

Here's the script I used:

from Crypto.Util.number import isPrime, bytes_to_long, long_to_bytes
import hashlib

x = "4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518afbfa200a8284bf36e8e4b55b35f427593d849676da0d1555d8360fb5f07fea2"
y = "4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518afbfa202a8284bf36e8e4b55b35f427593d849676da0d1d55d8360fb5f07fea2"

print("x : ", bytes_to_long(bytes.fromhex(x)))

print("md5(x) : ", hashlib.md5(bytes.fromhex(x)).hexdigest())
print("md5(y) : ", hashlib.md5(bytes.fromhex(y)).hexdigest())

z = 1

xx = 0
yy = 0

while True:
    # append 1s till prime
    xx = bytes_to_long(bytes.fromhex(x) + long_to_bytes(z))
    yy = bytes_to_long(bytes.fromhex(y) + long_to_bytes(z))
    if isPrime(xx) and not isPrime(yy):
        break
    z += 2

print("x+z :", xx)
print("y+z :", yy)

print("md5(x+z) : ", hashlib.md5(long_to_bytes(xx)).hexdigest())
print("md5(y+z) : ", hashlib.md5(long_to_bytes(yy)).hexdigest())
fgrieu avatar
ng flag
Note: If we wanted 1023-bit primes for some reason, we can change the initial `z` to `3**322`. For 1024-bit, the initial collision would need to be changed so that the initial byte is at least 0x80.
infinite-blank- avatar
es flag
A little correction from my side, I should have written at most 1024 bit prime in the original question, I've corrected it now.
fgrieu avatar
ng flag
If would be best if you accept your answer. I hope it's possible. You well deserve it.
infinite-blank- avatar
es flag
@fgrieu I think it has a time limit after which I can accept the answer. I also wanted to know how you calculated this : "For 1024-bit, the initial collision would need to be changed so that the initial byte is at least 0x80. "
fgrieu avatar
ng flag
`x` is a 128-char string, representing 64-byte bytestring $X$, and (per the big-endian convention used) a 511-bit integer $x$ (rather than 512) because the high-order bit of the first byte (0x4d) is 0, and the second high-order bit of that is 1. Most MD5 implementations (including the one at hand) support only bytes as input, thus we append bytes to $X$ so that we can still hash it. Appending $k$ bytes to $X$ makes a $64+k$-byte bytestring coding integer `xx` of bit size $511+8k$. That can't be $1024$. For this we need to change the first byte of $X$ so that it's high-order bit is 1.
I sit in a Tesla and translated this thread with Ai:

mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.