Score:0

# What are the chances of two 5-symbol strings derived from md5 colliding?

I'm taking two medium-length strings (50-70 chars) and hashing them with md5 to get results like d2ae4f4919a10958e2c603782f0ec1cc, then recording the first 5 symbols of the hash to provide a (unique) short key. If the md5 distribution is (almost) random, would the chance of a collision be 1 in 16^5 = 1,048,576 (5 symbols from [0-9a-f]), or something different? Am I extremely lucky to get 2 hashes like d2ae4f4919a10958e2c603782f0ec1cc and d2ae41c1935738ca4a06ba28aad3e555, which start with the same 5 hex characters, or is there something else going on? The strings start the same, but that shouldn't matter, should it?
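For concreteness, the key derivation described above might look roughly like this in Python. This is only a sketch: the function name `short_key` and the UTF-8 encoding are assumptions made for illustration, not details from my actual setup.

```python
import hashlib


def short_key(text: str, length: int = 5) -> str:
    """Return the first `length` hex characters of the MD5 digest of `text`.

    `short_key` is a hypothetical helper name; UTF-8 encoding is an assumption.
    """
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()  # 32 hex characters
    return digest[:length]


if __name__ == "__main__":
    # Each key is one of 16**5 = 1,048,576 possible 5-character prefixes.
    print(short_key("some medium-length input string"))
```

With 5 hex characters there are only 16^5 possible keys, so two different inputs can share a key even though their full hashes differ.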

Score:1

The number of possible truncated hashes is $$d=16^5$$. Assuming MD5 is perfectly random, by the birthday bound, your probability of seeing at least one collision is approximately

$$1 - \left(\frac{d-1}{d}\right)^{\binom{n}{2}},$$ where $$n$$ is the number of strings you hashed and $$\binom{n}{2} = n(n-1)/2$$ is the number of pairs among them. It doesn't take many to hit 50%: the expression above reaches about 50% at $$n \approx 1206$$.

So no, you're not extraordinarily lucky. This was pretty likely.
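If you want to plug in your own numbers, here is a minimal Python sketch of the approximation above. The function names `collision_probability` and `smallest_n_for` are made up for this example, and it assumes the truncated hashes are uniformly random.

```python
import math


def collision_probability(n: int, d: int = 16**5) -> float:
    """Approximate P(at least one collision) among n uniformly random values
    drawn from d possibilities, using 1 - ((d-1)/d)**C(n, 2)."""
    pairs = math.comb(n, 2)  # number of distinct pairs among n items
    # log1p/expm1 avoid losing precision when the probability is tiny.
    return -math.expm1(pairs * math.log1p(-1.0 / d))


def smallest_n_for(target: float, d: int = 16**5) -> int:
    """Smallest n whose approximate collision probability is >= target."""
    n = 2
    while collision_probability(n, d) < target:
        n += 1
    return n


if __name__ == "__main__":
    for n in (2, 10, 1206):
        print(n, collision_probability(n))
    print(smallest_n_for(0.5))
```

For $$n = 2$$ this reduces to $$1/d$$, the chance that one specific pair collides. For $$n = 1206$$ it comes out just under 0.5, so the 50% mark is crossed at about that point.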

So it is 16^5, and I've hashed 10 strings... but OK.
$n$ never reaches $1206$. OP said they were creating hashes in groups of two, and then comparing only the two results hashed in that group. In which case, $n$ never exceeds $2$, and the probability is very close to $1$ that they should not see two hashes collide.
@aiootp Thanks, you're right, but I guess the birthday calculation is what I needed to know. I'm using this tool now, https://www.bdayprob.com/, in case I need to hash more strings in future.
After hashing 10 strings, the probability of at least 1 collision is about 1/23302. So what's described is unlikely, but not impossibly unlikely.