Score:0

What are the chances of two 5-symbol strings derived from md5 colliding?

ADC

I'm taking 2 medium-length strings (50-70 chars), hashing them with MD5 to get results like d2ae4f4919a10958e2c603782f0ec1cc, and then recording the first 5 symbols of each hash to provide a (unique) short key. If the MD5 distribution is (almost) random, would the chance of a collision be 2 in 16^5 ([0-9a-f]^5), i.e. 2/1,048,576, or something different? Am I extremely lucky to get 2 hashes like d2ae4f4919a10958e2c603782f0ec1cc and d2ae41c1935738ca4a06ba28aad3e555, which start with the same 5 hex characters, or is there something else going on? The strings start the same, but that shouldn't matter, should it?
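For concreteness, the scheme described above might look like the following minimal sketch. It assumes Python's standard `hashlib` and UTF-8 encoding of the input; the function name `short_key` and the sample strings are hypothetical, not the actual inputs.

```python
import hashlib

def short_key(text: str, length: int = 5) -> str:
    """Return the first `length` hex characters of the MD5 digest of `text`."""
    # Assumption: strings are encoded as UTF-8 before hashing.
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    return digest[:length]

if __name__ == "__main__":
    # Hypothetical inputs that share a common prefix, as in the question.
    a = "example/path/to/some/resource?id=1"
    b = "example/path/to/some/resource?id=2"
    print(short_key(a), short_key(b))
```

With `length=5` there are only 16^5 possible keys, so distinct inputs can map to the same key.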

Score:1

The number of possible truncated hashes is $d=16^5$. Assuming MD5 is perfectly random, by the birthday bound, your probability of seeing at least one collision is approximately

$$ 1 - \left(\frac{d-1}{d}\right)^{\binom{n}{2}}, $$ where $n$ is the number of strings you hashed and $\binom{n}{2} = n(n-1)/2$ is the number of pairs among them. It doesn't take many to hit 50%. In fact, the above reaches about 50% at $n \approx 1206$.

So no, you're not extraordinarily lucky. This was pretty likely.
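The formula above can be evaluated numerically for any $n$. The following is a rough sketch in Python, assuming perfectly uniform truncated hashes; the function names `collision_probability` and `smallest_n_for` are made up for illustration, and the result is the pairwise approximation from the formula, not an exact birthday probability.

```python
from math import comb, expm1, log1p

D = 16 ** 5  # number of possible 5-hex-character prefixes

def collision_probability(n: int, d: int = D) -> float:
    """Approximate P(at least one collision) among n uniform values out of d."""
    # 1 - ((d-1)/d)^C(n,2), written with log1p/expm1 to stay accurate
    # when the probability is very small.
    return -expm1(comb(n, 2) * log1p(-1.0 / d))

def smallest_n_for(target: float, d: int = D) -> int:
    """Smallest n whose approximate collision probability reaches `target`."""
    n = 2
    while collision_probability(n, d) < target:
        n += 1
    return n

if __name__ == "__main__":
    for n in (2, 10, 100, 1206):
        print(n, collision_probability(n))
    print("n for 50%:", smallest_n_for(0.5))
```

Plugging in the actual number of strings hashed matters a great deal here: the probability grows roughly with the square of $n$, so a handful of strings and a thousand strings give very different answers.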

ADC
So it is 16^5. And I've hashed 10 strings... but OK.
$n$ never reaches $1206$. OP said they were creating hashes in groups of two, and then comparing only the two results hashed in that group. In which case, $n$ never exceeds $2$, and the probability is very close to $1$ that they should not see two hashes collide.
ADC
@aiootp Thanks, you're right, but I guess the birthday calculation is what I needed to know. I'm using this tool now, https://www.bdayprob.com/, in case I need to take more strings in the future.
fgrieu
After hashing 10 strings, the probability of at least 1 collision is about 1/23302. So what's described is unlikely, but not impossibly unlikely.
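As a worked check of that figure, using the formula from the answer with $d = 16^5 = 1048576$ and $n = 10$ (so $\binom{10}{2} = 45$ pairs), and assuming uniformly random truncated hashes:

$$ 1 - \left(\frac{d-1}{d}\right)^{45} \approx \frac{45}{d} = \frac{45}{1048576} \approx \frac{1}{23302}. $$

The first-order approximation $1 - (1 - 1/d)^k \approx k/d$ is good here because $k = 45$ is tiny compared with $d$.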