I read this question: Cracking $f(x) = Cx \oplus Dx$
Asking about finding collisions in a simple 64 bit hash, and I thought I will give it a go myself just for fun.
I quickly wrote code to find collisions:
https://gist.github.com/meirmaor/b0e59352eb73cacec47d0f95c25a25fc
And yet it finds many near collisions and no full collisions, this baffles me.
Algorithm description:
I wanted to solve this using 8GB of Ram, so I allocate two Int arrays of length $2^{30}$ *(4 byte int) each.
I populate them by hashing Int values, I take the lower 30 bits as the index into both arrays, and store the top 32 bits in the first array and the source int in the second array.
I populate using $2^{32}$ possible Int values(as byte arrays), and get as expected a 98% fill rate, vary close to the idealized $1-e^{-4}$ I would expect.
It's like a hash table but I don't deal with collisions, just keep a single value for each 30 bit hash key. It's essentially a mapping between truncated 62 bits hash to 32 bit origin.
I then try hashing longer values with an extra Int prefix, and search for collisions, again using lower 30 bits as index to array, check if top 32 match and if they do we found a near collision. However when verifying them I find no full collision, I've found over 60 near collisions so far, validated them separately they indeed match in 62 or 63 bits, but I was expecting 1/4 to be a full collision, I got 0.
I repeated the test twice first comparing hashes of 4 bytes to hashes of 8 bytes starting with the bytes {small number,0,0,0}. I then tried comparing hashes of equal length by pre-populating with hashes of data all beginning with the byte sequence {1,0,0,0} and comparing again with the prefix {2+,0,0,0}
How is this possible, something special in this hash function? A strange bug in my code allowing me to successfully find near collisions but no full collisions?
Is there a reason near collisions found this way won't turn into full collisions.
An example of a near collision found(I have many):
Array(24, 0, 0, 0, 14, 103, 61, 80) vs
Array(1, 0, 0, 0, -2, -81, 79, 79)