Score:3

Can I treat SHA-256 hashes as 64 fair dice rolls with numbers between 1 and 16?

in flag

My understanding was that SHA-256 is pretty random or "random" enough.

I assumed that would mean that every character would behave like a 1 to 16 dice roll.

With this assumption, I would expect that you can model the probability of repeating characters as $16^x$. So a chain of $\texttt{FFF}$ or $\texttt{333}$ would have a chance of 1 to $16^3 (4096)$ and a chain of $\texttt{FFFF}$ a chance of 1 to $16^4 (65536)$.

But while generating a lot of hashes (with random UUIDs as seed) to confirm my assumption the numbers do not add up. For example, in a set of 100k hashes I already have more than 1k chains of 4 characters or more (while I was expecting between 1 and 2 chains).

So here I am trying to understand why my assumption was wrong in the first place.

Did I fundamentally misunderstood the randomness of SHA-256 hashes or is it something else?

kelalaka avatar
in flag
Not a clear question since your experiment is not clear. See my [SHA-1's experiment on the leading zeros](https://crypto.stackexchange.com/a/83227/18298). How do you have 1K chain? Note that we model SHA-256 as a Pseudo-Random Function we don't know it is one one not.
kelalaka avatar
in flag
Your model missing a point that in 64-hex output of SHA-256 you need to find the probability of a sequence of 4 characters anywhere. You cannot really model as each output hex as a single roll since they are not independent of the input...
in flag
@kelalaka thx for the input!
Score:4
my flag

So a chain of $\texttt{FFF}$ or $\texttt{333}$ would have a chance of 1 to $16^3 (4096)$

Actually, a chance of three repeated nybbles (be it $\texttt{FFF}$ or $\texttt{333}$ or $\texttt{000}$) would be 1 in $16^2 (256)$ - that happens because there are $16^3$ equally likely values of those 3 nybbles, and 16 of those patterns are repeats - hence the probability of a repeat is ${16 \over 16^3} = {1 \over 16^2}$. If you specify that they must be $\texttt{FFF}$ (and so $\texttt{333}$ would not count), you'd then get $16^3$; however that's not what you're doing.

For example in a set of 100k hashes I already have over 1k chains of 4 characters or more

That's about right - in 100k hashes, there are roughly 6,000,000 places where a string of 4 repeated nybbles might occur; any one place has a probability of $16^{-3} = {1 \over 4096}$ of being a repeat - a simplistic computation gives about an expected 1,400 strings of repeats.

I say simplistic, because this straight-forward computation ignores overlapping strings - for example, a string of 5 repeated nybbles would count as a run, not 2 runs of 4. In addition, the probabilities involved with overlapping strings are not independent. While these effects reduce the expected total somewhat, I believe that the simplistic computation is good enough for a back-of-the-envelope estimate.

in flag
thank you very much! By pointing out the error in my assumption i was able to understand where the problem is and with this video https://www.youtube.com/watch?v=O4Qnsubo2tg i was able to understand how i have to adjust my function
in flag
tbh i am still a little confused on why a chance of 1/4096 does not mean on avgr of 100k / 4096 outcomes , because that would be ~24.
knaccc avatar
es flag
@braunbaer Because in a 64 char hex string, there are 61 possible positions where there can be a 4-hex string sequence. For each of those positions, the chances of the first char being the same as the next three chars is (1/16)^3 = (1/4096). Hence, quad-repeating-hex sequences per hash will be (1/4096 * 61) = 0.01489257812. Per 100k hashes, that's 0.01489257812 * 100k = 1489.
in flag
@knaccc yes! that makes so much sense. So just to be clear, if we would work with a 4 char hex string we would have a "plain" chance of 1/4096 as there is only one possible position for a 4-hex string sequence or (1/4096 *1) to be clear
knaccc avatar
es flag
@braunbaer yes, exactly. 1/4096 chance of all hex chars being the same, which is another way of saying that the 2nd, 3rd and 4th char are all the same as the first.
Maarten Bodewes avatar
in flag
This Q/A made HNQ, so I edited the question to be representative - which means also updating the answer of course - hope you don't mind.
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.