Score:0

SHA-256 doesn't follow a uniform distribution?

vn flag
Bob

I have been playing with SHA-2-256 in Julia and I noticed that the hashes produced don't appear to follow a uniform distribution. My understanding of secure hashing algorithms is that they should approximate a uniform distribution well, so they are not predictable.

Here is the Julia code I'm using:

using BitIntegers, Distributions, HypothesisTests, Random, SHA

function sha256_rounds()
    rounds::Array{Array{UInt8,1}} = Array{Array{UInt8,1}}(undef, 10000) # 10000 Samples
    hash::Array{UInt8} = Array{UInt8}(undef, 64) # 64-byte array

    for i = 1:10000
        hash = sha2_256(string(rand(UInt64), base = 16)) # Random number, convert to hex string, then seed
        rounds[i] = hash
    end

    return rounds
end

sha256_str_vals = [join([string(x, base = 16) for x in y]) for y in sha256_rounds()] # Stitch the bytes together into strings
sha256_num_vals_control = [parse(UInt256, x, base = 16) for x in sha256_str_vals] # Get the numerical value from the strings

OneSampleADTest(sha256_num_vals, Uniform()) # One sample Anderson-Darling test

And the result of the test:

One sample Anderson-Darling test
--------------------------------
Population details:
    parameter of interest:   not implemented yet
    value under h_0:         NaN
    point estimate:          NaN

Test summary:
    outcome with 95% confidence: reject h_0
    one-sided p-value:           <1e-7

Details:
    number of observations:   10000
    sample mean:              8.73991847621225e75
    sample SD:                2.2742656031884893e76
    A² statistic:             Inf

To me this says that the produced hashes do not conform to a uniform distribution. Am I using the test incorrectly, or is my sample faulty? Thank you for your thoughts.

kelalaka avatar
in flag
Your `hash` array holds $64 \cdot 8 = 512$ bits, but SHA-256 outputs 256 bits. Define it as `hash::Array{UInt8} = Array{UInt8}(undef, 32) # 32-byte array`
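
A quick way to confirm the digest size is the short check below. It is only a sketch using the `SHA` standard library and the well-known "abc" test vector; the variable names are illustrative.

```julia
using SHA

digest = sha2_256("abc")          # returns a Vector{UInt8}
digest_len = length(digest)       # number of bytes in the digest
digest_hex = bytes2hex(digest)    # zero-padded hex, two characters per byte

println(digest_len)
println(digest_hex)
```

Note that in the question's loop `hash = sha2_256(...)` rebinds the variable to a fresh array, so the preallocated buffer is never actually filled, whatever its declared size.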
fgrieu avatar
ng flag
I remember earlier similar claims that the output of [a hash](http://eprint.iacr.org/2002/099) or [a block cipher](http://eprint.iacr.org/2003/003) is not random. They [turned out to be wrong](https://eprint.iacr.org/2003/022). SHA-256's outputs (for distinct inputs prepared independently of the constants in SHA-256) are usable to validate statistical tests. Separately, the claim needs to be expressed independently of the Julia code and include a description of the statistical test performed.
kelalaka avatar
in flag
See also the similar question [How to get an output of SHA-1 with first 2-bit are zeros?](https://crypto.stackexchange.com/q/83224/18298)
Meir Maor avatar
in flag
I voted to reopen even without the improvements fgrieu suggested. SHA-256 will not fail a simple statistical test. I would try testing individual bits and bit pairs to convince myself that it approximates uniformity. If you insist on the test you applied, look at how you are converting to numeric values; the bug is very likely there.
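
The per-bit check could be sketched roughly as follows. This is a minimal illustration, not a rigorous test suite: the function name `bit_counts`, the sample size and the seed are assumptions, and the inputs are generated the same way as in the question.

```julia
using SHA, Random

# Count how often each of the 256 output bits is set across n digests.
function bit_counts(n::Integer; rng = Random.default_rng())
    counts = zeros(Int, 256)
    for _ in 1:n
        # Same input construction as in the question: random UInt64 as a hex string.
        digest = sha2_256(string(rand(rng, UInt64), base = 16))
        for (byte_idx, byte) in enumerate(digest)
            for bit in 0:7
                counts[8 * (byte_idx - 1) + bit + 1] += (byte >> bit) & 0x01
            end
        end
    end
    return counts
end

n = 10_000
counts = bit_counts(n; rng = MersenneTwister(1))

# If each bit is a fair coin, each count is Binomial(n, 1/2):
# mean n/2, standard deviation sqrt(n)/2.
z = (counts .- n / 2) ./ (sqrt(n) / 2)
println("largest |z| over 256 bit positions: ", maximum(abs.(z)))
```

With 256 positions, a largest |z| of around 3 is unremarkable; values far beyond that would point to a bug in the harness long before they pointed to SHA-256.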
Paul Uszak avatar
cn flag
This is actually really simple. Generate 1GB of stuff in counter mode and run `ent` on it. If it passes so be it. If it fails, then so does your code...
Score:2
ng flag

Again, we are not a code review site, especially for code in a language seldom used for cryptography. And there are obvious issues with the code:

  • sha256_num_vals_control is computed but not used, although presumably the intent was to use it.
  • I can see neither an attempt to normalize the generated material to interval $[0,1)$, nor an input to OneSampleADTest specifying a range.

I conclude the samples for OneSampleADTest are not formatted as expected for this test. Malformed in, garbage out.
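
For illustration only, one way to feed a test samples in $[0,1)$ is sketched below. The helper names `digest_to_unit` and `ks_statistic`, the choice of the top 53 bits, and the hand-rolled Kolmogorov-Smirnov statistic (swapped in for Anderson-Darling so that the sketch needs only the standard library) are all assumptions of this sketch, not the asker's method. It also shows that `string(x, base = 16)` does not zero-pad, so joining per-byte strings as in the question can drop digits.

```julia
using SHA, Random

# string() does not zero-pad by default, so a byte below 0x10 yields one hex digit.
unpadded = string(0x0a, base = 16)            # one hex digit
padded   = string(0x0a, base = 16, pad = 2)   # two hex digits

# Map a digest to a Float64 in [0, 1) using its top 53 bits
# (assumption: 53 bits is ample resolution and is exact in a Float64).
function digest_to_unit(digest::Vector{UInt8})
    x = UInt64(0)
    for b in digest[1:8]          # big-endian: first byte is most significant
        x = (x << 8) | b
    end
    return (x >> 11) / 2.0^53
end

# One-sample Kolmogorov-Smirnov statistic against Uniform(0, 1).
function ks_statistic(samples::Vector{Float64})
    s = sort(samples)
    n = length(s)
    d = 0.0
    for (i, v) in enumerate(s)
        d = max(d, i / n - v, v - (i - 1) / n)
    end
    return d
end

rng = MersenneTwister(1)
samples = [digest_to_unit(sha2_256(string(rand(rng, UInt64), base = 16))) for _ in 1:10_000]
d = ks_statistic(samples)

# The asymptotic 5% critical value is roughly 1.36 / sqrt(n).
println("D = ", d, ", 5% critical value ≈ ", 1.36 / sqrt(length(samples)))
```

Samples normalized this way could also be passed to `OneSampleADTest(samples, Uniform())`, since `Uniform()` is the distribution on the unit interval. Unnormalized 256-bit integers all fall outside that interval, which is consistent with the infinite A² statistic in the question.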

Even if the samples were correctly formatted, cryptography would not care about bugs in OneSampleADTest in a certain version of Julia and the library used. It would care about a valid claim that SHA-256 output for distinct inputs prepared independently of the constants in SHA-256 can be distinguished from random. But such an extraordinary claim would need extraordinary evidence and, as a preliminary, a description independent of the language and its libraries.
