Score:0

Crypto

SHA-256 doesn't follow a uniform distribution?

Bob

3/26/23, 8:07 PM

I have been playing with SHA-2-256 in Julia and I noticed that the hashes produced don't appear to follow a uniform distribution. My understanding of secure hashing algorithms is that they should approximate a uniform distribution well, so they are not predictable.

Here is the Julia code I'm using:

```julia
using BitIntegers, Distributions, HypothesisTests, Random, SHA

function sha256_rounds()
    rounds::Array{Array{UInt8,1}} = Array{Array{UInt8,1}}(undef, 10000) # 10000 Samples
    hash::Array{UInt8} = Array{UInt8}(undef, 64) # 64-byte array

    for i = 1:10000
        hash = sha2_256(string(rand(UInt64), base = 16)) # Random number, convert to hex string, then seed
        rounds[i] = hash
    end

    return rounds
end

sha256_str_vals = [join([string(x, base = 16) for x in y]) for y in sha256_rounds()] # Stitch the bytes together into strings
sha256_num_vals_control = [parse(UInt256, x, base = 16) for x in sha256_str_vals] # Get the numerical value from the strings

OneSampleADTest(sha256_num_vals, Uniform()) # One sample Anderson-Darling test
```

And the result of the test:

```
One sample Anderson-Darling test
--------------------------------
Population details:
    parameter of interest:   not implemented yet
    value under h_0:         NaN
    point estimate:          NaN

Test summary:
    outcome with 95% confidence: reject h_0
    one-sided p-value:           <1e-7

Details:
    number of observations:   10000
    sample mean:              8.73991847621225e75
    sample SD:                2.2742656031884893e76
    A² statistic:             Inf
```

To me this says that the produced hashes do not conform to a uniform distribution. Am I using the test incorrectly, or is my sample faulty? Thank you for your thoughts.

statistical-test

sha-256

sha-2

kelalaka

3/26/23, 8:31 PM

Your `hash` array is allocated with $64 \times 8 = 512$ bits, but a SHA-256 digest has only 256 bits. Define it as `hash::Array{UInt8} = Array{UInt8}(undef, 32) # 32-byte array`.
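
The digest length can be checked directly. This is a minimal sketch using only the `SHA` standard library and the same `sha2_256` function as the question; the input `"abc"` is an arbitrary choice.

```julia
using SHA  # standard library

digest = sha2_256("abc")      # returns a Vector{UInt8}
println(length(digest))       # digest length in bytes: 32
println(8 * length(digest))   # digest length in bits: 256
```

Note that in the question's loop `hash` is rebound to the vector returned by `sha2_256`, so the preallocated buffer is discarded rather than filled; its declared size probably does not affect the test result.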

fgrieu

3/26/23, 8:41 PM

I remember earlier similar claims that the output of [a hash](http://eprint.iacr.org/2002/099) or [a block cipher](http://eprint.iacr.org/2003/003) is not random. They [turned out to be wrong](https://eprint.iacr.org/2003/022). SHA-256's outputs (for distinct inputs prepared independently of the constants in SHA-256) are usable to validate statistical tests. Separately: the claim needs to be expressed independently of the Julia code, and include a description of the statistical test performed.

kelalaka

3/26/23, 9:24 PM

See the similar question [How to get an output of SHA-1 with first 2-bit are zeros?](https://crypto.stackexchange.com/q/83224/18298)

Meir Maor

3/28/23, 8:00 AM

I voted to reopen even without the improvements fgrieu suggested. SHA256 will not fail a simple statistical test. I would try testing individual bits, and bit pairs, to convince myself that it approximates uniformity. If you insist on the test you applied, look at how you are converting to numeric values; the bug is very likely there.
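
A per-bit check of this kind could be sketched as follows. It assumes only the `SHA` standard library; the helper name `bit_frequencies` and the choice of inputs (`"1"`, `"2"`, ...) are illustrative, not taken from the question.

```julia
using SHA  # standard library

# Hypothetical helper: fraction of digests in which each of the 256 output bits is set.
function bit_frequencies(n::Integer)
    counts = zeros(Int, 256)
    for i in 1:n
        digest = sha2_256(string(i))        # distinct inputs "1", "2", ...
        for (j, byte) in enumerate(digest)
            for k in 0:7
                # bit k of byte j, where k = 0 is the least significant bit
                counts[8 * (j - 1) + k + 1] += (byte >> k) & 0x01
            end
        end
    end
    return counts ./ n
end

freqs = bit_frequencies(10_000)
println("min = ", minimum(freqs), ", max = ", maximum(freqs))
```

For 10,000 samples each fraction should sit near 0.5, with a standard deviation of about 0.005 if the bits behave like fair coin flips. The same loop extends to bit pairs by counting the four joint outcomes for each pair of positions.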

Paul Uszak

4/28/23, 10:39 PM

This is actually really simple. Generate 1GB of stuff in counter mode and run `ent` on it. If it passes so be it. If it fails, then so does your code...
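
A sketch of the generation step, under some assumptions: "counter mode" is taken here to mean hashing the decimal strings `"0"`, `"1"`, `"2"`, ... and concatenating the digests; the helper `write_counter_stream` and the file name are hypothetical; and the size is 1 MB rather than 1 GB so that the example runs quickly.

```julia
using SHA  # standard library

# Hypothetical helper: write `nbytes` of concatenated SHA-256 digests of "0", "1", "2", ...
function write_counter_stream(path::AbstractString, nbytes::Integer)
    open(path, "w") do io
        written = 0
        counter = 0
        while written < nbytes
            block = sha2_256(string(counter))
            n = min(length(block), nbytes - written)  # truncate the final block if needed
            write(io, block[1:n])
            written += n
            counter += 1
        end
    end
    return path
end

path = write_counter_stream(joinpath(tempdir(), "sha256_ctr.bin"), 1_000_000)
println(filesize(path))
# The resulting file can then be passed to `ent` from a shell, if it is installed.
```

Raise the second argument to generate a larger sample.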

Score:2

fgrieu

3/28/23, 7:30 PM

Again, we are not a code review site, especially for code in a language seldom used for cryptography. And there are obvious issues with the code:

- `sha256_num_vals_control` is computed but not used, when presumably the intent was that it is.
- I can see neither an attempt to normalize the generated material to the interval $[0,1)$, nor an input to `OneSampleADTest` specifying a range.

I conclude the samples for OneSampleADTest are not formatted as expected for this test. Malformed in, garbage out.
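
For illustration, here is one way the samples could be brought into $[0,1)$ before testing. This is a sketch under assumptions, not a reconstruction of what the question intended: the helper `digest_to_unit` is hypothetical, it uses only the first 8 bytes of each digest, and the inputs are the strings `"1"`, `"2"`, .... It also shows a conversion pitfall in the question's code: `string(x, base = 16)` without `pad = 2` drops the leading zero of bytes below `0x10`, so the joined hex strings vary in length. In Distributions.jl, `Uniform()` with no arguments is the uniform distribution on $[0,1]$, so values on the order of $10^{75}$ fall far outside its support.

```julia
using SHA                               # standard library
using Distributions, HypothesisTests    # same external packages as in the question

# Without padding, a byte below 0x10 is printed as a single hex digit.
@assert string(0x0a, base = 16) == "a"
@assert string(0x0a, base = 16, pad = 2) == "0a"

# Hypothetical helper: map a digest to a Float64 in [0, 1) using its first 8 bytes.
function digest_to_unit(digest::AbstractVector{UInt8})
    x = zero(UInt64)
    for b in digest[1:8]
        x = (x << 8) | b                # big-endian accumulation
    end
    return (x >> 11) * 2.0^-53          # keep the top 53 bits, which Float64 holds exactly
end

samples = [digest_to_unit(sha2_256(string(i))) for i in 1:10_000]
result = OneSampleADTest(samples, Uniform(0, 1))
println(pvalue(result))
```

Working from the raw bytes avoids the hex round trip entirely. With correctly scaled input, a well-behaved hash should usually not be rejected, although any single test can reject by chance at the chosen significance level.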

Even if the samples were correctly formatted, cryptography would not care about bugs in OneSampleADTest in a certain version of Julia and the library used. It would care about a valid claim that SHA-256 output for distinct inputs prepared independently of the constants in SHA-256 can be distinguished from random. But such an extraordinary claim would need extraordinary evidence and, as a preliminary, a description independent of the language and its libraries.