Score:2

A query regarding SHA256 output hash structure vs input entropy?


Given an input string of N bytes, suppose some byte positions are fixed/immutable (F bytes) and the remaining positions can hold any value we choose, i.e. they are configurable/variable (V = N − F bytes).

Let H = SHA256(SHA256(input string)), so H is 256 bits.

Now, given an input string of N bytes, the values of N, F, and V, and the knowledge of which positions can change and which can't:

How do we calculate the probability (or a formula for it) that for at least one assignment of values to the V variable bytes, the calculated H has k leading zero bytes?

For example: for a random input string of size N = 80 with F = 40 and V = 40 (assuming the position information is also given), how do we calculate the probability that for at least one assignment of values to the V bytes, the first k bytes of H are 0?

I tried searching for some analysis on this but couldn't find any answer. Can someone please help?

Score:0

As previously mentioned, neither the locations of the fixed bytes nor their number matters if we assume that the output is randomized.

Let's assume bits, so $v = 8 \cdot V$.

Now for one value, the chance of the output starting with $k$ zero bits is simply the chance that the first $k$ bits take one particular constant value. The size of the hash does not matter. So for one try this is just $1 \over 2^k$.
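This per-try probability can be checked empirically. Below is a small Monte Carlo sketch (the parameter values and helper names are mine, chosen for illustration) that double-SHA-256-hashes random inputs with a fixed prefix and counts how often the digest starts with at least $k$ zero bits:

```python
import hashlib
import random

def leading_zero_bits(digest: bytes) -> int:
    """Count the number of leading zero bits of a digest."""
    n = 0
    for byte in digest:
        if byte == 0:
            n += 8
        else:
            n += 8 - byte.bit_length()
            break
    return n

random.seed(1)
k = 4               # target number of leading zero bits (small, for speed)
trials = 5000
fixed = b"F" * 40   # the F fixed bytes; their positions don't matter here

hits = 0
for _ in range(trials):
    variable = random.randbytes(40)  # one assignment of the V free bytes
    h = hashlib.sha256(hashlib.sha256(fixed + variable).digest()).digest()
    if leading_zero_bits(h) >= k:
        hits += 1

print(hits / trials)  # should be near 2**-k = 0.0625
```

The observed fraction should hover around $2^{-k}$ regardless of which bytes are fixed, in line with the randomized-output assumption.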

As the output is randomized, we can also conclude that the outputs are not related; each try has the same chance. In that case it is much like rolling dice, so the computation is similar to one minus the chance of not throwing a 6 in an amount of throws.

So that means that the probability is one minus the chance that the constant value of $k$ bits is not thrown:

$$1 - \bigg({{2^k-1} \over {2^k}}\bigg)^{2^v} = 1 - (1 - 2^{-k})^{2^v}$$

Now this may seem daunting, but you can play around with (small) values using WolframAlpha.
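The formula can also be evaluated directly. A minimal Python sketch (the function name and sample values are mine):

```python
def p_success(k: int, v: int) -> float:
    """Probability that at least one of 2**v tries yields k leading zero bits."""
    return 1 - (1 - 2.0 ** -k) ** (2.0 ** v)

# k fixed at 8; vary v to see how the probability swings
for v in (4, 8, 12):
    print(f"k=8, v={v}: {p_success(8, v):.6f}")
```

With $k = 8$, the probability jumps from roughly 0.06 at $v = 4$ to about 0.63 at $v = k = 8$ and is essentially 1 by $v = 12$.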

Note that if $v$ becomes larger than $k$ then the probability quickly goes towards 1, while it quickly moves to zero when $k$ becomes larger than $v$ - which makes sense, they are used as exponents after all.

As we assume that SHA-256 already randomizes the output, this seems to have nothing to do with entropy at all; a counter of size $v$ bits would work just as well as random input - better, even, since there is no chance of duplicates.

Comment by Maarten Bodewes:
Note that WolframAlpha tries to calculate the exact amount; if somebody has a nice approximation (e.g. in $\log_2$ form) then I'm all ears.
Score:0

Really a comment but more room here for displaymath:

Regarding @MaartenBodewes's request for a closed-form expression, we can proceed as follows. Since $$(1-2^{-k})^{2^k}\rightarrow \exp(-1)$$ as $k$ grows, for even moderately large $k$ we can write $$(1-2^{-k})^{2^v}=\big[ (1-2^{-k})^{2^k} \big]^{2^{v-k}}\approx\exp(-2^{v-k}), $$ so the required expression is approximately $$ 1-\exp(-2^{v-k}), $$ which is extremely sensitive to the difference between $k$ and $v$.
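A quick numerical check of this approximation against the exact formula (a sketch; the function names and sample values are mine):

```python
from math import expm1

def p_exact(k: int, v: int) -> float:
    """Exact probability: 1 - (1 - 2**-k)**(2**v)."""
    return 1 - (1 - 2.0 ** -k) ** (2.0 ** v)

def p_approx(k: int, v: int) -> float:
    """Closed-form approximation 1 - exp(-2**(v-k)).

    expm1 avoids catastrophic cancellation when v << k,
    where the probability is tiny."""
    return -expm1(-(2.0 ** (v - k)))

for v in (8, 16, 24):
    print(f"k=16, v={v}: exact={p_exact(16, v):.8g} approx={p_approx(16, v):.8g}")
```

Already at $k = 16$ the two expressions agree to several significant figures across the whole range of $v$.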

Also, I think the remark about this question having nothing to do with input entropy is a bit too strong. The expression is independent of the input entropy because it essentially uses the random oracle model for the hash function. For a specific hash such as SHA-256, the entropy would matter, but modelling it would be devilishly difficult, I think.
