Score:2

Hash functions, bijectiveness, and entropy


For those who don't know, a bijective function is one where every output corresponds to exactly one input (and vice versa), so the mapping can be inverted. A block cipher under a fixed key, for example, must be bijective, or you could not decrypt.

When a hash function like SHA256 or SHA3 is applied to an input the same length as its output, AFAIK the resulting mapping is not, or at least should not be, bijective. (Is that correct?)

If a hash is not bijective, does this mean that repeated hashing loses entropy?

Let's say you have 256 bits of entropy and you pass it through SHA256. Do you still have 256 bits of entropy? Let's say you hash it with SHA256 a million times. What then?
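(For illustration, here is a small-scale version of the experiment one could actually run. It truncates SHA256 to 16 bits, an arbitrary choice made only so the whole input domain can be enumerated, and counts how many distinct values remain reachable as the truncated hash is iterated.)

```python
# Small-scale sketch: a 16-bit truncation of SHA-256 stands in for the full
# 256-bit hash, since the 2^256 domain cannot be enumerated.
import hashlib

N = 1 << 16  # 65536 possible 16-bit values

def h16(x: int) -> int:
    """SHA-256 of the 2-byte encoding of x, truncated to its first 16 bits."""
    return int.from_bytes(hashlib.sha256(x.to_bytes(2, "big")).digest()[:2], "big")

values = set(range(N))  # start with every 16-bit value reachable
for k in range(1, 65):
    values = {h16(x) for x in values}
    if k in (1, 2, 4, 8, 16, 32, 64):
        print(f"after {k:2d} iterations: {len(values):5d} of {N} values remain reachable")
```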

It seems to me that the answer ought to be no (you don't keep the full 256 bits), but then again wouldn't that create a problem for hash-based cryptography?

Just a random question that popped into my head.

This seems like a duplicate of https://crypto.stackexchange.com/a/15084
fgrieu
[Related](https://crypto.stackexchange.com/a/24672/555), but not a duplicate: for a hash onto the same set, the expected loss of entropy hashing a uniformly random value quickly converges to 0.827245389153… bit for the first hash as the set size increases. That's the only mildly relevant and not well known constant I ever derived.
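A quick way to reproduce that constant numerically (a sketch, not fgrieu's derivation: it relies on the standard heuristic that preimage sizes of a random function on a large set are approximately Poisson(1)-distributed, in which case the expected Shannon-entropy loss of one application is $\mathbb{E}[C \log_2 C]$ for $C \sim \mathrm{Poisson}(1)$):

```python
# Expected one-shot entropy loss under the Poisson(1) preimage-size heuristic:
# E[C*log2(C)] = sum_{j>=2} log2(j) * e^{-1} / (j-1)!
import math

loss = 0.0
for j in range(2, 60):  # terms shrink like 1/(j-1)!, so 60 terms is plenty
    loss += math.log2(j) * math.exp(-1) / math.factorial(j - 1)
print(loss)  # ~0.827245389...
```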
Score:3

This is already answered here for 1 iteration, and here for many iterations, but since the latter answer presents a heuristic argument, I'll quote here Lemma 4 of *Functional Graphs and Their Applications in Generic Attacks on Iterated Hash Constructions*, which gives a good approximation based on Theorem 2 of *Random Mapping Statistics* and (3.10) of *On the height of trees*:

Lemma 4. Let $f$ be a random mapping in $\mathcal{F}_N$. Denote $N = 2^n$. For $k \le 2^{n/2}$, the expected number of $k$-th iterate image nodes in the functional graph of $f$ is $$ (1 - \tau_k)\cdot N \approx \left(\frac{2}{k} - \frac{2}{3}\frac{\log k}{k^2} - \frac{c}{k^2} - \dots \right) \cdot N\,. $$ It suggests that $\lim_{k \to \infty} k\cdot (1 - \tau_k) = 2$. Thus, $$ \lim_{N \to \infty, k \to \infty, k \le \sqrt{N}}(1 - \tau_k)\cdot N \approx 2^{n - \log_2 k + 1}\,, $$ where $\tau_k$ satisfies the recurrence $\tau_0 = 0$, $\tau_{k+1} = e^{-1+\tau_k}$, and $c$ is a certain constant.

So while yes, there is some entropy loss due to repeated iteration of a random function, this loss grows only logarithmically in the number of iterations. To lose, say, $32$ bits of entropy, you need to iterate the hash around $2^{32}$ times. For large output lengths such as $256$ bits, this kind of loss is negligible for almost all practical purposes.
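To see that logarithmic growth numerically, here is a small sketch (its one assumption beyond the lemma is using the shrinkage of the image, $-\log_2(1-\tau_k)$, as a proxy for the entropy lost after $k$ iterations):

```python
# Iterate the recurrence tau_0 = 0, tau_{k+1} = e^{tau_k - 1} from the lemma
# and report the surviving image fraction 1 - tau_k, its 2/k approximation,
# and the corresponding loss in bits.
import math

tau = 0.0
checkpoints = {1, 10, 100, 1_000, 10_000, 100_000, 1_000_000}
for k in range(1, 1_000_001):
    tau = math.exp(tau - 1.0)
    if k in checkpoints:
        frac = 1.0 - tau  # expected fraction of image nodes surviving k iterations
        print(f"k = {k:>9}: 1 - tau_k = {frac:.3e} (2/k = {2/k:.3e}), "
              f"~{-math.log2(frac):5.2f} bits lost")

# Far beyond what we can iterate here, the 2/k leading term gives about
# log2(k) - 1 bits of loss, e.g. roughly 31 bits after 2^32 iterations.
```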
