Score:1

Why is the output size exactly half the capacity for sha-3?

jp flag

For the SHA-3 family of hash functions, the output size $d$ is always chosen as $d=c/2$, i.e. exactly half the capcity. What is the rational for this?

Naively, I think that $d=c$ would make more sense because

  • The collision resistance seems to be $\min(d/2, c/2)$ and
  • The pre-image strength seems to be $\min(d, c)$.

So choosing $d=c$ would make attacks on the capacity equivalent to attacks on the output. What am I missing?

kelalaka avatar
in flag
Does this answer your question? [Security of Keccak/SHA3 against birthday attacks](https://crypto.stackexchange.com/questions/41413/security-of-keccak-sha3-against-birthday-attacks) and [Why has the sponge construction's generic collision finding attack a complexity of O(min(2^(-n/2) , 2^(-c/2)))?](https://crypto.stackexchange.com/q/12668/18298)
Simon avatar
jp flag
@kelalaka Not quite I think. Those questions only talk about collision-attacks. The answer by poncho explains why it is actually about preimage-attacks, which I had not realized before.
kelalaka avatar
in flag
I don't see pre-image in your question, here [a more coverage with quantum added](https://crypto.stackexchange.com/a/67677/18298)
Simon avatar
jp flag
@kelalaka I wasnt asking about collision attacks in particular. I was asking about the rational for choosing $d=c/2$ in the sha3 standard (because my own reasining seemed to indicate $d=c$ to be a better choice), I just edited the question to make that more clear. I was (incorrectly) only thinking about collision attacks. Anyway, the answer by poncho is completely satisfactory.
kelalaka avatar
in flag
The last link mentions the reasons of the NIST and later they correction on SHAKE series.
Score:4
my flag

Why is the output size exactly half the capacity for SHA-3?

Because collision attacks are not the only ones of interest; we also expect that, for preimage (and second preimage) attacks, the best attack is no better than the brute force (which takes an expected $2^n$ hash evaluations for an $n$-bit hash function).

For SHA-3, we have this alternative approach:

  • Select a large number of initial images $A_1, A_2, ..., A_k$ and compute the intermediate state when they are given to SHA-3 $\alpha_1, \alpha_2, ..., \alpha_k$

  • Select a large number of final images $B_1, B_2, ..., B_k$ and using the fact that the Keccak permutation is invertable, compute from the known final state (that outputs the target value) the needed intermediate state that would result in the final state; these intermediate states are $\beta_1, \beta_2, ..., \beta_k$

  • Search through the $\alpha_i$, $\beta_j$ intermediate states to find a pair $\alpha_x, \beta_y$ which agree in their capacity bits; call the xor of their rate bits $C$

If we find such a pair, then the hash of the message $A_x C B_y$ is the target value.

To have a good probability of finding such a pair, we would need to have $j \approx 2^{c/2}$.

So, to make this approach no easier than the brute force method would require $c \ge 2n$

Simon avatar
jp flag
That makes sense. I did not think about the Keccak function being invertible. Thanks for enlightening me.
Score:1
us flag

In addition to collision and preimage attacks, there is also the problem of length-extension attacks (which I think of as the "real" reason for this property of SHA3).

If you have a hash function $H$ then it is tempting to treat $H(k,m)$ as a MAC of $m$, with secret key $k$. Unfortunately, this does not lead to a secure MAC if you are using the previous generation (SHA1, SHA2) of hash functions.

Length extension attacks happen precisely because the hash function outputs its entire internal state. The idea behind length extension attacks is the following: if $m$ is a prefix of $m'$ then the output $H(k,m)$ occurs as the internal state when computing $H(k,m')$. In fact, since $H(k,m)$ is the entire internal state at some point, then you can compute $H(k,m')$ if you know $H(k,m)$ -- even if you don't know $k$! This violates the security property you would want from a MAC (learning the MAC of $m$ shouldn't help you predict the MAC of a different $m'$, even if $m$ is a prefix of $m'$). (Here I am glossing over the issues of length-padding, which don't present a significant barrier to the kind of attack I am sketching.)

During the SHA3 competition, most submissions were designed to be resistant to length-extension attacks. The way to do this is with a so-called "wide pipe" construction: simply make the internal state larger than the output. Put differently, the hash should only output part of its internal state at the end of the computation. If you do that, then $H(k,m)$ will not contain everything necessary to compute $H(k,m')$, and this thwarts the length-extension attack.

pe flag
To prevent length extension with at most $2^n$ effort you only need $n$ capacity bits, not $2n$. The $2n$ capacity on SHA-3 was most definitely to preserve the $2^n$ (second) preimage security of a random oracle with $n$-bit output length.
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.