Score:0

Sponge construction versus Merkle-Damgård for hashing based on the very same primitive


I am a bit confused about the sponge construction versus Merkle-Damgård-style constructions for hashing. The only advantage I see for the sponge construction is that it is secure against length-extension attacks. So if the application does not care about such an attack, Merkle-Damgård should be better; is that the case? My intuition is that to hash long messages with a sponge, every message block has to be smaller (due to the absorption rate), while in Merkle-Damgård the blocks can be longer, so hashing is faster.

Let's consider the cipher/permutation MiMC with its input length set to $n$. One can use the permutation form of MiMC in the sponge structure; then each call to the permutation absorbs a message block of size $r < n$, where $r$ is the absorption rate of the sponge. If we use the cipher form of MiMC in the Merkle-Damgård structure, then each call to the block cipher absorbs a message block of $n$ bits (e.g., via Miyaguchi–Preneel to build the compression function). This means hashing long messages is faster with Merkle-Damgård (when the same cipher/permutation is used in MD and in the sponge).
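
To make the per-call difference concrete, here is a minimal Python sketch of the two modes. It assumes hypothetical `encrypt(key, block)` and `permutation(state)` functions standing in for the $n$-bit MiMC cipher/permutation; this illustrates the structure only and is not real MiMC code.

```python
def miyaguchi_preneel(blocks, encrypt, iv=0):
    """Merkle-Damgård with the Miyaguchi–Preneel compression function
    H_i = E_{H_{i-1}}(m_i) XOR m_i XOR H_{i-1}.
    One cipher call per n-bit message block."""
    h = iv
    for m in blocks:               # each m is an n-bit block
        h = encrypt(h, m) ^ m ^ h
    return h

def sponge_absorb(blocks, permutation):
    """Sponge absorption phase: one permutation call per r-bit block;
    the remaining c = n - r capacity bits never receive message input."""
    state = 0
    for m in blocks:               # each m is an r-bit block, r < n
        state ^= m                 # XOR message into the rate portion
        state = permutation(state)
    return state
```

So, ignoring padding and squeezing, an $m$-bit message costs roughly $\lceil m/n \rceil$ cipher calls in the first loop versus $\lceil m/r \rceil$ permutation calls in the second.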

Putting this together, if the length-extension attack is not a concern (as in the integrity of computations, e.g., SNARKs), then the Merkle-Damgård construction seems the better choice (where the cipher and permutation used in MD and in the sponge are the same/counterparts, as in the MiMC example above). Is this observation valid?

Note: for MiMC the permutation is the same as the cipher with the key set to all-zero bits (https://eprint.iacr.org/2016/492.pdf).

poncho
"My intuition is that to hash long messages with sponge ones every block message has to be smaller (due to the rate of the absorption) while in Merkle-Damgard they can be longer and so hashing is faster" - well, with SHA-256, you do a MD compression once every 64 bytes, while with SHA3-256, you do a permutation once every 136 bytes. At least in this instance, your intuition doesn't line up (and of course, we would need to talk about the relative performance of a SHA-256 compression function vs a Keccak permutation)
I tried to clarify the question a bit more: the aim is not to compare arbitrary MD hashes against arbitrary sponge-based hashes, but the case where both are based on the same permutation/cipher, as in the MiMC example, so that the cipher or permutation operates on the same number of bits in both. So a MiMC hash built via Miyaguchi–Preneel from the MiMC cipher is faster than a MiMC hash based on the sponge and the MiMC permutation.
Marc Ilunga
The clarification that you require both to be built out of the same primitive should perhaps appear in the first paragraph or so (maybe even in the title). With those constraints in place, another advantage of sponge-based constructions is their indifferentiability from a random oracle. Many applications end up needing ROs rather than just collision resistance or other basic properties. MD hashes don't provide this out of the box unless they are restricted in some way. Length extension is one attack that denies RO properties, but the question doesn't establish that there is nothing else to worry about.
Marc Ilunga
Another thing to note: there are parallel modes of operation for sponges, so if that's taken into consideration, it's not clear why MD should be absolutely faster.
Score:1

It's really hard to make absolute statements about performance in this case, because it depends on the environment. Let's look at a couple of common hash functions: SHA-2 variants, SHA-3 variants, and BLAKE2b.

All three of these use different constructions: SHA-2 is Merkle-Damgård, SHA-3 is a sponge, and BLAKE2b is HAIFA-like. In SHA-256, we process 64 bytes at a time; in SHA-3-256, we process 136 bytes at a time; and in BLAKE2b, we process 128 bytes at a time. By "at a time", we mean in each iteration of the compression function, which in SHA-3 is the Keccak permutation.
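
As a quick sanity check, these per-call sizes can be read off Python's hashlib (assuming a reasonably recent CPython, where `block_size` reports the rate for the SHA-3 objects):

```python
import hashlib

# block_size = bytes consumed per compression-function/permutation call
for name in ("sha256", "sha512", "sha3_256", "blake2b"):
    h = hashlib.new(name)
    print(f"{name}: {h.block_size} bytes per call, {h.digest_size}-byte digest")

# Expected values (CPython): sha256 -> 64, sha512 -> 128, sha3_256 -> 136, blake2b -> 128
```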

However, which algorithm is the fastest depends on the implementation. On many modern machines, SHA-256 is implemented in hardware, and it is the fastest choice. When we're working entirely in software, BLAKE2b tends to be the fastest choice because its internal operations are highly parallelizable, much like the ChaCha stream cipher on which it's based. SHA-3-256 is usually the slowest algorithm in terms of cycles per byte in software, even considering the fact that it processes the most data at once. That's in part because it has the largest state and therefore typically does more work to adequately mix the data.

Ironically, in software, SHA-384 and SHA-512 are typically faster than SHA-256 for most messages on 64-bit machines, because they process twice as much data at once with only 5/4 the operations; that works because the internal operations are almost completely identical, just on different word sizes (and with different constants). This is not true on modern x86-64 machines where SHA-256 is hardware accelerated and SHA-512 is not.

Overall, it's really hard to make a 100% judgment about performance without actually measuring. Any of SHA-2, SHA-3, or BLAKE2 are sound, acceptable choices if you don't need to protect against length-extension attacks. For cases where you do, I like BLAKE2b because it's generally very fast on 64-bit machines and secure. However, sometimes people need to pick from only certain approved algorithms (e.g., FIPS), and so SHA-3 is a good choice. If performance matters to you, construct a performance test on real data in your workload to determine which of the secure options works for you. However, don't use MD5 or SHA-1, since they're insecure.
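
If you do want to measure, a minimal benchmark sketch along these lines (using Python's hashlib; the buffer size and algorithm list are placeholders for your actual workload) is usually enough to rank the candidates on a given machine:

```python
import hashlib
import time

data = bytes(16 * 1024 * 1024)  # replace with data representative of your workload

def throughput(name, data, repeats=5):
    """Best-of-N throughput in MB/s for a single-shot hash of `data`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.new(name, data).digest()
        best = min(best, time.perf_counter() - start)
    return len(data) / best / 1e6

for name in ("sha256", "sha512", "sha3_256", "blake2b"):
    print(f"{name}: {throughput(name, data):.0f} MB/s")
```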

samuel-lucas6
I think it's worth mentioning that quite a few machines don't support SHA-NI instructions for SHA-256 compared to AES-NI. There's also BLAKE3, KangarooTwelve, and SHAKE to consider. SHAKE is popular in [standards](https://csrc.nist.gov/csrc/media/Events/2023/lightweight-cryptography-workshop-2023/documents/accepted-papers/03-proposals-for-standardization-of-ascon-family.pdf) like Ed448, BLAKE3 is seeing more [usage](https://github.com/BLAKE3-team/BLAKE3#adoption--deployment), and KangarooTwelve has an [Internet Draft](https://datatracker.ietf.org/doc/html/draft-irtf-cfrg-kangarootwelve).
I tried to clarify the question a bit more: the aim is not to compare arbitrary MD hashes against arbitrary sponge-based hashes, but the case where both are based on the same permutation/cipher, as in the MiMC example, so that the cipher or permutation operates on the same number of bits in both. So a MiMC hash built via Miyaguchi–Preneel from the MiMC cipher is faster than a MiMC hash based on the sponge and the MiMC permutation.
bk2204
It's going to be very hard to implement different structures effectively on top of the same underlying primitive. Typically sponges use unkeyed random permutations, whereas Merkle-Damgård tends to use block ciphers, which are keyed. You simply can't compare them in that way.
Score:0

So here is what I have found:

For a sponge with permutation size $n$, rate $r$, and a $t$-bit output, the security level is $\min(t/2, (n-r)/2)$ bits.

For Miyaguchi–Preneel with permutation size $n$, the security level is $n/2$ bits.

For a fixed security level $n/4 < s < n/2$ and a message of $m$ bits, the sponge construction needs $m/r$ calls to the permutation during absorption (with rate $r = n - 2s$) plus $2s/r$ calls for squeezing, while Miyaguchi–Preneel needs $m/n$ calls.

For example, for 100 bits of security with a permutation of size $n = 256$ (so $r = 56$), the sponge structure needs $m/56 + 4$ calls to the permutation, while Miyaguchi–Preneel needs $m/256$ calls.
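
Here is a small arithmetic sketch (Python; the helper names are mine, and it just evaluates the formulas above) that reproduces those call counts:

```python
from math import ceil

def sponge_calls(m_bits, n, s):
    """Permutation calls for a sponge at s-bit security: capacity c = 2s,
    so the rate is r = n - 2s; absorb ceil(m/r) blocks, then squeeze a
    2s-bit output in ceil(2s/r) further calls."""
    r = n - 2 * s
    return ceil(m_bits / r) + ceil(2 * s / r)

def mp_calls(m_bits, n):
    """Cipher calls for Merkle-Damgård/Miyaguchi–Preneel: one per n-bit block."""
    return ceil(m_bits / n)

n, s, m = 256, 100, 10_000                    # 256-bit permutation, 100-bit security, 10,000-bit message
print(sponge_calls(m, n, s), mp_calls(m, n))  # 183 vs 40 calls
```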


