Is there any standard extension of the Merkle-Damgård transform that handles arbitrary-length inputs?

Steven

2/18/24, 3:08 PM

I have seen multiple sources claim that the Merkle-Damgård transform is able to build a collision-resistant Hash-function $H$ for arbitrary-length inputs from a compression function $h : \{0,1\}^n \to \{0,1\}^\ell$ (with constant-length inputs). See, e.g., [1].

However, the construction relies on suitably padding the input $x$ to $H$ in order to:

Ensure that the length of the string $x_{\text{pad}}$ obtained after padding $x$ is a multiple of some parameter $0 < \ell' < \ell$. This means that $x_{\text{pad}}$ can be split into blocks of $\ell'$ bits.
Ensure that if two inputs $x,x'$ of different lengths are padded, then the last block of $x_{\text{pad}}$ differs from the last block of $x'_{\text{pad}}$.

The second property is used in the proof that $H$ is collision-resistant (if $h$ is collision resistant). However, the second property (and hence the proof) already breaks when we consider the set of all input strings $x$ of length at most $2^\ell$. This is because the $\ell'$ bits in the last block can only encode $2^{\ell'}$ distinct lengths.

I suspect that one can come up with more clever schemes to extend the Merkle-Damgård transform to handle any input in $\{0,1\}^*$ (perhaps use the transform recursively until a large-enough block length is available to store the length of $x$). I would like to known whether there is any standard approach to achieve the same result.

[1] Jonathan Katz, Yehuda Lindell. Introduction to Modern Cryptography (3rd edition). CRC Press. ISBN 9781351133012. Section 6.2 (pp. 170-172).

0 + 4

collision-resistance

merkle-damgaard

compression-function

Maarten Bodewes

2/18/24, 3:43 PM

"Ensure that if two distinct inputs $$, $′$ are padded, then their last blocks differ." We're talking about the input to $\text{H}$ right, the input to the collision resistant hash function itself? Are you talking about a different configuration or are you talking about the input of block of the compression function $\text{f}$? Because it makes sense for the latter, but if $x$ is the input of $\text{H}$ then no, the the difference will be in the state $s$ which is another input to $f$, unless the difference is in the last block. Or is the input $x$ by definition the last block?

Steven

2/18/24, 3:50 PM

I'm sorry for the confusion, I meant to talk about the lenghts of $x$ and $x'$. To clarify: the strings $x$ and $x'$ are the inputs to $H$. But the last block $B$ (resp. $B'$) of the padded version of $x$ (resp. $x'$) will be (part of) the input to the compression function $h$ in the final step of the transform and it encodes the length $|x|$ of $x$ (resp.the length $|x'|$ of $x'$). The required property is that if $|x| \neq |x'|$ then $B \neq B'$. I'll edit the question.

kodlu

2/18/24, 10:22 PM

I suspect that any possible "more clever schemes to extend the Merkle-Damgård transform to handle any input length" will not be worth it in terms of efficiency gain and be devilishly difficult to analyze. After all MD has been around for a very long time and cryptography is a fast moving field where there is a lot of incentive to find improvements and generalizations to existing primitives.

Steven

2/19/24, 8:48 AM

@kodlu, that may very well be the case. However, my interest is more of theoretical nature.

Elon Musk

I sit in a Tesla and translated this thread with Ai:

EN: Is there any standard extension of the Merkle-Damgård transform that handles arbitrary-length inputs?

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.