I have seen multiple sources claim that the Merkle-Damgård transform is able to build a collision-resistant Hash-function $H$ for arbitrary-length inputs from a compression function $h : \{0,1\}^n \to \{0,1\}^\ell$ (with constant-length inputs). See, e.g., [1].
However, the construction relies on suitably padding the input $x$ to $H$ in order to:
- Ensure that the length of the string $x_{\text{pad}}$ obtained after padding $x$ is a multiple of some parameter $0 < \ell' < \ell$. This means that $x_{\text{pad}}$ can be split into blocks of $\ell'$ bits.
- Ensure that if two inputs $x,x'$ of different lengths are padded, then the last block of $x_{\text{pad}}$ differs from the last block of $x'_{\text{pad}}$.
The second property is used in the proof that $H$ is collision-resistant (if $h$ is collision resistant). However, the second property (and hence the proof) already breaks when we consider the set of all input strings $x$ of length at most $2^\ell$. This is because the $\ell'$ bits in the last block can only encode $2^{\ell'}$ distinct lengths.
I suspect that one can come up with more clever schemes to extend the Merkle-Damgård transform to handle any input in $\{0,1\}^*$ (perhaps use the transform recursively until a large-enough block length is available to store the length of $x$). I would like to known whether there is any standard approach to achieve the same result.
[1] Jonathan Katz, Yehuda Lindell. Introduction to Modern Cryptography (3rd edition). CRC Press. ISBN 9781351133012. Section 6.2 (pp. 170-172).