According to the HKDF paper, the use of a salt serves two purposes: domain separation and randomness extraction.
This question is solely about the necessity of a salt for the purposes of randomness extraction.
The HKDF paper states:
a salt value (i.e., a random but non-secret key) ...
is essential to obtain generic extractors and KDFs that can extract randomness from arbitrary sources with sufficiently high
entropy.
The Randomness Extraction and Key Derivation paper (linked to by the HKDF paper) states:
In addition, the "monolithic" randomness assumption on a single
(unkeyed) function such as SHA-1 is inappropriate for the setting of
randomness extraction as no single function (even if fully random)
can extract a close-to-uniform distribution from arbitrary
high-entropy input distributions. This is so, since once the
function is fixed (even if to purely random values) then there are
high-entropy input distributions that will be mapped to small subsets
of outputs. Therefore, the viable approach for randomness extraction
is to consider a family (or collection) of functions indexed by a set
of keys. When an application requires the hashing of an input for the
purpose of extracting randomness, then a random element (i.e., a
function) from this family is chosen and the function is applied to
the given input. While there may be specific input distributions that
interact badly with specific functions in the family, a good
randomness-extraction family will make this "bad event" happen with
very small probability.
The last question is how to generate the random known keys used by the
extractor. Technically this is not hard, as the parties can generate
the appropriate randomness, but the exact details depend on the
application. For example, in the DH key exchange discussed earlier,
the parties exchange in the clear randomly chosen values, which are
then combined to generate a single key [salt] for the extractor family
(e.g. HMAC-SHA1).
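As I understand it, the salted extraction step described in that passage would look roughly like the sketch below. It follows the HKDF-Extract definition from RFC 5869, PRK = HMAC-Hash(salt, IKM). The rest is my own assumption for illustration: SHA-256 instead of the SHA-1 mentioned in the quote, concatenation as the way the two public random values are combined, and random placeholder bytes standing in for a real DH shared secret.

```python
import hashlib
import hmac
import os


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract as defined in RFC 5869: PRK = HMAC-Hash(salt, IKM)."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()


# Each party sends a fresh random value in the clear.
nonce_a = os.urandom(32)
nonce_b = os.urandom(32)

# Assumption: the two public values are combined by simple concatenation.
salt = nonce_a + nonce_b

# Placeholder for the encoded DH shared secret (not a real DH computation).
shared_secret = os.urandom(32)

# The salt is public; only shared_secret is secret.
prk = hkdf_extract(salt, shared_secret)
print(prk.hex())
```

The point of the sketch is only that the salt selects one member of the HMAC family per session, and that it does not need to be kept secret.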
The HKDF paper states:
the Merkle-Damgard family [used in the design of many popular hash algorithms such as MD5, SHA-1 and SHA-2] built on random
compression functions is not a good statistical extractor... the output of such family on any distribution for which the
last block of input is fixed is statistically far from uniform.
It then echoes the point made in the Randomness Extraction and Key Derivation paper:
As we have already stressed in previous sections generic extractors,
i.e., those working over arbitrary high min-entropy sources, must be
randomized via a random, but not necessarily secret, key (or “salt”).
In particular, the example following Lemma 2 shows that for every
deterministic extractor there is a high-entropy source for which the
output is far from uniform.
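To check my own understanding of that counterexample, here is a toy-scale sketch. It is not the construction from the paper, just an illustration of the same idea under my own assumptions: the "extractor" outputs a single bit (the top bit of SHA-256), and the candidate inputs are the 4-byte encodings of 0 to 2^14 - 1. Once the function is fixed, I can build a source that is uniform over roughly half of the candidates (so about 13 bits of min-entropy) whose output is constant. The same source fed to a keyed function with a salt chosen afterwards looks balanced again.

```python
import hashlib
import hmac
import os


def fixed_bit(x: bytes) -> int:
    """Deterministic (unkeyed) one-bit extractor: top bit of SHA-256(x)."""
    return hashlib.sha256(x).digest()[0] >> 7


def keyed_bit(salt: bytes, x: bytes) -> int:
    """Keyed one-bit extractor: top bit of HMAC-SHA256(salt, x)."""
    return hmac.new(salt, x, hashlib.sha256).digest()[0] >> 7


# Adversarial source: uniform over every candidate that the fixed
# function maps to 0. It is chosen after the function is known.
candidates = (i.to_bytes(4, "big") for i in range(1 << 14))
bad_source = [x for x in candidates if fixed_bit(x) == 0]

# The fixed function outputs a constant on this source.
fixed_ones = sum(fixed_bit(x) for x in bad_source)

# A salt drawn independently of the source restores the balance.
salt = os.urandom(32)
keyed_ones = sum(keyed_bit(salt, x) for x in bad_source)
keyed_fraction = keyed_ones / len(bad_source)

print(len(bad_source), fixed_ones, round(keyed_fraction, 3))
```

This only works against the keyed version if the source is independent of the salt, which I take to be the assumption behind the papers' statements.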
Seemingly at odds with these statements, I know of an ECDH implementation that applies the Keccak-256 hash to the shared secret EC group element with no salt. Perhaps this is justified either A) on the grounds that a sponge-construction hash avoids the concerns that afflict Merkle-Damgard family hashes, or B) on the grounds that a compressed EC point representation is already sufficiently close to uniformly random (compared with the shared secret in non-EC DH).
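For concreteness, the unsalted pattern I am asking about looks roughly like the first derivation in the sketch below, next to HKDF-Extract with the salt omitted. Several things here are my own stand-ins: the input is placeholder bytes shaped like a 33-byte compressed point rather than a real ECDH output, and `hashlib.sha3_256` is FIPS 202 SHA3-256, which uses different padding from the original Keccak-256 that the implementation uses. The zero salt follows RFC 5869, which says that an omitted salt is set to HashLen zero bytes.

```python
import hashlib
import hmac

# Placeholder bytes standing in for a 33-byte compressed EC point.
shared_point = bytes.fromhex("02" + "11" * 32)

# Unsalted: one fixed public function applied directly to the secret.
# Note: SHA3-256 here is a stand-in, not the original Keccak-256 padding.
unsalted_key = hashlib.sha3_256(shared_point).digest()

# HKDF-Extract with the salt omitted: RFC 5869 substitutes HashLen zeros.
zero_salt = bytes(hashlib.sha256().digest_size)
prk_zero_salt = hmac.new(zero_salt, shared_point, hashlib.sha256).digest()

print(unsalted_key.hex())
print(prk_zero_salt.hex())
```

Both derivations are deterministic functions of the input, so as far as I can tell the statements quoted above about deterministic extractors apply equally to each.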
Under what circumstances is a salt necessary (that is, for which types of input key material, and for which types of HMAC-Hash function)?
Are these papers out of date? Have the concerns they raise about the use of a salt been alleviated to any extent by more thorough research into the nature of modern hash functions?