Score:5

HKDF randomness extraction - salt or no salt?


According to the HKDF paper, the use of a salt serves two purposes: domain separation and randomness extraction.

This question is solely about the necessity of a salt for the purposes of randomness extraction.

The HKDF paper states:

a salt value (i.e., a random but non-secret key) ... is essential to obtain generic extractors and KDFs that can extract randomness from arbitrary sources with sufficiently high entropy.
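For reference, here is a minimal sketch of the two HKDF steps as specified in RFC 5869, using only Python's standard library. The choice of SHA-256 and the function names `hkdf_extract` and `hkdf_expand` are my own assumptions for illustration. The salt is simply the HMAC key of the extract step, and an absent salt is replaced by a string of zero bytes.

```python
import hashlib
import hmac

HASH = hashlib.sha256            # assumption: SHA-256 as the underlying hash
HASH_LEN = HASH().digest_size    # 32 bytes for SHA-256


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract: PRK = HMAC-Hash(salt, IKM)."""
    if not salt:
        # RFC 5869: a missing salt defaults to HashLen zero bytes.
        salt = b"\x00" * HASH_LEN
    return hmac.new(salt, ikm, HASH).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand: T(i) = HMAC-Hash(PRK, T(i-1) | info | i), truncated to length."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output too long")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), HASH).digest()
        okm += block
        counter += 1
    return okm[:length]
```

The salt enters only in the extract step; the `info` string in the expand step is what separates different keys derived from the same PRK.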

The Randomness Extraction and Key Derivation paper (linked to by the HKDF paper) states:

In addition, the "monolithic" randomness assumption on a single (unkeyed) function such as SHA-1 is inappropriate for the setting of randomness extraction as no single function (even if fully random) can extract a close-to-uniform distribution from arbitrary high-entropy input distributions. This is so, since once the function is fixed (even if to purely random values) then there are high-entropy input distributions that will be mapped to small subsets of outputs. Therefore, the viable approach for randomness extraction is to consider a family (or collection) of functions indexed by a set of keys. When an application requires the hashing of an input for the purpose of extracting randomness, then a random element (i.e., a function) from this family is chosen and the function is applied to the given input. While there may be specific input distributions that interact badly with specific functions in the family, a good randomness-extraction family will make this "bad event" happen with very small probability.
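The argument in that passage can be illustrated with a toy experiment. This is only a sketch under my own assumptions: the "extractor" keeps a single byte of HMAC-SHA256 so that a bad source can be found by brute force, and `toy_extract` and `build_bad_source` are hypothetical names. The source is uniform over 256 distinct inputs (8 bits of min-entropy), yet it is chosen so that the fixed function maps every one of them to the same output. A salt picked at random, independently of the source, spreads the same inputs over many outputs.

```python
import hashlib
import hmac
import os


def toy_extract(salt: bytes, ikm: bytes) -> int:
    """Toy extractor: keep only the first byte of HMAC-SHA256(salt, ikm)."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()[0]


def build_bad_source(fixed_salt: bytes, size: int = 256) -> list:
    """Collect `size` distinct inputs that the fixed function all maps to 0."""
    source, i = [], 0
    while len(source) < size:
        candidate = i.to_bytes(4, "big")
        if toy_extract(fixed_salt, candidate) == 0:
            source.append(candidate)
        i += 1
    return source


FIXED_SALT = b"\x00" * 32                 # the "no salt" case
bad_source = build_bad_source(FIXED_SALT)

# Under the fixed function the whole source collapses to one output value.
fixed_outputs = {toy_extract(FIXED_SALT, x) for x in bad_source}

# Under a fresh random salt the same inputs land on many different values.
random_salt = os.urandom(32)
salted_outputs = {toy_extract(random_salt, x) for x in bad_source}

print(len(fixed_outputs), len(salted_outputs))
```

With a full-width hash, constructing such a source would be computationally much harder, but the existence argument in the paper is information-theoretic and does not depend on that cost.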

The last question is how to generate the random known keys used by the extractor. Technically this is not hard, as the parties can generate the appropriate randomness, but the exact details depend on the application. For example, in the DH key exchange discussed earlier, the parties exchange in the clear randomly chosen values, which are then combined to generate a single key [salt] for the extractor family (e.g. HMAC-SHA1).
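As a rough sketch of what that could look like: each party sends a random public value, both sides combine the two values into one salt, and the salt keys the extractor applied to the DH shared secret. The combiner below (a hash over length-prefixed nonces), the use of SHA-256 instead of the paper's HMAC-SHA1 example, and all names are my own assumptions, not something the paper prescribes.

```python
import hashlib
import hmac
import os


def salt_from_nonces(nonce_a: bytes, nonce_b: bytes) -> bytes:
    """Combine two public nonces into one extractor key (hypothetical combiner)."""
    # Length-prefix each nonce so that different splits of the same bytes differ.
    encoded = b"".join(len(n).to_bytes(2, "big") + n for n in (nonce_a, nonce_b))
    return hashlib.sha256(encoded).digest()


def extract(salt: bytes, shared_secret: bytes) -> bytes:
    """Randomness extraction step: HMAC keyed by the public salt."""
    return hmac.new(salt, shared_secret, hashlib.sha256).digest()


# Both values are sent in the clear during the exchange.
nonce_a, nonce_b = os.urandom(32), os.urandom(32)

# Placeholder standing in for the encoded DH shared secret.
shared_secret = os.urandom(32)

prk = extract(salt_from_nonces(nonce_a, nonce_b), shared_secret)
```

Both parties see the same two nonces, so they compute the same salt without it ever needing to be secret.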

The HKDF paper states:

the Merkle-Damgard family [used in the design of many popular hash algorithms such as MD5, SHA-1 and SHA-2] built on random compression functions is not a good statistical extractor... the output of such family on any distribution for which the last block of input is fixed is statistically far from uniform)

It then echoes the point made in the Randomness Extraction and Key Derivation paper:

As we have already stressed in previous sections generic extractors, i.e., those working over arbitrary high min-entropy sources, must be randomized via a random, but not necessarily secret, key (or “salt”). In particular, the example following Lemma 2 shows that for every deterministic extractor there is a high-entropy source for which the output is far from uniform.

Contradicting these statements, I know of an ECDH implementation that uses the Keccak-256 hash on the shared secret EC group element with no salt, but perhaps this is justified either A) on the grounds that the choice of a sponge-construction hash alleviates concerns that afflict Merkle-Damgard family hashes, or B) on the grounds that an EC compressed point representation is sufficiently uniformly random (compared to the uniformity of randomness of a shared secret in non-EC DH).

Under what circumstances is a salt necessary (for which types of input key material, and for which types of HMAC-Hash function)?

Are these papers out of date, and have the concerns raised by these papers about the use of a salt been alleviated to any extent through more thorough research into the nature of modern hash functions?

kelalaka
[The standard security reduction for HKDF applies to an adversary who can query HKDF-Expand for many info strings adaptively, with the theorem parametrized by the number of queries, so your proposed use falls squarely within the intended and studied use of HKDF. This applies whether or not you use a salt.](https://crypto.stackexchange.com/a/59070/18298)
Score:1

Terminology is important here. A cryptographic salt's main purpose is to protect passwords that get reused and to defeat hash pre-computation. So yes, that provides your domain separation. But your question is about randomness extraction from arbitrary sources, i.e., including devices.

  1. NIST's SP 800-90B, "Recommendation for the Entropy Sources Used for Random Bit Generation", makes no concrete recommendation as to what type of extractor or configuration must be used. You could use MD5 as suggested in your first paper, or roll your own exotic type, as long as you compute the output entropy to a bias of $< 2^{-64}$.

  2. A salt can be public, so it can be reused, since it is known anyway. Remember that you are not securing passwords but extracting from entropy sources whose output is admittedly non-uniform but always random. The salting requirement is therefore made redundant by the input entropy stream.

  3. I do not accept that fixing the final block into a Merkle-Damgard architecture implies the need for salting and an HKDF. Yes, they prove the non-uniformity of the subsequent output, but that's an attack scenario in my view. And if an adversary can feed a block of all zeros into your extractor, you need to secure the room better. Simple padding is catered for in the I/O entropy calculation (often the Leftover Hash Lemma). Your paper actually says as much, promoting a computational indistinguishability argument.

  4. Commercial TRNGs do not randomly salt specifically. Any salt would be reused in a production run anyway.

  5. Philosophically, virtually all randomness extractors have an embedded salt in their initialisation vectors. Look at the large block of constants in a SHA-2 implementation. Or a Toeplitz matrix.
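To make the last point concrete, here is a small sketch that regenerates the SHA-256 initial hash values, which FIPS 180-4 defines as the first 32 bits of the fractional parts of the square roots of the first eight primes. The helper names are hypothetical. Note that these constants are fixed and public for every user of the function, so whether they count as a "salt" in the paper's sense (a key chosen at random, independently of the source) is exactly the point under debate.

```python
from math import isqrt


def first_primes(n: int) -> list:
    """Return the first n primes by trial division (fine for tiny n)."""
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def sha256_initial_state() -> list:
    """First 32 fractional bits of sqrt(p) for the first eight primes."""
    # isqrt(p << 64) equals floor(sqrt(p) * 2**32); the mask drops the integer part.
    return [isqrt(p << 64) & 0xFFFFFFFF for p in first_primes(8)]


for word in sha256_initial_state():
    print(f"{word:08x}")
```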

So I concur with kelalaka's comment to you. Don't need a salt.

knaccc
I think that essentially you're saying that you disagree with the paper's statement that "there are high-entropy input distributions that will be mapped to small subsets of outputs" when a modern hash is used. This makes intuitive sense, in that it amounts to something approximating finding collisions in a hash function by meddling with a high-entropy source.
Paul Uszak
@knaccc I guess so. You've cited three differing examples (your two papers and your personal experience of ECDH). There are often disparities between mathematical/academic approaches and real-world TRNG extractors.