Score:1

Can I use a cryptographic hash function such as sha256 for Randomness Extraction

mc flag

I want to transform a semi random input to a shorter, uniformly random bit string. Assuming there is enough entropy in the semi random input, can I use a collision resistant hash function to extract randomness?

Update
After the first answer pointed out that the body of the question is not correctly asking the intended question, I didn’t want to edit the question, so that the answer given wouldn’t be confusing for the other readers. But the linked question, that is said to be duplicate, definitely does not answer my question.

kodlu avatar
sa flag
Does this answer your question? I believe it does with detailed analysis of collision resistance and *other* hash function properties, if present. [Can we assume that a hash function with high collision resistance also means a highly uniform distribution?](https://crypto.stackexchange.com/questions/70707/can-we-assume-that-a-hash-function-with-high-collision-resistance-also-means-a-h)
Marc Ilunga avatar
tr flag
Collision resistance is not enough. Another example, $H(x) = x$ for 128 bits inputs. A hash function can be an extractor if you are willing and comfortable to see it as a random oracle. Which technically SHA256 is not in the general case. Alternatively, the HMAC construction is shown to be indifferentiable from a RO (even with fixed salt). So that's might be better assumption-wise.
fgrieu avatar
ng flag
To all who closed the question: the designated duplicate only answers the question in the body, not the question in the title; which is what the OP wanted to ask, if you look at comments to my answer.
Marc Ilunga avatar
tr flag
I also agree with fgrieu that the linked answer isn't relevant for the title of the question.
fgrieu avatar
ng flag
I posted a [new question](https://crypto.stackexchange.com/q/102928/555) hopefully asking what the OP wanted to ask in the title of the present question.
Score:1
ng flag

It's asked different things in title and body, leading to radically different answers.

Can I use a cryptographic hash function such as sha256 for Randomness Extraction ?

Yes for standard cryptographic hash functions including SHA-256, any SHA-2 or SHA-3 function, Blake2b (even SHA-1 or MD5 if one does not care giving a poor first impression), and any of their truncation. They are suitable to transform a semi random input into a shorter, uniformly random bit string, assuming there is enough entropy in the semi random input, and assuming that this input is not prepared with knowledge of hash function. To illustrate why that last condition is required: if the input $x$ was filtered or tweaked so that $\operatorname{SHA-256}(x)$ has it's first bit $0$, then $\operatorname{SHA-256}(x)$ would be easily distinguishable from random.


I want to transform a semi random input to a shorter, uniformly random bit string. Assuming there is enough entropy in the semi random input, can I use a collision resistant hash function to extract randomness?

No, in general. Collision resistance is not a sufficient property. E.g. the function defined by $F(x)=\operatorname{SHA-256}(x)\mathbin\|\operatorname{SHA-256}(x)$ is just as collision-resistant as SHA-256 is, but is not suitable for randomness extraction as it's output is easily distinguishable from random.

Collision resistance is not a necessary property either. E.g. SHA-256 truncated to it's first 8 bytes is not collision-resistant, but is suitable for randomness extraction.


Security is met if the entropy extraction function is a random member of a computationally secure Pseudo-Random Function Family (PRF).

Again, it's critical that the "semi random input" is not somewhat biased with knowledge related to the function used for randomness extraction. For example: no matter how large and random most of the input is, if one input bit is prepared with knowledge of the rest of the input and of the otherwise ideal extraction function, then it's easy to get the first bit of the output equal to 0 with probability near 75%.


Update following comment:

I found some (direct academic source of the "yes") for HMAC, but if the hash function is not keyed, would this still be the case?

Yes for any standard cryptographic hash function, if it's usable to instantiate a PRF (when the kind of length-extension property SHA-256 has does not matter). But I'm afraid I can't point a source, because my "not somewhat biased with knowledge related to the function used for randomness extraction" condition is both necessary, and hard to formalize other than as without knowledge of the key of a PRF, and there's no key in a hash. Hence the formal statement using HMAC, which has a key input and is an archetypal PRF.

We can rightly argue that the constants in the definition of a hash like SHA-256 in practice (and proof of the Merkle-Damgård construction used in SHA-256) plays the role of a key. It's widely believed that SHA-256 passes the PRF experiment when we use the 32-byte IV as key; and that's even provable under an ideal cipher model for the cipher underlying the Davies-Meyer compression function. But that's not part of the design goals of SHA-256, nor a very standard result, and we need lesser hypothesis to prove that HMAC is a PRF.

ONUR EREN ARPACI avatar
mc flag
Could you point me to a source for your ‘yes’ answer? That was also my intuition but, I couldn’t find any direct academic sources, I found some for HMAC but if the hash function is not keyed, would this still be the case?
fgrieu avatar
ng flag
@ONUR EREN ARPACI: I can't point a source; see update.
Marc Ilunga avatar
tr flag
@fgrieu, couldn't you make a case for a generic case that in the random oracle model we have an extractor using a hash function? So we could use lemma 2 of https://eprint.iacr.org/2010/264.pdf? As you point out, SHA2 is not really a good generic random oracle. Hence, we require HMAC that can be proven indefferentiable from a RO.
fgrieu avatar
ng flag
@Marc Ilunga: yes, a random oracle can easily be proven to be a perfect random extractor function in the question's sense. But any hash is distinguishable from a random oracle when you know the hash's definition (just submit any particular message and compare the outcome to what the hash's definition imply). Thus we need a formalization of "the random input is not somewhat biased with knowledge related to the function used for randomness extraction", and the standard such formalization is a keyed PRF with key unknown to what generates the input of the random extractor. And hashes have no key.
Marc Ilunga avatar
tr flag
@fgrieu, "just submit any particular message and compare the outcome to what the hash's definition imply", a true random function would be "vuneralbe" to this attack as well right? After all, even a random function is still deterministic for a fixed randomness. Correct me if I am wrong, but an indifferentiability (not indistinguishability) result on a given hash function would capture the fact that we could use it in place of a RO (modulo assumptions on the internals).
fgrieu avatar
ng flag
@Marc Ilunga: yes, a true random function is vulnerable. [Lemma 2 in your reference](https://eprint.iacr.org/2010/264.pdf#page=15) succinctly repels that with _"and independent from H"_, then discusses why that's necessary and what that means in no less than two paragraphs below the lemma. It ends up with _"model H as a family of random functions indexed by a key rather than as a single deterministic function"_, so we are essentially back to what HMAC with a fixed key does, only it's not named HMAC.
Marc Ilunga avatar
tr flag
@fgrieu, yeah, that's a good point regarding independence. But if the keying material can be assumed to be independent, my question remains then: If H was shown indifferentiable from a RO, can't it then be a good extractor? With that said, in practice, I would always use a salted HKDF.
Paul Uszak avatar
cn flag
Re: _"Security is met if..."_ There is no concept of security in randomness extraction functions. Absolutely anything works that provides a uniform(ish) output. XOR, bit dropping, CRC16, Pearson, MD5, anything that you've written yourself, e.t.c. It's one of the counter examples of why you **can't** roll your own algorithm in cryptography.
Marc Ilunga avatar
tr flag
@PaulUszak "Absolutely anything works that provides a uniform(ish) output", wouldn't this be a characterization of some notion of security?
fgrieu avatar
ng flag
@PaulUszak: an example of "semi random input" that no amount of bit dropping will turn "to a shorter, uniformly random bit string" is bits from a biased coin toss. And if I know that the output is conditioned by MD5, I can arrange that for "semi random inputs" with all bits truly random and a single apparently random other that I craft, the MD5 output consistently fails a thorough ENT test. The condition for robust randomness extraction when only "assuming there is enough entropy in the semi random input" is far from trivial!
Paul Uszak avatar
cn flag
@MarcIlunga I think that Mr. Grieu was suggesting that a cryptographically secure, non invertible extraction function is required. That's not the case. TRNG security comes from the underlying raw entropy stream. Entropy $\equiv$ security. The extractor function only has to change a wiggly wobbly distribution into a uniform one. E.g. von Neumann creates a 100% secure TRNG if the input is IID. HMACs are unnecessary.
Paul Uszak avatar
cn flag
Not really that hard. Left over hash Lemma, Piling up Lemma and von Neumann. I was generalising on the bit dropping, but there are papers featuring chaotic lasers that just take the lower bits. Didn't know that you were a fan of ENT. I've developed ent2 :-)
ONUR EREN ARPACI avatar
mc flag
Thank you for the discussion, this has been helpful. https://people.seas.harvard.edu/~salil/pseudorandomness/extractors.pdf In page 174 section 6.1.3, it says that you can’t have deterministic extractors for k-sources. And then there is this huge literature for deterministic extractors for specific source models (such as, https://link.springer.com/book/10.1007/978-3-642-14903-0 ) so even if the source is not adversarial (for example, radiation waves coming from sun), I am still not sure if using a specific cryptographic hash function solves the problem.
Maarten Bodewes avatar
in flag
I would like to add that many CSPRNG's (or, rather, the better named DRBG's) within NIST SP 800-90Ar1 are based on hash functions, see section 10.1: DRBG Mechanisms Based on Hash Functions. The hash function used is commonly the only cryptographic primitive used to create the function. So if you have a hash and don't need top performance, you can just create an official DRBG out of it, possibly including things as reseeding and whatnot. For top security, base it off of HMAC (i.e. a PRF, see the answer), one of the options.
I sit in a Tesla and translated this thread with Ai:

mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.