Score:2

Randomness extraction from a Santha-Vazirani (semi-random) source


In a quest to better understand randomness extractors (in the context of TRNG post-processing), I read some papers about the von Neumann Extractor and Santha-Vazirani (SV-) sources. The von Neumann extractor is a simple algorithm that works on independent, biased sources such as a biased coin. However, available physical sources of randomness are imperfect and are biased and correlated. Santha and Vazirani define such a source, known as an SV-source.

The Santha-Vazirani paper apparently demonstrates that it is impossible to (deterministically) extract an almost-uniform random bit from an SV-source. I don't have the required mathematical background to understand the demonstration, but this claim really puzzles me. For example, I would expect a hash function to produce uniform random bits out of an SV-source.

What does this demonstration really mean? Is it impossible to create a good TRNG from a single biased and correlated source (several sources are a different matter)?

Paul Uszak
I don't care for people renaming commonly known sources after themselves. Ignore _"SV"_ and just call it "correlated" as everyone else does.
Score:2

> However, available physical sources of randomness are imperfect and are biased and correlated.

No, they're not. Sources don't actually produce any entropy whatsoever. None. Imagine a Zener diode or ring oscillator circuit in front of you. They just sit there looking circuity.

The observer generates the entropy when sampling the source. The source doesn't. This leads to the concept of $ (\epsilon, \tau) $-entropy per unit time, where "$ (\epsilon, \tau) $-entropy is the amount of information generated per unit time, at different scales $\tau$ of time and $\epsilon$ of the observables." Effectively sample rate and resolution. That means the observer can infinitely vary the entropy rate of the source at their discretion, by changing either $\tau$ or $\epsilon$.
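As a toy illustration of those two knobs (a Python sketch with a made-up AR(1) signal, not taken from the Gaspard-Wang paper): sampling the same correlated signal more sparsely (larger $\tau$) or more finely (smaller $\epsilon$) changes how much entropy per sample the observer actually collects.

```python
import math
import random
from collections import Counter

def entropy(seq):
    """Plug-in Shannon entropy (bits/symbol) of a sequence."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def cond_entropy(seq):
    """Estimate H(X_t | X_{t-1}) = H(pairs) - H(singles);
    correlation between consecutive samples drives this down."""
    pairs = list(zip(seq, seq[1:]))
    return entropy(pairs) - entropy(seq)

random.seed(1)

# Toy "physical" signal: an AR(1) process, so nearby samples are correlated.
x, signal = 0.0, []
for _ in range(200_000):
    x = 0.9 * x + random.gauss(0, 1)
    signal.append(x)

# Observe the same signal under different (epsilon, tau) regimes:
#   tau     = sampling interval (keep every tau-th point)
#   epsilon = quantiser bin width (coarser bins = fewer observable states)
results = {}
for tau in (1, 50):
    for eps in (1.0, 4.0):
        quantised = [int(s // eps) for s in signal[::tau]]
        results[(tau, eps)] = cond_entropy(quantised)
        print(f"tau={tau:2d} eps={eps} -> {results[(tau, eps)]:.2f} bits/sample")
```

Slower sampling (tau=50) and finer resolution (eps=1.0) both raise the per-sample entropy, with the source circuit itself unchanged.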

This then leads to an asymmetrical, two-sided problem: how do we measure $H_{\infty}$ for correlated sources? There are two asymmetric solutions:

  1. Try to determine $H_{\infty}$ for a correlated source with very low certainty.

  2. Adjust your $ (\epsilon, \tau) $ sampling regime to produce uncorrelated data with high certainty.

Virtually nobody does #1, and even NIST have said it's nigh on impossible (Kerry McKay's unguarded comments). I can't imagine what $H_{\infty}$ even means, practically, in a correlated scenario. So do #2 as the overwhelming majority do: obtain $H_{\infty}$ as $-\log_2{(p_{\text{max}})}$ and extract.

Therefore it is possible to create a good TRNG from a biased and correlated source.
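The #2 estimate is just the min-entropy of the (now effectively i.i.d.) sample distribution. A minimal sketch in Python, with an assumed 60/40 biased bit stream as the toy source:

```python
import math
import random
from collections import Counter

def min_entropy_per_symbol(samples):
    """H_inf = -log2(p_max): the most-likely-outcome estimate from #2.
    Only meaningful once the sampling regime makes samples ~uncorrelated."""
    counts = Counter(samples)
    p_max = max(counts.values()) / len(samples)
    return -math.log2(p_max)

# Assumed toy source: biased but uncorrelated bits with P(1) = 0.6.
random.seed(0)
bits = [1 if random.random() < 0.6 else 0 for _ in range(100_000)]
h = min_entropy_per_symbol(bits)
print(f"{h:.3f} bits of min-entropy per raw bit")  # ~ -log2(0.6) = 0.737
```

An extractor would then be run at a rate that credits each raw bit with only ~0.74 bits of entropy.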

> The Santha-Vazirani paper apparently demonstrates that it is impossible to (deterministically) extract an almost-uniform random bit from an SV-source.

Not quite. It actually says: _"by contrast, we prove that there is no algorithm that can extract even a single unbiased bit from a semi-random source (in fact, no better than a $1 - \delta$ biased bit)"_. This is established knowledge and appears in all manner of 'extractor' papers. It simply means that any extracted randomness will always retain some bias. NIST just recommends that you keep it below $2^{-64}$, which is easy.
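To see why pushing the residual bias below $2^{-64}$ is easy, consider XORing raw bits together. Assuming, more strongly than the SV model allows, that the bits are independent with bias $e = |P(1) - \tfrac{1}{2}|$, the piling-up lemma gives the XOR of $n$ such bits a bias of $2^{n-1}e^n$:

```python
# Bias here means e = |P(1) - 1/2|.  For n *independent* bits each with
# bias e, the XOR of all n has bias 2**(n-1) * e**n (piling-up lemma).
def xor_bias(e, n):
    return 2 ** (n - 1) * e ** n

e = 0.2  # a badly skewed source: P(1) = 0.7
n = 1
while xor_bias(e, n) > 2 ** -64:
    n += 1
print(n, xor_bias(e, n))  # ~48 raw bits per output bit already suffice
```

A real SV adversary can correlate bits, so this only gives intuition for the NIST figure; it is not a proof about SV-sources.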


Ref.: Pierre Gaspard and Xiao-Jing Wang, "Noise, chaos, and $(\epsilon, \tau)$-entropy per unit time", Physics Reports (Review Section of Physics Letters) 235, No. 6 (1993) 291–343. North-Holland.

DurandA
Thanks. I started reading the Gaspard-Wang reference. In the case of a digitally sampled signal, is $\epsilon$ the symbol space (e.g. 2 for a binary signal)?
Paul Uszak
@DurandA I think they left it vague because it can be a few things. It could be ADC resolution in bits, or it could be in millivolts (these are related, of course). In something like the `haveged` TRNG it is the number of bit values. Essentially, it's the thing that you _"observe"_, i.e. measure (not the time, which is $\tau$).
DurandA
Now I better understand your remark about high-frequency sampling of a correlated source in my [related question](https://crypto.stackexchange.com/q/92020/39499).
Paul Uszak
@DurandA Yes. Imagine throwing a fair die every 10 seconds. Sample it at 1 sample/s and you have almost perfect correlation. But sample it every 11 seconds, and you have an uncorrelated source. And it doesn't really matter that you go slower in the latter case, as in the former case your entropy rate would be very low anyway due to the autocorrelation.
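That thought experiment is easy to simulate (a Python sketch; the "die" here changes value every 10 ticks, and we compare oversampling against sampling every 11 ticks):

```python
import random
random.seed(7)

# A fair die whose value only changes every 10 ticks.
value, die = 0, []
for t in range(1_000_000):
    if t % 10 == 0:
        value = random.randint(1, 6)
    die.append(value)

def repeat_rate(samples):
    """Fraction of consecutive samples that are equal:
    ~1/6 for an uncorrelated fair die, near 1 when oversampled."""
    return sum(a == b for a, b in zip(samples, samples[1:])) / (len(samples) - 1)

fast = die[::1]   # 1 sample per tick: 9 out of 10 pairs are the same throw
slow = die[::11]  # every 11 ticks: each sample sees a fresh throw
print(repeat_rate(fast), repeat_rate(slow))
```

The fast stream repeats itself about 92% of the time; the slow one repeats at the ~1/6 rate of an independent fair die.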
DurandA
"_we show that no bit extracting algorithm is uniformly good for every semi-random distribution with parameter $\delta$_": how can this possibly apply to hash functions?
Paul Uszak
@DurandA I can't mathematically - I'm a kinda experimental dude. There are many extractor papers out there with the maths. But hash functions are deterministic: they themselves do not introduce entropy. Imagine an 8-bit hash function fed by biased 8-bit raw entropy (biased so that all values are > 63). Whilst the output domain will look uniformly spread at first glance, it will by necessity be missing a quarter of its possible values, as a 1:1 mapping (a permutation of the 8-bit space, strictly speaking) can only reach as many outputs as it is fed inputs. You can reduce $\delta$, but not to absolute zero.
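The missing-quarter effect can be checked directly. As a stand-in for an ideal 1:1 8-bit hash, this sketch uses a fixed random permutation of 0..255 (an assumption for illustration; real hash functions are not permutations):

```python
import random

# Stand-in for an "ideal" 8-bit 1:1 hash: a fixed random permutation of 0..255.
rng = random.Random(42)
perm = list(range(256))
rng.shuffle(perm)

def hash8(x):
    return perm[x]

# Biased raw entropy: only input values > 63 ever occur (192 of 256).
outputs = {hash8(x) for x in range(64, 256)}
print(len(outputs), "reachable,", 256 - len(outputs), "missing")  # a quarter missing
```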
DurandA
I did not realize that hashing a longer biased input, e.g. 1 Mb of biased and correlated random data down to 256 bits, would not produce a uniform output but only one close to uniform. The demonstration only applies to biased **and correlated** data, right? Otherwise the data can be pre-processed by a von Neumann extractor. Thank you for your time.
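For reference, the von Neumann pre-processing mentioned here is only a few lines; note it removes bias only for *independent* bits, which is exactly why it does not help against correlation:

```python
import random

def von_neumann(bits):
    """Classic von Neumann debiasing over non-overlapping pairs:
    (0,1) -> 0, (1,0) -> 1, and (0,0)/(1,1) are discarded.
    Output is unbiased only if the input bits are independent."""
    return [a for a, b in zip(bits[::2], bits[1::2]) if a != b]

# Assumed toy input: independent bits with P(1) = 0.8.
random.seed(3)
biased = [1 if random.random() < 0.8 else 0 for _ in range(100_000)]
debiased = von_neumann(biased)
print(sum(debiased) / len(debiased))  # ~0.5 despite 80% ones in the input
```

The cost is throughput: most pairs are discarded when the bias is strong.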
Paul Uszak
@DurandA I've spent a while working on TRNGs and I believe that the easiest (and most secure) method is to adjust your sampling regime $(\epsilon, \tau)$ so that your samples are not correlated. vN then becomes irrelevant.