TL;DR: My simple ChaCha-based sponge PRNG is getting "unusual" evaluation from PractRand test battery pretty reliably, sometimes even within the first GB; I'm trying understand why.
I was in need of a fast, non-cryptographic PRNG, preferably one without a bloated state, or one that would use a portion of its internal state as the output buffer. One of my initial ideas was to use a low-round count ChaCha, but N=4 is considered insufficient (cf. https://cr.yp.to/snuffle/diffusion.html, althogh ChaCha diffusion is much better than Salsa's) and N>6 proved to be a bit too slow. Additionally, an argument against pure ChaCha is that for each chunk of data (64B) the whole state has to be recreated instead of just reshuffled (which means twice the memory footprint).
This prompted me to look into creating a ChaCha-based sponge, as shown in the image below:
The principle is simple - after the initial shuffle, 128 bits of state are kept hidden (first row), while the remaining 384 bits (48B, 12 words) is used as PRNG output. When the current state is depleted, another N rounds are applied and the process is repeated.
The construction worked well in our setting, expectedly beating comparable "normal" ChaCha implementation, but I was surprised, when I learned that (at least with some seeds) it is getting "unusual" evaluation from PractRand, even for values of N that are considered safe for ChaCha.
For instance:
N = 2
RNG_test using PractRand version 0.95
RNG = RNG_stdin, seed = 00112233
test set = core, folding = standard(unknown format)
rng=RNG_stdin, seed=00112233
length= 8 gigabytes (2^33 bytes), time= 43.7 seconds
Test Name Raw Processed Evaluation
[Low1/32]Gap-16:A R= +5.4 p = 4.8e-4 unusual
...and 299 test result(s) without anomalies
N = 4
RNG_test using PractRand version 0.95
RNG = RNG_stdin, seed = 00112233
test set = core, folding = standard(unknown format)
rng=RNG_stdin, seed=00112233
length= 1 gigabyte (2^30 bytes), time= 6.8 seconds
Test Name Raw Processed Evaluation
[Low4/32]BCFN(2+0,13-3,T) R= -7.3 p =1-5.0e-4 unusual
...and 250 test result(s) without anomalies
N = 6
RNG_test using PractRand version 0.95
RNG = RNG_stdin, seed = 77777777
test set = core, folding = standard(unknown format)
rng=RNG_stdin, seed=77777777
length= 2 gigabytes (2^31 bytes), time= 12.4 seconds
Test Name Raw Processed Evaluation
[Low1/8]DC6-9x1Bytes-1 R= -4.6 p =1-2.9e-3 unusual
[Low4/32]Gap-16:B R= +4.7 p = 6.9e-4 unusual
...and 267 test result(s) without anomalies
N = 8
RNG_test using PractRand version 0.95
RNG = RNG_stdin, seed = 00112233
test set = core, folding = standard(unknown format)
rng=RNG_stdin, seed=00112233
length= 2 gigabytes (2^31 bytes), time= 12.2 seconds
Test Name Raw Processed Evaluation
[Low1/8]Gap-16:A R= -4.4 p =1-3.9e-4 unusual
...and 268 test result(s) without anomalies
This doesn't apply to every seed w/ every N (for instance N=6 with seed=00112233 went up to 2^38 without a hitch), but it's quite easy to come up with one that fails. Still, if this was purely due to low quality of the seed, I'd expect it to fail early on, before the diffusion makes the state indistinguishable from random.
Question
The construction since has been replaced with JSF32 in our application (with comparable performance to ChaCha sponge with N=2), but it left me wondering which part of the construction is responsible for the "failed" tests; so this question is mostly academic. My understanding is that for N=8, ChaCha internal function should be indistinguishable from a random permutation, and a random permutation can be used to create a PRNG via a sponge.
One guess is that I'm hitting some short permutation cycle, but then, from the examples above, I have found at least 2 "short cycles" for double rounds with cycle lengths under 2^30.
Final notes:
- k = 20, but it doesn't really matter. Any value that is a multiple of N is equivalent to skipping k/N rounds of output.
- The seeds I posted here are ones that I came up with while writing this question; in reality I've tested many more and the results are reproducible.
- The tests "failing" are usually of the following classes:
- Gap-16
- BCFN
- DC6-9x1Bytes-1
- I know that "unusual" is not strictly "FAIL", but it's appearing way too regularly to ignore.