Score:1

Is it possible that PracRand did not detect looping of the generator?

tf flag
Tom

I'm testing my own PRNG generator which should has period $2^{38}$ bytes. So after exactly $2^{38}$ bytes it should start repeat. But PractRand find no anomalies after $2^{39}$ bytes.

Could it be that PractRand wouldn't detect this, or I had miscount something and the generator does not loop after that number of bytes?

Paul Uszak avatar
cn flag
You ran it on 500 GB?
Tom avatar
tf flag
Tom
No, you can use numbers from your program and redirect them directly to PractRand in console. Example: python3 Mygenerator.py | ./RNG_test stdin. Then the PractRand starts and tests until it fails.
Paul Uszak avatar
cn flag
But you ran it over $2^{39}$ bytes?
Tom avatar
tf flag
Tom
@PaulUszak Yes, I do. It works now almost 3 days and still didn't fail. Which of course is not unusual, many generators can do this.
Score:2
ph flag

I have not yet witnessed a PRNG that didn't fail any of PractRand's tests after restarting its cycle, so most likely you are miscounting or misunderstanding how much data your generator should be able to produce.

Since 38 is neither a power of 2 nor a multiple of 8, I assume it to be miscounting.

An ideal generator with 38 bits of permanent state will produce 2^38 bytes only if each output is only one byte long. If, instead, each output is as long as the entire state, the full output will be 2^38×(38÷8)÷1024^3 = 1216 GiB.

Similarly, a 32-bit generator should output at most 2^32×(32÷8)÷1024^3 = 16 GiB before failing tests, and this is exactly what I've found when testing my own generators, as well as scaled-down versions of other generators.

If you're able to produce 2^38 bytes within a tolerable time, it should be easy to confirm whether or not the generator restarts after that point by simply comparing the next output against the very first output.

I can't give a better answer than this without seeing the code of the PRNGs.

Score:1
cn flag

I can't speak for your generator as I've not seen it. Is it cryptographically secure (as that's a little tricky to write and just because you can't recover the seed doesn't for one moment mean that others can't).

I am not surprised at all that PractRand doesn't detect a loop at >275 GB (See notes). I have no direct experience of PractRand, but irrespective of it's low pedigree, all of the 'standard' tests have problems. NIST's STS has a very narrow set of internal statistics which severely limits it's acceptable sample sizes. diehard has the infamous sums test and other weak tests. Recently dieharder has been found to have Kolmogorov–Smirnov test biases (~ 8 TB samples). And it's limited to ~250 GB of data. ent is missing several p values entirely. FIPS 140 is quite weak. Test U01 has to be complied with parameters that can be tweaked (why?). PractRand won't be any different, especially considering the limited number of developers working on it.

In summary, non of the available test suites are perfect and randomness is pesky. This is what we currently have though. I would suggest using another test suite for samples <275 GB and compare. Best of three runs is recommended. 275 GB of key material stretched out of one seed should be sufficient for most use cases anyway


Notes:

For testing the PractRand tests, simply generate 100 GB from /dev/urandom, copy it and concatenate so forming a roll over. See what happens for you.

I've just diehardered a concatenated file as 2 x 10 GB from /dev/urandom and it passed with two WEAKs:-

   sts_serial|   6|    100000|     100|0.99995833|   WEAK
  diehard_dna|   0|   2097152|     100|0.99637872|   WEAK

C'est la vie.

Tom avatar
tf flag
Tom
So it is possible to cheat this kind of testing tool. I will wait if the generator fails the tests after 2^40 bytes. That would mean the entire sequence was looped 4 times. Anyway it looks like, it can be looped many times as long as the entire sequence has perfect statistical properties, it can loop multiple times before statistical deviations will be detected.
Paul Uszak avatar
cn flag
@Tom Yes it seems so. But that's perhaps not surprising. The scan window on these tests just isn't large enough for massive data samples like yours. Also consider, `dieharder` rewinds data files if they're smaller than ~250 GB. My example above (2 x 10 GB) was rewound 11 times yet passed. So that's a scan window of <10 GB.
Tom avatar
tf flag
Tom
Yes, I forgot about that rewinds - that's true. I also noted passing tests with repetitions in dieharder. So in PractRand it could be the same. I will test a generator with a smaller state to estimate when they might fail.
Tom avatar
tf flag
Tom
After testing an 8-bit generator with a period of $2^{20}$ it failed in PractRand after $2^{24}$ tested 8-bit numbers. So it looped the stream several times before problems were detected. The same can be expected for the 16-bit generator I tested. I thought this software had loop detectors built in - but it doesn't.
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.