Score:2

NIST random number generators ASCII input file format problem

I have time-series data from an experiment, which I convert to zeros and ones by applying a threshold. I saved these bits to a txt file (a plain text file written from MATLAB) with 32 bits per line (the total number of bits is a little over 20 million).

When running the NIST test suite, I use ./assess 500000 with 40 bit streams. It keeps giving me the error igamc: underflow. I also tried ./assess 1000000 with 20 bit streams, and the same error occurs.

I suspect the file format (I select ASCII when using the test suite). Is it correct to save the bits directly as a text file from MATLAB, or are some special characters needed for the test suite to recognize the file as ASCII format?

I hope I can find some guidance here...

Score:2

I suspect the file format (I select ASCII when using the test suite). Is it correct to save the bits directly as a text file from MATLAB, or are some special characters needed for the test suite to recognize the file as ASCII format?

According to the nice manual §5.4.1

Files should contain binary sequences stored as either ASCII characters consisting of zeroes and ones, or as binary data where each byte contains eight bits worth of 0’s and 1’s.

Further, looking at readBinaryDigitsInASCIIFormat in src/utilities.c shows that whitespace is ignored when using the ASCII format.

Thus, if a text editor shows that the file consists only of 0, 1, and whitespace (spaces, tabs, newlines), its format is probably fine for the ASCII setting.
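For a file of 20 million bits, eyeballing it in a text editor is impractical, so the same check can be scripted. The following is a minimal sketch, not part of the test suite: the function name `count_ascii_bits` and the sample string are hypothetical, and it only assumes, as stated above, that the ASCII reader accepts 0, 1, and whitespace.

```python
import string

def count_ascii_bits(text):
    """Return (zeros, ones, bad) for the contents of an ASCII bit file.

    `bad` is a sorted list of distinct characters that are neither
    '0', '1', nor whitespace; it should be empty for a clean file.
    """
    zeros = text.count("0")
    ones = text.count("1")
    allowed = set("01") | set(string.whitespace)
    bad = sorted(set(text) - allowed)
    return zeros, ones, bad

# Hypothetical sample: two lines with mixed separators and a CRLF ending.
sample = "0101 1100\n1111\t0000\r\n"
zeros, ones, bad = count_ascii_bits(sample)
print("zeros:", zeros, "ones:", ones, "unexpected characters:", bad)
```

To check a real file, read it with `open(path).read()` and pass the result in. A non-empty `bad` list, or a total bit count smaller than (sequence length × number of bit streams), would both be worth fixing before looking any further.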


I use ./assess 500000 with 40 bit streams. (…) Also tried ./assess 1000000 with 20 bit streams

The manual states, e.g. in §2.14.7, that at least 1000000 bits per sequence are recommended for some tests (but we don't know which tests are being run).


It keeps giving me the error igamc: underflow

That's generated by an internal function cephes_igamc in src/utilities.c computing the incomplete gamma function, as an input sanity check.
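To give a feel for how extreme a test statistic must be before such a check trips, here is a rough sketch. It rests on an assumption from my recollection of the Cephes library, which should be verified against the suite's source: that the underflow message is printed when the log of the leading factor x^a e^(-x) / Γ(a) drops below about -709.78, the log of the largest double. The name `igamc_would_underflow` and the constant value are illustrative, not taken from the suite.

```python
import math

# Assumed threshold: roughly log(DBL_MAX). The exact constant used by the
# suite is not confirmed here.
MAXLOG = 709.782712893384

def igamc_would_underflow(a, x):
    """Guess whether igamc(a, x) would be reported as an underflow.

    Assumption: the check applies only on the branch x >= 1 and x >= a,
    and compares a*ln(x) - x - ln(Gamma(a)) against -MAXLOG.
    """
    if x <= 0 or a <= 0:
        return False
    if x < 1.0 or x < a:
        return False  # assumed: this branch computes 1 - igam(a, x) instead
    log_prefactor = a * math.log(x) - x - math.lgamma(a)
    return log_prefactor < -MAXLOG

print(igamc_would_underflow(0.5, 1.0))     # a mild statistic
print(igamc_would_underflow(0.5, 2000.0))  # an enormous statistic
```

Under that assumption, the message corresponds to a P-value too small to represent as a double, which fits the reading that some sequence fails some test extremely badly.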

The most likely cause is that at least one of the tested sequences fails at least one of the tests extremely badly. It would be a good idea to restrict the run to the frequency/monobit test at first (its result can be checked manually from a line like BITSREAD = 1000000 0s = 500712 1s = 499288 in the diagnostic output), and then try the other tests one by one to find which one fails.
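The manual check of the frequency/monobit test can be sketched as follows, using the standard definition of that test: with n bits, s_obs = |#ones − #zeros| / √n and P-value = erfc(s_obs / √2). The function name `monobit_p_value` is hypothetical, and the counts are illustrative values that sum to 1000000.

```python
import math

def monobit_p_value(zeros, ones):
    """P-value of the frequency (monobit) test from the 0/1 counts."""
    n = zeros + ones
    s_obs = abs(ones - zeros) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))

# Illustrative counts for one sequence of 1000000 bits.
p = monobit_p_value(500712, 499288)
print(f"P-value = {p:.4f}")
```

A P-value below the significance level (0.01 by default in the suite) counts as a failure for that sequence. A thresholded experimental signal with a badly placed threshold can easily have a large imbalance between zeros and ones, which drives this P-value to essentially zero.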

Another possible cause is that the test suite is miscompiled or misused. That can be diagnosed by using the 4 sequences data/data.sqrt3 data/data.sqrt2 data/data.pi data/data.e which should not cause this error for any of the tests.

Short of miscompilation or misuse (including but not limited to a sequence that is not long enough), the reported failure can be interpreted as a practical certainty that the tested sequence is not random.


WARNING: Failing the NIST tests (consistently, or even once with igamc: underflow) shows that the generator is bad or was not tested correctly. But passing the tests is NOT in itself a valid argument that a bit generator is suitable for cryptographic use. If an argument for the latter assertion is needed, consider that a generator that outputs the binary representation of the square root of any small odd integer >2 passes the tests.

Score:1

igamc: underflow

is just poor programming. The incomplete gamma function is difficult to approximate, and that is as much as I know about it. NIST has a suspicious propensity for inventing randomness tests that rely on it rather than on simpler tests, e.g. $P\text{-value2} = \mathrm{igamc}\left(2^{m-3},\ \nabla^2 \Psi_m^2 / 2\right)$ from the Serial test. It sows uncertainty and doubt.

I'm also not sure of the exact ASCII file format, so I never use it, although /data/data.pi looks like this:

   110010010000111111011010
   1010001000100001011010001
   1000010001101001100010011
   0001100110001010001011100
   0000011011100000111001101
   0001001010010000001001001
   1100000100010001010011001
   1111001100011101000000001
   0000010111011111010100110
   0011101100010011100110110
   0100010010100010100101000

For the avoidance of doubt, use the full 8 bits/byte binary format rather than converting to ASCII ones and zeros. Such a file is like any binary, executable, or image file consisting of pure octets, as the following dump illustrates:

$ xxd -l 256 /tmp/testfile.bin

00000000: 8c1d 2870 d269 1828 38d1 9a5f 5817 8f55  ..(p.i.(8.._X..U
00000010: b77c 55d2 1c48 e1de f480 80d8 f683 71d0  .|U..H........q.
00000020: 9d55 8e4f 5ad8 9857 8901 526c 3bc7 33c2  .U.OZ..W..Rl;.3.
00000030: 4bac 4dbc bd08 0b50 7e53 dabd 123b 9a0a  K.M....P~S...;..
00000040: 4a10 ead5 7ee9 6f5d 4be4 fe68 76cc 1ab7  J...~.o]K..hv...
00000050: 209b 7e44 6f50 6044 450c efbc ba6d 2623   .~DoP`DE....m&#
00000060: 45a9 992b 1909 b594 7e6e c90c dc71 7bef  E..+....~n...q{.
00000070: a2d1 ac42 3ac9 40ca 07a3 6b23 d5f4 da40  ...B:.@...k#...@
00000080: 1865 cb60 6f1c 78a5 1b36 0ed4 ab74 0781  .e.`o.x..6...t..
00000090: 7c41 8dac 27c6 7124 a129 aec0 a98a bbaf  |A..'.q$.)......
000000a0: 1f4e 02a3 628c 0908 d365 8b6d 65c9 1135  .N..b....e.me..5
000000b0: 24e1 2e4b 5448 8f43 62d9 b030 99d6 545d  $..KTH.Cb..0..T]
000000c0: c00c d13e 94f3 ed92 32d3 13d1 3064 8b55  ...>....2...0d.U
000000d0: 14b5 1f41 48df cd5d 8fc7 2a4b 119d d1c0  ...AH..]..*K....
000000e0: a3ea 95c9 5a4a 3d5f 25cf c44c c10d 802c  ....ZJ=_%..L...,
000000f0: aeca 8842 81b6 10b7 0e7d e41f e36d a0a6  ...B.....}...m..
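If the bits already exist as ASCII text, they can be packed into such a binary file rather than regenerated. Below is a minimal sketch; the function name `pack_bits` is hypothetical, and it assumes the most significant bit of each byte is the first bit of the stream, which I believe matches how the suite unpacks binary input but which should be confirmed against its source.

```python
def pack_bits(text):
    """Pack the '0'/'1' characters of `text` into bytes, MSB first.

    All other characters (spaces, newlines, ...) are skipped. Trailing
    bits that do not fill a whole byte are dropped.
    """
    bits = [ch for ch in text if ch in "01"]
    usable = len(bits) - len(bits) % 8
    out = bytearray()
    for i in range(0, usable, 8):
        out.append(int("".join(bits[i:i + 8]), 2))
    return bytes(out)

# Hypothetical input: 16 full bits plus 4 leftover bits that are dropped.
packed = pack_bits("10001100 00011101\n0010")
print(packed.hex())
```

Writing the result with `open(path, "wb").write(packed)` gives a file that can be inspected with xxd as above. Note that this sketch skips unexpected characters silently, so it is worth validating the text file separately first.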

Repeat the test. You may still get the error if the sample file is poor and not very random. That is easy to check: draw a few megabytes from /dev/urandom with dd if=/dev/urandom of=/tmp/testfile.bin bs=1K count=2500, which should pass 99% of the test suite.
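On a system without dd or /dev/urandom, a comparable reference file can be produced from the operating system's random source via Python's `os.urandom`. This is only a sketch; the helper name `random_bytes_for_bits` is hypothetical, and the 20000000-bit size is chosen to match the 40 × 500000 run from the question.

```python
import os

def random_bytes_for_bits(n_bits):
    """Return enough OS-provided random bytes to hold n_bits bits."""
    return os.urandom((n_bits + 7) // 8)

data = random_bytes_for_bits(20_000_000)
print(len(data), "bytes")

# To save it for the suite's binary input mode (path is an example):
# with open("/tmp/testfile.bin", "wb") as f:
#     f.write(data)
```

If this reference file runs cleanly and the experimental data does not, the problem lies in the data rather than in the file format or the build.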

O. Nawwar
Although it is poor programming, I still need to get a decent report. A lot of people use the suite without hitting this problem, so I guess there is another cause. As for the binary file format, what should it look like? Can you give an example (like the data.pi file shown above)?