In a journal paper on physically unclonable functions (PUFs) [1], the authors used the NIST SP 800-22 test suite to check whether the bitstreams generated by their PUFs are random. They describe the procedure as follows:
NIST tests are performed using 60 sequences of 128 bits each such that 7680 bits (i.e., digitized keys) collected from 30 different PUFs are tested. The chi-squared (χ²) distribution is used to compare the goodness-of-fit of the p-value distribution of the blocks from the entire stream to the expected distribution. The bitstream is considered to be random only if the p-value ≥ 0.0001.
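To make the quoted procedure concrete, here is a minimal sketch of how I understand the goodness-of-fit step. It assumes the authors follow the second-level uniformity check of NIST SP 800-22 (Section 4.2.2), where the p-values of the individual sequences are sorted into 10 equal bins and a chi-squared statistic with 9 degrees of freedom is converted into a single "p-value of p-values". The paper does not give code, so the function names and the example p-values below are my own and purely illustrative.

```python
import math

BINS = 10  # NIST SP 800-22 uses 10 equal-width bins on [0, 1]


def upper_gamma_q_half_integer(a, x):
    """Regularized upper incomplete gamma Q(a, x) for a = 0.5, 1.5, 2.5, ...

    Uses Q(0.5, x) = erfc(sqrt(x)) and the recurrence
    Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / gamma(a + 1).
    """
    q = math.erfc(math.sqrt(x))
    step = 0.5
    while step < a:
        q += x ** step * math.exp(-x) / math.gamma(step + 1.0)
        step += 1.0
    return q


def uniformity_p_value(p_values):
    """Chi-squared goodness-of-fit of the p-values against a uniform law."""
    counts = [0] * BINS
    for p in p_values:
        # A p-value of exactly 1.0 is placed in the last bin.
        counts[min(int(p * BINS), BINS - 1)] += 1
    expected = len(p_values) / BINS
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    # BINS - 1 = 9 degrees of freedom, so a = 9 / 2 = 4.5.
    return upper_gamma_q_half_integer((BINS - 1) / 2, chi2 / 2)


# Hypothetical inputs: 60 p-values, one per 128-bit sequence.
evenly_spread = [(i % 10) / 10 + 0.05 for i in range(60)]
all_in_one_bin = [0.5] * 60

for name, values in [("evenly spread", evenly_spread),
                     ("all in one bin", all_in_one_bin)]:
    p_t = uniformity_p_value(values)
    print(name, p_t, "passes" if p_t >= 0.0001 else "fails")
```

If this reading is right, the 0.0001 in the quote would be the threshold applied to this aggregated value, not to the p-value of a single 128-bit sequence.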
I do not understand why such a low threshold of 0.0001 is chosen here. The goal of this hypothesis test is to argue for randomness by showing that the computed p-value is larger than the threshold 0.0001. Shouldn't a larger threshold be used, such as the 0.05 found in most statistics textbooks? Furthermore, if the sample size is smaller, should I use an even smaller p-value threshold or a larger one?
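To illustrate what is at stake with the choice of threshold, here is a small sketch using the frequency (monobit) test from NIST SP 800-22 on a single 128-bit sequence. I picked this test only because its p-value has a simple closed form; the paper may rely on other tests from the suite. The helper names are hypothetical. The sketch finds the smallest number of ones (above the balanced count of 64) at which a sequence would be rejected for a given threshold.

```python
import math


def monobit_p_value(ones, n):
    """p-value of the NIST SP 800-22 frequency (monobit) test.

    `ones` is the number of 1-bits in a sequence of length `n`.
    """
    s_obs = abs(2 * ones - n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


def smallest_rejected_ones(n, alpha):
    """Smallest count of ones >= n/2 for which the p-value drops below alpha."""
    for ones in range(n // 2, n + 1):
        if monobit_p_value(ones, n) < alpha:
            return ones
    return None  # no count of ones is rejected at this threshold


n = 128  # sequence length used in the paper
for alpha in (0.05, 0.01, 0.0001):
    print(alpha, smallest_rejected_ones(n, alpha))
```

With a smaller threshold, a sequence has to be far more unbalanced before it is rejected, which is why a threshold of 0.0001 looks very permissive to me when the conclusion being drawn is "the bitstream is random".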
P.S. I have read this thread (NIST randomness test p values), where @Squeamish Ossifrage suggested always using p < 0.05. But that advice is aimed at accepting the hypothesis in order to prove non-randomness. Should I use different p-value thresholds for these two cases?
[1] Leem, Jung Woo, Min Seok Kim, Seung Ho Choi, Seong-Ryul Kim, Seong-Wan Kim, Young Min Song, Robert J. Young, and Young L. Kim. "Edible unclonable functions." Nature Communications 11, no. 1 (2020): 1-11.