Score:4

Is it valid to transform the tested sample file and re-test, rather than invent additional randomness tests?

cn flag

I'm enhancing the venerable ent randomness test suite, and I came across this idea in *On Independence and Sensitivity of Statistical Randomness Tests*.

"To have more confidence in a random number generator, it is advantageous to use as many randomness tests as possible. In this part of the study, we propose to apply simple transformations to input sequences that significantly change the output p-value of a randomness test as an alternative to developing more tests." - §4.

If you include test variants (like all the unique aperiodic templates in NIST 800-22), there are many, many randomness tests scattered throughout the common test suites, so the first line of the quote above seems to make sense. Rather than developing and calibrating more and more tests, can we just transform the input file, provided the transformation produces differing p-values?

So transformations might include bit mangling $(x'_i = x_i \oplus 127)$, sequence reversal (already performed within the Cusum test, NIST 800-22 §3.13), byte rotation, etc. And the ea_iid test (NIST 800-90B) uses many thousands of pseudo-random permutations of the input file to determine independence of samples.
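For concreteness, here is a minimal sketch (Python; the function names are mine and not part of ent or any NIST suite) of the kind of byte-level transformations I have in mind. The idea would be to write out each transformed file and simply re-run the existing tests on it:

```python
# Illustrative only: simple byte-level transformations of a sample file.
def xor_mangle(data: bytes, mask: int = 127) -> bytes:
    """x'_i = x_i XOR 127, applied to every byte."""
    return bytes(b ^ mask for b in data)

def reverse_sequence(data: bytes) -> bytes:
    """Whole-sequence reversal (as the Cusum test already does internally)."""
    return data[::-1]

def rotate_bits(data: bytes, r: int = 3) -> bytes:
    """Rotate each byte left by r bits (0 < r < 8)."""
    return bytes(((b << r) | (b >> (8 - r))) & 0xFF for b in data)

if __name__ == "__main__":
    sample = open("sample.bin", "rb").read()   # hypothetical input file
    for name, sigma in [("xor127", xor_mangle),
                        ("reversed", reverse_sequence),
                        ("rot3", rotate_bits)]:
        with open(f"sample_{name}.bin", "wb") as out:
            out.write(sigma(sample))           # then re-test each output file
```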

Does this seem a valid alternative, and is it as confidence-inspiring as additional novel tests?


Related: Does the p value in randomness tests provide any real value?

in flag
That sounds like it would be its own novel test.
Paul Uszak
cn flag
@aiootp Do you really think so? I can't decide, and then the obvious question arises: which and how many transformations are appropriate? Are new p-values derived from $x'_i = x_i \oplus 127$ or $x'_i = x_i \boxplus 127$ test sequences as informative as a novel test algorithm?
in flag
I guess it sounds more like a test for your tests using fuzzing, which could inform you of how sensitive your tests are to non-statistically-consequential transformations of their inputs. But only if your transformations aren't changing the statistical properties of your sample, because then you'd actually be testing the randomness of your transformations, instead of testing the randomness of your sample or how sensitive your tests are to non-consequential changes to their inputs.
Score:2
in flag

Sounds more like a test for your tests using fuzzing.

That could inform you of how sensitive your tests are to non-statistically-consequential input transformations. But only if your transformations aren't introducing changes to the statistical properties of your sample. If they are, it's not defined a priori what the tests are testing; it's possible they're testing the randomness of your transformations instead of the randomness of your sample.

The paper which you cite proposes the use of transformations on sample data which then cause randomness tests to be statistically independent:

Definition 1. Consider a randomness test $T$ and a transformation $σ: L \to L$ where $L$ is the set of all $n$-bit binary sequences. $T$ is said to be invariant under $σ$ if for any $S \in L$, $T(S) = T(σ(S))$.$^{(7)}$ Here, we define a new concept of sensitivity to measure the effect of a transformation to output $p$-values. If a test $T$ is invariant under $σ$, sensitivity of $T$ to $σ$ is represented by $0$. If the transformation has small effect on the test results, that is, there is a significant correlation between $T(S)$ and $T(σ(S))$, sensitivity is represented by $1$. Whenever $T(S)$ and $T(σ(S))$ are statistically independent, sensitivity is represented by $2$, in those cases $T(σ(.))$ can be added to the test suite as a new test. - §4

The authors are clearly not considering simple transformations $σ$ which are limited in their impact on the statistical properties of samples $S$. They are only considering transformations which cause statistical independence between tests on $σ(S)$ and $S$. They clarify this point as follows:

It is obvious that the independence of $T(σ(S))$ and $T(S)$ is not enough to add $T(σ(.))$ to the suite. It should also be independent of other tests in the suite. - §4

Admittedly, there is a subtle distinction between my usage of "statistically independent data" and the authors' idea of transformations on data which lead to statistically independent statistical tests. I had conjectured it was a distinction without a difference: if there's no statistical test $T$ which can show a statistical correlation between data $σ(S)$ and $S$, then the data itself must be statistically independent. As pointed out in the comments, this conjecture is incorrect in general: independence implies uncorrelatedness, but not vice versa.
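To make the paper's sensitivity scale a little more concrete, here is a rough sketch (my own construction, not code from the paper; `monobit_p`, `sigma`, and `pearson` are illustrative names) that estimates the correlation between $T(S)$ and $T(σ(S))$ over many random samples, using the NIST frequency (monobit) test as $T$ and byte-wise XOR with 127 as $σ$:

```python
# Rough sketch: how correlated are a test's p-values before and after
# a transformation sigma?  (Illustrative; not from the cited paper.)
import math
import random

def monobit_p(bits):
    """NIST SP 800-22 frequency (monobit) test p-value."""
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)
    return math.erfc(abs(s) / math.sqrt(2 * n))

def sigma(bits):
    """Byte-wise XOR with 127: keep each byte's top bit, flip the other
    seven (bits are taken MSB-first, purely as an illustrative convention)."""
    mask = [0, 1, 1, 1, 1, 1, 1, 1]        # 127 = 0b01111111
    return [b ^ mask[i % 8] for i, b in enumerate(bits)]

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

random.seed(0)
p_orig, p_trans = [], []
for _ in range(200):                        # 200 random 10,000-bit samples
    bits = [random.getrandbits(1) for _ in range(10_000)]
    p_orig.append(monobit_p(bits))
    p_trans.append(monobit_p(sigma(bits)))

print("correlation of p-values:", pearson(p_orig, p_trans))
# A correlation clearly away from 0 means T(sigma(.)) adds little new
# information; a correlation near 0 is necessary (though not sufficient)
# for T(sigma(.)) to be worth adding as a "new" test.
```

Even a crude experiment like this makes the 0/1/2 sensitivity scale operational; the hard part, as the examples below try to show, is making sure any independence you find isn't coming from the transformation destroying or injecting randomness.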

Now, there are obvious examples of such transformations to a sample which make any subsequent tests statistically irrelevant. For instance, if I bitwise-AND the sample with zeros:

$$ \begin{equation}\begin{aligned} S = 00000110110100100011101&0101001111111011010110010 \newline &\mathrm{AND} \newline 00000000000000000000000&0000000000000000000000000 \newline &\downarrow \newline σ(S) = 00000000000000000000000&0000000000000000000000000 \end{aligned}\end{equation} $$

The now null sample $σ(S)$ will fail all randomness tests, while telling me nothing about $S$.

Likewise, if I apply a randomizing transformation to the sample:

$$ \begin{equation}\begin{aligned} S = 00000000000000000000000&0000000000000000000000000 \newline &\downarrow \newline \mathrm{H}&\mathrm{ASH}\left(S\right) \newline &\downarrow \newline σ(S) = 00100011100111110101000&0011010000001101110100101 \end{aligned}\end{equation} $$

The new sample $σ(S)$ is now endowed with randomness not present in the original, also telling me nearly nothing about $S$.

In both cases, the transformations caused any tests on the new sample $σ(S)$ to be statistically independent from tests on the original $S$. These cases are intended to be extreme, and therefore clear examples of how statistical independence can be introduced. The point is: How can you be sure your specific transformations aren't also doing something egregious to the information that can be gained about the original sample?

Generally, if your transformations are making your new samples statistically independent, it would be strange to learn anything statistically relevant from them. Doing so may even be a bit paradoxical. I refer to the definition: "two events or random variables are considered statistically independent if the knowledge of one provides no information about the other."

Of course, there are degrees to which transformations will conjure irrelevance. But it would seem non-trivial to discern the signal from the noise inherent in each type of transformation which introduces even a small degree of statistical independence.

Accounting for that noise, and qualifying the effect each novel transformation has on the insight gained from each test result, may be as much work as, or more than, simply inventing a new test that is well justified, whose insights are understood, and which is designed from the start to provide specific value, all for unclear added value. The former is definitely not as confidence-inspiring as the latter.

in flag
I may be mis-applying the term fuzzing in this context. However, what I mean by fuzzing is similar to what is meant in software & systems testing: setting up automated tests for a set of functionalities that seek to discover whether the expected functionalities are invariant, or preserve correctness, even when their inputs are varied in significant or unexpected ways. I mean something similar here: run your tests on transformed data which is statistically the same, to gain confidence that the tests aren't biased by mere presentation.
kodlu
sa flag
You say: "if there's no statistical test T which can show a statistical correlation between data σ(S) and S, then the data itself must be statistically independent." *This is incorrect: independence implies uncorrelatedness, but not vice versa in general*.
in flag
@kodlu Thank you for the correction! Fixes were attempted.
Paul Uszak
cn flag
We're running out of bounty time, so I'll make this my last comment :-) Have you seen [this](http://www.reallyreallyrandom.com/ent3000/qa/#inter-test-correlation) heat map? It is possible to easily identify correlated tests and (by implication) correlated transformations. Those can then be excluded from the suite.
Score:2
in flag

Is it valid? Well, yes: if the data fails a test after being transformed, it fails the test. Is it as good as adding more tests? That depends on the test and on the transformation. In general I would say this has value, but less so than adding more tests.

Applying an existing test to transformed input is a subclass of adding more tests, but some tests would be difficult to formulate in such a fashion.

If we have statistical tests which look at bit-pair correlations in some window, and we are considering trying to predict the next bit based on the previous $k$-bit window, that is not a test we can easily replace with a transformation.

If we have a test which does statistics on an 8-bit window versus the next bit, and we are considering building a linear predictor based on the last 128 bits, that is again a powerful test we can't easily replace with a transformation.
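As a rough illustration of the kind of predictor-style test meant here (my own sketch, not a standardized test; the name `predictor_test` is invented), one can guess each bit from the majority bit previously observed after the same $k$-bit context and compare the hit rate against the 50% expected for an unpredictable sequence:

```python
# Illustrative sketch of a k-bit-context next-bit predictor check.
import math
from collections import defaultdict

def predictor_test(bits, k=8):
    """Predict each bit as the majority bit seen so far after the same
    k-bit context; return (hit rate, two-sided p-value under H0: rate = 0.5)."""
    counts = defaultdict(lambda: [0, 0])    # context -> [count of 0s, count of 1s]
    hits = trials = 0
    for i in range(k, len(bits)):
        ctx = tuple(bits[i - k:i])
        zeros, ones = counts[ctx]
        if zeros + ones > 0:                # only predict once the context was seen
            guess = 1 if ones >= zeros else 0
            hits += (guess == bits[i])
            trials += 1
        counts[ctx][bits[i]] += 1
    if trials == 0:
        raise ValueError("sequence too short for context size k")
    rate = hits / trials
    z = (hits - 0.5 * trials) / math.sqrt(0.25 * trials)
    return rate, math.erfc(abs(z) / math.sqrt(2))
```

Such a predictor aggregates information across the whole history of contexts, which is why it has no obvious equivalent as a fixed transformation followed by an existing test.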

So some things can be done with simple, cheap transformations, and this might be a cheap way of getting a few more tests, but it seems of limited value.

Maybe you have some specific, clever transformations in mind, but the suggested transformations like bit flipping and reversal seem of little value, and many tests will be oblivious to many simple transformations.

in flag
The paper referenced by the OP speaks about using transformations which lead to statistically independent test results as the basis for new tests. That may have value, but I just don't think it would be easy to determine what value exactly it provides. For instance, if I describe one transformation like this (https://en.wikipedia.org/wiki/File:Salsa_round_function.svg), what exactly am I saying about the original sample? What if I describe a transformation like this (https://en.wikipedia.org/wiki/SHA-3#/media/File:SpongeConstruction.svg)? It would be non-trivial to ascribe test results to the sample itself.
Meir Maor
in flag
If it fails, it fails. For a secure PRNG you don't want a discriminator to exist, so a simple explanation of the pattern found is not required. A failed statistical test is a non-negligible discriminator.
in flag
There are obvious examples of transformations to a sample which make any subsequent tests statistically irrelevant. For instance, if I bitwise-AND the sample with zeros, the now null sample will fail all randomness tests, telling me nothing. Likewise, applying a randomizing algorithm, like a hash, to the sample, may endow the sample with randomness not present in the original. In both cases, the transformations caused the new sample to be statistically independent from the original. It seems strange to be able to learn anything statistically relevant from statistically independent data.
Meir Maor
in flag
Well yes, you need the transformation to preserve the information. Any bijection would be a valid transformation.
Paul Uszak
cn flag
@aiootp I think that the paper actually suggests very simple transformations like flipping all the bits, or adding 42 (mod 256) to all the bytes. Clearly, using a cryptographic cipher on most files will create the appearance of pseudo-randomness and invalidate such testing.
ye flag
@MeirMaor I wouldn't say that any bijection is a valid transformation, as Paul already commented. But I totally agree with you that trivial transformations like bit flips and reversals are worthless. The only clever transformation I can think of that might make sense in this context is the [Burrows-Wheeler transform](https://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform). It's a bijection used in data compression to preprocess the data; it lumps together the bytes that share the same preceding bytes.
in flag
@PaulUszak Check my update, they are specifically referring to transformations which lead to statistically independent test results, that excludes those simple transformations.
Paul Uszak
cn flag
We're running out of bounty time, so I'll make this my last comment :-) Have you seen [this](http://www.reallyreallyrandom.com/ent3000/qa/#inter-test-correlation) heat map? It is possible to easily identify correlated tests and (by implication) correlated transformations. Those can then be excluded from the suite.
Meir Maor
in flag
Not optimistic: you can exclude some, but not necessarily all. If, on the other hand, you make a very strong transformation, say AES-CBC, then passing tests after the transformation does not make you confident in the original. It's a limited tool, but limited doesn't mean totally useless. I don't think writing good transformations is easier than writing good tests with other techniques.