Score:2

Crypto

What is wrong with Middle Square PRNG?

Tom

2/4/23, 12:39 AM

According to article:

https://www.pcg-random.org/posts/too-big-to-fail.html

When N of Middle Square is $2^{128}$ we can expect to produce $2^{64.3}$ numbers before we start to see repeats in generator. That's enough for 320 exabytes of random data, vastly more than any PractRand test will ever consume.

This is enough size of path to reach the cycle and cycle itself. It passes the randomness tests and probably it is quite fast. So is there something wrong with big enough Middle Square? Especially if we would not complain about speed and requirement of 256-bit math.

Mellisa O'Neil wrote that if a generator is a random mapping, there is going to be some risk of bad states and short cycles. But what is that probability in this case? It looks like it is very small, anyway I can't prove it.

315

0 + 0

pseudo-random-generator

fgrieu

2/4/23, 7:46 AM

The quality of the middle square PRNG with $n$ bit state will depend a lot on how its output is defined as a function of it's state, and to a lesser degree of how it is initialized from it's key. Have you settled on that? Independently; the $2^{64.3}$ expected cycle size seems to be derived by _assuming_ that the function iterated behaves like a pseudo-random random function from and to the set of 128-bit bitstrings; thus making conclusions about the quality of the middle square PRNG from that number is largely a circular argument. Food for thought: how many antecedents of zero?

Tom

2/4/23, 12:21 PM

@fgrieu but this circular argument is kind of present in case of every PRNG generator. Of course in most generators we know the period, but we can't have proof that they will behave well by many terabytes of data. We need to believe in good behaviour. And we believe. The same, if we proof good behaviour of Middle Square for first terabytes of data, there are no good reasons to be afraid it will fail with bigger amount of data and it would have short period. Since we legitimize the quality of other generators by faith for large amounts of data, why not do it for Middle Square cycle length.

alephzero

2/4/23, 8:06 PM

@Tom Is your assertion that this kind of "circular argument" applies to "every PRNG generator" based on understanding the math, or on your own personal opinion about what might happen? Sure, there are *some* terrible PNRG generators out there, but "some" is not the same as "all."

Score:6

Crypto

Daniel S

2/4/23, 11:08 AM

The statement "we expect to produce $2^{64.3}$ numbers before we start to repeats" is only true if we believe that 128-bit Middle Square behaves like a random mapping on $\{0,1\}^{128}$. However, we can show that it has properties that are highly unlikely for a random mapping.

Recall that 128-bit Middle Square maintains a 128-bit state $S_t$. Updates are effectively made by squaring $S_t$, and taking bits 64-191 as the new $S_{t+1}$ i.e. $$S_{t+1}=(S_t^2>>64)\%2^{128}.$$

The state $S_t=0$ represents a fixed point. Although random mappings have fixed points with probability roughly $(1-1/e)$, this is one is unusual as it has a large number of preimages. Any number $S<2^{32}$ will map to 0 as will any number $S$ divisible by $2^{96}$. These preimages alone (there may be others) total $2^{33}$ when for a large random map we expect the number of preimages to be distributed Poisson(1). Moreover if we consider predecessors, any number $S<2^{64}$ will map to a smaller number and number less than $2^{63}$ will reach 0 in fewer than 6 steps. Likewise for numbers divisible by $2^{65}$. This gives at least $2^{64}$ predecessor states for 0 when a random map would expect $2^{64}\sqrt{\pi/8}$ (with coalescence time about the same). The number of predecessor states increases still more when we consider the possible predecessors of our guaranteed $2^{64}$ predecessor states, if these each have $2^{64}\sqrt{\pi/8}$ predecessors we might see a positive proportion of our space degenerating to the 0 state.

There is also a preserved subspace of number exactly divisible by $2^{64}$ (this space is of size $2^{63}$) which we might expect to exhibit random mapping statistics for the smaller space (e.g. a cycle length of $2^{31.5}\sqrt{\pi/8}$). We then consider predecessors for this subspace and produce expectations significantly different from a full random mapping.

All-in-all these structures are very atypical of a random mapping and we should conclude that the random mapping is not a good model in this case.

0 + 0

Tom

2/4/23, 12:07 PM

Ok, most of my doubts were about - if this is so close to random mapping, what's wrong. Now I understand better that we have deviations from random mapping, and even if it worked like random mapping, we still have a kind of circular argument here.

fgrieu

2/4/23, 1:41 PM

As early as the [ENIAC](https://en.wikipedia.org/wiki/ENIAC), it was known that the properties of Von Neumann's middle-square method analyzed in this answer lead to short cycles when iterated. Knuth's TAOCP volume 2 states _"the sequence tends to get into a rut, a short cycle of repeating elements. (…) More extensive tests were carried out by N. Metropolis, mostly in the binary number system. He showed that when 20-bit numbers are being used, there are 13 different cycles into which the middle-square sequence might degenerate, the longest of which has a period of length 142"_.

Score:4

Crypto

fgrieu

2/4/23, 5:58 PM

Reading the page linked in the question: it acknowledges that the basic middle square method leads to short cycles, explains why (as does this other answer), and proposes a "Meddle square" variation instead, which iterates $x\mapsto g(x):=2^n-1-\left\lfloor x^2/2^{\left\lfloor n/2\right\rfloor}\right\rfloor\bmod 2^n$ instead of $x\mapsto f(x):=\left\lfloor x^2/2^{\left\lfloor n/2\right\rfloor}\right\rfloor\bmod 2^n$.

This variation removes any obvious fixed point, and more importantly any obvious large class of points bound to quickly fall into a short cycle. This is an improvement. However:

If we output the full state of the generator at each step, it is quite efficient (for competent implementation on a machine with fast multiplier), but trivially predictable from past output, thus cryptographically insecure. I find very believable the article's claim that (for $n=128$) Meddle square passes all standard statistical tests. This is an illustration that standard statistical tests on a PRNG's output are pointless when it comes to argument that it's a CSPRNG. And indeed, the article seems to introduce Meddle square precisely to illustrate that having a large state with no obvious fixed point or absorbing cycle, and passing tests, is not a valid security criteria for a CSPRNG.
If we output a single bit, Middle square and variations is $n$ times less efficient, and we still do not have a sound argument that it is a CSPRNG. Fact is, we can quite easily craft a PRNG iterating a function changing it's large state, that output a single bit of that state at each step, pass all standard tests, yet ultimately leak their full state and thus are cryptographically insecure.
We could settle for a middle ground, like a 256-bit state of which 64 bits are output at each stage. That can regain much efficiency, but without trying I have no certainty about how throughput compares with e.g. Chacha; and I would say it's much harder to get an efficient implementation right.
As any generator based on iterating a pseudorandom function, it's impossible to fast-forward the generator, which makes it unsuitable for many applications (like a stream cipher for large amount of data supporting random read access^*).

To sum it up: the Middle Square PRNG, and even it's improved variation Meddle Square, are wrong because:

They lack a sound security argument, in whatever variation. That should be enough.
They are trivially insecure when we output their full state at each step.
When we output less of their state, they become less efficient.
There's no direct access^* to the generated stream, a useful property of many CSPRNGs.

^* That is, capability to efficiently compute the $j^\text{th}$ output of the generator for large $j$, with effort essentially independent of how large $j$ is. This comes handy when $j$ is an index in a huge movie encrypted by XOR with the generator's output, and we want to start watching the last third of the movie.

0 + 0

Tom

2/4/23, 6:24 PM

What is direct access to the generated stream?

fgrieu

2/4/23, 7:44 PM