Per comments revising the original question: the OP conjectures that 100 digits $y_n$ in $\{0,1,2,3,4,5\}$ in their possession are obtained using a C(++) expression equivalent to rand()%6
with the rand()
using a (non-cryptographic) PRNG based on a Linear Congruential Generator, with code equivalent to
static unsigned long x;
int rand(void) {
x = 214013*x+2531011; // or is it 22695477*x+1
return (int)((x>>16)&0x7FFF);
}
Notice that $x$ is at least 32-bit, but only the lower-order 31 bits matter due to (x>>16)&0x7FFF
and C performing unsigned long
multiplication with truncation of high-order bits that do not fit in the variable.
Abstracting the programming details, the conjecture is that $x$ evolves per $x_{n+1}:=a\cdot x_n+b\bmod m$ with $m=2^{31}$ and $(a,b)$ for some fixed constants believed to be either $(214013,2531011)$ or $(22695477,1)$. And rand()
outputs bits 30…16 of $x$ thus the $y_n:=\lfloor x_n/2^{16}\rfloor\bmod 6$ are given for $n=0$ to $99$ (with the seed an unknown integer $x_{-1}$ in some immaterial range, since only $x_{-1}\bmod m$ would matter; and we are better off trying to find $x_0$ anyway).
The OP asks if it's possible to confirm or infirm that conjecture, and (if true) find which $(a,b)$ are used and predict what follows in the sequence.
Yes, that's possible, with excellent confidence. The effective state of the PRNGs considered has 31-bit, there are only two PRNGs considered, they are passable for simulation purposes, thus we should be able to find their state and which is used with a little more then $31+1=32$ bit of information; and we get $100\cdot\log_2(6)\approx258.5$ bit, which will give confirmation aplenty.
The simplest is to try for both conjectured $(a,b)$, and explore the possible values of $x_0$. There are only $2^{31}$, knowing $y_0$ allows to systematically reduce that by a factor of $6$. Each following $y_i$ further eliminates $5$ candidates out of $6$. If no candidate survives all the $y_i$, the hypothesis is disproved. If a match is found, we know which $(a,b)$ we used, and can compute additional $y_i$. A competent implementation in a compiled language would take like a second on a modern desktop PC.
But maybe want to break the thing in real time with a modern \$0.5 CPU running from a \$0.2 battery, or are stuck to a programmable calculator of the 1970s, or the ENIAC. Remark that $6$ is even, and the divisor is $2^{16}$. It follows $y_n\bmod 2$ is bit $16$ of $x_n$. Also remark that since $m$ is a power of two, the change of a bit in $x_n$ does not propagate to lower-order bits of $x_{n+1}$. If follows that bit 16 of $x_n$ depends only on the low 17 bits of $x_0$. We know bit 16 of $x_0$, thus need to test at most $2^{16}$ candidates for bits 15…0 of $x_0$. We can then find the other 14 bits as above. That divide and conquer approach would allow to tackle much larger parameters, e.g. variable x
of type uin64_t
and return x>>33
, on a modern CPU.
I wonder what we could do if $a$, $b$ and/or $m$ were unknown.
Notes on the above:
- It uses the dominant convention in computer science (and cryptography with few exceptions like DES): bits are counted from 0 (low-order bit), so that if a non-negative integer $v$ is represented in binary as $k$ bits $b_j$, then $v=\sum b_j$ with $0\le j<k$. In big-endian convention, bit 0 is written on the right: $6\times7=42$ in decimal is $110\times111=101010$ in binary.
- Computer variable
x
is 32-bit at least, but only it's low order 31 bits (0 to 30) matter and are used in $x_n$, thus $0\le x_n<m=2^{31}$. The output of rand()
is 16-bit at least, but only it's low order 15 bits (0 to 14) matter, and all the others are zero, thus $0\le$rand()
$\le$RAND_MAX
$=2^{15}-1$. If $0\le i<15$ then bit $j$ of the output of rand()
matches bit $j+16$ of x
. It follows bit 0 of $y_n$ matches bit 16 of $x_n$.
(Off-topic) comments on the current code:
- It tries to use the speedup made possible by
6
being even. I maintain this is not required to reach an execution time in seconds, if
- we explore the possible values of $x_0$, rather than the seed many steps before.
- we use that $x_0$ is 31-bit, thus we can restrict the outer search to [
0
, 0x7fffffff
] that is $2^{31}$ candidate $x_0$.
- we use that we know $y_0$, thus that $x_0=2^{16}\cdot(6\cdot u+ y_0)+v$ for $0\le u<\lceil2^{15}/6\rceil$ and $0\le v<2^{16}$, which restricts to about $2^{31}/6$ candidates for $x_0$.
- we optimize the test for checking candidate $x_0$ against $y_1$ in the inner loop on $v$.
- The essence of the optional speedup noting
6
is even is to first find bits 16…0 of $x_0$ matching the values $y_0$ gathered, then bits 30…17. The code does not do that! With that speedup, execution time would go down to the millisecond.
- We need the full values of the $y_n$ gathered (in both the non-optimized search, and the second part of the optimized search), but they do not seem to be available in the input, which I guess is $y_n\bmod2$. Further, I don't understand why that's in two-dimensional array
RESULT[3][7]
.
- When $x_0$ is found, if it was necessary to find the seed (or rather bits 30…0 of that) a known number of steps before, that can be done efficiently by walking back the LCG using $x_{n-1}:=a^{-1}\,(x_n-b)\bmod m$ where $a^{-1}$ is the modular inverse of $a$ modulo $m$.
Here is working code without the optional speedup (thus capable of working with odd final reduction modulus $r$ where the question has 6
). Try it online!
#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>
#define A 214013 // for VC; 22695477 for BC
#define B 2531011 // for VC; 1 for BC
#define R 6 // modulus, in [2, 32768]
// static_assert(A > 0 && A % 8 == 5, "A mod 8 must be 5");
// static_assert(B > 0 && B % 2 == 1, "B mod 2 must be 1");
// static_assert(R >= 2 && R <= 32768, "R must be in [2, 32768]");
// decide modulo, reduced when R is a power of two
#define M ((uint32_t)(((R-1)&R)!=0 ? 0x8000 : R)<<16)
// Sequence of yn for VC (a=214013, b=2531011), n=6, seed=1639326023
// If R is a power of two, ceil(16/log2(R))+1 entries are enough
// Otherwise, that's ceil(31/log2(R)) entries, thus 12 for R=6.
const int16_t y[] = {
0,5,3,0,4,3,1,0,4,5,4,4,4,5,5,3,0,2,0,5,4,5,0, // 0,2,5,1,3,5,5,5,
};
// modular inverse INVA of A modulo M
#define INVA (IN_A(IN_A(IN_A(IN_A((uint32_t)A))))&(M-1))
#define IN_A(v) (v*(2-v*A))
int main(void) {
// decide starting x0 so that it matches y0
const uint32_t y0 = y[0], y1 = y[1];
int32_t x0 = (int32_t)(((M >> 16) - y0) / R * R + y0) << 16 | 0xffff;
uint32_t found = 0;
printf("generator: ((int)((x = %" PRIu32 " * x + %" PRIu32 ") >> 16) & 0x7fff) %% %u\n",
(uint32_t)A, (uint32_t)B, (unsigned)R);
while (x0 >= 0) {
uint32_t x1 = A * (uint32_t)x0 + B;
do {
// assert( (x0 >> 16) % R == y0);
// assert( x1 == A * (uint32_t)x0 + B);
if (((x1 >> 16) & 0x7fff) % R == y1) {
uint32_t x = x1;
int n;
for (n = 2; n < sizeof(y) / sizeof(*y); ++n)
if ((((x = A * x + B) >> 16) & 0x7fff) % R != y[n])
goto nxt;
// found a solution
x = (INVA * ((uint32_t)x0 - B)) & (M - 1);
printf("x0 can be %" PRId32 ", that is seed=%" PRIu32
" modulo 0x%" PRIx32 ", yielding:\n", x0, x, M);
// prove out point by showing the output
for (n = 0; n < sizeof(y) / sizeof(*y) + 8; ++n)
printf(" %d", ((int)((x = A * x + B) >> 16) & 0x7fff) % R);
printf("\n");
++found;
}
nxt: x1 -= (uint32_t)A;
} while (((x0--) & 0xffff) != 0);
x0 -= (int32_t)(R - 1) << 16;
}
printf("found %" PRIu32 " solution%s\n", found, found > 1 ? "s" : "");
return 0;
}
// yielding:
// generator: ((int)((x = 214013 * x + 2531011) >> 16) & 0x7fff) % 6
// x0 can be 531633902, that is seed=1639326023 modulo 0x80000000, yielding:
// 2 3 4 1 5 1 1 5 4 0 3 2 2 5 5 3 5 5 3 4 0 1 1 4 1 3 3 2 5 4 4
// found 1 solution