Score:0

Is it possible to crack a Linear Congruential Generator if I only know the modulus of the output?

sz flag

Edit suggested by fgrieu:

I have one hundred integers in $\{0,1,2,3,4,5\}$ which I suspect are consecutive values of $\lfloor x_n/2^{16}\rfloor\bmod6$ computed as $x_{n+1}:=a\cdot x_n+b\bmod m$, with $m=2^{31}$, and $(a,b)\in{(214013,2531011),(22695477,1)}$. How do I validate that hypothesis, find the $(a,b)$ used, and predict what follows in the sequence?


Question about "A competent implementation in a compiled language would take like a second on a modern desktop PC."

I wrote some code but they are expected to run 20 hours.

I am trying to find the random seed. First, I input my data in an array. Since I don't know my first data is what-th number generated by the server, I need to find it out. I only know the server shut down every thursday 2:00pm, and restart around 2:45-3:45pm the same day. When the server is on, ir generates 3 numbers every 45 seconds. The data I have is collected on fri 1:50 am, so my first data maybe the 2400-2640th number of the LCG.

I first run the rand 2399 times to discard the first 2399 numbers. Next, I loop 241 times to find my first data is what-th number generated by the server. (the uncertainity of the server restart time 2:45-3:45pm, 240 numbers per hour)

My method is: If 2400th x's bit 16 equal to bit 0 of $y_1$, then I check 2401th x's bit 16 and bit 0 of $y_2$, and so on. If there is unequal, break the loop then start another loop, compare 2401th x and bit 0 of $y_1$.

What is the better way to do it? I just started to learn c++ two weeks ago, please forgive my ignorance.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <iostream>
#include <inttypes.h>

using namespace std;

const int RESULT[3][7] = {
    {0,1,0,1,1,1,1},
    {1,0,1,0,0,0,0},
    {0,1,1,0,1,0,0}
};

static unsigned long x;

int test_rand(void)
{
    x = 214013 * x + 2531011; // or is it 22695477*x+1
    return (int)((x >> 16) & 0x7FFF);
};

int onlyx16(void)
{
    x = 214013 * x + 2531011; // or is it 22695477*x+1
    return (x >> 16) & 1;
};

void chk_seed(unsigned long seed)
{
    int d1[241]{};
    int d2[241]{};
    int d3[241]{};

    x = seed;

    for (int i = 0; i < 2399; i++) {
        test_rand();
    }

    for (int i = 0; i < 241; i++)
    {
        d1[i] = onlyx16();
        d2[i] = onlyx16();
        d3[i] = onlyx16();
    };

    int correct = 0;

    for (int k = 0; k < 236; k++)
    {
        correct = 0;
        for (int i = 0; i < 7; i++)
        {
            if ((d1[i + k]) == RESULT[0][i])
            {
                correct += 1;
            }
            else {
                correct = 0;
                break;
            };
            if ((d2[i + k]) == RESULT[1][i])
            {
                correct += 1;
            }
            else {
                correct = 0;
                break;
            };
            if ((d3[i + k]) == RESULT[2][i])
            {
                correct += 1;
            }
            else {
                correct = 0;
                break;
            };
        };
        if (correct == 21)
        {
            printf("seed 0x%d is OK\n", seed);
            printf("results forecast:\n");
            for (int round = 0; round < 5; round++)
            {
                printf("round%d ", round + 1);
                int new_d[3]{};
                for (int i = 0; i < 3; i++)
                {
                    new_d[i] = test_rand()% 6;
                    printf("%d", new_d[i]);
                };
                printf("\n");
            }
        };
    }
};

int main()
{
    for (unsigned long seed = 0; seed < 0x100000000; seed++)
        chk_seed(seed);
};

$x_{n+1} = (a \cdot x_{n} + b) \mod m$

In normal situation, $x_{n+1}$ and $x_n$ are known. But now I only know $x_n\mod 6$ and $x_{n+1}\mod 6$.

I have searched many website on google but I only find one question that talked about this problem.

Predicting values from a Linear Congruential Generator

However, it is not very clear and I still don't know what should I do after reading that. I hope someone can provide some math or example code, so that I can learn from trial and error.

I want to find a,b,m then use a C++ source code I found here to brute-force the seed.

I found an answer here that talked about how to find m, but I don't know $x_{n+1}$ and $x_n$.

https://security.stackexchange.com/questions/4268/cracking-a-linear-congruential-generator

I am new to this topic, but I desperately wanted to crack this PRNG, this PRNG made me suffered a lot, I decided to learn programming because of this PRNG. Thank you for your help!

fgrieu avatar
ng flag
The difficulty depends on if $a$, $b$, $m$ are given; on if $m$ is a power of two, and which; and on how many $x_n\bmod 6$ there are. If the problem is asked for the Java default LCG: you do _not_ get $n\bmod6$ at the output! Neither MSVC's nor Borland's `rand` return $x_n$. They reportedly return bits 30..16 (that's 15 bits) of $x_n$, see [this](https://stackoverflow.com/q/6793065/903600) and [this](https://stackoverflow.com/q/14672358/903600). Thus I doubt of «know $x_n\bmod6$».
fairytale avatar
sz flag
@fgrieu I know one hundred of $x_n\bmod 6$. I checked the .exe files in the folder, one's compiler is MSVC, another one's compiler is Borland C++, so I tried the MSVC and Borland rand, however they cannot give me correct future output. I only get 0-5 as output, so I think it is caused by "%6" https://stackoverflow.com/questions/1202687/how-do-i-get-a-specific-range-of-numbers-from-rand I think it is a common practice of getting specific range of random numbers. [edited by mod to condense multiple comments]
fgrieu avatar
ng flag
But you have no indication that it's $x_n$ which is taken$\bmod6$, as written in the question. If that was the case, and if $m$ was even and $a$ odd, as common, then the numbers you get would be alternatively odd and even. The question you want to ask may be: «I have one hundred integers in $\{0,1,2,3,4,5\}$ which I suspect are consecutive values of $\lfloor x_n/2^{16}\rfloor\bmod6$ computed as $x_{n+1}:=a\cdot x_n+b\bmod m$, with $m=2^{31}$, and $(a,b)\in\{(214013,2531011),(22695477,1)\}$. How do I validate that hypothesis, find the $(a,b)$ used, and predict what follows in the sequence?»
fairytale avatar
sz flag
@fgrieu OK, let's try this approach. Thank you for your suggestion. I will [edit] the question later and try your method, I cannot use computer now. [edited by mod to condense multiple comments]
Fractalice avatar
in flag
The paper "Reconstructing Truncated Integer Variables Satisfying Linear Congruences" may be very relevant.
Score:3
ng flag

Per comments revising the original question: the OP conjectures that 100 digits $y_n$ in $\{0,1,2,3,4,5\}$ in their possession are obtained using a C(++) expression equivalent to rand()%6 with the rand() using a (non-cryptographic) PRNG based on a Linear Congruential Generator, with code equivalent to

static unsigned long x;
int rand(void) {
  x = 214013*x+2531011; // or is it 22695477*x+1
return (int)((x>>16)&0x7FFF);
}

Notice that $x$ is at least 32-bit, but only the lower-order 31 bits matter due to (x>>16)&0x7FFF and C performing unsigned long multiplication with truncation of high-order bits that do not fit in the variable.

Abstracting the programming details, the conjecture is that $x$ evolves per $x_{n+1}:=a\cdot x_n+b\bmod m$ with $m=2^{31}$ and $(a,b)$ for some fixed constants believed to be either $(214013,2531011)$ or $(22695477,1)$. And rand() outputs bits 30…16 of $x$ thus the $y_n:=\lfloor x_n/2^{16}\rfloor\bmod 6$ are given for $n=0$ to $99$ (with the seed an unknown integer $x_{-1}$ in some immaterial range, since only $x_{-1}\bmod m$ would matter; and we are better off trying to find $x_0$ anyway).

The OP asks if it's possible to confirm or infirm that conjecture, and (if true) find which $(a,b)$ are used and predict what follows in the sequence.

Yes, that's possible, with excellent confidence. The effective state of the PRNGs considered has 31-bit, there are only two PRNGs considered, they are passable for simulation purposes, thus we should be able to find their state and which is used with a little more then $31+1=32$ bit of information; and we get $100\cdot\log_2(6)\approx258.5$ bit, which will give confirmation aplenty.

The simplest is to try for both conjectured $(a,b)$, and explore the possible values of $x_0$. There are only $2^{31}$, knowing $y_0$ allows to systematically reduce that by a factor of $6$. Each following $y_i$ further eliminates $5$ candidates out of $6$. If no candidate survives all the $y_i$, the hypothesis is disproved. If a match is found, we know which $(a,b)$ we used, and can compute additional $y_i$. A competent implementation in a compiled language would take like a second on a modern desktop PC.

But maybe want to break the thing in real time with a modern \$0.5 CPU running from a \$0.2 battery, or are stuck to a programmable calculator of the 1970s, or the ENIAC. Remark that $6$ is even, and the divisor is $2^{16}$. It follows $y_n\bmod 2$ is bit $16$ of $x_n$. Also remark that since $m$ is a power of two, the change of a bit in $x_n$ does not propagate to lower-order bits of $x_{n+1}$. If follows that bit 16 of $x_n$ depends only on the low 17 bits of $x_0$. We know bit 16 of $x_0$, thus need to test at most $2^{16}$ candidates for bits 15…0 of $x_0$. We can then find the other 14 bits as above. That divide and conquer approach would allow to tackle much larger parameters, e.g. variable x of type uin64_t and return x>>33, on a modern CPU.

I wonder what we could do if $a$, $b$ and/or $m$ were unknown.


Notes on the above:

  • It uses the dominant convention in computer science (and cryptography with few exceptions like DES): bits are counted from 0 (low-order bit), so that if a non-negative integer $v$ is represented in binary as $k$ bits $b_j$, then $v=\sum b_j$ with $0\le j<k$. In big-endian convention, bit 0 is written on the right: $6\times7=42$ in decimal is $110\times111=101010$ in binary.
  • Computer variable x is 32-bit at least, but only it's low order 31 bits (0 to 30) matter and are used in $x_n$, thus $0\le x_n<m=2^{31}$. The output of rand() is 16-bit at least, but only it's low order 15 bits (0 to 14) matter, and all the others are zero, thus $0\le$rand()$\le$RAND_MAX$=2^{15}-1$. If $0\le i<15$ then bit $j$ of the output of rand() matches bit $j+16$ of x. It follows bit 0 of $y_n$ matches bit 16 of $x_n$.

(Off-topic) comments on the current code:

  • It tries to use the speedup made possible by 6 being even. I maintain this is not required to reach an execution time in seconds, if
    • we explore the possible values of $x_0$, rather than the seed many steps before.
    • we use that $x_0$ is 31-bit, thus we can restrict the outer search to [0, 0x7fffffff] that is $2^{31}$ candidate $x_0$.
    • we use that we know $y_0$, thus that $x_0=2^{16}\cdot(6\cdot u+ y_0)+v$ for $0\le u<\lceil2^{15}/6\rceil$ and $0\le v<2^{16}$, which restricts to about $2^{31}/6$ candidates for $x_0$.
    • we optimize the test for checking candidate $x_0$ against $y_1$ in the inner loop on $v$.
  • The essence of the optional speedup noting 6 is even is to first find bits 16…0 of $x_0$ matching the values $y_0$ gathered, then bits 30…17. The code does not do that! With that speedup, execution time would go down to the millisecond.
  • We need the full values of the $y_n$ gathered (in both the non-optimized search, and the second part of the optimized search), but they do not seem to be available in the input, which I guess is $y_n\bmod2$. Further, I don't understand why that's in two-dimensional array RESULT[3][7].
  • When $x_0$ is found, if it was necessary to find the seed (or rather bits 30…0 of that) a known number of steps before, that can be done efficiently by walking back the LCG using $x_{n-1}:=a^{-1}\,(x_n-b)\bmod m$ where $a^{-1}$ is the modular inverse of $a$ modulo $m$.

Here is working code without the optional speedup (thus capable of working with odd final reduction modulus $r$ where the question has 6). Try it online!

#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

#define A  214013 // for VC; 22695477 for BC
#define B 2531011 // for VC;        1 for BC
#define R       6 // modulus, in [2, 32768]
// static_assert(A > 0 && A % 8 == 5, "A mod 8 must be 5");
// static_assert(B > 0 && B % 2 == 1, "B mod 2 must be 1");
// static_assert(R >= 2 && R <= 32768, "R must be in [2, 32768]");

// decide modulo, reduced when R is a power of two
#define M ((uint32_t)(((R-1)&R)!=0 ? 0x8000 : R)<<16)

// Sequence of yn for VC (a=214013, b=2531011), n=6, seed=1639326023
// If R is a power of two, ceil(16/log2(R))+1 entries are enough
// Otherwise, that's ceil(31/log2(R)) entries, thus 12 for R=6.
const int16_t y[] = {
  0,5,3,0,4,3,1,0,4,5,4,4,4,5,5,3,0,2,0,5,4,5,0, // 0,2,5,1,3,5,5,5,
};

// modular inverse INVA of A modulo M
#define INVA (IN_A(IN_A(IN_A(IN_A((uint32_t)A))))&(M-1))
#define IN_A(v) (v*(2-v*A))

int main(void) {
  // decide starting x0 so that it matches y0
  const uint32_t y0 = y[0], y1 = y[1];
  int32_t x0 = (int32_t)(((M >> 16) - y0) / R * R + y0) << 16 | 0xffff;
  uint32_t found = 0;
  printf("generator: ((int)((x = %" PRIu32 " * x + %" PRIu32 ") >> 16) & 0x7fff) %% %u\n",
    (uint32_t)A, (uint32_t)B, (unsigned)R);
  while (x0 >= 0) {
    uint32_t x1 = A * (uint32_t)x0 + B;
    do {
      // assert( (x0 >> 16) % R == y0);
      // assert( x1 == A * (uint32_t)x0 + B);
      if (((x1 >> 16) & 0x7fff) % R == y1) {
        uint32_t x = x1;
        int n;
        for (n = 2; n < sizeof(y) / sizeof(*y); ++n)
          if ((((x = A * x + B) >> 16) & 0x7fff) % R != y[n])
            goto nxt;
        // found a solution
        x = (INVA * ((uint32_t)x0 - B)) & (M - 1);
        printf("x0 can be %" PRId32 ", that is seed=%" PRIu32
          " modulo 0x%" PRIx32 ", yielding:\n", x0, x, M);
        // prove out point by showing the output
        for (n = 0; n < sizeof(y) / sizeof(*y) + 8; ++n)
          printf(" %d", ((int)((x = A * x + B) >> 16) & 0x7fff) % R);
        printf("\n");
        ++found;
      }
    nxt: x1 -= (uint32_t)A;
    } while (((x0--) & 0xffff) != 0);
    x0 -= (int32_t)(R - 1) << 16;
  }
  printf("found %" PRIu32 " solution%s\n", found, found > 1 ? "s" : "");
  return 0;
}

// yielding:
//  generator: ((int)((x = 214013 * x + 2531011) >> 16) & 0x7fff) % 6
//  x0 can be 531633902, that is seed=1639326023 modulo 0x80000000, yielding:
//   2 3 4 1 5 1 1 5 4 0 3 2 2 5 5 3 5 5 3 4 0 1 1 4 1 3 3 2 5 4 4
//  found 1 solution
fairytale avatar
sz flag
If $x_n$ is 32768(1000 0000 0000 0000), bit 16 is 0, if $y_n\bmod 2$ is also 0, then this $y_n$ matches $x_n$. Then, I test next $x_n$. Am I correct? (I assume bits are counting from left if you don't say "low")
fgrieu avatar
ng flag
@fairytale: uh, no. See new section "_Notes on the above_", especially the last two sentences.
fairytale avatar
sz flag
My sequence is 4,5,0,1,4,3,4,5,1,1,0,2,1,2,3,1,0,2,5,2,0. Unfortunately, there is no found... I also tried 1103515245 and 12345 (glibc rand), but there is still no found. Sigh. The program is very old, written before 2006, also it is not related to serious purpose, so I believe it is not cryptographically-secure PRNG. Now I guess the author may used his own A,B,M, or he used Mersenne Twister.
fgrieu avatar
ng flag
@fairytale: the question as reformulated is answered in the negative. You have the executable ("_I checked the .exe files_"), thus there's still the course of action of reverse-engineering; de-compiling, or/and observing the program running in an instrumented environment. That's even more off-topic on crypto-SE than programming is. But perhaps on [reverseengineering-SE](https://reverseengineering.stackexchange.com/)?
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.