Score:2

How to exploit Java's RNG to find clusters?

sy flag

enter image description here

In the above image, you can see a coordinate grid containing some random green points. Each point has a pseudorandom 1/10 chance of being green. What I'm looking for are clusters of these green points within a radius of ~8 (ignore the inner mask shown). Said another way I am looking for statistically unlikely high density areas of these green points. At the core of this problem is the Java RNG found in java.util.Random (source here). The code for determining whether a point is green boils down to this hash function. The inputs are some constant, $k$, and the coordinates of the point, $x$ and $y$.

long seed = ((k + (long) (x * x * 4987142) + (long) (x * 5947611) + (long) (y * y) * 4392871L + (long) (y * 389711) ^ 987234911L) ^ 0x5DEECE66DL) & ((1L << 48) - 1);
int bits, val;
do
{
    seed = (seed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1);
    bits = (int)((ulong)seed >> 17);
    val = bits % 10;
} while (bits - val + 9 < 0);

return val == 0;

There has been minor research on this problem in the past but I am not knowledgeable enough to contribute further. What was found was that potential clusters with small sizes of 2x2 and 3x3 create a pattern when comparing with different $k$ values. enter image description here

This may provide clues as to what coordinates a search should focus more compute given a certain $k$, but I am not convinced. As an exmaple, here is a heatmap of cluster sizes for a particular $k$. You can find more information about how these images were derived from here.

enter image description here

As of now I am just brute force checking the cluster count of each coordinate, and skipping a coordinate if the cluster is too low for the next one over to have a cluster of sufficient size, which is most of them because I am looking for statistical outliers.

What I am hoping is that there is some exploitable pattern in this algorithm, it is practical to reverse this hash in some way, or there are major optimizations to be had in my current method.

Maybe a possible avenue forward would be to see if the lattice pattern keeps persisting for larger and larger clusters, but another image on that post seems to indicate that it would just get lost in the noise.

kr flag
Why do you think this question is related to cryptography? The code uses pseudo RNG. This means, for the same seed it will always produce **the same** results. This is pure programming question and can be better answered on SO.
Paul Uszak avatar
cn flag
@mentallurg Hold on. Whilst very pretty, the underlying theme of this question is RNG manipulation /exploitation. That's an attack which is exactly on topic here. Variable $k$ varies by definition, and that's the crux. Many of the .SE forums overlap which I guess is a consequence of uncontrolled organic growth.
kr flag
@PaulUszak: The question is about **pseudo RNG**. The mentioned code uses class Random which implements a PRNG. One essential part of the code is, that for every new argument a new seed is calculated. Setting the same seed leads to generation of **the same** results. This has nothing to do with manipulation. And the question is asking about the way to **improve performance**.
Gabe avatar
sy flag
@mentallurg I do think this question is very much in the spirit of cryptography. I'm not asking for code to improve performance, I'm looking for a generalized shortcut. What's the point of a **pseudo-random-generator** tag if questions about PRNG are off topic???
Maarten Bodewes avatar
in flag
I can see this as somewhat helpful for performing analysis of PRNG's, although I'm wondering how much this goes into a research question instead of a question that can be answered objectively without research.
kr flag
@Gabe: 1) The tag on this site has sense because pseudo-random-generators are used for many cryptographic tasks. If PRNG is not used for cryptographic tasks, it is off topic this site. 2) You write "*I am just **brute force**... What I am hoping is that... there are **major optimizations***". This means, you are looking for performance optimization. If the problem was related to cryptography, this would be relevant question. But since you don't show any connection of your problem to cryptography, also this optimization is off topic on this site.
the default. avatar
id flag
This is not a cryptographic attack, but rewriting the code so that it doesn't recalculate `deltas` every time and reuses the results of past computations when incrementing x allowed it to process the same 100 million chunks in 2.3s on a far more modest Ryzen 2500U 8-thread CPU: https://pastebin.com/UuDpVPQg (compile with `-fopenmp` for multihtreading). Porting this back to OpenCL should make it fast enough for most practical purposes.
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.