Score:17

Examples of frauds discovered because someone tried to mimic a random sequence

[Moderator note: this question now lives there]

So, I'm preparing a talk about the well-known fact that humans are bad at generating uniformly random sequences of numbers when asked to do so, which is a huge flaw for simple cryptographic systems.

I would like to spice up the talk a bit by presenting some real cases where, say, tax fraud or bad science was revealed by a simple frequency analysis of the compromised data. For example, a case where a scientist displaced some data points to better fit a specific conclusion and was discovered through an analysis of the end digits of the values he manipulated. Or perhaps someone working for a bank who changed a few small numbers here and there to get money flowing to his account and was then discovered by a similarly simple analysis.

In short, I would like easy-to-explain examples of people being caught in some fraud because they thought they were able to emulate random numbers by themselves.

I don't know if this is the correct place to ask this kind of question, but the Physics and Law stacks seemed less related.

Geoffroy Couteau
Nothing comes to mind that fits perfectly with what you ask, but there are several nice stories involving WW2 or the Cold War where spies were identified or important communications decrypted because of some reuse of one-time pads. Would that fit your storyline?
Swike
@GeoffroyCouteau It certainly would be nice to have some of those indeed
Swike
I've found a story that might fit my question: https://www.nytimes.com/2013/04/28/magazine/diederik-stapels-audacious-academic-fraud.html The problem is that I don't know if he was caught because of that particular time he fabricated data by generating "random" numbers by himself.
Vilx-
Well... not exactly, but messing with random number generators and subsequent analysis have been used to expose some cheaters in computer game speedrunning competitions. In particular, look up Karl Jobst on YouTube. I know of at least 3 videos so far: 2 were about the Dream drama back in 2020 (one condemning Dream and one somewhat exonerating him later, when it became clear that it was something of an accident), and another one 13 days ago about a new cheater who was freshly caught.
FoundABetterName
How about this? https://www.youtube.com/watch?v=Qe5WT22-AO8
D.W.
https://en.wikipedia.org/wiki/Benford%27s_law#Applications
D.W.
I’m voting to close this question because it is not a question about cryptography within the scope of the site.
Swike
@D.W. Where, then, should I ask a question about Shannon entropy and people trying to deceive by badly sampling the uniform distribution? Stack Statistics apparently was not the place, nor was the computer theory stack.
D.W.
Not every question has a home somewhere on the Stack Exchange network. Even if it is not on-topic anywhere else on Stack Exchange, that doesn't make it on-topic here.
Flydog57
Look up the Wikipedia article on the Montreal Casino (https://en.m.wikipedia.org/wiki/Montreal_Casino), then scroll down to _"Keno Scandal"_. It turns out that running a casino using `Rand` is a bad idea. I've used it many times as an example of the importance of a cryptographically strong RNG.
gnasher729
There was the Egyptian Covid "scientist" not long ago who forged the data for his conclusion that horse manure or something similar can be used to treat Covid. He was found out purely by statistical analysis: he swapped the results for vaccination and horse dewormer, then made 7 copies of the data, filling in names and birthdates "randomly", including people born on the 31st of June.
Paul Uszak
@D.W. Hi: Have you noticed how comments criticizing censorship on this forum seem to get deleted? Very Musk...
Score:13

Following my comment, and even though it's a bit different from what you ask: I really enjoyed the story here, where the use of an incorrect pseudorandom number generator led to the arrest of members of a "Russian espionage network". Disclaimer: I have no idea how truthful all of this is; I'm leaving you the task of checking how serious the book "Compromised" is (that's the book this blog post is based upon; I did not read it).

Roughly, a very stupid PRG was apparently used to create "filler transmission", in order to hide when an actual transmission was happening. The trouble is, this broken PRG was producing decimal digits from 0 to 8 - i.e., you never had any 9! This observation allowed the FBI to distinguish transmissions from fillers, which opened the door to a traffic analysis. Eventually, they identified that communications were happening periodically for a fixed duration, always at the same time slot, and correlated that with the schedule of suspects (e.g. observing that they were never out of their home when a communication was happening), which led to their arrest.
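The missing-digit observation is easy to reproduce. What follows is only a minimal sketch, not the method actually used in the story: `filler` simulates a hypothetical broken generator that emits only 0 to 8, `traffic` stands in for properly random digits, and 27.88 is the 0.1% critical value of a chi-square distribution with 9 degrees of freedom.

```python
import random
from collections import Counter

def digit_counts(stream):
    """Return how often each decimal digit '0'..'9' occurs in a string."""
    counts = Counter(ch for ch in stream if ch.isdigit())
    return {d: counts.get(d, 0) for d in "0123456789"}

def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal frequencies for all ten digits."""
    expected = sum(counts.values()) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts.values())

rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Hypothetical broken PRG: digits 0..8 only, never a 9
filler = "".join(str(rng.randrange(9)) for _ in range(2000))
# Stand-in for real traffic: all ten digits equally likely
traffic = "".join(str(rng.randrange(10)) for _ in range(2000))

CRITICAL_0_1_PERCENT = 27.88  # chi-square, 9 degrees of freedom, p = 0.001

for name, stream in (("filler", filler), ("traffic", traffic)):
    stat = chi_square_uniform(digit_counts(stream))
    verdict = "not uniform" if stat > CRITICAL_0_1_PERCENT else "consistent with uniform"
    print(f"{name}: chi-square = {stat:.1f} ({verdict})")
```

With a digit missing entirely, the statistic is far above the threshold even for short streams, which is why a plain frequency count is enough to tell the two kinds of transmission apart.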

There are also a lot of documents on the VENONA project, which was dedicated to decrypting encrypted transmissions where a one-time pad was reused. You'll find plenty of resources on this one online.

Score:12

Actually, bank and expenses fraud is identified for the very opposite reason.

Many numbers that arise in human society begin with a small digit, with 1 appearing as the leading significant digit about 30% of the time. If the nine possible leading digits were equally likely, it would appear only about one third as often (11.1%).

This phenomenon is called Benford's law and produces the following distribution for the first three digits of many human-made numbers:

[Image: Benford's law distribution for the first three digits]
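For reference, the leading-digit probabilities follow P(d) = log10(1 + 1/d), and later digit positions are obtained by summing over all possible preceding digits. Here is a small sketch that computes them; the function name `benford_digit` is just an illustrative choice.

```python
import math

def benford_digit(position, d):
    """Benford probability that the digit at `position` (1 = leading) equals d."""
    if position == 1:
        # A leading significant digit is never 0
        return 0.0 if d == 0 else math.log10(1 + 1 / d)
    # Sum over every possible prefix k made of the preceding position-1 digits
    lo, hi = 10 ** (position - 2), 10 ** (position - 1)
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(lo, hi))

for position in (1, 2, 3):
    row = "  ".join(f"{d}:{benford_digit(position, d):6.2%}" for d in range(10))
    print(f"digit {position}  {row}")
```

The skew fades quickly: by the third digit the probabilities are already close to a flat 10% each, which is why checks on trailing digits usually assume uniformity instead.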

So I guess that in the generalized case, humans are poor at not just generating uniformly distributed random numbers, but poor at generating any specific distribution.


I haven't done this, but it would be interesting to manually generate a sequence according to a Normal distribution and then test it for normality.

Swike
That is surely interesting, but my question in particular is about uniform distributions. I agree that Benford's law shows up in leading digits, but what about frauds that involve changing not the leading digits but the last two? I would like examples like that.
quarague
Benford's law doesn't depend on the numbers being human made, it occurs due to some sort of scaling invariance. If for example you measure the typical weight of various different animals and look at the digit distribution of the first significant digit, this distribution will be independent of whether you measure in kilograms or pounds or ounces or whatever. This invariance gives rise to Benford's law. No human interaction needed.
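The unit-independence claim can be checked numerically. This is a sketch under a stated assumption: the "weights" are hypothetical values spread evenly on a logarithmic scale over six orders of magnitude, which is one simple way to get scale-invariant data.

```python
import math
import random
from collections import Counter

def leading_digit(x):
    """First significant digit of a positive number, read off scientific notation."""
    return int(f"{x:e}"[0])

def leading_digit_freqs(values):
    """Relative frequency of each leading digit 1..9 in a list of positive numbers."""
    counts = Counter(leading_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}

rng = random.Random(1)  # fixed seed for reproducibility
# Hypothetical data: log-uniform weights between 0.001 kg and 1000 kg
kilograms = [10 ** rng.uniform(-3, 3) for _ in range(20000)]
pounds = [kg * 2.20462 for kg in kilograms]  # same animals, different unit

kg_freqs = leading_digit_freqs(kilograms)
lb_freqs = leading_digit_freqs(pounds)
for d in range(1, 10):
    benford = math.log10(1 + 1 / d)
    print(f"{d}: kg {kg_freqs[d]:.3f}  lb {lb_freqs[d]:.3f}  Benford {benford:.3f}")
```

Both columns track the Benford values closely, even though every individual number changed when the unit did.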
Paul Uszak
@quarague Really? What if you measured in octal rather than decimal?
Neil Slater
@PaulUszak Then the distribution of leading digits will be similar to Benford's law in decimal, but with different percentages. Taken to the extreme, in binary Benford's law is not so useful - https://www.linkedin.com/pulse/benfords-law-binary-data-daniel-mccarville/ You can rephrase Benford's law as e.g. a match to some lognormal distribution. The interaction with leading digits is a consequence of that distribution.
Mark
@Swike, people are actually pretty good at generating uniform random distributions in the sense that each digit shows up equally often (the errors tend to be second-order, where things like sequences are far less common than they should be). This means that most fraud is spotted when a uniform distribution is seen where something like a Benford or Zipf distribution would be expected.
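One second-order check of this kind is the rate of immediate repeats: in truly uniform decimal digits, a digit equals its predecessor about 10% of the time. The sketch below uses a hypothetical `humanlike` generator that simply never repeats the previous digit; it is an assumption made for illustration, not a model of real human behaviour.

```python
import random
from collections import Counter

def repeat_rate(digits):
    """Fraction of adjacent pairs in which a digit is immediately repeated."""
    pairs = list(zip(digits, digits[1:]))
    return sum(a == b for a, b in pairs) / len(pairs)

rng = random.Random(2)  # fixed seed for reproducibility
uniform = [rng.randrange(10) for _ in range(5000)]

# Hypothetical "human-like" source: uniform, except it never repeats a digit
humanlike = [rng.randrange(10)]
while len(humanlike) < 5000:
    d = rng.randrange(10)
    if d != humanlike[-1]:
        humanlike.append(d)

for name, seq in (("uniform", uniform), ("humanlike", humanlike)):
    freqs = Counter(seq)
    spread = (min(freqs.values()) / len(seq), max(freqs.values()) / len(seq))
    print(f"{name}: digit share {spread[0]:.3f}..{spread[1]:.3f}, "
          f"repeat rate {repeat_rate(seq):.3f}")
```

Both sequences pass a first-order frequency count, but the repeat rate separates them immediately.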
Grooke
@Swike The book Humble Pi by Matt Parker has a chapter on errors due to randomness (Chapter 12). In particular there is an example of a restaurant that made up their daily sales totals. The lead digits did not match Benford's law, and the trailing 2 digits (which should be uniformly distributed) had a large spike at "40".
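A trailing-digit check of this kind takes only a few lines. The sketch below runs on a made-up ledger, not the data from the book: 1000 hypothetical daily totals, about 15% of which are forced to end in 40.

```python
import random
from collections import Counter

def trailing_pair_counts(totals):
    """Count the last two digits of each whole-number total, as strings '00'..'99'."""
    return Counter(f"{t % 100:02d}" for t in totals)

def biggest_spike(counts):
    """Most common ending, and how many times its uniform share (1/100) it occurs."""
    expected = sum(counts.values()) / 100
    ending, count = counts.most_common(1)[0]
    return ending, count / expected

rng = random.Random(3)  # fixed seed for reproducibility
totals = []
for _ in range(1000):
    t = rng.randrange(1000, 10000)  # hypothetical daily total
    if rng.random() < 0.15:         # hypothetical habit: make it end in 40
        t = t - t % 100 + 40
    totals.append(t)

ending, ratio = biggest_spike(trailing_pair_counts(totals))
print(f"most common ending: {ending}, {ratio:.1f}x its expected share")
```

In honest data no ending should appear much more often than 1% of the time, so a single ending at many times its share stands out without any formal test.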
Score:12

Since you mentioned you're also interested in crypto examples, here's one from probably the most famous cipher machine in history: Enigma.

Most of Enigma's key settings were distributed in a keysheet and changed daily, but there was also a per-message key that had to be randomly chosen by the operator, corresponding to the initial rotor positions: [Image: Enigma machine] (three white squares in the middle of this picture, where you can just about make out the settings VGT; these actual positions changed after every letter, but that's not relevant here)

To transmit these, the sender had to create six random letters and send the first three (Grundstellung) in plain, then the next three (Spruchschlüssel) encrypted using the settings of the Grundstellung. The Spruchschlüssel would become the actual initial settings; it wasn't particularly useful without the daily key settings but could be used as a first step in cracking. It was therefore very important that it was not predictable (like AAA or your initials), not related to the Grundstellung (the same, or two halves of a 6-letter word), and that neither part was reused between messages.

Obviously, all three of these things happened a lot, and while it wasn't the biggest flaw in the system, it certainly helped crack some messages faster.

More details: https://www.bbvaopenmind.com/en/technology/innovation/the-human-errors-that-defeated-enigma/

Tangurena
The most common 6 letter words chosen were London, Berlin and Hitler. Those were bad enough that Memos! Were! Sent! The folks at Bletchley Park were happy.
Score:9
ttw

Allan Franklin wrote about this in "The Mendel-Fisher Controversy." As practical tests of results against theoretical distributions became better, Sir R. A. Fisher noted that Mendel's results were "too good" given the variation that would be expected in a real experiment. At the time Mendel did his work, not as many goodness-of-fit tests were available, and the idea of fitting "too well" wasn't so clear.

Score:2

Not necessarily discovering fraud but causing it: in the early days of online casino games, some players were able to predict slot outcomes and poker cards because they knew the logic of the random number generator being used.
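To illustrate why knowing the generator logic is enough, here is a toy sketch with a linear congruential generator. `ToyLCG` and its textbook constants are assumptions chosen for illustration, not the generator any real casino used. Because the output is the full internal state, one observed value determines every later one.

```python
class ToyLCG:
    """Toy linear congruential generator; illustrative constants, not a real casino RNG."""
    A, C, M = 1664525, 1013904223, 2 ** 32

    def __init__(self, seed):
        self.state = seed % self.M

    def next(self):
        # state_{n+1} = (A * state_n + C) mod M, and the state itself is the output
        self.state = (self.A * self.state + self.C) % self.M
        return self.state

def predict_next(observed, count):
    """Given one observed output, replay the recurrence to predict the next `count`."""
    outputs, state = [], observed
    for _ in range(count):
        state = (ToyLCG.A * state + ToyLCG.C) % ToyLCG.M
        outputs.append(state)
    return outputs

casino = ToyLCG(seed=12345)         # the seed is unknown to the player
seen = casino.next()                # the player observes a single output
guesses = predict_next(seen, 5)
actual = [casino.next() for _ in range(5)]
print([g % 52 for g in guesses])    # predicted card indices 0..51
print(guesses == actual)
```

A cryptographically strong generator is designed so that observed outputs do not reveal the internal state, which is exactly the property this toy lacks.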
