Score:6

Why is mod calculation necessary in a one time pad encryption?

de flag

Considering the English alphabets to be encrypted why is the computation of mod 26 necessary after adding the pad to plain text. Is it just that it adds to another level of encryption or is it used so that two letters may end up with the same symbol in the cypher? 3 mod 26 is 3 and 29 mod 26 is 3.

fgrieu avatar
ng flag
Hints: (1) with mod used, and then not used, ask yourself the following: what is the highest possible ciphertext value? When would it occur? Would an adversary noticing that value in the ciphertext learn something about the plaintext? (2) Dig deeper: assume plaintext consisting of random independent characters, and a random pad: how much information (in bit or fraction thereof per plaintext character) is learned from the plaintext with mod used, and not used?
jjj avatar
cn flag
jjj
No two letters end up with the same symbol. Think of it like the alphabet repeats after 25. So 26 would be "a" again. 3 and 29 correspont to the same letter. Mod just makes the number smaller, so you only need 0 to 25
Joshua avatar
cn flag
Am I the only one whose first use of One Time Pad used five digit numbers with the numbers being nowhere close to either end? If it was random Gaussian rather than random uniform it leaks very little information.
Paul Uszak avatar
cn flag
@Joshua No, you can't have a Gaussian OTP key. Unless you're missing the randomness extractor from your TRNG
Score:21
fr flag

There are two main reasons.

First, when we encrypt data with a symmetric algorithm, we generally want each unit to encrypt or decrypt to a unit of the same size (ignoring padding and MACs). In your case, when we're using English letters, we'd want to also get English letters out, and not a set of random numbers. Similarly, when we're encrypting a byte, we also want to get a byte out, since computers usually work with bytes and it's most convenient to process them that way.

Secondly, and more important, not using modular arithmetic here leaks information, sometimes a lot of information, about the data. For example, if we're using the range 0-25 to represent our letters, if we see a 0 as the encrypted output, we know that both the pad and the input were 0, and if we see 50, we know that both the pad and the input were 25. Similarly, 49 tells us that the two numbers involved were 24 and 25 in some order. With that type of information and statistical analysis, we can probably decrypt the ciphertext.

However, if we used modular arithmetic, then the output value doesn't teach us anything about the pad or the input, since every output value is equally likely. If the pad is truly random and used only once, then it provides perfect confidentiality.

fgrieu avatar
ng flag
"we can probably decrypt the ciphertext": assuming English without space, I think that's going to be challenging. Studying when that can be done is actually an interesting problem!
jp flag
@fgrieu Challenging, but not necessarily impossible.
John Coleman avatar
jp flag
@user253751 With a random key stream I would say that a ciphertext only attack which succeeds in recovering the plaintext is typically impossible. What you can do is rule out some candidate plaintexts, so it definitely leaks information, but not generally enough for a complete break. For some shorter messages there might be enough information for a lucky guess, especially if there is some prior information about the likely content of the plaintext.
fgrieu avatar
ng flag
If the plaintext was random, I get we'd learn ≈0,7191 bit/symbol, which I think is not enough to reconstruct English even semi-reliably. How much we learn for a particular language plaintext depends on letter frequencies and the value assigned to each letter, but I think it remains significantly less than 1 bit/symbol, and that's still not much.
cn flag
Given that the question asks specifically about mod 26, I assume the message would be in plain english with punctuation and spaces not encrypted. This would be silly if you're trying for real security. You could create tables based on the likelihood of each letter given the value and take into account how often the letter occurs and what range is possible. You could do the same for words and constrain it with the possible and likely letters to come up with a likelihood for each word. Then you could constrain that with parsing the language, and any information you know about the message...
cn flag
For example say the original message is 'a-z' and without mod the values could be added to be 'A-Z' (i.e. 'e' is 4, adding 21 gives 'z', adding 22 gives 'A', etc.). If the ciphertext has a 'd', you know the letter has to be a, b, c, or d. If the ciphertext for a word is 'aB', you know that it is 'a' and some other letter from c to z. That could be 'at', 'an', 'as', but not 'aj', 'ao', 'ap' because those aren't words. 'as' is an advertb or conjunction and the 17th most popular english word, 'at' is the 20th and a preposition, 'an' is 32nd and an article. You could not decrypt the...
cn flag
message with 100% certainty in most cases, but you could write software to come up with the most *likely* decryptions without too much work.
John Coleman avatar
jp flag
@JasonGoemaat In classical cryptography, spaces and punctuation are *suppressed* rather than left unencrypted. Of course this is silly from the point of view of true security, but the unicity distance with a random key would still be close to the whole key length. You could come up with a notion of the most likely decryption, but given the astronomically large number of possible decryptions the probability in question could be well below 1%.
dan04 avatar
in flag
Well, to test this idea, here's a message where each letter is encoded A=0...Z=25 plus a pseudorandom “key” number between 0 and 25: 13-28-30-10-38-22-38-21-36-8-21-15-15-30-39-14-28-16-19-20-39-19-25-18-14-21-26-36-41-31-23-24-21-22-14-14-17-35-11-7-31-28-23-11-36-26-42-20-40-29-12-11-31-28-37.
John Coleman avatar
jp flag
@dan04 interesting idea. There are `37979562564556140050029561655702064479432532295680000000000000000000` possible decryptions, though the vast majority wouldn't correspond to sensible English. Looking at just the first three letters it is easy to find multiple 3-letter words which could encrypt that way (and of course it need not start with a 3-letter word). To make more serious progress I would construct a [DAWG](https://en.wikipedia.org/wiki/Deterministic_acyclic_finite_state_automaton) for a large word list and write a script for turning cipher-text initial sequences into paths through it.
jp flag
@dan04 I'm going to assume it's not "congratulations, dramatic illustration, milfhunter privileges" (yes "milfhunter" seriously appears in google's 10k word list). But "congratulations" is a very likely starting word given the context, and "dramatic-" is also reasonably likely. Long words are much more likely to be guessed correctly because there are more constraints on them. Of course, it could just be a coincidence that "congratulations" fits. I did not try smaller words such as "congratulations, you have" or "congratulations, this is"
dan04 avatar
in flag
@user253751: “Congratulations” is indeed the intended first word. Now you just need to find a combination of other English words that (after removing punctuation and spaces) matches the regex `[A-O][D-Z][A-Q][A-T][A-U][O-Z][A-T][A-Z][A-S][A-O][A-V][B-Z][L-Z][Q-Z][G-Z][A-X][A-Y][A-V][A-W][A-O][A-O][A-R][K-Z][A-L][A-H][G-Z][D-Z][A-X][A-L][L-Z][B-Z][R-Z][A-U][P-Z][E-Z][A-M][A-L][G-Z][D-Z][M-Z]`
jp flag
@dan04 "dramatic illustration, milfhunter privileges" is such a combination :P
John Coleman avatar
jp flag
@dan04 'congratulations: dictionaries provide many choices: hunt secret' is a possible decryption for which I have zero confidence. I wrote a script to find matches in segments of the cipher text using the 10K word list and was surprised at how many there were. Unless my script is buggy, there are nearly 2000 possibilities for the next word after 'congratulations'. Lucky guesses aside, this encryption method seems resistant to complete recovery of the plaintext (which of course doesn't mean it is any good).
Score:19
jp flag
Fax

Not using modulus leaks information.

For the English language, it's not so obvious. For an image, on the other hand...

what appears to be random noise on the left, and a noisy yet clearly identifiable tux on the right

Left half uses addition and modulus, right half uses addition and division by two (i.e. half brightness).

Paul Uszak avatar
cn flag
Not sure that young TUX is an appropriate example (re. OTPs), as uneven character values can't be divided into integers. So of course you've rounded your RBG values (integers). You really need a completely different algorithm/character/key mapping. Which then wouldn't be uniformly random, but rather a group of some sort.
dan04 avatar
in flag
@PaulUszak: To be technically correct, the output format would need to support RGB values in the range [0, 510], and use lossless compression. The image is just being approximated here because of technical restrictions. Point is, if you “encrypt” a raster image and get something that's easily recognizable to the human eye (albeit with random “noise”), then you have a really bad encryption algorithm.
Fax avatar
jp flag
Fax
@PaulUszak It's certainly possible to produce a 9-bit lossless image without division and rounding, but even if you have a monitor capable of displaying it, it won't change the conclusion.
Score:3
cn flag

It's not.

Encryption/decryption can be done in anyway you can think of. Maths isn't needed at all. All it represents is a 1:1 mapping (bijection) between the message and cipher text. The only important thing is that the key material is generated through a truly random (physical) process. It's just that either a modulo operation or XOR makes it simpler for a computer implementation.

This is encryption/decryption without maths using a reciprocal DIANA table:-

table

Also realise that there are more than 26 characters in cryptography. There are numbers, punctuation (very important as it can change the meaning of a sentence when declaring war on someone). Then there are enumerated codewords, like 73 (meaning something to someone). From the wiki page: " The JN-25 code used in World War II used a code book of 30,000 code groups superencrypted with 30,000 random additives. ". You might heretofore perform the mapping modulo 10, on a digit by digit basis.

Maarten Bodewes avatar
in flag
The question reads: "why is the computation of mod 26 necessary *after adding the pad to plain text*". Your post is certainly interesting, but it doesn't answer the question. The second part of your answer is also at odds with the assumption in the first line of the question: "*Considering the English alphabets* to be encrypted"
ilkkachu avatar
ws flag
and if you look into how the table is generated, you see e.g. under column `N` the two entries `Ma` and `Nz` next to each other. That tells of modular arithmetic.
Paul Uszak avatar
cn flag
@MaartenBodewes My answer is correct because the question is moot. We can do "A/9" $\to$ which only involves friendship, and no maths.
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.