Score:0

Decrypting Mono-Alphabetic Substitution Ciphertext

pr flag

I am reading Introduction to Modern Cryptography 3rd Edition (Google Books Preview of Relevant section, pages 10-11) and am struggling to understand the description of an attack method on a monoalphabetic substitution cipher.

It seems to be an improved version of a letter-frequency analysis approach, eliminating the need to "check what makes sense" manually. Some given information:

  1. All letters of the English language are assigned a value 0-25
  2. An unknown key, used to encode the plaintext to ciphertext, is made of a random, 1:1 mapping of letters from (1). e.g. (a -> x, b -> r, c -> o, ...)

The authors describe this approach as follows:

We now describe an attack that does not suffer from these drawbacks [of the manual frequency analysis approach]. As before, associate the letters of the English alphabet with $0, ..., 25$. Let $p_i$, with $0 <= p <= 1$, denote the frequency of the $ith$ letter in the normal English tex (i.e. 0.082 as an example). ... [using these figures] gives:

$(1)$ $\displaystyle \sum_{i=0}^{25} p_i^2 \approx 0.065$

Note: Where $0.065$ is contextual to a corpus of a given English text (i.e. may vary based on the source; e.g. Webster's dictionary words.)

Now, say we are given some ciphertext and let $q_i$ denote the frequency of the ith letter of the alphabet in the ciphertext divided by the length of the ciphertext. If the key is k, then $p_i$ should be roughly equal to $q_{i+k}$ for all i because the ith letter is mapped to the $\left(i+k\right)th$ letter. (we use $i+k$ instead of the more cumbersome $\left[I+k \mod 26\right]$.) Thus, if we compute

$(2)$ $I_j := \displaystyle \sum_{i=0}^{25} p_i \cdot q_{i+j}$

for each value of $j \in \left \{0, ..., 25\right\}$, then we expect to find that $I_k \approx 0.065$ (where k is the actual key), whereas $I_j$ for $j \neq k$ will be different from 0.065. This leads to a key-recovery attack that is easy to automate: compute $I_j$ for all $j$, and the output the value $k$ for which $I_k$ is closest to 0.065.

My question relates to the general approach of the second summation $(2)$ -- I don't understand what the value of $j$ is for each term of the summation. Is $j$ meant to be the ith term of a key $k$? Such that each term in $2$ would be calculated for each possible value of $j$?

I understand the frequency generation, understand what's going on the concept of $(1)$ and that finding the closest $q_i$ to $p_i$ would approach $p_i^2$ whereby the actual $k_i$ would minimize that -- thus finding the most appropriate mapping.

However, it appears this is just a brute-force approach given how the final calculation is being compared to the initial $0.065$ value, as though the proper mapping for each letter would have been taken into account.

Am I missing something? Otherwise I feel as though just sorting each frequency distribution by value, assuming the presence of an equal number of letters in the ciphertext to that of the source message space (i.e. no possible gaps in the distribution).

Score:2
ru flag

The section is not describing an attack on a general monoalphabetic substitution cipher, but on a shift cipher (Caesar cipher) where each letter is cyclicly advanced by the same amount through the alphabet.

The $j$ value represents a choice of one of the 26 possible values of the shift key i.e. the key which would map the $i$th letter to the $(i+j)$th letter mod 26. This mapping would hold for all $i$ as we're talking about a shift cipher.

For a general substitution cipher the key would be a permutation $\pi(x)$ and the equivalent approach would be to evaluate all 26! possible choice of $\pi(x)$ according to $$I_\pi:=\sum_{i=0}^{25}p_iq_{\pi(i)}.$$ This not an easy calculation, and a lot of ciphertext would be necessary for the correct $q_{\pi(i)}$ values to standout from other candidates.

alphazwest avatar
pr flag
You are 100% correct. The arrangement of the text is what confused me -- they presented the shift cipher, then the substitution, frequency-based substitution solving, *then* "An improved attack on the shift cipher." It makes complete sense in the context of a shift cipher.
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.