I am reading *Introduction to Modern Cryptography*, 3rd edition (Google Books preview of the relevant section, pages 10-11), and am struggling to understand the description of an attack method on a monoalphabetic substitution cipher.
It seems to be an improved version of a letter-frequency analysis approach, eliminating the need to "check what makes sense" manually. Some given information:
- All letters of the English alphabet are assigned a value from 0 to 25
- An unknown key, used to encrypt the plaintext to ciphertext, consists of a random one-to-one mapping of the letters above, e.g. (a -> x, b -> r, c -> o, ...)
The authors describe this approach as follows:
We now describe an attack that does not suffer from these drawbacks [of the manual frequency-analysis approach]. As before, associate the letters of the English alphabet with $0, \ldots, 25$. Let $p_i$, with $0 \le p_i \le 1$, denote the frequency of the $i$th letter in normal English text (e.g. $p_0 \approx 0.082$). ... [Using these figures] gives:
$(1)$ $\displaystyle \sum_{i=0}^{25} p_i^2 \approx 0.065$
Note: the value $0.065$ depends on the corpus of English text used to estimate the frequencies (i.e. it may vary based on the source, e.g. Webster's dictionary words).
Now, say we are given some ciphertext, and let $q_i$ denote the frequency of the $i$th letter of the alphabet in the ciphertext, i.e. the number of occurrences of the $i$th letter divided by the length of the ciphertext. If the key is $k$, then $p_i$ should be roughly equal to $q_{i+k}$ for all $i$, because the $i$th letter is mapped to the $(i+k)$th letter. (We write $i+k$ instead of the more cumbersome $\left[i+k \bmod 26\right]$.)
$(2)$ $I_j := \displaystyle \sum_{i=0}^{25} p_i \cdot q_{i+j}$
for each value of $j \in \left\{0, \ldots, 25\right\}$, then we expect to find that $I_k \approx 0.065$ (where $k$ is the actual key), whereas $I_j$ for $j \neq k$ will be different from $0.065$. This leads to a key-recovery attack that is easy to automate: compute $I_j$ for all $j$, and then output the value $k$ for which $I_k$ is closest to $0.065$.
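To make sure I'm reading the mechanics right, here is a minimal sketch of the automated attack in Python, under my own assumptions: the `FREQ` table uses commonly cited approximate letter frequencies (not necessarily the book's exact figures), and `recover_shift_key` is a name I made up for illustration:

```python
# Approximate English letter frequencies p_0..p_25 for 'a'..'z'.
# Commonly cited values, an assumption on my part; the book's table
# (and any given corpus) may differ slightly.
FREQ = [0.082, 0.015, 0.028, 0.043, 0.127, 0.022, 0.020, 0.061, 0.070,
        0.0015, 0.0077, 0.040, 0.024, 0.067, 0.075, 0.019, 0.00095, 0.060,
        0.063, 0.091, 0.028, 0.0098, 0.024, 0.0015, 0.020, 0.00074]

# Sanity check of equation (1): the sum of squared frequencies is ~0.065.
assert abs(sum(p * p for p in FREQ) - 0.065) < 0.001

def recover_shift_key(ciphertext: str) -> int:
    """Return the shift j whose I_j from equation (2) is closest to 0.065."""
    letters = [ord(c) - ord('a') for c in ciphertext.lower() if c.isalpha()]
    n = len(letters)
    # q[i]: observed frequency of the ith letter in the ciphertext.
    q = [letters.count(i) / n for i in range(26)]
    # I_vals[j] = sum_i p_i * q_{(i+j) mod 26}, one score per candidate key j.
    I_vals = [sum(FREQ[i] * q[(i + j) % 26] for i in range(26))
              for j in range(26)]
    return min(range(26), key=lambda j: abs(I_vals[j] - 0.065))

# Quick demo: shift-encrypt a sample with key 7, then recover the key.
plaintext = "frequency analysis works better on longer english text samples"
cipher = "".join(chr((ord(c) - ord('a') + 7) % 26 + ord('a'))
                 for c in plaintext if c.isalpha())
print(recover_shift_key(cipher))  # likely prints 7 (short samples are noisy)
```

The demo should recover the shift of 7 on a reasonably English-like sample, though very short ciphertexts can produce noisy $q_i$ estimates.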
My question relates to the general approach of the second summation $(2)$: I don't understand what the value of $j$ is for each term of the summation. Is $j$ meant to be the $i$th term of a key $k$, such that each term in $(2)$ would be calculated for each possible value of $j$?
I understand the frequency counting, and I understand the concept behind $(1)$: aligning each $q$ value with its matching $p_i$ makes each product approach $p_i^2$, so the actual key $k$ is the one whose sum comes closest to $0.065$, thus identifying the most appropriate mapping.
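In other words, as I read it, for the correct key $j = k$ we have $q_{i+k} \approx p_i$ for every $i$, so $\displaystyle I_k = \sum_{i=0}^{25} p_i \cdot q_{i+k} \approx \sum_{i=0}^{25} p_i^2 \approx 0.065$.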
However, it appears this is just a brute-force approach, given that the final calculation for each candidate is compared against the initial $0.065$ value, as though the proper mapping for each letter had already been taken into account.
Am I missing something? Otherwise, I feel as though simply sorting each frequency distribution by value and matching ranks would accomplish the same thing, assuming the ciphertext contains the same set of letters as the source message space (i.e. no gaps in the distribution).
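For comparison, this is the rank-matching alternative I have in mind (my own rough sketch, reusing the hypothetical `FREQ` table from the earlier snippet; `rank_match_mapping` is a made-up name):

```python
from collections import Counter

def rank_match_mapping(ciphertext: str) -> dict[str, str]:
    """Map each ciphertext letter to a plaintext letter by frequency rank."""
    counts = Counter(c for c in ciphertext.lower() if c.isalpha())
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    # Ciphertext letters sorted by observed frequency, most frequent first
    # (letters absent from the ciphertext count as 0 and sort last).
    cipher_rank = sorted(alphabet, key=lambda c: -counts[c])
    # English letters sorted by expected frequency, most frequent first.
    plain_rank = sorted(alphabet, key=lambda c: -FREQ[ord(c) - ord('a')])
    return dict(zip(cipher_rank, plain_rank))
```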