Score:3

How exactly do I use the Index of Coincidence on a ciphertext?


I came across the following ciphertext:

        KCCPKBGUFDPHQTYAVINRRTMVGRKDNBVFDETDGILTXRGUD
        DKOTFMBPVGEGLTGCKQRACQCWDNAWCRXIZAKFTLEWRPTYC
        QKYVXCHKFTPONCQQRHJVAJUWETMCMSPKQDYHJVDAHCTRL
        SVSKCGCZQQDZXGSFRLSWCWSJTBHAFSIASPRJAHKJRJUMV
        GKMITZHFPDISPZLVLGWTFPLKKEBDPGCEBSHCTJRWXBAFS
        PEZQNRWXCVYCGAONWDDKACKAWBBIKFTIOVKCGGHJVLNHI
        FFSQESVYCLACNVRWBBIREPBBVFEXOSCDYGZWPFDTKFQIY
        CWHJVLNHIQIBTKHJVNPIST

However, when I tried to apply the index of coincidence (IC), I couldn't understand why one would use this table:

        Column  2       3       4       5       6       7       8
        1       0.044   0.064   0.049   0.057   0.079   0.050   0.056
        2       0.0524  0.056   0.054   0.057   0.097   0.062   0.062
        3               0.057   0.049   0.048   0.066   0.063   0.057
        4                       0.060   0.049   0.082   0.061   0.063
        5                               0.057   0.060   0.064   0.062
        6                                       0.090   0.064   0.068
        7                                               0.061   0.063
        8                                                       0.077

given that we already know HJV occurs 5 times in the ciphertext, with intervals of 18, 138, 54 and 12 between successive occurrences. Even assuming the key length is 6, I still don't understand the table above.

Score:2

The index of coincidence is a measure of how different a collection of letters is from a random set based on repetitions. For an alphabet of 26 letters a completely random collection of letters will have index of coincidence about $1/26\approx 0.038$, for English language text the index of coincidence is about 0.067 (some sources don't normalise the index and will instead use values of 1 for random and about 1.73 for English).
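As a sketch of the normalised definition described above (the symbols $n_i$ for the number of occurrences of the $i$-th letter and $N$ for the total number of letters are notation introduced here for illustration), the sample index is the number of pairs of identical letters divided by the number of all pairs:

$$\mathrm{IC} = \frac{\sum_{i=1}^{26} \binom{n_i}{2}}{\binom{N}{2}} = \frac{\sum_{i=1}^{26} n_i(n_i-1)}{N(N-1)}$$

Multiplying this by 26 gives the un-normalised variant, which is where the values of 1 for random text and about 1.73 for English come from.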

If we look at the table, everything looks a bit higher than random, but the sixth column is full of values that are similar to or greater than the index of coincidence for English (the text may be contrived to help the cryptanalysis). The table was generated as follows: each column heading gives the number of columns into which the text is divided. Then, looking down each column in turn, the coincidences between pairs of letters are used to compute the index. So, for example, to create the sixth column of the table, we write

KCCPKB
GUFDPH
QTYAVI
NRRTMV
GRKDNB
VFDETD
GILTXR
GUDDKO
TFMBPV
GEGLTG
CKQRAC
QCWDNA
WCRXIZ
AKFTLE
WRPTYC
QKYVXC
HKFTPO
NCQQRH
JVAJUW
ETMCMS
PKQDYH
JVDAHC
TRLSVS
KCGCZQ
QDZXGS
FRLSWC
WSJTBH
AFSIAS
PRJAHK
JRJUMV
GKMITZ
HFPDIS
PZLVLG
WTFPLK
KEBDPG
CEBSHC
TJRWXB
AFSPEZ
QNRWXC
VYCGAO
NWDDKA
CKAWBB
IKFTIO
VKCGGH
JVLNHI
FFSQES
VYCLAC
NVRWBB
IREPBB
VFEXOS
CDYGZW
PFDTKF
QIYCWH
JVLNHI
QIBTKH
JVNPIS
T

and to compute the entry in the sixth column, second row of the table we go down the second column, finding the adjacent repeats RR, CC, KK, RR, EE, KK, KK, which is considerably more than we would expect for a list of 56 letters (where on average we would expect about 2.15 repeats for a random collection). Similarly, counting repeats at distance 2, 3, etc., we can aggregate these into an estimate for the index of coincidence for each column.

For example, in column 2 we see 5 Cs, 2 Ds, 3 Es, 8 Fs, 3 Is, 1 J, 9 Ks, 1 N, 8 Rs, 1 S, 3 Ts, 2 Us, 6 Vs, 1 W, 2 Ys and 1 Z (and no other letters). This means that if we count repeated pairs, there are 10 pairs of Cs, 1 pair of Ds, 3 pairs of Es, 28 pairs of Fs, 3 pairs of Is, 36 pairs of Ks, 28 pairs of Rs, 3 pairs of Ts, 1 pair of Us, 15 pairs of Vs and 1 pair of Ys (and no other pairs), for a total of 129 repeated pairs out of 1540 possible pairs. Dividing 129 by 1540 gives a column sample index of 0.0838 (I'm not sure how the 0.097 in the table was calculated, but 0.0838 is still significantly higher than 1/26).

Similar calculations for the entries in the sixth column of the table give 0.0649, 0.0838, 0.0494, 0.0649, 0.0429, 0.0733. By contrast, when we divide the text into seven columns we get 0.0319, 0.0443, 0.0434, 0.0408, 0.0443, 0.0443, 0.0408, and with five columns we get 0.0439, 0.0443, 0.0325, 0.0353 and 0.0430. Column six clearly stands out.
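The pair-counting calculation above can be sketched in Python. This is a minimal illustration, not the method used to build the table in the question: the names `index_of_coincidence` and `column_ics` are made up for this example, and it assumes the ciphertext is split by taking every `period`-th letter.

```python
from collections import Counter

# Ciphertext from the question, with the line breaks removed.
CIPHERTEXT = (
    "KCCPKBGUFDPHQTYAVINRRTMVGRKDNBVFDETDGILTXRGUD"
    "DKOTFMBPVGEGLTGCKQRACQCWDNAWCRXIZAKFTLEWRPTYC"
    "QKYVXCHKFTPONCQQRHJVAJUWETMCMSPKQDYHJVDAHCTRL"
    "SVSKCGCZQQDZXGSFRLSWCWSJTBHAFSIASPRJAHKJRJUMV"
    "GKMITZHFPDISPZLVLGWTFPLKKEBDPGCEBSHCTJRWXBAFS"
    "PEZQNRWXCVYCGAONWDDKACKAWBBIKFTIOVKCGGHJVLNHI"
    "FFSQESVYCLACNVRWBBIREPBBVFEXOSCDYGZWPFDTKFQIY"
    "CWHJVLNHIQIBTKHJVNPIST"
)


def index_of_coincidence(letters):
    """Pairs of identical letters divided by the number of all pairs."""
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    repeated_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    return repeated_pairs / (n * (n - 1) // 2)


def column_ics(text, period):
    """Index of coincidence of each column when text is written in `period` columns."""
    return [index_of_coincidence(text[i::period]) for i in range(period)]


# Compare a few candidate key lengths.
for period in (5, 6, 7):
    print(period, [round(ic, 4) for ic in column_ics(CIPHERTEXT, period)])
```

For `period = 6`, the second entry is 129/1540 ≈ 0.0838, which agrees with the hand count for column 2.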

The other columns of the table come out higher than random because of less pronounced effects of the Vigenère cipher, but the sixth column stands out. This tells us that the key length is likely to be 6. We can check this further by taking histogram counts of the six letter columns written out above and seeing that they look like shifts of the same alphabet.
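The histogram check can be sketched as follows. This is only an illustration under the assumption that the key length is 6; `column_histograms` and `print_histogram` are hypothetical helper names, and judging whether two histograms are shifts of each other is left to the eye.

```python
from collections import Counter
import string

# Ciphertext from the question, with the line breaks removed.
CIPHERTEXT = (
    "KCCPKBGUFDPHQTYAVINRRTMVGRKDNBVFDETDGILTXRGUD"
    "DKOTFMBPVGEGLTGCKQRACQCWDNAWCRXIZAKFTLEWRPTYC"
    "QKYVXCHKFTPONCQQRHJVAJUWETMCMSPKQDYHJVDAHCTRL"
    "SVSKCGCZQQDZXGSFRLSWCWSJTBHAFSIASPRJAHKJRJUMV"
    "GKMITZHFPDISPZLVLGWTFPLKKEBDPGCEBSHCTJRWXBAFS"
    "PEZQNRWXCVYCGAONWDDKACKAWBBIKFTIOVKCGGHJVLNHI"
    "FFSQESVYCLACNVRWBBIREPBBVFEXOSCDYGZWPFDTKFQIY"
    "CWHJVLNHIQIBTKHJVNPIST"
)


def column_histograms(text, period):
    """Letter counts for each column when text is written in `period` columns."""
    return [Counter(text[i::period]) for i in range(period)]


def print_histogram(counts):
    """Print one bar per letter A..Z so that shifted shapes are easy to spot."""
    for letter in string.ascii_uppercase:
        print(f"{letter} {counts[letter]:2d} {'#' * counts[letter]}")


# Assumed key length of 6, as suggested by the index of coincidence.
for number, counts in enumerate(column_histograms(CIPHERTEXT, 6), start=1):
    print(f"Column {number}")
    print_histogram(counts)
    print()
```

Each column was enciphered with a single key letter, so if the key length is right, every histogram should have the same overall shape as English letter frequencies, just rotated by a different amount.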

It is possible to extend the table beyond 8 columns, but we find ourselves dealing with shorter collections of letters to compute our index over. There are more powerful statistical tests that can be used on the collections of letters, but the index of coincidence is quite easy to compute by hand and eye and so was popular with manual cryptanalysts.

João Víctor Melo
Can you show explicitly the data and the calculation you are working with?
Daniel S
Could you tell me where the table in your question comes from? I'd like to relate it to the method used there.
João Víctor Melo
https://www.cise.ufl.edu/~mssz/Class-Crypto-I/Homework/Homework-1.html
João Víctor Melo
What do you mean by repeated pair?
Daniel S
A repeated pair is a pair of identical letters in the same column.
João Víctor Melo
You said there were 5 Cs, and then you say there are 10 pairs of Cs?
Daniel S
Yes, because there are $\binom{5}{2}=10$ ways to choose 2 things out of 5.