Score:1

Analysis of the Vigenere cipher

eg flag

I'm just starting out learning some cryptanalysis techniques, and I came across a video that analyzes the Vigenère cipher. Essentially, the video explains that each letter of the alphabet has a standard probability of appearing in English text, which gives a probability density function (pdf) over the alphabet. The letters used to encrypt the message are called the key, and each key letter has the effect of shifting this pdf. The pdf for a given key letter is represented as a vector, e.g. the vector of pdf probabilities for key letter A.

Given pdfs generated from the same key letter and from different key letters, you calculate the probability of selecting the same letter from both. For example, with key_pdf=A and key2_pdf=H you find the probability that both selected letters are the same (key_pdf=A, selected_letter=d and key2_pdf=H, selected_letter=d), and likewise for key_pdf=A, selected_letter=d and key2_pdf=A, selected_letter=d. This probability is the dot product between the two pdf vectors: v1.v2 for different key letters and v1.v1 for the same key letter. It follows from the definition of the dot product that the probability of selecting the same letter is larger when the keys are the same than when they are different. Essentially this measures the probability of coincidence: selecting the same letter under the same key versus under different keys.

The ciphertext is then duplicated and shifted in order to count the number of columns where the pdfs are the same. The shift with the greatest number of matching density functions identifies the length of the key.

I have a few problems with the last part. Why does the shift in the duplicated ciphertext identify the key length? When two probability density functions are generated by the same key letter, the only way to get the same cipher letter from both is for the two original message letters to be the same.

e.g. message JONNYBIGWALK and key CAT (repeated as CATCATCATCAT)

CATCATCATCAT

JONNYBIGWALK

CATCATCATCAT

With no shift, the probability density functions match the most, which is seen from the matching keys, and the letters are also the same in each column.

JONNYBIGWALK

CATCATCATCAT

   JONNYBIGWALK

   CATCATCATCAT

Now the keys of the probability density functions match at a shift of 3, but the letters of the original message do not match. Fair enough, the cipher letters are not displayed here, and it is the cipher letters that should match, but the cipher letters are essentially derived by translating the message letter by the same key letter C. So we have N+C mod 26 and J+C mod 26, and N+C mod 26 != J+C mod 26. You can see that even when the probability density functions match because they were generated by the same key letter, neither the letters of the original message nor the letters of the ciphertext match. So how can shifting the duplicated ciphertext be used to identify the key length, when the method assumes the same letter appears in the same column after shifting? Often the letters do not match anyway: in the above example most of the letters do not match as we shift, even though the pdfs match at every shift of 3. And originally we are only given the ciphertext... It just doesn't seem robust to me. Is there anything I'm missing here?

Thanks for taking your time, really appreciate it!


ph flag
If I understand right, you want to run the incidence counts on the ciphertext and shifts of it. What is the ciphertext here? I think "JONNYBIGWALK" is your message and "CAT" is your key, right?
ThreadBucks avatar
eg flag
Yes, but I'm asking why correlating the shifted ciphertext with the ciphertext itself determines whether they have the same key
ph flag
Have you seen https://en.wikipedia.org/wiki/Index_of_coincidence ?
ThreadBucks avatar
eg flag
no, thanks for that though
Score:0
ph flag

The Wikipedia page for Index of Coincidence is a good start. To summarize, if you lay two texts from a natural language next to each other and count the rate at which the characters coincide, you will get (approximately) a particular value that varies from language to language (roughly 0.065 for English). If you encrypt both texts with the same monoalphabetic substitution cipher, you get the same value, because the same positions will coincide: if they match before encryption, they will match after. If you have two texts encrypted with different monoalphabetic substitution ciphers, you would expect the coincidence rate to be approximately random chance (1/26, or about 0.038, for a 26-letter alphabet).
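
As a rough illustration of those two rates, here is a minimal Python sketch of the dot-product calculation described in the question. It is only a sketch under stated assumptions: the frequency table is a commonly quoted approximation for English, and the names `cipher_pdf` and `match_probability` are made up for this example.

```python
import string

# Approximate English letter frequencies in percent for A..Z.
# Assumption: a commonly quoted table; exact values vary between sources.
ENGLISH_PERCENT = [
    8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966,
    0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987,
    6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074,
]
TOTAL = sum(ENGLISH_PERCENT)
PLAIN_PDF = [p / TOTAL for p in ENGLISH_PERCENT]  # normalised to sum to 1


def cipher_pdf(key_letter):
    """Distribution of ciphertext letters when the key letter is key_letter."""
    k = string.ascii_uppercase.index(key_letter)
    # Ciphertext letter c is produced by plaintext letter (c - k) mod 26.
    return [PLAIN_PDF[(c - k) % 26] for c in range(26)]


def match_probability(key1, key2):
    """Dot product of the two shifted pdfs: the chance that two independent
    ciphertext letters, one per key letter, turn out to be the same letter."""
    v1, v2 = cipher_pdf(key1), cipher_pdf(key2)
    return sum(a * b for a, b in zip(v1, v2))


print(f"same key letter (A, A):      {match_probability('A', 'A'):.4f}")
print(f"different key letters (A, H): {match_probability('A', 'H'):.4f}")
print(f"uniformly random letters:     {1 / 26:.4f}")
```

With this table the same-key value comes out near 0.065, and every pair of different key letters gives something smaller. Note that even in the same-key case the two letters match only about one time in fifteen.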

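The idea is that you can do this same calculation with only one ciphertext, if you shift it and lay it on top of itself. If the shift is a multiple of the key length, the characters in each position were encrypted with the same substitution, and so you would expect to see the higher coincidence rate. If the shift is not a multiple of the key length, the aligned characters will be uncorrelated and you would expect to see something closer to random (i.e. 1/26). Note that this is a statistical statement: even at the right shift most aligned letters will not match. The match rate only rises from about 4% to about 6.5%, so the effect shows up on a reasonably long ciphertext and not on a 12-letter example.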
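
To see the shift-and-compare step in action, here is a small Python sketch. It makes several assumptions that are not part of the original discussion: the "plaintext" is synthetic, drawn letter by letter from an approximate English frequency table (a stand-in for real text), the key is the question's CAT, and the function names are invented for the example.

```python
import random
import string

ALPHABET = string.ascii_uppercase

# Approximate English letter frequencies in percent for A..Z.
# Assumption: a commonly quoted table; exact values vary between sources.
ENGLISH_PERCENT = [
    8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966,
    0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987,
    6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074,
]


def synthetic_plaintext(n, seed=0):
    """n letters sampled independently with English-like frequencies.
    This is a stand-in for real text, not natural language."""
    rng = random.Random(seed)
    return "".join(rng.choices(ALPHABET, weights=ENGLISH_PERCENT, k=n))


def vigenere_encrypt(plaintext, key):
    """Add the repeating key to the plaintext letter by letter, mod 26."""
    out = []
    for i, ch in enumerate(plaintext):
        k = ALPHABET.index(key[i % len(key)])
        out.append(ALPHABET[(ALPHABET.index(ch) + k) % 26])
    return "".join(out)


def coincidence_rate(text, shift):
    """Fraction of positions where text matches itself shifted by `shift`."""
    pairs = len(text) - shift
    matches = sum(1 for i in range(pairs) if text[i] == text[i + shift])
    return matches / pairs


ciphertext = vigenere_encrypt(synthetic_plaintext(20000), "CAT")
for shift in range(1, 13):
    print(shift, round(coincidence_rate(ciphertext, shift), 4))
```

On a ciphertext of this length, the shifts 3, 6, 9 and 12 should stand out clearly above the rest. For the 12-letter example in the question, `vigenere_encrypt("JONNYBIGWALK", "CAT")` gives LOGPYUKGPCLD, and a shift of 3 produces no matching columns at all, which is consistent with the observation that such a short text shows nothing.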
