Im just starting out learning some cryptoanalysis techniques. I came across an idea which analyzes the vigenere cipher. Essentially the video explains that there is a standard english probability density function for each letter of the alphabet. And the letters used in the encryption of the message is called the key. And they have an effect of shifting the probability density function. The probabilities of each probability density function as a function of a letter key are represented using a vector e.g pdf probailities as a function of key letter A. Given pdf's generated from the same keys and different keys, calculate the probability of selecting letters which are the same. For example, key_pdf=A and key2_pdf=H, finding the probability of the letters being the same e.g key_pdf=A, selected_letter=d and key2_pdf=H, selected_letter=d key_pdf=A, selected_letter=d and key2_pdf=A, selected_letter=d. And that this is found from taking the dot product better the two pdf vectors of different letters and same letters. v1.v2 and v1.v1. It is found from the definition of the dot product that the probbability of selecting the same letter is larger when the keys are equivalent rather than different. Essentially measuring the probability of coincidence of selecting the same letter as a rsult of same keys or different key generation. The cipher text is then duplicated and shifted in order to determine the number of coloumns where the pdf's are the same. And the greatest number of the same desnity function identifies the length of the key.
I have a few problems with the last part. Why does the shift in the duplicated cipher text identify the key length? The only way to get the same selected cipher letter given two probbaility density functions generated from two keys which are the same is when both the original message letter are the same.
e.g
message and key
JONNYBIGWALK
CATCATCATCAT
JONNYBIGWALK
CATCATCATCAT
With no shift, the probability density functions match the most, which is seen from the matching keys and the letters are also equivalent for each column.
JONNYBIGWALK
CATCATCATCAT
JONNYBIGWALK
CATCATCATCAT
Now the probability density functions keys match on 3shifts but the letters of the original message do not match. Fair enough, the cipher letters are not displayed and it should be the matching of the cipher letters but the cipher letters are essentially derived from translation of the message letter by the same key C. So N+Cmod26 and J+Cmod26 such that N+Cmod26 != J+Cmod26, you can see that even when the proability density functions match generated by the same key, the letters of the original message or the cipher text do not match. So how can teh shfiting of the duplicated cipher text be used to identify the key length when they believe the same letter arises under the same column when shofting? Often the letters do not match anyway, in the above example, most of the letters do not match while we perform the shifting but the pdf's match every shift of 3. But originally we are only given the cipher message... It just doesnt seem robust for me, is there anything im missing here?
Thanks for taking your time, relaly appreciate it!