We will now consider the class of substitution ciphers where a number of alphabets are used, the number and choice of alphabets depending on a key word or equivalent and being used periodically throughout the message. In this class belong the methods of Vigenere, Porta, Beaufort, St. Cyr, and many others. These methods date back several hundred years, but variations of them are constantly appearing as new ciphers. The Larrabee cipher, used for communication between government departments, is the Vigenere cipher of the 17th Century. The cipher disk method is practically the Vigenere cipher with reversed alphabets.
In using these ciphers, there is provided a number of different cipher alphabets, usually twenty-six, and each cipher alphabet is identified by a different letter or number. A key word or phrase (or key number) is agreed upon by the correspondents. The message to be enciphered is written in lines containing a number of letters which is a multiple of the number of letters of the key. The key is written as the first line. Then each column under a letter of the key is enciphered by the cipher alphabet pertaining to that letter of the key.
For example, let us take the message, “All radio messages must hereafter be put in cipher,” with the key Grant, using the Vigenere cipher alphabets given below. Each of these alphabets is identified by the first or left hand letter which represents A of the text. We thus will have:
Using the alphabet indicated by G, we get
Continuing for the other alphabets, we get
This method of arranging the message into lines and columns and then enciphering whole columns with each cipher alphabet is much shorter than the method of handling each letter of the message separately. The chance of error is also greatly reduced. All these cipher methods can be operated by means of squares containing the various alphabets, cipher disks or arrangements of fixed and sliding alphabets. For example, this was the original cipher of Vigenere:
The first horizontal alphabet is the alphabet of the plain text. Each substitution alphabet is designated by the letter at the left of a horizontal line. For example, if the key word is BAD, the second, first and fourth alphabets are used in turn and the word WILL is enciphered XIOM. The Larrabee cipher is merely a slightly different arrangement of the Vigenere cipher and is printed on a card in this form:
The large letters at the left are the letters of the key word. It will be noted that these letters correspond to the first letters of the cipher alphabets (in small letters) as in the Vigenere cipher. A much simpler arrangement of the Vigenere cipher is the use of a fixed and sliding alphabet. Either the fixed or sliding alphabet must be double in order to get coincidence for every letter when A is set to the letter of the key word.
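The Vigenere rule described above, and the column-by-column way of applying it, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are invented here, the text is taken as capital letters without spaces, and each cipher alphabet is assumed to be the regular alphabet beginning at its key letter, as in the square described above.

```python
from string import ascii_uppercase as ALPHABET


def shift_letter(letter: str, key_letter: str) -> str:
    """Encipher one letter with the alphabet identified by key_letter."""
    return ALPHABET[(ALPHABET.index(letter) + ALPHABET.index(key_letter)) % 26]


def encipher_letter_by_letter(text: str, key: str) -> str:
    """Handle each letter of the message separately, cycling through the key."""
    return "".join(shift_letter(c, key[i % len(key)]) for i, c in enumerate(text))


def encipher_by_columns(text: str, key: str) -> str:
    """Write the text in lines as long as the key, then encipher whole columns."""
    out = [""] * len(text)
    for col, key_letter in enumerate(key):
        # Every letter standing under this key letter uses the same alphabet.
        for pos in range(col, len(text), len(key)):
            out[pos] = shift_letter(text[pos], key_letter)
    return "".join(out)


print(encipher_letter_by_letter("WILL", "BAD"))  # XIOM, as in the example above

message = "ALLRADIOMESSAGESMUSTHEREAFTERBEPUTINCIPHER"
# The two procedures differ only in the order of the work, not in the result.
assert encipher_by_columns(message, "GRANT") == encipher_letter_by_letter(message, "GRANT")
```

The column method gives the same cipher as the letter-by-letter method; its advantage for a clerk is that one alphabet is set once per column instead of once per letter.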
As shown here, A of the fixed or text alphabet coincides with T of the movable cipher alphabet. This is the setting where T is the letter of the key word in use. The lower movable alphabet is moved for each letter of the message and the A of the fixed alphabet is made to coincide in turn with each letter of the key before the corresponding letter of the text is enciphered. It is obviously only a step from this arrangement to that of a cipher disk. The well known U. S. Army Cipher Disk has just such an arrangement of the fixed alphabet but the alphabet of the disk is reversed. This has several advantages in simplicity of operation but none in increasing the indecipherability of the cipher prepared with it. The arrangement of fixed and sliding alphabets which is equivalent to the U. S. Army cipher disk is this:
It will be noticed that with this arrangement of running the alphabets in opposite directions, it becomes immaterial which alphabet is used for the text and which for the cipher for if A = G then G = A. This is not true of the Vigenere cipher. It is perfectly feasible to substitute a card for the U. S. Army cipher disk. It would have this form:
The first horizontal line is the alphabet of the text. The other twenty-six lines are the cipher alphabets each corresponding to the letter of the key word which is at the left of the line. One of the ciphers of Porta was prepared with a card of this kind:
In this cipher the large letters at the left correspond to the letters of the key and, in each alphabet, the lower letter is substituted for the upper and vice versa. For example, with key BAD to encipher WILL we would get JVXY. Note that with either B or A as the key letter, the first alphabet would be used. A combination of the Vigenere and Porta ciphers is this:
Here again the large letters at the left correspond to the letters of the key and, in each pair of alphabets, the upper one is that of the plain text and the lower is that of the cipher. This cipher can also be operated by a fixed and sliding alphabet.
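The Porta substitution described earlier (key BAD, WILL enciphered as JVXY) can be sketched as follows. The card itself is not reproduced in this text, so the layout is an assumption inferred from that one example: the letters A to M stand in the upper row, and the lower row N to Z is taken to slide one place to the right for each successive pair of key letters. The function names are illustrative only.

```python
from string import ascii_uppercase as ALPHABET


def porta_letter(letter: str, key_letter: str) -> str:
    """Swap a letter with its partner in the Porta alphabet chosen by key_letter."""
    # A and B select alphabet 0, C and D alphabet 1, and so on.
    shift = ALPHABET.index(key_letter) // 2
    p = ALPHABET.index(letter)
    if p < 13:
        # Upper row (A-M): take the letter standing beneath it.
        return ALPHABET[13 + (p - shift) % 13]
    # Lower row (N-Z): take the letter standing above it.
    return ALPHABET[(p - 13 + shift) % 13]


def porta(text: str, key: str) -> str:
    """Encipher or decipher; the operation is its own inverse."""
    return "".join(porta_letter(c, key[i % len(key)]) for i, c in enumerate(text))


print(porta("WILL", "BAD"))  # JVXY
print(porta("JVXY", "BAD"))  # WILL
```

Because the lower letter is substituted for the upper and vice versa, the same routine serves for enciphering and deciphering.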
The other ciphers mentioned are merely variations of these that have been discussed. It is immaterial, in the following analysis, which variety has been used. The analysis is really based on what can be done with a cipher made up with a mixed cipher alphabet which may be moved with reference to the fixed alphabet of the text (see Case 7-b). Clearly this is a much more difficult proposition than dealing with a cipher in which the cipher alphabets run in their regular sequence, either backward or forward. In fact, in the analysis of Case 7, we may consider any cipher prepared by the method of Vigenere or any of its variations as a special and simple case.
It was long ago discovered that, in any cipher of this class, (1) two like groups of letters in the cipher are most probably the result of two like groups of letters of the text enciphered by the same alphabets and (2) the number of letters in one group plus the number of letters to the beginning of the second group is a multiple of the number of alphabets used. It is evident, of course, that we may have similar groups in the cipher which are not the result of enciphering like groups of the text by the same alphabets.
Changing the key word and message to illustrate more clearly the above points, the following is quoted from the Signal Book, 1914, with reference to the use of the cipher disk in preparing a message with a key word.
“This simple disk can be used with a cipher word or, preferably, cipher words, known only to the correspondents.... Using the key word ‘disk’ to encipher the message ‘Artillery commander will order all guns withdrawn,’ we will proceed as follows: Write out the message to be enciphered and above it write the key word ... letter over letter, thus:
“Now bring the ‘a’ of the upper disk under the first letter of the key word on the lower disk, in this case ‘D’. The first letter of the message to be enciphered is ‘A’: ‘d’ is found to be the letter connected with ‘A’, and it is put down as the first cipher letter. The letter ‘a’ is then brought under ‘I’ which is the second letter of the key word. ‘R’ is to be enciphered and ‘r’ is found to be the second cipher letter.... Proceed in this manner until the last letter of the key word is used and beginning again with the letter ‘D’, so continue until all letters of the message are enciphered.... DRZCS XOTFG EYRIF HZRWC SXETA EBKSX MQQQW CKBPT DMF.”
So much for the Signal Book; now let us examine the above message for pairs or similar groups and count the intervening letters to demonstrate principles (1) and (2):
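The Signal Book procedure and the search for like groups can both be checked with a short Python sketch. It assumes the cipher disk rule amounts to taking the key letter minus the text letter around the alphabet (the reversed alphabet), which reproduces the quoted cipher exactly; the function names are invented for this illustration.

```python
from collections import defaultdict
from string import ascii_uppercase as ALPHABET


def disk_encipher(text: str, key: str) -> str:
    """Reversed-alphabet (cipher disk) rule: cipher = key letter - text letter."""
    return "".join(
        ALPHABET[(ALPHABET.index(key[i % len(key)]) - ALPHABET.index(c)) % 26]
        for i, c in enumerate(text)
    )


def repeated_groups(cipher: str, size: int) -> dict:
    """Map each recurring group of `size` letters to the positions where it starts."""
    positions = defaultdict(list)
    for i in range(len(cipher) - size + 1):
        positions[cipher[i:i + size]].append(i)
    return {group: starts for group, starts in positions.items() if len(starts) > 1}


message = "ARTILLERYCOMMANDERWILLORDERALLGUNSWITHDRAWN"
cipher = disk_encipher(message, "DISK")
assert cipher == "DRZCSXOTFGEYRIFHZRWCSXETAEBKSXMQQQWCKBPTDMF"
# With reversed alphabets the same operation deciphers.
assert disk_encipher(cipher, "DISK") == message

for size in (3, 2):
    for group, starts in repeated_groups(cipher, size).items():
        intervals = [b - a for a, b in zip(starts, starts[1:])]
        print(group, "starts at", starts, "intervals", intervals)
```

The group CSX recurs at an interval of 16 letters and the pair SX at intervals of 16 and 8, which agrees with a key of 2, 4 or 8 letters. The sketch also reports the pair QQ at an interval of 1; that is a chance recurrence of the kind noted above and carries no weight.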
The key word might contain 2, 4 or 8 letters from the evidence but we may eliminate 2 as unlikely and preparation of frequency tables of each of the four alphabets would soon show that 4 is the correct number. A later and more extensive example (Case 7-a) will show pairs not separated by multiples of the number of alphabets used, but the evidence in nearly every case will be practically conclusive. Especially is this so if chance assists us by giving groups of three or more letters like the group CSX in the above example. The number of alphabets having been determined each alphabet is handled by the methods of Case 6 already discussed.
Case 7-a. The following message appeared in the “personal” column of a London paper: “M. B. Will deposit £27 14s 5d tomorrow,” and the next day we find this one:
M.B. CT OSB UHGI TP IPEWF H CEWIL NSTTLE FJNVX XTYLS FWKKHI BJLSI SQ VOI BKSM XMKUL SK NVPONPN GSW OL. IEAG NPSI HYJISFZ CYY NPUXQG TPRJA VXMXI AP EHVPPR TH WPPNEL. UVZUA MMYVSF KNTS ZSZ UAJPQ DLMMJXL JR RA PORTELOGJ CSULTWNI XMKUHW XGLN ELCPOWY OL. ULJTL BVJ TLBWTPZ XLD K ZISZNK OSY DL RYJUAJSSGK. TLFNS UVD VV FQGCYL FJHVSI YJL NEXV PO WTOL PYYYHSH GQBOH AGZTIQ EYFAX YPMP SQA CI XEYVXNPPAII UV TLFTWMC FU WBWXGUHIWU. AIIWG HSI YJVTI BJV XMQN SFX DQB LRTY TZ QTXLNISVZ. GIFT AII UQSJGJ OHZ XFOWFV BKAI CTWY DSWTLTTTPKFRHG IVX QCAFV TP DIIS JBF ESF JSC MCCF HNGK ESBP DJPQ NLU CTW ROSB CSM.
The messages in question appeared in an English newspaper. It is fair to presume then that the cipher is in English. This is checked negatively by the fact that it contains the letter W which is not used in any of the Latin languages and that the last fifteen words of the message consist of from two to four letters each, an impossible thing in German. It contains 108 groups which are probably words, as there are 473 letters or an average of 4.4 letters per group, while we normally expect an average of about 5 letters per group. The vowels AEIOU number 90 and the letters JKQXZ number 78.
It is thus a substitution cipher (20% of 473=94.6). Recurring words and similar groups are AIIWG, AII; BKSM, BKAI; CT, CTWY, CTW; DLMMJXL, DL; ESF, ESBP; FJNVX, FJHVSI; NPSI, NPUXQG; OSB, OSY, ROSB; OL, OL; PORTELOGJ, PO; SQ, SQA; TP, TP; TLBWTPZ, TLFNS, TLFTWMC; UVZUA, UVD, UV; XMKUL, XMKUHW; YJL, YJVTI.
This clearly eliminates Cases 4, 5 and 6. Referring to the recurring words and groups above noted, we figure the number of letters between each.
The dominant factor is clearly 5, so we may consider that five alphabets were used, indicating a keyword of five letters. Writing the message in lines of five letters each and making a frequency table for each of the five columns so formed, we find the following:
In the table for Column 1, the letter G occurs 9 times; taking it for E, with the cipher alphabet running in the regular direction, A=C. In the next table, L occurs 19 times and taking it for E with the alphabet running in the same way, A=H. The first word of our message, CT, thus becomes AM when deciphered with these two alphabets and the first two letters of the key are C H. Similarly in the third table we may take either F or O for E, but a casual examination shows that the former is correct and A=B (even if we were looking for a vowel for the next letter of the keyword). In the fourth table, I is clearly E and A=E. The fifth table shows T=14 and J=9. If we take T=E we find that we would have many letters which should not occur. On the other hand, if we take J=E then T=O and in view of the many E’s already accounted for in the other columns, this may be all right. It checks as correct if we apply the last three alphabets to the second word of our message, OSB, which deciphers NOW. Using these alphabets to decipher the whole message, we find it to read:
“M. B. Am now safe on board a barge moored below Tower Bridge where no one will think of looking for me. Have good friends but little money owing to action of police. Trust, little girl, you still believe in my innocence although things seem against me. There are reasons why I should not be questioned. Shall try to embark before the mast in some outward bound vessel. Crews will not be scrutinized so sharply as passengers. There are those who will let you know my movements. Fear the police may tamper with your correspondence but later on, when hue and cry have died down, will let you know all.”
The key to this message is CHBEF which is not intelligible as a word but if put into figures indicating that the 2d, 7th, 1st, 4th, and 5th letter beyond the corresponding letter of the message has been used the key becomes 27145 and we may connect it with the “personal” which appeared in the same paper the day before reading: “M. B. Will deposit £27 14s 5d tomorrow.”
Case 7-b. Message
On the preliminary determination, we have the following count of letters out of a total of 385:
Every letter except K and W occurs at least six times. We may say then that it is a substitution cipher, Spanish text, and certainly not Case 4, 5 or 6. We will now analyze it for recurring pairs or groups
Out of one hundred and one recurring pairs we have fifty with the factors 2×3=6; out of twelve recurring triplets, nine have these factors; and the four recurring groups of four or more letters all have these factors. The percentages are respectively 49.5%, 75% and 100% and we may be certain from this that six alphabets were used. But, before the six frequency tables are made up, there is one more point to be considered: why are there so many recurring groups which do not have six as a factor? The answer is that one or more of the alphabets is repeated in each cycle; that is, a key word of the form HAVANA has been used. If this were the key word, the second, fourth and sixth alphabets would be the same. We will see later that in this example the second and sixth alphabets are the same and this introduces the great number of recurring groups without the factor 6.
We will now proceed to make a frequency table for each alphabet. As the message is written in thirty columns, we take the first, seventh, thirteenth, etc., as constituting the first alphabet; the second, eighth, fourteenth, etc., as constituting the second alphabet and so on. The prefix and suffix letter is noted for each occurrence of each letter. The importance of this will be appreciated when the form of the frequency tables is examined. None bears any resemblance to the normal frequency table except that each is evidently a mixed up alphabet.
We will now set down some of the determinations which can be made at once from these frequency tables. Clearly several mixed alphabets have been used. As was to be expected from the analysis of the recurring groups, we note that the frequency tables for alphabets 2 and 6 are of so nearly the same general form that certainly these two alphabets are one and the same. If a Spanish word has been used as a key word, this means that A is probably represented by a vowel in these two alphabets and probably equals A or O, because these two letters are such common finals in Spanish. 1st Alphabet. Probable vowels T, X; probable common consonants, B, I, N, R. We conclude this because of the frequency of occurrence of T and X and the variety of their prefixes and suffixes. On the other hand, B, I, N, and R have for prefixes and suffixes, in a majority of cases, E, F, O and S which are the probable vowels in the 2d and 6th alphabets. 2d and 6th Alphabets.—Probable vowels E, F, O, S; probable common consonants, D, J, Q, U, Y. 3d Alphabet.—Probable vowels C, I, L; probable common consonants A, Q, T, Y. 4th Alphabet.—Probable vowels, E, G, S, T; probable common consonants, C, M, N, P, U, X. 5th Alphabet.—Probable vowels, D, L, U; probable common consonants, C, H, I. Now this cipher may have been made up from five distinct alphabets with letters chosen at random but it is much more likely to have been prepared with a cipher disk or equivalent, having the regular alphabet on the fixed disk and the mixed alphabet on the movable disk. An equivalent form of apparatus (not using the mixed alphabet in question) is one like this:
Here A of the plain text is enciphered by S and the other letters come as they will. If we move the cipher alphabet one space to the left, A will be enciphered by U and the whole sequence of the alphabet will be changed. We will therefore use some such form as the above and see if we can insert our letters, as they are determined, in such a way as to have each of the cipher slips identical. We may start thus:
In the 1st alphabet, T and X are placed as A and E respectively on the basis of frequency. In the 2d and 6th alphabets, O and E are placed as A and E respectively on the basis of frequency. In the 4th .... In the second alphabet, O is four letters to the left of E; we may place O four letters to the left of E in the fourth and it comes under V. Note that in the fourth frequency table O (= V) does not occur. In the same way in the fourth alphabet, S is four letters to the right of E; placing it in the same position with respect to E in the second and sixth, we have S under I. We have already noted that S probably represents a vowel in these two alphabets. In this way, we may add D and U to the third alphabet from their position in the fifth with respect to L and we may add I and O to the fifth from their position in the third with respect to L. In every case we check results from the frequency tables and find nothing unlikely in the results.
Now in the second and sixth, let us try Q, D and U as D, N and R respectively. We may add these letters to the third, fourth and fifth alphabets by the method of observing the number of letters to the right or left of some letter already fixed. We now add L to the second, third, fourth and sixth from its position with reference to D and U in the fifth. M is probably D in the fourth and we may add it to each of the alphabets, except the first, in the same way. The table is now complete as shown. Let us try these letters on the first line of the message and see if some other letters will be self-evident.
Referring to our frequency tables as a check on suppositions, we find everything agrees well enough if we assume the first line to read: UNAFUERZA DE CABALLERIA ENEMIGA We will now put the newly found letters in the table. The letters previously found are in capitals and the new letters in small letters. The addition of D (=U) to the first alphabet permits us to add all the letters of the other alphabets to the first by the methods already discussed. Each of the other letters may then be added to every alphabet by these methods:
One alphabet checks another in this way and we find everything to fit so far. We will decipher a few words more of the cipher message by the above alphabets and see if we can determine some new letters.
Again referring to the frequency tables the first word is evidently PROCEDENTE. We have also HALLA and MARCHEUSTED. The letter B may be determined from another cipher group, JFBSQDLD (56123456) = POSICION. The letter N may be determined from BETNDQXUC (123456123) = SERRADERO. The letters F and Y may be determined from JCPJOISLYDUASIUPF (23456123456123456)=
The key word is TOLEDO and the completely deciphered message is: “Una fuerza de caballeria enemiga procedente de Aranjuez y Villaseca se halla en Azucaica. Marche usted con su compania partiendo de la casa de la serradero por las alturas de lo este y norte de Azucaica con el fin de reconocer su numero y clase de fuerzas y en disposicion que se halla. (Q) Esta acantonada (Q) Se hallan otras tropas detras de ella (Q). El resultado del reconocimiento necesito saberlo dentro de tres horas y media cuando mas. Pongo a sus ordenes un ciclista (X) Fin.”
When a short message is enciphered with a long key word, the methods of analysis already discussed may fail; first, because there will be no recurring pairs to indicate the number of alphabets used and, second, because there will be so few letters in each alphabet that the methods of Case 6 will not be easily applied. However, if we know or correctly assume one word, preferably a fairly long one, in the cipher text, a solution is very simple. For example, the following message is believed to refer to reënforcements and to contain that word.
Let us assume that REENFORCEMENTS is the first word and that it is represented by the cipher group YANZVZNLPPKQFX. We may put the test in this tabular form, using a cipher disk and a Larrabee cipher card to determine the value of A for each letter under these two systems. Any other alphabets suspected may be tried out at the same time.
If                                  Y A N Z V Z N L P P K Q F X
equals                              R E E N F O R C E M E N T S
then, with cipher disk, A equals    P E R M A N E N T B O D Y P
and, in Vigenere cipher, A equals   H W J M Q L W J L D G D M F
It is evident that the guess as to the appearance of the word REENFORCEMENTS was correct, that it is the first word of the message, that the cipher disk was used in preparing the cipher and that the key words are PERMANENT BODY. This is, of course, an especially favorable case and we will take one less favorable to show how this method can be applied. Two Mexican chieftains, A and B, have been communicating with the following cipher alphabet:
This alphabet has been determined from many radio messages from A, the superior, to B, his subordinate, who has a force of about 2,000 men near the border. A uses the form ORDENO QUE instead of the more familiar MANDO QUE in all his messages giving orders to B. The following message is received from A by B’s radio station (and other listening stations) and about an hour
This is a substitution cipher, but it is not Case 6 using the usual alphabet of the communications from A to B and, in fact, is not Case 6 at all. The recurring pairs and triplets point to a key word of ten letters and this would give us but six letters per alphabet if it is Case 7. The preparations for a move lead us to believe that A has given an order to B and he has, in that case, probably used the expression ORDENO QUE in the message. We will try the first nine letters of the message as in the other example, first preparing a cipher disk or equivalent sliding arrangement having on it the alphabet usually used between these chieftains or A-B cipher.
Clearly there is nothing here and the assumed words, if they occur, are in the middle of the message. We may jump to the combination PEGQGV at once since the preceding letters do not make ORDENO QUE. We try this without result and proceed to EGQGVJ, GQGVJJ, QGVJJE, GVJJEE, If
equals
Then, in the A-B cipher, A equals
The key is found; VIVA_ADERO and a trial of M in the blank space shows correct results. This checks with our theory that a ten letter key word was used and deciphering the message we have: PARA EL ATAQUE CONTRA TORREON ORDENO QUE SUS TROPAS MARCHEN ESTE NOCHE X. The reason for breaking camp is now evident. This method may be used, with some labor, on short words like THE, AND, etc. Parts of the key will appear whenever an assumed word is found in the message and the whole key may be assembled if enough of the parts are available. Even if only part of the key may be so recovered, it will always lead to the ultimate solution of the cipher by trial of the partially recovered key on the message letter by letter. As an example of recovery of a key by use of short common words, let us refer to the message of Case 7-a. There are twenty-four groups of three letters each in this message and we will try them against THE, ARE and YOU, assuming that the Vigenere cipher is used.
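The trial of an assumed word reduces to one addition or subtraction per letter, and may be sketched in Python as below. The sketch rests on stated assumptions: the Vigenere and cipher disk alphabets run in their regular sequence, the long probable word is spelled REENFORCEMENTS (the spelling under which the disk key comes out as readable text), and `key_fragment` is a name invented here.

```python
from string import ascii_uppercase as ALPHABET


def key_fragment(cipher: str, word: str, system: str = "vigenere") -> str:
    """Key letters that would turn `word` into `cipher` under the given system."""
    out = []
    for c, p in zip(cipher, word):
        ci, pi = ALPHABET.index(c), ALPHABET.index(p)
        if system == "vigenere":
            out.append(ALPHABET[(ci - pi) % 26])  # cipher = text + key
        elif system == "disk":
            out.append(ALPHABET[(ci + pi) % 26])  # cipher = key - text
        else:
            raise ValueError(f"unknown system: {system}")
    return "".join(out)


# The long-word test: only the cipher disk gives a readable key.
print(key_fragment("YANZVZNLPPKQFX", "REENFORCEMENTS", "disk"))      # PERMANENTBODYP
print(key_fragment("YANZVZNLPPKQFX", "REENFORCEMENTS", "vigenere"))  # HWJMQLWJLDGDMF

# The short-word test on the twenty-four three-letter groups of Case 7-a.
groups = ("OSB VOI GSW CYY ZSZ BVJ XLD OSY UVD YJL SQA HSI "
          "BJV SFX DQB AII OHZ IVX JBF ESF JSC NLU CTW CSM").split()
for number, group in enumerate(groups, start=1):
    trials = {word: key_fragment(group, word) for word in ("THE", "ARE", "YOU")}
    print(number, group, trials)
```

Key fragments that turn up more than once, or that overlap one another, are the ones worth keeping; isolated fragments are most likely the result of a wrong guess for that group.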
In column 5, we have, for YOU, the key BEF; column 6 gives the same key for ARE; column 10 gives the key FCH for THE and column 15 gives the same key for YOU; column 12 gives the key HBE for ARE and column 16 gives the same key for THE; column 23 gives the key EFC for YOU. The only possible key for the message is a five-letter one made up of the letters BEFCH or EFCHB or FCHBE or CHBEF or HBEFC. If the key in this case were a word, we would have no difficulty in determining it; as it is, there is no real difficulty in the matter as we may now divide the message into blocks of five letters and note that ZSZ (= YOU) form the 3d, 4th and 5th letters of a group. The corresponding key letters, BEF, are then the 3d, 4th and 5th letters of the key which must be CHBEF.
This special solution for Case 7 depends so largely on the intuition of the operator in choice of a word that it is not, in general, advisable to use it unless the message is very short and the regular methods of analysis have been tried unsuccessfully. It is, however, a wonderfully short cut in difficult cases where the other methods fail.
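The last step, choosing among the five rotations of the key and applying the result, can be sketched as follows. This is an illustration only: the function names are invented here, positions within a block are counted from zero (so the 3d letter is index 2), and only the first sentence of the Case 7-a cipher is deciphered.

```python
from string import ascii_uppercase as ALPHABET


def align_key(cycle: str, fragment: str, start: int) -> str:
    """Rotate `cycle` until `fragment` begins at index `start` (0-based) of the key."""
    for r in range(len(cycle)):
        key = cycle[r:] + cycle[:r]
        if key[start:start + len(fragment)] == fragment:
            return key
    raise ValueError("fragment does not fit the cycle at that position")


def vigenere_decipher(cipher: str, key: str) -> str:
    """Text = cipher letter - key letter, the key repeating throughout."""
    return "".join(
        ALPHABET[(ALPHABET.index(c) - ALPHABET.index(key[i % len(key)])) % 26]
        for i, c in enumerate(cipher)
    )


# ZSZ (= YOU, key BEF) stands as the 3d, 4th and 5th letters of a block of five.
key = align_key("BEFCH", "BEF", 2)
print(key)                                           # CHBEF
print("".join(str(ALPHABET.index(k)) for k in key))  # 27145

first_sentence = ("CT OSB UHGI TP IPEWF H CEWIL NSTTLE FJNVX XTYLS FWKKHI "
                  "BJLSI SQ VOI BKSM XMKUL SK NVPONPN GSW OL").replace(" ", "")
print(vigenere_decipher(first_sentence, key))
```

The figures 27145 printed for the key are the same that appear in the sum of money named in the earlier “personal.”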