Examination of Substitution Ciphers


When an unknown cipher has been put into the substitution class by the methods already described we may proceed to decide on the variety of substitution cipher which has been used.

There are a few purely mechanical ways of solving some of the simple cases of substitution ciphers but as a general rule some or all of the following determinations must be made:

1. By preparation of a frequency table for the message we determine whether one or more substitution alphabets have been used and, if one only has been used, this table leads to the solution.

2. By certain rules we determine how many alphabets have been used, if there are more than one, and then isolate and analyze each alphabet by means of a frequency table.

3. If the two preceding steps give no results we have to deal with a cipher with a running key, a cipher of the Playfair type, or a cipher where two or more characters are substituted for each letter of the text. Some special cases under this third head will be given but, in general, military ciphers of the substitution class will usually be found to come under the first two heads, on account of the time and care required in the preparation and deciphering of messages by the last named methods and the necessity, in many cases, of using complicated machines for these processes.

Case 4-a.

Message

OBQFO BPBRP QBAML OBHIF PILFQ FJBOX OFLNR BIXOZ EL

From the recurrence of B, F and O, we may conclude that a single substitution alphabet was used for this message. If so and if the alphabet runs in the same order and direction as the regular alphabet, the simplest way to discover the meaning of the message is to take the first two words and write alphabets under each letter as follows, until some line makes sense:

O B Q F O B P B R P
P C R G P C Q C S Q
Q D S H Q D R D T R
R E T I R E S E U S

The word RETIRESE occurs in the fourth line, and, if the whole message be handled in this way, we find the rest of the fourth line to read USTED POR EL MISMO ITINERARIO QUE MARCHO. The message was enciphered using an alphabet where A = X, B = Y, C = Z, D = A, etc. Note that, as this message is in Spanish, the letters K and W do not appear in the alphabet.
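The process of writing alphabets under each letter can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 24-letter Spanish alphabet (no K or W) is taken from the remark above, and the names SPANISH, slide and run_down are our own, not part of the original method.

```python
# Spanish alphabet as used in the text: 24 letters, no K or W.
SPANISH = "ABCDEFGHIJLMNOPQRSTUVXYZ"


def slide(text, steps, alphabet=SPANISH):
    """Advance every letter `steps` places along `alphabet`, wrapping round."""
    n = len(alphabet)
    return "".join(alphabet[(alphabet.index(ch) + steps) % n] for ch in text)


def run_down(text, alphabet=SPANISH):
    """Return every line of the column table; line 0 is the cipher itself."""
    return [slide(text, k, alphabet) for k in range(len(alphabet))]


message = "OBQFO BPBRP QBAML OBHIF PILFQ FJBOX OFLNR BIXOZ EL".replace(" ", "")
lines = run_down(message)

# Show the first ten letters of the first four lines, as in the table above.
for k, line in enumerate(lines[:4]):
    print(k, line[:10])
```

The line at index 3 (the fourth line) is the one that makes sense; the remaining lines need only be scanned until a readable one is found.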

Case 4-b.

Message

HUJZH UIUPN OZYTS VQXMI SMOMX MQHUD UMREI SESJU AG

This is a message in Spanish. We will handle it as in case 4-a, setting down the whole message.

HUJZHUIU  PNOZY  TSV  QX  MISMO  MXMQHUDUMR  EIS  ESJUAG
IVLAIVJV  QOPAZ  UTX  RY  A=A    NYNRIVEVNS  FJT  FTLVBH
JXMBJXLX  RPQBA  VUY  SZ         OZOSJXFXOT  GLU  GUMXCI
LYNCLYMY  SQRCB  XVZ  TA         PAPTLYGYPU  HMV  HVNYDJ
MZODMZNZ  TRSDC  YXA  UB         QBQUMZHZQV  INX  IXOZEL
NAPENAOA  USTED  ZYB  VC         RCRVNAIARX  JOY  JYPAFM
OBQFOBPB  A=U    AZC  XD         SDSXOBJBSY  LPZ  LZQBGN
PCRGPCQC         BAD  YE         TETYPCLCTZ  MQA  MARCHO
QDSHQDRD         CBE  ZF         UFUZQDMDUA  NRB  A=S
RETIRESE         DCF  AG         VGVARENEVB  OSC
A=Q              EDG  BH         XHXBSFOFXC  PTD
                 FEH  CI         YIYCTGPGYD  QUE
                 GFI  DJ         ZJZDUHQHZE  A=O
                 HGJ  EL         ALAEVIRIAF
                 IHL  A=M        BMBFXJSJBG
                 JIM             CNCGYLTLCH
                 LJN             DODHZMUMDI
                 MLO             EPEIANVNEJ
                 NMP             FQFJBOXOFL
                 ONQ             GRGLCPYPGM
                 POR             HSHMDQZQHN
                 A=E             ITINERARIO
                                 A=D

Here each word of the message comes out on a different line, and noting in each case the letter corresponding to A, we have the word QUEMADOS which is the key. The cipher alphabet changed with each word of the message.
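Once each word has been read, the key letters can be checked mechanically. The sketch below assumes the same 24-letter Spanish alphabet and takes the key letter of a word to be the cipher letter standing for A; the function names are hypothetical and serve only to illustrate the bookkeeping.

```python
SPANISH = "ABCDEFGHIJLMNOPQRSTUVXYZ"  # no K or W, as in case 4-a


def key_letter_for(cipher_word, plain_word, alphabet=SPANISH):
    """Cipher letter that stands for A, judged from the first letter pair."""
    n = len(alphabet)
    shift = alphabet.index(cipher_word[0]) - alphabet.index(plain_word[0])
    return alphabet[shift % n]


def decipher_word(word, key_letter, alphabet=SPANISH):
    """Decipher one word written in the alphabet in which A = key_letter."""
    n = len(alphabet)
    k = alphabet.index(key_letter)
    return "".join(alphabet[(alphabet.index(ch) - k) % n] for ch in word)


cipher_words = "HUJZHUIU PNOZY TSV QX MISMO MXMQHUDUMR EIS ESJUAG".split()
plain_words = "RETIRESE USTED POR EL MISMO ITINERARIO QUE MARCHO".split()

# One key letter per word, read off from the solved words.
key = "".join(key_letter_for(c, p) for c, p in zip(cipher_words, plain_words))

# Deciphering again with that key should give back the whole text.
recovered = " ".join(decipher_word(c, k) for c, k in zip(cipher_words, key))
print(key)
print(recovered)
```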

A variation of this case is where the cipher alphabet changes according to a key word but the change comes every five letters or every ten letters of the message instead of every word. The text of the message can be picked up in this case with a little study.

Note in using case 4 that if we are deciphering a Spanish message we use the alphabet without K or W as a rule, altho if the letters K or W appear in the cipher it is evidence that the regular English alphabet is used.

Case 5-a.

Message

DNWLW MXYQJ ANRSA RLPTE CABCQ RLNEC LMIWL XZQTT QIWRY ZWNSM BKNWR YMAPL ASDAN

This message contains K and W and therefore we expect the English alphabet to be used. The frequency of occurrence of A, L, N, R and W has led us to examine it under case 4 but without result. Let us set down the first two words and decipher them with a cipher disk set A to A and then proceed as in case 4.

Cipher message       DNWLWMXYQJ
Deciphered A to A    XNEPEODCKR
                B    YOFQFPEDLS
                C    ZPGRGQFEMT
                D    AQHSHRGFNU
                E    BRITISHGOV

The message is thus found to be enciphered with a cipher disk set A to E and the text is: BRITISH GOVERNMENT PLACED CONTRACTS WITH FOLLOWING FIRMS DURING SEPTEMBER.
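The trial of successive disk settings can be sketched as follows. The arithmetic model of the disk is an assumption on our part: with the disk set A to a given letter, the plain letter is taken as that setting minus the cipher letter, counted round the 26-letter alphabet. It agrees with the lines tabulated above; the name disk_decipher is illustrative only.

```python
import string

ENGLISH = string.ascii_uppercase  # the regular 26-letter alphabet


def disk_decipher(text, setting):
    """Decipher with a reversed-alphabet disk set A to `setting`.

    Assumed model: plain = (setting - cipher) mod 26.
    """
    k = ENGLISH.index(setting)
    return "".join(ENGLISH[(k - ENGLISH.index(ch)) % 26] for ch in text)


message = (
    "DNWLW MXYQJ ANRSA RLPTE CABCQ RLNEC LMIWL "
    "XZQTT QIWRY ZWNSM BKNWR YMAPL ASDAN"
).replace(" ", "")

# Try settings A to A, A to B, ... on the first ten letters only.
for setting in "ABCDE":
    print(setting, disk_decipher(message[:10], setting))

plain = disk_decipher(message, "E")
print(plain)
```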

Case 5-b.

Same as case 4-b except that the cipher message must be deciphered by means of a cipher disk set A to A before proceeding to make up the columns of alphabets. The words of the deciphered message will be found on separate lines, the lines being indicated as a rule by a key word which can be determined as in case 4-b.

The question of alphabetic frequency has already been discussed in considering the mechanism of language. It is a convenient thing to put the frequency tables in a graphic form and to use a similar graphic form in comparing unknown alphabets with the standard frequency tables. For instance the standard Spanish frequency table put in graphic form is here presented in order to compare with it the frequency table for the message discussed in case 4-a.

Standard Spanish frequency table        Table for Message Case 4-a
A  111111111111111111111111111   27     A  1        1
B  11                             2     B  1111111  7
C  111111111                      9     C
D  1111111111                    10     D
E  1111111111111111111111111111  28     E  1        1
F  11                             2     F  11111    5
G  111                            3     G
H  11                             2     H  1        1
I  111111111111                  12     I  111      3
J  1                              1     J  1        1
L  1111111111                    10     L  111      3
M  111111                         6     M  1        1
N  111111111111                  12     N  1        1
O  1111111111111111              16     O  111111   6
P  11111                          5     P  111      3
Q  11                             2     Q  111      3
R  111111111111111               15     R  11       2
S  11111111111111                14     S
T  11111111                       8     T
U  1111111                        7     U
V  11                             2     V
X                                       X  11       2
Y  11                             2     Y
Z  1                              1     Z  1        1

Our first assumption might be that B = A and F = E but it is evident at once that in that case, S, T, U and V (equal to R, S, T and U) do not occur and a message even this short without R, S, T or U is practically impossible. By trying B = E we find that the two tables agree in a general way very well and this is all that can be expected with such a short message. The longer the message the nearer would its frequency table agree with the standard table. Note that if a cipher disk has been used, the alphabet runs the other way and we must count upward in working with a graphic table. Note also that if, in a fairly long message, it is impossible to coordinate the graphic table, reading either up or down, with the standard table and yet some letters occur much more frequently than others and some do not occur at all, we have a mixed alphabet to deal with. The example chosen for case 6-a is of this character. An examination of the frequency table given under that case shows that it bears no graphic resemblance to the standard table. However, as will be seen in case 7-b, the preparation of graphic tables enables us to state definitely that the same order of letters is followed in each of a number of mixed alphabets.
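The comparison of the two graphic tables can also be made numerically. The sketch below is not the method of the text, which relies on the eye; it assumes a simple score of our own choosing (the sum of the products of message counts and standard counts) for each possible slide of the alphabet, with the standard Spanish figures copied from the table above. The names SPANISH, STANDARD and slide_scores are illustrative only.

```python
from collections import Counter

SPANISH = "ABCDEFGHIJLMNOPQRSTUVXYZ"  # no K or W

# Standard Spanish counts, in the order of SPANISH, from the table above.
STANDARD = dict(zip(SPANISH, [27, 2, 9, 10, 28, 2, 3, 2, 12, 1, 10, 6,
                              12, 16, 5, 2, 15, 14, 8, 7, 2, 0, 2, 1]))


def slide_scores(message):
    """Score every slide s, where plain = cipher letter moved s places on."""
    counts = Counter(message)
    n = len(SPANISH)
    scores = {}
    for s in range(n):
        scores[s] = sum(
            cnt * STANDARD[SPANISH[(SPANISH.index(ch) + s) % n]]
            for ch, cnt in counts.items()
        )
    return scores


message = "OBQFO BPBRP QBAML OBHIF PILFQ FJBOX OFLNR BIXOZ EL".replace(" ", "")
scores = slide_scores(message)
best = max(scores, key=scores.get)

# Plain letter that cipher B would stand for under the best slide.
b_equals = SPANISH[(SPANISH.index("B") + best) % len(SPANISH)]
print(best, b_equals)
```

For the message of case 4-a the highest score falls on the slide that makes B = E, and the slide that makes B = A scores lower, in agreement with the reasoning above. With so short a message such a score is only a guide.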

General Remarks

Any substitution cipher, enciphered by a single alphabet composed of letters, figures or conventional signs, can be handled by the methods of case 6. For example, the messages under case 4-a and 5-a are easily solved by these methods. But note that the messages under case 4-b and 5-b cannot so be solved because several alphabets are used. We will see later that there are methods of segregating the different alphabets in some cases where several are used and then each of the alphabets is to be handled as below.

Case 6-a.

Message

QDBYP BXHYS OXPCP YSHCS EDRBS ZPTPB BSCSB PSHSZ AJHCD OSEXV HPODA PBPSZ BSVXY XSHCD

This message was received from a source which makes us sure it is in Spanish. The occurrence of B, H, P and S has tempted us to try the first two words as in cases 4 and 5 but without result. We now prepare a frequency table, noting at the same time the preceding and following letter. This latter proceeding takes little longer than the preparation of an ordinary frequency table and gives most valuable information.

Frequency Table

                       Prefix        Suffix
A  11             2    ZD            JP
B  11111111       8    DPRPBSPZ      YXSBSPPS
C  11111          5    PHSHH         PSSDD
D  11111          5    QECOC         BROA
E  11             2    SS            DX
F
G
H  111111         6    XSSJVS        YCSCPC
I
J  1              1    A             H
L
M
N
O  111            3    SDP           XSD
P  111111111      9    YXCZTBHAB     BCYTBSOBS
Q  1              1                  D
R  1              1    D             B
S  111111111111  12    YYCBBCPHOPBX  OHEZCBHZEZVH
T  1              1    P             P
U
V  11             2    XS            HX
X  11111          5    BOEVY         HPVYS
Y  1111           4    BHPX          PSSX
Z  111            3    SSS           PAB

It is clear from an examination of this table that we have to deal with a single alphabet but one in which the letters do not occur in their regular order.
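The tabulation of each letter with its prefixes and suffixes is easily mechanized. The following is a minimal sketch; the function name contact_table and the layout of its result are our own choices, made only for illustration.

```python
from collections import defaultdict


def contact_table(message):
    """For each letter give (count, prefixes, suffixes) in order of occurrence."""
    rows = defaultdict(lambda: [0, "", ""])
    for i, ch in enumerate(message):
        row = rows[ch]
        row[0] += 1
        if i > 0:                      # the first letter has no prefix
            row[1] += message[i - 1]
        if i + 1 < len(message):       # the last letter has no suffix
            row[2] += message[i + 1]
    return {ch: tuple(row) for ch, row in sorted(rows.items())}


message = (
    "QDBYP BXHYS OXPCP YSHCS EDRBS ZPTPB BSCSB "
    "PSHSZ AJHCD OSEXV HPODA PBPSZ BSVXY XSHCD"
).replace(" ", "")

table = contact_table(message)
for letter, (count, prefixes, suffixes) in table.items():
    print(letter, count, prefixes, suffixes)
```

A letter that is always preceded by the same letter, as Z is by S here, shows up at once in such a table.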

We may assume that P and S are probably A and E, both on account of the frequency with which they occur and the variety of their prefixes and suffixes. If this is so, then B and H are probably consonants and may represent R and N respectively. D and X are then vowels by the same method of analysis. Noting that HC occurs three times and taking H as N we conclude that C is probably T. Substitute these values in the last three words of the message because the letters assumed occur rather frequently there.

P B P S Z B S V X Y X S H C D
                I   I       I
A R A E _ R E _   _   E N T
                O   O       O

Now Z is always prefixed by S and may be L. Taking X=I and D=O, (they are certainly vowels), V=G and Y=M, we have

ARA EL REGIMIENTO

Substituting these values in the rest of the message we have

Q DBYPBXH YSOXPCP YSHCSED RBSZPTPB
_ ORMARIN ME_IATA MENTE_O _RELA_AR
BSCSB PSHSZ AJHCD OSEXVHPODA
RETER AENEL __NTO _E_IGNA_O_

We may now take Q=F, O=D, E=S, R=B, T=C, A=P and J=U and the message is complete. We are assisted in our last assumption by noting that S=E and E=S, etc., and we may on that basis reconstruct the entire alphabet. The letters in parentheses do not occur in the message but may be safely assumed to be correct.

Ordinary   A   B   C   D   E   F   G   H   I   J   L   M   N   O   P   Q   R   S   T   U   V   X   Y   Z
Cipher     P   R   T   O   S  (Q) (V)  N  (X) (U) (Z) (Y) (H)  D   A   F   B   E   C   J   G   I   M   L

It is always well to attempt the reconstruction of the entire alphabet for use in case any more cipher messages written in it are received.
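A reconstructed alphabet of this kind can be kept as a translation table and applied to later messages. The sketch below copies the two rows of the alphabet above and uses Python's built-in str.maketrans and str.translate; the variable names are our own. Since the alphabet is reciprocal (S=E and E=S, etc.), the same table serves in both directions.

```python
ORDINARY = "ABCDEFGHIJLMNOPQRSTUVXYZ"
CIPHER = "PRTOSQVNXUZYHDAFBECJGIML"   # second row of the alphabet above

table = str.maketrans(ORDINARY, CIPHER)

message = (
    "QDBYP BXHYS OXPCP YSHCS EDRBS ZPTPB BSCSB "
    "PSHSZ AJHCD OSEXV HPODA PBPSZ BSVXY XSHCD"
).replace(" ", "")

plain = message.translate(table)

# Check that applying the table twice returns every letter to itself.
reciprocal = all(ch.translate(table).translate(table) == ch for ch in ORDINARY)
print(plain)
print(reciprocal)
```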

Case 6-b.

Message

Lt. J. B. Smith, Royal Flying Corps, Calais, France.

DACFT RRBHA MOOUE AENOI ZTIET
ASMOS EOHIE YOCKF NOHOE NOUTH
OMEAH NILGO OSAHU OHOUE APCHS
TLNDA CFTEN INTWN BAFOH GROHT
AEIOH ABRIS ODACF TRREN OSTSM
AYBIS DFTEN EFAPH OSMNI ZTIEA
HLILL TWSOU GDENO UTHOM EAHBH
AMOOU EAYOE QISUU OLEHA DENOE
NHOOQ OBBOR TSLHO BAHEO UBHOB
IHTSW ENOHO PAHIH ITUAS BIHTL

Graham-White.

The address and signature indicate that this message is in English.

There are 250 letters in the cipher; the vowels AEIOU occur 109 times or 43.6%, the letters LNRST occur 62 times or 24.8%, and the letters KQVXZ occur 5 times or 2%. The proportion in the case of the vowels is somewhat too large and, in the case of the letters LNRST, it is too small. It is then questionable whether this is a transposition cipher altho, at first glance, it might appear to be one.
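The three proportions can be verified by a direct count. The following is a minimal sketch; the helper name share is hypothetical, and the cipher is simply copied from the message above.

```python
MESSAGE = """
DACFT RRBHA MOOUE AENOI ZTIET
ASMOS EOHIE YOCKF NOHOE NOUTH
OMEAH NILGO OSAHU OHOUE APCHS
TLNDA CFTEN INTWN BAFOH GROHT
AEIOH ABRIS ODACF TRREN OSTSM
AYBIS DFTEN EFAPH OSMNI ZTIEA
HLILL TWSOU GDENO UTHOM EAHBH
AMOOU EAYOE QISUU OLEHA DENOE
NHOOQ OBBOR TSLHO BAHEO UBHOB
IHTSW ENOHO PAHIH ITUAS BIHTL
"""

letters = "".join(MESSAGE.split())


def share(group):
    """Count the letters of `group` in the cipher and give their percentage."""
    n = sum(letters.count(ch) for ch in group)
    return n, round(100 * n / len(letters), 1)


vowels = share("AEIOU")
common = share("LNRST")
rare = share("KQVXZ")
print(len(letters), vowels, common, rare)
```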

On examination for parts of possible words we are at once struck by the occurrence at irregular intervals of recurring groups, viz:

DACFTRR  ENO           BHAMOOUEA
DACFTEN  ENOUTHOMEAH   BHAMOOUEA
         ENO
DACFTRR  DENOUTHOMEAH  IZTIE
FTEN     DENO          IZTIE
         ENO

This is a strong indication that the cipher is a substitution cipher, so, to make an examination a frequency table will be constructed.
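The recurrence of a suspected group is readily confirmed by searching the cipher with the word spacing removed. The sketch below is an illustration only; the function name occurrences is our own, and the groups searched for are those noted above.

```python
MESSAGE = """
DACFT RRBHA MOOUE AENOI ZTIET
ASMOS EOHIE YOCKF NOHOE NOUTH
OMEAH NILGO OSAHU OHOUE APCHS
TLNDA CFTEN INTWN BAFOH GROHT
AEIOH ABRIS ODACF TRREN OSTSM
AYBIS DFTEN EFAPH OSMNI ZTIEA
HLILL TWSOU GDENO UTHOM EAHBH
AMOOU EAYOE QISUU OLEHA DENOE
NHOOQ OBBOR TSLHO BAHEO UBHOB
IHTSW ENOHO PAHIH ITUAS BIHTL
"""

text = "".join(MESSAGE.split())


def occurrences(text, group):
    """Start positions of every occurrence of `group`, overlapping or not."""
    positions = []
    start = text.find(group)
    while start != -1:
        positions.append(start)
        start = text.find(group, start + 1)
    return positions


for group in ("ENO", "DACFT", "BHAMOOUEA", "ENOUTHOMEAH", "IZTIE"):
    print(group, occurrences(text, group))
```

The intervals between the positions are irregular, which is what we expect of repeated words in a substitution cipher.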

Frequency Table

 A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
23 11  7  6 24  7  3 26 16  0  1  8  8 15 36  3  2  8 14 17 11  1  3  0  3  2

Superficially, this looks like a normal frequency table, but O is the dominant letter, followed by H, E, A, T, I, N, S, in the order named. It is certainly Case 6 if it is a substitution cipher at all.

Let us see what can be done by assuming O=E; the triplet ENO, occurring six times, might well be THE and E=T and N=H. A glance at the frequency table shows this to be reasonable. Now substitute these letters in some likely groups. FNOHOENO becomes _HE_ETHE; FTEN becomes __TH; ENOENHO becomes THETH_E; ENOHO becomes THE_E. A bit of study will show that F=W, T=I and H=R and the frequency table bears this out except that H(=R) seems to occur too frequently. The recurring groups containing DAC (see above) occur in such a way that we may be sure DAC is one word, FTRR is another and FTEN(=WITH) is a third. Now FTRR becomes WI__, which can only be completed by a double letter. LL fills the bill and we may say R=L. As DAC starts the message and is followed by FTRR (=WILL) it is reasonable to try DAC=YOU. Looking up DAC in the frequency table it is evident that we strain nothing by this assumption. We now have:

Letters of cipher   ONTAHECFD
Letters of message  EHIORTUWY

Now take the group ENOUTHOMEAH which occurs twice. This becomes THE_IRE_TOR and if we substitute U=D and M=C we have THE DIRECTOR. Next the group (FTRR)BHAMOOUEA becomes (WILL)_ROCEEDTO and the context gives the word with the missing letter as PROCEED, from which B=P. Next the group (ENO)IZTIETASMOSEOHIEYOCK(FNOHO) becomes (THE)__I_TIO_CE_TER_T_EU_(WHERE) and the group (FTEN)EFAPHOSMNIZTIEAHL becomes (WITH)TWO_RE_CH__I_TOR_. The substitution of A for I, V for Z, N for S, F for P and S for L makes the latter group read (WITH)TWO FRENCH AVIATORS and the former read (THE)AVIATION CENTER AT _EU_(WHERE).

Now the word YOCK = (_EU_) is the name of a place, evidently. We find another group containing Y, viz: ENOSTSMAYBISD which becomes THENINCO_PANY so that evidently we should substitute M for Y. The other occurrence of Y (=M) is in the group EAYOEQISU which becomes TOMET_AND. A reasonable knowledge of geography gives us the words MEUX and METZ so that X should be substituted for K and Z for Q.

We now have sufficient letters for a complete deciphering of the message.

Letters of cipher   ABCDEFGHIKLMNOPQRSTUVWYZ
Letters of message  OPUYTW_RAXSCHEFZLNID__MV

The message deciphers:

YOU WILL PROCEED TO THE AVIATION CENTER AT MEUX WHERE THE DIRECTOR HAS _EEN ORDERED TO FURNISH YOU WITH A HI_H POWER _LERIOT AEROPLANE. YOU WILL THEN IN COMPANY WITH TWO FRENCH AVIATORS ASSI_NED _Y THE DIRECTOR PROCEED TO METZ AND DESTROY THE THREE ZEPPELINS REPORTED PREPARIN_ THERE FOR A RAID ON PARIS.

The substitution of B for G, G for W and K for V completes the cipher. This cipher is difficult only because the cipher alphabet is made up, not haphazard, but scientifically with proper consideration for the natural frequency of occurrence of the letters. In cipher work it is dangerous to neglect proper analysis and jump at conclusions.

In the study of Mexican substitution ciphers, several alphabets have been found which are made up in a general way, like the one discussed in this case.

Case 6-c.

It is a convenience in dealing with ciphers made up of numbers or conventional signs to substitute arbitrary letters for the numbers and signs. Suppose we have the message:

”??2& 45x15 )“8&# &&1x4 %&4&%
6x?&” 8&*x4 6°*°& %“4&”

By arbitrary substitution of letters this is made

ABBCD EFGHF IJKDL DDHGE MDEDM
NGBDA KDOGE NPOPD MAEDA

This message is now in convenient shape to handle as Case 6-a and on solution is found to read:

ALL PERSONS HAVE BEEN ORDERED TO LEAVE FORTIFIED AREA.

In the same way the message

1723 3223 2825 1828 3630 2336 1423 2827 2324 3120 2317 3123
3036 2120 2415 3029 1512 2831 1721 2715 2811 2715 1923 3030
1215 1130 2128 3623

is found to be made up entirely of numbers between 11 and 36 with the numbers 23, 28 and 30 occurring most frequently. This immediately suggests an alphabet made up of the numbers from 11 to 36 inclusive and each cipher group of figures represents two letters. By arbitrary substitution of letters for groups of two numbers we obtain:

AB CB DE FD GH BG IB DJ BK LM BA LB
HG NM OP HQ PR DL AN JP DS JP TB HH
RP SH ND GB

and this message is also in shape to handle as Case 6-a. It reads, on solution,

SEVEN HUNDRED MEN LEFT YESTERDAY FOR POINTS ON LOWER RIO GRANDE.
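The preliminary work on a numerical cipher of this kind (splitting into two-figure groups, counting them, and assigning arbitrary letters in order of first appearance) can be sketched as below. The function name relabel is our own, and the relabelling is shown for the first line of the message only, as an illustration of the procedure rather than a full solution.

```python
import string
from collections import Counter

MESSAGE = """
1723 3223 2825 1828 3630 2336 1423 2827 2324 3120 2317 3123
3036 2120 2415 3029 1512 2831 1721 2715 2811 2715 1923 3030
1215 1130 2128 3623
"""

# Each group of four figures is taken as two numbers of two figures.
pairs = [group[i:i + 2] for group in MESSAGE.split() for i in (0, 2)]
counts = Counter(pairs)
print(min(pairs), max(pairs), counts.most_common(3))


def relabel(symbols):
    """Give each new symbol the next unused letter, in order of appearance."""
    labels = {}
    out = []
    for s in symbols:
        if s not in labels:
            labels[s] = string.ascii_uppercase[len(labels)]
        out.append(labels[s])
    return "".join(out)


first_line = relabel(pairs[:24])
# Regroup the letters in twos to match the layout used above.
print(" ".join(first_line[i:i + 2] for i in range(0, len(first_line), 2)))
```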

                                                                                                                                                                                                                                                                                                           
