The code has been studied by the devoted cryptologists Professor Derek Abbott and Dr Matthew Berryman (supervisors) with Andrew Turnbull and Densley Bihari (Honours students); see ‘Critical design review 2009: who killed the Somerton man?’ at www.eleceng.adelaide.edu.au/personal/dabbott/wiki/index.php/Cipher_Cracking_2009. If anyone can break it by computer, they will. The work is ongoing.
Following is mathematician David Greagg’s analysis of the code, which also shows the workings of the box code. Box codes were popular between the wars. The idea for this approach comes from Dorothy L Sayers’ Have His Carcase, where there is an elegant description of how a box code works.
A lot of problems are immediately apparent. There are (possibly) 50 characters in the message. Everyone else seems to think that the opening letter is M. However, whoever it is, he writes M differently. It may well be a W. The letters are arranged in five lines as follows:
w/m rgoababd
m liaoi
w/m tbimpanetp
mliabo aiaq (c?)
(i?) ttmtsamstgab
The Ws are possibly M, though this is, in my view, questionable. It could even be an H. There is rough underlining beneath (and indeed through) line 2 and a double line with a cross just above line 4 over the O. The MS is messy, not well calligraphed and there is doubt over the two letters between lines 4 and 5. Our writer was arguably not accustomed to writing and it is possible that these are merely accidental scribbles. For this reason, the possibility that line 2 is meant to be deleted is also probably not sustainable. There is also some doubt about the M at the beginning of line 2.
The most tempting hypothesis in the first instance is a box code based on the keyword TAMSHUD. The absence of J anywhere might well point to a 5 x 5 box code where I and J are traditionally conflated. Against that, there is also no F, H, K, U, V, X, Y or Z, so we have only a 17-letter alphabet to choose from. If other writers are correct, there is also no W. There are also major problems because a box code needs all letters in pairs. The letter distribution per line is 9, 6, 11, 11, 13; or else 9, 6, 11, 10, 12 if the C and I are omitted.
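The counts in this paragraph can be checked mechanically. The sketch below is only an illustration: it assumes the transcription given above, keeps the doubtful c and i, and writes the ambiguous opening letter of lines 1 and 3 as w. The function names are invented for the example.

```python
import string

# Transcription as given above; doubtful letters are kept and the
# ambiguous opening w/m of lines 1 and 3 is written as "w" (an assumption).
LINES = ["wrgoababd", "mliaoi", "wtbimpanetp", "mliaboaiaqc", "ittmtsamstgab"]

def line_lengths(lines):
    """Number of letters on each line."""
    return [len(line) for line in lines]

def absent_letters(lines):
    """Letters of the 26-letter alphabet that never occur in the lines."""
    seen = set("".join(lines))
    return "".join(c for c in string.ascii_lowercase if c not in seen)

def odd_lines(lines):
    """1-based numbers of the lines that cannot be split evenly into pairs."""
    return [n for n, line in enumerate(lines, start=1) if len(line) % 2]

print(line_lengths(LINES))       # [9, 6, 11, 11, 13]
print(sum(line_lengths(LINES)))  # 50
print(absent_letters(LINES))     # fhjkuvxyz
print(odd_lines(LINES))          # [1, 3, 4, 5]
```

Only line 2 has an even number of letters, which is the pairing difficulty noted above.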
How this works is that the key letters of Tamam Shud are arranged first in a 5 x 5 box, with I and J treated as equivalent letters. All the remaining letters are then written in ordinary alphabetical order thereafter. Using this box the letters must be arranged in pairs, and the pairs of letters are swapped around a vertical axis. In this code, the word ‘so’ is coded as AQ. The word ‘go’ remains as it is, since G and O are on the same vertical axis.
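A minimal sketch of this box code follows. It rests on one assumption about the rule: within each pair, a letter keeps its own row and takes the column of its partner. That reading agrees with the two examples just given (‘so’ becomes AQ, ‘go’ is unchanged), but it is an interpretation rather than a stated definition, and the names build_box and encode are invented for the illustration.

```python
def build_box(keyword):
    """Return (positions, rows) for a 5 x 5 box: keyword letters first,
    then the rest of the alphabet, with j merged into i."""
    alphabet = "abcdefghiklmnopqrstuvwxyz"  # 25 letters, no j
    seen = []
    for ch in keyword.lower().replace("j", "i") + alphabet:
        if ch.isalpha() and ch not in seen:
            seen.append(ch)
    rows = ["".join(seen[r * 5:(r + 1) * 5]) for r in range(5)]
    pos = {ch: (r, c) for r, row in enumerate(rows) for c, ch in enumerate(row)}
    return pos, rows

def encode(text, keyword):
    """Swap each pair across the vertical axis (assumed rule: keep the row,
    take the partner's column). Returns (coded pairs, unpaired last letter)."""
    pos, rows = build_box(keyword)
    letters = [c for c in text.lower().replace("j", "i") if c in pos]
    out = []
    for i in range(0, len(letters) - 1, 2):
        (r1, c1), (r2, c2) = pos[letters[i]], pos[letters[i + 1]]
        out.append(rows[r1][c2] + rows[r2][c1])
    leftover = letters[-1] if len(letters) % 2 else ""
    return "".join(out).upper(), leftover

print(build_box("tamshud")[1])  # ['tamsh', 'udbce', 'fgikl', 'nopqr', 'vwxyz']
print(encode("so", "tamshud"))  # ('AQ', '')
print(encode("go", "tamshud"))  # ('GO', '')
```

Under this rule two letters in the same row simply change places, and a line with an odd number of letters leaves its last letter unpaired, which is why the function returns it separately.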
The idea behind a crib is that cryptologists look at incomprehensible ciphertext (or encrypted text) and use a clue about a word or phrase that might be expected in the underlying message: in this case, Omar Khayyam, Rubaiyat, FitzGerald and so on. If the guess is right, it creates a ‘wedge’, or test, with which to attempt to break the code. A sample of cribs with tamshud as the keyword is as follows:
1st line with initial W: ZOGOMDMD (plus an extra letter)
1st line with initial M: HPGOMDMD (plus an extra letter)
1st line with first letter omitted: OLOADMDB
2nd line: HIGMPG
This really doesn’t seem to be getting us anywhere. I also attempted box codes with other keywords: omarkhy (Omar Khayyam), rubaiyt (Rubaiyat), fitzgerald and jestyn (the possible name of the nurse). Sample cribs are shown below:
omarkhy:
1st line with initial W: XAEAABAB (plus an extra letter)
1st line with initial M: RMEAABAB (plus an extra letter)
1st line with first letter omitted: AIAOBADB
2nd line: KFGRRE
rubaiyt:
1st line with initial W: SBHNBABA (plus an extra letter)
1st line with initial M: MRHNBABA (plus an extra letter)
1st line with first letter omitted: FUPBABAC
2nd line: QFAIQB
fitzgerald:
1st line with initial W: VAISHEHE (plus an extra letter)
1st line with initial M: CDISHEHE (plus an extra letter)
1st line with first letter omitted: DIPRHEME
2nd line: KDTROI
jestyn:
1st line with initial W: YPGOBABA (plus an extra letter)
1st line with initial M: LUGOBABA (plus an extra letter)
1st line with first letter omitted: OLOAABDB
2nd line: LMIAOI
Another possibility is that the use of tamam shud (the final phrase of The Rubaiyat) is an inverted clue and the true keyword should be based on the first two words of FitzGerald’s Rubaiyat. These words are Awake! For, giving awkefor. Not only are there two words in both cases, but seven keyword letters:
1st line with initial W: WRGOKOKO (plus an extra letter)
1st line with initial M: HDGOKOKO (plus an extra letter)
1st line with first letter omitted: HOOAOKDB
2nd line: LMGKBG
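The way a phrase is reduced to keyword letters can be sketched as follows: drop everything that is not a letter and keep only the first occurrence of each letter. The function name is invented for the illustration.

```python
def keyword_letters(phrase):
    """Lower-case the phrase, drop non-letters and keep only the first
    occurrence of each letter, in order."""
    out = []
    for ch in phrase.lower():
        if "a" <= ch <= "z" and ch not in out:
            out.append(ch)
    return "".join(out)

print(keyword_letters("Tamam Shud"))  # tamshud
print(keyword_letters("Awake! For"))  # awkefor
print(keyword_letters("Rubaiyat"))    # rubaiyt
```

Both two-word phrases reduce to seven letters, which is the parallel drawn above.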
It would seem fruitless to persist with this theory.
ETAOINS and extended ETAOINS
If it is a letter substitution code, we can use tamshud equals ETAOINS. The questionable letters here are no longer so important as in a box code, where one extra (or fewer) letter changes every single letter in the crib. Begin with a restricted ETAOINS, leaving code letters as lower case and cribs as upper case.
Here t is replaced by E, a by T, m by A, s by O, h by I, u by N and d by S:
wrgoTbTbS
AliToi
wEbiApTneEp
AilTboTiTqc
iEEAEOTAOEgTb
Immediately we run into difficulties. There are too many vowels clumped together and not enough elsewhere, given that the only other true vowel available is U.
If we replace w with m, as others seem to think it is, we get:
ArgoTbTbS
AliToi
AEbiApTneEp
AilTboTiTqc
iEEAEOTAOEgTb
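The restricted substitution can be sketched in a few lines. Only the seven keyword letters are replaced, and every other code letter is left in lower case, as described above. The function name and its default arguments are chosen for the illustration.

```python
def restricted_substitution(text, keyword="tamshud", plain="ETAOINS"):
    """Replace each keyword letter with the matching letter of `plain`;
    leave all other code letters untouched (lower case)."""
    table = dict(zip(keyword, plain))
    return "".join(table.get(ch, ch) for ch in text)

print(restricted_substitution("wrgoababd"))      # wrgoTbTbS
print(restricted_substitution("mrgoababd"))      # ArgoTbTbS
print(restricted_substitution("mliaoi"))         # AliToi
print(restricted_substitution("ittmtsamstgab"))  # iEEAEOTAOEgTb
```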
If we are to extend the substitution to the entire alphabet, then rather than using modern computer analysis of letter frequencies, we should use the printers’ type boxes as used by Samuel Morse. This list is given as:
ETAOINSHRDLUCMFWYGPBVKQJXZ
Using this list, we may assign the other letters of our alphabet as follows:
t a m s h u d b c e f g i j k l n o p q r v w x y z
are to be replaced by:
E T A O I N S H R D L U C M F W Y G P B V K Q J X Z
The great advantage of this system is that it does offer a plausible explanation for the absence in our 50-odd characters of the least common letters V, K, Q, J, X and Z. Unfortunately, our possible crib then (with ‘m’ rather than ‘w’) becomes:
AVUGTHTHS
AWCTGC
AEHCAPTYDEP
ACWTHGTCTBR
CEEAEOTAOEUTH
Little, or far too much, can be made of this.
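The extended substitution can be sketched the same way. The code alphabet is the keyword followed by the unused letters in alphabetical order, and it is matched letter for letter against the frequency list quoted above. The function names are invented for the illustration, and the opening letter is taken as m, as in the crib above.

```python
import string

FREQ_ORDER = "ETAOINSHRDLUCMFWYGPBVKQJXZ"  # the list quoted above

def key_alphabet(keyword):
    """Keyword letters first, then the remaining letters alphabetically."""
    rest = "".join(c for c in string.ascii_lowercase if c not in keyword)
    return keyword + rest

def full_substitution(text, keyword="tamshud"):
    """Map the n-th letter of the key alphabet to the n-th letter of FREQ_ORDER."""
    table = dict(zip(key_alphabet(keyword), FREQ_ORDER))
    return "".join(table.get(ch, ch) for ch in text)

print(key_alphabet("tamshud"))             # tamshudbcefgijklnopqrvwxyz
print(full_substitution("mrgoababd"))      # AVUGTHTHS
print(full_substitution("mliaoi"))         # AWCTGC
print(full_substitution("ittmtsamstgab"))  # CEEAEOTAOEUTH
```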
Using awkefor instead of tamshud gives us the following ETAOINS code.
Here a is replaced by E, w by T, k by A, e by O, f by I, o by N and r by S:
Our message begins:
T/m SgNEbEbd
m liENi
T/m tbimpEnOtp
mliEbN EiEq (c?)
(i?) ttmtsEmstgEb
Since so few of our main code letters appear in the message, we can probably rule this possibility out straightaway.
All authors have commented on the cryptic nature of the message. It does not appear that, using all 50 letters, anything can be made of this which resembles English or any other plausible language. One author has suggested that the letters are simply the first letters of a message. This would account for the fact that it seems impossible to crack this code using any traditional methods. The fatal drawback to this is that if this is so, virtually any message can be constructed with those initial letters. There is not nearly enough information to show that any one message is superior to any other.
If the message is indeed cryptic, then it is possible that the cross in line 3 is meant to be the end of the message, and the other letters are there simply to confuse the issue. Our reduced text is:
wrgoababd
mliaoi
wtbimpanetp
mliabo
The fact that lines 2 and 4 are similar may be of importance. The absence of rare letters in English also tends to support the idea of an ETAOINS code. Beginning with the reduced ETAOINS we have:
wrgoTbTbS
AliToi
wEbiApTneEp
AilTbo
Possible words for line 1 are: ????T?T?S equals allstates, annotates, hesitates, apostates, appetites, irritates, meditates, thestates, detonates, and a number of less likely words. Most of these may be ruled out at once, since none of the missing letters can be any of tamshud. The initial A is not an insuperable obstacle, since if other authors are correct that the first letter is m, this construes as A in this code. The letter b would almost certainly be a vowel, and could be E, I or O, but not A. This rules out every one of the suggested words. The idea that b equals E is plausible, since our reduced coded sequence has four bs. This means our initial line is now ????TETES. However, no English words appear to fit this sequence. There is also the problem that b should not be E, since if tamshud equals ETAOINS then t should be E rather than b.
The second line has possible words including acetic and arctic, which obey the rule that the missing letters cannot be any of tamshud. Arctic looks promising, since our code letter i occurs at letter 3 and 6, which would then construe as C. For line 4, however, this gives a pattern AC?T?? which only gives us words like ACETYL and other even less likely cribs. This does not seem to be leading anywhere productive.
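The word tests in the last two paragraphs can be sketched as two checks. The first follows the rule as stated above: fixed (upper-case) letters must match, and no missing letter may be any of tamshud. The second, that a repeated code letter must stand for the same letter each time, is an assumption drawn from the remark about the code letter i. The function names are invented for the illustration.

```python
def fits(pattern, word, forbidden="tamshud"):
    """Upper-case letters in the pattern are fixed; lower-case letters are
    unknown code letters, which may not stand for any forbidden letter."""
    if len(pattern) != len(word):
        return False
    for p, w in zip(pattern, word.lower()):
        if p.isupper():
            if w != p.lower():
                return False
        elif w in forbidden:
            return False
    return True

def consistent(pattern, word):
    """Each unknown code letter must stand for the same letter throughout."""
    seen = {}
    for p, w in zip(pattern, word.lower()):
        if p.islower() and seen.setdefault(p, w) != w:
            return False
    return True

WORDS = ["allstates", "annotates", "hesitates", "apostates", "appetites",
         "irritates", "meditates", "thestates", "detonates"]

for line1 in ("wrgoTbTbS", "ArgoTbTbS"):
    print([w for w in WORDS if fits(line1, w) and consistent(line1, w)])  # []

print(fits("AliToi", "arctic"), consistent("AliToi", "arctic"))  # True True
print(fits("AliToi", "acetic"), consistent("AliToi", "acetic"))  # True False
```

None of the nine candidates for line 1 passes both checks, whichever way the first letter is read, while arctic passes both for line 2.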
If part of the message is indeed gibberish, another possibility is that the message does not in fact begin until after the cross. This gives our message as (assuming the c and i are just scribbles as previously suggested) two lines of 4 and 12 letters:
aiaq
ttmtsamstgab
Our box code gives us:
MGSO
TTTMASSMAFMD
The use of TT here seems to rule out a box code using tamshud as our keyword with lines of 4 and 12 letters. If we expand using the extra c and i we get:
aiaqcittmtsamstgab:
In this system our crib is:
MGSOBKTTTMASSMAFMD
Again, we don’t seem to be getting anywhere. Adding the two extra letters does not remove the problem with the letter T, and arbitrarily choosing only one of them gives an uneven number of letters.
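The pairing problem can be made concrete with a short sketch that splits a line into pairs and reports any pair made of the same letter twice, along with any letter left over. The function names are invented for the illustration, and the third call picks c as the single extra letter purely as an example.

```python
def split_pairs(text):
    """Split text into consecutive two-letter pairs; return (pairs, leftover)."""
    pairs = [text[i:i + 2] for i in range(0, len(text) - 1, 2)]
    leftover = text[-1] if len(text) % 2 else ""
    return pairs, leftover

def doubled(pairs):
    """Pairs made of the same letter twice."""
    return [p for p in pairs if p[0] == p[1]]

for text in ("aiaqttmtsamstgab", "aiaqcittmtsamstgab", "aiaqcttmtsamstgab"):
    pairs, leftover = split_pairs(text)
    print(len(text), doubled(pairs), repr(leftover))
# 16 ['tt'] ''
# 18 ['tt'] ''
# 17 [] 'b'
```

With neither or both of the extra letters the tt pair survives. With only one of them the pairing shifts and tt disappears, but a letter is left over.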
Using omarkhy as our keyword, the crib is RGAQ (CI) TTKPRRPLQAB.
Using rubaiyt as our keyword, the crib is IAIP (EB) TTNYXRMSTGBA.
Using fitzgerald as our keyword, the crib is TRLP (CI) TTHGPDMSGTEH.
Again, no plausible solution emerges from this. The use of the TT is an endemic problem on its own, and if it is a box code at all then the keyword must be something entirely different.
If we reduce the sample this way, the problem with tamshud equals ETAOINS remains:
aiaq (c?)
(i?) ttmtsamstgab
becomes
TiTqc
iEEAEOTAOEgTb
The vowel combination in the last line is an insuperable obstacle.
Neither box codes nor ETAOINS letter substitution would appear to be a solution to our code. The code still seems unbreakable.