David Greagg breaking down the code

The code has been studied by the devoted cryptologists Professor Derek Abbott and Dr Matthew Berryman (supervisors), with Andrew Turnbull and Densley Bihari (Honours students), in Critical design review 2009: who killed the Somerton man? (www.eleceng.adelaide.edu.au/personal/dabbott/wiki/index.php/Cipher_Cracking_2009). If anyone can break it by computer, they will. The work is ongoing.

The following is mathematician David Greagg’s analysis of the code, which also shows the workings of a box code. Box codes were popular between the wars. The idea for this approach comes from Dorothy L Sayers’ Have His Carcase, which contains an elegant description of how a box code works.

The Message

A lot of problems are immediately apparent. There are (possibly) 50 characters in the message. Everyone else seems to think that the opening letter is M. However, whoever the writer was, he forms his M differently elsewhere, so it may well be a W. The letters are arranged in five lines as follows:

w/m rgoababd

m liaoi

w/m tbimpanetp

mliabo aiaq (c?)

(i?) ttmtsamstgab

The Ws are possibly M, though this is, in my view, questionable. It could even be an H. There is rough underlining beneath (and indeed through) line 2, and a double line with a cross just above line 4, over the O. The manuscript is messy and not well calligraphed, and there is doubt over the two letters between lines 4 and 5. Our writer was arguably not accustomed to writing, and it is possible that these are merely accidental scribbles. For the same reason, the possibility that line 2 is meant to be deleted is probably not sustainable. There is also some doubt about the M at the beginning of line 2.

Box Codes

The most tempting hypothesis in the first instance is a box code based on the keyword TAMSHUD. The absence of J anywhere might well point to a 5 x 5 box code, in which I and J are traditionally conflated. Against that, there is also no F, H, K, U, V, X, Y or Z, so only 17 different letters actually appear. If other writers are correct, there is also no W. There are also major problems because a box code needs all letters in pairs. The number of letters per line is 9, 6, 11, 11, 13, or else 9, 6, 11, 10, 12 if the C and I are omitted; in either case at least lines 1 and 3 have an odd number of letters and cannot be split evenly into pairs.
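The letter counts are easy to check by machine. The Python sketch below uses one reading of the transcription (an assumption: the doubtful first letters of lines 1 and 3 are kept as w, and the doubtful c and i are included); the function names are illustrative only.

```python
from string import ascii_lowercase

# One reading of the message as transcribed above (an assumption): the doubtful
# opening letters of lines 1 and 3 are kept as w, and the doubtful c and i
# between lines 4 and 5 are included.
LINES = ["wrgoababd", "mliaoi", "wtbimpanetp", "mliaboaiaqc", "ittmtsamstgab"]


def line_lengths(lines):
    """Number of letters in each line."""
    return [len(line) for line in lines]


def letters_used(lines):
    """Set of distinct letters occurring anywhere in the message."""
    return set("".join(lines))


def letters_missing(lines):
    """Letters of the ordinary alphabet that never occur, in order."""
    return sorted(set(ascii_lowercase) - letters_used(lines))


def odd_lines(lines):
    """1-based numbers of the lines that cannot be split evenly into pairs."""
    return [i for i, n in enumerate(line_lengths(lines), start=1) if n % 2]


if __name__ == "__main__":
    print("letters per line:", line_lengths(LINES), "total:", sum(line_lengths(LINES)))
    print("distinct letters:", len(letters_used(LINES)))
    print("never used:", "".join(letters_missing(LINES)))
    print("odd-length lines:", odd_lines(LINES))
```

Reading both w's as m drops the number of distinct letters to 16, since w then disappears altogether, which is the "no W" case mentioned above.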

The key letters of Tamam Shud (T, A, M, S, H, U, D) are written first into a 5 x 5 box, with I and J treated as the same letter, and all the remaining letters follow in ordinary alphabetical order. The message is then split into pairs of letters, and each pair is swapped around a vertical axis: each letter is replaced by the letter in its own row but in its partner’s column. In this code, the word ‘so’ is coded as AQ. The word ‘go’ remains as it is, since G and O are on the same vertical axis.

The idea behind a crib is that cryptologists facing incomprehensible ciphertext (encrypted text) use a clue about a word or phrase that might be expected in the underlying message; in this case, Omar Khayyam, Rubaiyat, FitzGerald and so on. If such a phrase were appropriate, it would give a ‘wedge’, or test, with which to attempt to break the code. A sample of cribs with tamshud as the keyword is as follows:

1st line with initial W: ZOGOMDMD (plus an extra letter)

1st line with initial M: HPGOMDMD (plus an extra letter)

1st line with first letter omitted: OLOADMDB

2nd line: HIGMPG
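The box and the pair swap can be sketched in a few lines of Python. The swap rule used here (each letter takes the letter in its own row and its partner’s column, so same-row pairs exchange places and same-column pairs stay put) is inferred from the ‘so’ and ‘go’ examples and reproduces the tamshud samples above; the function names are hypothetical.

```python
from string import ascii_lowercase


def build_box(keyword):
    """5 x 5 box: distinct keyword letters first, then the rest of the
    alphabet in order, with J folded into I."""
    letters = []
    for ch in (keyword + ascii_lowercase).lower().replace("j", "i"):
        if ch.isalpha() and ch not in letters:
            letters.append(ch)
    return [letters[r * 5:(r + 1) * 5] for r in range(5)]


def swap_pairs(text, keyword):
    """Swap each pair about a vertical axis: every letter is replaced by the
    letter in its own row and its partner's column. Same-row pairs exchange
    places, same-column pairs stay as they are, and an unpaired final letter
    is copied unchanged (the 'extra letter' in the samples)."""
    box = build_box(keyword)
    pos = {ch: (r, c) for r, row in enumerate(box) for c, ch in enumerate(row)}
    text = text.lower().replace("j", "i")
    out = []
    for k in range(0, len(text) - 1, 2):
        (ra, ca), (rb, cb) = pos[text[k]], pos[text[k + 1]]
        out += [box[ra][cb], box[rb][ca]]
    if len(text) % 2:
        out.append(text[-1])
    return "".join(out).upper()


if __name__ == "__main__":
    for row in build_box("tamshud"):
        print(" ".join(row).upper())
    print(swap_pairs("so", "tamshud"), swap_pairs("go", "tamshud"))
    for reading in ("wrgoababd", "mrgoababd", "rgoababd", "mliaoi"):
        print(reading, swap_pairs(reading, "tamshud"))
```

For tamshud the box rows come out as T A M S H, U D B C E, F G I K L, N O P Q R and V W X Y Z. Because applying the swap twice returns the original pair, the same function would serve for both encoding and decoding. In the standard Playfair cipher, by contrast, letters sharing a row or column are shifted along it rather than exchanged or kept.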

This really doesn’t seem to be getting us anywhere. I also attempted box codes with other keywords: omarky (Omar Khayyam), rubaiyt (Rubaiyat), fitzgerald and jestyn (the possible name of the nurse). Sample cribs are shown below:

omarky:

1st line with initial W: XAEAABAB (plus an extra letter)

1st line with initial M: RMEAABAB (plus an extra letter)

1st line with first letter omitted: AIAOBADB

2nd line: KFGRRE


rubaiyt:

1st line with initial W: SBHNBABA (plus an extra letter)

1st line with initial M: MRHNBABA (plus an extra letter)

1st line with first letter omitted: FUPBABAC

2nd line: QFAIQB


fitzgerald:

1st line with initial W: VAISHEHE (plus an extra letter)

1st line with initial M: CDISHEHE (plus an extra letter)

1st line with first letter omitted: DIPRHEME

2nd line: KDTROI


jestyn:

1st line with initial W: YPGOBABA (plus an extra letter)

1st line with initial M: LUGOBABA (plus an extra letter)

1st line with first letter omitted: OLOAABDB

2nd line: LMIAOI

Another possibility is that the use of tamam shud (the final phrase of The Rubaiyat) is an inverted clue, and the true keyword should be based on the first two words of FitzGerald’s Rubaiyat. These words are Awake! For, giving awkefor. Not only are there two words in both cases, but both keywords have seven letters:

1st line with initial W: WRGOKOKO (plus an extra letter)

1st line with initial M: HDGOKOKO (plus an extra letter)

1st line with first letter omitted: HOOAOKDB

2nd line: LMGKBG

It would seem fruitless to persist with this theory.

ETAOINS and extended ETAOINS

If it is a letter substitution code, we can test the hypothesis that tamshud stands for ETAOINS, traditionally reckoned the seven most common letters in English. The questionable letters are then less important than in a box code, where one letter more (or fewer) shifts the pairing and so changes every letter after it in the crib. Begin with a restricted ETAOINS, leaving unsubstituted code letters in lower case and substituted letters in upper case.

Here t is replaced by E, a by T, m by A, s by O, h by I, u by N and d by S:

wrgoTbTbS

AliToi

wEbiApTneEp

AliTboTiTqc

iEEAEOTAOEgTb

Immediately we run into difficulties. There are too many vowels clumped together and not enough elsewhere, given that the only other true vowel available is U.
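The restricted substitution is mechanical enough to sketch in Python. The function name and the transcription used (w kept as w, the doubtful c and i included) are assumptions for illustration, not part of Greagg’s analysis.

```python
# Restricted ETAOINS substitution: the distinct letters of the keyword are
# matched, in order, with the seven most frequent English letters.
FREQ_ORDER = "etaoins"


def restricted_substitution(text, keyword="tamshud", order=FREQ_ORDER):
    """Replace each keyword letter with the frequency letter in the same
    position (in upper case); every other code letter stays lower case."""
    if len(set(keyword)) != len(keyword):
        raise ValueError("keyword must not repeat letters")
    table = {code: plain.upper() for code, plain in zip(keyword, order)}
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    message = ["wrgoababd", "mliaoi", "wtbimpanetp", "mliaboaiaqc", "ittmtsamstgab"]
    for line in message:
        print(restricted_substitution(line))
```

Changing the first letter of lines 1 and 3 to m gives the m reading, and passing keyword="awkefor" gives the awkefor version of the same substitution.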

If we replace w with m, as others seem to think it is, we get:

ArgoTbTbS

AliToi

AEbiApTneEp

AliTboTiTqc

iEEAEOTAOEgTb

If we are to extend the substitution to the entire alphabet, then rather than using modern computer analysis of letter frequencies, we should use the letter order of the printers’ type boxes, as used by Samuel Morse. This list is given as:

ETAOINSHRDLUCMFWYGPBVKQJXZ

Using this list, we may assign the other letters of our alphabet as follows:

t a m s h u d b c e f g i j k l n o p q r v w x y z

are to be replaced by:

ETAOINSHRDLUCMFWYGPBVKQJXZ

The great advantage of this system is that it does offer a plausible explanation for the absence in our 50-odd characters of the least common letters V, K, Q, J, X and Z. Unfortunately, our possible crib (with ‘m’ rather than ‘w’) then becomes:

AVUGTHTHS

AWCTGC

AEHCAPTYDEP

AWCTHGTCTBR

CEEAEOTAOEUTH

Little, or far too much, can be made of this.
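The extended substitution amounts to lining up two 26-letter alphabets. A minimal Python sketch, assuming the m reading of the message and the printers’ order exactly as quoted; the helper names are hypothetical.

```python
from string import ascii_lowercase

# Letter order from the printers' list quoted above, most to least common.
MORSE_ORDER = "etaoinshrdlucmfwygpbvkqjxz"


def keyed_alphabet(keyword):
    """Distinct keyword letters first, then the rest of the alphabet in order."""
    seen = []
    for ch in keyword + ascii_lowercase:
        if ch not in seen:
            seen.append(ch)
    return "".join(seen)


def full_substitution(text, keyword="tamshud"):
    """Map the keyed alphabet onto MORSE_ORDER letter by letter (upper case)."""
    table = dict(zip(keyed_alphabet(keyword), MORSE_ORDER.upper()))
    return "".join(table[ch] for ch in text)


if __name__ == "__main__":
    # The m reading of the message, with the doubtful c and i included.
    message = ["mrgoababd", "mliaoi", "mtbimpanetp", "mliaboaiaqc", "ittmtsamstgab"]
    for line in message:
        print(full_substitution(line))
```

For tamshud the keyed alphabet is the code-letter row of the table above (t a m s h u d b c e ... x y z), so every code letter now has a partner and no lower-case letters survive.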

Using awkefor instead of tamshud gives us the following ETAOINS code.

Here a is replaced by E, w by T, k by A, e by O, f by I, o by N and r by S:

Our message becomes:

T/m SgNEbEbd

m liENi

T/m tbimpEnOtp

mliEbN EiEq (c?)

(i?) ttmtsEmstgEb

Since so few of our main code letters appear in the message, we can probably rule this possibility out straightaway.

Restricting the Sample Part 1

All authors have commented on the cryptic nature of the message. It does not appear that anything resembling English, or any other plausible language, can be made of all 50 letters. One author has suggested that the letters are simply the first letters of the words of a message. This would account for the fact that it seems impossible to crack this code using any traditional methods. The fatal drawback is that, if this is so, virtually any message can be constructed from those initial letters, and there is not nearly enough information to show that any one message is superior to any other.

If the message is indeed cryptic, then it is possible that the cross over the O in line 4 is meant to mark the end of the message, and the other letters are there simply to confuse the issue. Our reduced text is:

wrgoababd

mliaoi

wtbimpanetp

mliabo

The fact that lines 2 and 4 are similar may be of importance. The absence of rare letters in English also tends to support the idea of an ETAOINS code. Beginning with the reduced ETAOINS we have:

wrgoTbTbS

AliToi

wEbiApTneEp

AliTbo

Possible words fitting the line 1 pattern ????T?T?S are allstates, annotates, hesitates, apostates, appetites, irritates, meditates, thestates and a number of less likely words. Most of these may be ruled out at once, since none of the missing letters can be any of tamshud. The initial A is not an insuperable obstacle, since if other authors are correct that the first letter is m, this construes as A in this code. The letter b would almost certainly be a vowel, and could be E, I or O, but not A. This rules out every one of the suggested words. The idea that b equals E is plausible, since our reduced coded sequence has four bs. This makes our initial line ????TETES. However, no English words appear to fit this sequence. There is also the problem that b should not be E, since if tamshud equals ETAOINS then t, rather than b, stands for E.

The second line has possible words including acetic and arctic, which obey the rule that the missing letters cannot be any of tamshud. Arctic looks promising, since our code letter i occurs at positions 3 and 6 and would then construe as C. For line 4, however, this gives the pattern A?CT??, and carrying over the other arctic values (l as R, o as I) gives ARCT?I, which is not an English word. This does not seem to be leading anywhere productive.
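Checking candidate words against a partly substituted line is easy to mechanise. The Python sketch below applies a stricter test than the vowel argument above: it asks whether a word is compatible with a one-to-one substitution that keeps tamshud = ETAOINS, so a repeated code letter must give a repeated plain letter and no plain letter may come from two different code letters. The function and list names are illustrative.

```python
# Code letter -> plain letter pairs fixed by the hypothesis tamshud = ETAOINS.
KNOWN = dict(zip("tamshud", "etaoins"))


def consistent(code, word, known=KNOWN):
    """True if `word` could decode `code` under a one-to-one letter
    substitution that agrees with the `known` code->plain pairs."""
    if len(code) != len(word):
        return False
    forward = dict(known)                        # code letter -> plain letter
    backward = {p: c for c, p in known.items()}  # plain letter -> code letter
    for c, p in zip(code, word):
        # setdefault returns any existing entry, so a clash with an earlier
        # assignment shows up as a mismatch.
        if forward.setdefault(c, p) != p or backward.setdefault(p, c) != c:
            return False
    return True


CANDIDATES = ["allstates", "annotates", "hesitates", "apostates",
              "appetites", "irritates", "meditates", "thestates"]

if __name__ == "__main__":
    for reading in ("wrgoababd", "mrgoababd"):
        print(reading, [w for w in CANDIDATES if consistent(reading, w)])
```

Under this test none of the eight candidates survives, for either reading of the first letter: each needs different plain letters at positions 6 and 8, where the code has b twice. Any word of the form ????TETES also fails at once, since E is already taken by t.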

Restricting the Sample Part 2

If part of the message is indeed gibberish, another possibility is that the message does not in fact begin until after the cross. This gives our message as (assuming the c and i are just scribbles as previously suggested) two lines of 4 and 12 letters:

aiaq

ttmtsamstgab

Our box code gives us:

MGSO

TTTMASSMAFMD

The use of TT here seems to rule out a box code using tamshud as our keyword with lines of 4 and 12 letters. If we expand using the extra c and i we get:

aiaqcittmtsamstgab

In this system our crib is:

MGSOBKTTTMASSMAFMD

Again, we don’t seem to be getting anywhere. Adding the two extra letters does not remove the problem with TT, and arbitrarily choosing only one of them gives an odd number of letters.
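The pairing problems can be checked mechanically. Under the swap rule described for the box code, a cipher pair of two identical letters can only come from a plaintext pair of two identical letters, and box codes conventionally break such doubles up with a filler letter (a Playfair convention assumed here rather than stated in the text). The Python sketch flags doubled pairs and odd lengths; the names are hypothetical.

```python
def split_pairs(text):
    """Split text into consecutive pairs; an odd final letter is returned separately."""
    pairs = [text[k:k + 2] for k in range(0, len(text) - 1, 2)]
    leftover = text[-1] if len(text) % 2 else ""
    return pairs, leftover


def box_code_problems(text):
    """Reasons why `text` is unlikely to be ciphertext from a pair-swapping box code."""
    pairs, leftover = split_pairs(text)
    problems = [f"doubled pair {p} at pair {i}"
                for i, p in enumerate(pairs, start=1) if p[0] == p[1]]
    if leftover:
        problems.append(f"odd length: {leftover} left unpaired")
    return problems


if __name__ == "__main__":
    for sample in ("aiaqttmtsamstgab",     # c and i omitted
                   "aiaqcittmtsamstgab",   # c and i kept
                   "aiaqcttmtsamstgab"):   # c kept, i dropped
        print(sample, box_code_problems(sample))
```

Keeping only the c removes the doubled pair but leaves an odd number of letters, which is exactly the dilemma described above.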

Using omarky as our keyword, the crib is RGAQ (CI) TTKPRRPLQAB.

Using rubaiyt as our keyword, the crib is IAIP (EB) TTNYXRMSTGBA.

Using fitzgerald as our keyword, the crib is TRLP (CI) TTHGPDMSGTEH.

Again, no plausible solution emerges from this. The use of the TT is an endemic problem on its own, and if it is a box code at all then the keyword must be something entirely different.

If we reduce the sample this way, the problem with tamshud equals ETAOINS remains:

aiaq (c?)

(i?) ttmtsamstgab

becomes

TiTqc

iEEAEOTAOEgTb

The vowel combination in the last line is an insuperable obstacle.

Conclusion

Neither box codes nor ETAOINS letter substitution would appear to be a solution to our code. The code still seems unbreakable.
