In the beginning were transposition and substitution. Transposition ciphers, which date back to at least the fifth century B.C., are giant anagrams. You scramble all the letters according to an agreed-upon pattern and put them back together the same way. A grade-school example is the rail-fence. Extract the odd-numbered characters from a message, write them in sequence, and follow them by the even-numbered characters, so
S H U F F L E O F F T H I S M O R T A L C O I L
S U F E F T I M R A C I H F L O F H S O T L O L
Another transposition cipher, beloved of the Spartans, is the scytale, which is a wooden staff of a certain diameter. Wind a ribbon of leather or parchment around it, and write your message on it. Unwrap the ribbon, and voilà, your message is scrambled, awaiting delivery to someone with another wooden stick of precisely the same diameter. A scytale, if it’s of a fixed diameter, is just another version of the rail-fence cipher. If it isn’t fixed, then you’ve bought yourself a major hardware distribution problem.
Substitution ciphers are sometimes called Caesar ciphers after Julius Caesar, who didn’t invent them but employed them frequently. If you’ve ever seen a decoder ring from a cereal box you know what a Caesar cipher is. Take the alphabet and shift it three places. A becomes D, Q becomes T, Y goes around the bend and becomes B, and there you have it. Obviously if you restrict yourself to a simple shift you have only 25 distinct ciphers. (A cipher is a code in which each symbol stands for a single letter of the alphabet.) But if you allow any rearrangement of the alphabet, then you have 2525 unique ciphers, which is a lot.
This enormous number of ciphers to choose from makes substitution an advance over transposition. Once you know you’re dealing with a transposition cipher you’re halfway home, because there are only so many plausible ways to transpose letters. A transposition cipher, in other words, depends on the secrecy of the algorithm. But with a substitution cipher, even if you know that’s what it is, you need to discover the precise letter mapping, and there are far, far too many to try them all. A substitution cipher depends on the secrecy of the key, and it’s a helluva lot easier to change the key than it is to change the algorithm. The first principle of cryptography, laid down by Kerckhoff in 1883, is that any cipher that relies on the secrecy of the algorithm, as opposed to the key, is inherently insecure, because, sooner or later, someone will steal your scytale, or your Enigma machine. To this day you will see supposed security experts touting their new, proprietary, double-secret algorithms, which is an infallible sign of snake oil. “Security by obscurity” is the derisory term among cryptanalysts.
A few Arab scientists finally got wise to substitution ciphers around the ninth century AD. They began with the simple observation that some letters are a lot more common than others. And not only letters, but digrams and trigrams, sequences of two and three letters. Once you realize that 13% of the letters in ordinary English prose are E’s, 10% are T’s and 8% are A’s, and that three consecutive vowels or consonants are pretty rare, breaking substitution ciphers becomes straightforward. A decent armchair cryptanalyst, given enough ciphertext, can usually crack a simple substitution cipher in about ten minutes. Here’s the letter frequency table in English, along with a list of the commonest digrams and trigrams. (Scrabble follows the table fairly accurately, although not exactly, since in Scrabble it matters only how many words contain a given letter, not how common it is. The highest scoring letter relative to its frequency is H, which scores 4 despite being the 9th most common letter in the alphabet, its frequency being largely due to the definite article. The lowest is U, which scores 1 despite being 15th.)
Code makers tried many ruses to foil frequency analysis. Null symbols, which represent no letter at all, were added. More insidious were operation symbols, which represented not a letter but an instruction, such as “delete the previous letter.” Since one of the standard techniques to break a substitution cipher is to look for common words like “and” and “the,” special symbols were substituted for them. Messages were deliberately spelled badly, with, say, k’s for c’s. The best code breakers defeated all of these techniques.
The code makers needed a breakthrough, and it was supplied by Blaise de Vigenère. around 1550. Vigenère realized that the fundamental weakness of substitution ciphers is consistency: the same symbol in the ciphertext always maps to the same letter in the plaintext. He proceeded to invent the first polyalphabetic cipher. (This is incorrect; see the update below.) You begin with what’s called a Vigenère square:
Reading across the table, you see a simple series of Caesar ciphers: A is unshifted, B is shifted one letter, and so forth. Now you choose a code word; this is the key. So suppose we want to encrypt SHUFFLE OFF THIS MORTAL COIL with the keyword HAMLET. You begin by repeating the keyword to the length of the message. Our message has 24 letters, so we have HAMLETHAMLETHAMLETHAMLET. Now you encrypt the first letter, S, using the H row of the Vigenère square, so it becomes a Z. The next letter, H, is encrypted with the A row, which is unshifted, so it remains an H. The U is encrypted with the M row, becoming a G. And so forth. The encrypted message reads:
Z H G Q J E L O R Q X A P S Y Z V M H L O Z M E
You can try it yourself here.
In the Vigenère square the key is very small, just a single word, which vastly simplified the distribution problem. Modified substitution ciphers had grown so large that passing around complex codebooks was required. But its principal advantage is that it stymies basic frequency analysis, because one-to-one mapping between plaintext and ciphertext symbols no longer holds. In our message, for example, Z appears three times, representing two different letters, S and O. M appears twice, representing H and I. Conversely, the plaintext F is represented by Q, J, and R. The result is so nasty to analyze that Vigenère’s cipher was nicknamed le chiffre indechiffrable, with considerable justice. It remained unbroken for 300 years, and it took a genius to do it.
In Part II we’ll break the Vigenère cipher, defeat the Nazis, and complete this little history.
(Update: Actually it was Leon Alberti, in 1466, who first proposed polyalphabetic substitution. The “Vigenère” square first appears in Trithemius, in 1499, and Belaso, in 1553, added the repeating keyword. Vigenère himself synthesized these results in 1585. Thanks to AC Douglas for pointing some of this out.)