Variable vs. Fixed-Width Encodings

In 1836, people wanted to communicate with each other across great distances instantly—just like we do today—but technologies such as mobile phones and the Internet were still well over 100 years away.

Telegraphy was the first step toward bridging this communication gap. In its infancy, however, an electric telegraph could send only two states: an electric signal or no signal. Samuel Morse developed a code of dots and dashes (or dits and dahs) that could be sent electrically over wires stretching for miles between cities. At one end, a telegraph operator used a key to send a message; at the other, an operator listened to the Morse code and transcribed it into letters, numbers, and punctuation. A skilled operator could turn Morse code into alphanumeric symbols almost as fast as it arrived. It was easily the fastest way to communicate with people across a state, across the country, or even internationally.

How effective was Morse code at sending messages? Let’s compare it to SMS text messaging:

https://www.youtube.com/embed/pRuRE-Bwk1U

This video is a bit dated (the participants are using flip phones, after all). What factors do you think might change the outcome today?

Morse Code

Morse code is a variable-width encoding: the dot-and-dash sequences that represent different characters can have different lengths. This was a deliberate choice made for efficiency. Samuel Morse knew that some characters would be sent much more often than others, so those characters should take fewer signals to send. For example, “E” and “T” are very common in English text, so they are represented by a single dot and a single dash, respectively. “Z,” on the other hand, occurs infrequently, so it takes four signals to send (dash dash dot dot).
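The following minimal Python sketch makes the variable widths concrete. It looks letters up in a table of International Morse Code. That is the modern standard, and its patterns match the examples above. Samuel Morse's original American code differed for several letters. The names `MORSE` and `morse_length` are illustrative and do not come from any library.

```python
# International Morse Code for the 26 letters (the modern standard,
# not Samuel Morse's original American Morse, which differed for some letters).
MORSE = {
    "A": ".-",   "B": "-...", "C": "-.-.", "D": "-..",  "E": ".",
    "F": "..-.", "G": "--.",  "H": "....", "I": "..",   "J": ".---",
    "K": "-.-",  "L": ".-..", "M": "--",   "N": "-.",   "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.",  "S": "...",  "T": "-",
    "U": "..-",  "V": "...-", "W": ".--",  "X": "-..-", "Y": "-.--",
    "Z": "--..",
}


def morse_length(letter: str) -> int:
    """Number of dots and dashes needed to send one letter."""
    return len(MORSE[letter.upper()])


if __name__ == "__main__":
    # Common letters get short codes; rare letters get long ones.
    for letter in "ETZ":
        print(letter, MORSE[letter], morse_length(letter))
```

Running it prints `E . 1`, `T - 1`, and `Z --.. 4`. Across all 26 letters, the code lengths range from 1 to 4 signals.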

However, variable-width codes add complexity: how does the receiver know where one character ends and the next begins? In other words, what distinguishes two “E”s in a row from a single “I,” when both are sent as two dots? Morse used time, a longer pause between characters, to mark where each character ends. In a way, this trades robustness for efficiency. The sender’s sense of timing may differ from the receiver’s, particularly at high speeds.
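The ambiguity is easy to reproduce. This Python sketch uses a small, illustrative subset of International Morse Code. A space character stands in for the pause between characters. Without that pause, “EE” and “I” produce the same signal, and a three-dot signal has four possible readings. The function names are hypothetical, chosen only for this demonstration.

```python
# A small subset of International Morse Code, enough for this demo.
MORSE = {"E": ".", "I": "..", "S": "...", "T": "-", "A": ".-", "N": "-."}
DECODE = {code: letter for letter, code in MORSE.items()}


def encode(text: str, gap: str = " ") -> str:
    """Join letter codes with `gap`, which stands in for the inter-character pause."""
    return gap.join(MORSE[ch] for ch in text)


def decode(signal: str, gap: str = " ") -> str:
    """Split on the gap and look up each code; relies on the gaps being present."""
    return "".join(DECODE[code] for code in signal.split(gap))


def all_readings(signal: str) -> list[str]:
    """Every letter sequence whose gapless encoding equals `signal`."""
    if not signal:
        return [""]
    readings = []
    for code, letter in DECODE.items():
        if signal.startswith(code):
            for rest in all_readings(signal[len(code):]):
                readings.append(letter + rest)
    return readings


if __name__ == "__main__":
    print(encode("EE", gap=""), encode("I", gap=""))  # .. ..  (identical)
    print(decode(encode("EE")), decode(encode("I")))  # EE I   (gaps resolve it)
    print(all_readings("..."))                         # ['EEE', 'EI', 'IE', 'S']
```

The decoder only works because the gaps are present. If the receiver misjudges a pause, it splits the signal in the wrong place.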

Baudot Code

Émile Baudot, a French telegrapher, sought to eliminate this ambiguity by creating a fixed-width code in which every character was exactly 5 bits long (enough for 2^5 = 32 distinct patterns). Because every character has the same length, there is no confusion over where one ends and the next begins: the receiver simply counts off 5 bits at a time.
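Here is a minimal sketch of fixed-width decoding in Python. It assumes a hypothetical 5-bit table in which A = 00000, B = 00001, and so on through Z = 11001. This is not Baudot's actual assignment of codes to characters. It only shows that character boundaries can be found by counting, with no separators at all.

```python
import string

WIDTH = 5  # bits per character, as in Baudot's code

# Hypothetical table: each letter's alphabet index as a 5-bit binary string.
# This is NOT Baudot's real code table; patterns 11010-11111 are unused here.
ENCODE = {letter: format(i, f"0{WIDTH}b") for i, letter in enumerate(string.ascii_uppercase)}
DECODE = {bits: letter for letter, bits in ENCODE.items()}


def encode(text: str) -> str:
    """Concatenate the 5-bit codes with no separator between characters."""
    return "".join(ENCODE[ch] for ch in text)


def decode(bits: str) -> str:
    """Split the bit string every WIDTH bits; no timing or gaps are needed."""
    if len(bits) % WIDTH != 0:
        raise ValueError("bit string length is not a multiple of the code width")
    return "".join(DECODE[bits[i:i + WIDTH]] for i in range(0, len(bits), WIDTH))


if __name__ == "__main__":
    print(encode("EE"))          # 0010000100  (two 5-bit codes)
    print(encode("I"))           # 01000       (clearly one character)
    print(decode(encode("EE")))  # EE
```

The price of this robustness is efficiency: a common letter like E costs just as many bits as a rare one like Z.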

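ASCII takes the same approach: every character is a fixed 7 bits. When computers were young and a standards committee was designing a character set, it chose Baudot’s fixed-width method over Morse’s variable-width one, selecting for robustness over efficiency. Unicode is more mixed: its UTF-32 encoding is fixed-width, but the widely used UTF-8 is variable-width, using 1 to 4 bytes per character.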
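Both behaviors can be checked directly with Python's built-in `str.encode`. ASCII gives every character a 7-bit code. Unicode text, by contrast, takes a different number of bytes per character depending on the encoding: UTF-8 varies while UTF-32 does not. The helper names below are illustrative. The sample characters are written as escape sequences (A, é, €, and an emoji) so the source stays ASCII-only.

```python
def ascii_bits(text: str) -> list[str]:
    """7-bit ASCII code for each character; raises for non-ASCII input."""
    text.encode("ascii")  # raises UnicodeEncodeError if any character is not ASCII
    return [format(ord(ch), "07b") for ch in text]


def byte_widths(text: str, encoding: str) -> list[int]:
    """Number of bytes each character of `text` occupies in the given encoding."""
    return [len(ch.encode(encoding)) for ch in text]


SAMPLE = "A\u00e9\u20ac\U0001F600"  # A, é, €, grinning-face emoji

if __name__ == "__main__":
    print(ascii_bits("ET"))                  # ['1000101', '1010100']
    print(byte_widths(SAMPLE, "utf-8"))      # [1, 2, 3, 4]  variable width
    print(byte_widths(SAMPLE, "utf-32-le"))  # [4, 4, 4, 4]  fixed width
```

UTF-8 reintroduces Morse's question of where each character ends. It answers that question in the bytes themselves rather than with timing: the leading bits of each byte show whether it starts a new character.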

As with so many choices in the history of computer science, there is no single correct answer. Each choice is a selection among trade-offs: efficiency, robustness, correctness, ease of use, and more.