Unicode vs. ASCII

This chunk of rock is the Rosetta Stone. It is historically important because it allowed the first deciphering of the otherwise unreadable symbols found in ancient Egyptian ruins. It carries a single text written in three forms: ancient Egyptian hieroglyphics, Demotic script, and Ancient Greek. Because Ancient Greek was already well understood, historians could use it to translate the hieroglyphics.

It is important to remember that not all languages are written with the 26 letters of the English alphabet.

As you may remember, ASCII stands for American Standard Code for Information Interchange, so it probably comes as no surprise that it was designed to meet the alphabetic needs of American English. With this narrow focus, ASCII fails to provide for the many letters and characters that are common to other world languages. As computers have become ubiquitous, and people who speak languages other than English have begun to use computers, write programs, and participate in social media, the need to extend this venerable standard to accommodate them has grown.

The problem, of course, is that the ASCII table allows a very limited set of symbols—seven bits’ worth. In order to make room for more symbols, more bits are needed. How many are needed?

The current solution is to use the newer Unicode standard. Unicode is a character encoding standard that can represent far more of the world's text than ASCII can: it covers the writing systems of most of the world's languages, not just English.

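  • The ASCII table includes 2^7 values (7 bits).
  • Unicode, as originally designed, includes 2^16 values (16 bits).

which means:

  • ASCII can represent 128 different characters.
  • 16-bit Unicode can represent 65,536 different characters! The standard has since grown beyond 16 bits and now has room for 1,114,112 code points.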
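The arithmetic behind these counts can be checked with a short Python sketch. The helper name `bits_needed` is made up for this illustration and is not part of either standard; it only answers the question of how many bits a given number of symbols requires.

```python
def bits_needed(symbol_count: int) -> int:
    """Return the smallest number of bits that can label symbol_count distinct symbols."""
    if symbol_count < 1:
        raise ValueError("symbol_count must be at least 1")
    # n bits give 2**n distinct patterns, so we want the smallest n with 2**n >= symbol_count.
    return max(1, (symbol_count - 1).bit_length())


print(2 ** 7)              # 128 patterns from 7 bits
print(2 ** 16)             # 65536 patterns from 16 bits
print(bits_needed(128))    # 7
print(bits_needed(129))    # 8
print(bits_needed(65536))  # 16
```

Adding even one symbol beyond 128 forces an eighth bit, which is why extending ASCII meant widening the code rather than squeezing more into seven bits.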

Anything encoded with ASCII will work with Unicode; the first 128 symbols are the same. In a 16-bit Unicode encoding, extra zeroes pad the beginning of each ASCII code to fill the larger state space (2^16 vs. 2^7 possibilities).

For example, the letter “U” (code 85) is represented by both as:

ASCII     16-bit Unicode
1010101   0000000001010101
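The zero-padding can be reproduced in Python as a rough check. The helper `to_bits` is a hypothetical name used only here, and the choice of UTF-8 and big-endian UTF-16 as concrete Unicode encodings is an assumption of this sketch; the text above speaks only of "Unicode" in general.

```python
def to_bits(char: str, width: int) -> str:
    """Return the code point of a single character as a zero-padded binary string."""
    return format(ord(char), f"0{width}b")


# Uppercase and lowercase letters have different codes, so both are shown.
for letter in ("U", "u"):
    print(letter, to_bits(letter, 7), to_bits(letter, 16))
# U 1010101 0000000001010101
# u 1110101 0000000001110101

# Each of the first 128 code points encodes to the same single byte in ASCII
# and UTF-8, and big-endian UTF-16 places one zero byte in front of it.
for code_point in range(128):
    char = chr(code_point)
    assert char.encode("ascii") == char.encode("utf-8")
    assert char.encode("utf-16-be") == b"\x00" + char.encode("ascii")
```

The loop at the end raises no error, which is one way to see that existing ASCII text carries over unchanged.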

Using Unicode with an ASCII Keyboard

Common misconception: The characters available for use are restricted to those on the keyboard.
    Unicode interfaces (alternative codes, character maps, even extended keyboards) allow for a huge range of symbols. Before Unicode we ate “jalapenos.” Now we can eat “jalapeños,” which are much tastier.
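As one illustration of an "alternative code", a programming language can name a character by its Unicode code point, so the character can be typed with ASCII keys alone. The sketch below uses Python's `\u` string escape and the standard `unicodedata` module; the variable names are chosen just for this example.

```python
import unicodedata

plain = "jalapenos"
# "\u00f1" is an escape for code point U+00F1, typed using only ASCII keys.
spicy = "jalape\u00f1os"

print(spicy)                         # jalapeños
print(hex(ord("\u00f1")))            # 0xf1
print(unicodedata.name("\u00f1"))    # LATIN SMALL LETTER N WITH TILDE
print(chr(0xF1) == spicy[6])         # True
print(plain == spicy)                # False
```

Code point 0xF1 is 241 in decimal, which lies outside the 0 to 127 range that ASCII covers.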

Can somebody type in other languages using Unicode and any keyboard? Yes! Even with the keyboards that are primarily found in the United States, we can use, type, and encode in other languages using Unicode. Here’s how:

  • Instructions for entering Unicode using:

For a web-accessible application that generates Unicode character tables, try The Unicode Range Viewer.

Additionally, you may like to see some of the many examples of world language symbols in context. Try entering some common English words and phrases into a language translation site such as Foreignword.com or Google Translate.

An Arabic keyboard. Click here to see (and use) more language-specific keyboards.