Compression Algorithms

The Importance of Efficiency

Mathematicians and computer scientists do not like to waste their time repeating themselves. They want to save time, space, and complexity wherever possible. In fact, many of the most common mathematical notations can be considered compression algorithms. For example, consider repeated multiplication, which is rife with redundancy:

Efficiency   Expression
Original     5 × 5 × 5 × 5 × 5 × 5 × 5

Exponential notation was created to alleviate this degree of redundancy and improve legibility:

Efficiency   Expression
Original     5 × 5 × 5 × 5 × 5 × 5 × 5
Compressed   5⁷

Multiplication itself is really nothing more than repeated addition:

Efficiency   Expression
Original     5 + 5 + 5 + 5 + 5 + 5 + 5
Compressed   5 × 7

Watch the video below to learn more about how compression works in terms of mathematical operations, which as you know, are critical to our ability to manipulate digital media:

https://www.youtube.com/embed/cbBFnlwK98M

The ultimate goal of compression algorithms is to condense information into its most space-efficient, compact form and minimize any waste of space.
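The idea behind exponential notation, replacing repetition with a count, is also the basis of run-length encoding, one of the simplest compression schemes. Here is a minimal sketch in Python (the function names are illustrative, not from any particular library):

```python
def rle_encode(text):
    """Run-length encode: collapse each run of a repeated character into (char, count)."""
    if not text:
        return []
    runs = []
    current, count = text[0], 1
    for ch in text[1:]:
        if ch == current:
            count += 1
        else:
            runs.append((current, count))
            current, count = ch, 1
    runs.append((current, count))
    return runs

def rle_decode(runs):
    """Reverse the encoding: expand each (char, count) back into a run."""
    return "".join(ch * count for ch, count in runs)

print(rle_encode("5555555"))  # [('5', 7)] -- like writing 5⁷ instead of 5 × 5 × ... × 5
```

Because decoding exactly reverses encoding, this scheme is lossless: nothing is thrown away, only restated more compactly.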

Let’s take a look at a concrete example of how compression works in computer science and how it may affect you.

Compressed Web Pages

Which Internet browser do you use? Chrome? Firefox? Internet Explorer? Safari? Each of these browsers has its own advantages and disadvantages.

Opera is another browser option, and it takes a different approach to mobile Internet browsing. When a mobile device requests a page, the page is first fetched and compressed on Opera’s servers; the compressed version is then sent to the device. By compressing web pages before sending them to the phone, Opera reduces the amount of data that must travel over the network. As a result, web pages load faster on mobile devices and incur lower data-transfer charges.
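The savings come from the fact that HTML is highly repetitive text. Opera’s actual pipeline is proprietary, but the effect can be approximated with Python’s standard zlib module, the same DEFLATE family of compression used by gzip on the web:

```python
import zlib

# A toy web page: markup repeats constantly, which compressors exploit.
html = b"<html><body>" + b"<p>Hello, mobile web!</p>" * 200 + b"</body></html>"

compressed = zlib.compress(html)
print(len(html), "bytes before,", len(compressed), "bytes after")

# The phone's browser can recover the page exactly -- this compression is lossless.
assert zlib.decompress(compressed) == html
```

Fewer bytes on the wire means a faster page load and a smaller data bill, with no change to what the user sees.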

https://www.youtube.com/embed/MVZNTV-cAMg

Evaluating Compression Algorithms

Not all compression algorithms are created equal. The following guidelines may help users evaluate the effectiveness of algorithms:

  • Important characteristics of useful compression algorithms:
  1. Algorithms are processes/procedures, which makes them well suited to transforming text step by step.
  2. Algorithms are consistent, so running the same algorithm on the same input always produces the same result.
  3. Algorithms are ordered processes, which makes reversibility possible, so that compressed text can be decompressed. (Reversibility is not guaranteed—only lossless algorithms are fully reversible.)
  • A good measure for comparing the effectiveness of compression algorithms is the compression ratio: the size of the compressed text relative to the size of the original, often expressed as a percentage. Note: a better compression ratio on one string of text does NOT guarantee that one algorithm is more effective than another. Some forms of compression are tuned to a specific type of data (such as text, music, or images).
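The points above can be made concrete with a small experiment. The sketch below (using Python’s standard zlib module; the helper function name is my own) computes a compression ratio and shows how strongly it depends on the kind of data being compressed:

```python
import os
import zlib

def compression_ratio(data):
    """Compressed size as a percentage of the original size (lower is better)."""
    return 100 * len(zlib.compress(data)) / len(data)

repetitive = b"abcabcabc" * 1000     # highly patterned text: compresses very well
random_bytes = os.urandom(9000)      # random data: almost no patterns to exploit

print(f"repetitive text: {compression_ratio(repetitive):.1f}%")
print(f"random bytes:    {compression_ratio(random_bytes):.1f}%")
```

The repetitive input shrinks to a tiny fraction of its original size, while the random input barely shrinks at all (it can even grow slightly, because the compressed format adds a small header). This is why a single ratio on one sample cannot crown one algorithm “best.”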

Watch the video below to see how two compression algorithms vary and to identify the advantages and disadvantages of these compression algorithm strategies.

https://www.youtube.com/embed/xyKA4arxQ5I

Common Misconception: One compression algorithm is the best.
  • The best algorithm depends on the context and on the patterns of repetition in the data being compressed. Also, lossy compression may be acceptable for some types of data (such as images) but not for others (such as text).

Since no single compression algorithm is best, it’s important that one can analyze and evaluate compression algorithms to identify the optimal strategy for given data in a given context.

Lossy vs. Lossless Compression

Not all data can be compressed! There is a point at which further compression results in a loss of pertinent information; otherwise, one could theoretically compress the contents of all the books in the world onto a single sheet of paper. (On a related note, read this forum post about compressing an already-compressed file.)

For example, JPG images and MP3 files are already compressed media files. There are firm limits to how much information can be compressed without loss, and we use the terms lossless and lossy to distinguish compression algorithms that preserve all of the original information from those that discard some of it.

Lossless means “without loss,” just like hopeless means “without hope.” Lossless compression means that compression has occurred with zero loss of information. On the other hand, lossy compression indicates that there has been some data lost through compression. Different lossy compression algorithms can result in different amounts of lost data. Let’s consider an example of lossy compression.

One of the most common examples of lossy compression occurs when you “rip” an audio CD and convert the tracks to MP3s. The images below represent the digital sound wave at various stages of compression, from raw audio to MP3. As the audio becomes more compressed, what differences do you notice?
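A toy model of what lossy audio compression does is quantization: storing each sample at a coarser precision than the original. Real MP3 encoding is far more sophisticated, but this sketch (the function is a hypothetical illustration) shows the essential trade-off:

```python
def quantize(samples, step):
    """Lossy 'compression': snap each sample to the nearest multiple of step.

    Coarser steps need fewer bits to store but discard fine detail permanently.
    """
    return [step * round(s / step) for s in samples]

original = [0.12, 0.47, 0.51, 0.88]   # fine-grained sample values
coarse = quantize(original, 0.25)     # snapped to a coarse 0.25 grid

print(coarse)
print(coarse == original)  # False: the fine detail is gone for good
```

No decoder can recover 0.12 from the stored 0.0; the information simply no longer exists in the compressed data. That irreversibility is exactly what makes this compression lossy.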

Perhaps it’s easier to see quality loss in lossy compression with compressed images:

As the image is compressed further, the loss of data, and therefore of quality, becomes obvious.

Though these examples are indicative of lossy compression, not all compression is lossy. As mentioned previously, lossless compression involves no data loss: you can convert the compressed file back to the original at any time (though doing so may be time-intensive). For example, FLAC audio loses no quality relative to the original CD.

This raises the question:

If lossless compression exists, why would anybody want to use lossy compression?

Lossy compression algorithms often achieve much greater reductions in file size—that is, better compression ratios—and are designed to produce “good enough” approximations. A good example is a website’s use of a thumbnail as a link to a larger image. The compressed thumbnail reduces the amount of data needed to load the page, but users who want to see the original image can follow a link to the full version.
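The thumbnail idea can be sketched in miniature. Treating one row of an image as a list of pixel brightness values, downsampling averages neighboring pixels into one, halving the data while keeping a recognizable approximation (the function name and data are illustrative):

```python
def downsample(pixels, factor):
    """Lossy thumbnail: average each block of `factor` pixels into one value."""
    return [sum(pixels[i:i + factor]) // factor
            for i in range(0, len(pixels), factor)]

row = [10, 12, 200, 198, 40, 44, 90, 86]   # one row of pixel brightness values
thumb = downsample(row, 2)

print(thumb)  # half the data: [11, 199, 42, 88]
```

The thumbnail preserves the broad light/dark structure of the row with half the data, which is exactly the “good enough” approximation a preview needs, while the full-resolution original stays available behind the link.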

However, lossy compression can never be “undone,” because lossy algorithms discard information judged unnecessary for a good-enough representation, and that information can never be reconstructed once it is lost. In other words, you cannot go from a lossy-compressed image back to the original image.