Data compression

From Conservapedia

Data compression is the process of encoding information more compactly to reduce redundancy and conserve storage space or transmission time. It is used primarily by computers as they store or transfer information, but it can also be applied to simpler forms of communication or storage, provided the consumer is willing to "decompress" the information before it is meaningful.[1]


As with many slow, repetitive tasks, there are a variety of algorithms used to compress and decompress information. These algorithms can be carried out by hand, but they are almost always performed by computers. Most use some form of the LZ (Lempel-Ziv) adaptive dictionary-based algorithm to compress information. This method involves the creation of an index, which is then referred to each time a specific data fragment recurs.
For example, in Ronald Reagan's first inaugural speech, he spoke the following words, "In this present crisis, government is not the solution to our problem; government is the problem." If this sentence is compressed using the LZ adaptive dictionary-based algorithm, the result would look something like this:

  1. government
  2. is
  3. the
  4. problem
"In this present crisis, 1 2 not 3 solution to our 4; 1 2 3 4."

Now both the compressed quote and the index are saved, and a small amount of space is saved as well. While very little is actually gained in this short example, repeated words compress at a fairly high ratio, which means that a complete speech or even a book of speeches will probably compress very well.[2]
This is not the full extent of compression, however. While a human might not want to take hand-processed compression any further, a computer has no difficulty in finding other patterns within words and indexing them as well. In this single quote, such an attempt would not be very profitable, but when a larger block of information is processed, it becomes much more useful. Portions of words and letter groupings could be compressed, for example.[3]
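The word-indexing scheme described above can be sketched in a few lines of Python. This is a simplified illustration, not a real LZ implementation: actual LZ variants operate on byte sequences and build their dictionary adaptively in a single pass, whereas here the dictionary of repeated words is supplied up front.

```python
# Toy dictionary-based compression: replace repeated words with
# 1-based indexes into a shared dictionary, as in the speech example.

def compress(text, words):
    """Replace each dictionary word with its 1-based index."""
    index = {w: str(i + 1) for i, w in enumerate(words)}
    return " ".join(index.get(tok, tok) for tok in text.split())

def decompress(coded, words):
    """Map each index back to its dictionary word."""
    lookup = {str(i + 1): w for i, w in enumerate(words)}
    return " ".join(lookup.get(tok, tok) for tok in coded.split())

dictionary = ["government", "is", "the", "problem"]
quote = ("government is not the solution to our problem "
         "government is the problem")

coded = compress(quote, dictionary)
print(coded)  # 1 2 not 3 solution to our 4 1 2 3 4
print(decompress(coded, dictionary) == quote)  # True: lossless round-trip
```

Because decompression reproduces the input exactly, this is a lossless scheme; the savings grow as the same dictionary entries are reused more often across a longer text.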

Lossy vs lossless

Although pattern recognition is about the best that can be done when the exact information must be preserved, there are even more aggressive methods of compression, referred to as "lossy" compression. "Lossless" compression must be used when loss is not an option. The previous example is one such case where loss is not acceptable, since the text would become meaningless. However, the "quantity over quality" method (lossy) is useful when a few incorrect or deleted bytes do not really matter. This is more common with video and audio. If a byte is corrupted or lost occasionally while a video is being streamed, the consumer is very unlikely to even notice. The worst that will happen is that one pixel, for a single frame, will show the wrong color, or a single tone for a single instant will not be reproduced properly. When such a massive amount of information is being presented so quickly and for such a short time, the consumer is content, and Internet "bandwidth" is saved.[4][5]
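The contrast between the two approaches can be sketched with a small Python example. The sample values and the one-decimal quantization step are illustrative assumptions, not any particular codec: the "lossy" step deliberately discards precision so that nearby values collapse together, while the run-length step afterwards is lossless with respect to its own input.

```python
# Illustrative contrast between lossy and lossless steps on
# audio-like sample values (made-up data, not a real codec).

samples = [0.12, 0.16, 0.11, 0.80, 0.79, 0.81]

# Lossy step: quantize to one decimal place. Nearby samples collapse
# to the same value; the original detail cannot be recovered.
quantized = [round(s, 1) for s in samples]
print(quantized)             # [0.1, 0.2, 0.1, 0.8, 0.8, 0.8]
print(quantized == samples)  # False: information was discarded

# Lossless step: run-length encode the quantized values. This is
# exactly reversible, so nothing further is lost.
def rle(values):
    """Collapse consecutive repeats into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

print(rle(quantized))  # [[0.1, 1], [0.2, 1], [0.1, 1], [0.8, 3]]
```

For text, only the lossless path is acceptable; for streaming media, the quantization error corresponds to the single wrong pixel or tone the consumer is unlikely to notice.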