Data Compression and Decompression Algorithms


INTRODUCTION

Data compression is a common requirement in most computer applications. Many compression algorithms exist, each dedicated to particular data formats, and even for a single data type there are several algorithms that take different approaches. This paper examines lossless data compression algorithms.

1. DATA COMPRESSION: In computer science, data compression involves encoding information using fewer bits than the original representation. Compression is useful because it reduces the consumption of resources such as storage space or transmission capacity. Because compressed data must be decompressed before it can be used, this extra processing imposes computational or other costs.

1.1 Classification of Compression:

  • a) Static/non-adaptive compression: A static method is one in which the mapping from the set of messages to the set of codewords is fixed before transmission begins, so that a given message is represented by the same codeword every time it appears in the message ensemble. The classic static defined-word scheme is Huffman coding.
  • b) Dynamic/adaptive compression: A code is dynamic if the mapping from the set of messages to the set of codewords changes over time.

1.2 Data Compression Methods:

  • a) Lossless Compression: Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. It is possible because most real-world data has statistical redundancy. For example, an image may have areas of color that do not change over several pixels; instead of coding “red pixel, red pixel, …”, the data may be encoded as “279 red pixels”. Lossless compression is used where it is important that the original and the decompressed data be identical, or where deviations from the original data could be deleterious. Typical examples are executable programs, text documents, and source code. Some image file formats, like PNG or GIF, use only lossless compression.
  • b) Lossy Compression: In information technology, lossy compression is a data encoding method that compresses data by discarding (losing) some of it. The procedure aims to minimize the amount of data that needs to be held, handled, and/or transmitted by a computer. Lossy compression is most commonly used to compress multimedia data (audio, video, and still images), especially in applications such as streaming media and internet telephony. In a photo of a sunset over the sea, for example, there are going to be groups of pixels with the same color value, which can be reduced. Lossy algorithms tend to be more complex; as a result, they achieve better results for bitmaps and can tolerate the loss of data. The compressed file is an approximation of the original data. One disadvantage of lossy compression is that if a compressed file is compressed again and again, its quality degrades drastically.

3. Lossless Compression Algorithms: Run-Length Encoding (RLE): RLE is a lossless algorithm that offers decent compression ratios only for specific types of data. How RLE works: RLE is probably the easiest compression algorithm. It replaces sequences of the same data value within a file with a count and a single value. For example, the string WWWWWWBBBWW can be stored as 6W3B2W. There are many different run-length encoding schemes; this example only demonstrates the basic principle. Sometimes the implementation of RLE is adapted to the type of data being compressed.
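
The count-and-value idea can be sketched in a few lines of Python. This is only one of many possible RLE schemes; the function names rle_encode and rle_decode and the (count, value) pair representation are choices made for this illustration, not part of any particular file format.

```python
from itertools import groupby


def rle_encode(data: str) -> list[tuple[int, str]]:
    """Collapse each run of identical characters into a (count, value) pair."""
    return [(len(list(run)), value) for value, run in groupby(data)]


def rle_decode(pairs: list[tuple[int, str]]) -> str:
    """Expand (count, value) pairs back into the original string."""
    return "".join(value * count for count, value in pairs)


sample = "WWWWWWBBBWW"
encoded = rle_encode(sample)
print(encoded)  # [(6, 'W'), (3, 'B'), (2, 'W')]
print(rle_decode(encoded) == sample)  # True
```

Note that data without long runs grows under this scheme, since every single character still costs one pair.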

4. Complexity and Data Compression: We usually measure the complexity of an algorithm in terms of time and try to find the fastest implementation, as with search algorithms. Here it is less important to compress data quickly than to compress it as much as possible, so that the output is as small as possible without losing data. A great feature of run-length encoding is that it is easy to implement.

5. Advantages and disadvantages: This algorithm is very easy to implement and does not require much CPU horsepower. RLE compression is only efficient with files that contain lots of repetitive data. These can be text files if they contain lots of spaces for indenting, but line-art images that contain large white or black areas are far more suitable. Computer-generated color images (e.g. architectural drawings) can also give fair compression ratios. Where is RLE compression used? RLE compression can be used in the following file formats: PDF files

6. HUFFMAN CODING: Huffman coding is a popular method for compressing data with variable-length codes. Given a set of data symbols (an alphabet) and their frequencies of occurrence (or, equivalently, their probabilities), the method constructs a set of variable-length codewords with the shortest average length and assigns them to the symbols. Huffman coding serves as the basis for several applications implemented on popular platforms. Some programs use just the Huffman method, while others use it as one step in a multistep compression process.
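
To make the idea of frequency-driven, variable-length codewords concrete, here is a minimal Python sketch. It repeatedly merges the two least frequent entries using a priority queue, which is one common way to build a Huffman code. The function name huffman_codes, the use of character counts in place of probabilities, and the choice of "0" for the first merged branch are assumptions of this sketch.

```python
import heapq
from collections import Counter


def huffman_codes(text: str) -> dict[str, str]:
    """Build a prefix code from the symbol frequencies found in `text`."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit codeword.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: partial codeword}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie_breaker = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # smallest weight
        w2, _, right = heapq.heappop(heap)  # second smallest weight
        # Merging two subtrees prepends one bit to every codeword in them.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie_breaker, merged))
        tie_breaker += 1
    return heap[0][2]


codes = huffman_codes("aaaabbc")
encoded = "".join(codes[ch] for ch in "aaaabbc")
print(codes, len(encoded))
```

The most frequent symbol ends up with the shortest codeword, so the encoded length is smaller than with a fixed-length code over the same alphabet.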

7. Huffman Encoding: The Huffman encoding algorithm starts by constructing a list of all the alphabet symbols in descending order of their probabilities. It then constructs, from the bottom up, a binary tree with a symbol at every leaf. This is done in steps, where at each step the two symbols with the smallest probabilities are selected, added to the top of the partial tree, deleted from the list, and replaced with an auxiliary symbol representing the two original symbols. When the list is reduced to just one auxiliary symbol (representing the entire alphabet), the tree is complete. The tree is then traversed to determine the codewords of the symbols.

8. LZ78 Compression:

BCA is in the Dictionary. BCAA is not in the Dictionary; insert it.

B is in the Dictionary. BC is in the Dictionary. BCA is in the Dictionary. BCAA is in the Dictionary. BCAAB is not in the Dictionary; insert it.

LZ78 Compression: Number of bits transmitted

Uncompressed String: ABBCBCABABCAABCAAB

Number of bits = Total number of characters * 8 = 18 * 8 = 144 bits

Suppose the codewords are indexed starting from 1:

Compressed string (codewords):

Codeword:        (0, A)  (0, B)  (2, C)  (3, A)  (2, A)  (4, A)  (6, B)
Codeword index:    1       2       3       4       5       6       7

Each code word consists of an integer and a character:

The character is represented by 8 bits. The number of bits n required to represent the integer part of the codeword with index i is given by:

n = 1 if i = 1
n = ⌈log2 i⌉ if i > 1

Codeword:  (0, A)  (0, B)  (2, C)  (3, A)  (2, A)  (4, A)  (6, B)
Index:       1       2       3       4       5       6       7
Bits:      (1 + 8) (1 + 8) (2 + 8) (2 + 8) (3 + 8) (3 + 8) (3 + 8)

Total: (1 + 8) + (1 + 8) + (2 + 8) + (2 + 8) + (3 + 8) + (3 + 8) + (3 + 8) = 71 bits

The actual compressed message is: 0A0B10C11A010A100A110B, where each character stands for its 8-bit code.
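
The worked example can be reproduced with a short Python sketch. The function names (lz78_encode, index_bits, total_bits, as_bit_string) are made up for this illustration. The bit-width rule is inferred from the bit counts in the example: 1 bit for the first codeword and ⌈log2 i⌉ bits for codeword i after that. The handling of input that ends inside a known phrase is also an assumption, since the example does not cover that case.

```python
def lz78_encode(text: str) -> list[tuple[int, str]]:
    """Return LZ78 codewords as (dictionary index, next character) pairs."""
    dictionary: dict[str, int] = {}  # phrase -> index, indices start at 1
    output = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch  # keep extending the current match
        else:
            # Index 0 stands for the empty phrase.
            output.append((dictionary.get(phrase, 0), ch))
            dictionary[phrase + ch] = len(dictionary) + 1
            phrase = ""
    if phrase:
        # Assumption: if the input ends inside a known phrase, emit its
        # prefix index together with its last character.
        output.append((dictionary.get(phrase[:-1], 0), phrase[-1]))
    return output


def index_bits(i: int) -> int:
    """Bits for the integer part of codeword i: 1 if i == 1, else ceil(log2 i)."""
    return max(1, (i - 1).bit_length())


def total_bits(codewords: list[tuple[int, str]], char_bits: int = 8) -> int:
    return sum(index_bits(i) + char_bits for i, _ in enumerate(codewords, start=1))


def as_bit_string(codewords: list[tuple[int, str]]) -> str:
    """Write each index in binary with its required width, followed by the character."""
    return "".join(
        format(index, f"0{index_bits(i)}b") + ch
        for i, (index, ch) in enumerate(codewords, start=1)
    )


codewords = lz78_encode("ABBCBCABABCAABCAAB")
print(codewords)
print(total_bits(codewords))     # 71
print(as_bit_string(codewords))  # 0A0B10C11A010A100A110B
```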

9. Decompression Algorithm: Dictionary empty

Published by Terry Welch in 1984, LZW basically applies the LZSS principle of not explicitly transmitting the next nonmatching symbol to the LZ78 algorithm. The only remaining output of this improved algorithm is fixed-length references to the dictionary (indexes). If the message to be encoded consists of only one character, LZW outputs the code for this character; otherwise, it inserts two- or multi-character, overlapping, distinct patterns of the message to be encoded in a Dictionary. Overlapping: The last character of a pattern is the first character of the next pattern.

10. Algorithm:

Initialize Dictionary with 256 single character strings and their corresponding ASCII codes;
Prefix ← first input character;
CodeWord ← 256;
while(not end of character stream){
    Char ← next input character;
    if(Prefix + Char exists in the Dictionary)
        Prefix ← Prefix + Char;
    else{
        Output: the code for Prefix;
        insertInDictionary( (CodeWord, Prefix + Char) );
        CodeWord++;
        Prefix ← Char;
    }
}
Output: the code for Prefix;

Example: Compression using LZW

Encode the string BABAABAAA by the LZW encoding algorithm.

1. BA is not in the Dictionary; insert BA, output the code for its prefix: code(B)
2. AB is not in the Dictionary; insert AB, output the code for its prefix: code(A)
3. BA is in the Dictionary. BAA is not in the Dictionary; insert BAA, output the code for its prefix: code(BA)
4. AB is in the Dictionary. ABA is not in the Dictionary; insert ABA, output the code for its prefix: code(AB)
5. AA is not in the Dictionary; insert AA, output the code for its prefix: code(A)
6. AA is in the Dictionary and it is the last pattern; output its code: code(AA)

The compressed message is: <66><65><256><257><65><260>

LZW: Number of bits transmitted
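
As a rough illustration, the encoding algorithm can be written in Python as follows. The names lzw_encode and transmitted_bits are invented for this sketch. The source does not state how wide each fixed-length code is, so the width is left as a parameter; the 12 bits in the usage line is purely an assumption.

```python
def lzw_encode(text: str) -> list[int]:
    """Encode `text` as a list of LZW dictionary codes."""
    if not text:
        return []
    # Start with the 256 single-character strings and their codes.
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    prefix = text[0]
    codes = []
    for ch in text[1:]:
        if prefix + ch in dictionary:
            prefix += ch  # keep growing the current pattern
        else:
            codes.append(dictionary[prefix])
            dictionary[prefix + ch] = next_code
            next_code += 1
            prefix = ch  # patterns overlap by this one character
    codes.append(dictionary[prefix])
    return codes


def transmitted_bits(codes: list[int], code_width: int) -> int:
    """Bits sent when every code uses the same fixed width (caller's choice)."""
    return len(codes) * code_width


codes = lzw_encode("BABAABAAA")
print(codes)  # [66, 65, 256, 257, 65, 260]
print(transmitted_bits(codes, code_width=12))  # assumed width, not from the source
```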

11. Decoding algorithm:

Initialize Dictionary with 256 ASCII codes and corresponding single character strings as their translations;
PreviousCodeWord ← first input code;
Output: string(PreviousCodeWord);
Char ← character(first input code);
CodeWord ← 256;
while(not end of code stream){
    CurrentCodeWord ← next input code;
    if(CurrentCodeWord exists in the Dictionary)
        String ← string(CurrentCodeWord);
    else
        String ← string(PreviousCodeWord) + Char;
    Output: String;
    Char ← first character of String;
    insertInDictionary( (CodeWord, string(PreviousCodeWord) + Char) );
    PreviousCodeWord ← CurrentCodeWord;
    CodeWord++;
}

Summary of LZW decoding algorithm:

output: string(first CodeWord);
while(there are more CodeWords){
    if(CurrentCodeWord is in the Dictionary)
        output: string(CurrentCodeWord);
    else
        output: PreviousOutput + PreviousOutput first character;
    insert in the Dictionary: PreviousOutput + CurrentOutput first character;
}

Example: LZW Decompression

Use LZW to decompress the output sequence <66> <65> <256> <257> <65> <260>

1. 66 is in Dictionary; output string(66) i.e. B
2. 65 is in Dictionary; output string(65) i.e. A, insert BA
3. 256 is in Dictionary; output string(256) i.e. BA, insert AB
4. 257 is in Dictionary; output string(257) i.e. AB, insert BAA
5. 65 is in Dictionary; output string(65) i.e. A, insert ABA
6. 260 is not in Dictionary; output previous output + previous output first character: AA, insert AA
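
The decoding steps above can be sketched in Python as well. The function name lzw_decode is made up for this illustration, and the sketch assumes the code stream was produced by an encoder that starts from the same 256-entry dictionary.

```python
def lzw_decode(codes: list[int]) -> str:
    """Rebuild the original text from a list of LZW codes."""
    if not codes:
        return ""
    # Start with the 256 codes and their single-character strings.
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:
            # The code is not defined yet, so it must stand for the
            # previous output plus the previous output's first character.
            entry = previous + previous[0]
        output.append(entry)
        # Insert: previous output + first character of the current output.
        dictionary[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(output)


print(lzw_decode([66, 65, 256, 257, 65, 260]))  # BABAABAAA
```

The else branch is what handles step 6 of the example, where code 260 arrives before the decoder has inserted it.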

