Today's world is an online world. People online want to share their experiences with others through images and videos, but none of this would be possible without data compression techniques. Compression is needed because transmission bandwidth is limited and has to be shared so that everyone gets fair access to the World Wide Web. Web administrators also look to cut costs such as storage capacity, so that they can hold more for less.
So how do we compress data? The aim is to obtain a more efficient representation of the image while preserving the essential information it contains. This is done by exploiting what we know about how we perceive our environment. Not all the information in an image is needed to display it convincingly on a computer screen. The human eye is less sensitive to fine color detail than to fine brightness detail, because of the densities of the different receptors in the eye. The eye is also good at seeing small differences in brightness over large areas, but poor at judging the exact strength of high-frequency brightness variations. Put simply, the eye does not register all the color detail in a scene and is more sensitive to brightness than to color. Therefore, by removing duplicate and redundant information, we can compress an image or video. Audio is a little more difficult, because our ears cope with a far wider dynamic range of sounds than our eyes do with pictures.
There are two main types of compression: lossless and lossy. Lossless compression retains all of the data at the expense of lower compression ratios, so the image can be encoded and decoded without any loss of information. Lossy compression, on the other hand, provides higher compression ratios at the cost of some image quality. Despite the loss of information, lossy algorithms can often encode and decode images without any difference from the original that is visible to the human eye.
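To make the difference concrete, here is a small Python sketch. The lossless half uses the standard library's `zlib` module; the lossy half is a toy scheme of my own (the function names and the `step` parameter are made up for illustration, not taken from any real codec) that simply throws away the low-order detail of each byte.

```python
import zlib

def lossless_round_trip(data: bytes) -> bytes:
    """Compress with zlib (DEFLATE) and decompress again."""
    return zlib.decompress(zlib.compress(data))

def lossy_round_trip(data: bytes, step: int = 16) -> bytes:
    """Toy lossy scheme: keep only value // step, then rebuild an approximation."""
    levels = [b // step for b in data]  # the "encoder" discards fine detail
    # The "decoder" can only guess the middle of each discarded range.
    return bytes(min(255, q * step + step // 2) for q in levels)

samples = bytes(range(0, 256, 3))
restored_exactly = lossless_round_trip(samples)
restored_roughly = lossy_round_trip(samples)

# Largest difference between an original sample and its lossy reconstruction.
max_error = max(abs(a - b) for a, b in zip(samples, restored_roughly))
```

The lossless version gives back exactly the bytes that went in, while the lossy version returns something close to the original with an error bounded by half the step size.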
There are five main algorithms used to compress data: Run Length Encoding, Chain Coding, Vector Quantization, Arithmetic Coding and Predictive Coding. I shall not go into how each of these works, but if any of you need an explanation, leave a comment and I'll be happy to explain it to you.
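As a quick taste of the simplest of these, here is a minimal Python sketch of Run Length Encoding. It replaces each run of identical symbols with a (symbol, count) pair. The function names are my own, and real formats pack the pairs into bytes rather than keeping a Python list.

```python
from itertools import groupby

def rle_encode(data):
    """Collapse each run of identical symbols into a (symbol, run_length) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(data)]

def rle_decode(pairs):
    """Expand (symbol, run_length) pairs back into the original string."""
    return "".join(symbol * count for symbol, count in pairs)

encoded = rle_encode("WWWWBBBWWC")
# encoded is [('W', 4), ('B', 3), ('W', 2), ('C', 1)]
decoded = rle_decode(encoded)
```

Run Length Encoding is lossless, and it only pays off when the data actually contains long runs, which is why it works well on things like simple graphics and poorly on noisy photographs.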
In terms of compression, the choice of algorithm depends on three factors: compression efficiency, compression complexity and distortion.
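Two of these factors are easy to put numbers on. The Python sketch below measures efficiency as a compression ratio and distortion as mean squared error and PSNR (peak signal-to-noise ratio). PSNR is one common choice of distortion measure rather than the only one, and the function names are my own for illustration.

```python
import math

def compression_ratio(original_size, compressed_size):
    """How many times smaller the compressed data is, e.g. 20.0 means 20 to 1."""
    return original_size / compressed_size

def mean_squared_error(original, reconstructed):
    """Average squared difference between corresponding samples."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)

def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in decibels; higher means less distortion."""
    error = mean_squared_error(original, reconstructed)
    if error == 0:
        return math.inf  # identical signals, as with lossless compression
    return 10 * math.log10(peak ** 2 / error)
```

Complexity is harder to capture in a single formula; in practice it comes down to how much time and memory the encoder and decoder need.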
A picture is worth a thousand words. But without compression techniques, how would we display the trillions of images on the WWW today?
JPEG (Joint Photographic Experts Group) is the most popular image compression technique. It was designed for grayscale and full-color images of natural, real-world scenes, where more data can be removed, and it can be stored as either a simple (baseline) or a progressive image for web pages. JPEG uses a lossy compression technique that exploits the known limitations of the human eye mentioned above, and it allows a trade-off between quality and size. The main disadvantage of the algorithm is that repeated compression and decompression results in increasing degradation. Grayscale images compress less than color images, as the human eye is more sensitive to brightness variations than to hue variations. Compression ratios of up to 20 to 1 are possible. The algorithm proceeds as follows:
- Translate the image into a suitable color space (typically one that separates brightness from color, such as YCbCr)
- Group the pixel values into 8×8 blocks (experiments suggest this is a good block size)
- Apply a Discrete Cosine Transform (DCT) to each block – I assume you know what that is, if not drop me a comment
- Divide each DCT coefficient in the block by its own quantization coefficient (luminance and color channels use separate tables) and round the result to the nearest integer
- Next, do more encoding using either Huffman or arithmetic coding
- Output the image
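The middle steps of the list above (the DCT and the quantization) can be sketched in plain Python for a single 8×8 block. This is only an illustration under my own assumptions: the quantization table `TABLE` is made up rather than taken from the JPEG standard, the DCT is computed directly from its formula instead of with a fast algorithm, and the color conversion and Huffman/arithmetic coding steps are left out.

```python
import math

N = 8  # JPEG works on 8x8 blocks

# Hypothetical quantization table for illustration only (not from the JPEG
# standard): the step size grows with frequency, so fine detail is stored
# more coarsely than the overall brightness of the block.
TABLE = [[8 + 4 * (u + v) for v in range(N)] for u in range(N)]

def _scale(k):
    """Normalization factor that makes the transform orthonormal."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def _basis(position, frequency):
    """Cosine basis value for one sample position and one frequency."""
    return math.cos((2 * position + 1) * frequency * math.pi / (2 * N))

def dct_2d(block):
    """Forward 2-D DCT (type II) of an 8x8 block, computed directly."""
    return [[_scale(u) * _scale(v)
             * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                   for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct_2d(coeffs):
    """Inverse 2-D DCT: rebuild the 8x8 block from its coefficients."""
    return [[sum(_scale(u) * _scale(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

def quantize(coeffs, table):
    """Divide each coefficient by its table entry and round (the lossy step)."""
    return [[round(coeffs[u][v] / table[u][v]) for v in range(N)] for u in range(N)]

def dequantize(levels, table):
    """Multiply the rounded levels back up to approximate coefficients."""
    return [[levels[u][v] * table[u][v] for v in range(N)] for u in range(N)]

# A smooth gradient block, shifted so that values are centered around zero.
pixels = [[8 * x + 4 * y + 60 for y in range(N)] for x in range(N)]
shifted = [[p - 128 for p in row] for row in pixels]

coeffs = dct_2d(shifted)
levels = quantize(coeffs, TABLE)
restored = idct_2d(dequantize(levels, TABLE))

# Count how many quantized values are zero and therefore cheap to encode.
zero_count = sum(level == 0 for row in levels for level in row)
```

For a smooth block like this one, most of the quantized values come out as zero, and that is exactly what the final Huffman or arithmetic coding step takes advantage of.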
MPEG (Moving Picture Experts Group) is one of the best compression techniques for video and draws its inspiration from JPEG. It is used to represent video and audio signals, exploits more perceptual redundancies, and allows compression ratios of up to 30 to 1 for video; audio compresses less, for the reason explained above. The algorithm predicts the motion from frame to frame in the temporal direction, and then uses discrete cosine transforms to organize the redundancy in the spatial directions. The main steps are:
- Convert the image to YUV space
- Apply the discrete cosine transforms (DCTs) to 8×8 blocks and use the luminance channel to predict the motion
- Quantize the DCT coefficients
- Encode the DCT coefficients and parameters using Huffman/arithmetic coding
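The motion prediction part is the piece that JPEG does not have, so here is a rough Python sketch of it. It assumes an exhaustive block-matching search that scores candidates by the sum of absolute differences (SAD) on luminance values. The function names, the tiny 4×4 block and the search range are my own choices for illustration, and real encoders use more elaborate search strategies.

```python
def sad(prev, cur, px, py, cx, cy, size):
    """Sum of absolute differences between a block in cur and a block in prev."""
    return sum(abs(cur[cy + j][cx + i] - prev[py + j][px + i])
               for j in range(size) for i in range(size))

def find_motion_vector(prev, cur, cx, cy, size=4, search=2):
    """Find where the block at (cx, cy) in cur best matches the previous frame.

    Returns (dx, dy, cost): the offset from the block's current position to
    its best match in prev, and the SAD of that match.
    """
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            px, py = cx + dx, cy + dy
            # Skip candidates that would fall outside the previous frame.
            if px < 0 or py < 0 or px + size > width or py + size > height:
                continue
            cost = sad(prev, cur, px, py, cx, cy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2], best[0]

def make_frame(left, top, width=12, height=12, size=4):
    """Build a black luminance frame with one textured square at (left, top)."""
    frame = [[0] * width for _ in range(height)]
    for j in range(size):
        for i in range(size):
            frame[top + j][left + i] = 10 * (j + 1) + i + 1
    return frame

# The square moves 2 pixels right and 1 pixel down between the two frames.
prev_frame = make_frame(left=3, top=4)
cur_frame = make_frame(left=5, top=5)
vector = find_motion_vector(prev_frame, cur_frame, cx=5, cy=5)
```

Once a good match is found, the encoder only needs to store the motion vector plus the (hopefully small) difference between the two blocks, and that difference is what goes through the DCT and quantization steps.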
There are many other compression techniques out there, but I prefer to narrow it down to the mainstream ones. For any other questions or more information related to this topic, please do not hesitate to drop me a comment.