Image Processing – JPEG

A Digital Image

Spatial Resolution/Sampling: Number of pixels along the x and y axes.
Quantization: Number of bits used to represent one pixel. Generally, 8 bits are used to store a pixel, giving $2^8 = 256$ levels. 0 indicates the lowest intensity and 255 the highest.

Example: A 1000 x 1000 image with each pixel encoded using 8 bits requires 1000 x 1000 x 8 bits = 8,000,000 bits = 8 Mb = 1 MB.
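
The arithmetic in the example above can be checked with a few lines of Python. This is only a sketch; the function name `raw_image_size_bits` is made up for illustration.

```python
def raw_image_size_bits(width: int, height: int, bits_per_pixel: int = 8) -> int:
    """Number of bits needed to store an uncompressed image."""
    return width * height * bits_per_pixel


bits = raw_image_size_bits(1000, 1000, 8)
size_bytes = bits // 8   # 8 bits per byte
levels = 2 ** 8          # distinct intensity levels for an 8-bit pixel

print(bits, size_bytes, levels)  # 8000000 1000000 256
```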

To conserve space, we compress the image using a codec. One of the best-known codecs for image compression is JPEG.

JPEG

[Figure: JPEG codec]

1. DCT – Discrete Cosine Transform

The DCT is a change of basis from the standard (pixel) basis to the DCT basis. Equivalently, each DCT co-efficient can be seen as the correlation between the input image and the corresponding DCT basis image.
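
The correlation view can be sketched directly: each co-efficient is the inner product of an 8 x 8 block with one basis image. The code below is a minimal, unoptimized illustration using the orthonormal DCT-II. The helper names `dct_basis` and `dct2` are made up here, and the level shift that JPEG applies before the transform is left out.

```python
import math

N = 8  # JPEG applies the DCT to 8 x 8 blocks


def dct_basis(u: int, v: int) -> list[list[float]]:
    """Return the (u, v) orthonormal DCT-II basis image of size N x N."""
    def alpha(k: int) -> float:
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    return [
        [
            alpha(u) * alpha(v)
            * math.cos((2 * x + 1) * u * math.pi / (2 * N))
            * math.cos((2 * y + 1) * v * math.pi / (2 * N))
            for y in range(N)
        ]
        for x in range(N)
    ]


def dct2(block: list[list[float]]) -> list[list[float]]:
    """Correlate the block with every basis image to get its DCT co-efficients."""
    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            basis = dct_basis(u, v)
            coeffs[u][v] = sum(
                block[x][y] * basis[x][y] for x in range(N) for y in range(N)
            )
    return coeffs


# A completely flat block has energy only in the (0, 0) co-efficient.
flat = [[100] * N for _ in range(N)]
coeffs = dct2(flat)
print(round(coeffs[0][0], 6))  # 800.0
```

For a flat block every co-efficient other than (0, 0) comes out as zero up to floating-point error, which is the sense in which smooth image regions concentrate in the low frequencies.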

2. Quantization

The encoder used in JPEG to compress the input image is Huffman coding. For Huffman coding to work effectively, we need to map groups of values to a single value, so that a few values occur very often. For example, 0 – 9 mapped to 0, 10 – 19 mapped to 10 etc.

This is achieved by quantization. We care mostly about the co-efficients at the lower frequencies and not so much about those at the higher frequencies. Hence, the quantization is fine for the low-frequency co-efficients at the beginning and becomes drastically coarser for the high-frequency co-efficients at the end.

Quantization Example:

Example 1
$x_{quantized} = \left\lfloor \frac{x}{2} \right\rfloor \times 2$

For values of $x$ from 0 to 1: $x_{quantized} = 0$. For values of $x$ from 2 to 3: $x_{quantized} = 2$, etc.

Example 2
$x_{quantized} = \left\lfloor \frac{x}{10} \right\rfloor \times 10$

For values of $x$ from 0 to 9: $x_{quantized} = 0$. For values of $x$ from 10 to 19: $x_{quantized} = 10$, etc.
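
Both examples are the same operation with a different step size. A minimal sketch (the function name `quantize` is made up for illustration):

```python
def quantize(x: int, step: int) -> int:
    """Map x to the lower edge of its bin: floor(x / step) * step."""
    return (x // step) * step


print([quantize(x, 2) for x in range(6)])             # [0, 0, 2, 2, 4, 4]
print([quantize(x, 10) for x in (0, 9, 10, 19, 20)])  # [0, 0, 10, 10, 20]
```

A larger step merges more input values into one output value, which is what makes the later entropy coding effective.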

JPEG Quantization Matrix

[Figure: quantization matrix]

JPEG processes the DCT co-efficients of each block in a zig-zag fashion. First, it quantizes the co-efficient at (0, 0). Then (0, 1). And later, (1, 0) and so on.

The corresponding values in the quantization matrix (in zig-zag order) are 52, 55, 63, 61, 59, 62 and so on. Using these values, you quantize the DCT co-efficients. Once the remaining quantized co-efficients are all zero, the encoder just notes the position where the zeros begin; all co-efficients past this element are taken to be zero as well (and hence ignored).
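
The zig-zag scan and the dropping of trailing zeros can be sketched as follows. This is an illustration under stated assumptions, not the exact JPEG bitstream logic: the function names are made up, the 4 x 4 co-efficients and quantization matrix are invented toy values (JPEG uses 8 x 8 blocks), and the division uses `round`, whereas the earlier examples use a floor.

```python
def zigzag_order(n: int = 8) -> list[tuple[int, int]]:
    """Return (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:          # even diagonals are walked bottom-left to top-right
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order


def encode_block(coeffs, q_matrix):
    """Quantize in zig-zag order and drop the trailing run of zeros."""
    scanned = [
        round(coeffs[r][c] / q_matrix[r][c]) for r, c in zigzag_order(len(coeffs))
    ]
    # Index of the last non-zero value; everything after it is implied to be zero.
    last = max((i for i, v in enumerate(scanned) if v != 0), default=-1)
    return scanned[: last + 1]


# Toy 4 x 4 data: the quantization step grows with frequency (row + col).
toy_q = [[4 + 6 * (r + c) for c in range(4)] for r in range(4)]
toy_coeffs = [
    [80, 22, 3, 1],
    [-14, 20, 2, 0],
    [4, 2, 1, 0],
    [1, 0, 0, 0],
]

print(zigzag_order(8)[:6])  # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
print(encode_block(toy_coeffs, toy_q))  # [20, 2, -1, 0, 1]
```

Only 5 of the 16 values need to be stored here. A zero that sits before the last non-zero value is kept, since only the trailing run is implied.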

By scaling the quantization matrix, the quality of the image can be controlled: larger values quantize more coarsely, giving a smaller file and lower quality.
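
One way to map a quality setting to a scale factor is sketched below. The formula is modeled on the convention commonly associated with the libjpeg library; the constants should be treated as an assumption, since they are not given in this post. The function name and the 2 x 2 base matrix are made up for illustration.

```python
def scale_quant_matrix(base, quality: int):
    """Scale a base quantization matrix for a quality setting in 1..100."""
    quality = max(1, min(100, quality))
    # Assumed mapping: quality 50 keeps the base matrix unchanged.
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    # Integer percentage scaling, with each entry kept within 1..255.
    return [[max(1, min(255, (q * scale + 50) // 100)) for q in row] for row in base]


base = [[16, 11], [12, 12]]  # toy values, not a full 8 x 8 matrix

print(scale_quant_matrix(base, 50))  # [[16, 11], [12, 12]]
print(scale_quant_matrix(base, 25))  # [[32, 22], [24, 24]]
print(scale_quant_matrix(base, 90))  # [[3, 2], [2, 2]]
```

Lower quality settings enlarge the quantization steps, so more co-efficients round to zero.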

3. Huffman Encoding

It is based on the principle of a power law. Only a few of the characters in the English language appear more often than all the other characters combined. Similarly, the histogram of the DCT co-efficients shows that a few values appear more often than all other values combined.

By assigning shorter codes to the most frequently occurring values, the total size of the encoded image is reduced.
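
A minimal Huffman coder shows the effect. This is a generic textbook construction, not the table format JPEG actually stores; the function name `huffman_codes` and the sample values are made up for illustration.

```python
import heapq
from collections import Counter


def huffman_codes(values) -> dict:
    """Build a prefix code in which frequent values get shorter bit strings."""
    freq = Counter(values)
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # a single distinct symbol still needs one bit
        return {sym: "0" for sym in heap[0][2]}
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


# Skewed histogram, like quantized DCT co-efficients: mostly zeros.
values = [0] * 50 + [1] * 20 + [-1] * 15 + [2] * 10 + [5] * 5
codes = huffman_codes(values)
total_bits = sum(len(codes[v]) for v in values)

print({sym: len(code) for sym, code in codes.items()})
print(total_bits)  # 195, versus 300 bits with a fixed 3-bit code
```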

The reverse process happens in the decoding stage. Quantization is the one step that cannot be undone exactly, which is why JPEG is lossy.

MPEG works on predictive error quantization. It tries to predict the next pixel value based on the previous pixel values and checks the error. It then encodes this error.
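
The idea of encoding a prediction error can be sketched with the simplest possible predictor, where each value is predicted to equal the previous one. This is an assumption made for illustration: real codecs use more elaborate predictors, and the quantization of the error is left out here. The function names are made up.

```python
def encode_residuals(samples: list[int]) -> list[int]:
    """Predict each sample as the previous one and keep only the error."""
    residuals = []
    previous = 0  # assumed prediction for the first sample
    for s in samples:
        residuals.append(s - previous)
        previous = s
    return residuals


def decode_residuals(residuals: list[int]) -> list[int]:
    """Rebuild the samples by adding each error to the running prediction."""
    samples = []
    previous = 0
    for r in residuals:
        previous += r
        samples.append(previous)
    return samples


row = [100, 102, 103, 103, 101, 98]
errors = encode_residuals(row)

print(errors)                           # [100, 2, 1, 0, -2, -3]
print(decode_residuals(errors) == row)  # True
```

After the first value, the errors are small and cluster around zero, which makes them cheaper to encode than the raw values.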
