Theory

The idea of using image self-similarities to achieve compression is first attributable to Michael Barnsley, leading researcher from Georgia Institute of Technology, who found a way of applying this idea to representation and compression with the mathematics of Iterated Functions Systems (IFS). Michael has published a book called "Fractals Everywhere," which explores uses of IFS in real world.

The basic idea of Fractal Image Compression is to use IFS and find contraction maps such that the original image is the attractor. The more accurate the contraction maps the better the quality of the compressed image. However, it becomes very CPU intense to find the right maps, so the transforms are made fairly simple.

We divide the source image into range blocks then we attempt to find a larger area of the image called the domain block that is as similar as possible to the range block after contraction map. Each contraction map in these IFS is a basic transform. As part of the transform we record the x and y coordinates of domain and range blocks, the scale and offset based on the average pixel, and the symmetry operation. We restrict ourselves to the following common symmetry operations: rotation by 90, 180, 270 degrees; flip about horizontal, vertical, two main diagonals.

Kominek's paper describes how the optimum intensity scaling factor, offset and error can be determined algebraically from the assosiated domain and range blocks.

Coloured images in raw form are stored in three channels assosiated with values of red, green and blue. Previous research has shown that our eye is more sensitive to intensity and colour green, so to take advantage of this knowledge my algorithm converts a raw image to YCbCr format. As shown in the results page the encoding process becomes more efficient.

Implemintation

Most examples on the web that I have seen of fractal image compression involve slow Java applets. This is why I have decided to implement my compression using C++ and make it efficient at encoding and decoding RGB images. Research suggests that the most practical encoder to use is quadtree.

Quadtree encoder uses the idea that if no domain block fits our target range block within a certain threshold then we split the range block into four squares and recursively check them. This recurrence theoretically builds a tree that provides the image compression with optimum quality. The approach also makes number of contraction maps variable, since the recurrence can go to the smallest size, which in my case has been chosen to be two by two pixels. The encoder is applied to each channel separately.

The decoder is a simple loop that takes a middle-grey image and applies each transformation one by one to one channel at a time. The decoder's performance is improved proportianately to transform computing performance.

To improve performance of each transform is to make sure each pixel is looked at only once. This is done by determining which way we need to run through the image at the beginning, so no additional work needs to be done as we run through the image (eg. going from left to right or right to left on the x-axis). However, there is no real need to improve the decoding process because it is already very. On the other hand, the encoder provides sufficient area for improvement. To improve encoding I have down sampled the image way before domain and range blocks are matched. This is much better over computing down sampled block per each iteration.

Currently to find the right symmetry operation we try everyone of them. Future improvements can be done by "guessing" which symmetry operation is the best. This is accomplished by taking the average value of each quadrent in a block and applying transformation to total of only four pixels.

To make implementation easier (due to time constraints) I have assumed that the width and height must be the same and divisible by 32. The input and output images are handled by RAW format. This is a very basic format that stores the raw value for each channel. For example, RGB image would be stored as R_0, G_0, B_0, R_1, G_1, B_1, ..., R_n, G_n, B_n where n is equals to width times height.