CS 129 Project: Image Alignment Writeup

Name: Yan Li

Goal of Project:

The goal of this project is to align three images, each representing a different color channel of one real scene, to form a single color image. The collection is from a Russian photographer, Prokudin-Gorsky. He recorded three exposures of every scene, using red, green, and blue filters. He reproduced the photo by combining these three images, and some shift of the individual images is needed because the three images do not overlap exactly.

My job is to simulate this process: align the three channel images and reproduce a color image. If the three images are simply stacked on top of one another without alignment, the result is blurry. Therefore, alignment is essential in this project.

Algorithm:

The algorithm uses an image matching metric such as SSD (sum of squared differences) or NCC (normalized cross-correlation) to find the best shift between two channel images. In this project, I first align the green channel against the blue channel, and then align the red channel against the blue channel. The blue channel image is therefore the base, with no shift. A Gaussian pyramid is needed in this project. Some of the images are in TIFF format and are generally larger than 3000*3000 pixels. A Gaussian pyramid speeds up the processing to a large extent because it is a “coarse to fine” strategy: the wide search over displacements is done only on the small, low-resolution image, and the larger images only need a small refinement.

The steps of the algorithm in my project are as follows:

First, align the green channel to the blue channel:

1 Build a Gaussian pyramid of the image. The maximum level depends on the size of the image itself. If the highest-level image is too small and blurry, the result will be very bad, so I limit the highest-level image to be no smaller than 128*128 pixels.

2 Starting from the top of the pyramid, crop 20% of the pixels from the border, then apply the matching metric (SSD or NCC) over a displacement range of [-10,10], and find the best shift vector.

3 Move to the next lower level of the pyramid. Double the shift vector from the previous step, because this level has twice the resolution of the level above it, and shift the image by it. Then crop the image again, shrink the range of displacement, and find the best shift vector again.

4 Repeat step 3 until the lowest level of the pyramid is reached. This gives the final shift vector.

Second, align the red channel to the blue channel:

The process is the same as above.
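
The coarse-to-fine steps above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the exact code of the project: it assumes NumPy, uses SSD only, approximates the Gaussian blur with a small [1, 2, 1]/4 kernel, crops 20% from each side, wraps pixels around with np.roll when shifting, and uses a refinement radius of 2 on the lower levels. The function names (downsample, crop_border, best_shift, align) and the parameter names are hypothetical.

```python
import numpy as np


def downsample(img):
    """Blur with a separable [1, 2, 1] / 4 kernel, then keep every second pixel."""
    p = np.pad(img, 1, mode="edge")
    rows = (p[:-2, :] + 2.0 * p[1:-1, :] + p[2:, :]) / 4.0
    blurred = (rows[:, :-2] + 2.0 * rows[:, 1:-1] + rows[:, 2:]) / 4.0
    return blurred[::2, ::2]


def crop_border(img, frac=0.2):
    """Drop a fraction of the pixels on every side (assumed meaning of the 20% crop)."""
    h, w = img.shape
    dy, dx = int(h * frac), int(w * frac)
    return img[dy:h - dy, dx:w - dx]


def best_shift(moving, base, center, radius):
    """Search shifts within `radius` of `center`; return the one with lowest SSD."""
    target = crop_border(base)
    best, best_score = center, np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            diff = crop_border(shifted) - target
            score = float(np.sum(diff * diff))
            if score < best_score:
                best, best_score = (dy, dx), score
    return best


def align(moving, base, min_size=128, top_radius=10, refine_radius=2):
    """Return the (dy, dx) shift that aligns `moving` to `base`."""
    # Step 1: build the pyramid, stopping before the top gets smaller than min_size.
    pyramid = [(moving.astype(np.float64), base.astype(np.float64))]
    while min(pyramid[-1][0].shape) // 2 >= min_size:
        m, b = pyramid[-1]
        pyramid.append((downsample(m), downsample(b)))

    # Steps 2-4: wide search at the top, then double the shift and refine.
    shift, radius = (0, 0), top_radius
    for m, b in reversed(pyramid):
        shift = best_shift(m, b, (2 * shift[0], 2 * shift[1]), radius)
        radius = refine_radius
    return shift
```

With this sketch, align(green, blue) and align(red, blue) would give the two shift vectors, and np.roll with each vector would bring that channel onto the blue base.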

SSD and NCC:

I used both matching metrics to do the alignment. It turns out SSD is faster, because NCC must compute the covariance of the two images and their standard deviations, which is time-consuming. Generally speaking, both methods work very well, except on one image:

(Using SSD) (Using NCC)

This is a special image because the region that is most useful for alignment is the person in the middle, specifically his clothing. But his clothing is strongly colored, so its brightness differs drastically between the three channel images, which makes it hard to align. NCC generates a better result because it measures the correlation between the two images, which ranges from -1.0 to 1.0, rather than the raw intensity difference. Because NCC subtracts each image's mean and divides by its standard deviation, the value can still be high when the two channels differ in brightness and contrast, so finding the highest NCC value finds the shift that best preserves the shared pattern of light and dark.
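
To make the difference between the two metrics concrete, here is a minimal sketch of both in Python, assuming the channels are NumPy arrays. The function names ssd and ncc are illustrative only, and this is not the exact code used in the project.

```python
import numpy as np


def ssd(a, b):
    """Sum of squared differences: lower means more similar."""
    d = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(d * d))


def ncc(a, b):
    """Normalized cross-correlation in [-1, 1]: higher means more similar."""
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    if denom == 0.0:
        # A constant image has no pattern to correlate with.
        return 0.0
    return float(np.sum(a * b) / denom)


# A channel that is brighter and has more contrast, but the same pattern.
a = np.arange(16.0).reshape(4, 4)
b = 3.0 * a + 10.0
print(ssd(a, b))  # large, although the pattern is identical
print(ncc(a, b))  # very close to 1.0
```

SSD penalizes the brightness and contrast change itself, while NCC only looks at how the two patterns vary together, which is why it copes better with a region whose intensity differs strongly between channels.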

Extra Credits:

Better Feature:

Instead of comparing color intensities, I use Canny edge detection to detect the edges in each image, and align the two images based on edge similarity. The results turn out to be good, but the speed is a bit lower, because edge detection is a time-consuming job. For the matching metric, there is no need to compute NCC or SSD: since an edge image only has pixel values of 1.0 or 0.0, all I need to do is compute the difference between pixels directly.
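
The idea of matching on edges can be sketched as follows. Note the assumptions: to keep the example self-contained with NumPy only, a thresholded gradient magnitude stands in for the Canny detector, shifting wraps around with np.roll, and the names edge_map, edge_mismatch, and align_edges are hypothetical. It is a sketch of the idea, not the project code.

```python
import numpy as np


def edge_map(img, thresh=0.2):
    """Binary edge map from thresholded gradient magnitude (stand-in for Canny)."""
    gy, gx = np.gradient(img.astype(np.float64))
    return np.hypot(gx, gy) > thresh


def edge_mismatch(e1, e2):
    """Number of pixels where two binary edge maps disagree."""
    return int(np.count_nonzero(e1 != e2))


def align_edges(moving, base, radius=10):
    """Return the (dy, dx) shift of `moving` whose edges best match `base`."""
    em, eb = edge_map(moving), edge_map(base)
    best, best_score = (0, 0), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            score = edge_mismatch(np.roll(em, (dy, dx), axis=(0, 1)), eb)
            if best_score is None or score < best_score:
                best, best_score = (dy, dx), score
    return best


# A bright square in one channel, and a shifted, dark square on a bright
# background in the other: the intensities disagree but the edges agree.
base = np.zeros((40, 40))
base[10:25, 12:30] = 1.0
moving = 1.0 - 0.8 * np.roll(base, (-3, 2), axis=(0, 1))
print(align_edges(moving, base))
```

Because only the edge positions are compared, the reversed contrast between the two test images does not affect the match.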

Automatic Cropping:

I use the Hough transform to detect the straight lines near the border. But noise can generate false straight lines (e.g. if the edge pixels near the border are overly dense, some false lines will be detected, which disturbs the final result). Furthermore, the border of the image is not perfectly straight, and some weak border lines are lost during edge detection. Therefore I cannot guarantee that every image is cropped properly. Here are some of the final results:

Other Images: