CS129 Project #1: Image Alignment with Pyramids

Background
Sergei Mikhailovich Prokudin-Gorskii (1863-1944) devised a way to produce color photographs well before color film existed. He took three exposures of a single scene, covering the camera lens with a different color filter (red, green, and blue) for each. To reproduce the color image, he projected all three exposures on top of one another, covering each projector with the appropriately colored filter.
Objective
The goal of this project was to take three monochrome images and automatically format and align them with each other. After alignment, the images can be placed into the red, green, and blue channels of a single color image. Although each set of three images shows the same scene and was taken almost simultaneously, the exposures do not initially align with each other, and each possesses its own set of border artifacts.
Method
My program aligned the images two at a time by repeatedly shifting one image and computing the sum of squared differences against the other. Since an image is a matrix of pixel brightness values, one matrix can be subtracted from the other. Squaring the difference at each index removes negative numbers, and summing all of these squared values yields a "score." By testing a range of vertical and horizontal shifts, a good alignment can be found by identifying the shift with the minimum sum of squared differences: a small score generally means that the brightness patterns of the two images line up. Because each image has its own border, the program computes the sum of squared differences over a window centered on the interior of each image, so that the mismatched borders do not skew the score.
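As a rough illustration of this search (a Python/NumPy sketch, not the project's actual code), the snippet below scores each candidate shift over an interior window; the search radius and the 10% margin are illustrative choices.

    import numpy as np

    def ssd(a, b):
        """Sum of squared differences between two equally sized arrays."""
        d = a.astype(np.float64) - b.astype(np.float64)
        return np.sum(d * d)

    def align_exhaustive(channel, reference, max_shift=15, margin=0.1):
        """Return the (dy, dx) shift of `channel` that best matches `reference`.

        Only an interior window is scored so that the differing borders of
        each plate do not dominate the score.
        """
        h, w = reference.shape
        y0, y1 = int(h * margin), int(h * (1 - margin))
        x0, x1 = int(w * margin), int(w * (1 - margin))

        best_score, best_shift = np.inf, (0, 0)
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                shifted = np.roll(channel, (dy, dx), axis=(0, 1))
                score = ssd(shifted[y0:y1, x0:x1], reference[y0:y1, x0:x1])
                if score < best_score:
                    best_score, best_shift = score, (dy, dx)
        return best_shift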

Multi-Scale Implementation
Although the aforementioned method of alignment works well for images in the range of 400x400 pixels or smaller, it does not scale well to larger images: the exhaustive search for the best alignment simply takes too long. To circumvent this problem, one can use an image pyramid, a collection of multiple scaled versions of an image. To create one, the image is filtered with a Gaussian blur and then scaled down by one half. The Gaussian blur removes high frequencies from the image, so that the downsampled version remains an accurate, alias-free depiction of the original. This process of blurring and scaling is repeated until the collection ranges from the full-size image down to one small enough to search exhaustively.
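A minimal sketch of pyramid construction (Python with SciPy's gaussian_filter; the stopping size and blur sigma below are illustrative, not the project's actual parameters):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def build_pyramid(image, min_size=400, sigma=1.0):
        """Return scaled copies of `image`, from full resolution down to
        roughly min_size pixels on the shorter side."""
        levels = [np.asarray(image, dtype=np.float64)]
        while min(levels[-1].shape) > min_size:
            blurred = gaussian_filter(levels[-1], sigma)  # suppress high frequencies
            levels.append(blurred[::2, ::2])              # then halve each dimension
        return levels
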
Aligning an image using its pyramid is significantly faster than a single exhaustive search. The align function is called at each level, starting from the smallest scale and ending at the full size; at each level the images are aligned as well as that scale allows. The calculated shifts are then scaled up and applied to the next-largest versions of the images before the next alignment pass begins. The "window" of shifts searched when computing the sums of squared differences stays a constant size regardless of the image size, because each successive level has already been partially aligned by the preceding ones and only needs to correct a small residual. The number of shifts tested can therefore remain small: a fixed range of pixel shifts covers a much larger portion of a small image than of a large one, so the coarse levels absorb most of the displacement.
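The coarse-to-fine loop might look like the following sketch, reusing the hypothetical build_pyramid and align_exhaustive helpers above; the constant search radius is the point, its exact value is illustrative.

    def align_pyramid(channel, reference, max_shift=15):
        """Coarse-to-fine alignment: estimate the shift at the smallest
        scale first, then double it and refine at each finer scale."""
        chan_pyr = build_pyramid(channel)
        ref_pyr = build_pyramid(reference)

        dy, dx = 0, 0
        for c, r in zip(reversed(chan_pyr), reversed(ref_pyr)):
            dy, dx = dy * 2, dx * 2                      # carry the estimate up a level
            shifted = np.roll(c, (dy, dx), axis=(0, 1))  # apply it before refining
            ddy, ddx = align_exhaustive(shifted, r, max_shift=max_shift)
            dy, dx = dy + ddy, dx + ddx
        return dy, dx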

Extras
Improved Contrast
The idea behind the contrast improvement algorithm is to map the brightest pixel in the image to 1 and the dimmest pixel to 0 (where 1 is as bright as possible and 0 is as dim as possible) and then linearly rescale the remaining pixels to fill the widened brightness range. On its own this does nothing, however, because pixels with values of exactly 1 and 0 already exist in the images. To actually increase the contrast, I instead chose a "brightest" reference value less than 1 and a "dimmest" reference value greater than 0, and rescaled every other pixel to the value it would take if those reference values had been mapped to 1 and 0.
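One way to realize this is sketched below; the report does not say how the reference values were chosen, so using percentiles, as well as clipping the values that fall outside the range, are assumptions here.

    import numpy as np

    def stretch_contrast(image, low=0.02, high=0.98):
        """Rescale intensities so that chosen dim/bright reference values
        map to 0 and 1; out-of-range pixels are clipped (assumption)."""
        lo = np.quantile(image, low)   # "dimmest" reference value, > 0
        hi = np.quantile(image, high)  # "brightest" reference value, < 1
        return np.clip((image - lo) / (hi - lo), 0.0, 1.0)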

Border Cropping
A variety of border artifacts exist in any color image created from its three monochromatic counterparts. Not only do the monochromatic images contain widely varying black and white borders, but the alignment process itself also leaves "hanging edges." To remove these border artifacts while minimizing overcropping, three separate crops are performed.


The first round of cropping simply calculates the number of pixels in the x and y directions by which each channel was shifted during alignment, and the edges of the image are cropped to account for those shifts (see the sketch below). The next two rounds of cropping use the same functions, first for the white borders and then for the black ones. These functions could be called on each image repeatedly until no borders remain to crop, but since every image had at most two borders, the cropping functions are only called twice.
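A sketch of that first crop, assuming each channel's alignment shift (dy, dx) is known; positive shifts invalidate rows and columns at the top/left, negative ones at the bottom/right.

    def crop_for_shifts(color_image, shifts):
        """Crop away the rows and columns left undefined ("hanging edges")
        by the per-channel alignment shifts in `shifts` = [(dy, dx), ...]."""
        dys = [dy for dy, _ in shifts]
        dxs = [dx for _, dx in shifts]
        top    = max([d for d in dys if d > 0], default=0)
        bottom = min([d for d in dys if d < 0], default=0)
        left   = max([d for d in dxs if d > 0], default=0)
        right  = min([d for d in dxs if d < 0], default=0)
        h, w = color_image.shape[:2]
        return color_image[top:h + bottom, left:w + right]
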
The actual cropping of the black and white borders operates on downscaled copies of the full-sized images and then scales the measured crop positions back up so they can be applied to the full-sized images; working at the smaller scale greatly improves the speed of the program. To detect the borders, an edge detection function is used that sets edge pixels to 1 and everything else to 0. The cropping functions then iterate over the rows, moving from the edge of the image toward the middle. The program recognizes that it has reached a border when it "sees" at least a certain number (the tolerance) of edge pixels in a single row, and it continues iterating until it "sees" fewer than the tolerance, which signals that it has passed the border. That row is where the image is cropped. The drawback of this approach is that one must assume the image is square and the borders are relatively horizontal and vertical.
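The report's edge detector and tolerance value are unspecified; the sketch below uses scikit-image's canny and arbitrary numbers as stand-ins, and scans only from the top edge (the other three sides could be handled by flipping or transposing the image, with the returned index scaled back up to the full-resolution image).

    import numpy as np
    from skimage.feature import canny

    def find_top_border(gray_small, tolerance=50, search_frac=0.1):
        """Scan rows from the top toward the middle of a downscaled image
        and return the row index just past the first border-like band."""
        edges = canny(gray_small)      # boolean edge map, True = edge pixel
        counts = edges.sum(axis=1)     # edge pixels per row
        limit = int(gray_small.shape[0] * search_frac)

        in_border = False
        for row in range(limit):
            if counts[row] >= tolerance:
                in_border = True       # still inside the border band
            elif in_border:
                return row             # past the border: crop here
        return 0                       # no border found near this edge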




Conclusion
My program can process 400x400-pixel images in about half a second and images on the order of 3000x3000 pixels in 20-30 seconds, depending on the user settings. In almost all cases, the images are aligned well enough that one cannot tell the result is composed of three monochromatic exposures. Finally, the contrast function I wrote accepts settings that vary the strength of the contrast adjustment, and my cropping functions rarely leave artifacts along the borders or overcrop the image.
