CS129 Project 1: Image Alignment with Pyramids

Reese Kuppig (rkuppig)

Before the advent of color photography, Sergei Mikhailovich Prokudin-Gorskii (1863-1944) proposed a method for recording a color image as three separate black-and-white exposures, each taken through a colored filter onto a glass plate. Each plate thus recorded an intensity map for one color channel of the RGB colorspace, and projecting the three plates back through their corresponding color filters would combine them to reproduce an accurate color image. The following algorithm processes the glass plate images of the Prokudin-Gorskii photography collection, automatically aligning and compositing digital images of the RGB color-channel intensity plates to form a unified color image.


Algorithm

The main steps of the algorithm are laid out below, with additional detail given for the inner steps of the imalign function, which performs the image alignment computation.

Image Pyramids

An image pyramid organizes a given image into a multi-scale representation, where each layer of the pyramid contains a version of the image at a different scale. To generate an image pyramid, the original image is first blurred, then subsampled by the desired scale factor. The blurring operation counteracts the aliasing that subsampling would otherwise introduce. To create each successive layer, the contents of the current layer are again blurred and subsampled, until some minimum dimension condition is met. For this algorithm, the image pyramids were generated with a Gaussian blur radius of 3 and a scale factor of 0.5.
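The blur-then-subsample loop described above can be sketched as follows. This is only an illustrative Python/NumPy version, not the project's own code: the function names, the choice of sigma (the text gives only the radius), the edge-replicating border handling, and the min_size stopping rule are all assumptions.

```python
import numpy as np

def gaussian_kernel(radius=3, sigma=None):
    """Normalized 1-D Gaussian kernel with 2*radius + 1 taps."""
    if sigma is None:
        sigma = radius / 2.0  # assumption: the writeup states only the radius
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur(img, radius=3):
    """Separable Gaussian blur; borders are handled by replicating edge pixels."""
    k = gaussian_kernel(radius)
    padded = np.pad(img.astype(float), radius, mode="edge")
    # Filter along each row, then along each column.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def build_pyramid(img, radius=3, min_size=32):
    """Return [full-res, half-res, ...], stopping before a side drops below min_size."""
    levels = [img.astype(float)]
    while min(levels[-1].shape) // 2 >= min_size:
        # Scale factor 0.5: blur, then keep every second row and column.
        levels.append(blur(levels[-1], radius)[::2, ::2])
    return levels
```

With a 128x96 input and min_size=16, this sketch would produce three levels of size 128x96, 64x48, and 32x24.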

Image pyramids allow for a more efficient search by constraining the iterative refinement of the displacement estimate. While the initial displacement estimate must come from an exhaustive search of displacement vectors over an image, performing this initial search at a lower-resolution pyramid level reduces runtime while still providing a meaningful estimate. Assuming the initial estimate is relatively accurate, with each step to a higher-resolution level of the image pyramid, only a small radius around the displacement estimate must be scored to refine the estimate effectively. The required radius depends on the inverse of the scale factor; in this algorithm, with a scale factor of 0.5, it was equal to one pixel.
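A minimal sketch of this coarse-to-fine search is given below, assuming pyramids stored as lists ordered from full resolution to coarsest and a scale factor of 0.5. The names best_shift and coarse_to_fine, the coarse_radius value, the plain sum-of-squared-differences score, and the use of np.roll (a circular shift, which wraps pixels around the border) are simplifications chosen for illustration rather than details taken from the project.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences; lower means better aligned."""
    return float(np.sum((a - b) ** 2))

def best_shift(ref, img, center, radius):
    """Score every (dy, dx) within `radius` of `center` and return the best one."""
    cy, cx = center
    best, best_score = center, float("inf")
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            # np.roll wraps around the border (a simplifying assumption).
            score = ssd(ref, np.roll(img, (dy, dx), axis=(0, 1)))
            if score < best_score:
                best, best_score = (dy, dx), score
    return best

def coarse_to_fine(ref_pyr, img_pyr, coarse_radius=8, refine_radius=1):
    """Exhaustive search at the coarsest level, then a small refinement per level."""
    shift = best_shift(ref_pyr[-1], img_pyr[-1], (0, 0), coarse_radius)
    # Walk from the second-coarsest level back up to full resolution.
    for ref, img in zip(ref_pyr[-2::-1], img_pyr[-2::-1]):
        # Each level has twice the resolution, so the estimate doubles.
        shift = (2 * shift[0], 2 * shift[1])
        shift = best_shift(ref, img, shift, refine_radius)
    return shift
```

Doubling the estimate at each level leaves an uncertainty of about one pixel, which is why a refinement radius of 1 is enough in this sketch. The exhaustive search scores (2 * coarse_radius + 1)^2 shifts once at the smallest level, and each finer level scores only 9.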

Alignment Scoring

The score that a given image alignment received was based on two comparisons. First, the pixel-wise intensity difference between images was scored using the sum of squared differences. This metric performed surprisingly well on most images, but not all, so a second metric was introduced. For the second metric, each image was filtered with a DoG (derivative of Gaussian) filter, in each dimension, to produce a gradient-intensity (edge-detection) image. To remove noise, the mean gradient intensity of each image was used as a threshold, and lower intensities were suppressed to 0. Again, the pixel-wise sum of squared differences was used to score the gradient-intensity images. The combination of these two sums yielded the final score of the alignment, so lower scores reflected better alignment.
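The two-part score might look roughly like the sketch below. The kernel radius and sigma, the zero-padded borders from np.convolve, and in particular the weighted sum used to combine the two terms are assumptions; the text says only that the two sums were combined. All function names are hypothetical.

```python
import numpy as np

def dog_kernel(radius=3, sigma=1.0):
    """1-D derivative-of-Gaussian kernel (radius and sigma are assumed values)."""
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    g /= g.sum()
    return -x / sigma ** 2 * g

def filter_axis(img, kernel, axis):
    """Convolve each row (axis=1) or column (axis=0) with a 1-D kernel, zero-padded."""
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="same"), axis, img)

def gradient_intensity(img):
    """Gradient magnitude from x and y filter responses, thresholded at its mean."""
    k = dog_kernel()
    gx = filter_axis(img, k, axis=1)
    gy = filter_axis(img, k, axis=0)
    mag = np.hypot(gx, gy)
    mag[mag < mag.mean()] = 0.0  # suppress weak responses as noise
    return mag

def alignment_score(ref, img, weight=1.0):
    """Pixel SSD plus weighted gradient-intensity SSD; lower is better."""
    pixel_term = np.sum((ref - img) ** 2)
    edge_term = np.sum((gradient_intensity(ref) - gradient_intensity(img)) ** 2)
    return float(pixel_term + weight * edge_term)  # weighted sum is an assumption
```

In this sketch, two identical float images score exactly 0, and any misalignment that moves an edge raises both terms.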

Extra Credit

As stated above, a gradient-intensity image was also used to score image alignments. This method was necessary to achieve sharp alignments, whereas alignments using only the pixel-difference measure tended to leave slight artifacts. For some images, alignment by gradient-intensity alone failed, so pixel-difference was retained as a partial measure. The images for which gradient-intensity failed tended to lack distinct edges and were dominated by natural terrain and textures, leaving little useful gradient-intensity information.

There was also an automatic crop feature that gave reasonable results in most cases. It worked by evaluating the edge pixels against certain thresholds: if more than a certain percentage of the pixels in an edge row or column fell outside the threshold range, that row or column was removed. The feature also allowed for a certain number of "skips," relative to the size of the image, to account for color inconsistencies.
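One way such a crop could work is sketched below. The text does not give the thresholds, the percentage, or the skip rule, so every parameter here (lo, hi, max_frac, skip_frac, max_trim_frac) is a hypothetical placeholder, as are the function names and the cap on how far each scan may go.

```python
import numpy as np

def border_width(lines, lo=0.1, hi=0.9, max_frac=0.5,
                 skip_frac=0.02, max_trim_frac=0.25):
    """Count how many leading rows of `lines` look like border.

    A row counts as border when more than `max_frac` of its pixels fall
    outside [lo, hi]. Up to `skips` consecutive non-border rows are
    tolerated before the scan stops, and the scan never goes deeper than
    `max_trim_frac` of the image.
    """
    n = len(lines)
    skips = int(skip_frac * n)  # skip allowance relative to image size
    trim, misses = 0, 0
    for i, row in enumerate(lines[: int(max_trim_frac * n)]):
        outside = np.mean((row < lo) | (row > hi))
        if outside > max_frac:
            trim, misses = i + 1, 0
        else:
            misses += 1
            if misses > skips:
                break
    return trim

def auto_crop(img, **kw):
    """Trim border rows and columns from all four sides of an image in [0, 1]."""
    cols = img.swapaxes(0, 1)  # iterate over columns instead of rows
    top = border_width(img, **kw)
    bottom = border_width(img[::-1], **kw)
    left = border_width(cols, **kw)
    right = border_width(cols[::-1], **kw)
    return img[top:img.shape[0] - bottom, left:img.shape[1] - right]
```

For example, a 40x60 mid-gray image with a 3-row black band at the top and a 4-column white band on the right would be cropped to 37x56 by this sketch.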

00153v.jpg crop_00153v.jpg

Results

The results of the algorithm's alignment are displayed below. The composite image before alignment is displayed to the left, while the properly aligned and auto-cropped image is displayed to the right. All of the images displayed here are downsampled versions of the higher-resolution collection images. Despite some inconsistencies in the auto-crop feature, all tested images were properly aligned.

bad_01334u.jpg 01334u.jpg
bad_01602u.jpg 01602u.jpg
bad_01861a.jpg 01861a.jpg
bad_01043u.jpg 01043u.jpg
bad_00458u.jpg 00458u.jpg
bad_00911u.jpg 00911u.jpg
bad_01069u.jpg 01069u.jpg
bad_01047u.jpg 01047u.jpg