CS129 Project 1: Image Alignment with Pyramids

Before the advent of color photography, Sergei Mikhailovich Prokudin-Gorskii (1863-1944) proposed a method for recording a color image as three separate black-and-white exposures, each taken through a colored filter onto a glass plate. Each plate essentially recorded an intensity map for one color channel of the RGB colorspace, and projecting the three plates back through their corresponding color filters recombined them into an accurate color image. The following algorithm processes the glass plate images of the Prokudin-Gorskii photography collection, automatically aligning and compositing digital images of the RGB color-channel intensity plates to form a unified color image.

Algorithm

The main steps of the algorithm are laid out below, with additional detail given for the inner steps of the imalign function, which performs the image alignment computation.

The digital image of the three glass plates is segmented into individual plate images, each representing the intensity map of its respective RGB color channel.
Using imalign, two displacements are calculated: one that aligns the green-channel plate to the blue plate, and one that aligns the red-channel plate to the blue plate.

imalign

Generate an image pyramid for each color plate image.
Start at the sixth level of each pyramid, which guarantees that the smaller dimension of the starting image is between 32 and 64 pixels. This was found to be the smallest size range that still yielded a useful initial displacement estimate. At this starting level only, score every possible alignment and record the best-scoring displacement, which serves as the first displacement estimate.
After finding a starting displacement, step down the image pyramid (increasing the image size) and score the displacements in the immediate radius of the current displacement estimate.
If there is a better-scoring displacement, update the current estimate, and once all candidate displacements are scored, proceed to the next pyramid level and repeat.
If there are no more pyramid levels, return the final displacement estimate.

The red and green plate images are displaced by the calculated vectors, and are layered with the blue plate image to create an RGB image matrix.
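
The outer steps (segmenting the scan and layering the shifted plates) can be sketched in a few lines of Python with NumPy. This is an illustrative sketch rather than the project's code: the function names split_plates and composite are hypothetical, the plates are assumed to be stacked top to bottom as blue, green, red, and the shift is applied circularly with np.roll, so pixels pushed past one border wrap around to the other.

```python
import numpy as np

def split_plates(scan):
    """Cut a vertically stacked scan into three equal-height plates.

    Assumption: the plates appear top to bottom as blue, green, red.
    """
    h = scan.shape[0] // 3
    return scan[:h], scan[h:2 * h], scan[2 * h:3 * h]

def composite(blue, green, red, g_shift, r_shift):
    """Shift green and red by (dy, dx) and stack the plates as H x W x 3 RGB."""
    g = np.roll(green, g_shift, axis=(0, 1))  # circular shift: borders wrap
    r = np.roll(red, r_shift, axis=(0, 1))
    return np.dstack([r, g, blue])
```

In a full pipeline, g_shift and r_shift would be the displacements returned by the alignment step.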

Image Pyramids

An image pyramid organizes a given image into a multi-scale representation, where each layer of the pyramid contains a version of the image at a different scale. To generate an image pyramid, the original image is first blurred, then subsampled by the desired scale factor. The blurring counteracts the aliasing that subsampling would otherwise introduce. To create each successive layer, the contents of the current layer are again blurred and subsampled, until some minimum dimension condition is met. For this algorithm, the image pyramids were generated with a Gaussian blur radius of 3 and a scale factor of 0.5.
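
A minimal sketch of this blur-then-subsample construction follows, using NumPy only. Several details are assumptions made for illustration: the "radius of 3" is read as the half-width of the kernel (7 taps), the standard deviation is set to half the radius, borders are handled by edge replication, and the stopping rule min_dim=32 is a hypothetical stand-in for the minimum dimension condition.

```python
import numpy as np

def gaussian_kernel(radius=3, sigma=None):
    """1-D normalized Gaussian kernel with 2 * radius + 1 taps."""
    sigma = radius / 2.0 if sigma is None else sigma  # assumed relation
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur(img, radius=3):
    """Separable Gaussian blur with edge-replicated borders."""
    k = gaussian_kernel(radius)
    padded = np.pad(img, radius, mode="edge")
    # Filter each row, then each column; "valid" trims the padding again.
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")

def build_pyramid(img, min_dim=32, radius=3):
    """Return [full-res, half-res, ...] for a scale factor of 0.5.

    Stops before the smaller side would drop below min_dim.
    """
    levels = [img.astype(float)]
    while min(levels[-1].shape) // 2 >= min_dim:
        levels.append(blur(levels[-1], radius)[::2, ::2])
    return levels
```

Taking every second pixel after blurring implements the 0.5 scale factor; other factors would need interpolation instead of simple striding.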

Image pyramids allow for a more efficient search by constraining the iterative refinement of the displacement estimate. The initial displacement estimate must come from an exhaustive search over displacement vectors, but performing that search at a low-resolution pyramid level reduces runtime while still providing a meaningful estimate. Assuming the initial estimate is relatively accurate, each step to a higher-resolution level requires scoring only a small radius around the current estimate to refine it effectively. The required radius grows as the scale factor shrinks, since a smaller factor means a larger jump in resolution between levels. With the scale factor of 0.5 used in this algorithm, the radius was one pixel.
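
The coarse-to-fine search can be sketched as below. This is a hedged illustration, not the original imalign: it assumes pyramids are given as lists ordered from fine to coarse, that the estimate is multiplied by 2 when moving to the next finer level (the inverse of the 0.5 scale factor), that shifts are circular (np.roll), and that the score is a plain sum of squared differences. The helper names ssd and best_shift are hypothetical.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences; lower means more similar."""
    return float(np.sum((a - b) ** 2))

def best_shift(ref, img, candidates, score=ssd):
    """Return the (dy, dx) whose circular shift of img scores lowest."""
    return min(candidates,
               key=lambda s: score(ref, np.roll(img, s, axis=(0, 1))))

def imalign(ref_pyr, img_pyr, radius=1, scale=2):
    """Coarse-to-fine alignment of img to ref.

    ref_pyr and img_pyr are lists of arrays ordered fine to coarse.
    """
    h, w = ref_pyr[-1].shape
    # Exhaustive search over every circular shift at the coarsest level.
    every = [(dy, dx)
             for dy in range(-(h // 2), h - h // 2)
             for dx in range(-(w // 2), w - w // 2)]
    dy, dx = best_shift(ref_pyr[-1], img_pyr[-1], every)
    # Walk toward full resolution, refining within a small window.
    for ref, img in zip(ref_pyr[-2::-1], img_pyr[-2::-1]):
        dy, dx = dy * scale, dx * scale  # carry the estimate to this level
        near = [(dy + i, dx + j)
                for i in range(-radius, radius + 1)
                for j in range(-radius, radius + 1)]
        dy, dx = best_shift(ref, img, near)
    return dy, dx
```

With a radius of one pixel, each refinement level scores only nine candidates, which is where the runtime savings over a full-resolution exhaustive search come from.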

Alignment Scoring

The score for a given alignment was based on two comparisons. First, the pixel-wise intensity difference between the images was scored using the sum of squared differences. This metric performed well on most images, but not all, so a second metric was introduced. For the second metric, each image was filtered with a derivative-of-Gaussian (DoG) filter in each dimension to produce a gradient-intensity (edge-detection) image. To remove noise, the mean gradient intensity of each image was used as a threshold, and intensities below it were suppressed to 0. The gradient-intensity images were then scored with the same pixel-wise sum of squared differences. The combination of these two sums yielded the final alignment score, so lower scores reflected better alignment.
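
The two-part score might look like the following sketch. The specifics are assumptions for illustration: the filter width (sigma=1.0), combining the two directional responses into a gradient magnitude, and adding the two sums without weights are not stated above, and the function names are hypothetical.

```python
import numpy as np

def _filter1d(img, kernel, axis):
    """Convolve every line along `axis` with kernel (same-size output)."""
    return np.apply_along_axis(np.convolve, axis, img, kernel, mode="same")

def gradient_magnitude(img, sigma=1.0):
    """Derivative-of-Gaussian gradient magnitude, thresholded at its mean."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    g = np.exp(-(x ** 2) / (2 * sigma ** 2))
    g /= g.sum()
    dg = -x * g / sigma ** 2  # derivative of the Gaussian
    # Smooth along one axis, differentiate along the other.
    gx = _filter1d(_filter1d(img, g, 0), dg, 1)
    gy = _filter1d(_filter1d(img, g, 1), dg, 0)
    mag = np.hypot(gx, gy)
    mag[mag < mag.mean()] = 0.0  # suppress below-mean responses as noise
    return mag

def ssd(a, b):
    """Sum of squared differences."""
    return float(np.sum((a - b) ** 2))

def alignment_score(ref, img):
    """Intensity SSD plus gradient SSD; lower is better (unweighted sum assumed)."""
    return ssd(ref, img) + ssd(gradient_magnitude(ref), gradient_magnitude(img))
```

In practice the gradient images would be computed once per pyramid level and then shifted, rather than recomputed for every candidate displacement.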

Extra Credit

As stated above, a gradient-intensity image was also used to score image alignments. This method was necessary to achieve sharp alignments, whereas alignments using only the pixel-difference measure tended to leave slight artifacts. For some images, alignment by gradient intensity failed, so the pixel difference was retained as a partial measure. The images for which gradient intensity failed tended to lack distinct edges and were dominated by natural terrain and textures, leaving little useful gradient-intensity information.

There was also an automatic crop feature that gave reasonable results in most cases. It worked by evaluating the pixels in each edge row or column against certain thresholds, and if more than a certain percentage of those pixels fell outside the threshold range, that row or column was removed. The feature also allowed for a certain number of "skips," relative to the size of the image, to account for color inconsistencies.
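
One possible reading of this crop procedure is sketched below. Every numeric value here is an assumption, since the thresholds, percentage, and skip allowance are not given above: pixels are assumed to lie in [0, 1], a line counts as border if more than half of its pixels fall outside [0.05, 0.95], and the number of tolerated "skips" is 2% of the image dimension. The function names are hypothetical.

```python
import numpy as np

def _bad(line, lo, hi, max_frac):
    """True if too many pixels in this row/column fall outside [lo, hi]."""
    return np.mean((line < lo) | (line > hi)) > max_frac

def _trim(lines, lo, hi, max_frac, skips):
    """Count leading lines to remove, looking past up to `skips` good lines."""
    cut, tolerated = 0, 0
    for i, line in enumerate(lines):
        if _bad(line, lo, hi, max_frac):
            cut, tolerated = i + 1, 0  # remove everything up to this line
        else:
            tolerated += 1
            if tolerated > skips:
                break
    return cut

def auto_crop(img, lo=0.05, hi=0.95, max_frac=0.5, skip_frac=0.02):
    """Crop border rows and columns from a grayscale or RGB image."""
    h, w = img.shape[:2]
    cols = img.swapaxes(0, 1)  # iterate columns as if they were rows
    top = _trim(img, lo, hi, max_frac, int(skip_frac * h))
    bottom = _trim(img[::-1], lo, hi, max_frac, int(skip_frac * h))
    left = _trim(cols, lo, hi, max_frac, int(skip_frac * w))
    right = _trim(cols[::-1], lo, hi, max_frac, int(skip_frac * w))
    return img[top:h - bottom, left:w - right]
```

The skip allowance lets the scan continue past a few acceptable lines, so a border strip with inconsistent coloring is still removed as a whole.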

Reese Kuppig (rkuppig)

Results