CS 129 Project 1: Image Alignment with Pyramids
Daniel Moreno (damoreno)
September 17th, 2012
I have implemented the automatic alignment algorithm in C++.
It features two different alignment metrics: SSD (sum of
squared differences) and MI (mutual information). The software is
multi-scale by default, but it can be run in single-scale mode with a
command line option. Additional options allow cropping the image by a
fixed margin and performing automatic white balance.
Metrics
SSD is the simplest metric available. The idea is to compare each pixel
in one image with its corresponding pixel in the other image using the
squared difference of their values.
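As an illustration, the following is a minimal sketch of the SSD metric
in C++, assuming 8-bit grayscale channels stored as row-major
std::vector<uint8_t>; the names and the signature are illustrative, not
the exact interface of the implementation:

#include <cstdint>
#include <vector>

// Sum of squared differences between img1 and img2 shifted by (dx, dy).
// Pixels whose shifted position falls outside the image are skipped.
long long ssd(const std::vector<uint8_t>& img1,
              const std::vector<uint8_t>& img2,
              int width, int height, int dx, int dy)
{
    long long sum = 0;
    for (int y = 0; y < height; ++y) {
        int y2 = y + dy;
        if (y2 < 0 || y2 >= height) continue;
        for (int x = 0; x < width; ++x) {
            int x2 = x + dx;
            if (x2 < 0 || x2 >= width) continue;
            int d = int(img1[y * width + x]) - int(img2[y2 * width + x2]);
            sum += (long long)d * d;
        }
    }
    return sum;
}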
MI is defined using entropy: MI = h(img1) + h(img2) - h(img1, img2).
In this case the idea is that the joint entropy h(img1, img2) is
minimal when the images are perfectly aligned, because one image can
then be easily predicted from the other. When the joint entropy is at
its minimum, MI reaches its maximum.
Entropy is defined using probabilities: h(img) = -Σ P(x) log P(x),
where the sum runs over the intensity values x of the image.
More details can be found in "Alignment by Maximization of Mutual
Information" by Paul Viola. In my implementation I have approximated
the probabilities using intensity histograms. In the case of the joint
entropy we need the 2-dimensional joint histogram of both images.
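For illustration, a minimal sketch of the MI computation from 256-bin
intensity histograms, following the definition above; the function name
and the raw-vector image representation are assumptions made for the
example:

#include <cmath>
#include <cstdint>
#include <vector>

// MI = h(img1) + h(img2) - h(img1, img2), with probabilities
// approximated by (joint) intensity histograms. The two images are
// assumed to have the same number of pixels.
double mutual_information(const std::vector<uint8_t>& img1,
                          const std::vector<uint8_t>& img2)
{
    std::vector<double> h1(256, 0.0), h2(256, 0.0);
    std::vector<double> joint(256 * 256, 0.0);
    const double n = double(img1.size());

    for (size_t i = 0; i < img1.size(); ++i) {
        h1[img1[i]] += 1.0;
        h2[img2[i]] += 1.0;
        joint[img1[i] * 256 + img2[i]] += 1.0;
    }

    // Entropy of an unnormalized histogram: -sum p log p, p = count / n.
    auto entropy = [n](const std::vector<double>& hist) {
        double h = 0.0;
        for (double c : hist)
            if (c > 0.0) { double p = c / n; h -= p * std::log(p); }
        return h;
    };

    return entropy(h1) + entropy(h2) - entropy(joint);
}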
After experimenting with both metrics, my conclusion is that MI is more
robust and performs at least as well as SSD. I say that SSD is less
robust because its accuracy is affected by the noise usually present
at the borders of the test images, whereas MI seems mostly insensitive
to it. To mitigate this problem, all the images are
temporarily cropped while running the optimization loop so that the
metrics only consider the main content of the images.
Usually the difference in alignment between SSD and MI is a couple of
pixels. The following is an example where SSD completely fails, given
the crack in the image, but MI aligns correctly without further issues:
[Figure: alignment of the cracked image with SSD (left) and MI (right)]
Multi-scale search
In multi-scale mode a pyramid is built for each of the three images.
The first level of the pyramid holds the original image, and additional
levels are built by halving the resolution of the previous level. The
number of levels is automatically selected to ensure that the smallest
image has a width of at least 128 pixels. The search begins at the top
level of the pyramid with an extended search window where all the
possible translations are evaluated (brute force). The search at each
subsequent level begins from the solution of the previous level, with
the search window drastically reduced to consider only a 3x3
neighborhood. Each level has double the resolution of the previous one,
so the solution of one level is easily transferred to the next by
doubling its pixel displacement. A typical alignment of a
high-resolution image (e.g. 3741x3228 pixels) takes around 8 seconds.
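The following sketch shows the coarse-to-fine search under these
assumptions; the Image struct, the halve() helper, and align() are
illustrative names, and ssd() is the sketch from the Metrics section:

#include <cstdint>
#include <vector>

struct Image { int width, height; std::vector<uint8_t> px; };

long long ssd(const std::vector<uint8_t>& img1,
              const std::vector<uint8_t>& img2,
              int width, int height, int dx, int dy); // sketched earlier

// Downsample by averaging 2x2 blocks to build the next pyramid level.
Image halve(const Image& in)
{
    Image out{in.width / 2, in.height / 2, {}};
    out.px.resize(size_t(out.width) * out.height);
    for (int y = 0; y < out.height; ++y)
        for (int x = 0; x < out.width; ++x) {
            int s = in.px[(2 * y) * in.width + 2 * x]
                  + in.px[(2 * y) * in.width + 2 * x + 1]
                  + in.px[(2 * y + 1) * in.width + 2 * x]
                  + in.px[(2 * y + 1) * in.width + 2 * x + 1];
            out.px[y * out.width + x] = uint8_t(s / 4);
        }
    return out;
}

struct Offset { int dx, dy; };

Offset align(const Image& ref, const Image& mov)
{
    // Build pyramids; the smallest level keeps a width of at least 128 px.
    std::vector<Image> pr{ref}, pm{mov};
    while (pr.back().width / 2 >= 128) {
        pr.push_back(halve(pr.back()));
        pm.push_back(halve(pm.back()));
    }

    Offset best{0, 0};
    for (int lv = int(pr.size()) - 1; lv >= 0; --lv) {
        // Brute force over an extended window (here +/-15 px, an assumed
        // value) at the top level; a 3x3 neighborhood at finer levels.
        int r = (lv == int(pr.size()) - 1) ? 15 : 1;
        Offset c = best;
        long long bestScore = -1;
        for (int dy = -r; dy <= r; ++dy)
            for (int dx = -r; dx <= r; ++dx) {
                // A real implementation should crop borders or normalize
                // by the overlap area so large shifts are not favored.
                long long s = ssd(pr[lv].px, pm[lv].px,
                                  pr[lv].width, pr[lv].height,
                                  c.dx + dx, c.dy + dy);
                if (bestScore < 0 || s < bestScore) {
                    bestScore = s;
                    best = {c.dx + dx, c.dy + dy};
                }
            }
        // Transfer the solution to the next (double-resolution) level.
        if (lv > 0) { best.dx *= 2; best.dy *= 2; }
    }
    return best;
}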
White-balance
White balance is performed as a post-processing step, after alignment,
to improve the color quality of the images. The algorithm finds the
maximum of each color channel and maps that value to white (RGB = 255,
255, 255) by scaling each channel independently.
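A minimal sketch of this mapping, assuming interleaved 8-bit RGB
pixels; the function name and the layout are illustrative:

#include <algorithm>
#include <cstdint>
#include <vector>

// Scale each channel independently so its maximum maps to 255.
void white_balance(std::vector<uint8_t>& rgb)
{
    for (int c = 0; c < 3; ++c) {
        uint8_t maxv = 0;
        for (size_t i = c; i < rgb.size(); i += 3)
            maxv = std::max(maxv, rgb[i]);
        if (maxv == 0) continue; // empty channel, nothing to scale
        for (size_t i = c; i < rgb.size(); i += 3)
            rgb[i] = uint8_t(int(rgb[i]) * 255 / maxv);
    }
}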
[Figure: image before white balance (Normal, left) and after (White-balanced, right)]
Result Images
Other experiments
The described algorithm gives good results on all the images I have
tested; however, when zooming in one can see that some regions are
not perfectly aligned, especially close to the borders. My first guess
is that an integer pixel displacement might not be the best model to
match the different channels. In an effort to improve on this, I
implemented the complete algorithm described in "Alignment by
Maximization of Mutual Information" by Paul Viola. This algorithm
maximizes the MI by stochastic hill climbing on its gradient. It
involves estimating probability densities with the Parzen window method
in order to derive the MI gradient at the current transformation.
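For reference, the following is a minimal sketch of a Parzen window
density estimate with a Gaussian kernel, the building block used to
approximate the densities; the function name, the sample
representation, and the sigma parameter are illustrative assumptions,
not Viola's exact formulation:

#include <cmath>
#include <vector>

// Estimate p(x) as the average of Gaussian kernels of width sigma
// centered at each sample.
double parzen_density(double x, const std::vector<double>& samples,
                      double sigma)
{
    const double pi = 3.14159265358979323846;
    const double norm = 1.0 / (std::sqrt(2.0 * pi) * sigma);
    double p = 0.0;
    for (double s : samples)
        p += norm * std::exp(-0.5 * (x - s) * (x - s) / (sigma * sigma));
    return p / double(samples.size());
}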
I was attracted to this idea because it can easily be used with more
complex transformations. In my case I used an affine matrix to
transform coordinates from one image to the other with subpixel
precision.
Although the algorithm works, the result does not look much better
than the one I had before. Maybe a linear transformation is not enough
to model the misalignment, or maybe extra testing is required.