To create photo panoramas with a normal camera, multiple images must be captured with overlapping regions. To composite these images, key points in the overlap region are identified and aligned, typically requiring the images to be stretched and skewed to account for changes in perspective. This algorithm accomplishes these steps automatically: it first identifies interest points and then aligns those points with a 2D projective transformation.
The algorithm consists of four distinct operations, though most of the emphasis here will be on the first two.
First, interest points are identified in each of the two images, and correspondences between points are deduced. The final sets of corresponding interest points will serve as the basis for extracting the transformation matrix that maps points in one image to those in the other.
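The correspondence step can be sketched as follows. This is a minimal NumPy sketch, not the exact method used here: `match_descriptors` and the descriptor arrays are illustrative names, and the ratio test (accepting a match only when the best candidate is much closer than the runner-up) is one common way to deduce reliable correspondences.

```python
import numpy as np

def match_descriptors(desc1, desc2, ratio=0.6):
    """Match flattened patch descriptors between two images.

    desc1: (N1, D) array, desc2: (N2, D) array with N2 >= 2.
    Returns a list of (i, j) index pairs passing the ratio test.
    """
    matches = []
    for i, d in enumerate(desc1):
        # Sum-of-squared-differences to every candidate in the other image
        dists = np.sum((desc2 - d) ** 2, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        # Keep only matches that are clearly better than the runner-up
        if dists[best] < ratio * dists[second]:
            matches.append((i, best))
    return matches
```

Ambiguous interest points (e.g. those on repeated texture) tend to have two near-equal candidates, so the ratio test discards them before they can mislead the transformation estimate.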
Second, the algorithm attempts to recover the transformation matrix that maps points in the second image to those in the first, so that the two images may be properly aligned. The RANSAC algorithm is used to achieve this goal: over thousands of iterations, RANSAC randomly samples four correspondence pairs and attempts to recover the transformation matrix from those four pairs alone, calculating the distortion necessary to warp the points in one image onto those in the other. Each candidate transformation matrix is then tested on all of the correspondence pairs, and the number of inliers (points that map correctly within a half-pixel tolerance) is counted. At the end of RANSAC, the matrix with the most inliers is returned as the best estimate of the transformation.
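The RANSAC loop above can be sketched in NumPy as follows. The function names are illustrative, and the four-pair fit is shown here via the standard direct linear transform (DLT); this is a sketch of the general technique under those assumptions, not the exact implementation.

```python
import numpy as np

def homography_from_pairs(src, dst):
    """Estimate the 3x3 projective transform mapping src -> dst by DLT.
    src, dst: (N, 2) arrays of corresponding points, N >= 4."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of A (smallest singular value)
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_h(H, pts):
    """Apply a homography to (N, 2) points (homogeneous divide)."""
    p = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return p[:, :2] / p[:, 2:3]

def ransac_homography(src, dst, iters=2000, tol=0.5, rng=None):
    """Sample 4 pairs, fit a homography, count inliers within a
    half-pixel tolerance, and keep the model with the most inliers."""
    rng = np.random.default_rng(rng)
    best_H, best_inliers = None, 0
    for _ in range(iters):
        idx = rng.choice(len(src), size=4, replace=False)
        H = homography_from_pairs(src[idx], dst[idx])
        err = np.linalg.norm(apply_h(H, src) - dst, axis=1)
        inliers = int(np.sum(err < tol))
        if inliers > best_inliers:
            best_H, best_inliers = H, inliers
    return best_H, best_inliers
```

Because each model is scored against every correspondence pair, occasional bad matches simply fail the inlier test rather than corrupting the final estimate, which is what makes RANSAC robust to mismatched interest points.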
In the baseline algorithm, the region around each interest point is characterized by an 8x8 subsampled image of the region. However, between images it is likely that the region around a given interest point will not maintain the same orientation. To counteract this problem, the method for extracting the 8x8 image features can be modified to create rotation invariance. Each initial patch around an interest point is blurred and filtered with a Sobel filter in both dimensions, yielding horizontal and vertical gradient response images. Taking the arctangent of each pixel's vertical-to-horizontal gradient ratio yields a gradient angle, and averaging the gradient angles of all pixels in the patch gives the overall patch gradient direction. The sample region is then rotated by this angle, and a patch oriented to the local gradient is extracted. By this method, patches extracted at the same interest point in different images should appear more similar regardless of global rotation, because each patch will be oriented to the same local gradient. The scenes tested in the results section exhibited relatively small rotations between images, so the image features performed well even without the added rotation invariance.
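The orientation step can be sketched as follows, assuming grayscale float images. This is a minimal illustration, not the exact implementation: the Gaussian blur is omitted, the function names are invented, and the patch is resampled with nearest-neighbour lookup for brevity.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)

def filter2(img, k):
    """Minimal 'valid' cross-correlation, enough for a 3x3 Sobel kernel."""
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def dominant_orientation(patch):
    """Average gradient angle of a patch, as described above. Note that
    averaging raw angles can wrap around at +/-pi; summing the gradient
    vectors first is a more robust variant."""
    gx = filter2(patch, SOBEL_X)    # horizontal gradient response
    gy = filter2(patch, SOBEL_X.T)  # vertical gradient response
    return np.arctan2(gy, gx).mean()

def oriented_patch(img, cx, cy, size, angle):
    """Sample a size x size patch centred at (cx, cy), rotated by `angle`
    so the local gradient direction becomes the patch's x-axis
    (nearest-neighbour sampling; border coordinates are clamped)."""
    half = size // 2
    ys, xs = np.mgrid[-half:half, -half:half]
    c, s = np.cos(angle), np.sin(angle)
    # Rotate the sampling grid by the patch orientation
    sx = np.clip(np.rint(cx + c * xs - s * ys).astype(int), 0, img.shape[1] - 1)
    sy = np.clip(np.rint(cy + s * xs + c * ys).astype(int), 0, img.shape[0] - 1)
    return img[sy, sx]
```

On a simple intensity ramp, the recovered orientation points along the direction of increasing intensity, so two patches of the same scene content rotated relative to one another are brought into the same canonical frame before subsampling.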
This image was especially difficult to align properly. The left panorama was formed using the baseline features, while the right panorama was created with the rotation-invariant features. Here, the close perspective of the scene produced noticeable rotations between images, so the effects of the rotation-invariant features are more pronounced.
The results of the automatic panorama generation are displayed below. The individual images are shown first, followed by a sample of the correspondence points and then the final assembled panorama. The algorithm performed best for landscape images taken from a great distance, as distortion between images was low and 3D perspective effects were negligible. Due to the limitations of the 2D projective transformation, scenes that exhibited strong perspective changes between images performed much worse. Also, because the only usable correspondence points were extracted from the overlap region, misalignments increase toward the outer edges of the individual images: small errors acceptable within the overlap region grow more severe with distance from it.