CS129 Project 6: Automatic Panorama Stitching

Reese Kuppig (rkuppig)

To create photo panoramas with a standard camera, it is necessary to capture multiple images with overlapping regions. To composite these images, key points in the overlap region are identified and aligned, which typically requires stretching and skewing the images to account for changes in perspective. This algorithm accomplishes these steps automatically, first identifying interest points and then aligning those points with a 2D projective transformation.


Algorithm

The algorithm consists of four distinct operations, though most of the emphasis here will be on the first two.

First, interest points are identified in each of the two images, and correspondences between points are deduced. The final sets of corresponding interest points will serve as the basis for extracting the transformation matrix that maps points in one image to those in the other.

Interest Point Correspondence Extraction
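As a rough illustration of this stage, the sketch below pairs a Harris corner detector with a nearest-neighbor ratio test for matching descriptors. The function names, thresholds, and the ratio test itself are illustrative choices here, not necessarily those of the original implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def harris_interest_points(img, sigma=1.0, k=0.04, n_points=500):
    """Harris corner response with non-maximum suppression."""
    Iy, Ix = np.gradient(img.astype(float))
    # Gaussian-weighted structure tensor entries
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    # Corner response: det(M) - k * trace(M)^2
    R = (Sxx * Syy - Sxy ** 2) - k * (Sxx + Syy) ** 2
    # Keep local maxima only, then the n_points strongest responses
    peaks = (R == maximum_filter(R, size=9))
    ys, xs = np.nonzero(peaks)
    order = np.argsort(R[ys, xs])[::-1][:n_points]
    return np.stack([ys[order], xs[order]], axis=1)

def match_descriptors(desc1, desc2, ratio=0.7):
    """Lowe-style ratio test: accept a match only when the nearest
    neighbor is clearly better than the second nearest."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        first, second = np.argsort(dists)[:2]
        if dists[first] < ratio * dists[second]:
            matches.append((i, first))
    return matches
```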

Second, the algorithm attempts to recover the transformation matrix that maps points in the second image to those in the first image, so that the two images may be properly aligned. The RANSAC algorithm is used to achieve this goal: over thousands of iterations, RANSAC randomly samples four correspondence pairs and attempts to recover the transformation matrix from those four pairs, computing the warp that carries the points in one image onto those in the other. Each candidate transformation matrix is then tested on all of the correspondence pairs, and the number of inliers, points that map correctly within a half-pixel tolerance, is counted. At the end of RANSAC, the matrix with the most inliers is returned as the best estimate of the transformation matrix.

RANSAC and Transformation Matrix Reconstruction
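A condensed sketch of this step, under assumed names, is shown below: homography_from_pairs implements the standard direct linear transform (DLT) for recovering a 3x3 projective transformation, and ransac_homography runs the sampling loop described above. Only the four-pair sampling and the half-pixel inlier tolerance come from the text; the iteration count is a placeholder.

```python
import numpy as np

def homography_from_pairs(src, dst):
    """Direct linear transform: estimate the 3x3 homography H with
    dst ~ H @ src from four (or more) point pairs; rows are [x, y]."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The solution is the right singular vector of A with the
    # smallest singular value
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def ransac_homography(src, dst, iters=5000, tol=0.5):
    """Sample four pairs per iteration and keep the model with the most
    inliers under the half-pixel reprojection tolerance from the text."""
    best_H, best_count = None, 0
    n = len(src)
    ones = np.ones((n, 1))
    for _ in range(iters):
        idx = np.random.choice(n, 4, replace=False)
        H = homography_from_pairs(src[idx], dst[idx])
        # Project every source point and measure reprojection error
        proj = np.hstack([src, ones]) @ H.T
        proj = proj[:, :2] / proj[:, 2:3]
        count = np.sum(np.linalg.norm(proj - dst, axis=1) < tol)
        if count > best_count:
            best_H, best_count = H, count
    return best_H
```

A common refinement, not shown here, is to refit the winning model to all of its inliers with the same least-squares DLT.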

In the third and fourth steps, the second image is warped using the recovered transformation matrix and composited with the first image by aligning the correspondence pairs and blending the images in the overlap region.
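The warping and compositing steps might look something like the following sketch, which uses inverse mapping to avoid holes in the warped image and a distance-transform feathering weight for the blend. The feathering scheme is one common choice; the text does not specify which blending method was actually used.

```python
import numpy as np
from scipy.ndimage import map_coordinates, distance_transform_edt

def warp_image(img, H, out_shape):
    """Inverse-map every output pixel through H^-1 and bilinearly sample
    the source image, which avoids the holes of forward warping."""
    Hinv = np.linalg.inv(H)
    ys, xs = np.indices(out_shape)
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = Hinv @ pts
    sx, sy = src[0] / src[2], src[1] / src[2]
    warped = map_coordinates(img, [sy, sx], order=1, cval=0.0)
    return warped.reshape(out_shape)

def feather_blend(im1, im2, mask1, mask2):
    """Feathered blend: each image's weight decays toward its own
    boundary, so the seam in the overlap region fades out smoothly."""
    w1 = distance_transform_edt(mask1)
    w2 = distance_transform_edt(mask2)
    total = w1 + w2
    total[total == 0] = 1.0  # avoid division by zero outside coverage
    return (im1 * w1 + im2 * w2) / total
```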

Rotation Invariant Feature Descriptors

In the baseline algorithm, the region around each interest point is characterized by an 8x8 subsampled image of the region. However, between images it is likely that the region around a given interest point will not maintain the same orientation. To counteract this problem, the method for extracting the 8x8 image features can be modified to create rotation invariance. Each initial patch around an interest point is blurred and filtered with a Sobel filter in both dimensions, yielding horizontal and vertical gradient response images. Taking the arctangent of the ratio of vertical to horizontal gradient responses at each pixel yields a gradient angle, and averaging the gradient angles of all pixels in the patch gives the overall patch gradient direction. The sample region is then rotated by this angle, and a patch oriented to the local gradient is extracted. By this method, patches extracted at the same interest point in different images should appear more similar, regardless of global rotation, because each patch will be oriented to the same local gradient. The scenes tested in the results section exhibited relatively small rotations between images, so image features performed well even without the added rotation invariance.
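The sketch below follows this recipe, with one common variation: it takes the angle of the summed gradient vectors rather than averaging the raw per-pixel angles, which sidesteps wrap-around artifacts. The window size, smoothing amount, and normalization are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel, rotate

def oriented_descriptor(img, y, x, window=40, out_size=8):
    """8x8 descriptor aligned to the dominant local gradient direction.
    Assumes (y, x) lies at least window/2 pixels from the image border."""
    half = window // 2
    region = img[y - half:y + half, x - half:x + half].astype(float)
    smooth = gaussian_filter(region, 2.0)
    gx = sobel(smooth, axis=1)  # horizontal gradient response
    gy = sobel(smooth, axis=0)  # vertical gradient response
    # Angle of the summed gradient vectors (avoids the wrap-around
    # issues of averaging raw per-pixel angles)
    angle = np.degrees(np.arctan2(gy.sum(), gx.sum()))
    # Rotate so the dominant gradient is axis-aligned, crop the center,
    # and subsample down to out_size x out_size
    aligned = rotate(smooth, angle, reshape=False, order=1)
    patch = aligned[half - 16:half + 16, half - 16:half + 16]
    step = patch.shape[0] // out_size
    desc = patch[::step, ::step][:out_size, :out_size]
    # Bias/gain normalization (an assumption; common for patch features)
    return (desc - desc.mean()) / (desc.std() + 1e-8)
```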

[Left: normal.jpg (baseline features) | Right: panorama004.jpg (rotation-invariant features)]

This scene was especially difficult to align properly. The left panorama was formed using the baseline features, while the right panorama was created with the rotation-invariant features. Here, the close perspective of the scene produced noticeable rotations between images, so the benefit of the rotation-invariant features is more apparent.


Results

The results of the automatic panorama generation are displayed below. The individual images are shown first, followed by a sample of the correspondence points and then the final assembled panorama. The algorithm performed best for landscape images taken from a great distance, as distortion between images was low and 3D perspective effects were negligible. Due to the limitations of the 2D projective transformation, scenes that exhibited strong perspective changes between images fared much worse. Also, because the only usable correspondence points come from the overlap region, misalignments grow toward the outer edges of the individual images: small errors that are acceptable within the overlap region are amplified with distance from it.

[Sources: source001_01.jpg, source001_02.jpg]
[Correspondences: correspondence001.jpg]
[Panorama: panorama001.jpg]

[Sources: source002_01.jpg, source002_02.jpg]
[Correspondences: correspondence002.jpg]
[Panorama: panorama002.jpg]

[Sources: panorama02_01.jpg, panorama02_02.jpg]
[Correspondences: correspondence003.jpg]
[Panorama: panorama003.jpg]

[Sources: panorama03_03.jpg, panorama03_04.jpg, panorama03_07.jpg]
[Correspondences: correspondence004.jpg]
[Panorama: panorama004.jpg]

[Sources: file0001.jpg, file0002.jpg, file0003.jpg, file0004.jpg, file0005.jpg]
[Correspondences: correspondence005.jpg]
[Panorama: panorama005.jpg]

[Sources: yosemite1.jpg, yosemite2.jpg, yosemite3.jpg]
[Correspondences: correspondence006.jpg]
[Panorama: panorama006.jpg]

[Sources: 0.jpg, 1.jpg, 2.jpg, 3.jpg]
[Correspondences: correspondence007.jpg]
[Panorama: panorama007.jpg]

[Sources: 01.jpg, 02.jpg, 03.jpg]
[Correspondences: correspondence008.jpg]
[Panorama: panorama008.jpg]

[Sources: A.jpg, B.jpg, C.jpg]
[Correspondences: correspondence009.jpg]
[Panorama: panorama009.jpg]

[Sources: D.jpg, E.jpg, F.jpg, G.jpg]
[Correspondences: correspondence010.jpg]
[Panorama: panorama010.jpg]