CS129 Project 6: Automatic Panorama Stitching

Reese Kuppig (rkuppig)

To create photo panoramas with a standard camera, it is necessary to capture multiple images with overlapping regions. To composite these images, key points in the overlap region are identified and aligned, which typically requires stretching and skewing the images to account for changes in perspective. This algorithm accomplishes these steps automatically, first identifying interest points and then aligning those points with a 2D projective transformation.


Algorithm

The algorithm consists of four distinct operations, though most of the emphasis here will be on the first two.

First, interest points are identified in each of the two images, and correspondences between points are deduced. The final sets of corresponding interest points will serve as the basis for extracting the transformation matrix that maps points in one image to those in the other.

Interest Point Correspondence Extraction
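As a rough illustration of this stage, the sketch below pairs a Harris corner detector with a nearest-neighbor ratio test for matching descriptors. The function names, thresholds, and the ratio test itself are illustrative choices here, not necessarily those of the original implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def harris_interest_points(img, sigma=1.0, k=0.04, n_points=500):
    """Harris corner response with non-maximum suppression."""
    Iy, Ix = np.gradient(img.astype(float))
    # Gaussian-weighted structure tensor entries
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    # Corner response: det(M) - k * trace(M)^2
    R = (Sxx * Syy - Sxy ** 2) - k * (Sxx + Syy) ** 2
    # Keep local maxima only, then the n_points strongest responses
    peaks = (R == maximum_filter(R, size=9))
    ys, xs = np.nonzero(peaks)
    order = np.argsort(R[ys, xs])[::-1][:n_points]
    return np.stack([ys[order], xs[order]], axis=1)

def match_descriptors(desc1, desc2, ratio=0.7):
    """Lowe-style ratio test: accept a match only when the nearest
    neighbor is clearly better than the second nearest."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        first, second = np.argsort(dists)[:2]
        if dists[first] < ratio * dists[second]:
            matches.append((i, first))
    return matches
```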

Second, the algorithm attempts to recover the transformation matrix that maps points in the second image to those in the first image, so that the two images may be properly aligned. The RANSAC algorithm is used to achieve this goal: over thousands of iterations, RANSAC randomly samples four correspondence pairs and attempts to recover the transformation matrix from those four pairs, computing the warp that carries the points in one image onto those in the other. Each candidate transformation matrix is then tested on all of the correspondence pairs, and the number of inliers, points that map correctly within a half-pixel tolerance, is counted. At the end of RANSAC, the matrix with the most inliers is returned as the best estimate of the transformation matrix.

RANSAC and Transformation Matrix Reconstruction
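A condensed sketch of this step, under assumed names, is shown below: homography_from_pairs implements the standard direct linear transform (DLT) for recovering a 3x3 projective transformation, and ransac_homography runs the sampling loop described above. Only the four-pair sampling and the half-pixel inlier tolerance come from the text; the iteration count is a placeholder.

```python
import numpy as np

def homography_from_pairs(src, dst):
    """Direct linear transform: estimate the 3x3 homography H with
    dst ~ H @ src from four (or more) point pairs; rows are [x, y]."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The solution is the right singular vector of A with the
    # smallest singular value
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def ransac_homography(src, dst, iters=5000, tol=0.5):
    """Sample four pairs per iteration and keep the model with the most
    inliers under the half-pixel reprojection tolerance from the text."""
    best_H, best_count = None, 0
    n = len(src)
    ones = np.ones((n, 1))
    for _ in range(iters):
        idx = np.random.choice(n, 4, replace=False)
        H = homography_from_pairs(src[idx], dst[idx])
        # Project every source point and measure reprojection error
        proj = np.hstack([src, ones]) @ H.T
        proj = proj[:, :2] / proj[:, 2:3]
        count = np.sum(np.linalg.norm(proj - dst, axis=1) < tol)
        if count > best_count:
            best_H, best_count = H, count
    return best_H
```

A common refinement, not shown here, is to refit the winning model to all of its inliers with the same least-squares DLT.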

In the third and fourth steps, the second image is warped using the recovered transformation matrix and composited with the first image by aligning the correspondence pairs and blending the images in the overlap region.
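The warping and compositing steps might look something like the following sketch, which uses inverse mapping to avoid holes in the warped image and a distance-transform feathering weight for the blend. The feathering scheme is one common choice; the text does not specify which blending method was actually used.

```python
import numpy as np
from scipy.ndimage import map_coordinates, distance_transform_edt

def warp_image(img, H, out_shape):
    """Inverse-map every output pixel through H^-1 and bilinearly sample
    the source image, which avoids the holes of forward warping."""
    Hinv = np.linalg.inv(H)
    ys, xs = np.indices(out_shape)
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = Hinv @ pts
    sx, sy = src[0] / src[2], src[1] / src[2]
    warped = map_coordinates(img, [sy, sx], order=1, cval=0.0)
    return warped.reshape(out_shape)

def feather_blend(im1, im2, mask1, mask2):
    """Feathered blend: each image's weight decays toward its own
    boundary, so the seam in the overlap region fades out smoothly."""
    w1 = distance_transform_edt(mask1)
    w2 = distance_transform_edt(mask2)
    total = w1 + w2
    total[total == 0] = 1.0  # avoid division by zero outside coverage
    return (im1 * w1 + im2 * w2) / total
```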

Rotation Invariant Feature Descriptors

In the baseline algorithm, the region around each interest point is characterized by an 8x8 subsampled image of the region. However, between images it is likely that the region around a given interest point will not maintain the same orientation. To counteract this problem, the method for extracting the 8x8 image features can be modified to create rotation invariance. Each initial patch around an interest point is blurred and filtered with a Sobel filter in both dimensions, yielding horizontal and vertical gradient response images. Taking the arctangent of the ratio of vertical to horizontal gradient responses at each pixel yields a gradient angle, and averaging the gradient angles of all pixels in the patch gives the overall patch gradient direction. The sample region is then rotated by this angle, and a patch oriented to the local gradient is extracted. By this method, patches extracted at the same interest point in different images should appear more similar, regardless of global rotation, because each patch will be oriented to the same local gradient. The scenes tested in the results section exhibited relatively small rotations between images, so image features performed well even without the added rotation invariance.
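The sketch below follows this recipe, with one common variation: it takes the angle of the summed gradient vectors rather than averaging the raw per-pixel angles, which sidesteps wrap-around artifacts. The window size, smoothing amount, and normalization are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel, rotate

def oriented_descriptor(img, y, x, window=40, out_size=8):
    """8x8 descriptor aligned to the dominant local gradient direction.
    Assumes (y, x) lies at least window/2 pixels from the image border."""
    half = window // 2
    region = img[y - half:y + half, x - half:x + half].astype(float)
    smooth = gaussian_filter(region, 2.0)
    gx = sobel(smooth, axis=1)  # horizontal gradient response
    gy = sobel(smooth, axis=0)  # vertical gradient response
    # Angle of the summed gradient vectors (avoids the wrap-around
    # issues of averaging raw per-pixel angles)
    angle = np.degrees(np.arctan2(gy.sum(), gx.sum()))
    # Rotate so the dominant gradient is axis-aligned, crop the center,
    # and subsample down to out_size x out_size
    aligned = rotate(smooth, angle, reshape=False, order=1)
    patch = aligned[half - 16:half + 16, half - 16:half + 16]
    step = patch.shape[0] // out_size
    desc = patch[::step, ::step][:out_size, :out_size]
    # Bias/gain normalization (an assumption; common for patch features)
    return (desc - desc.mean()) / (desc.std() + 1e-8)
```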

[Left: normal.jpg (baseline features) | Right: panorama004.jpg (rotation-invariant features)]

This scene was especially difficult to align properly. The left panorama was formed using the baseline features, while the right panorama was created with the rotation-invariant features. Here, the close perspective of the scene produced noticeable rotations between images, so the benefit of the rotation-invariant features is more apparent.


Results

The results of the automatic panorama generation are displayed below. The individual images are shown first, followed by a sample of the correspondence points and then the final assembled panorama. The algorithm performed best for landscape images taken from a great distance, as distortion between images was low and 3D perspective effects were negligible. Due to the limitations of the 2D projective transformation, scenes that exhibited strong perspective changes between images fared much worse. Also, because the only usable correspondence points come from the overlap region, misalignments grow toward the outer edges of the individual images: small errors that are acceptable within the overlap region are amplified with distance from it.

[Sources: source001_01.jpg, source001_02.jpg]
[Correspondences: correspondence001.jpg]
[Panorama: panorama001.jpg]

[Sources: source002_01.jpg, source002_02.jpg]
[Correspondences: correspondence002.jpg]
[Panorama: panorama002.jpg]

[Sources: panorama02_01.jpg, panorama02_02.jpg]
[Correspondences: correspondence003.jpg]
[Panorama: panorama003.jpg]

[Sources: panorama03_03.jpg, panorama03_04.jpg, panorama03_07.jpg]
[Correspondences: correspondence004.jpg]
[Panorama: panorama004.jpg]

[Sources: file0001.jpg, file0002.jpg, file0003.jpg, file0004.jpg, file0005.jpg]
[Correspondences: correspondence005.jpg]
[Panorama: panorama005.jpg]

[Sources: yosemite1.jpg, yosemite2.jpg, yosemite3.jpg]
[Correspondences: correspondence006.jpg]
[Panorama: panorama006.jpg]

[Sources: 0.jpg, 1.jpg, 2.jpg, 3.jpg]
[Correspondences: correspondence007.jpg]
[Panorama: panorama007.jpg]

[Sources: 01.jpg, 02.jpg, 03.jpg]
[Correspondences: correspondence008.jpg]
[Panorama: panorama008.jpg]

[Sources: A.jpg, B.jpg, C.jpg]
[Correspondences: correspondence009.jpg]
[Panorama: panorama009.jpg]

[Sources: D.jpg, E.jpg, F.jpg, G.jpg]
[Correspondences: correspondence010.jpg]
[Panorama: panorama010.jpg]