CS129 / Project 6 / Automated Panorama Stitching

In this project, part of an automatic panorama stitching pipeline is implemented. Automatic panorama stitching takes a set of photos with overlapping fields of view and produces a single panorama from the input images.

In general, there are four stages in the automatic panorama stitching pipeline:

  1. detect and optionally filter interest points, that is, points that are relatively distinctive in the two input images
  2. extract a feature descriptor for each interest point and establish correspondences between the features of the two input images
  3. recover the homography, that is, a matrix that transforms one of the images to align with the other
  4. warp one image using the recovered homography and composite the two inputs

In this project, implementations of interest point detection, image warping, and basic composition have already been given. This write-up discusses:

  1. interest point filtering using adaptive non-maximal suppression
  2. rotation-invariant feature descriptor extraction and correspondence discovery
  3. homography reconstruction using RANSAC
  4. min-cut seam finding and Poisson blending on top of the basic composition to produce more seamless results

Visualization of the interest point correspondence

Filtering interest points using adaptive non-maximal suppression

The given implementation uses the Harris corner detector. After obtaining the interest points computed by the Harris corner detector, we would like to limit the number of interest points passed down to the next stage, since the cost of matching is superlinear in the number of interest points. A good criterion for filtering is to pick a subset of the interest points that is spatially well distributed over the image. To achieve this, the adaptive non-maximal suppression (ANMS) strategy is used. The intuition behind this strategy is to favor points that are stronger than all of their neighbors over a larger area. The implementation iterates through each point v_i, computes the distance to each neighbor v_j whose corner strength satisfies 0.9 * v_j > v_i, and records the minimum such distance (the suppression radius) for each point. The points are then ranked by this minimum distance, and if a particular pair of inputs produces more than 1200 interest points, only the top 1200 are kept.
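The ANMS step described above can be sketched as follows. This is a minimal NumPy version written for illustration (the function name and array layout are my own, not the project's code); it ranks by squared distance, which gives the same ordering as ranking by distance.

```python
import numpy as np

def anms(points, strengths, max_points=1200, c_robust=0.9):
    """Adaptive non-maximal suppression (illustrative sketch).

    points: (N, 2) array of interest point locations.
    strengths: (N,) Harris corner responses.
    Returns indices of up to max_points retained points, ordered by
    decreasing suppression radius.
    """
    n = len(points)
    radii = np.full(n, np.inf)
    for i in range(n):
        # Neighbors that are sufficiently stronger suppress point i.
        stronger = c_robust * strengths > strengths[i]
        if np.any(stronger):
            # Squared distance to the nearest sufficiently stronger point.
            d2 = np.sum((points[stronger] - points[i]) ** 2, axis=1)
            radii[i] = d2.min()
    order = np.argsort(-radii)  # largest suppression radius first
    return order[:max_points]
```

The strongest point in the image is never suppressed (its radius is infinite), so it always survives the cut.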

Extract rotation-invariant feature descriptors and establish correspondence

After obtaining a list of interest points, the next step is to create a feature descriptor for each point that describes the local image structure around it. In this project, the feature descriptors are patches centered at each interest point. The patches are obtained by sampling pixels at a lower frequency, and are therefore less sensitive to the exact location of the features. They are also oriented along the dominant gradient, and are thus rotation-invariant.

The algorithm loops through all the interest points and, for each of them, extracts a patch of size 40 x 40. Gradients in eight directions are then computed for that patch, and the patch is rotated so that the direction of the dominant (largest) gradient always points upward. This way, the feature descriptor is insensitive to rotation. The examples below show that with the rotation-invariant feature descriptor, the algorithm is able to establish correspondences between points even when one of the images is rotated. Without taking the orientation into account, the correspondence algorithm that follows finds only a couple of corresponding points, most of which are wrong.

After adjusting the orientation of the patches, they are iteratively filtered and downsampled twice into 10 x 10 patches. Finally, they are resized into 8 x 8 patches, normalized, and returned as the descriptors.
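The descriptor pipeline above can be sketched as below. This is an approximation for illustration, not the project's code: the helper name is my own, the eight-direction gradient is implemented as an eight-bin orientation histogram, and the final 10 x 10 to 8 x 8 resize is replaced by a crude center crop.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, rotate

def extract_descriptor(image, x, y, patch_size=40):
    """Rotation-invariant patch descriptor (illustrative sketch)."""
    half = patch_size // 2
    patch = image[y - half:y + half, x - half:x + half].astype(float)

    # Dominant gradient orientation, quantized into eight directions.
    gy, gx = np.gradient(patch)
    angles = np.arctan2(gy, gx)
    mags = np.hypot(gx, gy)
    bins = np.floor(((angles + np.pi) / (2 * np.pi)) * 8).astype(int) % 8
    hist = np.bincount(bins.ravel(), weights=mags.ravel(), minlength=8)
    dominant = np.argmax(hist) * (360.0 / 8)

    # Rotate so the dominant gradient direction points upward.
    patch = rotate(patch, angle=dominant - 90.0, reshape=False, mode='nearest')

    # Blur and downsample twice: 40 -> 20 -> 10.
    for _ in range(2):
        patch = gaussian_filter(patch, sigma=1.0)[::2, ::2]

    # Stand-in for the 10 -> 8 resize: crop the central 8 x 8 region.
    patch = patch[1:9, 1:9]

    # Normalize to zero mean and unit variance.
    return (patch - patch.mean()) / (patch.std() + 1e-8)
```

Normalizing the patch also makes the descriptor insensitive to affine changes in brightness between the two images.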

The final step in this stage is to establish correspondences between the features of the two images. For each feature in image A, the distance to every feature in image B is computed, along with the ratio between the distance to the best match and the distance to the second-best match in B (the 2-NN ratio). The interest points from image A with a 2-NN ratio < 0.6 are kept, and returned together with their corresponding points from image B.
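A minimal sketch of the 2-NN ratio test (function name and array layout are my own; it assumes each image has at least two descriptors):

```python
import numpy as np

def match_features(desc_a, desc_b, ratio=0.6):
    """2-NN ratio test matching (illustrative sketch).

    desc_a: (Na, D) flattened descriptors from image A.
    desc_b: (Nb, D) flattened descriptors from image B.
    Returns (indices_a, indices_b) of the accepted correspondences.
    """
    # Pairwise squared Euclidean distances, shape (Na, Nb).
    d2 = np.sum((desc_a[:, None, :] - desc_b[None, :, :]) ** 2, axis=2)
    nearest = np.argsort(d2, axis=1)
    best, second = nearest[:, 0], nearest[:, 1]
    rows = np.arange(len(desc_a))
    # Ratio of best to second-best distance; small epsilon avoids 0/0.
    ratios = np.sqrt(d2[rows, best] / (d2[rows, second] + 1e-12))
    keep = ratios < ratio
    return np.nonzero(keep)[0], best[keep]
```

A low ratio means the best match is much closer than any alternative, which is a strong sign the match is not an accident of cluttered feature space.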

Correspondence without rotation invariance
Correspondence with rotation invariance

Homography reconstruction

The homography that maps the first image into the space of the second image is computed by solving a system of equations, as documented in the provided reference on projective mapping. Because many of the corresponding interest points may still be false positives, the RANSAC algorithm is used. The implementation runs 1000 iterations; in each, it selects 4 pairs of points at random, generates a homography from them, and counts the number of inliers (points in image A that, after being transformed by the computed homography, land on their corresponding points in B with a squared-difference error smaller than 0.5). Finally, the homography with the highest number of inliers is picked and returned for image warping.
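The two pieces described above, a direct linear transform (DLT) fit from 4 point pairs and the RANSAC loop around it, can be sketched as follows. This is an illustrative version under my own names and conventions, not the project's code; in particular it does not guard against degenerate (collinear) samples, which simply produce poor candidates that lose the inlier vote.

```python
import numpy as np

def fit_homography(pts_a, pts_b):
    """Direct linear transform: find H such that pts_b ~ H * pts_a."""
    rows = []
    for (x, y), (u, v) in zip(pts_a, pts_b):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)

def ransac_homography(pts_a, pts_b, iters=1000, thresh=0.5, rng=None):
    """RANSAC: sample 4 pairs per iteration, keep H with the most inliers."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n = len(pts_a)
    homog_a = np.hstack([pts_a, np.ones((n, 1))])
    best_h, best_count = None, -1
    for _ in range(iters):
        idx = rng.choice(n, size=4, replace=False)
        H = fit_homography(pts_a[idx], pts_b[idx])
        proj = homog_a @ H.T
        with np.errstate(divide='ignore', invalid='ignore'):
            proj = proj[:, :2] / proj[:, 2:3]       # dehomogenize
            err = np.sum((proj - pts_b) ** 2, axis=1)
        count = int(np.sum(err < thresh))
        if count > best_count:
            best_h, best_count = H, count
    return best_h, best_count
```

In practice one would refit the homography on all inliers of the winning candidate; the sketch stops at the voting step described in the text.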

Composition Results

(The input images can be found here.)
Stitched panorama

Applying seam finding and Poisson blending in composition

To make the composition more seamless, seam finding is applied when stitching each pair of images: it finds the boundary within the overlapping area of the two images that minimizes the squared difference between the two source images. Poisson blending is then applied to further smooth the area near the boundary.
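As a simplified stand-in for the min-cut seam finding used here, a minimum-cost top-to-bottom seam over the squared-difference surface can be found with dynamic programming (the same recurrence used in seam carving). This sketch is my own illustration, not the project's implementation, and it handles only a vertical seam:

```python
import numpy as np

def vertical_seam(cost):
    """Minimum-cost top-to-bottom seam via dynamic programming (sketch).

    cost: (H, W) per-pixel squared difference between the two images in
    their overlap region. Returns one column index per row; pixels left of
    the seam are taken from one image, pixels right of it from the other.
    """
    h, w = cost.shape
    dp = cost.astype(float).copy()
    for r in range(1, h):
        for c in range(w):
            lo, hi = max(c - 1, 0), min(c + 2, w)
            # Extend the cheapest of the three seams ending above (r, c).
            dp[r, c] += dp[r - 1, lo:hi].min()
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(dp[-1]))
    for r in range(h - 2, -1, -1):
        c = seam[r + 1]
        lo, hi = max(c - 1, 0), min(c + 2, w)
        seam[r] = lo + int(np.argmin(dp[r, lo:hi]))
    return seam
```

A full min cut over the overlap graph allows more general boundaries than a single monotone seam, but the objective is the same: route the boundary through pixels where the two images already agree.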

Enhanced image composition

Basic composition
Basic + seam finding
Basic + seam finding + Poisson blending