Humans have a very wide field of view, whereas cameras usually do not. To obtain wide-angle images, we must combine multiple photographs into a single image.
Problems can arise, however, when the images do not line up perfectly, so we cannot simply place them side by side and obtain a seamless photograph. We must transform the images, align them, and then combine them to make a panorama. The purpose of this project was to automate the process of finding interest points in the images, comparing and matching them, finding the best transformation between the images, aligning them, and finally, combining them.
The steps of creating a panorama are as follows: find the interest points, select a well-distributed subset of them, extract features at these points, match the features between images, find the transformation that aligns the matched features, and finally, combine the images. To obtain interest points, we used the Harris interest point detection algorithm.
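As a rough sketch, the Harris response map can be computed as below. This assumes a grayscale float image; the smoothing scale `sigma` and the constant `k` are common illustrative defaults, not the exact values we used.

```python
import numpy as np
from scipy import ndimage

def harris_response(gray, sigma=1.0, k=0.04):
    """Harris corner strength map for a grayscale float image."""
    # Image gradients in x and y
    Ix = ndimage.sobel(gray, axis=1)
    Iy = ndimage.sobel(gray, axis=0)
    # Entries of the second-moment matrix, smoothed over a Gaussian window
    Sxx = ndimage.gaussian_filter(Ix * Ix, sigma)
    Syy = ndimage.gaussian_filter(Iy * Iy, sigma)
    Sxy = ndimage.gaussian_filter(Ix * Iy, sigma)
    # Harris response: det(M) - k * trace(M)^2; large where both gradients are strong
    return (Sxx * Syy - Sxy * Sxy) - k * (Sxx + Syy) ** 2
```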
The Harris algorithm returns all the interest points, whereas we only need a few that are well distributed over the images. We used adaptive non-maximal suppression (ANMS) to reduce the number of interest points. This involved looping over all the interest points and, for each one, comparing its corner strength to every other point's and tracking the minimal distance to an interest point with a larger corner strength (scaled by 0.9 for robustness). Then, we kept the N interest points with the largest such suppression radii, which spreads the surviving points across the image.
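A minimal sketch of this O(n²) loop, assuming `points` is an (N, 2) array of coordinates and `strengths` an (N,) array of Harris responses:

```python
import numpy as np

def anms(points, strengths, n_keep, c_robust=0.9):
    """Adaptive non-maximal suppression: keep the n_keep points whose
    nearest significantly-stronger neighbor is farthest away."""
    n = len(points)
    radii = np.full(n, np.inf)  # the globally strongest point keeps radius = inf
    for i in range(n):
        # Points that suppress i: their strength, scaled by 0.9, still exceeds i's
        stronger = strengths * c_robust > strengths[i]
        if stronger.any():
            d = np.linalg.norm(points[stronger] - points[i], axis=1)
            radii[i] = d.min()  # minimal distance to a suppressing point
    # Keep the points with the largest suppression radii
    keep = np.argsort(radii)[::-1][:n_keep]
    return points[keep]
```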
Feature extraction involved obtaining an 8×8 patch by sub-sampling a 40×40 grayscale window centered on the interest point. This feature patch was then normalized by subtracting its mean and dividing by its standard deviation.
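In sketch form, assuming the interest point lies at least 20 pixels from the image border:

```python
import numpy as np

def extract_feature(gray, x, y):
    """Bias/gain-normalized 8x8 descriptor sub-sampled from a 40x40 window."""
    window = gray[y - 20:y + 20, x - 20:x + 20]  # 40x40 window around the point
    patch = window[::5, ::5].astype(float)       # take every 5th pixel -> 8x8
    patch -= patch.mean()                        # zero mean: invariant to bias
    return patch / patch.std()                   # unit variance: invariant to gain
```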
To match features, we calculated the distance between every pair of features across the two images. However, we could not simply take the nearest neighbor of each feature, since many of those correspondences are spurious. Instead, we kept only the features whose ratio of first-nearest-neighbor distance to second-nearest-neighbor distance was below a threshold (I used 0.2).
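A sketch of this ratio test, where `desc1` and `desc2` are (N, 8, 8) arrays of the normalized patches from the previous step:

```python
import numpy as np

def match_features(desc1, desc2, ratio=0.2):
    """Ratio-test matching: keep pairs whose 1-NN/2-NN distance ratio is low."""
    matches = []
    for i, d in enumerate(desc1):
        # Squared distance from descriptor i to every descriptor in image 2
        dists = ((desc2 - d) ** 2).sum(axis=(1, 2))
        nn = np.argsort(dists)[:2]              # the two nearest neighbors
        if dists[nn[0]] / dists[nn[1]] < ratio:  # keep only unambiguous matches
            matches.append((i, nn[0]))
    return matches
```

A low threshold like 0.2 discards a match whenever the second-best candidate is nearly as close as the best one, which is exactly the situation where a nearest-neighbor match is likely to be spurious.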
Once we have matching interest points, we must then recover a transformation that aligns the images correctly according to their corresponding interest points. The transformation was calculated by taking 4 corresponding pairs of interest points (so 8 points in total) and solving a system of 8 linear equations for the 8 unknown entries of a 3×3 transformation matrix, with the ninth entry fixed to 1.
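One way to set up and solve that system (a sketch, not necessarily the exact formulation we used) is the standard direct linear transform with the bottom-right entry pinned to 1:

```python
import numpy as np

def solve_homography(src, dst):
    """Solve for the 3x3 homography mapping 4 src points to 4 dst points.
    Fixing H[2, 2] = 1 leaves 8 unknowns for the 8 equations."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two rows of the linear system
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A), np.array(b))
    return np.append(h, 1).reshape(3, 3)
```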
Because some matches may still be incorrect, we used the RANSAC algorithm to identify and discard these false positives. RANSAC computes a large number of candidate transformations (more than 1000), each from a random sample of four matches, and for each candidate it transforms all the matched interest points and checks how many land within a distance threshold of their corresponding points in the other image. It keeps track of which transformation had the most points fall below that threshold, and uses it as the final transformation.
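A sketch of that loop, reusing `solve_homography` from above; the iteration count follows the writeup, while the 3-pixel inlier threshold is an assumed illustrative value:

```python
import numpy as np

def ransac_homography(src, dst, n_iters=1000, threshold=3.0):
    """RANSAC: fit homographies to random 4-point samples and keep the one
    with the most inliers. src and dst are (N, 2) arrays of matched points."""
    best_H, best_inliers = None, 0
    src_h = np.hstack([src, np.ones((len(src), 1))])  # homogeneous coordinates
    for _ in range(n_iters):
        idx = np.random.choice(len(src), 4, replace=False)
        try:
            H = solve_homography(src[idx], dst[idx])
        except np.linalg.LinAlgError:
            continue  # degenerate sample (e.g. collinear points)
        proj = src_h @ H.T
        proj = proj[:, :2] / proj[:, 2:3]             # back to Cartesian
        inliers = (np.linalg.norm(proj - dst, axis=1) < threshold).sum()
        if inliers > best_inliers:
            best_H, best_inliers = H, inliers
    return best_H
```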