This project implements a simple local feature matching algorithm. It is a simplified version of the SIFT pipeline. The algorithm is intended to work for matching different views of the same physical scene.
Below is a picture showing the result of running this algorithm on two images of the Notre Dame Cathedral. Correct matches are shown in green circles, while incorrect matches are shown in red.
The pipeline in this algorithm consists of three stages: interest point detection, local feature description, and feature matching. The sections below present the implementation details of each stage, along with results from applying the algorithm to several different sets of images.
The program uses a basic Harris corner detector. Scale invariance and keypoint orientation are not implemented, so when the algorithm is applied to images of noticeably different scales or different orientations, performance is poor. It does relatively well on images with small scale and orientation differences.
The Harris corner detection algorithm consists of two steps: computing the cornerness score, and non-maximum suppression. Cornerness score is computed using image gradients as detailed in Szeliski 4.1.1 and lecture slides. Non-maximum suppression in this code works by sliding a window of 11 x 11 pixels across the image to find the pixel with maximum cornerness score in a neighborhood, and suppressing any other points in the neighborhood.
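The two steps above can be sketched as follows. This is an illustrative implementation, not the project's actual code; the function name, parameter values (other than the 11 x 11 window), and the choice of Gaussian smoothing are assumptions.

```python
import numpy as np
from scipy import ndimage

def harris_corners(image, k=0.06, sigma=1.0, window=11, threshold=1e-4):
    """Sketch of Harris corner detection with sliding-window
    non-maximum suppression. k, sigma, and threshold are
    illustrative defaults, not the project's values."""
    # Step 1: cornerness score from image gradients (Szeliski 4.1.1)
    Iy, Ix = np.gradient(image.astype(float))
    # Entries of the second-moment matrix, smoothed with a Gaussian
    Sxx = ndimage.gaussian_filter(Ix * Ix, sigma)
    Syy = ndimage.gaussian_filter(Iy * Iy, sigma)
    Sxy = ndimage.gaussian_filter(Ix * Iy, sigma)
    # Cornerness: det(M) - k * trace(M)^2
    R = Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2
    # Step 2: non-maximum suppression with an 11 x 11 window --
    # keep a pixel only if it is the maximum of its neighborhood
    local_max = ndimage.maximum_filter(R, size=window)
    ys, xs = np.nonzero((R == local_max) & (R > threshold))
    return xs, ys
```

A quick sanity check is to run it on a synthetic image of a bright square on a dark background, whose four corners should be detected.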
The local feature descriptor in this algorithm approximates the one described in the SIFT paper. For each interest point, it considers a 16 x 16 (by default) neighborhood of the point. It divides the neighborhood into sixteen 4 x 4 cells, and uses a histogram with 8 bins to store weighted gradients in each cell. The 8 bins radially divide the 2-D plane into 8 equal parts; the value of each bin is the sum of the magnitudes of the gradients whose orientations fall into that bin. Finally, these 16 * 8 = 128 values are stored in a vector, normalized, and used as the feature descriptor for this interest point.
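A sketch of this descriptor is shown below. The function name is hypothetical, and it omits refinements from the full SIFT paper (Gaussian weighting of the patch, orientation normalization, trilinear interpolation between bins), keeping only the cell/histogram structure described above.

```python
import numpy as np

def sift_like_descriptor(image, x, y, width=16):
    """Sketch of the SIFT-like descriptor: a width x width patch split
    into sixteen 4x4 cells, each with an 8-bin orientation histogram.
    Assumes the point is far enough from the border for a full patch."""
    Iy, Ix = np.gradient(image.astype(float))
    half = width // 2
    # Gradient magnitudes and orientations over the patch
    mag = np.hypot(Ix, Iy)[y - half:y + half, x - half:x + half]
    ori = np.arctan2(Iy, Ix)[y - half:y + half, x - half:x + half]
    # Map each orientation in [-pi, pi] to one of 8 radial bins
    bins = ((ori + np.pi) / (2 * np.pi) * 8).astype(int) % 8
    desc = np.zeros((4, 4, 8))
    for cy in range(4):
        for cx in range(4):
            cell_m = mag[4 * cy:4 * cy + 4, 4 * cx:4 * cx + 4]
            cell_b = bins[4 * cy:4 * cy + 4, 4 * cx:4 * cx + 4]
            # Sum the gradient magnitudes falling into each bin
            for b in range(8):
                desc[cy, cx, b] = cell_m[cell_b == b].sum()
    v = desc.ravel()  # 16 cells * 8 bins = 128 values
    n = np.linalg.norm(v)
    return v / n if n > 0 else v
```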
Feature matching in the program employs the "ratio test". For each feature in image 1, the nearest neighbor distance ratio is computed with respect to all features in image 2. Then if the ratio is less than a threshold value (0.8), the nearest neighbor of the feature is considered a match, and included in the result.
For the Notre Dame Cathedral image pair, this algorithm finds 79 good matches and 64 bad matches (approximately 55% precision). For other image pairs, depending on the set-up of the images, the algorithm can perform well or poorly. Generally speaking, as mentioned above, image pairs with large scale or orientation differences are harder for the algorithm to match since it doesn't take these parameters into account. In contrast, image pairs that are similar in scale and orientation are easier to match.
Detailed results are listed below with comments:
House
Good performance. It's easy to see that a large majority of the matches are correct, while only a few are incorrect. This is because the two images are similar in orientation and scale.
Sleeping Beauty Castle Paris & Pantheon Paris
Poor performance. These two image pairs exemplify different cases in which the algorithm fails to give good matches. The Sleeping Beauty Castle images have very different perspectives (orientations), while the Pantheon images have very different scales. Although there are many interest points (and correspondences) in both image pairs, most of the matches are incorrect.