The objective of this project is to find local feature matches between two images. The baseline follows a simplified version of the SIFT pipeline, which consists of three steps. The first step is to implement a Harris corner detector to get candidate corner points and then keep only the local maxima among them. The second step is to build SIFT descriptors for the interest points from the first step. The final step is to find all matched point pairs by running the ratio test algorithm.
In this part, I basically follow Algorithm 4.1 in the textbook. I first convolve the image with the x and y derivatives of a Gaussian to obtain the image gradients Ix and Iy. I then compute the products of these gradients and smooth them with a larger Gaussian filter. From these smoothed products, I calculate the corner response proposed by Harris and Stephens (1988).
Before doing local non-maximum suppression, I use a threshold to filter the responses. Then I run local non-maximum suppression with a 3 by 3 sliding window to eliminate non-maximal interest points inside the window. After that, I also tried adaptive non-maximum suppression to obtain a better spatial distribution of the points.
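The detection steps above can be sketched as follows. This is Python with NumPy/SciPy rather than the MATLAB used in the project, and the sigma values, the Harris constant k, and the threshold are illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def harris_response(img, sigma_d=1.0, sigma_i=2.0, k=0.04):
    """Harris-Stephens corner response (sketch; parameter values assumed)."""
    # Image gradients via derivative-of-Gaussian filtering.
    Ix = gaussian_filter(img, sigma_d, order=(0, 1))
    Iy = gaussian_filter(img, sigma_d, order=(1, 0))
    # Products of gradients, smoothed with a larger Gaussian.
    Sxx = gaussian_filter(Ix * Ix, sigma_i)
    Syy = gaussian_filter(Iy * Iy, sigma_i)
    Sxy = gaussian_filter(Ix * Iy, sigma_i)
    # R = det(M) - k * trace(M)^2
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2

def detect_corners(img, thresh=1e-5):
    """Threshold the response, then keep 3x3 local maxima."""
    R = harris_response(img)
    local_max = (R == maximum_filter(R, size=3))
    ys, xs = np.nonzero((R > thresh) & local_max)
    return np.column_stack([xs, ys]), R

# Usage: a bright square produces responses near its four corners.
img = np.zeros((64, 64))
img[20:40, 20:40] = 1.0
pts, R = detect_corners(img)
```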
The purpose of adaptive non-maximum suppression is to mitigate the problem that interest points in some regions of the image are denser than in others. It tries to distribute the interest points evenly while still keeping the stronger ones. To do this, for each interest point we compute the largest radius within which its corner response is no more than 10% weaker than any other interest point. However, a naive implementation causes a performance issue because it requires n x n comparisons. To speed it up, we can first sort the interest points by their corner responses; then each key point only needs to be compared against key points with larger corner responses. In MATLAB, using matrix operations instead of for loops also boosts the speed.
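A sketch of this sorted variant, in Python/NumPy rather than the project's MATLAB. The 0.9 robustness factor corresponds to the 10% margin mentioned above, and the remaining per-point loop could itself be vectorized as the text suggests:

```python
import numpy as np

def anms(points, responses, num_keep, robust=0.9):
    """Adaptive non-maximum suppression (sketch).

    Sort by corner response so each point is compared only against
    stronger points, then keep the num_keep points whose suppression
    radius (distance to the nearest sufficiently stronger point) is
    largest."""
    order = np.argsort(-responses)          # strongest first
    pts = points[order].astype(float)
    resp = responses[order]
    radii = np.full(len(pts), np.inf)       # strongest point is never suppressed
    for i in range(1, len(pts)):
        # Stronger points that beat point i by the 10% margin.
        mask = robust * resp[:i] > resp[i]
        if mask.any():
            d2 = np.sum((pts[:i][mask] - pts[i]) ** 2, axis=1)
            radii[i] = np.sqrt(d2.min())
    keep = np.argsort(-radii, kind="stable")[:num_keep]
    return points[order][keep]
```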
ANMS turned out not to improve the final feature match accuracy. I think the reason is that while this method does yield a better distribution of interest points, which may help RANSAC matching, it also eliminates some strong points that carry better information than the weaker ones it preserves.
![]() |
![]() |
![]() |
All Interest Points | LNMS | ANMS (Top 1000) |
After detecting the key points, we need to build a descriptor for the local feature around each key point. A SIFT descriptor stores information about gradient magnitude and orientation. For a robust feature matching system it should also take scale and image orientation into account; in this project I did not implement scale selection or orientation normalization due to time constraints. Apart from that, I follow most of the SIFT descriptor building process described in Distinctive Image Features from Scale-Invariant Keypoints by David Lowe, including interpolation. The descriptor is created from Gaussian-weighted gradients in a 16 by 16 window centered on an interest point. The window is divided into 4 by 4 cells, and each cell has an 8-dimensional histogram indicating the distribution of gradients over 8 orientations.
![]() |
![]() |
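A simplified sketch of this descriptor construction, in Python/NumPy instead of MATLAB. Interpolation is omitted here (it is discussed below), and the Gaussian sigma is an assumption:

```python
import numpy as np

def sift_descriptor(mag, ori):
    """Build a 128-D descriptor from a 16x16 patch of gradient magnitudes
    `mag` and orientations `ori` (radians) centered on a keypoint.
    Simplified sketch: Gaussian weighting, 4x4 cells, 8 orientation bins,
    no interpolation."""
    assert mag.shape == (16, 16) and ori.shape == (16, 16)
    # Gaussian weight over the window (sigma = half the window width, assumed).
    ys, xs = np.mgrid[0:16, 0:16] - 7.5
    w = np.exp(-(xs ** 2 + ys ** 2) / (2 * 8.0 ** 2))
    wmag = mag * w
    # Quantize orientation into 8 bins over [0, 2*pi).
    bins = np.floor(ori % (2 * np.pi) / (np.pi / 4)).astype(int) % 8
    desc = np.zeros((4, 4, 8))
    for cy in range(4):
        for cx in range(4):
            cell_b = bins[cy * 4:(cy + 1) * 4, cx * 4:(cx + 1) * 4]
            cell_m = wmag[cy * 4:(cy + 1) * 4, cx * 4:(cx + 1) * 4]
            for b in range(8):
                desc[cy, cx, b] = cell_m[cell_b == b].sum()
    desc = desc.ravel()
    # Normalize, clamp at 0.2, renormalize, as in Lowe's paper.
    n = np.linalg.norm(desc)
    if n > 0:
        desc = np.minimum(desc / n, 0.2)
        desc /= np.linalg.norm(desc)
    return desc
```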
In David Lowe's paper, the interpolation contributes each gradient to the two nearest orientations and the 4 nearest bins with a weight factor of 1 - d, where d is the distance from the gradient sample to the center of the bin. In this project, the two-orientation contribution is done by resolving the gradient vector into its two nearest orientations. To implement the 4-bin contribution of one gradient, I first pad the local feature with zeros and then apply a filter whose size is twice the size of a cell for each cell, as shown in the following figure.
![]() |
The black grid is the 4 by 4 cell division of the local feature. The blue square is the sliding window applied to each cell. |
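The per-sample view of this trilinear weighting can also be written out directly. The Python sketch below is equivalent in effect to the padded sliding-window filter described above; the function name and coordinate conventions are illustrative assumptions:

```python
import numpy as np

def distribute_sample(x, y, theta, mag, desc):
    """Trilinearly distribute one gradient sample into a 4x4x8 descriptor
    (sketch). x, y are in cell units (0..4), theta is in radians. Along
    each axis the sample contributes to the two nearest bins with weight
    1 - d, where d is the distance to the bin center."""
    # Spatial bin centers sit at 0.5, 1.5, 2.5, 3.5 in cell units.
    cx, cy = x - 0.5, y - 0.5
    ob = theta % (2 * np.pi) / (np.pi / 4)   # orientation in bin units
    for dx in (0, 1):
        ix = int(np.floor(cx)) + dx
        wx = 1 - abs(cx - ix)
        if not (0 <= ix < 4) or wx <= 0:
            continue
        for dy in (0, 1):
            iy = int(np.floor(cy)) + dy
            wy = 1 - abs(cy - iy)
            if not (0 <= iy < 4) or wy <= 0:
                continue
            for do in (0, 1):
                io = (int(np.floor(ob)) + do) % 8   # orientation wraps around
                wo = 1 - abs(ob - np.floor(ob) - do)
                desc[iy, ix, io] += mag * wx * wy * wo
    return desc
```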
By implementing interpolation, I went from 77% to 83% good matches with basic ratio test matching on the Notre Dame images. For the 100 most confident matches, the accuracy went from 93% to 96%.
The ratio test is a fairly simple way to find matched features. It compares every feature in one image to those in the other by computing the distances between their descriptors. For each feature, it selects the candidate with the smallest distance, which must also be smaller than 3/4 of the second-smallest distance. The difference between these two distances is used as a confidence score to rank the best matched pairs, because the larger the difference is, the more distinctive the matched pair is. Like adaptive non-maximum suppression, this requires many comparisons, and matrix operations can again accelerate it in MATLAB. Although this algorithm works quite well, it cannot eliminate feature pairs with large geometric differences, which are actually different features.
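A vectorized sketch of the ratio test, with Python/NumPy standing in for the MATLAB matrix operations; the 3/4 ratio follows the text:

```python
import numpy as np

def ratio_test_match(desc1, desc2, ratio=0.75):
    """Match rows of desc1 to rows of desc2 with the ratio test (sketch).
    Returns (i, j, confidence) triples sorted by descending confidence."""
    # All pairwise squared Euclidean distances via |a-b|^2 = a^2 - 2ab + b^2.
    d2 = (np.sum(desc1 ** 2, axis=1)[:, None]
          - 2 * desc1 @ desc2.T
          + np.sum(desc2 ** 2, axis=1)[None, :])
    d = np.sqrt(np.maximum(d2, 0))
    nearest = np.argsort(d, axis=1)[:, :2]        # two closest candidates
    best = d[np.arange(len(d)), nearest[:, 0]]
    second = d[np.arange(len(d)), nearest[:, 1]]
    # Keep matches whose best distance beats 3/4 of the second best;
    # the gap between the two distances serves as the confidence.
    keep = best < ratio * second
    conf = second[keep] - best[keep]
    idx1 = np.nonzero(keep)[0]
    return sorted(zip(idx1, nearest[keep, 0], conf), key=lambda m: -m[2])
```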
RANSAC is a good way to eliminate outliers. However, running RANSAC alone to find the right transformation between two full point sets incurs great computational overhead. Instead, we can take advantage of the ratio test result and use RANSAC only to eliminate the outliers among those matches. As shown in the following image, each iteration I randomly choose three matched pairs and compute the affine transformation A. With A we can transform the remaining interest points in the first image; by comparing the transformed locations with the locations of the matched points in the second image, we can filter outliers with a threshold on their geometric distances. In this way, only n point comparisons need to be done per iteration. To avoid bad transformations computed from outliers, we repeat this process a number of times and keep the largest matched feature set, because a bad transformation will result in fewer matched features.
![]() |
RANSAC: Affine Transformation |
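The RANSAC loop described above can be sketched as follows (Python/NumPy; the iteration count and inlier threshold are illustrative assumptions):

```python
import numpy as np

def ransac_affine(pts1, pts2, iters=500, thresh=3.0, rng=None):
    """Filter ratio-test matches with RANSAC over an affine model (sketch).
    pts1 and pts2 are (n, 2) arrays of matched coordinates; returns the
    indices of the largest inlier set found."""
    rng = np.random.default_rng(rng)
    n = len(pts1)
    # Homogeneous coordinates [x, y, 1], so an affine map is a 3x2 matrix.
    h1 = np.column_stack([pts1, np.ones(n)])
    best_inliers = np.arange(0)
    for _ in range(iters):
        sample = rng.choice(n, size=3, replace=False)
        try:
            # Affine transform that maps the 3 sampled points exactly.
            A = np.linalg.solve(h1[sample], pts2[sample])
        except np.linalg.LinAlgError:
            continue  # degenerate (collinear) sample
        # Transform all points and threshold the geometric distance.
        err = np.linalg.norm(h1 @ A - pts2, axis=1)
        inliers = np.nonzero(err < thresh)[0]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers
```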
By implementing RANSAC, I went from 83% to 97% good matches with basic ratio test matching on the Notre Dame images. For the 100 most confident matches, the accuracy went from 96% to 100%.
![]() |
100 good matches, 0 bad matches |
![]() |
![]() |
![]() |