Example of feature matching between two pictures of Notre Dame from different viewing angles.
In this project we created a feature detection and matching pipeline based on Harris Corner Detection, SIFT local feature description, and the Nearest Neighbor Ratio Test for feature matching. Feature detection and matching can be broken down into these three steps, each of which is covered in finer detail below:
Example of Feature Detection using Harris Corner Detection.
For this part of the algorithm I implemented the Harris Corner Detector. The first step was to blur the image with a Gaussian filter. I then created the x-derivative and y-derivative of the image, Ix and Iy respectively, by convolving with a Sobel filter in the appropriate direction. After filtering these two derivatives with a Gaussian filter, I calculated the second moment matrix A, a 2x2 matrix containing the locally averaged values of Ix^2, Iy^2, and Ix*Iy. The next step in the algorithm was to calculate the 'cornerness' score, det(A) - a*trace(A)^2, where a is a constant equal to .05 in my program; accepted values for a range from .04 to .06 for effective corner detectors. The final steps were thresholding and non-maximum suppression. For thresholding, I only consider pixels whose cornerness is greater than a constant threshold value; I will go into more depth on how that value affects my results later on. For non-maximum suppression, I used Matlab's colfilt function with the max function to slide a window across the image, find the local maximum values, and keep only those local maxima. The surviving values made up my detected features. One other design choice I made was to suppress/ignore corners within feature_width of the edge of the picture.
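To make the steps concrete, here is a minimal Matlab sketch of this detector. The filter sizes, sigmas, and the .01 cornerness threshold are illustrative stand-ins rather than my exact tuned values, and the function name harris_corners is just for this example.

```matlab
% Minimal Harris corner sketch, assuming the Image Processing Toolbox
% (fspecial, imfilter, colfilt) and an image with values in [0, 1].
function [x, y] = harris_corners(im, feature_width)
    gauss  = fspecial('gaussian', 5, 1);  % small Gaussian for smoothing
    sobelx = fspecial('sobel')';          % x-derivative (transposed Sobel)
    sobely = fspecial('sobel');           % y-derivative

    im = imfilter(im, gauss);             % pre-blur
    Ix = imfilter(im, sobelx);
    Iy = imfilter(im, sobely);

    % Entries of the second moment matrix, each smoothed with a Gaussian.
    Ixx = imfilter(Ix .* Ix, gauss);
    Iyy = imfilter(Iy .* Iy, gauss);
    Ixy = imfilter(Ix .* Iy, gauss);

    % Cornerness: det(A) - a*trace(A)^2 with a = .05.
    a = .05;
    R = (Ixx .* Iyy - Ixy .^ 2) - a * (Ixx + Iyy) .^ 2;

    % Threshold (illustrative value), then non-maximum suppression with a
    % sliding max window: keep only pixels that equal their local maximum.
    R(R < .01) = 0;
    Rmax = colfilt(R, [3 3], 'sliding', @max);
    R(R < Rmax) = 0;

    % Suppress corners within feature_width of the image border.
    R(1:feature_width, :) = 0;  R(end-feature_width+1:end, :) = 0;
    R(:, 1:feature_width) = 0;  R(:, end-feature_width+1:end) = 0;

    [y, x] = find(R > 0);
end
```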
For this part of the algorithm I implemented a baseline SIFT feature descriptor, which is supposed to be scale-invariant and rotation-invariant. I first divided each feature into a set of 4x4 = 16 cells based on feature_width, with an 8-bin orientation histogram in each cell. To calculate these histograms, I first convolved with a Gaussian derivative filter in the x and y directions to get Ix and Iy (I found the Gaussian derivative filter gave me better results than the Sobel filter in this case, but the opposite was true for feature detection). Each pixel then voted for a bin based on the signs of its Ix and Iy values and of their sum: there were bins for +x, -x, +y, -y, and one for each of the four quadrants of the sum. The gradient magnitudes were added to the corresponding bins. At the end of these calculations, the resultant 128-dimensional vector was normalized, all values were clamped to at most .2, and the vector was re-normalized; this limits the effect of brightness differences between images. These feature descriptors were then passed along to the next part of the algorithm.
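Below is a rough Matlab sketch of this descriptor stage. For simplicity it bins each pixel's gradient into one of 8 orientations by angle, as a stand-in for my exact sign-based voting scheme, and it assumes feature_width is divisible by 4 and that every point lies at least feature_width/2 pixels from the border (which the detector above guarantees).

```matlab
% Simplified 4x4-cell, 8-bin descriptor sketch with the
% normalize / clamp at .2 / re-normalize step described above.
function features = sift_like_descriptors(im, x, y, feature_width)
    % Gaussian derivative filters, built by convolving a small Gaussian
    % with a central-difference kernel.
    g  = fspecial('gaussian', 5, 1);
    Ix = imfilter(im, conv2(g, [1 0 -1],  'same'));
    Iy = imfilter(im, conv2(g, [1 0 -1]', 'same'));

    mag = hypot(Ix, Iy);                              % gradient magnitude
    bin = mod(round(atan2(Iy, Ix) / (pi/4)), 8) + 1;  % 8 orientation bins

    half = feature_width / 2;
    cw   = feature_width / 4;      % cell width in the 4x4 grid
    features = zeros(numel(x), 128);
    for i = 1:numel(x)
        r0 = round(y(i)) - half;
        c0 = round(x(i)) - half;
        f = zeros(4, 4, 8);
        for rc = 1:4
            for cc = 1:4
                rows = r0 + (rc-1)*cw + (1:cw);
                cols = c0 + (cc-1)*cw + (1:cw);
                for b = 1:8        % accumulate magnitudes per orientation bin
                    f(rc, cc, b) = sum(sum(mag(rows, cols) .* (bin(rows, cols) == b)));
                end
            end
        end
        v = f(:)' / (norm(f(:)) + eps);   % normalize
        v = min(v, .2);                   % clamp to limit brightness effects
        features(i, :) = v / norm(v);     % re-normalize
    end
end
```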
The final part of the algorithm was to compare features and keep matches based on the Nearest Neighbor Ratio Test. To implement this, I first calculated the distance between each pair of features (based on their 128-dimensional vectors) and kept track of each feature's nearest and second-nearest neighbors. If the ratio of the nearest-neighbor distance to the second-nearest-neighbor distance was less than a certain threshold, the pair was considered a good match; otherwise it was thrown out. The smaller this ratio, the better the match was considered to be, so the confidence of each match was the inverse of this value. For my project I ended up using around .7 as my threshold, as this gave me the best results.
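A minimal Matlab sketch of the ratio test follows. It assumes the descriptors come in as n-by-128 matrices and uses pdist2 (from the Statistics and Machine Learning Toolbox) for the pairwise distances; the function name is hypothetical.

```matlab
% Nearest Neighbor Ratio Test sketch with a .7 ratio threshold.
function [matches, confidences] = ratio_test_match(features1, features2)
    dists = pdist2(features1, features2);   % pairwise Euclidean distances
    [sorted, idx] = sort(dists, 2);         % neighbors sorted per feature

    ratios = sorted(:, 1) ./ sorted(:, 2);  % nearest / second-nearest
    keep = ratios < .7;                     % ratio-test threshold

    matches = [find(keep), idx(keep, 1)];   % index pairs (image 1, image 2)
    confidences = 1 ./ ratios(keep);        % lower ratio = higher confidence

    % Sort so the most confident matches come first.
    [confidences, order] = sort(confidences, 'descend');
    matches = matches(order, :);
end
```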
Most of the improvements I noticed in this project came through changing parameters and threshold values. For instance, I already mentioned that using a Gaussian derivative filter versus a Sobel filter changed my results, and changing the sigma values and sizes of my filters did too. The biggest changes came from the threshold values. The table below shows an increasing Feature Detection threshold (from left to right, row by row). This produced a better ratio of good to bad matches, though some good matches were also lost: I went from 56% good results to 74%.
Matching results at increasing Feature Detection thresholds (left to right, row by row).
The next value I varied was the Feature Matching threshold. I started at .8 and lowered it to .6. Combined with a high Feature Detection threshold, I went from 74% good matches to 95% good matches, though I was limited to approximately 20 features due to the high Feature Detection threshold. You can view the results below in the table, in order of decreasing Feature Matching threshold (from left to right, row by row).
Matching results at decreasing Feature Matching thresholds (left to right, row by row).
Not pictured in the table are tests of lower Feature Matching thresholds with higher Feature Detection thresholds. While this did improve my results from earlier, consistently giving me closer to 75%-80% good matches, it also resulted in a lot of matches getting thrown out. Around a ratio of .65-.75 there were a lot of ambiguous matches: some were good, and some were bad. I think additional tests for feature matching, such as some sort of geometric consistency check, would have helped. Below is a table of other results. Each picture shows one feature matched to another, namely the matched pair with the highest confidence. It shows that my algorithm can find and match features on multiple pictures, and not just Notre Dame.
Highest-confidence feature match for each of several other image pairs.
All of my algorithms were baselines: a baseline Harris Corner Detector, a baseline SIFT descriptor, and a baseline Nearest Neighbor Ratio Test matching algorithm. With more time I would have liked to improve them. Even so, the success rate of these baseline algorithms across multiple examples shows the potential for good, accurate feature matching. Overall, I think my feature matching implementation does a good job, and the data I have collected supports this.