CS 143 / Project 2 / Local Feature Matching

Notre Dame cathedral vs. ground truth data

My local feature matcher performs like a boss. On the given Notre Dame cathedral images, it finds 50 correct matches and only 3 incorrect matches (50/53 ≈ 94%, dayyummm).

Interest point detection

I followed the textbook/lecture slides to a T when implementing interest point detection. The only possible deviation is my implementation of non-maxima suppression: I run a sliding window over every pixel of the image, and keep an interest point only if its cornerness is the maximum within the window; otherwise, I throw it out. I found that smaller windows worked well, and settled on a 5 by 5 window.
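The detection pipeline described above can be sketched roughly as follows. This is a Python/NumPy translation, not my actual MATLAB code; the Sobel derivatives, Gaussian sigma, and Harris alpha = 0.04 are assumed standard choices from the textbook, not values confirmed in this writeup.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, sobel

def harris_cornerness(image, sigma=1.0, alpha=0.04):
    """Harris cornerness map: det(M) - alpha * trace(M)^2."""
    ix = sobel(image.astype(float), axis=1)   # x derivative
    iy = sobel(image.astype(float), axis=0)   # y derivative
    # Entries of the second-moment matrix M, smoothed with a Gaussian
    ixx = gaussian_filter(ix * ix, sigma)
    iyy = gaussian_filter(iy * iy, sigma)
    ixy = gaussian_filter(ix * iy, sigma)
    return ixx * iyy - ixy ** 2 - alpha * (ixx + iyy) ** 2

def non_max_suppress(corners, rad=2):
    """Keep a score only if it is the max in its (2*rad+1)^2 window."""
    local_max = maximum_filter(corners, size=2 * rad + 1)
    return np.where(corners == local_max, corners, 0.0)
```

With rad = 2 this is the same 5 by 5 suppression window described above.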

In particular, the smaller suppression window lets each section of the circle in the center of the facade survive as a separate interest point, so each section is matched independently.

SIFT Descriptor

I follow the basic SIFT descriptor algorithm for calculating features. I use the gradient function to retrieve the x and y derivatives of the image, then loop over all interest points detected in part 1. For each interest point, I take the window of size feature_width around it and divide it into cells. For each cell, I iterate through the cell's x and y derivative values, calculating the gradient's direction and magnitude at each pixel. I bucket the magnitudes into a histogram based on the gradient direction.

At the end, after calculating all the features, I raise each element to the power 0.6, which damps large gradient magnitudes relative to small ones.
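The descriptor steps above can be sketched like this (again a Python/NumPy sketch, not my MATLAB code; the 4x4 cell grid and 8 orientation bins are the standard SIFT choices and are assumptions here, as is the L2 normalization before the power-0.6 step):

```python
import numpy as np

def sift_descriptor(ix, iy, r, c, feature_width=16, n_bins=8):
    """Build one SIFT-like descriptor at interest point (r, c).

    ix, iy: x and y derivatives of the image (e.g. from np.gradient).
    The feature_width window is split into a 4x4 grid of cells; each
    cell contributes an n_bins orientation histogram weighted by
    gradient magnitude.
    """
    half = feature_width // 2
    cell = feature_width // 4
    wx = ix[r - half:r + half, c - half:c + half]
    wy = iy[r - half:r + half, c - half:c + half]
    mag = np.hypot(wx, wy)                       # gradient magnitude
    ang = np.arctan2(wy, wx)                     # direction in (-pi, pi]
    bins = ((ang + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    desc = np.zeros((4, 4, n_bins))
    for i in range(feature_width):
        for j in range(feature_width):
            desc[i // cell, j // cell, bins[i, j]] += mag[i, j]
    desc = desc.ravel()
    desc /= np.linalg.norm(desc) + 1e-12         # normalize
    return desc ** 0.6                           # the power trick above
```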

Feature matching

For feature matching, I simply use the ratio test (equation 4.18 in section 4.1.3 of Szeliski): the distance to a feature's nearest neighbor divided by the distance to its second-nearest neighbor. The confidence that I return is 1 minus that ratio.

My nonmaxima suppression

Here's what my nonmaxima suppression looks like.


% Non-maxima suppression with a 5x5 window (rad = 2).
% cornersM is assumed zero-padded by rad pixels on each side, so
% cornersM(r+rad, c+rad) is the center of each 5x5 neighborhood.
rad = 2;
scores = cornersM;   % compare against the original scores, not
                     % values already zeroed earlier in the scan
for r = 1:dim(1)
    for c = 1:dim(2)
        sub = scores(r:r+2*rad, c:c+2*rad);
        if scores(r+rad,c+rad) < max(sub(:))
            cornersM(r+rad,c+rad) = 0.0;   % not the local maximum
        end
    end
end

Results in a table

I tried my feature matcher on a few images with varying results.

The image of Sacre Coeur worked very well. The images were from similar perspectives, so many of the interest points and associated gradients would be similar. Almost all the matches seem to be correct.

In contrast, for the Statue of Liberty, I used two images from very different perspectives. Only one feature was matched: the top of Lady Liberty's torch. This makes sense, as the torch would have roughly the same SIFT descriptor regardless of where around the base you're looking from. While my feature matcher found only one match, it didn't make any mistakes (100% accuracy, what what).

With the Pantheon in Paris, I again got great results. The distinctive interest points on the pediment of the building provided lots of easy, correct matches. Additionally, both images contained a view of a building to the left of the Pantheon. The feature matcher was able to match elements of this building as well, despite the building having few corners / interest points.