CS 143 / Project 2 / Local Feature Matching

At its current settings, my SIFT pipeline implementation gets 72 correct matches and 6 incorrect matches.

The goal of the project was to create a local feature matcher by implementing 3 key parts of a SIFT pipeline: feature detection, feature description, and feature matching. The algorithms for each part, respectively, were: a Harris corner detector, a 128-dimensional SIFT descriptor, and NNDR (nearest neighbor distance ratio test).

Interest Point Detection

The Harris corner detector was used to find interest points. This involved first blurring the input image, finding the blurred image's gradients in the horizontal and vertical directions, and then building the components of the har equation from the two gradient images. Each component of the har equation was blurred with a gaussian before being used in any calculations. This was an important step, and the cutoff frequency of the gaussian (the sigma argument in matlab for fspecial), is a crucial parameter in the sift pipeline. If it is set to anything above 1, then the bwconncomp() threshhold does not detect as many distinct connected regions (regions that should not be connected are blobbed together). This equates to fewer overall corner detections, as only the maximum of each connected region is considered an interest point; fewer connected regions directly translates to fewer maxima, and thus fewer interest points. After har was calculated, the har matrix was put through a threshold, and then bwconncomp was ultimately used to find the connected regions of the image. A maxima was found for each region, and the indices of each maxima were put into the x and y lists that were then output to the next step in the pipline. The maxima represented the point of most interest in the connected region (the highest value in har for the region).

At cutoff = 2 for the gaussian that was applied to the har components, the white blobs (connected regions) are very large and imprecise.

At cutoff = 1 for the gaussian that was applied to the har components, the white blobs (connected regions) are much smaller and more precise. This results in more interest points detected.

Local Features Description

The SIFT feature descriptor first takes the input image and makes 8 versions of it, each representing a different gradient direction magnitude. Every detected interest point near the image boundary (I chose a value of less than 10px from the edge) is suppressed at this stage, as many of the points were deemed spurious artifacts from the initial gaussian blur in the interest point detection step. Then, for each interest point, a window (feature_width x feature_width) around the point is calculated in each of the 8 gradient direction images. This window is divided into 16 equally sized cells, and the mean is taken of each cell. So the 16 means describe 1 direction of the point, and there are 8 directions total. Thus, the direction description vector is appended to the direction description vectors of all the other images, creating a 128-dimension vector (16 x 8 = 128). Each 128d vector is then normalized, thresholded by raising the vector to a power less than 1 (0.4 was found to be a good, safe value through experimentation), normalized again, and added to the features list. Raising the vector to 0.4 proved to be extremely useful in getting rid of incorrect matches, and lowering the ratio (increasing the confidence) of many good matches. The complete features list is then sent to the next and final step of the pipeline, feature matching.

Feature Matching

The nearest neighbor distance ratio test was implemented for the feature matching algorithm. Two features were considered 'matched' if their ratio was under the NNDR_cutoff. The NNDR_cutoff has a tremendous impact on how many features were matched. A cutoff with relatively few incorrect matches was anything below 0.65 (relatively few meaning ~5-10% incorrect matches). Anything higher returned many more correct matches, but also many more incorrect matches. This is a tradeoff that the NNDR algorithm must make (quantity vs. quality). I did not return a set amount of matches, because two images may not have any matches. Thus, it would be fairly useless to return the 'top 100' matches, when even the lowest ratio of the NNDR test for the image pair may be over 0.9 (and thus very likely to be incorrect). The confidence of each match was calculated by dividing 1 by the NNDR ratio. Thus, the lower the ratio, the higher the confidence.

Results

The exact same parameters throughout the SIFT pipeline were used for all images, including the original Notre Dame images. Thus, the general performance of the pipeline can be better evaluated, rather than just evaluated for a heavy fitting to the Notre Dame image pair.

This Parthenon-type building came out pretty well. As you can see, it has a very high correct:incorrect feature match ratio.

This one, not so much. There is too much transformation of the building.



The Disney castle got only a few matches, but almost all of them were correct.



Both pairs of images of the same house did well. There were few incorrect matches, and many correct matches.



These features may be all too familiar to you...