CS 143 / Project 2 / Local Feature Matching

The original image

Interest point detection

To detect interest points from image gradients, I first filter the image with a Gaussian (sigma = 2) so the gradients are robust to noise.

I then compute the vertical and horizontal gradients with the classic derivative filter from the Harris corner detector, [-2 -1 0 1 2].

The gradient products are then smoothed with another Gaussian (sigma = 4): g(Ix^2), g(Iy^2), g(IxIy).

From these I compute the Harris corner response, thresholded at 0.005:
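The snippet below is a minimal sketch of this response computation, assuming numpy/scipy and a grayscale float image; the sigmas, gradient filter, and threshold follow the text, while the Harris constant alpha = 0.06 is an assumed value not given in the write-up.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, correlate1d

def harris_response(img, alpha=0.06, threshold=0.005):
    # Smooth first so the gradients are robust to noise (sigma = 2).
    smoothed = gaussian_filter(img, sigma=2)

    # Horizontal and vertical gradients with the [-2 -1 0 1 2] filter.
    deriv = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
    Ix = correlate1d(smoothed, deriv, axis=1)
    Iy = correlate1d(smoothed, deriv, axis=0)

    # Blur the gradient products with a wider Gaussian (sigma = 4).
    Ixx = gaussian_filter(Ix * Ix, sigma=4)
    Iyy = gaussian_filter(Iy * Iy, sigma=4)
    Ixy = gaussian_filter(Ix * Iy, sigma=4)

    # Harris corner response: det(M) - alpha * trace(M)^2, thresholded at 0.005.
    R = Ixx * Iyy - Ixy ** 2 - alpha * (Ixx + Iyy) ** 2
    return R * (R > threshold)
```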

I then apply a maximum filter to find local maxima of the response and take connected components to obtain the actual interest points:
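A minimal sketch of this non-maximum suppression step, assuming scipy.ndimage; the 3x3 neighborhood size and using component centroids as the interest points are assumptions.

```python
import numpy as np
from scipy.ndimage import maximum_filter, label, center_of_mass

def local_maxima_points(R):
    # A pixel is a local maximum if it equals the max of its neighborhood
    # and its (thresholded) response is positive.
    is_max = (R == maximum_filter(R, size=3)) & (R > 0)

    # Group adjacent maxima into connected components and keep one
    # representative point (the centroid) per component.
    labels, num = label(is_max)
    centers = center_of_mass(is_max, labels, range(1, num + 1))
    ys, xs = zip(*centers) if centers else ([], [])
    return np.array(xs), np.array(ys)
```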

Local feature description

SIFT descriptors accumulate gradient histograms over 8 orientations. We can compute these efficiently with steerable filters: first create vertical and horizontal difference-of-Gaussian filters, then combine their responses to obtain 8 oriented filter responses.

I store these 8 responses as an 8-channel image so the later filtering operations can be done efficiently.
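A sketch of the 8 oriented responses under the construction described above: filter once with horizontal and vertical derivative-of-Gaussian kernels, then steer the pair of responses to 8 directions with cosine/sine weights and keep only the positive part, so each channel behaves like one orientation bin. The derivative sigma of 1 is an assumed value.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def oriented_responses(img, sigma=1.0):
    # Derivative-of-Gaussian responses along x and y
    # (order=1 takes the first derivative of the Gaussian along that axis).
    gx = gaussian_filter(img, sigma, order=(0, 1))
    gy = gaussian_filter(img, sigma, order=(1, 0))

    # Steer to 8 equally spaced directions and half-rectify,
    # producing an H x W x 8 "image" with one channel per orientation.
    channels = []
    for k in range(8):
        theta = 2 * np.pi * k / 8
        channels.append(np.maximum(np.cos(theta) * gx + np.sin(theta) * gy, 0))
    return np.stack(channels, axis=-1)
```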

Then I obtain the weighted response by filtering with a Gaussian whose sigma equals half the feature width:

For each interest point, I take the local image patch and compute the actual SIFT descriptor. This is done efficiently by downsampling the 8-channel 16x16 patch to an 8-channel 4x4 image with interpolation, so that each pixel represents one histogram bin.
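The following sketch covers the per-interest-point descriptor under the assumptions above: the 8-channel response image is weighted with a Gaussian of sigma = feature_width / 2, a 16x16 patch is cut around the point, and each channel is reduced to 4x4 by block averaging (a simple stand-in for the interpolated downsampling described in the text). The final 4x4x8 = 128 values are flattened and normalized; the normalization step is an assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def sift_descriptor(responses, x, y, feature_width=16):
    # Spatial Gaussian weighting of every orientation channel
    # (sigma = half the feature width, no smoothing across channels).
    weighted = gaussian_filter(responses, sigma=(feature_width / 2, feature_width / 2, 0))

    # Cut the 16x16 patch centered on the interest point.
    half = feature_width // 2
    patch = weighted[y - half:y + half, x - half:x + half, :]

    # Downsample each channel to 4x4 by averaging 4x4 blocks, so every
    # output pixel acts as one spatial histogram bin.
    binned = patch.reshape(4, 4, 4, 4, 8).mean(axis=(1, 3))

    # Flatten to the 128-dimensional descriptor and normalize.
    desc = binned.reshape(-1)
    return desc / (np.linalg.norm(desc) + 1e-10)
```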

Feature matching

Instead of naively computing distances between all pairs of descriptors, which takes O(n^2) time, I use a k-d tree (O(n log n)) to find the nearest neighbors of each interest point and compute their nearest neighbor distance ratio.
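A sketch of this matching step using scipy's cKDTree (one possible k-d tree implementation): query the two nearest neighbors of every descriptor from the first image among the descriptors of the second, and keep a match when the nearest/second-nearest distance ratio is below the threshold.

```python
import numpy as np
from scipy.spatial import cKDTree

def match_features(desc1, desc2, nndr_threshold=0.5):
    tree = cKDTree(desc2)
    dists, idx = tree.query(desc1, k=2)           # two nearest neighbors per descriptor
    ratios = dists[:, 0] / (dists[:, 1] + 1e-10)  # nearest neighbor distance ratio
    keep = ratios < nndr_threshold
    matches = np.column_stack([np.nonzero(keep)[0], idx[keep, 0]])
    return matches, ratios[keep]
```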

With NNDR = 0.5 and correspondence uniqueness enforced (see below), we get 57 good matches and 1 bad match (98.3% accuracy):

Discussion

With the above approach, multiple interest points in one image can be matched to the same interest point in the other. We can enforce uniqueness of correspondences and eliminate such redundant matches. This can improve accuracy, as shown below.

We can further reject the x% of found correspondences with the worst match scores, which can also improve accuracy; a sketch of both post-processing steps follows.
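The sketch below assumes `matches` holds index pairs (i1, i2) and `ratios` the corresponding NNDR values, and uses the NNDR as the confidence score for both steps, which is an assumption: first keep only the best (lowest-ratio) match for each interest point in the second image, then drop the worst fraction of what remains.

```python
import numpy as np

def filter_matches(matches, ratios, reject_fraction=0.1):
    # Enforce uniqueness: for each target index, keep only the lowest-ratio match.
    order = np.argsort(ratios)
    matches, ratios = matches[order], ratios[order]
    _, first = np.unique(matches[:, 1], return_index=True)
    matches, ratios = matches[np.sort(first)], ratios[np.sort(first)]

    # Reject the reject_fraction of remaining matches with the worst NNDR.
    keep = np.argsort(ratios)[: int(np.ceil((1 - reject_fraction) * len(ratios)))]
    return matches[np.sort(keep)], ratios[np.sort(keep)]
```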

I also evaluate the effect of different nearest neighbor distance ratios (NNDR) on accuracy.

Note that when NNDR <= 0.3, there are too few samples to evaluate the accuracy faithfully, as seen in the plot: the accuracy is 100% when NNDR = 0.3, but with fewer than 10 samples.

This result shows that enforcing uniqueness of correspondences can improve accuracy for any NNDR, while eliminating a fraction of the worst correspondences only improves accuracy when there are many incorrect matches.

Other results

Note that I did not implement rotation invariance, so the results here do not include rotated image pairs.