Example of local feature matching for images of Notre Dame. 91 points were correctly matched; 29 points were incorrectly matched.
The goal of this assignment was to create a local feature matching algorithm as a simplified version of the SIFT matching pipeline. To do this, we first had to detect interest points using Harris corner detection. We then had to implement a SIFT-like local feature descriptor. Finally, we needed to match the features using the nearest neighbor distance ratio test.
The Harris corner detector algorithm is as follows: compute the image derivatives with Sobel filters, form the outer products of the derivatives at each pixel, and blur those outer products with a Gaussian to obtain the second moment matrix M at each pixel. The cornerness is then R = det(M) - alpha * trace(M)^2, i.e. the difference between the determinant and the (scaled, squared) trace of M at a given pixel. Pixels with high cornerness are then selected as interest points, as described below.
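A minimal MATLAB sketch of this cornerness computation might look like the following, assuming a grayscale image stored as a double array im; the value of alpha is illustrative (the report does not state one), and the Gaussian parameters are the ones reported below for the Notre Dame pair.

% Sketch of the Harris cornerness computation; variable names are illustrative.
sobel_x = fspecial('sobel')';              % 3x3 Sobel approximating the horizontal derivative
sobel_y = fspecial('sobel');               % 3x3 Sobel approximating the vertical derivative
Ix = imfilter(im, sobel_x);
Iy = imfilter(im, sobel_y);

g = fspecial('gaussian', 25, 2.25);        % larger Gaussian used to blur the outer products
Ixx = imfilter(Ix .* Ix, g);
Iyy = imfilter(Iy .* Iy, g);
Ixy = imfilter(Ix .* Iy, g);

alpha = 0.06;                              % assumed value, not taken from the report
cornerness = Ixx .* Iyy - Ixy .^ 2 ...     % det(M)
           - alpha * (Ixx + Iyy) .^ 2;     % minus alpha * trace(M)^2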
I calculated the derivatives using two 3-by-3 Sobel filters. The larger Gaussian I used to blur the outer products was a 25-by-25 Gaussian with a standard deviation of 2.25 (for the Notre Dame pair). I determined the threshold using graythresh, and used this threshold to convert the cornerness values from grayscale to black-and-white (i.e. real-valued to binary). I then ran bwlabel on the binary image to get connected components, and chose the pixel with the maximum cornerness value in each component as a keypoint location.
I ignored any pixels for which a feature_width-by-feature_width window centered
at that pixel would not fit inside the bounds of the image. The confidence value
for each potential feature location is the cornerness value at that point.
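A sketch of this keypoint selection step, continuing from the cornerness map above, might look like the following; graythresh and bwlabel are Image Processing Toolbox functions, the normalization to [0, 1] before graythresh is an assumption, and the exact border check and variable names are illustrative.

% Sketch of keypoint selection from the cornerness map; names are illustrative.
[h, w] = size(cornerness);
feature_width = 16;                                   % assumed descriptor window size

c_norm = (cornerness - min(cornerness(:))) / ...      % scale to [0, 1] so graythresh
         (max(cornerness(:)) - min(cornerness(:)));   % (Otsu's method) can be applied
bw = c_norm > graythresh(c_norm);                     % binary map of strong responses

labels = bwlabel(bw);                                 % connected components
num_components = max(labels(:));
x = zeros(num_components, 1);
y = zeros(num_components, 1);
confidence = zeros(num_components, 1);
for k = 1:num_components
    idx = find(labels == k);
    [confidence(k), best] = max(cornerness(idx));     % strongest pixel in the component
    [y(k), x(k)] = ind2sub([h, w], idx(best));
end

% Discard keypoints whose feature_width-by-feature_width window would fall
% outside the image bounds.
half = feature_width / 2;
keep = x > half & x <= w - half & y > half & y <= h - half;
x = x(keep);  y = y(keep);  confidence = confidence(keep);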
Each feature consists of a 4x4 grid of cells, each feature_width/4 pixels on a side, centered around an interest point. For each cell, an 8-bin histogram is formed, storing the local distribution of gradient orientations.
The histogram is constructed by first calculating the magnitude and direction of the gradient at each pixel in the cell. The gradient magnitudes are then weighted using a feature_width-by-feature_width Gaussian with standard deviation feature_width/2, so that the gradients of pixels closer to the interest point count more heavily than those farther away. Each weighted magnitude is then added to the appropriate bin (each bin spans 45 degrees), and is weighted again using linear interpolation, i.e. multiplied by (1 - d), where d is the normalized distance between the gradient orientation and the center of the bin to which it belongs.
This creates an 8-dimensional vector of weighted sums of gradient magnitudes for each cell, so each feature is a 4x4x8 = 128-dimensional vector. This vector is normalized to unit length, its values are capped at 0.2, and the vector is then normalized again.
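A sketch of the descriptor construction for a single keypoint at integer coordinates (x, y) is shown below, assuming feature_width = 16, that the window fits inside the image, and that gradients are computed with the same Sobel filters as the detector; the bin-assignment details and variable names are illustrative.

% Sketch of the SIFT-like descriptor for one keypoint; names are illustrative.
feature_width = 16;
cell_width = feature_width / 4;
num_bins = 8;
bin_width = 360 / num_bins;                       % 45-degree bins

% Gradients over the whole image (Sobel, as in the detector).
Ix = imfilter(im, fspecial('sobel')');
Iy = imfilter(im, fspecial('sobel'));
mag = sqrt(Ix .^ 2 + Iy .^ 2);
ori = mod(atan2d(Iy, Ix), 360);                   % orientation in [0, 360)

% feature_width-by-feature_width window centered on the keypoint,
% weighted by a Gaussian with sigma = feature_width / 2.
rows = y - feature_width/2 + 1 : y + feature_width/2;
cols = x - feature_width/2 + 1 : x + feature_width/2;
g = fspecial('gaussian', feature_width, feature_width / 2);
wmag = mag(rows, cols) .* g;
wori = ori(rows, cols);

feature = zeros(1, 4 * 4 * num_bins);
for cy = 1:4
    for cx = 1:4
        cr = (cy - 1) * cell_width + (1:cell_width);
        cc = (cx - 1) * cell_width + (1:cell_width);
        cell_mag = wmag(cr, cc);
        cell_ori = wori(cr, cc);

        cell_hist = zeros(1, num_bins);
        for p = 1:numel(cell_mag)
            b = min(floor(cell_ori(p) / bin_width) + 1, num_bins);  % bin this pixel falls in
            center = (b - 0.5) * bin_width;                         % center angle of that bin
            d = abs(cell_ori(p) - center) / bin_width;              % normalized distance to center
            cell_hist(b) = cell_hist(b) + cell_mag(p) * (1 - d);    % linear interpolation weight
        end
        feature(((cy - 1) * 4 + cx - 1) * num_bins + (1:num_bins)) = cell_hist;
    end
end

feature = feature / (norm(feature) + eps);        % normalize
feature = min(feature, 0.2);                      % cap large values at 0.2
feature = feature / (norm(feature) + eps);        % renormalize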
Weighting the magnitudes of the gradients by a Gaussian roughly doubled the number of matches found, while preserving the percentage of good matches (i.e. true positives). Using linear interpolation seemed to decrease the percentage of good matches in the Notre Dame pair (maybe because of the differences in orientation of the interest points?), but seemed to help in other test cases where scale, but not vantage point, changed between images. Normalizing and capping the feature vector considerably increased the percentage of good matches.
Features were matched by computing, for each feature in the first image, the distances to all features in the second image. The ratio between the distances to the first- and second-closest features was used to determine whether the pair was a true match: if the distance to the closest feature divided by the distance to the second-closest was less than 0.75 (for the Notre Dame pair), then the closest feature in the second image was considered a match to the feature under consideration in the first image.
The confidence used for each pair was 1 minus this ratio (i.e. 1 - NNDR); the larger the gap between the distances to the first- and second-closest features, the more confident we are that the closest one is a true match.
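A sketch of this matching step is shown below, assuming the descriptors are stacked row-wise into matrices features1 and features2; pdist2 comes from the Statistics and Machine Learning Toolbox, and sorting each row of the distance matrix is one way to obtain the two nearest neighbors.

% Sketch of nearest neighbor distance ratio matching; names are illustrative.
ratio_threshold = 0.75;                            % value used for the Notre Dame pair

dists = pdist2(features1, features2);              % n1 x n2 Euclidean distances
[sorted_dists, nn_idx] = sort(dists, 2);           % sort each row: nearest neighbors first

nndr = sorted_dists(:, 1) ./ sorted_dists(:, 2);   % nearest / second-nearest distance
is_match = nndr < ratio_threshold;

matches = [find(is_match), nn_idx(is_match, 1)];   % [index in image 1, index in image 2]
confidences = 1 - nndr(is_match);                  % larger gap -> higher confidence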
Below are some more examples of the pipeline. These seem to match better than the Notre Dame pair, most likely because there is less change in orientation. This makes sense, since I didn't account for any changes in orientation in my algorithms.