CS 143 / Project 2 / Local Feature Matching
Overview
I implemented the baseline feature matching algorithm as outlined in the project description. It consists of a simple Harris corner detector, a SIFT-like local feature extractor, and a Nearest Neighbor Distance Ratio Test matching function.
Performance on the provided Notre Dame image pair when run on a department machine: 88/100 good matches
get_interest_points / Corner Detection
My get_interest_points function followed the algorithm outlined in the lecture slides, in Szeliski, and in the stencil code (a rough sketch of the first six steps follows the list):
- Blur the initial image
- Filter to get x and y derivatives of the image
- Calculate the squares of the derivatives
- Convolve each of the three resulting values with a larger Gaussian
- Calculate the interest measure given in the lecture slides: har = g(Ix^2)g(Iy^2) - [g(IxIy)]^2 - alpha[g(Ix^2)+g(Iy^2)]^2
- Threshold the resulting image to get only points above a certain value, and produce a binary image using the Matlab function im2bw()
- Use bwconncomp() to identify connected components in the binary image
- Take the maximum value pixel from each of the connected components (in the non-binary image), and add that to the list of interest points
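To make the pipeline concrete, here is a minimal sketch of steps 1 through 6. It assumes the input image is a grayscale double image; the filter sizes, sigmas, alpha, and threshold below are illustrative placeholders rather than my tuned values:

% sketch of the interest-measure computation (illustrative parameters)
small_gaussian = fspecial('gaussian', 5, 1);    % blur for the initial image
large_gaussian = fspecial('gaussian', 15, 2);   % larger Gaussian for the squared derivatives
sobel = fspecial('sobel');                      % derivative filter

blurred = imfilter(image, small_gaussian);
Iy = imfilter(blurred, sobel);                  % y derivative
Ix = imfilter(blurred, sobel');                 % x derivative

% squares of the derivatives, each convolved with the larger Gaussian
gIx2  = imfilter(Ix .* Ix, large_gaussian);
gIy2  = imfilter(Iy .* Iy, large_gaussian);
gIxIy = imfilter(Ix .* Iy, large_gaussian);

% Harris measure from the lecture slides: det - alpha * trace^2
alpha = 0.06;
har = gIx2 .* gIy2 - gIxIy .^ 2 - alpha * (gIx2 + gIy2) .^ 2;

% threshold, binarize, and find connected components
% (har may need rescaling to [0, 1] before im2bw)
threshold = 0.05;                               % illustrative value
binary = im2bw(har, threshold);
cc = bwconncomp(binary);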
I found the max of each connected component as follows:
% for each connected component, find the pixel with max value
for c = 1:size(cc.PixelIdxList, 2)
    [maxval, idx] = max(har(cc.PixelIdxList{c}));
    real_idx = cc.PixelIdxList{c}(idx);
    [idx1, idx2] = ind2sub(size(har), real_idx);
    % suppress points around the edges of the image
    if size(x, 1) < max_num_points && idx1 > edge_threshold && idx2 > edge_threshold ...
            && idx1 < size(har, 1) - edge_threshold && idx2 < size(har, 2) - edge_threshold
        x = [x; idx2];
        y = [y; idx1];
    end
end
get_features / Local Feature Creation
My implementation of get_features created SIFT-like descriptors for each detected interest point as follows:
- Blur the initial image and filter to get x and y derivatives
- For each interest point, grab a 16x16 window around the point and filter it (using the same larger Gaussian filter used to blur the squared image derivatives in step 4 of get_interest_points)
- Divide each of these windows into a 4x4 grid of 4x4 cells.
- For each 4x4 cell in the grid, create a gradient orientation histogram with 8 buckets: (+x), (+y), (-x), (-y), (+x+y), (+x-y), (-x+y), (-x-y)
- For each pixel in each cell, add the pixel's gradient values to three buckets of the histogram: the x gradient value goes to (+x) or (-x) depending on its sign; the y gradient value goes to (+y) or (-y) depending on its sign; and the x+y (diagonal) gradient value goes to one of (+x+y), (+x-y), (-x+y), or (-x-y) depending on the signs of both gradients
- Append the histograms from each of the 16 cells in the window together to form a 128-dimensional feature, normalize the feature, raise its values to a power below 1, and append the feature to the features list (a rough sketch of the per-point loop follows; the normalization step is sketched after the histogram code)
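To show where the histogram code fits, here is a rough sketch of the per-point loop. The variable names (x and y for interest-point coordinates, Ix and Iy for the blurred derivative images), the window indexing, and the bucket numbering are illustrative rather than taken from my code; the histogram-filling code shown next goes where the inner comment indicates:

% illustrative bucket numbering for the 8-orientation histogram
pos_x = 1; pos_y = 2; neg_x = 3; neg_y = 4;
pos_x_pos_y = 5; pos_x_neg_y = 6; neg_x_pos_y = 7; neg_x_neg_y = 8;
large_gaussian = fspecial('gaussian', 15, 2);   % same larger Gaussian as in get_interest_points (illustrative size)

features = zeros(size(x, 1), 128);
for p = 1:size(x, 1)
    % 16x16 windows of the x and y derivatives around the interest point,
    % filtered with the larger Gaussian
    rows = y(p) - 7 : y(p) + 8;
    cols = x(p) - 7 : x(p) + 8;
    win_x = imfilter(Ix(rows, cols), large_gaussian);
    win_y = imfilter(Iy(rows, cols), large_gaussian);

    feature = [];
    for cell_row = 0:3
        for cell_col = 0:3
            histogram = zeros(1, 8);
            % loop over the pixels of one 4x4 cell in the 4x4 grid
            for r = cell_row * 4 + (1:4)
                for c = cell_col * 4 + (1:4)
                    x_gradient = win_x(r, c);
                    y_gradient = win_y(r, c);
                    diag_gradient = x_gradient + y_gradient;   % diagonal gradient (sum is illustrative)
                    % ... 8-bucket histogram filling shown below ...
                end
            end
            feature = [feature, histogram];
        end
    end
    features(p, :) = feature;   % normalization/power step sketched after the histogram code
end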
Here's the code that distributes gradient values into the histogram:
% fill appropriate buckets of histogram
if x_gradient >= 0
    if y_gradient >= 0
        % x and y pos
        histogram(pos_y) = histogram(pos_y) + y_gradient;
        histogram(pos_x) = histogram(pos_x) + x_gradient;
        histogram(pos_x_pos_y) = histogram(pos_x_pos_y) + diag_gradient;
    else
        % x pos, y neg
        histogram(neg_y) = histogram(neg_y) + y_gradient;
        histogram(pos_x) = histogram(pos_x) + x_gradient;
        histogram(pos_x_neg_y) = histogram(pos_x_neg_y) + diag_gradient;
    end
else
    if y_gradient >= 0
        % x neg, y pos
        histogram(pos_y) = histogram(pos_y) + y_gradient;
        histogram(neg_x) = histogram(neg_x) + x_gradient;
        histogram(neg_x_pos_y) = histogram(neg_x_pos_y) + diag_gradient;
    else
        % both neg
        histogram(neg_y) = histogram(neg_y) + y_gradient;
        histogram(neg_x) = histogram(neg_x) + x_gradient;
        histogram(neg_x_neg_y) = histogram(neg_x_neg_y) + diag_gradient;
    end
end
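The final normalization step from the list above is not shown in the histogram code; here is a minimal sketch, using 0.6 as an illustrative exponent (the abs() is included only so the fractional power stays real if a bucket ends up negative):

% normalize the 128-dimensional descriptor to unit length, then raise it
% to a sub-1 power to reduce the influence of large gradient values
feature = abs(feature) / norm(feature);
feature = feature .^ 0.6;   % 0.6 is an illustrative exponent, not my tuned value
features(p, :) = feature;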
match_features / Feature Matching
My feature matching algorithm implements the NNDR (nearest neighbor distance ratio) test to match features. It also uses the NNDR values as confidences; a lower NN1/NN2 ratio means a more confident match. Matches are thresholded to have an NNDR below 0.8 (a value taken from David Lowe's SIFT paper); any matches with a larger ratio are rejected. The algorithm is as follows:
- For each feature in features1, compare to all features in features2
- Keep track of the nearest neighbor and second nearest neighbor, where the nearest neighbor is the feature where the value sum(abs(feature1 - feature2)) is minimized
- After comparing to all features in features2, if the nearest neighbor of the feature being matched has an NNDR greater than the threshold (0.8), reject the match. Otherwise, add it to the list of matches, and add the NNDR value to the confidences matrix
- Sort and truncate so only the 100 most confident matches are returned (sketched after the matching code below)
Here's the code that finds the nearest neighbors and records matches within the threshold:
for i = 1:num_features1
    feature1 = features1(i, :);
    nn_1 = 0;
    nn_1_val = Inf;
    nn_2_val = Inf;
    for j = 1:num_features2
        feature2 = features2(j, :);
        diff = sum(abs(feature1 - feature2));
        if diff < nn_1_val
            % new nearest neighbor; the old NN1 becomes NN2
            nn_2_val = nn_1_val;
            nn_1_val = diff;
            nn_1 = j;
        elseif diff < nn_2_val
            nn_2_val = diff;
        end
    end
    % reject matches whose NNDR is > threshold
    NNDR = nn_1_val / nn_2_val;
    if NNDR <= nndr_threshold
        matches = [matches; [i, nn_1]];
        confidences = [confidences; NNDR];
    end
end
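The sort-and-truncate step from the list above happens after this loop; a short sketch, relying on the fact that a lower NNDR means a more confident match:

% sort by NNDR ascending (most confident first) and keep at most 100 matches
[confidences, order] = sort(confidences, 'ascend');
matches = matches(order, :);
num_keep = min(100, size(matches, 1));
matches = matches(1:num_keep, :);
confidences = confidences(1:num_keep);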
Running Time
The running time of my algorithm varies with the images it is run on; the match_features function is the slowest part, as it is quadratic in the number of features returned by get_features. The program takes a little under 30 seconds to run on the provided Notre Dame image pair.
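The quadratic loop is fast enough here, but if it ever became a bottleneck, the pairwise distances could be computed in one vectorized call instead of nested loops; a sketch assuming the Statistics Toolbox's pdist2 is available:

% all pairwise L1 (city-block) distances between the two feature sets
dists = pdist2(features1, features2, 'cityblock');
% sorting each row gives the nearest and second-nearest neighbor for every
% feature in features1
[sorted_dists, nn_idx] = sort(dists, 2);
nndr = sorted_dists(:, 1) ./ sorted_dists(:, 2);
good = nndr <= nndr_threshold;
matches = [find(good), nn_idx(good, 1)];
confidences = nndr(good);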
Results
Here are some examples of local feature matching produced by my algorithm. The first example is the provided Notre Dame image pair, on which my algorithm had 88/100 good matches.
The second example is two images of Michelangelo's famous fresco The Last Judgement, located in the Sistine Chapel. My algorithm's performance on this pair was by far the best out of all the image pairs I tried on my own. Though I didn't generate a ground truth evaluation, all the matches I examined by hand appeared correct. I think this is in large part due to the high level of detail in the fresco, the large number of contrasting colors, and the fact that there are few (if any) repeating patterns.
This third example is a Mount Rushmore image pair. My algorithm matched a fair number of points correctly here, but there are also many obviously incorrect matches. I think my algorithm struggled with this pair because of the angle and lighting differences between the images, as well as the fact that there are fewer sharp edges in this pair than in the previous two examples.
The final example is two images of the old Yankee Stadium. My algorithm had a lot of difficulty with this pair, though it is admittedly a harder pair to match than the others. The images were taken on different days, so elements of the field and various signs behind the bleachers were different. The crowd was also different, and one image contained much of the stands while the other showed little beyond the field. My algorithm matched some points correctly, but many more appear to be matched incorrectly.