CS 143 / Project 2 / Local Feature Matching

My Implementation

I stuck close to the baseline implementation for all parts of the project, but found that I was still able to achieve relatively good performance.


Interest Point Detection

I implemented a simple Harris corner detector.
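
For reference, this is the standard Harris measure the code below computes (my summary, not from the stencil): the second moment matrix is built from Gaussian-weighted products of image derivatives, and the response is

$$ M = \sum_{(u,v)} w(u,v) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}, \qquad R = \det(M) - \alpha \, \operatorname{trace}(M)^2 $$

where $w$ is a Gaussian window and $\alpha = 0.06$; large positive values of $R$ indicate corners.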

The code for this algorithm is relatively straightforward and explains my method well:


function [x, y, confidence, scale, orientation] = get_interest_points(image, feature_width)

	% 1. Compute derivatives using a Sobel filter, then form the
	% products of derivatives that enter the second moment matrix
	s = [1, 0, -1; 2, 0, -2; 1, 0, -1];
	dx = imfilter(image, s);
	dy = imfilter(image, s');
	dx2 = dx .* dx;
	dy2 = dy .* dy;
	dxdy = dx .* dy;

	% 2. Smooth each term with a large Gaussian
	g = fspecial('gaussian', 25, 1.5);
	ax2 = imfilter(dx2, g);
	ay2 = imfilter(dy2, g);
	axy = imfilter(dxdy, g);

	% 3. Compute scalar interest measure
	alpha = 0.06;
	har = (ax2 .* ay2) - axy .^ 2 - alpha .* (ax2 + ay2) .^ 2;

	% 4. Suppress features close to edges 
	% (we wouldn't be able to build a descriptor there anyway)
	har(1 : feature_width, :) = 0;
	har(end - feature_width : end, :) = 0;
	har(:, 1 : feature_width) = 0;
	har(:, end - feature_width : end) = 0;

	% 5a. Threshold the response (Otsu) and find connected components
	CC = bwconncomp(im2bw(har, graythresh(har)));

	% 5b. Take the max from each component region
	x = zeros(CC.NumObjects, 1);
	y = zeros(CC.NumObjects, 1);
	for i = 1 : CC.NumObjects
	    region = CC.PixelIdxList{i};
	    [~, ind] = max(har(region));
	    [y(i), x(i)] = ind2sub(size(har), region(ind));
	end
end
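
As a quick sanity check (not part of the stencil, and the image filename here is hypothetical), the detected points can be overlaid on a test image:


	% Minimal sketch: visualize the detected interest points
	img = im2single(rgb2gray(imread('notre_dame.jpg'))); % hypothetical filename
	[xs, ys] = get_interest_points(img, 16);             % feature_width = 16, as in the project
	imshow(img); hold on;
	plot(xs, ys, 'r+');                                  % mark each interest point
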

This algorithm generated a reasonable set of interest points for each image. Below are images representing the sets returned for the two main test images.



Feature Descriptors

I implemented the baseline SIFT-like feature descriptors as outlined in the header comment of the function stencil.

The basic approach is as follows:

  1. Extract a 16 x 16 window around each keypoint.
  2. Compute the gradient magnitude and direction at each pixel of the window.
  3. Weight the magnitudes by their distance from the center pixel using a Gaussian.
  4. Break the window up into 16 cells of size 4 x 4.
  5. Form a histogram for each cell, dividing the gradient directions into 8 buckets, each covering 45 degrees.
  6. Sum the weighted magnitudes of the gradients in each bucket.
  7. Concatenate the cell histograms to form the 1 x 128 feature descriptor vector.

Here's the equivalent code:


function [features] = get_features(image, x, y, feature_width)

	angle_bins = -180:45:180;
	features = zeros(length(x), 128);

	% for each keypoint
	for i = 1 : length(x)
	    % Get window
	    window = image(y(i) - feature_width/2 : y(i) + feature_width/2 - 1, ...
	                   x(i) - feature_width/2 : x(i) + feature_width/2 - 1);
	    
	    % Get gradient of window
	    [gmag, gdir] = imgradient(window); 
	    
	    % Weight each magnitude by its distance from the window center
	    % (elementwise multiply by a Gaussian mask over the window)
	    gmag_weighted = gmag .* fspecial('gaussian', feature_width, feature_width/2);
	    
	    % Transform to cells
	    gmag_cols = im2col(gmag_weighted, [feature_width/4, feature_width/4], 'distinct');
	    gdir_cols = im2col(gdir, [feature_width/4, feature_width/4], 'distinct');
	    
	    % for each of the 16 cells in the 4x4 array (one column per cell)
	    descriptor = zeros(1, 128);
	    for j = 1 : size(gdir_cols, 2)
	        col = gdir_cols(:, j);
	        [~, inds] = histc(col, angle_bins);
	        inds(inds == numel(angle_bins)) = 1; % a direction of exactly 180 wraps around to -180
	        
	        buckets = zeros(1, 8); % Sum magnitudes for each histogram bucket
	        for k = 1 : length(inds)
	            buckets(inds(k)) = buckets(inds(k)) + gmag_cols(k, j);
	        end
	        
	        start_ind = (j - 1) * 8 + 1; % Compute start and end indices into descriptor
	        end_ind = (j - 1) * 8 + 8;
	        descriptor(1, start_ind : end_ind) = buckets;
	    end
	    
	    descriptor = descriptor / max(norm(descriptor), eps); % Normalize; eps guards an all-zero descriptor
	    features(i, :) = descriptor; % Add to features
	end

	% Trick: raise each element of the feature matrix to a power less than one
	power = 0.8;
	features = features .^ power;

end


Feature Matching

I implemented the baseline, brute-force approach: computing the Euclidean distance between each pair of features.

I then sorted each column of the resulting distance matrix to obtain the nearest and second-nearest neighbors of each feature in the first set.

After computing the ratio of the distances to these two neighbors, I rejected all matches with a ratio of 0.8 or greater.

My confidence values were simply one minus the ratio of nearest neighbor distances.


function [matches, confidences] = match_features(features1, features2)

	num_features1 = size(features1, 1);
	num_features2 = size(features2, 1);
	matches = zeros(num_features1, 2);
	confidences = zeros(num_features1, 1);

	% find the Euclidean distance between each pair of descriptors in
	% feature space
	distances = zeros(num_features2, num_features1);
	for i = 1 : num_features1
	    for j = 1 : num_features2
	        distances(j, i) = norm(features1(i, :) - features2(j, :));
	    end
	end

	thresh = 0.8;
	[sorted_distances, inds] = sort(distances); % sort distances in ascending order

	% calculate the ratio of the nearest neighbor distance to the second nearest;
	% if below the threshold, accept the match and save (1 - ratio) as its confidence
	for i = 1 : num_features1
	    ratio = sorted_distances(1, i) / sorted_distances(2, i);
	    if ratio < thresh
	        matches(i, :) = [i, inds(1, i)];
	        confidences(i) = 1 - ratio;
	    end
	end

	% keep only those that matched
	match_inds = find(confidences > 0);
	matches = matches(match_inds, :);
	confidences = confidences(match_inds);

	% Sort the matches so that the most confident ones are at the top of the list
	[confidences, ind] = sort(confidences, 'descend');
	matches = matches(ind, :);

end
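
As an aside, the brute-force double loop is the slowest part of the pipeline. Assuming the Statistics Toolbox is available, the whole distance matrix could be computed in one vectorized call:

	% Sketch: same num_features2 x num_features1 Euclidean distance matrix,
	% computed in a single call (pdist2 uses Euclidean distance by default)
	distances = pdist2(features2, features1);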



Results

For the Notre Dame test images, 96 of my 100 highest-confidence pairings were correct.



Of the total matches, 210 of 229 were correct (92%).



Improvements

The only improvement I made was the 'trick' mentioned in the feature descriptor stencil code: raising each element of the descriptor to a power less than one. Since the normalized descriptor values lie in [0, 1], this boosts small entries relatively more than large ones, so a few dominant gradients have less influence on the distance computation.
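
As a quick numeric illustration (the values here are chosen arbitrarily):

	% With power = 0.8, values in [0, 1] move toward 1, and small
	% entries are boosted relatively more than large ones
	[0.04, 0.64] .^ 0.8   % ans = 0.0761    0.6998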

Other than tweaking a few parameters (I raised the nearest-neighbor ratio threshold from 0.6 to 0.8), I stuck almost entirely to the baseline implementation, which did a good job of matching local features in the initial test images.

Other images were trickier, and not accounting for scale or orientation definitely hurt my algorithm's performance. Below are the results of matching features in two other sets of images:

Capricho Gaudi
Statue of Liberty

Both have some correct matches, but neither comes close to the performance I obtained on the Notre Dame images.