CS 143 / Project 2 / Local Feature Matching

Points matched between two pictures of the Notre Dame cathedral.

The goal of this project was to compare two different images of the same scene and find matching points between them. This was accomplished by finding corners using an implementation of the Harris corner detector, calculating a SIFT feature descriptor for each of the points, and finally finding the pairs of points across the two images whose descriptors match most closely.
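
At a high level, the three stages are simply chained together. The sketch below shows that flow under the assumption that the stages are implemented as functions named get_interest_points, get_features, and match_features; the function names and image file names are illustrative, not necessarily the exact signatures used here.

% Hypothetical driver illustrating the three-stage pipeline described above.
image1 = im2single(rgb2gray(imread('notre_dame_1.jpg')));
image2 = im2single(rgb2gray(imread('notre_dame_2.jpg')));
feature_width = 16;  % width of the SIFT window in pixels

% 1. Find corners in each image (Harris detector)
[x1, y1] = get_interest_points(image1, feature_width);
[x2, y2] = get_interest_points(image2, feature_width);

% 2. Build a 128-dimensional SIFT descriptor around each corner
features1 = get_features(image1, x1, y1, feature_width);
features2 = get_features(image2, x2, y2, feature_width);

% 3. Pair up descriptors by distance and rank the pairs by confidence
[matches, confidences] = match_features(features1, features2);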

Harris Corner Detector


% Blurred derivatives (derivative_x and derivative_y are Sobel filters
% pre-convolved with a Gaussian; gauss2 is a larger Gaussian;
% alpha is the Harris corner constant)
Ix = imfilter(image, derivative_x);
Iy = imfilter(image, derivative_y);

% Blurred squares and product of derivatives
% (entries of the second moment matrix)
Ix2 = imfilter(Ix .^ 2, gauss2);
Iy2 = imfilter(Iy .^ 2, gauss2);
Ixy = imfilter(Ix .* Iy, gauss2);

% Harris response: det(M) - alpha * trace(M)^2
har = Ix2.*Iy2 - Ixy.^2 - alpha*(Ix2+Iy2).^2;
The Harris corner detector was implemented in the following way: image derivatives were computed by convolving the image with two "derivative filters" (Sobel filters that had themselves been convolved with a Gaussian filter). The Harris equation was then used to score how well each pixel approximates a corner; that is, a patch of the image where shifting in any direction yields a large change. Pixels above a threshold were kept, and non-maximum suppression was applied so that a single pixel represents each corner.
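
The thresholding and non-maximum suppression steps are not shown in the snippet above. A minimal sketch of one common way to do them (using a sliding-window maximum via colfilt) follows; the threshold value and window size are illustrative assumptions rather than the values used in this project.

% Suppress responses near the border so feature windows stay in bounds
har([1:feature_width, end-feature_width:end], :) = 0;
har(:, [1:feature_width, end-feature_width:end]) = 0;

% Keep only strong responses (threshold chosen by inspection)
threshold = 0.01 * max(har(:));
har(har < threshold) = 0;

% Non-maximum suppression: a pixel survives only if it is the maximum
% of its 3x3 neighborhood
local_max = colfilt(har, [3 3], 'sliding', @max);
har(har < local_max) = 0;

% Remaining nonzero pixels are the corner locations
[y, x] = find(har > 0);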

SIFT Feature Descriptors


% Gradient magnitude and direction at every pixel
Ix = imfilter(image, derivative_x);
Iy = imfilter(image, derivative_y);

magnitude = sqrt(Ix.^2 + Iy.^2);
direction = atan2(Iy, Ix);
% Quantize direction into 8 bins of pi/4 each (values 0..7)
direction = round(direction * (4/pi) + 8);
direction = mod(direction, 8);

for i = 1:size(x,1)
    % x_val, y_val: location of keypoint i; l_bound/u_bound: half-widths
    % of the feature_width x feature_width window around it
	direction_chunk = direction((x_val-l_bound):(x_val+u_bound),...
                                (y_val-l_bound):(y_val+u_bound));
    magnitude_chunk = magnitude((x_val-l_bound):(x_val+u_bound),...
                                (y_val-l_bound):(y_val+u_bound));
    % Weight magnitudes by a Gaussian centered on the keypoint
    blurred_chunk = magnitude_chunk.*feature_gauss;
    % Split the window into 16 distinct square cells; each column of the
    % grid holds the pixels of one cell
    magnitude_grid = im2col(blurred_chunk, ... 
        [feature_width/4, feature_width/4], 'distinct');
    direction_grid = im2col(direction_chunk, ... 
        [feature_width/4, feature_width/4], 'distinct');
    % Accumulate an 8-bin orientation histogram for each cell
    descriptor = zeros(8, 16);
    for j = 1:numel(magnitude_grid)
        [y_idx,x_idx] = ind2sub(size(magnitude_grid),j);
        item_magnitude = magnitude_grid(y_idx,x_idx);
        item_direction = direction_grid(y_idx,x_idx);
        descriptor(item_direction+1, x_idx) = ...
            descriptor(item_direction+1, x_idx) + item_magnitude;
    end
    % Flatten the 8x16 histogram into a normalized 128-vector
    vec = reshape(descriptor, 1, 128);
    features(i,:) = vec/norm(vec);
end
Once corner pixels have been found, a SIFT (scale-invariant feature transform) descriptor is calculated for each one. SIFT descriptors are 128-element vectors that describe the directions of the image gradients in a patch around a point. The x- and y-derivatives are computed first, giving a gradient magnitude and direction at each pixel; directions are quantized to the nearest multiple of pi/4 (1/8 of a circle). A chunk of the image is then pulled from around the point and split into 16 square sections. For each section, the gradient magnitudes (weighted by their distance from the keypoint) are summed into the bin matching their direction. The result is 16 eight-element vectors that represent the gradients in the various sections around the point. These are reshaped into a single 128-vector, which is then normalized. These vectors are invariant to changes in the size of the chunk taken, hence the scale invariance.
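
The snippets above assume a few precomputed filters and window parameters (derivative_x, derivative_y, gauss2, feature_gauss, l_bound, u_bound). One plausible way to construct them is sketched below; the exact kernel sizes and sigmas are assumptions, not values taken from the project code.

% Blurred Sobel derivative filters (Sobel convolved with a small Gaussian),
% as described in the Harris section
gauss = fspecial('gaussian', [5 5], 1);
sobel_x = [-1 0 1; -2 0 2; -1 0 1];
derivative_x = imfilter(gauss, sobel_x);
derivative_y = imfilter(gauss, sobel_x');

% Larger Gaussian for blurring the squared derivatives
gauss2 = fspecial('gaussian', [9 9], 2);

% Gaussian weighting window over the feature_width x feature_width patch
feature_gauss = fspecial('gaussian', [feature_width, feature_width], feature_width/2);

% Half-widths of the patch indexed in the descriptor loop, chosen so the
% chunk is exactly feature_width pixels on a side
l_bound = feature_width/2 - 1;
u_bound = feature_width/2;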

Feature Matching


% Distance from every feature in image 1 to every feature in image 2;
% column i holds the distances from features1(i,:) to all of features2
for i = 1:size(features1,1)
    for j = 1:size(features2,1)
        pairs(j,i) = norm(features1(i,:) - features2(j,:));
    end 
end

% Sort each column so the closest feature in image 2 comes first
[sorted_pairs, min_inds] = sort(pairs,'ascend');

for i = 1:size(sorted_pairs,2)
    % Match feature i in image 1 to its nearest neighbor in image 2
    matches(i,:) = [i,min_inds(1,i)];
    % Ratio test: distance to the second-best match over distance to the best
    confidences(i,1) = sorted_pairs(2,i)/sorted_pairs(1,i);
end
Because features are represented by 128-vectors, they can easily be compared with one another by computing the Euclidean distance between them. Each point is matched to the point in the other image for which this distance is smallest. A confidence that the correct point has been chosen as a match can be estimated by taking the ratio of the distance to the second-best match over the distance to the best match; the larger this ratio, the higher the confidence.
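
The double loop above becomes slow for large feature sets. A vectorized sketch of the same computation, assuming pdist2 (Statistics and Machine Learning Toolbox) is available, computes all pairwise distances at once and sorts the matches by confidence:

% All pairwise Euclidean distances: pairs(j,i) = norm(features1(i,:) - features2(j,:))
pairs = pdist2(features2, features1);

% Nearest and second-nearest neighbors for each feature in image 1
[sorted_pairs, min_inds] = sort(pairs, 'ascend');

num_matches = size(features1, 1);
matches = [(1:num_matches)', min_inds(1,:)'];
confidences = (sorted_pairs(2,:) ./ sorted_pairs(1,:))';

% Present the most confident matches first
[confidences, order] = sort(confidences, 'descend');
matches = matches(order, :);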

Results

Unfortunately, for reasons unknown, this implementation did not produce good results. On the Notre Dame example, only 17 of the top 100 points matched properly. Despite speaking with several teaching assistants and the professor, neither the cause nor a good solution ever became apparent, even after changes such as reimplementing the matching code and discarding keypoints whose descriptors were very similar to others in the same image (and which could therefore easily be matched improperly).