CS 143 / Project 2

Feature Detection and Matching

Objective

The objective of this assignment was to implement a SIFT pipeline for matching features in images. Given two images of the same object, we want to find the parts of each image that identify the object and then find matches between the two images.

Algorithm

The algorithm is broken down into three steps:

Getting the interest points

In order to figure out what points are good to use as features in an image, we look for corners in the image using Harris corner detection.

The basic steps are to calculate the gradients of a slightly blurred image (in the x and y directions), find their products, blur them, and calculate an interest value for each pixel. Then we take the local maxima of the resulting image as our interest points. A MATLAB sketch of these steps appears after the list below.

Blur the image slightly
Compute gradients I_x and I_y by convolving the image with the 1-D derivative filter [-2 -1 0 1 2] and its transpose
Calculate products I_x*I_x, I_x*I_y, I_y*I_y
Blur each of these products with a Gaussian
Compute the scalar interest value (Harris cornerness) at each pixel: har = Ixx*Iyy - Ixy^2 - alpha*(Ixx + Iyy)^2, where Ixx, Ixy, Iyy are the blurred products and alpha is a small constant
Use COLFILT to replace each pixel value with the maximum interest value in its surrounding pixels
Filter out the pixels where the interest value isn't a local maximum
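
Below is a minimal MATLAB sketch of these steps. The image filename, the Gaussian sizes, alpha, the 11x11 neighborhood, and the final threshold are illustrative assumptions rather than the exact values from my implementation.

% Sketch of the Harris steps above; filter sizes, alpha, the neighborhood
% size, and the threshold are illustrative choices, not my exact values.
img = im2double(rgb2gray(imread('notre_dame.jpg')));   % example input

% Blur the image slightly before differentiating.
small_blur = fspecial('gaussian', 5, 1);
blurred = imfilter(img, small_blur, 'symmetric');

% Gradients from the 1-D derivative filter and its transpose.
dx = [-2 -1 0 1 2];
Ix = imfilter(blurred, dx,  'symmetric');
Iy = imfilter(blurred, dx', 'symmetric');

% Products of the gradients, each blurred with a larger Gaussian.
large_blur = fspecial('gaussian', 9, 2);
Ixx = imfilter(Ix .* Ix, large_blur, 'symmetric');
Ixy = imfilter(Ix .* Iy, large_blur, 'symmetric');
Iyy = imfilter(Iy .* Iy, large_blur, 'symmetric');

% Harris cornerness: det(M) - alpha * trace(M)^2.
alpha = 0.05;
har = (Ixx .* Iyy - Ixy .^ 2) - alpha * (Ixx + Iyy) .^ 2;

% Keep only pixels that are the maximum of their neighborhood
% (COLFILT computes the sliding-window maximum).
local_max = colfilt(har, [11 11], 'sliding', @max);
corners = (har == local_max) & (har > 1e-4);   % threshold is an assumption
[ys, xs] = find(corners);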

Getting feature descriptors for each interest point

Now that we have our interest points, we need a way to describe each interest point and its surroundings (the feature patch). We do this by calculating a SIFT descriptor for each feature patch.

The SIFT descriptor is a 128-element vector in which each group of 8 elements is a histogram of the orientations of the gradients within one block of pixels. The feature patch is divided into a 4x4 grid of blocks, and each block gets its own 8-bin histogram, giving 16 x 8 = 128 values. These values form a basic SIFT feature descriptor; a MATLAB sketch of this computation follows the pseudocode below.

compute the x and y gradients of the image
compute the orientation of the gradient at each pixel using atan2
for each interest point
    for each of the blocks in the 4x4 grid
        bin the angles into 8 bins
        append the histogram to the feature descriptor for this interest point
    normalize the feature descriptor
    append the feature descriptor to a list of feature descriptors
take all the elements of the feature descriptor list to a power less than one
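
A rough MATLAB sketch of the descriptor computation for a single interest point is given below. It assumes a 16x16 patch split into a 4x4 grid of 4x4-pixel blocks (the standard SIFT layout); the patch size, the simple [-1 0 1] derivative filter, and the variable names img, x, and y are assumptions for illustration.

% Sketch of building a 128-element descriptor for one interest point.
% Assumes img is the grayscale image and (x, y) is one interest point
% (x = column, y = row); border handling is omitted.
Ix = imfilter(img, [-1 0 1],  'symmetric');
Iy = imfilter(img, [-1 0 1]', 'symmetric');
orientations = atan2(Iy, Ix);               % gradient angle at every pixel

patch = orientations(y-8 : y+7, x-8 : x+7); % 16x16 patch around the point
edges = linspace(-pi, pi, 9);               % 8 orientation bins
descriptor = [];
for row = 1:4
    for col = 1:4
        block = patch(4*row-3 : 4*row, 4*col-3 : 4*col);
        counts = histc(block(:), edges);    % 9 counts; the last one only
        counts = counts(:);                 % holds angles exactly equal to pi
        descriptor = [descriptor; counts(1:8)];
    end
end
descriptor = descriptor / norm(descriptor); % normalize to unit length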

I used the suggestion given in the stencil code comments to take the feature descriptors to a power after computing and normalizing them. The power that I chose was 0.9, which seemed to yield the best results.
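
Assuming the descriptors are stored one per row in a matrix (called features here purely for illustration), this step is a single element-wise operation:

features = features .^ 0.9;   % raise every element to a power below 1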

Finding matches between the features of each image

The final part of the process is to take two lists of feature descriptors (from two images) and find pairs of features that match.

This is done with a simple nearest-neighbor (NN) approach, with a slight modification. After finding a feature descriptor's nearest neighbor, we reject it if the ratio of the distances to the nearest and second-nearest neighbors is above a threshold. We also reject the nearest neighbor if its distance is above a separate threshold.

for each feature in the first list
    compute the distance to each feature in the second list
    sort them in order of increasing distance
    if the nearest neighbor is close enough and the ratio is low enough
        add the pair to the matches array with confidence = 1/distance to nearest

The distance threshold I used was 0.8 and the ratio threshold I used was 0.85.
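
A sketch of the matching step in MATLAB is shown below; it assumes features1 and features2 hold one descriptor per row and uses pdist2 from the Statistics Toolbox, so it is an illustration of the approach rather than my exact code.

% Sketch of nearest-neighbor matching with the distance and ratio tests;
% assumes features1 and features2 store one descriptor per row.
dist_thresh  = 0.8;     % reject if the nearest neighbor is farther than this
ratio_thresh = 0.85;    % reject if d1/d2 is above this

dists = pdist2(features1, features2);      % all pairwise distances
matches = [];
confidences = [];
for i = 1:size(features1, 1)
    [sorted, idx] = sort(dists(i, :));     % increasing distance
    d1 = sorted(1);                        % distance to nearest neighbor
    d2 = sorted(2);                        % distance to second-nearest
    if d1 < dist_thresh && d1 / d2 < ratio_thresh
        matches = [matches; i, idx(1)];
        confidences = [confidences; 1 / d1];
    end
end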

Results

The accuracy on the Notre Dame image pair was 93% for the top 100 most confident feature matches.

Other Results

Here are some more image pairs that were matched. You can see that the majority of the matched features are correct.