CS 143 / Project 2 / Local Feature Matching

Feature matching on Notre Dame Cathedral.

For this project, I implemented interest point detection and local feature matching, as described in the CS143 lecture slides (1, 2, 3), in Szeliski, and by David Lowe.

The pipeline is as follows:

  1. Find the interest points, by identifying corners using a Sobel filter.
  2. Take a profile of each feature point, by measuring the response of its pixel neighborhood to gradients in various directions.
  3. Find the best feature matches.

Finding Interest Points

To find the interest points, I implemented the Harris corner detector. This involved convolving a Gaussian with a Sobel filter, in the x and y directions, and filtering each image with the resulting kernels.

One of the photos of Notre Dame, and the derivatives in the x and y directions.

The Gaussian blur removed noise from the image, such as scratches or textures, which would otherwise have registered as edges to the Sobel filter. The Sobel filter then measured the gradients in the x and y directions.
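In rough MATLAB form, this step looks something like the following (the filename, kernel sizes, and sigma here are illustrative, not necessarily the values I tuned):

    img = im2double(rgb2gray(imread('notre_dame.jpg')));  % hypothetical filename

    gaussian = fspecial('gaussian', [9 9], 1.5);  % blur to suppress fine texture
    sobel_y  = fspecial('sobel');                 % responds to horizontal edges
    sobel_x  = sobel_y';                          % transposed: vertical edges

    % Combine the blur and each derivative into one kernel, then filter.
    Ix = imfilter(img, conv2(gaussian, sobel_x, 'same'), 'symmetric');
    Iy = imfilter(img, conv2(gaussian, sobel_y, 'same'), 'symmetric');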

This resulted in two images per photo, representing the vertical and horizontal edges within the photo. Now, in order to find the actual corners, I had to identify areas of strong change in both images. By finding the pixels whose local second moment matrix had two strong eigenvalues, I was thus able to find the corner points.

Part of this step was squaring each individual derivative image, as well as multiplying the two together. That math has been elided here, but can be found in slide 32 of this presentation.
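As a sketch, that math amounts to the standard Harris cornerness score (k and the threshold below are typical values, not necessarily the ones I used):

    window = fspecial('gaussian', [9 9], 2);   % weights the local neighborhood

    Ixx = imfilter(Ix .* Ix, window);          % per-pixel entries of the
    Iyy = imfilter(Iy .* Iy, window);          % second moment matrix
    Ixy = imfilter(Ix .* Iy, window);

    k = 0.04;                                  % empirical constant, ~0.04-0.06
    R = (Ixx .* Iyy - Ixy .^ 2) - k * (Ixx + Iyy) .^ 2;  % Harris response

    R(R < 0.01 * max(R(:))) = 0;               % zero out weak responses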

This returned an image of speckles, with each speckle being a point of relatively high contrast in two directions. To actually find our points, I had to perform non-maximum suppression. To do this, I used Matlab's colfilt function to find the local maximum in each 16x16 neighborhood, and then kept only the pixels equal to that maximum.

Additionally, I searched for areas of contiguous pixels using the bwlabel function, which finds and labels connected regions of non-zero pixels. For each group, I kept only the maximum. However, this step takes a substantial amount of time.

This resulted in a speckled black-and-white image, with each white point representing a real corner; I used the find function to retrieve the x and y coordinates.
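Sketched out, the suppression steps look roughly like this, picking up from the thresholded response R above:

    % Keep only pixels that are the maximum of their 16x16 neighborhood.
    local_max = colfilt(R, [16 16], 'sliding', @max);
    corners = R .* (R == local_max & R > 0);

    % Collapse each contiguous blob of survivors down to its single peak
    % (this loop is the slow part).
    [labels, n] = bwlabel(corners > 0);
    for i = 1:n
        blob = (labels == i);
        corners(blob & (R < max(R(blob)))) = 0;
    end

    [ys, xs] = find(corners);   % rows are y coordinates, columns are x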

A scatter plot of the cathedral's corners, rotated counterclockwise.


Defining Features

Each feature is described by a 128-dimensional vector, with each element representing the relative gradient strength of one 4x4 pixel cell in a given direction. Measured in 8 directions, with 16 cells for each corner, this yields 128 values.

The process itself is not quite as photogenic as corner detection. First, I make an array of 8 copies of the current image, and then I filter each one with a Sobel filter oriented in a given direction (0, 45, 90, 135 ... -45 degrees). I then look at the 16x16 neighborhood around each corner point. Before doing any calculations/tabulations, though, I first do an element-wise multiplication of the region with a Gaussian kernel, so that the edges of the window don't get over-represented.

I then use the convenient im2col function to break the window apart into 4x4 cells, convert those into columns, and sum each column. Each sum then becomes one element of the feature's 128-dimensional representation.
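A sketch of the descriptor for one corner point. Rather than literally filtering eight copies of the image, this version projects the x and y gradients onto each direction, which measures the same thing; the rectification and normalization here are standard SIFT-style choices, not an exact transcript of my code:

    angles  = (0:45:315) * pi / 180;              % the 8 directions
    weights = fspecial('gaussian', [16 16], 8);   % downweight the window's edges

    r = ys(1); c = xs(1);        % one detected corner (border handling omitted)
    patch_x = Ix(r-7:r+8, c-7:c+8);               % its 16x16 neighborhood
    patch_y = Iy(r-7:r+8, c-7:c+8);

    feature = zeros(1, 128);
    for d = 1:8
        % Gradient strength along this direction, rectified and weighted.
        oriented = cos(angles(d)) * patch_x + sin(angles(d)) * patch_y;
        oriented = max(oriented, 0) .* weights;
        cells = im2col(oriented, [4 4], 'distinct');  % 16 cells of 4x4 pixels
        feature(d:8:end) = sum(cells);                % one value per cell
    end
    feature = feature / norm(feature);                % normalize for matching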

A histogram of the relative gradient strengths in a 16x16 pixel neighborhood. There are 16 4x4 cells, and each cell is measured in 8 directions. There's no meaningful order to the histogram bins (in fact, I'm not really sure of it myself), but because everything is relative to everything else, it works out in the end.


Feature Matching

Now it comes down to feature matching, where we find pairs that have similar feature profiles. I started by taking the sum of squared differences, which worked out modestly well, but then realized there was a knnsearch function which found the k nearest neighbors for me, only better.

However, rather than scoring on distances alone, I score on the ratio of the best match to the second-best match. On features that have high repetition (for example, the window bars, the central radial window, or the statues), there will exist many features with many good matches. So if the best match is not that distinctive from the second-best match, I discard it. The literature suggested using a value of 0.75 as my cutoff, but from trial and error I recalibrated this value to 0.9.
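With hypothetical N1x128 and N2x128 descriptor matrices features1 and features2 (one row per interest point), the whole matching step reduces to a few lines:

    % Find the two nearest neighbors in image 2 for each feature in image 1.
    [idx, dist] = knnsearch(features2, features1, 'K', 2);

    ratios = dist(:, 1) ./ dist(:, 2);     % best match vs. second-best match
    keep = ratios < 0.9;                   % the recalibrated cutoff

    matches = [find(keep), idx(keep, 1)];  % index pairs: image 1 -> image 2
    confidences = 1 - ratios(keep);        % lower ratio = more distinctive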

This resulted in my final images: of 100 total points in each image, I got 54 correct pairs and 46 incorrect.

54 percent accuracy


Possible Improvements

The best way to begin improving this score would be to also use the geometric structure of the images. That is, after finding some set of possible matches, estimate a likely transformation from one image to the other, and discard pairings that sit outside this transformation (this is essentially what RANSAC does). This would clean up many of the outliers, the points that clearly don't fit.
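A rough sketch of that idea, assuming the matched coordinates are available as column vectors xs1, ys1 and xs2, ys2, and with an illustrative trial count and pixel tolerance:

    % Repeatedly fit an affine transform to 3 random matches and keep
    % the transform with the largest consensus set.
    best_inliers = false(size(xs1));
    for trial = 1:1000
        p = randperm(numel(xs1));
        s = p(1:3);                                 % minimal affine sample
        A = [xs1(s), ys1(s), ones(3, 1)];
        T = A \ [xs2(s), ys2(s)];                   % least-squares affine fit
        proj = [xs1, ys1, ones(numel(xs1), 1)] * T; % apply to all matches
        err = hypot(proj(:, 1) - xs2, proj(:, 2) - ys2);
        inliers = err < 3;                          % 3-pixel tolerance
        if nnz(inliers) > nnz(best_inliers)
            best_inliers = inliers;
        end
    end
    matches = matches(best_inliers, :);             % drop geometric outliers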

More images, more fun!

I didn't have time to hand-annotate the other images with ground-truth correspondences, but I nonetheless ran some images from the internet through my feature detector, to see how well it could perform.

Falling Water, by Frank Lloyd Wright

Because this scene had a lot more foliage, there was a lot more background noise in the original two images. Thus, to produce a better result, I had to increase the Gaussian blur I used before running my Harris corner detector. Furthermore, because of the relatively small size of the images I found online, I could only generate 25 pairs of matching points.

Guernica, by Picasso

When it came to art, I wanted a style that would have sharp corners and definable features. So naturally, I wanted to see how my feature detector worked on Cubist paintings.