CS 143 / Project 2 / Local Feature Matching

The result of matching two images of Notre Dame in Paris. The points circled in green are the correct matches.

In this project, the task is to match local features between two images that contain the same object but were taken under slightly different conditions, meaning the lighting, color, and viewing angle may vary. Implementing this matching function requires three steps:

  1. Implement the Harris corner detector to find interest points that are potentially good for matching.
  2. Implement the SIFT (Scale Invariant Feature Transform) descriptor to generate a histogram of gradients around each interest point.
  3. Implement feature matching using the SIFT descriptors. We match points across the two images by computing the nearest neighbor distance ratio (NNDR) and accepting the nearest neighbor only when the ratio is low enough. A library-based sanity check of this pipeline is sketched just below.
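
For comparison only, the same three-step pipeline (detect, describe, match with a ratio test) can be sanity-checked with OpenCV's built-in functions. This is not the from-scratch implementation described in this report (OpenCV's SIFT also uses its own DoG detector rather than Harris corners), and the image filenames below are placeholders.

```python
import cv2

# Placeholder filenames for the two views of the same scene.
img1 = cv2.imread("notre_dame_1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("notre_dame_2.jpg", cv2.IMREAD_GRAYSCALE)

# Steps 1 and 2: detect keypoints and compute descriptors.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Step 3: nearest neighbor distance ratio test with a 0.75 threshold.
matcher = cv2.BFMatcher()
knn_matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in knn_matches if m.distance < 0.75 * n.distance]
print(f"{len(good)} matches pass the ratio test")
```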

In this project, we can get considerably different results by tuning several parameters, namely the threshold values. Here I discuss the best result of my experiments with the Notre Dame image set: 61 good matches and 36 bad matches, achieved by setting the nearest neighbor distance ratio threshold to 0.75 and the Harris corner detector threshold to 0.005. The parameter tuning process and its consequences are discussed in greater detail in the following sections.

Harris Corner Detector

The result of the corner detector with the threshold value set to 0.005.

In this phase of the project, we implement the Harris corner detector. Corners are the "special" points in an image that we can use for identification, whereas features like edges are not as distinctive for this purpose. We therefore use the detected corners as our interest points in the following phases.

The implementation first computes the image gradients along the x and y directions and the products of those gradients, then computes the corner response and performs non-maximal suppression to filter out points with weak responses. The points that remain are considered corners in the image.

Threshold value: an important parameter here is the response threshold used together with non-maximal suppression. We can tune this value to control how many interest points we keep: the larger the value, the fewer points we get, but those points have stronger responses (very obvious corners). The default value here is 0.005, which gives us about 1000 corners.
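
The following is a minimal NumPy/SciPy sketch of the detector described above. The kernel choices, the Gaussian sigma, the 7x7 suppression window, and the way the 0.005 threshold is applied (relative to the maximum response rather than as an absolute value) are my assumptions, not the exact code used in this project.

```python
import numpy as np
from scipy import ndimage

def harris_corners(img, sigma=1.0, alpha=0.06, threshold=0.005):
    """Return (rows, cols) of detected corners in a grayscale image."""
    # Image gradients along the x and y directions.
    Ix = ndimage.sobel(img, axis=1)
    Iy = ndimage.sobel(img, axis=0)

    # Smoothed products of gradients (entries of the second moment matrix).
    Ixx = ndimage.gaussian_filter(Ix * Ix, sigma)
    Iyy = ndimage.gaussian_filter(Iy * Iy, sigma)
    Ixy = ndimage.gaussian_filter(Ix * Iy, sigma)

    # Harris corner response: det(M) - alpha * trace(M)^2.
    response = Ixx * Iyy - Ixy ** 2 - alpha * (Ixx + Iyy) ** 2

    # Discard weak responses, then keep only local maxima (non-max suppression).
    response[response < threshold * response.max()] = 0
    local_max = ndimage.maximum_filter(response, size=7)
    corners = (response == local_max) & (response > 0)
    return np.nonzero(corners)
```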

SIFT Descriptor

In this phase of the project, we implement the SIFT (Scale Invariant Feature Transform) descriptor. For each interest point, we extract the 16x16 patch around it and compute the gradient at every pixel, applying a Gaussian fall-off function to reduce the influence of pixels near the edges of the patch. The patch is then divided into 4x4-pixel cells, and for each cell we build an 8-bin histogram, accumulating the gradients into bins according to their directions. The concatenated histograms of gradients form the descriptor for each patch.

!NOTE! In my implementation, I implemented two different binning methods. The default method has each gradient contribute to a single bin; the other method (which is commented out) has each gradient contribute to the two nearest orientation bins. In my experiments, the two methods generate roughly the same results.
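
Below is a minimal sketch of the descriptor described above, using the default "one gradient, one bin" method. The Gaussian sigma, the bin edges, the assumption that the interest point lies at least 8 pixels from the image border, and the final unit-length normalization (a standard SIFT step not discussed above) are my assumptions.

```python
import numpy as np
from scipy import ndimage

def gaussian_window(size, sigma):
    """2D Gaussian weights used as the fall-off over the patch."""
    coords = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-coords ** 2 / (2 * sigma ** 2))
    return np.outer(g, g)

def sift_descriptor(img, row, col):
    """128-d descriptor from the 16x16 patch centered near (row, col)."""
    # Assumes (row, col) is at least 8 pixels away from the image border.
    patch = img[row - 8:row + 8, col - 8:col + 8]

    # Gradient magnitude (Gaussian-weighted) and orientation at each pixel.
    gy = ndimage.sobel(patch, axis=0)
    gx = ndimage.sobel(patch, axis=1)
    magnitude = np.hypot(gx, gy) * gaussian_window(16, sigma=8.0)
    orientation = np.arctan2(gy, gx)  # in (-pi, pi]

    descriptor = []
    for cr in range(0, 16, 4):          # 4x4 grid of 4x4-pixel cells
        for cc in range(0, 16, 4):
            cell_mag = magnitude[cr:cr + 4, cc:cc + 4].ravel()
            cell_ori = orientation[cr:cr + 4, cc:cc + 4].ravel()
            # 8 orientation bins; each gradient votes for exactly one bin,
            # weighted by its (Gaussian-attenuated) magnitude.
            hist, _ = np.histogram(cell_ori, bins=8, range=(-np.pi, np.pi),
                                   weights=cell_mag)
            descriptor.append(hist)
    descriptor = np.concatenate(descriptor)  # 16 cells x 8 bins = 128 values
    # Normalize to unit length to reduce sensitivity to lighting changes.
    return descriptor / (np.linalg.norm(descriptor) + 1e-8)
```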

Feature Matching

In this phase of the implementation, matching is done by calculating the nearest neighbor distance ratio (NNDR) for each feature and accepting its nearest neighbor when the ratio falls below a threshold.

Threshold: an important threshold value in this phase is the threshold on the ratio. The default value is 0.75, which gives us 61 good matches and 36 bad matches. If we want more matches, we can increase the threshold: setting it to 0.8 gives us 84 good matches, though the number of bad matches increases slightly as well, and setting it to 0.9 gives us 135 good matches.
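
The sketch below illustrates the ratio test with the 0.75 default discussed above, assuming the descriptors of each image are stored as rows of two NumPy arrays; the function and parameter names are mine, not the exact code used in this project.

```python
import numpy as np

def match_features(feats1, feats2, ratio_threshold=0.75):
    """Return (index1, index2) pairs whose NNDR is below the threshold."""
    matches = []
    for i, f in enumerate(feats1):
        # Euclidean distances from this feature to every feature in image 2.
        dists = np.linalg.norm(feats2 - f, axis=1)
        nearest, second = np.argsort(dists)[:2]
        # Accept the nearest neighbor only if it is clearly better than the
        # second nearest: a low distance ratio means a more distinctive match.
        if dists[nearest] < ratio_threshold * dists[second]:
            matches.append((i, nearest))
    return np.array(matches)
```

Raising ratio_threshold admits more (but less reliable) matches, which mirrors the behavior reported above when moving the threshold from 0.75 to 0.8 and 0.9.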

Results of Different Image Set

The result of the original Notre Dame image set with the default parameters, which contains 61 good matches and 36 bad matches.

The result of the original Notre Dame image set with the nearest neighbor distance ratio threshold set to 0.8. This gives us 84 good matches and 58 bad matches.

The result of the house image set. From the picture we can see that more than half of the matches are correct.

The result of the Gaudi image set. From the picture we can see that more than half of the matches are correct.