For the most part, I tried to emulate the basics of the SIFT pipeline: detecting keypoints, creating descriptors for those keypoints, and then matching them against the descriptors for another image. Overall, the basic implementation worked to a reasonable extent (accuracy generally greater than 50%).
This part was modeled after the Harris corner detection algorithm. The following process was used to determine where the interest points were:
The three parameters I was able to tweak here were alpha, the variance of the second Gaussian filter, and the threshold. The values that gave the best accuracy were 0.04 for alpha, 2 for the Gaussian variance, and 1E-7 for the threshold. The threshold value helped me detect enough keypoints for testing purposes. While testing the effect of each variable, all other variables were held fixed. Bringing the Gaussian variance closer to 2 dramatically improved the matching accuracy: going from 5 to 3 to 2, accuracy rose from ~60% to ~78% to ~86%.
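As a rough sketch of how these parameters fit together, the Harris response could be computed along these lines (this is my own reconstruction in NumPy/SciPy, not the project code verbatim; in particular, the useful scale of the threshold depends on how the image and gradients are normalized):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def harris_response(image, alpha=0.04, variance=2.0):
    """Harris corner response R = det(M) - alpha * trace(M)^2 at each pixel."""
    ix = sobel(image, axis=1)   # horizontal gradient
    iy = sobel(image, axis=0)   # vertical gradient
    sigma = np.sqrt(variance)   # the "second" Gaussian smooths the products
    ixx = gaussian_filter(ix * ix, sigma)
    iyy = gaussian_filter(iy * iy, sigma)
    ixy = gaussian_filter(ix * iy, sigma)
    return (ixx * iyy - ixy ** 2) - alpha * (ixx + iyy) ** 2

def get_interest_points(image, threshold=1e-7):
    # A threshold this small presumes the image is scaled to [0, 1]
    r = harris_response(image)
    ys, xs = np.nonzero(r > threshold)  # keep responses above the threshold
    return xs, ys
```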
The following process was used to create the keypoint descriptors:
One change I made was using an oriented filter for each bin instead of assigning gradients to bins by their arctan orientation. The image was convolved with eight slightly modified Sobel filters, obtained by transposing, multiplying by -1, and/or diagonalizing the base kernel. After clamping any negative magnitudes to zero, the summed magnitudes of the convolutions were placed in the bins. This improved matching accuracy by about 40%.
Switching from computing the gradient orientation at each pixel and assigning it to a single bin, to using oriented filters that let each pixel contribute to multiple bins, improved matching substantially. With the former method, many bins in a 4x4 grid would be empty: most pixels in a cell cluster around a narrow range of theta, so they all fall into one bin. That made matching more difficult because there was less information to compare.
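A minimal sketch of this oriented-filter binning, assuming a 16x16 patch split into a 4x4 grid of cells with eight orientation bins each (the names and patch geometry here are my own illustration, not necessarily the project's exact code):

```python
import numpy as np
from scipy.ndimage import convolve

# Eight oriented filters built from a base Sobel kernel by transposing,
# negating, and/or flipping onto the other diagonal
SX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
D  = np.array([[ 0, 1, 2], [-1, 0, 1], [-2, -1, 0]], dtype=float)
KERNELS = [SX, -SX, SX.T, -SX.T, D, -D, np.flipud(D), -np.flipud(D)]

def describe_keypoint(image, x, y):
    """128-dim descriptor: 4x4 grid of cells, 8 oriented-filter bins per cell.
    Assumes the keypoint is at least 8 pixels from the image border."""
    # Filter responses with negative magnitudes clamped to zero
    responses = [np.maximum(convolve(image, k), 0.0) for k in KERNELS]
    desc = []
    for cy in range(4):
        for cx in range(4):
            r0 = y - 8 + 4 * cy   # top-left corner of this 4x4 cell
            c0 = x - 8 + 4 * cx
            for resp in responses:
                # Sum of clamped magnitudes in this cell goes into one bin
                desc.append(resp[r0:r0 + 4, c0:c0 + 4].sum())
    desc = np.asarray(desc)
    return desc / (np.linalg.norm(desc) + 1e-12)  # unit-normalize
```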
One suggested technique was clamping and re-normalization. I decided not to go this route because it decreased accuracy for my implementation overall, possibly because clamping introduced false positives in certain areas of my features.
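For reference, the clamp-and-renormalize step I tried would look roughly like this (the 0.2 clamp value is the conventional SIFT choice, shown here purely for illustration):

```python
import numpy as np

def clamp_and_renormalize(desc, clamp=0.2):
    # Cap large entries, then re-normalize the descriptor to unit length
    desc = np.minimum(desc, clamp)
    return desc / (np.linalg.norm(desc) + 1e-12)
```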
The one parameter in this stage of the pipeline was the power to which the feature matrix was raised at the end. Accuracy peaked with the power around 0.5.
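Concretely, this is just an element-wise power applied to the (non-negative) descriptor matrix; a one-line sketch, with `features` assumed to be the N x 128 descriptor matrix:

```python
features = features ** 0.5  # element-wise power; ~0.5 gave the best accuracy
```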
The following process was used to match features:
The parameter I was able to tweak here was the matching threshold; the optimal value seemed to be around 0.6.
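Assuming this threshold gates a nearest-neighbor distance ratio test (my reading of the matching step, not stated explicitly above), a sketch might look like:

```python
import numpy as np

def match_features(f1, f2, threshold=0.6):
    """Match descriptor rows of f1 to rows of f2 with a distance-ratio test."""
    # Pairwise Euclidean distances between every descriptor pair
    dists = np.linalg.norm(f1[:, None, :] - f2[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(dists):
        nearest, second = np.argsort(row)[:2]
        # Accept only if the best match is clearly better than the runner-up
        if row[nearest] < threshold * row[second]:
            matches.append((i, nearest))
    return matches
```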
The basic implementation of the SIFT pipeline was tested on the Notre Dame image set.
On this image set, 127 matches were good and 20 were bad, for an overall accuracy of 86.39%. That falls short of 90%, but it is well above 50%, which is acceptable.
I also tried this basic implementation on some of the other image sets, and the initial results were poor. Most of the pairs do share common keypoints, and judging by the color coding of the points and where they sit in each image, they look like they should match. However, this SIFT implementation does not account for differences in scale and orientation, so it cannot easily pick up these matching keypoints.
In addition, the parameters were tuned (as best I could tell) for the Notre Dame image set. Tweaking them for the other sets did not yield good keypoint detection, and changing the correspondence threshold does not really help either, again because of the unhandled scale and orientation differences.
Originally, I had the Gaussian variance set to 5, and the following images were produced with that value. On these image sets, changing it to 2 actually increased the number of bad matches.