For the most part, I tried to emulate the basics of the SIFT pipeline: detecting keypoints, creating descriptors for those keypoints, and then matching them against the descriptors for another image. Overall, the basic implementation worked to a reasonable extent (accuracy generally greater than 50%).
This part was modeled after the Harris corner detection algorithm. The following process was used to determine where the interest points were:
The three parameters I was able to tweak here were alpha, the variance of the second Gaussian filter, and the threshold. The values that gave the best accuracy were 0.04 for alpha, 2 for the Gaussian variance, and 1E-7 for the threshold. The threshold value helped me detect enough keypoints for testing purposes. While testing the effect of each variable, all other variables were held fixed. Bringing the Gaussian variance closer to 2 dramatically improved the matching accuracy: going from 5 to 3 to 2, accuracy rose from ~60% to ~78% to ~86%.
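As a rough sketch of how these parameters fit together, the Harris response could be computed along these lines (this is my own reconstruction in NumPy/SciPy, not the project code verbatim; in particular, the useful scale of the threshold depends on how the image and gradients are normalized):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def harris_response(image, alpha=0.04, variance=2.0):
    """Harris corner response R = det(M) - alpha * trace(M)^2 at each pixel."""
    ix = sobel(image, axis=1)   # horizontal gradient
    iy = sobel(image, axis=0)   # vertical gradient
    sigma = np.sqrt(variance)   # the "second" Gaussian smooths the products
    ixx = gaussian_filter(ix * ix, sigma)
    iyy = gaussian_filter(iy * iy, sigma)
    ixy = gaussian_filter(ix * iy, sigma)
    return (ixx * iyy - ixy ** 2) - alpha * (ixx + iyy) ** 2

def get_interest_points(image, threshold=1e-7):
    # A threshold this small presumes the image is scaled to [0, 1]
    r = harris_response(image)
    ys, xs = np.nonzero(r > threshold)  # keep responses above the threshold
    return xs, ys
```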
The following process was used to create the keypoint descriptors:
One change I made was using an oriented filter for each bin instead of assigning gradients to bins by their arctan orientation. The image was convolved with eight slightly modified Sobel filters, obtained by transposing, multiplying by -1, and/or diagonalizing the base kernel. After clamping any negative magnitudes to zero, the summed magnitudes of the convolutions were placed in the bins. This improved matching accuracy by about 40%.
Switching from computing the gradient orientation at each pixel and assigning it to a single bin, to using oriented filters that let each pixel contribute to multiple bins, improved matching substantially. With the former method, many bins in a 4x4 grid would be empty: most pixels in a cell cluster around a narrow range of theta, so they all fall into one bin. That made matching more difficult because there was less information to compare.
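A minimal sketch of this oriented-filter binning, assuming a 16x16 patch split into a 4x4 grid of cells with eight orientation bins each (the names and patch geometry here are my own illustration, not necessarily the project's exact code):

```python
import numpy as np
from scipy.ndimage import convolve

# Eight oriented filters built from a base Sobel kernel by transposing,
# negating, and/or flipping onto the other diagonal
SX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
D  = np.array([[ 0, 1, 2], [-1, 0, 1], [-2, -1, 0]], dtype=float)
KERNELS = [SX, -SX, SX.T, -SX.T, D, -D, np.flipud(D), -np.flipud(D)]

def describe_keypoint(image, x, y):
    """128-dim descriptor: 4x4 grid of cells, 8 oriented-filter bins per cell.
    Assumes the keypoint is at least 8 pixels from the image border."""
    # Filter responses with negative magnitudes clamped to zero
    responses = [np.maximum(convolve(image, k), 0.0) for k in KERNELS]
    desc = []
    for cy in range(4):
        for cx in range(4):
            r0 = y - 8 + 4 * cy   # top-left corner of this 4x4 cell
            c0 = x - 8 + 4 * cx
            for resp in responses:
                # Sum of clamped magnitudes in this cell goes into one bin
                desc.append(resp[r0:r0 + 4, c0:c0 + 4].sum())
    desc = np.asarray(desc)
    return desc / (np.linalg.norm(desc) + 1e-12)  # unit-normalize
```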
One suggested technique was clamping and re-normalization. I decided not to go this route because it decreased accuracy for my implementation overall, possibly because clamping introduced false positives in certain areas of my features.
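For reference, the clamp-and-renormalize step I tried would look roughly like this (the 0.2 clamp value is the conventional SIFT choice, shown here purely for illustration):

```python
import numpy as np

def clamp_and_renormalize(desc, clamp=0.2):
    # Cap large entries, then re-normalize the descriptor to unit length
    desc = np.minimum(desc, clamp)
    return desc / (np.linalg.norm(desc) + 1e-12)
```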
The one parameter in this stage of the pipeline was the power to which the feature matrix was raised at the end. Accuracy peaked with the power around 0.5.
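Concretely, this is just an element-wise power applied to the (non-negative) descriptor matrix; a one-line sketch, with `features` assumed to be the N x 128 descriptor matrix:

```python
features = features ** 0.5  # element-wise power; ~0.5 gave the best accuracy
```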
The following process was used to match features:
The parameter I was able to tweak here was the matching threshold; the optimal value seemed to be around 0.6.
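Assuming this threshold gates a nearest-neighbor distance ratio test (my reading of the matching step, not stated explicitly above), a sketch might look like:

```python
import numpy as np

def match_features(f1, f2, threshold=0.6):
    """Match descriptor rows of f1 to rows of f2 with a distance-ratio test."""
    # Pairwise Euclidean distances between every descriptor pair
    dists = np.linalg.norm(f1[:, None, :] - f2[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(dists):
        nearest, second = np.argsort(row)[:2]
        # Accept only if the best match is clearly better than the runner-up
        if row[nearest] < threshold * row[second]:
            matches.append((i, nearest))
    return matches
```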
The basic implementation of the SIFT pipeline was tested on the Notre Dame image set.
On this image set, 127 matches were good and 20 were bad, for an overall accuracy of 86.39%. That falls short of 90%, but it is well above 50%, which is acceptable.
I also tried this basic implementation on some of the other image sets, and the initial results were poor. Most of the pairs do share common keypoints, and judging by the color coding of the points and where they sit in each image, they look like they should match. However, this SIFT implementation does not account for differences in scale and orientation, so it cannot easily pick up these matching keypoints.
In addition, the parameters were tuned (as best I could tell) for the Notre Dame image set. Tweaking them for the other sets did not yield good keypoint detection, and changing the correspondence threshold does not really help either, again because of the unhandled scale and orientation differences.
Originally, I had the Gaussian variance set to 5, and the following images were produced with that value. On these image sets, changing it to 2 actually increased the number of bad matches.