Notre Dame Test Case
The Harris corner detection algorithm was implemented fairly faithfully. The one addition: since the SIFT descriptor uses 16x16 blocks, the interest points were given integer-plus-a-half coordinates, offset toward each local maximum's highest-valued neighbor. This would matter more if the algorithm were robust to rotation, in which case an accurate center for the descriptor would be important. A better strategy would be to use bilinear interpolation, both for choosing the interest points and for the input to the SIFT descriptor. Performance was heavily affected by the blurring applied after squaring the differentials: oddly, a smaller blur caused the program to pick less corner-like background features, which I still can't explain.
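For concreteness, here is a minimal sketch of a Harris detector in Python with NumPy/SciPy. The function name, the Sobel gradients, and the relative-threshold scheme are illustrative assumptions rather than the project's actual code; the `sigma` parameter is the blur applied after squaring the differentials, the step that proved so sensitive.

```python
import numpy as np
from scipy import ndimage

def harris_corners(image, sigma=1.0, k=0.04, rel_threshold=0.01):
    """Hypothetical Harris detector sketch, not the project's code."""
    image = image.astype(float)

    # Image derivatives (Sobel is one common choice)
    Ix = ndimage.sobel(image, axis=1)
    Iy = ndimage.sobel(image, axis=0)

    # Squared differentials, then the Gaussian blur whose width
    # turned out to matter so much for which features get picked
    Sxx = ndimage.gaussian_filter(Ix * Ix, sigma)
    Syy = ndimage.gaussian_filter(Iy * Iy, sigma)
    Sxy = ndimage.gaussian_filter(Ix * Iy, sigma)

    # Harris response R = det(M) - k * trace(M)^2 at every pixel
    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    R = det - k * trace * trace

    # Keep local maxima above a fraction of the strongest response
    is_max = R == ndimage.maximum_filter(R, size=3)
    corners = np.argwhere(is_max & (R > rel_threshold * R.max()))
    return corners  # (row, col) pairs; the half-pixel offset would go here
```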
Basic SIFT descriptors were implemented, and each value in the resulting vectors was square-rooted before normalization to diminish the effect of outliers.
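The square-root step is simple enough to show directly. This is a minimal sketch assuming a NumPy descriptor vector, with `root_normalize` and `eps` as made-up names; square-rooting compresses large bins more than small ones, which is what damps the outliers.

```python
import numpy as np

def root_normalize(descriptor, eps=1e-7):
    """Hypothetical helper: square-root each bin, then L2-normalize."""
    d = np.sqrt(np.maximum(descriptor, 0.0))  # sqrt shrinks big bins the most
    return d / (np.linalg.norm(d) + eps)      # unit length for fair distances
```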
Each feature was matched to its nearest neighbor iff it was its nearest neighbor's nearest neighbor, and both the feature and its nearest neighbor had a nearest-neighbor distance ratio (NNDR) below a threshold. A threshold of about 0.95 seemed best, which is rather high and suggests that the first two steps need improvement.
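A sketch of that matching rule, assuming the descriptors sit in NumPy arrays with one row per feature; the brute-force distance matrix and the `match_features` name are illustrative, not the project's implementation.

```python
import numpy as np

def match_features(desc1, desc2, nndr_threshold=0.95):
    """Hypothetical mutual-nearest-neighbor matcher with an NNDR test.

    Assumes each descriptor set has at least two rows.
    """
    # Pairwise Euclidean distances: dists[i, j] = ||desc1[i] - desc2[j]||
    dists = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)

    order1 = np.argsort(dists, axis=1)  # for each feature in image 1, nearest in image 2
    order2 = np.argsort(dists, axis=0)  # for each feature in image 2, nearest in image 1

    matches = []
    for i in range(desc1.shape[0]):
        j = order1[i, 0]
        if order2[0, j] != i:
            continue  # not mutual nearest neighbors
        # NNDR on both sides: nearest distance / second-nearest distance
        ratio1 = dists[i, j] / (dists[i, order1[i, 1]] + 1e-12)
        ratio2 = dists[i, j] / (dists[order2[1, j], j] + 1e-12)
        if ratio1 < nndr_threshold and ratio2 < nndr_threshold:
            matches.append((i, j))
    return matches
```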
For reasonably similar image pairs, about 70% of the proposed matches were true positives (see the Notre Dame pair above). The algorithm works surprisingly well on static objects like rocks and buildings, and fails miserably (as expected) on more dynamic objects, like faces. Given two unrelated pictures of Half Dome from a Google search, the algorithm successfully matched a major horizontal feature of the cliff face and a false peak. Given two photos of Stephen Colbert, however, no successful matches were made, because the algorithm only takes low-level features like gradients into account.