This project performs local feature matching on two images of the same object to find corresponding points in both images. The project consists of three parts: Interest Point Detection, Local Feature Description, and Feature Matching. The pipeline begins with Interest Point Detection, which finds key interest points using Harris Corner Detection. After identifying these interest points, we build a feature descriptor around each one, producing a feature vector that describes the image patch surrounding the point. Finally, we match features between the two images and return the matches with the highest confidence.
The interest point detector used for this project is the Harris Corner Detector. For every pixel, my algorithm computes the sums of the products of the image derivatives in a window around that pixel, and uses these values to calculate the corner response. I chose the constants for the corner response function and the response threshold by trying different values and visualizing the detected corner points.
Example of Interest Point Detection on Notre Dame
In the example image above, each circle on the image of Notre Dame displays the location of a corner detected by the Harris Corner Detector. Looking more closely at the image, most of the corners detected appear to be reasonable: for example, most of the corners of Notre Dame are identified as valid corners. In contrast, the sky and flat surfaces are free of corners, and my algorithm correctly reports that there are no corners in these locations.
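Below is a minimal sketch of this corner response computation in Python with NumPy and SciPy. The Gaussian window size, the Harris constant `alpha`, and the threshold shown here are illustrative placeholders rather than the exact values I settled on by visualization.

```python
import numpy as np
from scipy import ndimage

def harris_response(image, sigma=1.0, alpha=0.06):
    """Compute the Harris corner response for a grayscale float image."""
    # Image derivatives
    Ix = ndimage.sobel(image, axis=1, mode='reflect')
    Iy = ndimage.sobel(image, axis=0, mode='reflect')

    # Gaussian-weighted sums of the products of derivatives at each pixel
    Sxx = ndimage.gaussian_filter(Ix * Ix, sigma)
    Syy = ndimage.gaussian_filter(Iy * Iy, sigma)
    Sxy = ndimage.gaussian_filter(Ix * Iy, sigma)

    # Corner response R = det(M) - alpha * trace(M)^2
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - alpha * trace ** 2

def get_interest_points(image, threshold=0.01):
    """Keep pixels whose response exceeds a fraction of the maximum response."""
    R = harris_response(image)
    ys, xs = np.where(R > threshold * R.max())
    return xs, ys
```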
My algorithm returns the 100 most confident matches. To measure its effectiveness, I simply count how many of these top 100 matches are correct. With all of my changes included, the final result achieves 91% accuracy.
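The matching step can be sketched as follows. This is only an illustration: it assumes features are compared with Euclidean distance and ranked by the nearest-neighbor distance ratio, which may not be the exact confidence measure used in my implementation.

```python
import numpy as np

def match_features(feats1, feats2, top_k=100):
    """Match feature vectors between two images, keeping the top_k matches."""
    # Pairwise Euclidean distances between every feature in image 1 and image 2
    dists = np.linalg.norm(feats1[:, None, :] - feats2[None, :, :], axis=2)

    # For each feature in image 1, find its two nearest neighbors in image 2
    nn = np.argsort(dists, axis=1)
    d1 = dists[np.arange(len(feats1)), nn[:, 0]]  # closest distance
    d2 = dists[np.arange(len(feats1)), nn[:, 1]]  # second-closest distance

    # Ratio test: a small ratio means a distinctive, confident match
    ratios = d1 / (d2 + 1e-10)
    order = np.argsort(ratios)[:top_k]

    matches = np.stack([order, nn[order, 0]], axis=1)  # (index in img1, index in img2)
    confidences = 1.0 / (ratios[order] + 1e-10)
    return matches, confidences
```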
To gauge the effectiveness of the SIFT descriptors, I also implemented a simple baseline that uses normalized 16x16 image patches as the local feature. Normalized image patches give 55% accuracy compared to my final 91% accuracy, demonstrating that they can match points between two images with a reasonable level of accuracy. Normalized image patches are useful because they are sensible features that are simple and easy to implement. However, to obtain better accuracy, SIFT descriptors are preferred because they are more robust and more accurate.
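A sketch of this baseline is shown below. It assumes each patch is made zero-mean and scaled to unit length, which is a reasonable guess at the normalization rather than a confirmed detail, and it assumes interest points lie far enough from the image border for a full 16x16 patch.

```python
import numpy as np

def get_patch_features(image, xs, ys, width=16):
    """Baseline descriptor: a normalized 16x16 patch around each interest point."""
    half = width // 2
    feats = []
    for x, y in zip(xs, ys):
        patch = image[y - half:y + half, x - half:x + half].astype(float)
        patch = patch - patch.mean()                      # zero mean
        patch = patch / (np.linalg.norm(patch) + 1e-10)   # unit length
        feats.append(patch.flatten())                     # 256-dimensional vector
    return np.array(feats)
```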
To extract the SIFT feature vectors from the input image, I calculate the gradient at each pixel from its 4 surrounding pixels, giving the magnitude and angle of each pixel's gradient. As a preprocessing step before calculating the gradients, my algorithm blurs the image with a small Gaussian to remove extreme values and obtain more accurate gradients. Using these gradients, my algorithm extracts a 16x16 block around each key point found during Interest Point Detection and divides it into 4x4 cells. For each pixel in a 4x4 cell, the gradient is added to an 8-bin histogram describing the gradient profile of that cell. The size of the final feature vector is 4x4x8 = 128.
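The descriptor construction described above can be sketched as follows. The Gaussian blur width and the exact orientation binning are assumptions for illustration.

```python
import numpy as np
from scipy import ndimage

def get_sift_features(image, xs, ys):
    """Build a 128-dimensional SIFT-like descriptor around each interest point."""
    # Blur with a small Gaussian so per-pixel gradients are less noisy
    image = ndimage.gaussian_filter(image.astype(float), sigma=1.0)

    # Central differences give the gradient from the 4 surrounding pixels
    gy, gx = np.gradient(image)
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    angle = np.arctan2(gy, gx)  # in (-pi, pi]

    feats = []
    for x, y in zip(xs, ys):
        # 16x16 block of gradients around the keypoint
        mag = magnitude[y - 8:y + 8, x - 8:x + 8]
        ang = angle[y - 8:y + 8, x - 8:x + 8]

        descriptor = []
        for i in range(0, 16, 4):        # 4x4 grid of 4x4 cells
            for j in range(0, 16, 4):
                cell_mag = mag[i:i + 4, j:j + 4].flatten()
                cell_ang = ang[i:i + 4, j:j + 4].flatten()
                # 8-bin orientation histogram weighted by gradient magnitude
                hist, _ = np.histogram(cell_ang, bins=8,
                                       range=(-np.pi, np.pi), weights=cell_mag)
                descriptor.extend(hist)
        feats.append(np.array(descriptor))  # 4 * 4 * 8 = 128 values
    return np.array(feats)
```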
In addition, I used the normalization technique described in the textbook to obtain better results: the feature vector is normalized, thresholded at 0.2, and normalized again. Clipping values at 0.2 helps make the feature descriptor robust to outliers and extreme variations. Without this extra step, the accuracy of the feature matches for the Notre Dame images drops from 91% to 80%.
Finally, I raise the final feature vector to the power of 0.9. This reduces the contribution that large values make to the distance between descriptors. Including this change increases the accuracy by 3%.
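Both post-processing steps look roughly like this; the 0.2 clip and the 0.9 power come directly from the description above, while the epsilon guards are just for numerical safety.

```python
import numpy as np

def postprocess_features(feats, clip=0.2, power=0.9):
    """Normalize, clip, renormalize, then raise to a power below 1."""
    norms = np.linalg.norm(feats, axis=1, keepdims=True) + 1e-10
    feats = feats / norms                 # normalize to unit length
    feats = np.clip(feats, 0, clip)       # threshold large components at 0.2
    norms = np.linalg.norm(feats, axis=1, keepdims=True) + 1e-10
    feats = feats / norms                 # normalize again
    return feats ** power                 # damp the influence of large values
```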
![]()
![]()
![]()
![]()
The images above display the results of running my algorithm on multiple pairs of similar images. The first image displays the matches between two images of Notre Dame, with 91 of the top 100 matches correct. Looking at the images side by side, we see that many of the points accurately match the corresponding points in the other image. My algorithm works best when the two input images are similar in scale and orientation, because scale and orientation are not accounted for. Therefore, my code will not perform well on images where the picture is obviously tilted or the sizes do not match. However, if the two input images are similar and at the same scale, it can detect matching features with high accuracy, as shown in the examples above.