The goal of this assignment is to create a local feature matching algorithm using techniques described in Szeliski chapter 4.1. The pipeline we suggest is a simplified version of the famous SIFT pipeline. The matching pipeline is intended to work for instance-level matching -- multiple views of the same physical scene.
In particular, there are three parts to this project: detecting interest points with the Harris corner detector, describing them with a SIFT-like local descriptor, and matching the descriptors between images. The Harris cornerness function comes from a first-order approximation of the small-shift (auto-correlation) error. After simplifying, you find that the cornerness score of a pixel is HC(x,y) = g(f_x^2) * g(f_y^2) - g(f_x * f_y)^2 - alpha * [g(f_x^2) + g(f_y^2)]^2, where f_x and f_y are the image gradients and g(.) denotes Gaussian blurring. Here are some pictures visualizing what those filtered images look like.
*(Figures: visualizations of the filtered gradient images.)*
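To make the cornerness formula concrete, here is a minimal MATLAB sketch of the computation. The Gaussian size, the sigma, and the value of alpha are illustrative assumptions on my part rather than the exact values used in this project.

```matlab
% Sketch of the Harris cornerness score HC(x,y). The Gaussian
% parameters and alpha below are assumed values for illustration.
function har = harris_cornerness(image, alpha)
    sobel = fspecial('sobel');
    fx = imfilter(image, sobel');    % horizontal gradient f_x
    fy = imfilter(image, sobel);     % vertical gradient f_y
    g = fspecial('gaussian', 9, 2);  % g(.) is a Gaussian blur
    gxx = imfilter(fx .^ 2, g);      % g(f_x^2)
    gyy = imfilter(fy .^ 2, g);      % g(f_y^2)
    gxy = imfilter(fx .* fy, g);     % g(f_x * f_y)
    % HC = g(f_x^2)*g(f_y^2) - g(f_x*f_y)^2 - alpha*[g(f_x^2) + g(f_y^2)]^2
    har = gxx .* gyy - gxy .^ 2 - alpha * (gxx + gyy) .^ 2;
end
```

Interest points would then come from thresholding this score and keeping local maxima, e.g. with imregionalmax.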
Afterwards, create a SIFT local descriptor for each of these interest points. Basically, the idea of a SIFT local descriptor is to capture the local gradient structure of an image patch. This is done by computing the image's gradient response in the eight compass directions and then summing the magnitudes of those responses over each of the 16 subpatches of the original image patch, giving a 128-dimensional feature vector. For my implementation, I just convolved the image with a rotated Sobel filter for each direction.
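Here is a minimal sketch of that descriptor step, assuming a grayscale double image and interest points at least 8 pixels from the border; the function name, the 16x16 window, and the unit normalization are my assumptions rather than details taken from this report.

```matlab
% Build a 128-d SIFT-like descriptor for each interest point (x, y).
function features = get_features(image, x, y)
    sobel = fspecial('sobel');
    % Response to a rotated Sobel filter in each of the 8 compass directions
    responses = zeros([size(image), 8]);
    for d = 1:8
        kernel = imrotate(sobel, 45 * (d - 1), 'bilinear', 'crop');
        % Keep only the positive response, treating it as a magnitude
        responses(:, :, d) = max(imfilter(image, kernel), 0);
    end
    features = zeros(numel(x), 128);
    for i = 1:numel(x)
        % 16x16 window centered on the interest point
        win = responses(round(y(i)) - 7 : round(y(i)) + 8, ...
                        round(x(i)) - 7 : round(x(i)) + 8, :);
        vec = zeros(1, 128);
        for r = 1:4
            for c = 1:4
                % Sum each direction's magnitude over this 4x4 subpatch
                sub = win(4*r-3 : 4*r, 4*c-3 : 4*c, :);
                vec((4*(r-1) + c - 1) * 8 + (1:8)) = ...
                    reshape(sum(sum(sub, 1), 2), 1, 8);
            end
        end
        features(i, :) = vec / norm(vec);  % unit-normalize (an assumption)
    end
end
```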
I created another set of visualizations to better explain this process. The three panels in each image show the image patch, the patch's response to one of the filters, and the feature vector reshaped into a square. As you can see, the feature vector looks somewhat like a pixelated version of the filter response.
*(Figures: descriptor visualizations — patch, filter response, and reshaped feature vector.)*
And not that it tells you anything, but here's the feature vector in all its imagesc() glory.
For matching, you just do a nearest-neighbor search with any distance function of your choice. I chose to use knnsearch, which uses the Euclidean distance function by default; a minimal sketch follows. Below that is the output for the Notre Dame test case.
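Here is the sketch, with features1 and features2 standing in for the two images' descriptor matrices; the variable names and the inverse-distance confidence are assumptions for illustration.

```matlab
% Nearest-neighbor matching with knnsearch (Euclidean distance by default).
% features1 is N1x128, features2 is N2x128.
[idx, dist] = knnsearch(features2, features1);
matches = [(1:size(features1, 1))', idx];  % [index into 1, index into 2]
confidences = 1 ./ (dist + eps);           % closer match = higher confidence
% Sort so the most confident matches come first
[confidences, order] = sort(confidences, 'descend');
matches = matches(order, :);
```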
My algorithm generally does well with images taken from similar viewpoints under similar lighting conditions. As you can see in the first three examples, the color distribution of dots is usually the same within a neighborhood, meaning that the correct pairs of points were matched together.
*(Figures: three example matching results, including the Notre Dame test case.)*
On the other hand, it performs a lot worse under different lighting conditions and considerably changed viewing angles, which is a known limitation of the detector. The image subject's relationship with the background also has a huge effect. In the following Statue of Liberty example, you can see that the algorithm picks totally different points because in one image the sky is brighter than the statue, while in the other the statue is brighter than the sky. The following are two cases on which the algorithm did not perform well.
*(Figures: two failure cases, including the Statue of Liberty pair.)*
It is certainly an effective algorithm under the right conditions, but for harder cases more preprocessing of the input images may be needed.