Feature matching refers to the act of recognizing features of the same object across images taken from slightly different viewpoints. The goal of Project 2: Local Feature Matching was to implement an instance of the local feature matching pipeline: (1) detect a set of distinctive key points; (2) define a region around each key point; (3) extract and normalize the region's contents; (4) compute a local descriptor from the normalized region; (5) match the local descriptors across images.
Implementations of this pipeline tend to blur the boundaries between a few of these steps. In my own implementation, steps (2) and (3) amount to little more than setting a parameter (the window size), while steps (1), (4), and (5) are of greater interest.
There are a number of different ways to find key points in an image. I elected to use Harris corner detection, largely due to its simplicity.
The Harris corner detection algorithm is based on a somewhat intuitive fact: image intensities adjacent to an object corner are generally dissimilar to the intensities at the corner itself. One way to find corners, then, is to compute image gradients and determine where the intensity changes are large in all directions: a point whose neighborhood changes strongly in every direction is likely an object corner, while one that changes strongly in only a single direction is more likely part of an edge.
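To make this concrete, here is a minimal sketch of the response computation in Python. The function name, the Sobel/Gaussian filter choices, and the constant k = 0.04 are illustrative assumptions rather than the exact details of my implementation; key points would then be selected as thresholded local maxima of this response.

```python
import numpy as np
from scipy import ndimage

def harris_response(image, k=0.04, sigma=1.0):
    """Compute the Harris corner response for a grayscale image (a sketch)."""
    image = np.asarray(image, dtype=float)

    # Image gradients via Sobel filtering.
    Ix = ndimage.sobel(image, axis=1)
    Iy = ndimage.sobel(image, axis=0)

    # Entries of the second-moment matrix, smoothed over a Gaussian window.
    Sxx = ndimage.gaussian_filter(Ix * Ix, sigma)
    Syy = ndimage.gaussian_filter(Iy * Iy, sigma)
    Sxy = ndimage.gaussian_filter(Ix * Iy, sigma)

    # Harris response: det(M) - k * trace(M)^2, large at corners.
    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    return det - k * trace ** 2
```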
To the right are the (magnified) results of my Harris corner detector, run on the left image of the Notre Dame cathedral shown above. The detector tends to fire at the corners and edges of the building and at the round, paned glass window in its center.
As with detecting key points, there are quite a few ways to compute a local descriptor for a key point. A local descriptor is a set of data that characterizes a key point, i.e. information that makes it stand apart from other key points.
For this project, I implemented SIFT (scale-invariant feature transform) descriptors. A SIFT descriptor is computed from the intensity gradients in a small window (16x16 pixels in my implementation) surrounding a point of interest (key point). This 16x16 window is subdivided into a 4x4 grid of 4x4-pixel regions; the gradients of the pixels in each region are binned by orientation into an eight-bin histogram, resulting in an 8x4x4 = 128-dimensional descriptor.
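The binning structure described above can be sketched as follows. This is a simplified illustration rather than full SIFT: it omits the Gaussian weighting, rotation invariance, and trilinear interpolation of the original formulation, and the function name and magnitude-weighted binning are assumptions.

```python
import numpy as np
from scipy import ndimage

def sift_like_descriptor(image, y, x, window=16, cells=4, bins=8):
    """Build a SIFT-like descriptor for the key point at (y, x) (a sketch)."""
    image = np.asarray(image, dtype=float)

    # Gradients over the whole image (see the ordering note later on).
    dy = ndimage.sobel(image, axis=0)
    dx = ndimage.sobel(image, axis=1)
    mag = np.hypot(dx, dy)
    ori = np.arctan2(dy, dx)            # orientation in [-pi, pi]

    half = window // 2                  # assumes (y, x) lies at least
    cell = window // cells              # `half` pixels from the border
    desc = np.zeros((cells, cells, bins))
    for i in range(cells):
        for j in range(cells):
            ys = slice(y - half + i * cell, y - half + (i + 1) * cell)
            xs = slice(x - half + j * cell, x - half + (j + 1) * cell)
            # Map each pixel's orientation to one of `bins` histogram
            # bins, weighted by its gradient magnitude.
            idx = ((ori[ys, xs] + np.pi) / (2 * np.pi) * bins).astype(int) % bins
            np.add.at(desc[i, j], idx.ravel(), mag[ys, xs].ravel())

    desc = desc.ravel()                 # 4 x 4 x 8 = 128 dimensions
    return desc / (np.linalg.norm(desc) + 1e-8)
```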
The last step, matching local descriptors from one image to local descriptors in the other, is non-trivial but also not especially interesting.
Matching descriptors is a two-step process: compute the distance between every descriptor pairing across the two images, then decide which pairings to accept as matches.
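Below is a minimal sketch of one common way to carry out both steps: brute-force distance computation followed by the nearest-neighbor distance ratio test. The 0.8 threshold is a conventional default, not a value taken from this project.

```python
import numpy as np

def match_descriptors(desc1, desc2, ratio=0.8):
    """Match two descriptor sets using the distance ratio test (a sketch)."""
    # Pairwise Euclidean distances between every pair of descriptors.
    dists = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)

    matches = []
    for i, row in enumerate(dists):
        nearest, second = np.argsort(row)[:2]   # assumes len(desc2) >= 2
        # Accept only matches clearly better than the runner-up.
        if row[nearest] < ratio * row[second]:
            matches.append((i, nearest))
    return matches
```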
The decision to use the algorithms described previously was made early on. Particular details of the implementation, however, made large differences in the performance of this feature-matching pipeline.
*(Image pair: matched points before the change, left, and after, right.)*
The above pair of images demonstrates one particular change that produced a nearly 10% improvement in matching performance. The first image shows matched points before the change; the second shows them afterwards.
As stated above, a SIFT descriptor is computed from the gradients in a 16x16 window surrounding the point of interest. My early implementations extracted the window from the image first, then performed filtering operations on the window to compute the gradients. This order of computation causes inaccuracies at the edges of each window: filtering operations must interpolate pixel data beyond the edge of a window, using one of several extension strategies, so the computed gradient at those edges was not necessarily equal to the true gradient. Reversing the order (filtering the entire image once, then extracting each window from the filtered result) produced a nearly 10% improvement, to about 87% accuracy, as shown in the images above.
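The two orderings can be sketched as follows; SciPy's Sobel filter stands in here for whichever gradient filter is actually used, and in practice the full-image gradients would be computed once and reused across all key points rather than recomputed per window.

```python
import numpy as np
from scipy import ndimage

# Inaccurate ordering: crop first, then filter. The filter must invent
# pixel values beyond the 16x16 window's border (e.g. by reflection),
# so gradients near the window edges deviate from the true gradients.
def gradients_crop_then_filter(image, y, x, half=8):
    patch = image[y - half:y + half, x - half:x + half]
    return ndimage.sobel(patch, axis=1), ndimage.sobel(patch, axis=0)

# Improved ordering: filter the whole image, then crop. Every pixel in
# the window gets a gradient computed from its true image neighbors.
def gradients_filter_then_crop(image, y, x, half=8):
    dx = ndimage.sobel(image, axis=1)
    dy = ndimage.sobel(image, axis=0)
    window = (slice(y - half, y + half), slice(x - half, x + half))
    return dx[window], dy[window]
```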