The two images that we will be working with: two views of the Notre Dame cathedral.
The problem of image matching, where both images are multiple views of the same scene can be accomplished using the SIFT pipeline. Here we attempt to match images using a simplified version of the pipline, broken into three steps.
- Finding distinct 'keypoints' within both images
- Identifying each keypoint's 'features'
- Matching features from both images.
To begin, we have to find distinct keypoints within each image. One such method to do so was proposed by Harris & Stephens, in which they described a method to locate corner and edge points in an image, which are well-suited to serve as our keypoints. We compute a 'cornerness score' for each pixel in the image from the eigenvalues of a self-correlation matrix, derived from the gradient of the image. We threshold out the majority of cornerness scores, and then to prevent duplicate entries for a single keypoint, we keep only the maximal value for each connected group of cornerness scores above the threshold.
The results of this method are promising! We get a nice number of keypoints near the edges, corners, and characteristic areas of each image, and they match well to those in the other image.
Next, we need to find a way to describe the features of each keypoint. To do this we'll use a portion of the SIFT algorithm.
We begin by taking the 16x16 window around each keypoint from the gradient of the original image. (Note that we suppressed keypoints near the edges of the image in the keypoint step, so as to have enough data for all keypoints in this step) We then further separate this window into 4 smaller windows along each side, so as to obtain a 4x4 array. Within each array, we compute a histogram of the vectors within the array, sorting them by eight cardinal directions. (Note that, for simplification purposes, our implementation stores each vector twice, in the nearest two cardinal directions in the histogram)
To prevent vectors farther from the center of the window from overly affecting the description, we'll use a Gaussian falloff with sigma = half the window size to lessen the weight of vectors near the edges.
A visualization of the keypoint descriptors we are computing. Note that the actual implementation uses a 4x4 array computed from a 16x16 sample, whereas the image shows a 2x2 array computed from an 8x8 sample. The blue circle shows the Gaussian falloff.
We now have a 4x4x8, or 128x1 description of each keypoint. We can use this information to perform matching.
Since we have 128x1 keypoint descriptors, we can simply take the keypoints with minimal distance between their descriptors as matches (distance can simply be computed as the norm of the difference of the 128x1 vectors). We can further compute, as a measure of confidence, the ratio of the second-nearest neighbor's distance to the nearest neighbor's distance. This gives us a means to sort our matches by some measure of goodness of fit.
I am unpleased to say that I couldn't get more than 25% good matches with my implementation.
51 total good matches, 161 total bad matches
I am unsure where my implementation was flawed.
1. The Harris corner detector implementation looks like it produced good results.
2. Swapping out my own implementation of get_features for the MATLAB official implementation, extractFeatures, makes the good match percentage go to 30-35% for certain cutoffs of keypoints. However, at my current cutoff of 99.8% for keypoints, swapping for extractFeatures in fact gives me the exact same number of good matches (51 total good matches, 161 total bad matches), though it should be noted that the good matches from extractFeatures have somewhat higher confidence.
3. Is testing the ratio and distance by itself not a good metric of goodness of fit? Do we need to take into account more spatial reasoning?
I am puzzled at where to continue, except for the optional features that should not be required to get a good number of matches. Since my implementation performed poorly enough on the test images, it seemed silly to provide additional examples here, so I decided not to.
I implemented the pipeline to the best of my understanding, to an extent that it somewhat(?) works. However, I seem unable to push past my current level of good matches with only the required implementations.