CS 143 / Project 2 / Local Feature Matching

The goal of this assignment is to create a local feature matching algorithm using techniques described in Szeliski chapter 4.1. The pipeline we suggest is a simplified version of the famous SIFT pipeline. The matching pipeline is intended to work for instance-level matching -- multiple views of the same physical scene.

Code

Disclaimer: I was having trouble with the scaled images, so I commented that part out and set the scale factor to one. This should just have the effect of making the code run a bit more slowly.

For the interest points I followed the procedure outlined in the lectures, with nothing fancy or special. I spent a little bit of time tweaking alpha to get reasonable results, but it is pretty standard otherwise. For feature detection, I compute Ix and Iy and build the gradient products from those. I probably pay some price in speed, but by switching as much of the code as possible to MATLAB built-ins I was able to get the feature detection runtime down to ~5 seconds from ~30 seconds (which had made testing a nightmare). For matching I used MATLAB's built-in knnsearch function, which made the code shockingly simple. It also let me play with all the built-in distance functions and discover that, in fact, Euclidean distance performed best. Interestingly, the correlation distance gave comparable results to Euclidean distance, and if I had more time I would have liked to look into how to combine matches from both.
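As a rough sketch of that interest-point step (the function name, filter sizes, and the exact alpha and threshold values below are my own illustrative assumptions, not the tuned values from my code):

```matlab
% Harris-style corner detection built from MATLAB built-ins (Image
% Processing Toolbox). Parameter values here are illustrative only.
function [x, y] = get_interest_points_sketch(image, feature_width)
    alpha = 0.04;                                  % Harris constant (hand-tweaked)
    small_blur = fspecial('gaussian', 5, 1);
    large_blur = fspecial('gaussian', 9, 2);

    % Image gradients Ix, Iy from derivative-of-Gaussian filters
    [gx, gy] = gradient(small_blur);
    Ix = imfilter(image, gx);
    Iy = imfilter(image, gy);

    % Blurred second-moment products
    Ixx = imfilter(Ix .* Ix, large_blur);
    Iyy = imfilter(Iy .* Iy, large_blur);
    Ixy = imfilter(Ix .* Iy, large_blur);

    % Harris cornerness: det(M) - alpha * trace(M)^2
    har = Ixx .* Iyy - Ixy .^ 2 - alpha * (Ixx + Iyy) .^ 2;

    % Zero out the border so full feature windows always fit
    har([1:feature_width, end-feature_width+1:end], :) = 0;
    har(:, [1:feature_width, end-feature_width+1:end]) = 0;

    % Keep local maxima above a global threshold
    peaks = har == imdilate(har, ones(3)) & har > 10 * mean(har(:));
    [y, x] = find(peaks);
end
```

The matching step is essentially a two-nearest-neighbor lookup plus a ratio test; only knnsearch itself is the MATLAB built-in mentioned above, everything else is a sketch under my own assumptions:

```matlab
% Nearest-neighbor matching with the ratio test via knnsearch.
% Swapping 'euclidean' for 'correlation' is a one-word change.
function [matches, confidences] = match_features_sketch(features1, features2)
    [idx, dist] = knnsearch(features2, features1, 'K', 2, 'Distance', 'euclidean');

    % Confidence = inverse of the nearest-neighbor distance ratio
    confidences = dist(:, 2) ./ dist(:, 1);
    matches = [(1:size(features1, 1))', idx(:, 1)];

    % Sort so the most confident matches come first
    [confidences, order] = sort(confidences, 'descend');
    matches = matches(order, :);
end
```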

Parameters and tweaks and effects

There are many parameters to play with in this project, and all of them can greatly impact the overall results. The feature width had the greatest impact on mine: increasing it from 16 to 28 raised my match percentage by over 20% (~72% -> ~92%). Normalizing the feature vector increased my match percentage by 5% (55% -> 60%); thresholding the normalized vector (and normalizing again afterward) added another 4%. Raising this final vector to a power less than 1 (0.5) added another 2%.
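A minimal sketch of that descriptor post-processing, assuming the standard SIFT-style clipping threshold of 0.2 (the exact threshold I used may differ):

```matlab
% Descriptor post-processing: normalize, threshold, re-normalize, then raise
% to a power below 1. The 0.2 clip value is the classic SIFT choice (assumed).
function feat = postprocess_descriptor_sketch(feat)
    feat = feat ./ norm(feat);   % unit length
    feat = min(feat, 0.2);       % threshold large bins (illumination robustness)
    feat = feat ./ norm(feat);   % re-normalize after thresholding
    feat = feat .^ 0.5;          % soften peaks; worth ~2% accuracy for me
end
```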

I also tried taking into account where my features were matching. Specifically, I take my most confident matches and use them to estimate a median angle and length of translation from a feature in one image to its match in the other. Then, by updating the confidences so that matches that respect this translation are weighted more heavily (and matches that don't are penalized), I was able to increase the accuracy to 97% (from 92%). This "improvement" completely chokes on images that have a large difference in scale, which makes sense. A better method would be to account for a full affine transformation instead of just a translation (and accounting for translation more intelligently wouldn't hurt either).
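Here is a sketch of that translation-consistency reweighting. For simplicity it works with the x and y components of each match's translation rather than the angle and length, which is equivalent for estimating the dominant shift; the top-30 cutoff, pixel tolerance, and boost/penalty factors are illustrative assumptions, not my tuned values:

```matlab
% Reweight match confidences by agreement with the dominant translation,
% estimated as the median shift among the most confident matches.
function confidences = reweight_by_translation_sketch(x1, y1, x2, y2, matches, confidences)
    % Translation implied by each match (image 1 point -> image 2 point)
    dx = x2(matches(:, 2)) - x1(matches(:, 1));
    dy = y2(matches(:, 2)) - y1(matches(:, 1));

    % Median translation from the most confident matches (assumed top 30)
    [~, order] = sort(confidences, 'descend');
    top = order(1:min(30, numel(order)));
    med_dx = median(dx(top));
    med_dy = median(dy(top));

    % How far each match's translation deviates from the median
    err = hypot(dx - med_dx, dy - med_dy);

    % Boost matches that respect the translation, penalize the rest
    agrees = err < 20;                              % pixel tolerance (assumed)
    confidences(agrees)  = confidences(agrees)  * 2;
    confidences(~agrees) = confidences(~agrees) / 2;
end
```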

The results

Below are the best 100 matches for the Notre Dame image. Notice that although there are three errors, the mismatched points look nearly identical when viewing the small regions around them. Also notice that the mismatches aren't displaced very far from one image to the next, likely because of the translation consideration mentioned earlier.

Below are the best 50 matches for the Notre Dame image. There are no mismatches here, which shows that the confidence values are relatively sound.

The next several image pairs are simply shown side by side, leaving it to the reader to judge how well the matching did; without the ground truth it is tedious to describe. =(. Orientation and scale differences completely kill the quality of the matches, which is expected given that this implementation is neither scale- nor rotation-invariant.

Notre Dame images with different scales.

House. Notice that it tries to match heavily on the trees, which is rather problematic.

These images of the Statue of Liberty have a fairly strong orientation difference, which wreaks havoc on the matching.

These images of the Statue of Liberty have a weaker orientation difference, so the matching is better.

All in all, these matches are pretty poor, which seems to be a case of overfitting to our one training example, the Notre Dame pair.