Figure: The top 100 most confident local feature matches from a baseline implementation of project 2. In this case, 93 were correct (highlighted in green) and 7 were incorrect (highlighted in red).

Project 2: Local Feature Matching
CSCI 1430: Introduction to Computer Vision

Logistics

Overview

We will create a local feature matching algorithm (Szeliski chapter 4.1) and attempt to match multiple views of real-world scenes. There are hundreds of papers in the computer vision literature addressing each stage. We will implement a simplified version of SIFT; however, you are encouraged to experiment with more sophisticated algorithms for extra credit!

Task: Implement the three major steps of local feature matching:

- Interest point detection (get_interest_points.m)
- Local feature description (get_features.m)
- Feature matching (match_features.m)

Potentially useful MATLAB functions: imfilter(), fspecial(), bwconncomp(), colfilt(), sort(), atan2().

Forbidden functions: gradient(), imgradient(), imgradientxy(), corner(), extractFeatures(), detectSURFFeatures() (and similar), pdist()/pdist2(), matchFeatures().

Writeup

Task: Describe your process and algorithm, show your results, describe any extra credit, and tell us any other information you feel is relevant. We provide you with a LaTeX template; please compile it into a PDF and submit it along with your code.


We conduct anonymous TA grading, so please don't include your name or ID in your writeup or code.

Rubric

If your implementation reaches 80% accuracy on the first 100 correspondences returned in 'matches' for the Notre Dame pair, you will receive 75 pts (full code credit). We will evaluate your code on the image pairs at scale_factor=0.5 (set in proj2.m), so be aware of this if you change that value. The evaluation function we will use is proj2_averageAccuracy.m (our copy, not yours!); it is included in the starter code for you.

Time limit: The Gradescope autograder will stop executing proj2_averageAccuracy.m after 20 minutes. This is your time limit; exceeding it caps your implementation score at 40 pts. You must write efficient code, so think before you write.

The Gradescope autograder runs on an Amazon EC2 cluster with a modern desktop CPU about as powerful as your laptop. It does not have a GPU. It has 768 MB of memory, and exceeding this may terminate your program unexpectedly (!). It has MATLAB R2017b and the corresponding Image Processing Toolbox installed; it does not have the Computer Vision Toolbox.

Extra Credit (max 10 pts)

Detection:

Description:

Matching:
The baseline matching algorithm is computationally expensive and will likely be the slowest part of your code. Let's try to approximate or accelerate feature matching; one possible direction is sketched below.
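
One common way to reduce the cost is to shrink the descriptors before brute-force matching. The sketch below is only one illustrative option, not the required approach: it projects descriptors onto their top principal components using base MATLAB, and assumes descriptors are stored one per row in matrices named features1 and features2 (names chosen here for illustration).

    % Hedged sketch: PCA-reduce descriptors before nearest-neighbor matching.
    % Assumes features1 (n1 x d) and features2 (n2 x d) hold one descriptor per
    % row; k is a tuning parameter, not a prescribed value.
    k = 24;                                     % reduced dimensionality
    all_feats = [features1; features2];
    mu = mean(all_feats, 1);
    [~, ~, V] = svd(all_feats - mu, 'econ');    % principal directions (implicit expansion needs R2016b+)
    proj = V(:, 1:k);                           % keep the top-k components
    f1_low = (features1 - mu) * proj;           % project both descriptor sets
    f2_low = (features2 - mu) * proj;
    % Run your usual nearest-neighbor + ratio-test matching on f1_low and
    % f2_low; each distance is now computed in k dimensions instead of d.

Lower-dimensional descriptors, space-partitioning structures (e.g., kd-trees), and approximate nearest-neighbor schemes are all reasonable directions here.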

Accuracy Competition

Independent of your grade, we will average your top 100 accuracy across the three image pairs (Notre Dame de Paris, Mount Rushmore, and Gaudi's Episcopal Palace). The top-ranked submissions will earn a (probably tasty) prize and 'eternal' recognition below.

Notes

proj2.m handles file loading, visualization, and evaluation, and calls placeholder versions of the three functions you will implement. Running the starter code without modification will visualize random interest points matched randomly on the Notre Dame images. Correspondences are visualized with show_correspondence.m.

The Notre Dame, Mount Rushmore, and Episcopal Gaudi image pairs include 'ground truth' evaluation. evaluate_correspondence.m will classify each match as correct or incorrect based on hand-provided matches (see show_ground_truth_corr.m for details). You can test on those images by uncommenting the appropriate lines in proj2.m. You can create additional ground truth matches with collect_ground_truth_corr.m.

As you implement your feature matching pipeline, check whether your performance increases using evaluate_correspondence.m. Take care not to tweak parameters specifically for the initial Notre Dame image pair. We provide additional image pairs in extra_data.zip (194 MB), which exhibit more viewpoint, scale, and illumination variation. These pairs can still be matched with careful attention to their image characteristics, but it is more difficult.

An Implementation Strategy

  1. Use cheat_interest_points() instead of get_interest_points(). This function only works for the three image pairs with ground truth correspondences and cannot be used in your final implementation; it directly loads the 100 to 150 ground truth correspondences for the test cases. Even with this cheating, your accuracy will initially be near zero because the features are random and the matches are random.
  2. Implement match_features() (see the ratio test sketch after this list). Accuracy should still be near zero because the features are random.
  3. Change get_features() to cut out image patches (see the patch sketch after this list). Accuracy should increase to ~40% on the Notre Dame pair if you're using 16x16 (256-dimensional) patches as your feature. Accuracy on the other test cases will be lower (Mount Rushmore 25%, Episcopal Gaudi 7%). Image patches are not invariant to brightness change, contrast change, or small spatial shifts, but they provide a baseline.
  4. Finish get_features() by implementing a SIFT-like feature (see the descriptor sketch after this list). Accuracy should increase to 70% on the Notre Dame pair, 40% on Mount Rushmore, and 15% on Episcopal Gaudi. These accuracies still aren't great because the human-selected correspondences might look quite different at the local feature level. If you're sorting your matches by confidence (as the starter code does in match_features()), you should notice that your more confident matches are more likely to be true matches; they pass the ratio test more easily.
  5. Stop cheating (!) and implement get_interest_points() (see the Harris sketch after this list). Harris corners aren't as good as ground-truth points (which we know to correspond), so accuracy may drop. On the other hand, you can get hundreds or even a few thousand interest points, so you have more opportunities to find confident matches. If you only evaluate the most confident 100 matches on the Notre Dame pair, you should be able to achieve 90% accuracy (see the num_pts_to_evaluate parameter).
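
For step 2, the sketch below shows a minimal nearest-neighbor distance ratio test. It follows the starter-code convention of one descriptor per row and matches sorted by confidence, but the variable names and the 1/ratio confidence measure are illustrative choices, not requirements.

    % Hedged sketch of match_features(): nearest-neighbor distance ratio test.
    % Assumes features1 (n1 x d) and features2 (n2 x d) hold one descriptor per
    % row. pdist2()/matchFeatures() are forbidden, so distances are computed
    % directly (implicit expansion needs R2016b+).
    n1 = size(features1, 1);
    matches = zeros(n1, 2);
    confidences = zeros(n1, 1);
    for i = 1:n1
        diffs = features2 - features1(i, :);
        dists = sqrt(sum(diffs.^2, 2));         % Euclidean distance to every feature in image 2
        [sorted_d, idx] = sort(dists);
        ratio = sorted_d(1) / sorted_d(2);      % nearest / second-nearest distance
        matches(i, :) = [i, idx(1)];
        confidences(i) = 1 / ratio;             % lower ratio => more confident match
    end
    [confidences, order] = sort(confidences, 'descend');   % most confident first
    matches = matches(order, :);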
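
For step 3, this sketch cuts out and unit-normalizes a 16x16 patch around each interest point. It assumes image is a grayscale double array and that x and y are column vectors of interest point coordinates; handling points near the border is left to you.

    % Hedged sketch of the step-3 baseline in get_features(): raw 16x16 patches.
    % Assumes image is grayscale (double) and x, y are interest point coordinates.
    feature_width = 16;
    half = feature_width / 2;
    num_pts = numel(x);
    features = zeros(num_pts, feature_width^2);
    for i = 1:num_pts
        r = round(y(i));  c = round(x(i));
        patch = image(r-half+1 : r+half, c-half+1 : c+half);   % 16x16 window
        patch = patch(:)';                                      % flatten to 1x256
        features(i, :) = patch / (norm(patch) + eps);           % unit-normalize
    end
    % Points within feature_width/2 of the border will index outside the image;
    % either discard them during detection or pad the image first.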
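
For step 4, one way to build a SIFT-like descriptor is a 4x4 grid of 8-bin gradient orientation histograms over the 16x16 window, giving 128 dimensions. The sketch below handles a single interest point at row r, column c; it uses imfilter() with Sobel kernels because gradient() and imgradient() are forbidden, and all constants are illustrative.

    % Hedged sketch of a SIFT-like descriptor for one interest point at (r, c).
    sobel = fspecial('sobel');
    Ix = imfilter(image, sobel', 'replicate');      % horizontal derivative
    Iy = imfilter(image, sobel,  'replicate');      % vertical derivative
    mag = sqrt(Ix.^2 + Iy.^2);                      % gradient magnitude
    ori = atan2(Iy, Ix);                            % gradient orientation in (-pi, pi]
    desc = zeros(1, 128);
    for gy = 0:3
        for gx = 0:3
            rows = r - 8 + gy*4 + (1:4);            % one 4x4 cell of the 16x16 window
            cols = c - 8 + gx*4 + (1:4);
            cell_mag = mag(rows, cols);
            cell_ori = ori(rows, cols);
            bin = min(8, floor((cell_ori + pi) / (pi/4)) + 1);   % map angle to bin 1..8
            h = accumarray(bin(:), cell_mag(:), [8 1])';         % magnitude-weighted votes
            desc((gy*4 + gx)*8 + (1:8)) = h;
        end
    end
    desc = desc / (norm(desc) + eps);               % unit-normalize the 128-D vector

Full SIFT adds Gaussian weighting of the window, interpolation between bins and cells, and clamping before renormalizing; any of these refinements can raise accuracy further.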
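
For step 5, a bare-bones Harris detector can be assembled from imfilter() and fspecial(), which are allowed, since gradient() and corner() are forbidden. The window sizes, alpha, the threshold, and the use of imregionalmax() below are illustrative choices; colfilt() or bwconncomp() are alternative routes to non-maximum suppression.

    % Hedged sketch of get_interest_points(): basic Harris corner detection.
    % Assumes image is grayscale (double); all constants will need tuning.
    sobel = fspecial('sobel');
    Ix = imfilter(image, sobel', 'replicate');      % horizontal derivative
    Iy = imfilter(image, sobel,  'replicate');      % vertical derivative
    g  = fspecial('gaussian', 9, 2);                % window for the second moment matrix
    Ixx = imfilter(Ix.^2,  g, 'replicate');
    Iyy = imfilter(Iy.^2,  g, 'replicate');
    Ixy = imfilter(Ix.*Iy, g, 'replicate');
    alpha = 0.05;
    R = Ixx.*Iyy - Ixy.^2 - alpha*(Ixx + Iyy).^2;   % Harris cornerness response
    R(1:16, :) = 0;  R(end-15:end, :) = 0;          % suppress the border so later
    R(:, 1:16) = 0;  R(:, end-15:end) = 0;          % 16x16 windows stay in bounds
    local_max = imregionalmax(R);                   % non-maximum suppression
    strong = local_max & (R > 0.01 * max(R(:)));    % keep strong local maxima
    [y, x] = find(strong);                          % interest point coordinates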

You will likely need to do extra credit to get high accuracy on Mount Rushmore and Episcopal Gaudi.

Credits

Assignment developed by James Hays.