Figure: The top 100 most confident local feature matches from a baseline implementation of project 2. In this case, 93 were correct (highlighted in green) and 7 were incorrect (highlighted in red).

Project 2: Local Feature Matching
CSCI 1430: Introduction to Computer Vision

Logistics

Overview

We will create a local feature matching algorithm (Szeliski chapter 4.1) and attempt to match multiple views of real-world scenes. There are hundreds of papers in the computer vision literature addressing each stage. We will implement a simplified version of SIFT; however, you are encouraged to experiment with more sophisticated algorithms for extra credit!

Task: Implement the three major steps of local feature matching:

- Interest point detection (get_interest_points.m)
- Local feature description (get_features.m)
- Feature matching (match_features.m)

Potentially useful MATLAB functions: imfilter(), fspecial(), bwconncomp(), colfilt(), sort(), atan2().

Forbidden functions: gradient(), imgradient(), imgradientxy(), corner(), extractFeatures(), detectSURFFeatures() (and similar), pdist()/pdist2(), matchFeatures().

Writeup

Task: Describe your process and algorithm, show your results, describe any extra credit, and tell us any other information you feel is relevant. We provide you with a LaTeX template; please compile it into a PDF and submit it along with your code.


We conduct anonymous TA grading, so please don't include your name or ID in your writeup or code.

Rubric

If your implementation reaches 80% accuracy on the first 100 correspondences returned in 'matches' for the Notre Dame pair, you will receive 75 pts (full code credit). We will evaluate your code on the image pairs at scale_factor=0.5 (set in proj2.m), so be aware of this if you change that value. The evaluation function we will use is proj2_averageAccuracy.m (our copy, not yours!); it is included in the starter code for you.

Time limit: The Gradescope autograder will stop executing proj2_averageAccuracy.m after 20 minutes. This is your time limit; exceeding it caps your implementation score at 40 pts. You must write efficient code, so think before you write.

The Gradescope autograder runs on an Amazon EC2 cluster with a modern desktop CPU about as powerful as your laptop. It does not have a GPU. It has 768 MB of memory, and exceeding this may terminate your program unexpectedly (!). It has MATLAB R2017b and the corresponding Image Processing Toolbox installed; it does not have the Computer Vision Toolbox.

Extra Credit (max 10 pts)

Detection:

Description:

Matching:
The baseline matching algorithm is computationally expensive and will likely be the slowest part of your code. Let's try to approximate or accelerate feature matching; one possible direction is sketched below.
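
One common way to reduce the cost is to shrink the descriptors before brute-force matching. The sketch below is only one illustrative option, not the required approach: it projects descriptors onto their top principal components using base MATLAB, and assumes descriptors are stored one per row in matrices named features1 and features2 (names chosen here for illustration).

    % Hedged sketch: PCA-reduce descriptors before nearest-neighbor matching.
    % Assumes features1 (n1 x d) and features2 (n2 x d) hold one descriptor per
    % row; k is a tuning parameter, not a prescribed value.
    k = 24;                                     % reduced dimensionality
    all_feats = [features1; features2];
    mu = mean(all_feats, 1);
    [~, ~, V] = svd(all_feats - mu, 'econ');    % principal directions (implicit expansion needs R2016b+)
    proj = V(:, 1:k);                           % keep the top-k components
    f1_low = (features1 - mu) * proj;           % project both descriptor sets
    f2_low = (features2 - mu) * proj;
    % Run your usual nearest-neighbor + ratio-test matching on f1_low and
    % f2_low; each distance is now computed in k dimensions instead of d.

Lower-dimensional descriptors, space-partitioning structures (e.g., kd-trees), and approximate nearest-neighbor schemes are all reasonable directions here.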

Accuracy Competition

Independent of your grade, we will average your top 100 accuracy across the three image pairs (Notre Dame de Paris, Mount Rushmore, and Gaudi's Episcopal Palace). The top-ranked submissions will earn a (probably tasty) prize and 'eternal' recognition below.

Notes

proj2.m handles file loading, visualization, and evaluation, and calls placeholder versions of the three functions you will implement. Running the starter code without modification will visualize random interest points matched randomly on the Notre Dame images. Correspondences are visualized with show_correspondence.m.

The Notre Dame, Mount Rushmore, and Episcopal Gaudi image pairs include 'ground truth' evaluation. evaluate_correspondence.m will classify each match as correct or incorrect based on hand-provided matches (see show_ground_truth_corr.m for details). You can test on those images by uncommenting the appropriate lines in proj2.m. You can create additional ground truth matches with collect_ground_truth_corr.m.

As you implement your feature matching pipeline, check whether your performance increases using evaluate_correspondence.m. Take care not to tweak parameters specifically for the initial Notre Dame image pair. We provide additional image pairs in extra_data.zip (194 MB), which exhibit more viewpoint, scale, and illumination variation. These pairs can still be matched with careful attention to their image characteristics, but it is more difficult.

An Implementation Strategy

  1. Use cheat_interest_points() instead of get_interest_points(). This function only works for the three image pairs with ground truth correspondences and cannot be used in your final implementation; it directly loads the 100 to 150 ground truth correspondences for the test cases. Even with this cheating, your accuracy will initially be near zero because the features are random and the matches are random.
  2. Implement match_features() (see the ratio test sketch after this list). Accuracy should still be near zero because the features are random.
  3. Change get_features() to cut out image patches (see the patch sketch after this list). Accuracy should increase to ~40% on the Notre Dame pair if you're using 16x16 (256-dimensional) patches as your feature. Accuracy on the other test cases will be lower (Mount Rushmore 25%, Episcopal Gaudi 7%). Image patches are not invariant to brightness change, contrast change, or small spatial shifts, but they provide a baseline.
  4. Finish get_features() by implementing a SIFT-like feature (see the descriptor sketch after this list). Accuracy should increase to 70% on the Notre Dame pair, 40% on Mount Rushmore, and 15% on Episcopal Gaudi. These accuracies still aren't great because the human-selected correspondences might look quite different at the local feature level. If you're sorting your matches by confidence (as the starter code does in match_features()), you should notice that your more confident matches are more likely to be true matches; they pass the ratio test more easily.
  5. Stop cheating (!) and implement get_interest_points() (see the Harris sketch after this list). Harris corners aren't as good as ground-truth points (which we know to correspond), so accuracy may drop. On the other hand, you can get hundreds or even a few thousand interest points, so you have more opportunities to find confident matches. If you only evaluate the most confident 100 matches on the Notre Dame pair, you should be able to achieve 90% accuracy (see the num_pts_to_evaluate parameter).
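
For step 2, the sketch below shows a minimal nearest-neighbor distance ratio test. It follows the starter-code convention of one descriptor per row and matches sorted by confidence, but the variable names and the 1/ratio confidence measure are illustrative choices, not requirements.

    % Hedged sketch of match_features(): nearest-neighbor distance ratio test.
    % Assumes features1 (n1 x d) and features2 (n2 x d) hold one descriptor per
    % row. pdist2()/matchFeatures() are forbidden, so distances are computed
    % directly (implicit expansion needs R2016b+).
    n1 = size(features1, 1);
    matches = zeros(n1, 2);
    confidences = zeros(n1, 1);
    for i = 1:n1
        diffs = features2 - features1(i, :);
        dists = sqrt(sum(diffs.^2, 2));         % Euclidean distance to every feature in image 2
        [sorted_d, idx] = sort(dists);
        ratio = sorted_d(1) / sorted_d(2);      % nearest / second-nearest distance
        matches(i, :) = [i, idx(1)];
        confidences(i) = 1 / ratio;             % lower ratio => more confident match
    end
    [confidences, order] = sort(confidences, 'descend');   % most confident first
    matches = matches(order, :);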
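
For step 3, this sketch cuts out and unit-normalizes a 16x16 patch around each interest point. It assumes image is a grayscale double array and that x and y are column vectors of interest point coordinates; handling points near the border is left to you.

    % Hedged sketch of the step-3 baseline in get_features(): raw 16x16 patches.
    % Assumes image is grayscale (double) and x, y are interest point coordinates.
    feature_width = 16;
    half = feature_width / 2;
    num_pts = numel(x);
    features = zeros(num_pts, feature_width^2);
    for i = 1:num_pts
        r = round(y(i));  c = round(x(i));
        patch = image(r-half+1 : r+half, c-half+1 : c+half);   % 16x16 window
        patch = patch(:)';                                      % flatten to 1x256
        features(i, :) = patch / (norm(patch) + eps);           % unit-normalize
    end
    % Points within feature_width/2 of the border will index outside the image;
    % either discard them during detection or pad the image first.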
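
For step 4, one way to build a SIFT-like descriptor is a 4x4 grid of 8-bin gradient orientation histograms over the 16x16 window, giving 128 dimensions. The sketch below handles a single interest point at row r, column c; it uses imfilter() with Sobel kernels because gradient() and imgradient() are forbidden, and all constants are illustrative.

    % Hedged sketch of a SIFT-like descriptor for one interest point at (r, c).
    sobel = fspecial('sobel');
    Ix = imfilter(image, sobel', 'replicate');      % horizontal derivative
    Iy = imfilter(image, sobel,  'replicate');      % vertical derivative
    mag = sqrt(Ix.^2 + Iy.^2);                      % gradient magnitude
    ori = atan2(Iy, Ix);                            % gradient orientation in (-pi, pi]
    desc = zeros(1, 128);
    for gy = 0:3
        for gx = 0:3
            rows = r - 8 + gy*4 + (1:4);            % one 4x4 cell of the 16x16 window
            cols = c - 8 + gx*4 + (1:4);
            cell_mag = mag(rows, cols);
            cell_ori = ori(rows, cols);
            bin = min(8, floor((cell_ori + pi) / (pi/4)) + 1);   % map angle to bin 1..8
            h = accumarray(bin(:), cell_mag(:), [8 1])';         % magnitude-weighted votes
            desc((gy*4 + gx)*8 + (1:8)) = h;
        end
    end
    desc = desc / (norm(desc) + eps);               % unit-normalize the 128-D vector

Full SIFT adds Gaussian weighting of the window, interpolation between bins and cells, and clamping before renormalizing; any of these refinements can raise accuracy further.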
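
For step 5, a bare-bones Harris detector can be assembled from imfilter() and fspecial(), which are allowed, since gradient() and corner() are forbidden. The window sizes, alpha, the threshold, and the use of imregionalmax() below are illustrative choices; colfilt() or bwconncomp() are alternative routes to non-maximum suppression.

    % Hedged sketch of get_interest_points(): basic Harris corner detection.
    % Assumes image is grayscale (double); all constants will need tuning.
    sobel = fspecial('sobel');
    Ix = imfilter(image, sobel', 'replicate');      % horizontal derivative
    Iy = imfilter(image, sobel,  'replicate');      % vertical derivative
    g  = fspecial('gaussian', 9, 2);                % window for the second moment matrix
    Ixx = imfilter(Ix.^2,  g, 'replicate');
    Iyy = imfilter(Iy.^2,  g, 'replicate');
    Ixy = imfilter(Ix.*Iy, g, 'replicate');
    alpha = 0.05;
    R = Ixx.*Iyy - Ixy.^2 - alpha*(Ixx + Iyy).^2;   % Harris cornerness response
    R(1:16, :) = 0;  R(end-15:end, :) = 0;          % suppress the border so later
    R(:, 1:16) = 0;  R(:, end-15:end) = 0;          % 16x16 windows stay in bounds
    local_max = imregionalmax(R);                   % non-maximum suppression
    strong = local_max & (R > 0.01 * max(R(:)));    % keep strong local maxima
    [y, x] = find(strong);                          % interest point coordinates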

You will likely need to do extra credit to get high accuracy on Mount Rushmore and Episcopal Gaudi.

Credits

Assignment developed by James Hays.