CS 143 / Project 2 / Local Feature Matching

Example of a right floating element.

Local Feature Matching attempts to find correspondences between two scenes through three key steps. First, the Harris Corner detection algorithm pulls out distinctive features in both images. Second, these points are described using a local feature descriptor, in this case a SIFT like descriptor. Finally these features are compared to find the minimum and second minimum Euclidean neighbors for each point. If the ratio between the first and second neighbor is low enough the points are likely the same real-world location.

Harris Corner Detector

The Harris Corner detector works by finding the x and y derivatives of the images. This is accomplished by filtering each image with the Sobel and its transpose. Next the derivatives are squared to form two squared derivatives images and also an x-derivative times y-derivative image. The underlying mathematics relies on eigenvalues to describe how corner-like a given patch of an image is, however, this math can be simplified to use the squared derivatives to speed up computations. Pixels that are in a patch with a high corner score are considered distinct and get passed onto the next stage. In order to only operate on single pixels, non-maxima supression is also run which finds the maximum of patches with scores above the threshold.

The X Gradient

X Gradient Squared

The Y Gradient

Y Gradient Squared

X dot Y Gradient

Local Feature Descriptor

The distinctive points found in the first step now need to have their surroundings described so that matches can be determined. The SIFT local feature descriptor comprises several parts. First it subdivides the feature patch into 16 bins. Each of these bins then sorts the magnitudes of each pixel's image gradient into an 8 way histogram based on the direction of the gradient. All 16 of these 8 part histograms are combined into a single vector. This vector is then normalized, each histogram entry above .2 reduced down to .2, and then the whole vector renormalized. These feature descriptors are then given to the final portion.

Feature Matching

Now that the points have been described using SIFT, the features from each image are compared to find the best matches. First, pdist2() finds all the Euclidean distances between each set of point's feature descriptors. The minumums of this are saved and and then the second nearest neighbor descriptor is found. The ratio between these two points gives an indication of how likely the match is unique. The more unique the match the more likely that it is a genuine correspondence. The top 100 matches based on this criteria are selected

Example Correspondences

With Feature_Width = 60

71 correct matches and 29 bad matches based on ground truth correspondences. Most of the error came from fie repeated structure which careful tuning of the feature matching nearest neighbor ratio can help prevent.

With Feature_Width = 40

68 correct matches and 32 bad matches based on ground truth correspondences.

With Feature_Width = 60

Mostly correct matching on house structure with potential mismatches in finer tree structure.