CS 143 / Project 2 / Local Feature Matching

This local feature matching algorithm takes in two images of the same object and has three steps: find the interest points in each image, compute a feature descriptor for every interest point, and match the features. Here is a detailed list of the steps in my implementation:

  1. Get interest points from both images. I first blur the image, then filter it with Sobel filters to get the gradients in the x and y directions, Ix and Iy. From these I compute the Harris corner response: Ix^2 * Iy^2 - (IxIy)^2 - alpha * (Ix^2 + Iy^2)^2, where each gradient-product term is Gaussian-weighted over a local window and alpha is a value between 0.04 and 0.06. I threshold the Harris response to eliminate low values, use colfilt() for non-maximum suppression, and manually assign 0s along the image edges so that I do not get false interest points there. The step returns the coordinates of the non-zero values in the non-maximum-suppressed image, which are the interest points (see the first sketch after this list). In this part I experimented with different parameter values. For example, with a higher Harris threshold only the high-contrast corners are kept and the overall accuracy increases; with a smaller window size for the colfilt function we get more interest points, which also improves accuracy.
  2. Get a feature descriptor for each interest point. I tried two different approaches for this part.
    • First approach: for each interest point, take the feature_width by feature_width window around it and filter it with a Gaussian of the same size. Then, for each feature_width/4 by feature_width/4 cell, calculate the gradients in the x and y directions, Ix and Iy, and build a histogram over 8 directions: each pixel contributes its gradient magnitude sqrt(Ix^2 + Iy^2) to the bin given by its orientation arctan(Iy/Ix). Since there are 4*4 cells in the window, the feature is 4*4 (cells) * 8 (directions) = 128 dimensional. I then normalise each feature and raise its elements to a power between 0.6 and 0.8. I first tried blurring each cell individually, but the accuracy was ~60%; blurring the whole window with one bigger Gaussian raised it to ~80%.
    • Second approach: first blur the image with a Gaussian filter, then construct 8 filters for 8 different directions and convolve the image with each of them separately. For each interest point, take the feature_width by feature_width window around it; within the window, for each feature_width/4 by feature_width/4 cell, the sum of the values in the cell becomes the histogram value for that direction. Each feature is again normalised and raised to a power of 0.6 or 0.7. This approach actually works better for me: it raised the accuracy from ~80% to ~90% (see the descriptor sketches after this list).
  3. Match the features. For each feature from image1, I calculate the distance between it and every feature from image2. Sorting the distances gives the nearest neighbour and the second nearest neighbour, and the ratio between these two distances serves as the confidence measure. A pair is kept as a match when the ratio is below a threshold; I tried values between 0.7 and 0.87 (see the matching sketch after this list).
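
To make the steps above concrete, here are minimal MATLAB sketches of each stage. They are not my exact submission: the function signatures, window sizes, and parameter values are the ones I experimented with and may need retuning. First, the Harris detector from step 1.

    function [x, y] = get_interest_points(image, feature_width)
    % Harris corner detection as described in step 1.
    alpha = 0.05;        % somewhere between 0.04 and 0.06
    threshold = 1e-7;    % eliminates low Harris responses

    % Blur the image, then get the x and y gradients with Sobel filters.
    image = imfilter(image, fspecial('gaussian', 5, 1));
    sobel = fspecial('sobel');   % y-gradient filter; its transpose gives x
    Iy = imfilter(image, sobel);
    Ix = imfilter(image, sobel');

    % Gaussian-weighted gradient products (the entries of the second
    % moment matrix), then the Harris response.
    g   = fspecial('gaussian', 5, 1);
    Ix2 = imfilter(Ix .* Ix, g);
    Iy2 = imfilter(Iy .* Iy, g);
    Ixy = imfilter(Ix .* Iy, g);
    har = Ix2 .* Iy2 - Ixy .^ 2 - alpha * (Ix2 + Iy2) .^ 2;

    % Threshold, zero out the image borders, then non-maximum suppression:
    % keep a pixel only if it equals the maximum of its 5x5 neighbourhood.
    har(har < threshold) = 0;
    har([1:feature_width, end-feature_width+1:end], :) = 0;
    har(:, [1:feature_width, end-feature_width+1:end]) = 0;
    maxima = colfilt(har, [5 5], 'sliding', @max);
    har(har < maxima) = 0;

    [y, x] = find(har);   % coordinates of the surviving non-zero values
    end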
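
Next, the first descriptor from step 2. The name get_features_v1 is just a label for this writeup, and I use atan2 here instead of arctan(Iy/Ix) so that all 8 orientation bins are reachable.

    function features = get_features_v1(image, x, y, feature_width)
    % First descriptor from step 2: gradient-magnitude histograms over
    % a 4x4 grid of cells, 8 orientation bins each (128 dimensions).
    sobel = fspecial('sobel');
    big_blur = fspecial('gaussian', feature_width, feature_width / 2);
    cell_w = feature_width / 4;
    features = zeros(numel(x), 128);
    for i = 1:numel(x)
        rows = y(i) - feature_width/2 + 1 : y(i) + feature_width/2;
        cols = x(i) - feature_width/2 + 1 : x(i) + feature_width/2;
        % Blur the whole window with one big Gaussian (the version that
        % got ~80%), then take its gradients.
        window = imfilter(image(rows, cols), big_blur);
        Iy = imfilter(window, sobel);
        Ix = imfilter(window, sobel');
        mag = sqrt(Ix.^2 + Iy.^2);
        % Quantise each pixel's orientation into one of 8 bins; atan2
        % covers the full circle.
        bin = mod(round(atan2(Iy, Ix) / (pi / 4)), 8) + 1;
        f = zeros(4, 4, 8);
        for r = 1:feature_width
            for c = 1:feature_width
                rc = ceil(r / cell_w);   % which cell this pixel falls in
                cc = ceil(c / cell_w);
                f(rc, cc, bin(r, c)) = f(rc, cc, bin(r, c)) + mag(r, c);
            end
        end
        v = f(:)' / (norm(f(:)) + eps);   % normalise
        features(i, :) = v .^ 0.7;        % power between 0.6 and 0.8
    end
    end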
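
Then the second, better-performing descriptor. Building the 8 oriented filters by rotating a Sobel kernel, and clamping negative responses so the fractional power stays real, are assumptions made for this sketch; the actual filters could be chosen differently.

    function features = get_features_v2(image, x, y, feature_width)
    % Second descriptor from step 2: responses to 8 oriented filters,
    % summed over each of the 4x4 cells.
    image = imfilter(image, fspecial('gaussian', 5, 1));

    % 8 filters for 8 directions, here built by rotating a Sobel kernel.
    sobel = fspecial('sobel');
    responses = cell(1, 8);
    for d = 1:8
        oriented = imrotate(sobel, (d - 1) * 45, 'bilinear', 'crop');
        % Clamp negative responses so the fractional power below stays real.
        responses{d} = max(imfilter(image, oriented), 0);
    end

    cell_w = feature_width / 4;
    features = zeros(numel(x), 128);
    for i = 1:numel(x)
        rows = y(i) - feature_width/2 + 1 : y(i) + feature_width/2;
        cols = x(i) - feature_width/2 + 1 : x(i) + feature_width/2;
        f = zeros(4, 4, 8);
        for d = 1:8
            window = responses{d}(rows, cols);
            for r = 1:4
                for c = 1:4
                    block = window((r-1)*cell_w+1 : r*cell_w, ...
                                   (c-1)*cell_w+1 : c*cell_w);
                    f(r, c, d) = sum(block(:));   % one histogram entry
                end
            end
        end
        v = f(:)' / (norm(f(:)) + eps);   % normalise
        features(i, :) = v .^ 0.7;        % power of 0.6 or 0.7
    end
    end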
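
Finally, the ratio test from step 3. pdist2 comes from the Statistics Toolbox (a plain loop over features works just as well), and returning 1 - ratio as the confidence, so that higher means better, is a convention chosen for this sketch.

    function [matches, confidences] = match_features(features1, features2)
    % Nearest-neighbour distance ratio test from step 3.
    ratio_threshold = 0.73;    % I tried values between 0.7 and 0.87

    dists = pdist2(features1, features2);  % all pairwise Euclidean distances
    [sorted, nearest] = sort(dists, 2);    % nearest neighbour first per row

    % Ratio of nearest to second-nearest distance; a small ratio means
    % the best match is clearly better than the runner-up.
    ratios = sorted(:, 1) ./ sorted(:, 2);
    keep = find(ratios < ratio_threshold);

    matches = [keep, nearest(keep, 1)];
    confidences = 1 - ratios(keep);        % higher = more distinctive
    end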

Some images and the distribution of their interest points

Each grey/white square represents an interest point. The image was saved after non-maximum suppression; I kept the squares so that the points are more visible.

The majority of the interest points are not on the statues; perhaps there are not many distinctive corners on human faces. I should experiment a bit more with the parameters.

Results in a table

152 total good matches, 9 total bad matches. Accuracy: 94.4%. Window size for colfilt: 5x5; threshold for Harris: 1e-7; threshold for ratio: 0.73.

26 total matches.

122 total matches.

59 total matches.

164 total matches.

[BAD] The algorithm detected many interest points in the trees and the clouds, which led to mismatches.

[BAD] 54 total matches, but I think only half of them are good matches. The pictures are mostly sky and grass, which don't contribute useful interest points.

[BAD] 37 total matches, but almost none of them are on the statues.

The good results usually come from input pairs that are taken from similar angles, or whose major components share a similar background. My algorithm does not work well on the photos of statues.