Local feature matching is not a single-step process. This implementation follows the SIFT (Scale-Invariant Feature Transform) pipeline, which consists of three stages: interest point detection, feature description, and feature matching.
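At the top level, the three stages chain together roughly as follows. This is only a sketch: the function names get_interest_points, get_features, and match_features are illustrative placeholders, not necessarily the names used in the actual code.

```matlab
% Illustrative driver for the three-stage pipeline; function names are
% placeholders for the implementations described below.
[x1, y1] = get_interest_points(image1, feature_width);                % 1. Harris corner detection
[x2, y2] = get_interest_points(image2, feature_width);
[features1, xf1, yf1] = get_features(image1, x1, y1, feature_width);  % 2. SIFT-like descriptors
[features2, xf2, yf2] = get_features(image2, x2, y2, feature_width);
[matches, confidences] = match_features(features1, features2);        % 3. Ratio-test matching
```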
In any two images there are often numerous regions that are nearly impossible to match simply because they contain no distinctive features. To account for this, we detect the "key points" of an image using Harris corner detection. The gradients (Ix, Iy) of the original image are computed and blurred with a small Gaussian; the squared gradients (Ix2, Iy2, Ixy) are then obtained and blurred again with a slightly larger Gaussian. From these, a scalar interest measure, har, is computed with the formula: har = (ix2 .* iy2) - (ixy.^2) - (alpha * ((ix2 + iy2).^2)); This value is thresholded, and non-maximum suppression is performed using colfilt with a sliding window of size feature_width/4 to find local maxima. I also tried non-maximum suppression with bwconncomp, taking the maximum of each connected component, but this produced worse results. I suspect this is because many important, easily detectable features were part of the same connected component.
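The gradient-and-blur preprocessing described above can be sketched as follows; the Gaussian sizes and sigmas here are illustrative, not the exact values used:

```matlab
% Illustrative filter sizes/sigmas; the actual tuned values may differ.
small_gaussian = fspecial('gaussian', 5, 1);
large_gaussian = fspecial('gaussian', 9, 2);
[ix, iy] = gradient(image);                % image gradients
ix = imfilter(ix, small_gaussian);         % blur the gradients with a small Gaussian
iy = imfilter(iy, small_gaussian);
ix2 = imfilter(ix .* ix, large_gaussian);  % squared gradients, blurred with a
iy2 = imfilter(iy .* iy, large_gaussian);  % slightly larger Gaussian
ixy = imfilter(ix .* iy, large_gaussian);
```

These blurred terms feed directly into the har formula above.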
% Colfilt code from Get Interest Points
har = (ix2 .* iy2) - (ixy.^2) - (alpha * ((ix2 + iy2).^2));
% Threshold relative to the image statistics
threshold = mean2(har) + std2(har) * .1;
har(har < threshold) = 0;
% Zero out a feature_width border so full descriptors fit inside the image
har([1:(feature_width-1), end-feature_width:end], :) = 0;
har(:, [1:(feature_width-1), end-feature_width:end]) = 0;
har_threshold_final = zeros(size(har)); % buffer reused by the bwconncomp variant below
% Non-maximum suppression: keep only pixels that equal the local maximum
% of their sliding (feature_width/4)-square window
har_max = colfilt(har, [feature_width/4, feature_width/4], 'sliding', @max);
har_max_final = har .* (har == har_max);
[x, y, confidence] = find(har_max_final);
% Bwconncomp code from Get Interest Points
bw = har > threshold; % binary corner mask (equivalent to the earlier im2bw call)
cc = bwconncomp(bw, 8);
x = zeros(cc.NumObjects, 1);
y = zeros(cc.NumObjects, 1);
confidence = zeros(size(har));
% Keep only the strongest Harris response in each connected component
for i = 1:cc.NumObjects
    [value, index] = max(har(cc.PixelIdxList{i}));
    [cx, cy] = ind2sub(size(har), cc.PixelIdxList{i}(index));
    x(i) = cx;
    y(i) = cy;
    confidence(cx, cy) = value;
end
har_threshold_final = zeros(size(har));
for i = 1:length(x)
    % Store the response value so find() recovers it as the confidence
    har_threshold_final(x(i), y(i)) = har(x(i), y(i));
end
har_threshold_final([1:(feature_width-1), end-feature_width:end], :) = 0;
har_threshold_final(:, [1:(feature_width-1), end-feature_width:end]) = 0;
[x, y, confidence] = find(har_threshold_final);
Although the code below shows the histogram-weighting method, I ultimately settled on using raw bin counts, because weighting the bins by gradient magnitude actually decreased the accuracy of my results.
num_directions = 16;
features = zeros(number_features, num_directions*16); % 16 cells x 16 orientation bins
[dx, dy] = gradient(image);
% Map gradient orientations into [0, 2*pi)
theta_image = atan2(dy, dx);
theta_image(theta_image < 0) = theta_image(theta_image < 0) + 2*pi;
gradient_magnitude = sqrt((dx.*dx) + (dy.*dy));
gradient_magnitude(gradient_magnitude > .2) = .2; % clamp large magnitudes
fwq = feature_width / 4; % each descriptor cell is fwq x fwq
% 8-bin alternative: binranges = 0:pi/4:2*pi;
binranges = 0:pi/8:2*pi; % 16 orientation bins
for i = 1:number_features
    key_point_x = x(i,1);
    key_point_y = y(i,1);
    % feature_width x feature_width window centered on the key point
    feature_block_large = theta_image(key_point_x-feature_width/2 : key_point_x-1+feature_width/2, ...
        key_point_y-feature_width/2 : key_point_y-1+feature_width/2);
    feature_block_large_gradient = gradient_magnitude(key_point_x-feature_width/2 : key_point_x-1+feature_width/2, ...
        key_point_y-feature_width/2 : key_point_y-1+feature_width/2);
    % First cell: orientation histogram weighted by gradient magnitude
    fsb1 = im2col(feature_block_large(1:fwq, 1:fwq), [fwq, fwq]);
    fsbg1 = im2col(feature_block_large_gradient(1:fwq, 1:fwq), [fwq, fwq]);
    [bincounts1, ind1] = histc(fsb1, binranges);
    ind1(ind1 == num_directions + 1) = num_directions; % fold the 2*pi edge into the last bin
    weighted_hist1 = accumarray(ind1, fsbg1, [num_directions 1])';
    % ... (15 other cell computations removed for legibility)
    features(i,:) = horzcat(weighted_hist1, weighted_hist2, weighted_hist3, weighted_hist4, ...
        weighted_hist5, weighted_hist6, weighted_hist7, weighted_hist8, ...
        weighted_hist9, weighted_hist10, weighted_hist11, weighted_hist12, ...
        weighted_hist13, weighted_hist14, weighted_hist15, weighted_hist16);
    features(i,:) = features(i,:) ./ sum(features(i,:)); % normalize to sum to one
    xf(i) = key_point_x;
    yf(i) = key_point_y;
end
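For reference, the bin-count variant I settled on simply drops the magnitude weighting: each cell uses the histc counts directly. A sketch for the first cell, reusing fsb1, binranges, and num_directions from above:

```matlab
% Unweighted variant: raw orientation counts instead of accumarray weighting
bincounts1 = histc(fsb1, binranges);
% histc returns one extra trailing bin for values equal to 2*pi; fold it in
bincounts1(num_directions) = bincounts1(num_directions) + bincounts1(end);
weighted_hist1 = bincounts1(1:num_directions)';
```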
for i = 1:num_features1(1)
    current_feature1 = features_1(i,:);
    ssd = zeros(num_features2(1), 1);
    for j = 1:num_features2(1)
        current_feature2 = features_2(j,:);
        differences = current_feature1 - current_feature2;
        ssd(j) = sum(differences.^2); % sum of squared differences
    end
    [ssd, ssd_ind] = sort(ssd, 'ascend');
    % Ratio test: nearest-neighbor distance over second-nearest
    ratio = ssd(1) / ssd(2);
    % Squared pixel distance between the key point and its nearest neighbor
    euclidean = ((xf1(i) - xf2(ssd_ind(1)))^2) + ((yf1(i) - yf2(ssd_ind(1)))^2);
    if ratio < .8 && euclidean < 100
        num_matches = num_matches + 1;
        matches(num_matches,:) = [i, ssd_ind(1)];
        confidences(num_matches,:) = ratio;
    end
end
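The nested loops above can also be vectorized. A sketch using pdist2 (from the Statistics and Machine Learning Toolbox), which computes all pairwise descriptor distances at once:

```matlab
% Vectorized alternative to the double loop; same ratio test, no spatial gate
dists = pdist2(features_1, features_2);        % num_features1 x num_features2 Euclidean distances
[sorted_d, nn_ind] = sort(dists, 2, 'ascend'); % per-row nearest neighbors
ratios = sorted_d(:, 1) ./ sorted_d(:, 2);     % nearest / second-nearest ratio
good = find(ratios < .8);
matches = [good, nn_ind(good, 1)];
confidences = ratios(good);
```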
Sample of Notre Dame Image
Notre Dame Image Interest Points
Non-maximum Suppression with Colfilt
Notre Dame Corresponding Points
Notre Dame Evaluation: uses 16 directional bins and a non-maximum suppression threshold of .003.
After tweaking the Gaussians, switching the non-maximum suppression method to colfilt, and adjusting the threshold, the Notre Dame pair produced excellent top-100 results. From image to image, these values are all that need to change: the matching stage stays the same, and feature extraction consistently works better with 16 bins and without gradient weighting.
Non-maximum Suppression with Colfilt
Image Comparison
Here we see that although a person would pick up on the faces, the most "corner-like" features are the rocks and trees. While the scale differs between the images, it is easy to tell that clear correspondences have been picked out.
Non-maximum Suppression with Colfilt
Image Comparison
Although the lighting differs and the perspective is slightly different, the feature matching clearly picks up on the top of the Pantheon and the nearby structure, creating a very distinct connection between the images.
Non-maximum Suppression with Colfilt
Image Comparison