
CS 143 / Project 2 / Local Feature Matching

Local feature matching is not a single-step process. In this implementation it follows the SIFT (Scale Invariant Feature Transform) pipeline, which consists of the three stages listed below; a sketch of the overall pipeline follows the list.

  1. Interest Point Detection
  2. Local Feature Description of Interest Points
  3. Feature Matching
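These three stages map onto three functions in the implementation. A minimal sketch of the top-level pipeline might look like the following (the image file names and the exact function signatures are assumptions, since they are not shown in this write-up):

    % Hypothetical top-level script; file names and signatures are assumptions
    image1 = im2single(rgb2gray(imread('notre_dame_1.jpg')));
    image2 = im2single(rgb2gray(imread('notre_dame_2.jpg')));
    feature_width = 16;   % width in pixels of the SIFT-like descriptor window

    % 1. Interest point detection
    [x1, y1] = get_interest_points(image1, feature_width);
    [x2, y2] = get_interest_points(image2, feature_width);

    % 2. Local feature description
    features1 = get_features(image1, x1, y1, feature_width);
    features2 = get_features(image2, x2, y2, feature_width);

    % 3. Feature matching
    [matches, confidences] = match_features(features1, features2);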

Interest Point Detection

In any given pair of images there are often numerous regions that are nearly impossible to match simply because they contain no defining features. To account for this, we detect the "key points" of an image using Harris corner detection. To accomplish this, the gradients (Ix, Iy) of the original image were taken and then blurred using a small Gaussian. After this the squared gradients (Ix2, Iy2, Ixy) were obtained and then blurred again using a slightly larger Gaussian. From these a scalar interest measure, har, was computed with the formula: har = (ix2 .* iy2) - (ixy.^2) - (alpha * ((ix2 + iy2).^2)). This value was then thresholded and non-maximum suppression was applied. For this, colfilt was used with a sliding window of size feature_width/4 to find local maxima. I also tried non-maximum suppression using bwconncomp and taking the maximum of each connected component, but this produced worse results. I think this is because many important, easily detectable features were part of the same connected component.
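The gradient and blurring steps are not part of the code sample in the next section, so a minimal sketch of how they might be set up is shown here (the filter sizes are assumptions; the standard deviations of .5 and 1 follow the observations later in this write-up):

    % Sketch of the gradient and blurring steps (filter sizes are assumptions)
    small_gaussian = fspecial('gaussian', [5 5], .5);
    large_gaussian = fspecial('gaussian', [9 9], 1);

    [ix, iy] = gradient(image);              % gradients of the original image
    ix = imfilter(ix, small_gaussian);       % blur gradients with the small Gaussian
    iy = imfilter(iy, small_gaussian);

    ix2 = imfilter(ix.*ix, large_gaussian);  % blur squared gradients with the larger Gaussian
    iy2 = imfilter(iy.*iy, large_gaussian);
    ixy = imfilter(ix.*iy, large_gaussian);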

Sample Code from Get Interest Points

Here you can see part of the get_interest_points code in which har is computed and thresholded. Following this you can see the non-maximum suppression, along with the computation of sample images used to visualize which points were kept from the original image. It is also important to note that all points within feature_width of the image border were discarded so that only points with completely formed features were kept.


%Colfilt code from Get Interest Points
    % Harris response: det(M) - alpha * trace(M)^2
    har = (ix2 .* iy2) - (ixy.^2) - (alpha * ((ix2 + iy2).^2));
    image_size = size(har);

    % Adaptive threshold: mean response plus a small fraction of its spread
    threshold = mean2(har) + std2(har)*.1;
    disp(threshold)   % debug output
    har(har < threshold) = 0;

    % Discard responses near the image border
    har([1:(feature_width-1), end-feature_width:end], :) = 0;
    har(:,[1:(feature_width-1),end-feature_width:end]) = 0;

    % Binary map of surviving responses (used for the sample images)
    har_threshold = har;
    har_threshold(har_threshold~=0) = 1;
    har_threshold_final = zeros(image_size);

    % Non-maximum suppression: keep only responses equal to the local maximum
    % of their (feature_width/4)-sized sliding window
    har_max = colfilt(har,[feature_width/4, feature_width/4],'sliding',@max);
    har_max_final = har.*(har == har_max);
    [x,y,confidence] = find(har_max_final);

%Bwconncomp code from Get Interest Points
% Alternative non-maximum suppression: keep only the strongest response
% within each connected component of the thresholded Harris map
bw = im2bw(har,threshold);
cc = bwconncomp(bw,8);

x = [];
y = [];
confidence = zeros(size(har));
count = 0;

for i=1:1:size(cc.PixelIdxList,2)
    count = count + 1;
    [value,index] = max(har(cc.PixelIdxList{i}));
    [cx,cy] = ind2sub(size(har),cc.PixelIdxList{i}(index));
    x(count,1) = cx;
    y(count,1) = cy;
    confidence(cx,cy) = value;
end


size(x)   % debug: number of surviving interest points
for i=1:size(x,1)
   har_threshold_final(x(i),y(i))=1;
end
har_threshold_final([1:(feature_width-1), end-feature_width:end], :) = 0;
har_threshold_final(:,[1:(feature_width-1),end-feature_width:end]) = 0;
[x,y,confidence] = find(har_threshold_final);

Bells and Whistles: Get Interest Points

The only unique things about the interest point detector are that boundary features are removed here instead of in the feature descriptor, and that the threshold is not a hardcoded value. Instead, I saw online that the threshold could be set to the mean of the har values plus 0.1 times their standard deviation. This was tweaked at times, but it produced what seemed like very reasonable key points. However, I later moved away from this method since, for testing, it made more sense to tweak the values by hand.

Observations: Get Interest Points

Here I noticed three main things. First, using a Sobel filter with imfilter produces very similar results to simply computing the gradient. More importantly, however, I witnessed first-hand how powerful blurring can be. Right now my small and large Gaussian filters (created with fspecial('Gaussian')) have standard deviations of .5 and 1 respectively. I noticed that even increasing these values a little drastically changed the number of key points and their locations, as some features were completely lost by the blur. Another important note was that alpha at 0.06 produced the best results (a value obtained from the textbook). Any lower and there were too many interest points; any higher and there were too few.
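For reference, the Sobel-based alternative mentioned above could be computed roughly as follows (a sketch; using the transposed kernel for the second direction is an assumption about orientation conventions):

    % Sketch: Sobel-based gradients, as an alternative to gradient()
    sobel = fspecial('sobel');          % emphasizes horizontal edges (vertical gradient)
    iy_sobel = imfilter(image, sobel);
    ix_sobel = imfilter(image, sobel');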

Local Feature Description of Interest Points

Here a SIFT-like local feature descriptor was implemented by taking the gradients of the image and computing the magnitude and orientation at every pixel. Each feature was then split into 16 blocks, and the orientations within each block were accumulated into a histogram, so that each feature could be described by an array of 8 orientations per block. This system was a huge step up from using normalized patches, which had very limited ability to capture enough information about a feature to match it.

Although the code below shows the histogram weighting method, I later settled on simply using the bin counts, because weighting the bins by gradient magnitude actually decreased the accuracy of my results.

Sample Code from Get Features


        num_directions = 16;
        features = zeros(number_features,num_directions*16);   % 16 blocks x 16 orientation bins

        % Gradient magnitude and orientation at every pixel (orientation mapped to [0, 2*pi))
        [dx,dy] = gradient(image);
        theta_image = atan2(dy,dx);
        theta_image(theta_image < 0) = theta_image(theta_image < 0) + 2*pi;
        gradient_magnitude = sqrt((dx.*dx)+(dy.*dy));
        gradient_magnitude(gradient_magnitude > .2) = .2;       % clip large gradient magnitudes

        fwq = feature_width/4;   % block size within a feature
        %binranges = [0,pi/4,pi/2,3*pi/4,pi,5*pi/4,3*pi/2,7*pi/4,2*pi];
        binranges = [0,pi/8,pi/4,pi*3/8,pi/2,5*pi/8,3*pi/4,7*pi/8,pi,9*pi/8,5*pi/4,11*pi/8,3*pi/2,13*pi/8,7*pi/4,15*pi/8,2*pi];   % 16 bins, 17 edges
            
        for i=1:1:number_features
            key_point_x = x(i,1);
            key_point_y = y(i,1);

            % feature_width x feature_width windows of orientation and magnitude
            % centered on the key point
            feature_block_large = theta_image(key_point_x-feature_width/2:key_point_x-1+(feature_width/2) ...
                ,key_point_y-feature_width/2:key_point_y-1+(feature_width/2));

            feature_block_large_gradient = gradient_magnitude(key_point_x-feature_width/2:key_point_x-1+(feature_width/2) ...
                ,key_point_y-feature_width/2:key_point_y-1+(feature_width/2));

            % First of the 16 blocks: orientation histogram weighted by gradient magnitude
            fsb1 = im2col(feature_block_large(1:fwq,1:fwq),[fwq,fwq]);
            fsbg1 = im2col(feature_block_large_gradient(1:fwq,1:fwq),[fwq,fwq]);
            [bincounts1,ind1] = histc(fsb1, binranges);
            weighted_hist1 = accumarray(ind1,fsbg1,[num_directions 1])';

            % ... (15 other block computations removed for legibility here)

            % Concatenate the 16 block histograms and normalize the descriptor
            features(i,:) = horzcat(weighted_hist1,weighted_hist2,weighted_hist3,weighted_hist4, ...
                weighted_hist5,weighted_hist6,weighted_hist7,weighted_hist8, ...
                weighted_hist9,weighted_hist10,weighted_hist11,weighted_hist12, ...
                weighted_hist13,weighted_hist14,weighted_hist15,weighted_hist16);

            sum_f = sum(features(i,:));
            features(i,:) = features(i,:)./sum_f;
            xf(i) = key_point_x;
            yf(i) = key_point_y;
        end

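Since I ultimately settled on the unweighted bin counts rather than the magnitude-weighted histograms, the per-block computation I ended up with looks essentially like the following sketch (shown for a single block, reusing the names from the code above):

        % Unweighted variant for a single block: use the orientation bin counts directly
        fsb1 = im2col(feature_block_large(1:fwq,1:fwq),[fwq,fwq]);
        bincounts1 = histc(fsb1, binranges);
        hist1 = bincounts1(1:num_directions)';   % drop histc's trailing edge bin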
Bells and Whistles: Get Features

As seen in the block above, a few bells and whistles were added to feature description to make it faster and more useful. For starters, to make each feature more distinctive, each block within a feature was described with 16 directions instead of 8. Although this slowed down the process a little, it did improve results. Furthermore, instead of simply using the number of pixels falling in each directional bin of the histogram, I experimented with weighting the histograms by the magnitude of the gradients (after clipping at .2) to capture more relevant information about orientation. At the time this resulted in a drastic increase in the number of matched features, from about a third to a little over 50%, though as noted above I ultimately reverted to the unweighted bin counts. It should also be noted that I wanted to use as few for loops as possible, so I performed loop unrolling to obtain the 16 blocks from each feature.
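For comparison, the unrolled block extraction could instead be written as a nested loop over the 4x4 grid of blocks. This is only a sketch of that alternative, not the code actually used here:

        % Sketch: loop over the 4x4 grid of blocks instead of unrolling
        descriptor = zeros(1, num_directions*16);
        for bx = 1:4
            for by = 1:4
                rows = (bx-1)*fwq+1 : bx*fwq;
                cols = (by-1)*fwq+1 : by*fwq;
                block_theta = im2col(feature_block_large(rows,cols),[fwq,fwq]);
                block_mag   = im2col(feature_block_large_gradient(rows,cols),[fwq,fwq]);
                [~,ind] = histc(block_theta, binranges);
                block_hist = accumarray(ind, block_mag, [num_directions 1])';
                descriptor(((bx-1)*4+by-1)*num_directions + (1:num_directions)) = block_hist;
            end
        end
        features(i,:) = descriptor ./ sum(descriptor);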

Observations: Get Features

In the get_features method some experimentation was done with the number of bins, the size of features, and the size of the blocks within each feature. A feature width of 16 seemed ideal: going lower often gave too little information, and going up to a value like 32 resulted in too much information and often the blending of multiple features. Increasing the number of blocks and bins produced better results, but at the cost of speed; each extra block and bin meant more computation in this step and in the subsequent matching phase, which slowed down the whole pipeline. In the end I decided to go with 16 orientations instead of 8.

Match Features

Here match_features was implemented pretty much by the book in terms of the ratio test. Each feature from the first image was compared against each feature from the second, and the sum of squared differences (SSD) between them was computed. Then the ratio between the smallest and second-smallest SSD was examined to see whether the best matching feature had a substantially stronger correspondence than other near matches. If so, the pair was added to matches. Later, spatial distances were also factored in, as described below.

Sample Code from Match Features


            for i=1:1:num_features1(1)
                current_feature1 = features_1(i,:);
                ssd = zeros(num_features2(1),1);
                for j=1:1:num_features2(1)
                    current_feature2 = features_2(j,:);
                    differences = current_feature1 - current_feature2;
                    ssd(j) = sum(differences.^2);   % sum of squared differences
                end

                % Ratio test between the nearest and second-nearest neighbor
                [ssd,ssd_ind] = sort(ssd,'ascend');
                ratio = ssd(1) / ssd(2);
                % Squared pixel offset between the key point and its best match
                euclidean = ((xf1(i) - xf2(ssd_ind(1)))^2) + ((yf1(i) - yf2(ssd_ind(1)))^2);
                if ratio < .8
                    if euclidean < 100
                        num_matches = num_matches + 1;
                        matches(num_matches,:) = [i,ssd_ind(1)];
                        confidences(num_matches,:) = ratio;
                    end
                end
            end
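As an aside, the pairwise distance computation above could be vectorized. Here is a sketch using pdist2, assuming the Statistics and Machine Learning Toolbox is available:

            % Sketch: all pairwise squared distances at once instead of nested loops
            ssd_all = pdist2(features_1, features_2, 'squaredeuclidean');
            [sorted_ssd, sorted_ind] = sort(ssd_all, 2, 'ascend');
            ratios = sorted_ssd(:,1) ./ sorted_ssd(:,2);   % ratio test for every feature in image 1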
                
            

Bells and Whistles: Match Features

Here it seemed that matching was the place to make the most significant improvements to the success of the pipeline. Two improvements were made, neither of which worked for all tests. First, the Euclidean distances were taken between current_feature1 and its top two matches. With these, two tests were made. One was to examine the ratio between these distances, which produced some positive results but not significant ones. The other was to make sure the offset of the final match was below a certain level. While this weak verification produced great results for the Notre Dame images, it did not fare as well when the images were more drastically different, such that the features did not line up well if the images were overlaid onto each other.
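The distance-ratio variant mentioned above is not in the sample code; inside the matching loop it would look roughly like this sketch:

            % Sketch: ratio of image-space offsets to the top two matches
            d1 = (xf1(i) - xf2(ssd_ind(1)))^2 + (yf1(i) - yf2(ssd_ind(1)))^2;
            d2 = (xf1(i) - xf2(ssd_ind(2)))^2 + (yf1(i) - yf2(ssd_ind(2)))^2;
            offset_ratio = d1 / d2;   % small when the best match is spatially much closer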

Matching Results

Notre Dame

Sample of Notre Dame Image

Notre Dame Image Interest Points

Non-maximum Suppression with Colfilt

Notre Dame Corresponding Points

Notre Dame Evaluation: uses 16 directional bins and a threshold of .003 before non-maximum suppression.

After tweaking the Gaussians, switching the non-maximum suppression method to colfilt, and adjusting the threshold, the Notre Dame pair produced excellent top-100 results. It seems that from image to image all that needs to be changed are these values, because matching stays the same and feature extraction always seems to be better with 16 bins and without gradient weighting.

Rushmore

Non-maximum Suppression with Colfilt

Image Comparison

Here we see that although a person would pick up on the faces, the most corner-like features are the rocks and trees. While the scale is different, it is easy to tell that clear correspondences have been picked out.

Pantheon

Non-maximum Suppression with Colfilt

Image Comparison

Although there is different lighting and a slightly different perspective, the feature matching clearly picks up on the top of the Pantheon and the nearby structure, creating a very distinct connection between the images.

House

Non-maximum Suppression with Colfilt

Image Comparison