CS 143 / Project 3 / Scene Recognition with Bag of Words

Get Tiny Images

For this function I set the size of the downscaled image to 16 (for a 16 x 16 image), allocated an empty matrix to hold the descriptor vectors, and then looped through each image path, using imresize to shrink each image to 16 x 16. I then normalized each image to unit length with the expression double(vect)/norm(double(vect)) and reshaped it into a vector rather than a matrix.
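A minimal sketch of the tiny-image descriptor in plain Python, not the MATLAB code used for the project. It assumes a grayscale image given as a list of rows, substitutes nearest-neighbour sampling for whatever interpolation imresize applies, and flattens before normalizing; the name tiny_image and the synthetic input are illustrative only.

```python
import math

def tiny_image(pixels, size=16):
    """Shrink a grayscale image (list of rows) to size x size and
    return it as a flat, unit-length feature vector."""
    rows, cols = len(pixels), len(pixels[0])
    small = []
    for r in range(size):
        # Nearest-neighbour sampling (an assumption standing in for imresize):
        # take the source pixel under the centre of each target cell.
        src_r = min(rows - 1, int((r + 0.5) * rows / size))
        for c in range(size):
            src_c = min(cols - 1, int((c + 0.5) * cols / size))
            small.append(float(pixels[src_r][src_c]))
    norm = math.sqrt(sum(v * v for v in small))
    # Guard against an all-black image, whose norm is zero.
    return [v / norm for v in small] if norm > 0 else small

# A synthetic 64 x 48 gradient stands in for a real photo.
image = [[(r + c) % 256 for c in range(48)] for r in range(64)]
feature = tiny_image(image)
print(len(feature))  # 256 values, one per pixel of the 16 x 16 thumbnail
```

Normalizing to unit length makes the descriptor less sensitive to overall brightness, which matters because the nearest neighbor step compares these vectors by distance.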

Nearest Neighbor Classify

In this function, I transpose the training and test features that are passed in and then use the vl_alldist2 function to compute the L2 distance between every training feature and every test feature. The result is a matrix of distances with one row per training feature and one column per test feature. I then take the minimum of each column, which gives the index of the training feature closest to each test feature, and use those indices to look up the corresponding training labels.
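The same 1-nearest-neighbor logic can be sketched in plain Python as follows. This is an illustration rather than the project's MATLAB code: it loops over pairs instead of building the full distance matrix, and the toy features and labels are made up for the example.

```python
def nearest_neighbor_classify(train_feats, train_labels, test_feats):
    """Label each test feature with the label of its closest training feature."""
    predictions = []
    for test in test_feats:
        # Squared L2 distance to every training feature; the square root
        # is skipped because it does not change which distance is smallest.
        dists = [sum((a - b) ** 2 for a, b in zip(test, train))
                 for train in train_feats]
        best = min(range(len(dists)), key=dists.__getitem__)
        predictions.append(train_labels[best])
    return predictions

# Toy 2-D features; real features would be tiny images or histograms.
train_feats = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
train_labels = ["Kitchen", "Kitchen", "Forest"]
test_feats = [[0.9, 1.2], [4.0, 6.0]]
predicted = nearest_neighbor_classify(train_feats, train_labels, test_feats)
print(predicted)  # ['Kitchen', 'Forest']
```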

Tiny Image Features + Nearest Neighbor Classifier

The resulting accuracy reported by the program was 0.20 (20%).

Build Vocabulary

To build the feature vocabulary, I loop through each image, use vl_dsift to extract SIFT features, and store them all in one large matrix. I found a step of 20 to be good for accuracy. After collecting the SIFT features, I use the vl_kmeans function to find the 400 (vocab_size) centroids, which make up the vocabulary.
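To illustrate the clustering step, here is a small Lloyd's-algorithm k-means in plain Python. It is a sketch under stated assumptions, not what vl_kmeans does internally: the initialization (evenly spaced input points) and the fixed iteration count are choices made for this example, and the 2-D points stand in for 128-dimensional SIFT descriptors.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Return k centroids found by alternating assignment and averaging."""
    # Deterministic initialization for the sketch: evenly spaced input points.
    step = len(points) // k
    centroids = [list(points[i * step]) for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                centroids[j] = [sum(m[d] for m in members) / len(members)
                                for d in range(dim)]
    return centroids

# Two obvious groups of toy descriptors; vocab_size would be 400 in the project.
descriptors = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
               [9.0, 9.1], [9.2, 9.0], [9.1, 9.2]]
vocab = kmeans(descriptors, k=2)
print(vocab)  # one centroid near (0.1, 0.1), the other near (9.1, 9.1)
```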

Get Bags of Features

First, the vocabulary built in build_vocabulary is loaded. Then, for each image, I extract many SIFT features (here with a smaller step of 5) and use the vl_alldist2 function to compute their distances to each of the vocabulary features. I build a histogram from the indices of the minimum distances, that is, the vocabulary word each feature matches most closely, to end up with a feature that describes how many times each of our 400 vocabulary words appears in the image.
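The histogram construction can be sketched in plain Python like this. It is an illustration only: the three-word vocabulary and 2-D descriptors are toy values, and the histogram holds raw counts, since the write-up does not say whether the counts were normalized.

```python
def bag_of_features(descriptors, vocab):
    """Count how many descriptors fall closest to each vocabulary word."""
    histogram = [0] * len(vocab)
    for d in descriptors:
        # Squared L2 distance from this descriptor to every vocabulary word.
        dists = [sum((x - y) ** 2 for x, y in zip(d, word)) for word in vocab]
        # The index of the smallest distance is the matching word.
        histogram[dists.index(min(dists))] += 1
    return histogram

# Toy vocabulary of 3 words (400 in the project) and 4 descriptors.
vocab = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
descriptors = [[0.1, 0.1], [0.9, 0.1], [0.8, -0.1], [0.1, 1.2]]
hist = bag_of_features(descriptors, vocab)
print(hist)  # [1, 2, 1]
```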

Bags of Features + Nearest Neighbor Classifier

The resulting accuracy reported by the program was 0.443 (44.3%).

SVM Classify

In this last function, I begin by assigning a lambda value, which I set to 0.00001 because it gave the best results. I loop through the 15 categories and, for each, create a label vector of binary (1, -1) values indicating whether or not each training label matches the category being considered. I then pass the training features, these binary labels, and the lambda value to vl_svmtrain to obtain the W vector and B scalar for that category. Next I compute W*X + B, where X is the test data, to get each test image's signed score relative to each category's decision boundary. Finally, I take the maximum of each column of the resulting score matrix and use the row indices, each of which corresponds to one of the 15 categories, to label all of the test images.
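The one-vs-all bookkeeping around the SVM can be sketched in plain Python as below. Training itself is left out: the (w, b) pairs are hypothetical hand-picked values standing in for what a linear SVM trainer would return, and the function names are illustrative.

```python
def binary_labels(train_labels, category):
    """+1 where the training label is this category, -1 everywhere else."""
    return [1 if label == category else -1 for label in train_labels]

def one_vs_all_predict(models, test_feats):
    """models maps each category to its (w, b); pick the highest w.x + b."""
    predictions = []
    for x in test_feats:
        scores = {cat: sum(wi * xi for wi, xi in zip(w, x)) + b
                  for cat, (w, b) in models.items()}
        predictions.append(max(scores, key=scores.get))
    return predictions

# The label vector that would be handed to the trainer for one category.
print(binary_labels(["Forest", "Street", "Forest"], "Forest"))  # [1, -1, 1]

# Hypothetical linear models for three categories (not trained values).
models = {
    "Forest": ([1.0, -1.0], 0.0),
    "Street": ([-1.0, 1.0], 0.0),
    "Office": ([0.0, 0.0], -0.5),
}
test_feats = [[2.0, 0.5], [0.1, 3.0]]
labels = one_vs_all_predict(models, test_feats)
print(labels)  # ['Forest', 'Street']
```

Taking the maximum score rather than thresholding each classifier at zero guarantees that every test image receives exactly one label, even when no classifier or several classifiers score it positively.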

CS 143 Project 3 results visualization


Accuracy (mean of diagonal of confusion matrix) is 0.570

Category name   Accuracy   False positives with true label   False negatives with wrong predicted label
Kitchen         0.730      Store, Store                      InsideCity, Office
Store           0.120      Bedroom, Industrial               Kitchen, LivingRoom
Bedroom         0.120      Store, Kitchen                    Office, Office
LivingRoom      0.150      Bedroom, Street                   TallBuilding, TallBuilding
Office          0.790      Kitchen, InsideCity               Kitchen, Kitchen
Industrial      0.230      Highway, Kitchen                  TallBuilding, OpenCountry
Suburb          0.950      Industrial, LivingRoom            TallBuilding, Street
InsideCity      0.330      Store, Kitchen                    Street, Kitchen
TallBuilding    0.840      InsideCity, Industrial            Kitchen, Street
Street          0.790      InsideCity, Store                 Highway, TallBuilding
Highway         0.820      Coast, OpenCountry                TallBuilding, Street
OpenCountry     0.420      Coast, Coast                      Street, TallBuilding
Coast           0.630      OpenCountry, OpenCountry          Office, OpenCountry
Mountain        0.710      Highway, Store                    Forest, Street
Forest          0.920      Mountain, Store                   Suburb, TallBuilding

[Sample training images and Sample true positives columns: images only]
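The overall accuracy can be checked against the per-category accuracies in the table above. The short Python sketch below assumes that each per-category accuracy is the corresponding diagonal entry of the confusion matrix, so that their mean is the reported figure.

```python
# Per-category accuracies copied from the results table.
per_category = {
    "Kitchen": 0.730, "Store": 0.120, "Bedroom": 0.120,
    "LivingRoom": 0.150, "Office": 0.790, "Industrial": 0.230,
    "Suburb": 0.950, "InsideCity": 0.330, "TallBuilding": 0.840,
    "Street": 0.790, "Highway": 0.820, "OpenCountry": 0.420,
    "Coast": 0.630, "Mountain": 0.710, "Forest": 0.920,
}

# Mean of the diagonal: sum of the 15 accuracies divided by 15.
mean_accuracy = sum(per_category.values()) / len(per_category)
print(f"{mean_accuracy:.3f}")  # 0.570
```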