CS 143 / Project 3 / Scene Recognition with Bag of Words

Get Tiny Images

For this function I set the target size to 16 (for a 16 x 16 image), created an empty matrix to hold the feature vectors, and then looped through each image path, using imresize to shrink each image to 16 x 16. I then normalized each one with double(vect)/norm(double(vect)) and reshaped it from a matrix into a 256-element vector.
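The tiny-image feature can be sketched in pure Python as follows. This is not the original code. It assumes a grayscale image given as a list of pixel rows, and it uses simple block averaging in place of imresize, which is a swapped-in resizing method. The function name tiny_image_feature is hypothetical.

```python
import math


def tiny_image_feature(img, size=16):
    """Shrink a grayscale image to size x size, flatten it, and L2-normalize it.

    img is a list of rows of pixel values. Block averaging is a simple
    stand-in for MATLAB's imresize (an assumption, not the original call).
    """
    h, w = len(img), len(img[0])
    if h < size or w < size:
        raise ValueError("image must be at least size x size pixels")
    feat = []
    for r in range(size):
        # source rows that fall into output row r
        r0, r1 = r * h // size, (r + 1) * h // size
        for c in range(size):
            c0, c1 = c * w // size, (c + 1) * w // size
            block = [img[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            feat.append(sum(block) / len(block))  # average of the block
    # same idea as double(vect) / norm(double(vect)) on the flattened vector
    norm = math.sqrt(sum(v * v for v in feat))
    return [v / norm for v in feat] if norm > 0 else feat


# Usage: a synthetic 48 x 64 gradient image.
image = [[(i + j) % 256 for j in range(64)] for i in range(48)]
feature = tiny_image_feature(image)
print(len(feature))  # 256
```

The sketch flattens row by row, while MATLAB's reshape is column-major. That only permutes the entries, so it does not matter as long as every image uses the same order.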

Nearest Neighbor Classify

In this function, I transpose the training and test features that are passed in (vl_alldist2 expects one feature per column) and then use vl_alldist2 to compute the L2 distance between every test feature and every training feature. The result is a distance matrix whose rows index the training features and whose columns index the test features. I then find the minimum of each column, which identifies the training feature closest to that test feature, and use those row indices to look up the corresponding training labels.
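A minimal pure-Python sketch of the same nearest-neighbor lookup follows. It is not the original code. It uses squared L2 distances, which have the same minimum as plain L2 distances. The helper names pairwise_sq_dists and nearest_neighbor_classify are hypothetical.

```python
def pairwise_sq_dists(train_feats, test_feats):
    """D[i][j] = squared L2 distance between training feature i and test feature j.

    Same layout as the distance matrix described above: rows index training
    features, columns index test features.
    """
    return [[sum((a - b) ** 2 for a, b in zip(x, y)) for y in test_feats]
            for x in train_feats]


def nearest_neighbor_classify(train_feats, train_labels, test_feats):
    dists = pairwise_sq_dists(train_feats, test_feats)
    predictions = []
    for j in range(len(test_feats)):
        column = [row[j] for row in dists]
        # row index of the column minimum = closest training feature
        best = min(range(len(column)), key=column.__getitem__)
        predictions.append(train_labels[best])
    return predictions


train = [[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]]
labels = ["Coast", "Forest", "Highway"]
print(nearest_neighbor_classify(train, labels, [[1.0, 1.0], [9.0, 8.0], [1.0, 9.0]]))
# ['Coast', 'Forest', 'Highway']
```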

Tiny Image Features + Nearest Neighbor Classifier

The accuracy reported by the program was 0.20 (20%).

Build Vocabulary

To build the visual vocabulary, I loop through each image, use vl_dsift to extract dense SIFT features from it, and collect them all in one large matrix. A step size of 20 gave good accuracy. After gathering the SIFT features, I use vl_kmeans to cluster them into 400 (vocab_size) centroids, which make up the vocabulary.
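The clustering step can be sketched as plain Lloyd's k-means in pure Python. This is a stand-in for vl_kmeans, not a reimplementation of it. The sketch assumes the SIFT descriptors have already been extracted as lists of numbers. It also seeds the centroids with the first k descriptors for reproducibility, which is an assumption of the sketch. The names sq_dist and kmeans are hypothetical.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iters=100):
    """Plain Lloyd's k-means; returns k centroids (the 'vocabulary').

    Seeded with the first k points for reproducibility (an assumption of
    this sketch, not necessarily how vl_kmeans initializes).
    """
    centers = [list(p) for p in points[:k]]
    for _ in range(max_iters):
        # assignment step: attach each descriptor to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        # update step: move each centroid to the mean of its members
        new_centers = []
        for c, members in enumerate(clusters):
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's centroid
        if new_centers == centers:
            break  # assignments are stable, so the centroids have converged
        centers = new_centers
    return centers


descriptors = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
print(kmeans(descriptors, k=2))  # [[0.0, 0.5], [10.0, 10.5]]
```

In the project the points would be 128-dimensional SIFT descriptors and k would be vocab_size (400). The toy 2-D data only makes the result easy to check by hand.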

Get Bags of Features

First, the vocabulary built by build_vocabulary is loaded. Then, for each image, I extract SIFT features (here with a smaller step of 5, so each image yields more features) and use vl_alldist2 to compute their distances to each vocabulary word. For each feature, the index of the minimum distance identifies the closest vocabulary word. A histogram of these indices gives a feature vector that counts how many times each of the 400 visual words appears in the image.
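The histogram step can be sketched in pure Python, assuming the image's descriptors and the vocabulary are already available as lists of numeric vectors. This is not the original code, and the function name bag_of_words_histogram is hypothetical.

```python
def bag_of_words_histogram(descriptors, vocab):
    """Count how often each vocabulary word is the nearest match to a descriptor."""
    hist = [0] * len(vocab)
    for d in descriptors:
        # index of the minimum distance = closest visual word
        nearest = min(range(len(vocab)),
                      key=lambda w: sum((a - b) ** 2 for a, b in zip(d, vocab[w])))
        hist[nearest] += 1
    return hist


vocab = [[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]]
descriptors = [[1.0, 0.0], [9.0, 9.0], [0.0, 1.0], [11.0, 10.0], [10.0, 9.0]]
print(bag_of_words_histogram(descriptors, vocab))  # [2, 3, 0]
```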

Bags of Features + Nearest Neighbor Classifier

The accuracy reported by the program was 0.443 (44.3%).

SVM Classify

In this last function, I begin by setting lambda, the SVM regularization weight, to 0.00001, which gave the best results. I loop through the 15 categories and, for each one, create a label vector that is +1 where a training label matches that category and -1 elsewhere. Passing the training features, these binary labels, and lambda to vl_svmtrain yields the weight vector W and bias B for that category. I then compute W*X + B, where X is the test data, to score every test image against every category's decision boundary. Larger values mean an image lies farther on the positive side. Finally, I take the maximum of each column of the resulting score matrix and use its row index, which corresponds to one of the 15 categories, to label each test image.
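The one-vs-all scheme can be sketched in pure Python. This is not vl_svmtrain. Each binary linear SVM here is trained by full-batch subgradient descent on the regularized hinge loss, and the bias is left unregularized as a simplification. The function names, epoch count, and learning rate are hypothetical choices for the sketch.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(feats, ys, lam, epochs=200, lr=0.1):
    """Full-batch subgradient descent on
        lam/2 * ||w||^2 + (1/N) * sum_i max(0, 1 - y_i * (w . x_i + b)).
    A simple stand-in for vl_svmtrain; b is left unregularized.
    """
    n, dim = len(feats), len(feats[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(feats, ys):
            if y * (dot(w, x) + b) < 1:  # hinge loss is active for this sample
                for d in range(dim):
                    grad_w[d] -= y * x[d] / n
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_classify(train_feats, train_labels, test_feats, lam=1e-5):
    categories = sorted(set(train_labels))
    models = []
    for cat in categories:
        # +1 for this category, -1 for every other one (one-vs-all)
        ys = [1 if label == cat else -1 for label in train_labels]
        models.append(train_linear_svm(train_feats, ys, lam))
    predictions = []
    for x in test_feats:
        scores = [dot(w, x) + b for w, b in models]  # one W*X + B entry per category
        predictions.append(categories[max(range(len(scores)), key=scores.__getitem__)])
    return predictions


train = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
labels = ["Coast", "Forest", "Highway"]
print(svm_classify(train, labels, [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]]))
# ['Coast', 'Highway']
```

Taking the argmax over per-category scores for each test image is the same as taking the maximum down each column of the W*X + B matrix.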

CS 143 Project 3 results visualization

Accuracy (mean of diagonal of confusion matrix) is 0.570
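The reported accuracy is the mean of the confusion matrix's diagonal. The sketch below assumes each row (true category) is normalized by its total, so the diagonal holds per-category accuracy. It also checks that the 15 per-category accuracies in the table that follows average to the reported 0.570. The function name mean_diagonal_accuracy is hypothetical.

```python
def mean_diagonal_accuracy(true_labels, pred_labels, categories):
    """Mean of the diagonal of a row-normalized confusion matrix.

    Row i counts the test images whose true category is categories[i];
    dividing each row by its total puts per-category accuracy on the diagonal.
    """
    index = {c: i for i, c in enumerate(categories)}
    k = len(categories)
    confusion = [[0.0] * k for _ in range(k)]
    for t, p in zip(true_labels, pred_labels):
        confusion[index[t]][index[p]] += 1
    for row in confusion:
        total = sum(row)
        if total:
            row[:] = [v / total for v in row]
    return sum(confusion[i][i] for i in range(k)) / k


print(mean_diagonal_accuracy(["a", "a", "b", "b"], ["a", "b", "b", "b"], ["a", "b"]))  # 0.75

# Per-category accuracies from the results table, in table order.
per_category = [0.730, 0.120, 0.120, 0.150, 0.790, 0.230, 0.950, 0.330,
                0.840, 0.790, 0.820, 0.420, 0.630, 0.710, 0.920]
print(round(sum(per_category) / len(per_category), 3))  # 0.57
```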

Category name   Accuracy   False positives (true label)   False negatives (wrong predicted label)
Kitchen         0.730      Store, Store                   InsideCity, Office
Store           0.120      Bedroom, Industrial            Kitchen, LivingRoom
Bedroom         0.120      Store, Kitchen                 Office, Office
LivingRoom      0.150      Bedroom, Street                TallBuilding, TallBuilding
Office          0.790      Kitchen, InsideCity            Kitchen, Kitchen
Industrial      0.230      Highway, Kitchen               TallBuilding, OpenCountry
Suburb          0.950      Industrial, LivingRoom         TallBuilding, Street
InsideCity      0.330      Store, Kitchen                 Street, Kitchen
TallBuilding    0.840      InsideCity, Industrial         Kitchen, Street
Street          0.790      InsideCity, Store              Highway, TallBuilding
Highway         0.820      Coast, OpenCountry             TallBuilding, Street
OpenCountry     0.420      Coast, Coast                   Street, TallBuilding
Coast           0.630      OpenCountry, OpenCountry       Office, OpenCountry
Mountain        0.710      Highway, Store                 Forest, Street
Forest          0.920      Mountain, Store                Suburb, TallBuilding

