CS 143 / Project 3 / Scene Recognition with Bag of Words

Implementation

I stuck to baseline implementations for all parts of the project. The only somewhat interesting thing I did was tune my input parameters (alpha for vl_svmtrain, step and size for vl_dsift) to get better results.

First, for tiny image features, I called imresize to resize each image to 16x16 and then made the image zero mean by subtracting its mean pixel value. For the nearest neighbor classifier, I used only the single nearest neighbor (1-NN) and pretty much let vl_alldist2 do all of the work.
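The zero-mean and 1-nearest-neighbor steps can be sketched in a few lines. This is a plain Python illustration, not the MATLAB code used for the project: the resize is skipped, the "images" are toy 4-pixel vectors, and the helper names (zero_mean, nearest_neighbor_classify) are made up for the example.

```python
import math

def zero_mean(pixels):
    """Subtract the mean intensity so the feature ignores overall brightness."""
    mean = sum(pixels) / len(pixels)
    return [p - mean for p in pixels]

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor_classify(train_feats, train_labels, test_feats):
    """Give each test feature the label of its single closest training feature."""
    predictions = []
    for t in test_feats:
        distances = [euclidean(t, f) for f in train_feats]
        best = distances.index(min(distances))
        predictions.append(train_labels[best])
    return predictions

# Toy stand-ins for flattened 16x16 images (only 4 pixels each here).
train = [zero_mean([10, 10, 200, 200]), zero_mean([200, 200, 10, 10])]
labels = ["Coast", "Forest"]

# Brighter overall than the first training image, but the same pattern.
test = [zero_mean([60, 60, 250, 250])]
predicted = nearest_neighbor_classify(train, labels, test)
print(predicted)
```

Because the mean is subtracted, the brighter test image ends up with exactly the same feature as the first training image, which is the point of the zero-mean step.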

For my bag of SIFT representation, I built my vocabulary by finding SIFT features and then clustering them with k-means, where k was the vocabulary size. I then took the SIFT features of each input image, determined which cluster center each feature was closest to, and assigned that "vocab word" to the feature. The image is then represented by a histogram of how often each word occurs.
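Here is a minimal sketch of the word-assignment step, assuming the vocabulary (the k-means centers) has already been built. It reads the step as the usual bag-of-words histogram: every descriptor votes for its nearest center. The 2-D descriptors, the function names, and the normalization to sum to 1 are all assumptions for illustration; real SIFT descriptors are 128-dimensional.

```python
def nearest_center(descriptor, centers):
    """Index of the vocabulary center closest to the descriptor (squared Euclidean)."""
    dists = [sum((d - c) ** 2 for d, c in zip(descriptor, center)) for center in centers]
    return dists.index(min(dists))

def bag_of_words(descriptors, centers):
    """Histogram of word assignments, normalized to sum to 1 (an assumed choice)."""
    counts = [0] * len(centers)
    for desc in descriptors:
        counts[nearest_center(desc, centers)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

# Hypothetical 3-word vocabulary and four descriptors from one image.
vocab = [[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]]
image_descriptors = [[1.0, 0.5], [9.0, 11.0], [0.2, 0.1], [8.5, 9.5]]

histogram = bag_of_words(image_descriptors, vocab)
print(histogram)
```

Normalizing keeps images with different numbers of SIFT features comparable, which matters when step and size change how many descriptors each image produces.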

Finally, for my linear SVM classifier, I first trained all of the 1-vs-all SVMs (one for each category), and then ran each test image feature against every SVM. Whichever SVM scored highest for a test image feature determined the category assigned to it.
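The "highest score wins" rule can be sketched as follows. This only covers prediction and assumes each category's weights and bias have already been trained (by vl_svmtrain in the project); the numbers and the three-category setup are invented for the example.

```python
def svm_score(weights, bias, feature):
    """Linear SVM decision value: w . x + b."""
    return sum(w * x for w, x in zip(weights, feature)) + bias

def predict_category(classifiers, feature):
    """classifiers maps category name -> (weights, bias); return the top-scoring category."""
    return max(classifiers, key=lambda name: svm_score(*classifiers[name], feature))

# Hypothetical trained 1-vs-all classifiers over a 3-bin feature.
classifiers = {
    "Kitchen": ([1.0, -1.0, 0.0], -0.2),
    "Forest": ([-1.0, 1.0, 0.5], 0.1),
    "Street": ([0.0, 0.0, 1.0], -0.5),
}

feature = [0.1, 0.7, 0.2]
prediction = predict_category(classifiers, feature)
print(prediction)
```

Note that the winning score does not have to be positive: even if every SVM rejects the feature, the least negative one still decides the category.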

Results

Using tiny image features and the nearest neighbor classifier, I had an accuracy of 0.204, or 20.4%. Below is the resulting confusion matrix.

Using bag of SIFT features and the nearest neighbor classifier, I had an accuracy of 0.517, or 51.7%. This takes about 10 minutes to run if the vocab.mat file is already built (or at least that is how long it takes on my MacBook Air). Below is the resulting confusion matrix.

Using bag of SIFT features and the linear SVM classifier, I had an accuracy of 0.658, or 65.8%. This takes roughly the same time as the previous run. Below is the resulting confusion matrix and table of classifier results.

Category name  Accuracy  False positives with true label  False negatives with wrong predicted label
Kitchen        0.470     LivingRoom, LivingRoom           Office, Store
Store          0.550     Street, TallBuilding             Industrial, Office
Bedroom        0.490     Office, LivingRoom               Store, LivingRoom
LivingRoom     0.360     Industrial, Store                Bedroom, Bedroom
Office         0.780     LivingRoom, Store                LivingRoom, Kitchen
Industrial     0.460     OpenCountry, Street              OpenCountry, LivingRoom
Suburb         0.920     InsideCity, InsideCity           LivingRoom, TallBuilding
InsideCity     0.510     Street, Street                   Kitchen, TallBuilding
TallBuilding   0.690     InsideCity, Industrial           Industrial, LivingRoom
Street         0.710     Mountain, InsideCity             TallBuilding, Store
Highway        0.800     Street, Industrial               Bedroom, Coast
OpenCountry    0.490     Industrial, TallBuilding         Coast, LivingRoom
Coast          0.820     InsideCity, OpenCountry          InsideCity, Bedroom
Mountain       0.860     Store, TallBuilding              Suburb, Suburb
Forest         0.960     InsideCity, TallBuilding         TallBuilding, Mountain
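As a quick sanity check, the per-category accuracies in the table can be averaged to recover the overall SVM accuracy. The sketch below assumes every category has the same number of test images, so that the plain mean of the per-category values equals the overall accuracy; the variable names are just for illustration.

```python
# Per-category accuracies copied from the table above.
per_category = {
    "Kitchen": 0.470, "Store": 0.550, "Bedroom": 0.490, "LivingRoom": 0.360,
    "Office": 0.780, "Industrial": 0.460, "Suburb": 0.920, "InsideCity": 0.510,
    "TallBuilding": 0.690, "Street": 0.710, "Highway": 0.800, "OpenCountry": 0.490,
    "Coast": 0.820, "Mountain": 0.860, "Forest": 0.960,
}

# Unweighted mean over the 15 categories.
mean_accuracy = sum(per_category.values()) / len(per_category)

# Hardest and easiest categories for the classifier.
worst = min(per_category, key=per_category.get)
best = max(per_category, key=per_category.get)

print(f"mean accuracy: {mean_accuracy:.3f}")
print(f"worst: {worst}, best: {best}")
```

The mean comes out to 0.658, matching the reported 65.8%. The indoor categories (LivingRoom, Industrial, Kitchen, Bedroom) sit at the bottom, while Forest and Suburb are nearly always classified correctly.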