I stuck to baseline implementations for all parts of the project. The only somewhat interesting thing I did was tune the input parameters (alpha for vl_svmtrain, step and size for vl_dsift) to get better results.
First, for tiny image features, I simply called imresize to shrink each image to 16x16, then made the result zero mean by subtracting its mean. For the nearest neighbor classifier, I used only 1 nearest neighbor and let vl_alldist2 do most of the work.
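The two steps above can be sketched in numpy (the original uses MATLAB's imresize and vl_alldist2; here block averaging stands in for imresize, and the function names are my own):

```python
import numpy as np

def tiny_image_feature(img, size=16):
    """Shrink a grayscale image to size x size by block averaging (a stand-in
    for imresize), then zero-center it by subtracting its mean."""
    h, w = img.shape
    img = img[: h - h % size, : w - w % size]      # crop so blocks divide evenly
    bh, bw = img.shape[0] // size, img.shape[1] // size
    tiny = img.reshape(size, bh, size, bw).mean(axis=(1, 3))
    feat = tiny.flatten()
    return feat - feat.mean()                      # zero mean, as in the write-up

def nearest_neighbor_predict(train_feats, train_labels, test_feats):
    """1-NN: compute all pairwise squared Euclidean distances (what
    vl_alldist2 does) and take the label of the closest training feature."""
    d = ((test_feats[:, None, :] - train_feats[None, :, :]) ** 2).sum(-1)
    return [train_labels[i] for i in d.argmin(axis=1)]
```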
For my bag of SIFT representation, I built the vocabulary by extracting SIFT features from the training images and clustering them with k-means, where k was the vocabulary size. For each input image, I then extracted SIFT features, assigned each one to its nearest cluster center, and counted how often each "vocab word" occurred to form the image's histogram representation.
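A minimal numpy sketch of this pipeline, assuming descriptors have already been extracted (the original uses vl_dsift and VLFeat's k-means; the toy Lloyd's loop and function names below are stand-ins):

```python
import numpy as np

def build_vocab(descriptors, k, iters=20, seed=0):
    """Toy k-means (Lloyd's algorithm) over SIFT-like descriptors; the cluster
    centers become the vocabulary of k "vocab words"."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        d = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(axis=1)                  # nearest center per descriptor
        for j in range(k):
            if np.any(assign == j):
                centers[j] = descriptors[assign == j].mean(axis=0)
    return centers

def bag_of_words(descriptors, vocab):
    """Assign each of an image's descriptors to its nearest vocab word and
    build a normalized histogram of the counts."""
    d = ((descriptors[:, None, :] - vocab[None, :, :]) ** 2).sum(-1)
    hist = np.bincount(d.argmin(axis=1), minlength=len(vocab)).astype(float)
    return hist / hist.sum()                       # normalize for image size
```

Normalizing the histogram keeps images with different numbers of detected features comparable.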
Finally, for my linear SVM classifier, I first trained one-vs-all SVMs (one for each category), and then ran each test image's features against every SVM. Whichever SVM scored highest for a test feature determined the category assigned to that image.
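The one-vs-all decision rule reduces to a single matrix product once the per-category weights and biases are trained (the original gets these from vl_svmtrain; the function name here is my own):

```python
import numpy as np

def one_vs_all_predict(W, b, feats, categories):
    """Score every test feature against every category's linear SVM.
    W has one row of weights per category, b one bias per category;
    the highest-scoring SVM names the predicted category."""
    scores = feats @ W.T + b            # shape: (num_images, num_categories)
    return [categories[i] for i in scores.argmax(axis=1)]
```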
Using tiny image features and the nearest neighbor classifier, I achieved an accuracy of 0.204 (20.4%). Below is the resulting confusion matrix.
Using bag of SIFT features and the nearest neighbor classifier, I achieved an accuracy of 0.517 (51.7%). This takes about 10 minutes to run on my MacBook Air if the vocab.mat file is already built. Below is the resulting confusion matrix.
Using bag of SIFT features and the linear SVM classifier, I achieved an accuracy of 0.658 (65.8%). This takes roughly the same amount of time as the previous run. Below are the resulting confusion matrix and table of classifier results.
Category name | Accuracy | Sample training images | | Sample true positives | | False positives with true label | | False negatives with wrong predicted label |
---|---|---|---|---|---|---|---|---|---
Kitchen | 0.470 | ![]() | ![]() | ![]() | ![]() | ![]() LivingRoom | ![]() LivingRoom | ![]() Office | ![]() Store
Store | 0.550 | ![]() | ![]() | ![]() | ![]() | ![]() Street | ![]() TallBuilding | ![]() Industrial | ![]() Office
Bedroom | 0.490 | ![]() | ![]() | ![]() | ![]() | ![]() Office | ![]() LivingRoom | ![]() Store | ![]() LivingRoom
LivingRoom | 0.360 | ![]() | ![]() | ![]() | ![]() | ![]() Industrial | ![]() Store | ![]() Bedroom | ![]() Bedroom
Office | 0.780 | ![]() | ![]() | ![]() | ![]() | ![]() LivingRoom | ![]() Store | ![]() LivingRoom | ![]() Kitchen
Industrial | 0.460 | ![]() | ![]() | ![]() | ![]() | ![]() OpenCountry | ![]() Street | ![]() OpenCountry | ![]() LivingRoom
Suburb | 0.920 | ![]() | ![]() | ![]() | ![]() | ![]() InsideCity | ![]() InsideCity | ![]() LivingRoom | ![]() TallBuilding
InsideCity | 0.510 | ![]() | ![]() | ![]() | ![]() | ![]() Street | ![]() Street | ![]() Kitchen | ![]() TallBuilding
TallBuilding | 0.690 | ![]() | ![]() | ![]() | ![]() | ![]() InsideCity | ![]() Industrial | ![]() Industrial | ![]() LivingRoom
Street | 0.710 | ![]() | ![]() | ![]() | ![]() | ![]() Mountain | ![]() InsideCity | ![]() TallBuilding | ![]() Store
Highway | 0.800 | ![]() | ![]() | ![]() | ![]() | ![]() Street | ![]() Industrial | ![]() Bedroom | ![]() Coast
OpenCountry | 0.490 | ![]() | ![]() | ![]() | ![]() | ![]() Industrial | ![]() TallBuilding | ![]() Coast | ![]() LivingRoom
Coast | 0.820 | ![]() | ![]() | ![]() | ![]() | ![]() InsideCity | ![]() OpenCountry | ![]() InsideCity | ![]() Bedroom
Mountain | 0.860 | ![]() | ![]() | ![]() | ![]() | ![]() Store | ![]() TallBuilding | ![]() Suburb | ![]() Suburb
Forest | 0.960 | ![]() | ![]() | ![]() | ![]() | ![]() InsideCity | ![]() TallBuilding | ![]() TallBuilding | ![]() Mountain