CS 143 / Project 3 / Scene Recognition with Bag of Words

I implemented three basic pipelines: tiny images with a nearest neighbor classifier, bag of SIFT features with a nearest neighbor classifier, and bag of SIFT features with an SVM classifier. The tiny image/nearest neighbor method reached an accuracy of 0.205 (20.5%), which is within the expected range.

The confusion matrix of the tiny image representation and nearest neighbor classifier. Accuracy is 20.5%
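The tiny image/nearest neighbor pipeline can be sketched roughly as follows. This is a minimal illustration, not the code used for the results above: the block-averaging resize, the default 16x16 size, the zero-mean/unit-length normalization, and the function names `tiny_image` and `nearest_neighbor` are all assumptions made for the example.

```python
import math

def tiny_image(pixels, size=16):
    """Shrink a square grayscale image (list of rows) to size x size by
    block averaging, then make it zero-mean and unit-length.
    Assumes the image side is a multiple of `size` (any remainder is ignored)."""
    block = len(pixels) // size
    small = []
    for by in range(size):
        for bx in range(size):
            total = sum(pixels[by * block + y][bx * block + x]
                        for y in range(block) for x in range(block))
            small.append(total / (block * block))
    mean = sum(small) / len(small)
    centered = [v - mean for v in small]
    norm = math.sqrt(sum(v * v for v in centered)) or 1.0
    return [v / norm for v in centered]

def nearest_neighbor(train_feats, train_labels, query):
    """Return the label of the training feature closest to the query
    in Euclidean distance (1-nearest-neighbor)."""
    best = min(range(len(train_feats)),
               key=lambda i: math.dist(train_feats[i], query))
    return train_labels[best]
```

Because the features are made zero-mean, a uniformly brighter copy of an image maps to the same tiny image feature, so overall brightness does not affect the nearest neighbor match in this sketch.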

The SIFT feature representation with the nearest neighbor classifier performed slightly worse than expected. Without any of the extra credit methods, its accuracy was 38.8%, which is still significantly better than the tiny image method. To build the vocabulary that bag of SIFT uses to form histograms, I sampled SIFT features from all 1500 images with a step size of 50, also using the fast parameter. This yielded roughly 200 features per image, for a total of about 300,000 features, which were then clustered with kmeans into 400 clusters. My bag of SIFT method sampled features from the test images with a step size of 20, compared each new feature to the vocabulary, and created a normalized histogram for each image. The histograms were then run through the nearest neighbor classifier.

The confusion matrix of the SIFT representation and nearest neighbor classifier. Accuracy is 38.8%
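The vocabulary and histogram steps described above can be illustrated with a small sketch. It works on arbitrary feature vectors rather than real SIFT descriptors, and it is only an approximation of the idea: the plain Lloyd-style `kmeans`, its random initialization, the L1 normalization (the histogram sums to 1), and the function names are assumptions for illustration.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: returns k cluster centers (the 'visual words')."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centers

def bag_of_words(features, vocab):
    """Assign each feature to its nearest visual word and return the
    histogram of word counts, normalized to sum to 1."""
    hist = [0.0] * len(vocab)
    for f in features:
        word = min(range(len(vocab)), key=lambda c: math.dist(f, vocab[c]))
        hist[word] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]
```

In the setup described above, `points` would hold the roughly 300,000 sampled descriptors and `k` would be 400. Normalizing the histogram means that images yielding different numbers of features remain comparable.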

Finally, the SIFT feature representation with the SVM classifier performed better than the previous two methods, though still slightly below the expected 60-70% range. Fiddling with parameters raised the accuracy to 53-56%. The example shown here had an accuracy of 54.7%, using a lambda value of 0.001. Interestingly, these accuracy values varied slightly between trials, without any change of parameters.
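To show where the lambda value enters, here is a rough sketch of a one-vs-all linear SVM over histogram features. It is not the solver used for the results above. The full-batch sub-gradient descent, the step size, the iteration count, and all function names are assumptions chosen to keep the example short and deterministic; only the default `lam=0.001` comes from the text.

```python
def train_linear_svm(feats, labels, lam=0.001, step=0.1, iters=500):
    """Binary linear SVM trained by full-batch sub-gradient descent on
    lam/2 * ||w||^2 + mean(hinge loss). Labels must be +1.0 or -1.0."""
    n, d = len(feats), len(feats[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(feats, labels):
            score = sum(wj * xj for wj, xj in zip(w, x)) + b
            if y * score < 1.0:  # this sample violates the margin
                for j in range(d):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - step * gj for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b

def train_one_vs_all(feats, labels, lam=0.001):
    """Train one binary SVM per category (that category vs. all others)."""
    models = {}
    for cat in sorted(set(labels)):
        binary = [1.0 if lab == cat else -1.0 for lab in labels]
        models[cat] = train_linear_svm(feats, binary, lam)
    return models

def predict(models, x):
    """Return the category whose SVM assigns the highest score to x."""
    def score(cat):
        w, b = models[cat]
        return sum(wj * xj for wj, xj in zip(w, x)) + b
    return max(models, key=score)
```

A smaller lambda weakens the regularizer, so the classifier fits the training histograms more tightly. A larger lambda favors a wider margin at the cost of more training errors.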

CS 143 Project 3 results visualization

Accuracy (mean of diagonal of confusion matrix) is 0.547
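As a quick check of how this number is obtained, the sketch below builds a row-normalized confusion matrix and averages its diagonal, then averages the per-category accuracies reported in the results table. The helper names `confusion_matrix` and `mean_diagonal` are made up for this example.

```python
def confusion_matrix(true_labels, predicted_labels, categories):
    """Row-normalized confusion matrix: rows are true categories,
    columns are predicted categories."""
    index = {cat: i for i, cat in enumerate(categories)}
    counts = [[0] * len(categories) for _ in categories]
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t]][index[p]] += 1
    return [[c / (sum(row) or 1) for c in row] for row in counts]

def mean_diagonal(matrix):
    """Overall accuracy: the average of the per-category accuracies."""
    return sum(matrix[i][i] for i in range(len(matrix))) / len(matrix)

# Per-category accuracies from the results table, Kitchen through Forest.
per_category = [0.370, 0.390, 0.220, 0.140, 0.770, 0.290, 0.930, 0.500,
                0.650, 0.480, 0.690, 0.210, 0.830, 0.800, 0.930]
overall = sum(per_category) / len(per_category)
print(round(overall, 3))  # 0.547
```

The 15 per-category values sum to 8.20, and 8.20 / 15 ≈ 0.547, which matches the reported accuracy.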

Category name	Accuracy	False positives with true label	False negatives with wrong predicted label
Kitchen	0.370	InsideCity, Industrial	Street, TallBuilding
Store	0.390	LivingRoom, Street	LivingRoom, Mountain
Bedroom	0.220	Kitchen, InsideCity	Kitchen, Office
LivingRoom	0.140	Kitchen, Bedroom	Office, Bedroom
Office	0.770	Bedroom, Bedroom	Kitchen, Bedroom
Industrial	0.290	Store, Bedroom	Store, InsideCity
Suburb	0.930	Street, Mountain	Mountain, Mountain
InsideCity	0.500	Highway, Industrial	TallBuilding, Suburb
TallBuilding	0.650	Industrial, InsideCity	Store, Bedroom
Street	0.480	TallBuilding, InsideCity	InsideCity, Store
Highway	0.690	Bedroom, Street	Coast, Coast
OpenCountry	0.210	Industrial, Industrial	Coast, Coast
Coast	0.830	OpenCountry, OpenCountry	OpenCountry, Suburb
Mountain	0.800	OpenCountry, Industrial	Forest, Forest
Forest	0.930	OpenCountry, OpenCountry	Mountain, Mountain

The SVM pipeline performed best on images of offices (0.770), suburbs (0.930), tall buildings (0.650), highways (0.690), coasts (0.830), mountains (0.800), and forests (0.930). It performed worst on open country (0.210), industrial scenes (0.290), living rooms (0.140), and bedrooms (0.220). With the exception of open country, it performed very well on outdoor scenes, and not nearly as well on indoor rooms. I would guess that some form of spatial information, keeping track of where features are in relation to each other, would help. Without it, I would guess that rooms with an abundance of man-made objects look similar to each other.
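One simple way to keep track of where features occur is to split the image into a grid and build one histogram per cell, in the spirit of a single level of a spatial pyramid. This was not implemented for the results above. The sketch is only a guess at what such an extension could look like, and the function name, the 2x2 grid, and the normalization are assumptions.

```python
def spatial_histogram(words, positions, width, height, vocab_size, grid=2):
    """Concatenate one visual-word histogram per grid cell.
    `words` are vocabulary indices; `positions` are matching (x, y) pixels.
    The result is normalized by the total number of features."""
    cells = [[0.0] * vocab_size for _ in range(grid * grid)]
    for word, (x, y) in zip(words, positions):
        col = min(int(x * grid / width), grid - 1)
        row = min(int(y * grid / height), grid - 1)
        cells[row * grid + col][word] += 1.0
    total = float(len(words)) or 1.0
    return [count / total for cell in cells for count in cell]
```

With a 400-word vocabulary and a 2x2 grid, each image would be described by 1600 values instead of 400. Two rooms with similar objects in different layouts would then produce different histograms.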

Christine Whalen (cgwhalen)
