CS 143 / Project 3 / Scene Recognition with Bag of Words

In this project, I implemented two feature descriptors, "tiny image" and "bag of SIFT", and two classifiers, "nearest neighbor" and "support vector machine" (SVM). Tiny image with nearest neighbor reaches a best accuracy of about 22.5%. Bag of SIFT with nearest neighbor reaches about 53%, and bag of SIFT with SVM reaches about 67%. To increase accuracy, I tuned several parameters: the vocabulary size, the bin size and step of the SIFT features, and lambda for the SVM.

Tiny image + nearest neighbor


Accuracy (mean of diagonal of confusion matrix) is 0.225
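
As a rough illustration of this baseline, the sketch below shrinks a grayscale image to a small fixed size and classifies it by its single nearest training example. The 16x16 size, the block averaging, the zero-mean and unit-length normalization, and the function names are assumptions made for the example, not details taken from the project code.

```python
import math

def tiny_image(image, size=16):
    """Shrink a grayscale image (list of rows) to size x size by block
    averaging, then flatten, subtract the mean, and scale to unit length.
    The size and the normalization are assumptions for this sketch."""
    h, w = len(image), len(image[0])
    feat = []
    for i in range(size):
        r0 = i * h // size
        r1 = max((i + 1) * h // size, r0 + 1)
        for j in range(size):
            c0 = j * w // size
            c1 = max((j + 1) * w // size, c0 + 1)
            block = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            feat.append(sum(block) / len(block))
    mean = sum(feat) / len(feat)
    feat = [v - mean for v in feat]
    norm = math.sqrt(sum(v * v for v in feat)) or 1.0
    return [v / norm for v in feat]

def nearest_neighbor(train_feats, train_labels, test_feat):
    """Return the label of the training feature closest in Euclidean distance."""
    best = min(range(len(train_feats)),
               key=lambda k: math.dist(train_feats[k], test_feat))
    return train_labels[best]
```

Because the feature is zero-mean and unit-length in this sketch, a uniformly brighter copy of an image maps to the same feature as the original, which is one common reason to normalize tiny images.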

Bag of SIFT + nearest neighbor


Accuracy (mean of diagonal of confusion matrix) is 0.526
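
The bag-of-words step can be sketched as follows: each local descriptor of an image is assigned to its nearest vocabulary word, and the image is represented by the normalized histogram of those assignments. This is a minimal sketch; the function name, the Euclidean distance, and the sum-to-one normalization are assumptions, and real SIFT descriptors would be 128-dimensional rather than the 2-D toy vectors used in the usage line.

```python
import math

def bag_of_words_histogram(descriptors, vocabulary):
    """Count how many descriptors fall nearest to each vocabulary word,
    then normalize the counts so they sum to 1 (an assumed normalization)."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        nearest = min(range(len(vocabulary)),
                      key=lambda k: math.dist(d, vocabulary[k]))
        counts[nearest] += 1
    total = sum(counts) or 1  # avoid dividing by zero for an empty image
    return [c / total for c in counts]

# Toy usage: a 2-word vocabulary and four 2-D descriptors.
hist = bag_of_words_histogram([[1, 0], [0, 1], [9, 10], [1, 1]],
                              [[0, 0], [10, 10]])
```

The histogram length equals the vocabulary size, so a larger vocabulary gives a longer feature vector for the classifier.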

Bag of SIFT + support vector machine


Accuracy (mean of diagonal of confusion matrix) is 0.670
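
The accuracy figure used throughout can be computed as sketched below: build a confusion matrix whose rows are true categories and whose columns are predicted categories, normalize each row, and average the diagonal. The function names and the two-category toy data are hypothetical, chosen only for illustration.

```python
def confusion_matrix(true_labels, predicted_labels, categories):
    """Row-normalized confusion matrix: entry [i][j] is the fraction of
    images of category i that were predicted as category j."""
    index = {name: i for i, name in enumerate(categories)}
    n = len(categories)
    counts = [[0] * n for _ in range(n)]
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t]][index[p]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

def mean_diagonal(matrix):
    """Average of the per-category accuracies on the diagonal."""
    return sum(matrix[i][i] for i in range(len(matrix))) / len(matrix)

# Toy usage with two categories and four test images.
m = confusion_matrix(["Kitchen", "Kitchen", "Store", "Store"],
                     ["Kitchen", "Store", "Store", "Store"],
                     ["Kitchen", "Store"])
acc = mean_diagonal(m)
```

Each diagonal entry is one category's accuracy, so the overall figure is the mean of the per-category accuracies. The fifteen per-category accuracies reported in the results below average to 0.670, matching the number above.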

[Image columns: Sample training images, Sample true positives]

Category name   Accuracy   False positives with true label   False negatives with wrong predicted label
Kitchen         0.600      LivingRoom, Bedroom               TallBuilding, Highway
Store           0.610      OpenCountry, InsideCity           InsideCity, InsideCity
Bedroom         0.470      LivingRoom, Kitchen               Industrial, Kitchen
LivingRoom      0.240      Kitchen, Industrial               Kitchen, Mountain
Office          0.810      Bedroom, LivingRoom               Kitchen, TallBuilding
Industrial      0.530      Street, Highway                   Highway, TallBuilding
Suburb          0.950      InsideCity, Highway               Street, Coast
InsideCity      0.580      Industrial, Store                 Street, Coast
TallBuilding    0.720      Industrial, Street                Coast, Industrial
Street          0.720      TallBuilding, TallBuilding        InsideCity, Suburb
Highway         0.780      Store, Street                     Coast, Coast
OpenCountry     0.450      Bedroom, Forest                   Coast, InsideCity
Coast           0.810      Mountain, InsideCity              OpenCountry, OpenCountry
Mountain        0.850      Store, Industrial                 Forest, Highway
Forest          0.930      OpenCountry, Mountain             Mountain, Mountain

Experiments with different vocabulary sizes