Bag of SIFT representation and linear SVM classifier

Implementation

Features were extracted using a bag of SIFT representation. First, I built a vocabulary by sampling SIFT features from a sample of the training images and running k-means clustering on that list of features to obtain K centroids, which serve as the vocabulary words (I used K = 400 to balance run time and accuracy). The feature vector for each image is then a histogram: SIFT features sampled from that image are each assigned to the nearest vocabulary word (centroid), and the histogram counts how many features fall into each word. Each test image was classified using a linear SVM classifier; the best-performing lambda was 0.0001.
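The vocabulary and histogram steps described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code used for this project: the function names (`build_vocabulary`, `bag_of_words`), the fixed iteration count, the random initialization, and the L1 normalization of the histogram are all assumptions, and toy 2-D vectors stand in for 128-D SIFT descriptors.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(descriptor, centroids):
    """Index of the centroid (vocabulary word) closest to the descriptor."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(descriptor, centroids[i]))


def build_vocabulary(descriptors, k, iterations=20, seed=0):
    """Cluster descriptors with Lloyd's k-means and return k centroids."""
    rng = random.Random(seed)
    # Assumption: initialize centroids from k randomly chosen descriptors.
    centroids = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iterations):
        # Assignment step: group descriptors by their nearest centroid.
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_index(d, centroids)].append(d)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids


def bag_of_words(image_descriptors, centroids):
    """Histogram of nearest vocabulary words for one image.

    Assumption: the histogram is L1-normalized so that images with
    different numbers of sampled features are comparable.
    """
    hist = [0.0] * len(centroids)
    for d in image_descriptors:
        hist[nearest_index(d, centroids)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy usage: two well-separated groups of 2-D "descriptors", K = 2.
training = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
            [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
vocab = build_vocabulary(training, k=2)
hist = bag_of_words([[0.05, 0.05], [4.9, 5.0], [5.2, 5.1]], vocab)
```

With K = 400 and real SIFT descriptors, the nearest-centroid search dominates the run time, so a practical version would likely use a vectorized array library or an approximate nearest-neighbor structure instead of these Python loops.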

Results


Accuracy (mean of diagonal of confusion matrix) is 0.669
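As a rough sketch of how this number is obtained, the snippet below builds a row-normalized confusion matrix and averages its diagonal. The helper names and the toy labels are hypothetical; the list `per_category` simply copies the 15 per-category accuracies reported in this section, which are the diagonal entries of the confusion matrix.

```python
def confusion_matrix(true_labels, predicted_labels, categories):
    """Row-normalized confusion matrix (rows: true class, columns: prediction)."""
    index = {c: i for i, c in enumerate(categories)}
    n = len(categories)
    counts = [[0] * n for _ in range(n)]
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t]][index[p]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Each row is divided by the number of test images of that class.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def mean_diagonal(matrix):
    """Average of the per-class accuracies on the diagonal."""
    return sum(matrix[i][i] for i in range(len(matrix))) / len(matrix)


# Toy example with two hypothetical classes.
toy = confusion_matrix(["A", "A", "B", "B"], ["A", "B", "B", "B"], ["A", "B"])
toy_accuracy = mean_diagonal(toy)

# Per-category accuracies from this report, in table order.
per_category = [0.560, 0.520, 0.490, 0.400, 0.880, 0.610, 0.950, 0.450,
                0.720, 0.600, 0.810, 0.550, 0.800, 0.790, 0.910]
overall = sum(per_category) / len(per_category)
```

Averaging the 15 reported values gives roughly 0.669, consistent with the overall accuracy stated above.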

Category name   Accuracy   False positives with true label   False negatives with wrong predicted label
Kitchen         0.560      Bedroom, Industrial               Industrial, Bedroom
Store           0.520      Street, TallBuilding              Industrial, Office
Bedroom         0.490      Street, Kitchen                   Kitchen, Store
LivingRoom      0.400      Bedroom, Store                    Store, Kitchen
Office          0.880      Bedroom, Kitchen                  LivingRoom, LivingRoom
Industrial      0.610      Street, TallBuilding              Kitchen, Mountain
Suburb          0.950      OpenCountry, Industrial           LivingRoom, OpenCountry
InsideCity      0.450      Suburb, LivingRoom                Industrial, LivingRoom
TallBuilding    0.720      Street, OpenCountry               Mountain, Kitchen
Street          0.600      TallBuilding, InsideCity          Industrial, Industrial
Highway         0.810      Coast, Industrial                 OpenCountry, Street
OpenCountry     0.550      Coast, Coast                      Suburb, Coast
Coast           0.800      OpenCountry, OpenCountry          Highway, OpenCountry
Mountain        0.790      Forest, Bedroom                   OpenCountry, Bedroom
Forest          0.910      OpenCountry, OpenCountry          Store, Mountain