Bag of SIFT representation and nearest neighbor classifier

Implementation

Features were extracted using a bag of SIFT representation. First, I built a vocabulary by sampling SIFT features from a subset of the training images and running k-means clustering on the sampled descriptors to obtain 400 centroids (visual words); I chose a vocabulary size of 400 to balance run time and accuracy. The feature vector for each image is then a histogram: SIFT descriptors sampled from the image are each assigned to the nearest vocabulary word (centroid), and the number of assignments per word is counted. Each test image was classified with a k-nearest-neighbors classifier over these bag of SIFT histograms. Accuracy was best with k = 4 neighbors.
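The histogram and classification steps can be sketched as follows. This is a minimal illustration, not the code used for the results: it assumes the vocabulary has already been computed by k-means, uses toy 2-D points in place of 128-dimensional SIFT descriptors, and the function names, the Euclidean distance, and the histogram normalization are all assumptions made for the example.

```python
import math
from collections import Counter


def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary centroid closest to one descriptor (Euclidean)."""
    return min(range(len(vocabulary)),
               key=lambda i: math.dist(descriptor, vocabulary[i]))


def bag_of_words_histogram(descriptors, vocabulary):
    """Normalized histogram of nearest-word assignments for one image."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1
    total = sum(counts)
    # Normalization is an assumption here; it keeps images with different
    # numbers of sampled descriptors comparable.
    return [c / total for c in counts] if total else counts


def knn_classify(test_hist, train_hists, train_labels, k=4):
    """Majority vote among the k training histograms nearest to test_hist."""
    order = sorted(range(len(train_hists)),
                   key=lambda i: math.dist(test_hist, train_hists[i]))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy data: two "visual words" and two labeled training images.
vocabulary = [(0.0, 0.0), (10.0, 10.0)]
train_hists = [
    bag_of_words_histogram([(0, 1), (1, 0), (9, 9)], vocabulary),    # mostly word 0
    bag_of_words_histogram([(10, 9), (9, 10), (1, 1)], vocabulary),  # mostly word 1
]
train_labels = ["Forest", "Office"]

test_hist = bag_of_words_histogram([(0, 0), (1, 1), (0, 2)], vocabulary)
prediction = knn_classify(test_hist, train_hists, train_labels, k=1)
print(prediction)
```

With a real vocabulary of 400 words and thousands of descriptors per image, the nearest-word search would normally be vectorized or done with a spatial index rather than with Python loops.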

Results


Accuracy (mean of diagonal of confusion matrix) is 0.567
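As a rough sketch of how this number is defined, the snippet below computes the mean of the diagonal of a row-normalized confusion matrix. The function name and the toy matrix are hypothetical. The last lines check that the per-category accuracies reported in this section average to the overall figure.

```python
def mean_per_class_accuracy(confusion):
    """Mean of the diagonal of a row-normalized confusion matrix.

    confusion[i][j] counts test images of true class i predicted as class j.
    """
    per_class = []
    for i, row in enumerate(confusion):
        total = sum(row)
        per_class.append(row[i] / total if total else 0.0)
    return sum(per_class) / len(per_class)


# Toy 2-class example: class 0 is right 8 of 10 times, class 1 is right 5 of 10.
toy_accuracy = mean_per_class_accuracy([[8, 2], [5, 5]])

# Per-category accuracies as reported in this section.
reported = [0.390, 0.510, 0.320, 0.340, 0.820, 0.410, 0.930, 0.450,
            0.440, 0.560, 0.800, 0.510, 0.560, 0.520, 0.940]
overall = round(sum(reported) / len(reported), 3)
print(toy_accuracy, overall)
```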

[Image columns "Sample training images" and "Sample true positives" are not reproduced.]

Category name   Accuracy   False positives with true label   False negatives with wrong predicted label
Kitchen         0.390      Industrial, Industrial            Bedroom, Office
Store           0.510      TallBuilding, TallBuilding        LivingRoom, Industrial
Bedroom         0.320      OpenCountry, Kitchen              LivingRoom, Office
LivingRoom      0.340      Bedroom, Store                    Office, Office
Office          0.820      Kitchen, Industrial               Kitchen, Kitchen
Industrial      0.410      TallBuilding, Store               Kitchen, Store
Suburb          0.930      OpenCountry, Mountain             LivingRoom, Bedroom
InsideCity      0.450      Street, Suburb                    Industrial, Street
TallBuilding    0.440      InsideCity, LivingRoom            OpenCountry, Street
Street          0.560      Bedroom, InsideCity               Mountain, Suburb
Highway         0.800      Coast, OpenCountry                Street, Suburb
OpenCountry     0.510      Street, Bedroom                   Highway, Forest
Coast           0.560      Highway, OpenCountry              Highway, OpenCountry
Mountain        0.520      OpenCountry, Coast                Forest, Suburb
Forest          0.940      Industrial, OpenCountry           Suburb, Suburb