Bag of SIFT representation and nearest neighbor classifier

Implementation

Features were extracted using a bag of SIFT representation. First, I built a vocabulary by sampling SIFT features from a subset of the training images and running k-means clustering on them to obtain 400 centroids (visual words); a vocabulary size of 400 balanced running time against accuracy. The feature vector for each image is then a histogram: SIFT features sampled from the image are each assigned to the nearest vocabulary word (centroid), and the counts per word form the histogram. Each test image was classified with a k-nearest-neighbors classifier over these bag of SIFT histograms. Accuracy was best with k = 4 neighbors.
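The steps above can be sketched roughly as follows. This is a minimal pure-Python illustration, not the project code. The function names (`build_vocabulary`, `bag_of_words`, `knn_predict`) are hypothetical, and descriptors are plain lists of numbers standing in for real SIFT vectors. Euclidean distance, histogram normalization, and majority voting are assumptions; the write-up does not specify them.

```python
import math
import random
from collections import Counter


def nearest_index(vec, centroids):
    """Index of the centroid closest to vec (Euclidean distance, an assumption)."""
    return min(range(len(centroids)), key=lambda i: math.dist(vec, centroids[i]))


def build_vocabulary(descriptors, vocab_size, iters=20, seed=0):
    """Plain Lloyd's k-means over sampled descriptors; returns the centroids."""
    rng = random.Random(seed)
    centroids = [list(d) for d in rng.sample(descriptors, vocab_size)]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for d in descriptors:
            clusters[nearest_index(d, centroids)].append(d)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def bag_of_words(descriptors, centroids):
    """Histogram of nearest visual words, normalized to sum to 1 (assumption)."""
    counts = [0] * len(centroids)
    for d in descriptors:
        counts[nearest_index(d, centroids)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def knn_predict(query_hist, train_hists, train_labels, k=4):
    """Majority vote among the k training histograms closest to query_hist."""
    order = sorted(range(len(train_hists)),
                   key=lambda i: math.dist(query_hist, train_hists[i]))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

With real data, `vocab_size` would be 400 and `k` would be 4, as reported above. A vectorized numerical library would likely be needed to run this on full SIFT descriptor sets in reasonable time.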

Results

Accuracy (mean of the diagonal of the confusion matrix) is 0.567.
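As a rough illustration of how this figure is defined, the sketch below builds a row-normalized confusion matrix and averages its diagonal. The function names and the toy labels are hypothetical and not taken from the project. It also assumes each row is normalized by the number of test images in that category, so each diagonal entry is a per-category accuracy. The per-category accuracies in the table below sum to 8.50 over 15 categories, which gives the same 0.567.

```python
def confusion_matrix(true_labels, predicted_labels, categories):
    """Row-normalized confusion matrix: rows are true labels, columns predictions."""
    index = {c: i for i, c in enumerate(categories)}
    n = len(categories)
    counts = [[0] * n for _ in range(n)]
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t]][index[p]] += 1
    # Divide each row by its total so entry [i][i] is the accuracy for category i.
    return [[c / (sum(row) or 1) for c in row] for row in counts]


def mean_diagonal(matrix):
    """Average of the per-category accuracies on the diagonal."""
    return sum(matrix[i][i] for i in range(len(matrix))) / len(matrix)


# Toy example with two hypothetical categories.
true_labels = ["A", "A", "B", "B"]
predicted_labels = ["A", "B", "B", "B"]
matrix = confusion_matrix(true_labels, predicted_labels, ["A", "B"])
accuracy = mean_diagonal(matrix)
```

Averaging the diagonal weights every category equally, regardless of how many test images each category has.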

Category name	Accuracy	Sample training images	Sample true positives	False positives with true label		False negatives with wrong predicted label
Kitchen	0.390			Industrial	Industrial	Bedroom	Office
Store	0.510			TallBuilding	TallBuilding	LivingRoom	Industrial
Bedroom	0.320			OpenCountry	Kitchen	LivingRoom	Office
LivingRoom	0.340			Bedroom	Store	Office	Office
Office	0.820			Kitchen	Industrial	Kitchen	Kitchen
Industrial	0.410			TallBuilding	Store	Kitchen	Store
Suburb	0.930			OpenCountry	Mountain	LivingRoom	Bedroom
InsideCity	0.450			Street	Suburb	Industrial	Street
TallBuilding	0.440			InsideCity	LivingRoom	OpenCountry	Street
Street	0.560			Bedroom	InsideCity	Mountain	Suburb
Highway	0.800			Coast	OpenCountry	Street	Suburb
OpenCountry	0.510			Street	Bedroom	Highway	Forest
Coast	0.560			Highway	OpenCountry	Highway	OpenCountry
Mountain	0.520			OpenCountry	Coast	Forest	Suburb
Forest	0.940			Industrial	OpenCountry	Suburb	Suburb