Bag of SIFT representation and linear SVM classifier

Implementation

Features were extracted using a bag of SIFT representation. First, I built a vocabulary by sampling SIFT features from a subset of the training images, then ran k-means clustering on this list of features to obtain K centroids, the visual words (I used K = 400 to balance run time and accuracy). The feature vector for each image is a histogram: SIFT features sampled from that image are each assigned to the nearest vocabulary word (centroid), and the histogram counts how many fall into each bin. Each test image was classified using a linear SVM classifier, with the best-performing lambda being 0.0001.
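The vocabulary and histogram steps can be sketched roughly as follows. This is a minimal illustration, not the code used for the results: it assumes the SIFT descriptors have already been extracted by some library, uses toy 2-D vectors in place of 128-D SIFT descriptors, and runs a plain Lloyd's k-means. The function names (`build_vocabulary`, `encode_histogram`) and the L1 normalisation of the histogram are assumptions made for the example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, centroids):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda i: sq_dist(vec, centroids[i]))


def build_vocabulary(descriptors, k, iters=10, seed=0):
    """Plain Lloyd's k-means: returns k centroids (the visual words)."""
    rng = random.Random(seed)
    centroids = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest(d, centroids)].append(d)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster goes empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def encode_histogram(descriptors, centroids):
    """Count descriptors per nearest visual word, then L1-normalise (assumed)."""
    hist = [0.0] * len(centroids)
    for d in descriptors:
        hist[nearest(d, centroids)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy 2-D "descriptors" standing in for 128-D SIFT vectors.
train_descriptors = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                     [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
vocab = build_vocabulary(train_descriptors, k=2)

# Descriptors from one hypothetical image: one near the origin, three near (5, 5).
image_descriptors = [[0.05, 0.05], [4.9, 5.0], [5.2, 5.1], [5.0, 4.8]]
print(encode_histogram(image_descriptors, vocab))
```

With K = 400, each image would be represented by a 400-bin histogram of this kind, and those histograms are what the linear SVM is trained on.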

Results

Accuracy (mean of the diagonal of the confusion matrix) is 0.669.
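As a rough sketch of how this number is obtained, the snippet below builds a row-normalised confusion matrix and averages its diagonal. The helper names and the toy labels are hypothetical and are not taken from the experiment; row normalisation (each row sums to 1, so each diagonal entry is a per-category accuracy) is an assumption.

```python
def confusion_matrix(true_labels, predicted_labels, categories):
    """Row-normalised confusion matrix: rows are true classes, columns predictions."""
    index = {c: i for i, c in enumerate(categories)}
    n = len(categories)
    counts = [[0] * n for _ in range(n)]
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t]][index[p]] += 1
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]


def mean_diagonal(matrix):
    """Average per-class accuracy, i.e. the mean of the diagonal."""
    return sum(matrix[i][i] for i in range(len(matrix))) / len(matrix)


# Hypothetical toy labels, not taken from the results in this report.
categories = ["Kitchen", "Store"]
true_labels = ["Kitchen", "Kitchen", "Store", "Store"]
predicted = ["Kitchen", "Store", "Store", "Store"]

cm = confusion_matrix(true_labels, predicted, categories)
print(mean_diagonal(cm))
```

The same check applies to the table below: the fifteen per-category accuracies sum to 10.04, and 10.04 / 15 ≈ 0.669.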

Category name	Accuracy	Sample training images	Sample true positives	False positives with true label		False negatives with wrong predicted label
Kitchen	0.560			Bedroom	Industrial	Industrial	Bedroom
Store	0.520			Street	TallBuilding	Industrial	Office
Bedroom	0.490			Street	Kitchen	Kitchen	Store
LivingRoom	0.400			Bedroom	Store	Store	Kitchen
Office	0.880			Bedroom	Kitchen	LivingRoom	LivingRoom
Industrial	0.610			Street	TallBuilding	Kitchen	Mountain
Suburb	0.950			OpenCountry	Industrial	LivingRoom	OpenCountry
InsideCity	0.450			Suburb	LivingRoom	Industrial	LivingRoom
TallBuilding	0.720			Street	OpenCountry	Mountain	Kitchen
Street	0.600			TallBuilding	InsideCity	Industrial	Industrial
Highway	0.810			Coast	Industrial	OpenCountry	Street
OpenCountry	0.550			Coast	Coast	Suburb	Coast
Coast	0.800			OpenCountry	OpenCountry	Highway	OpenCountry
Mountain	0.790			Forest	Bedroom	OpenCountry	Bedroom
Forest	0.910			OpenCountry	OpenCountry	Store	Mountain