CS 143 / Project 3 / Scene Recognition with Bag of Words

The implementation of the bag of SIFT representation and linear SVM classifier can be divided into the following parts.

  1. Establishment of a vocabulary of visual words.
  2. Creation of bag of SIFT representation for each image.
  3. Implementation of SVM classifier.

Vocabulary of Visual Words

In this part, I ran the vl_dsift function with binsize = 8 and step = 8 on each of the 1500 training images and randomly selected 200 features per image. This gave me a total of 300000 features. For the kmeans function I used vocab_size = 400.
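The sampling and clustering steps can be sketched as follows. This is a minimal illustration in plain Python rather than the MATLAB code used for the project: random toy vectors stand in for vl_dsift descriptors, the helper names (sample_features, kmeans, sq_dist) are hypothetical, and a basic Lloyd's k-means stands in for whatever kmeans routine was actually called.

```python
import random

def sample_features(per_image_features, n_per_image, rng):
    """Randomly keep at most n_per_image descriptors from each image."""
    sampled = []
    for feats in per_image_features:
        k = min(n_per_image, len(feats))
        sampled.extend(rng.sample(feats, k))
    return sampled

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter, rng):
    """Basic Lloyd's k-means; returns k cluster centers (the visual words)."""
    centers = rng.sample(points, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old center if a cluster ends up empty
                dim = len(members[0])
                centers[j] = [sum(m[d] for m in members) / len(members)
                              for d in range(dim)]
    return centers

# Feature count reported in the text: 1500 images x 200 samples each.
total_features = 1500 * 200

# Toy stand-in (assumption): 6 "images", each with 50 random 4-D descriptors.
rng = random.Random(0)
images = [[[rng.random() for _ in range(4)] for _ in range(50)]
          for _ in range(6)]
sampled = sample_features(images, 20, rng)  # 6 * 20 = 120 descriptors
vocab = kmeans(sampled, 5, 10, rng)         # 5 toy visual words
```

In the real pipeline the descriptors would be 128-dimensional SIFT vectors and the vocabulary would have 400 centers.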

Bag of SIFT Representation

In this part, I tried two different step values (4 and 8) in the vl_dsift function for each test image. Then I built a histogram over all 400 visual words for each image and normalized it. For step = 4, the highest accuracy I got was 71.3%, and for step = 8, the highest accuracy I got was 63.5%.
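As an illustration of the histogram step, the sketch below assigns each descriptor to its nearest visual word and normalizes the counts. It is plain Python on toy 2-D vectors, not the project's MATLAB code. The function name bag_of_words is hypothetical, and normalizing so the bins sum to 1 is an assumption, since the kind of normalization is not specified above.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def bag_of_words(descriptors, vocab):
    """Count nearest-word assignments, then normalize the histogram to sum to 1."""
    hist = [0.0] * len(vocab)
    for d in descriptors:
        nearest = min(range(len(vocab)), key=lambda j: sq_dist(d, vocab[j]))
        hist[nearest] += 1.0
    total = sum(hist)
    if total == 0:
        return hist  # no descriptors: leave the histogram all zeros
    return [h / total for h in hist]

# Toy vocabulary of 3 words and 4 descriptors (assumed values, for illustration).
vocab = [[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]]
descriptors = [[1.0, 1.0], [9.0, 9.0], [0.5, 0.0], [1.0, 8.0]]
hist = bag_of_words(descriptors, vocab)
print(hist)  # [0.5, 0.25, 0.25]
```

Normalizing makes histograms comparable between images that yield different numbers of descriptors, which matters when the step size changes the descriptor count.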

SVM Classifier

In this part, I trained 15 binary, 1-vs-all SVMs. Each test case is evaluated on all 15 classifiers, and the classifier that is most confidently positive (using the formula W'*X + B) "wins". I tried different LAMBDA values; the accuracy for each is shown in the following table.

LAMBDA      accuracy
0.01        0.360
0.001       0.475
0.0001      0.541
0.00001     0.662
0.000001    0.713
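The 1-vs-all decision rule described in this part can be sketched as below. This is plain Python with made-up weights for three categories. It covers only the scoring step W'*X + B and the choice of the winning classifier, not the SVM training or the effect of LAMBDA. The names svm_score and predict_one_vs_all are hypothetical.

```python
def svm_score(w, b, x):
    """Linear SVM confidence: W'*X + B."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict_one_vs_all(classifiers, x):
    """Return the category whose binary classifier gives the highest score.

    classifiers maps a category name to its (w, b) pair.
    """
    best_category = None
    best_score = float("-inf")
    for category, (w, b) in classifiers.items():
        score = svm_score(w, b, x)
        if score > best_score:
            best_category, best_score = category, score
    return best_category

# Toy weights (assumed values); the real classifiers act on 400-bin histograms.
classifiers = {
    "Kitchen": ([1.0, -1.0], 0.0),
    "Forest": ([-1.0, 1.0], 0.5),
    "Coast": ([0.5, 0.5], -1.0),
}
print(predict_one_vs_all(classifiers, [0.2, 0.8]))  # Forest
print(predict_one_vs_all(classifiers, [0.9, 0.1]))  # Kitchen
```

Because the highest score wins even when every score is negative, each test image always receives exactly one label.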

Results

My final tuned configuration for each parameter is shown in the following table.

parameter                    values
bin size (building vocab)    8
vocab_size                   400
sampled features             300000
step                         8
bin size (bag of SIFT)       8
LAMBDA                       0.000001

With those configurations, the best accuracies are

  1. 0.192 for Tiny image && Nearest Neighbor
  2. 0.587 for Nearest Neighbor && Bag of SIFT
  3. 0.713 for SVM && Bag of SIFT

Tiny Image && Nearest Neighbor


Accuracy (mean of diagonal of confusion matrix) is 0.192

Nearest Neighbor && Bag of SIFT


Accuracy (mean of diagonal of confusion matrix) is 0.587

Category name   Accuracy   False positives with true label   False negatives with wrong predicted label
Kitchen         0.410      LivingRoom, InsideCity            LivingRoom, Store
Store           0.550      Kitchen, Highway                  Forest, LivingRoom
Bedroom         0.350      LivingRoom, TallBuilding          LivingRoom, LivingRoom
LivingRoom      0.340      Kitchen, Office                   Kitchen, Kitchen
Office          0.870      LivingRoom, InsideCity            Kitchen, LivingRoom
Industrial      0.230      Store, TallBuilding               Highway, InsideCity
Suburb          0.910      Industrial, Bedroom               Street, LivingRoom
InsideCity      0.370      Street, Store                     Kitchen, LivingRoom
TallBuilding    0.410      InsideCity, Bedroom               Store, Store
Street          0.650      InsideCity, LivingRoom            Store, InsideCity
Highway         0.780      Coast, OpenCountry                Forest, Street
OpenCountry     0.620      Industrial, Coast                 Bedroom, Coast
Coast           0.670      OpenCountry, OpenCountry          Office, OpenCountry
Mountain        0.690      OpenCountry, OpenCountry          Forest, Forest
Forest          0.950      Industrial, Bedroom               Mountain, Mountain

SVM && Bag of SIFT


Accuracy (mean of diagonal of confusion matrix) is 0.713

Category name   Accuracy   False positives with true label   False negatives with wrong predicted label
Kitchen         0.570      Bedroom, Bedroom                  Store, Industrial
Store           0.580      Kitchen, Highway                  InsideCity, Suburb
Bedroom         0.500      Kitchen, Kitchen                  Office, Industrial
LivingRoom      0.420      Kitchen, Industrial               Bedroom, Store
Office          0.890      LivingRoom, InsideCity            Kitchen, Kitchen
Industrial      0.680      Kitchen, InsideCity               InsideCity, Highway
Suburb          0.990      Mountain, Coast                   InsideCity
InsideCity      0.700      TallBuilding, Street              LivingRoom, Store
TallBuilding    0.770      Industrial, InsideCity            Industrial, Coast
Street          0.650      Forest, InsideCity                InsideCity, InsideCity
Highway         0.820      Industrial, InsideCity            Coast, LivingRoom
OpenCountry     0.530      Coast, Bedroom                    Coast, Coast
Coast           0.790      TallBuilding, OpenCountry         OpenCountry, Highway
Mountain        0.870      OpenCountry, TallBuilding         Forest, Forest
Forest          0.930      OpenCountry, Highway              OpenCountry, Mountain
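As a consistency check, the overall accuracies for the two bag of SIFT pipelines can be recomputed from the per-category accuracies listed in the tables. The sketch below is plain Python and assumes those per-category values are the diagonal entries of the confusion matrix, so that their mean is the reported accuracy. The dictionary and function names are hypothetical.

```python
# Per-category accuracies copied from the two result tables.
nn_accuracy = {
    "Kitchen": 0.410, "Store": 0.550, "Bedroom": 0.350, "LivingRoom": 0.340,
    "Office": 0.870, "Industrial": 0.230, "Suburb": 0.910, "InsideCity": 0.370,
    "TallBuilding": 0.410, "Street": 0.650, "Highway": 0.780,
    "OpenCountry": 0.620, "Coast": 0.670, "Mountain": 0.690, "Forest": 0.950,
}
svm_accuracy = {
    "Kitchen": 0.570, "Store": 0.580, "Bedroom": 0.500, "LivingRoom": 0.420,
    "Office": 0.890, "Industrial": 0.680, "Suburb": 0.990, "InsideCity": 0.700,
    "TallBuilding": 0.770, "Street": 0.650, "Highway": 0.820,
    "OpenCountry": 0.530, "Coast": 0.790, "Mountain": 0.870, "Forest": 0.930,
}

def mean_accuracy(per_category):
    """Mean of the per-category accuracies (the confusion matrix diagonal)."""
    return sum(per_category.values()) / len(per_category)

nn_mean = round(mean_accuracy(nn_accuracy), 3)
svm_mean = round(mean_accuracy(svm_accuracy), 3)
print(nn_mean)   # 0.587
print(svm_mean)  # 0.713
```

Both values match the accuracies reported for Nearest Neighbor && Bag of SIFT and SVM && Bag of SIFT.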