Project 3: Scene Recognition with Bag of Words

Brian Thomas

Baseline Version

Following the instructions for the "baseline" scene recognition, the results below were obtained:

Accuracy: 0.6620

Different vocabulary sizes

I examined the effect of vocabulary size on performance, trying sizes of 10, 20, 50, 100, 200, 500, 1000, and 2000. Accuracy peaked at a vocabulary size of 200 (0.6680) and fell off for both smaller and larger vocabularies. The respective accuracies were as follows:
Vocab size    Accuracy
10            0.4760
20            0.5393
50            0.5947
100           0.6153
200           0.6680
500           0.6620
1000          0.6073
2000          0.5227
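
For reference, the quantity that changes with vocabulary size is the length of the per-image histogram. The following is a minimal Python sketch of hard-assignment bag-of-words encoding, not the MATLAB code used for the results above. The function names (`squared_distance`, `nearest_word`, `hard_histogram`), the use of squared Euclidean distance, and the normalization by descriptor count are all assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, vocab):
    """Index of the vocabulary entry closest to the descriptor."""
    return min(range(len(vocab)),
               key=lambda i: squared_distance(descriptor, vocab[i]))


def hard_histogram(descriptors, vocab):
    """Count how often each visual word is the nearest one.

    The histogram has one bin per vocabulary entry, so its length is the
    vocabulary size. Dividing by the number of descriptors is an assumed
    normalization, chosen so that images with different numbers of
    features are comparable.
    """
    hist = [0.0] * len(vocab)
    for descriptor in descriptors:
        hist[nearest_word(descriptor, vocab)] += 1.0
    n = len(descriptors)
    return [h / n for h in hist] if n else hist


# Toy example: a 2-word vocabulary and four 2-D "descriptors".
toy_vocab = [[0.0, 0.0], [10.0, 10.0]]
toy_descriptors = [[1.0, 0.0], [9.0, 9.0], [11.0, 12.0], [0.0, 2.0]]
toy_hist = hard_histogram(toy_descriptors, toy_vocab)
```

With a very small vocabulary, many different features fall into the same bin; with a very large one, each bin receives few features. Either effect could plausibly contribute to the drop in accuracy at both ends of the table, though the experiments here do not isolate the cause.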

The confusion matrix images for these runs were lost: they were not saved during the run, and MATLAB was then killed.

Soft assignment (kernel codebook encoding)

I also measured performance using soft assignment (kernel codebook encoding). In these experiments, gamma was set to 10^-4. With this value the histograms looked "soft" while still making it clear what the hard assignment would have been, because the weight of the word that hard assignment would have selected was 1-2 orders of magnitude larger than the others. However, accuracy decreased to 0.5307 for a vocabulary size of 200, compared with 0.6680 at the same size in the table above. This could be due either to the choice of gamma or to a different vocabulary size being optimal for soft assignment.
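
To make the role of gamma concrete, here is a minimal Python sketch of kernel codebook encoding. It is not the MATLAB code used for the experiments. It assumes a Gaussian kernel of the form exp(-gamma * d^2) on the squared Euclidean distance d^2, with each descriptor's weights normalized to sum to 1; the exact kernel form used above is not stated, so this form and the function names (`soft_assign`, `soft_histogram`) are illustrative assumptions.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def soft_assign(descriptor, vocab, gamma):
    """Weights of one descriptor over all visual words (they sum to 1).

    Assumed kernel: exp(-gamma * squared distance). Subtracting the
    smallest distance before exponentiating leaves the normalized
    weights unchanged and avoids every term underflowing to zero.
    """
    d2 = [squared_distance(descriptor, word) for word in vocab]
    d2_min = min(d2)
    kernel = [math.exp(-gamma * (d - d2_min)) for d in d2]
    total = sum(kernel)
    return [k / total for k in kernel]


def soft_histogram(descriptors, vocab, gamma=1e-4):
    """Average the soft-assignment weights over all descriptors."""
    hist = [0.0] * len(vocab)
    for descriptor in descriptors:
        for i, w in enumerate(soft_assign(descriptor, vocab, gamma)):
            hist[i] += w
    n = len(descriptors)
    return [h / n for h in hist] if n else hist


# Toy example: three visual words and one descriptor near the first.
toy_vocab = [[0.0, 0.0], [100.0, 0.0], [0.0, 300.0]]
toy_weights = soft_assign([10.0, 0.0], toy_vocab, gamma=1e-4)
# The largest weight marks the word hard assignment would have chosen.
hard_choice = max(range(len(toy_vocab)), key=lambda i: toy_weights[i])
```

Under this assumed kernel, a larger gamma concentrates the weight on the nearest word and approaches hard assignment, while a smaller gamma spreads the weight more evenly across words. Sweeping gamma together with the vocabulary size would be one way to test the two explanations offered above.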