Project 3: Scene Recognition with Bag of Words

Brian Thomas

Baseline Version

Following the instructions for the "baseline" scene recognition pipeline, I obtained an accuracy of 0.6620.

Different vocabulary sizes

I examined the effect of vocabulary size on performance, trying sizes of 10, 20, 50, 100, 200, 500, 1000, and 2000. Accuracy peaked at 0.6680 for a vocabulary of 200 words and dropped off for both smaller and larger vocabularies. The respective accuracies were as follows:

Vocab size	Accuracy
10	0.4760
20	0.5393
50	0.5947
100	0.6153
200	0.6680
500	0.6620
1000	0.6073
2000	0.5227
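For reference, the hard-assignment encoding that the vocabulary size controls can be sketched as below. This is a minimal Python illustration, not the MATLAB code used for the project: the function names (`squared_distance`, `hard_histogram`) and the toy 2-D data are hypothetical, and the sketch assumes each descriptor simply votes for its nearest vocabulary word. The vocabulary size is the number of histogram bins, so a larger vocabulary gives a finer but sparser image representation.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hard_histogram(descriptors, vocab):
    """Normalized bag-of-words histogram using nearest-word (hard) assignment."""
    counts = [0.0] * len(vocab)
    for d in descriptors:
        # Each descriptor votes only for its single closest vocabulary word.
        nearest = min(range(len(vocab)),
                      key=lambda k: squared_distance(d, vocab[k]))
        counts[nearest] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Toy 2-D "descriptors" and a 3-word vocabulary (made-up values).
vocab = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
descriptors = [[1.0, 1.0], [9.0, 1.0], [8.0, -1.0], [1.0, 9.0]]
hist = hard_histogram(descriptors, vocab)
```

In this toy case one descriptor lands on the first word, two on the second, and one on the third, so the histogram has one entry per vocabulary word and sums to 1.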

The confusion matrix images were lost because they were not saved during the run and the MATLAB process was killed.

Soft assignment (kernel codebook encoding)

I also measured performance using soft assignment (kernel codebook encoding), with gamma set to 10^-4. This value produced "soft" looking results while still making it clear what the hard selection would have been, because the weight given to the nearest word was 1-2 orders of magnitude larger than the weights given to the other words. However, for a vocabulary size of 200 the accuracy decreased from 0.6680 to 0.5307. This could be due either to the choice of gamma or to a different vocabulary size being optimal for soft assignment.
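The soft-assignment idea can be sketched as follows. This is a minimal Python illustration under stated assumptions rather than the project's MATLAB code: the report gives only gamma = 10^-4, so the Gaussian kernel on squared Euclidean distance, the per-descriptor normalization, and the names `soft_histogram` and `squared_distance` are all assumptions, and the toy data is made up.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def soft_histogram(descriptors, vocab, gamma=1e-4):
    """Bag-of-words histogram where each descriptor spreads its vote
    over all words with weights exp(-gamma * squared distance).
    The kernel form is an assumption; only gamma comes from the report."""
    hist = [0.0] * len(vocab)
    for d in descriptors:
        weights = [math.exp(-gamma * squared_distance(d, w)) for w in vocab]
        total = sum(weights)
        # Normalize so every descriptor contributes a total vote of 1.
        for k, w in enumerate(weights):
            hist[k] += w / total
    n = len(descriptors)
    return [h / n for h in hist] if n else hist


# Toy data (made-up values) on a scale where squared distances reach
# the tens of thousands, so gamma = 1e-4 gives clearly unequal weights.
vocab = [[0.0, 0.0], [200.0, 0.0], [0.0, 300.0]]
descriptors = [[10.0, 10.0]]
hist = soft_histogram(descriptors, vocab)
```

With these toy values the nearest word still receives by far the largest share of the vote while the other words receive small nonzero weights, which is the behavior described above. How soft the result is depends on gamma relative to the scale of the descriptor distances, so a value that works on one feature type may be too sharp or too flat on another.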
