Project 3: Scene Recognition with Bag of Words

Design Overview

1. build_vocabulary.m:
In each loop iteration, I called vl_dsift to extract a set of dense SIFT descriptors with size 4 and step 8. I then randomly sampled 100 SIFT descriptors from that set. Finally, I clustered the sampled descriptors with k-means, using k = vocab_size.
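
The sample-then-cluster step can be sketched as follows. This is a minimal illustration in plain Python, not the submitted MATLAB code: it assumes the dense SIFT descriptors have already been extracted (for example by vl_dsift) and are passed in as one list of vectors per loop iteration. The function names, the fixed iteration count, and the seed are all hypothetical, and the k-means here is a basic Lloyd's iteration rather than whatever routine the project actually called.

```python
import random


def sample_descriptors(descriptors, n_samples, rng):
    """Randomly keep at most n_samples descriptors from one set."""
    if len(descriptors) <= n_samples:
        return list(descriptors)
    return rng.sample(descriptors, n_samples)


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, n_iters=20):
    """Basic Lloyd's k-means; returns a list of k cluster centers."""
    # Initialize centers from k randomly chosen points (an assumption).
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iters):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each non-empty cluster's center to its mean.
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                centers[j] = [
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                ]
    return centers


def build_vocabulary(descriptor_sets, vocab_size, samples_per_set=100, seed=0):
    """Pool a random sample from every descriptor set, then cluster the pool."""
    rng = random.Random(seed)
    pooled = []
    for descriptors in descriptor_sets:
        pooled.extend(sample_descriptors(descriptors, samples_per_set, rng))
    return kmeans(pooled, vocab_size, rng)
```

Sampling a fixed number of descriptors per iteration keeps the pool handed to k-means small, which is the main reason to subsample before clustering.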

2. make_hit.m:
I looped through all the SIFT descriptors of the image. For each descriptor, I found the nearest word in the vocabulary and added 1 to the histogram row that corresponds to that word.
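
The nearest-word counting described above can be sketched like this. Again this is an illustrative Python version rather than the project's MATLAB: the function names are hypothetical, descriptors and vocabulary words are assumed to be plain lists of numbers, and the sketch returns raw counts because the write-up does not say whether the histogram was normalized.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_histogram(descriptors, vocab):
    """Count, for each vocabulary word, how many descriptors are nearest to it."""
    hist = [0] * len(vocab)
    for d in descriptors:
        # Index of the closest vocabulary word for this descriptor.
        nearest = min(range(len(vocab)), key=lambda j: squared_distance(d, vocab[j]))
        hist[nearest] += 1
    return hist


# Toy example with a 3-word vocabulary in 2-D.
vocab = [[0, 0], [10, 10], [0, 10]]
descriptors = [[1, 1], [9, 9], [0, 8], [1, 0]]
print(build_histogram(descriptors, vocab))  # prints [2, 1, 1]
```

The histogram has one entry per vocabulary word, so its length equals vocab_size regardless of how many descriptors the image produces.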

Below is the accuracy for each vocabulary size, followed by the corresponding confusion matrices.

vocab size    accuracy
20            0.5507
50            0.6100
100           0.6480
200           0.6560
400           0.6630

confusion matrix for vocab size 20


confusion matrix for vocab size 50


confusion matrix for vocab size 100


confusion matrix for vocab size 200


confusion matrix for vocab size 400

login: wchi