1. build_vocabulary.m:
In each loop iteration, I called vl_dsift to extract a set of dense SIFT descriptors with size 4 and step 8, then randomly sampled 100 descriptors from that set. At the end, I clustered all the sampled descriptors with k-means, using k = vocab_size.
2. make_hist.m:
I looped through all the SIFT descriptors of the image. For each one, I found the nearest visual word in the vocab and added 1 to the histogram row that corresponds to that word.
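
The two steps above can be sketched roughly as follows. This is a minimal, self-contained Python illustration, not the MATLAB code from the assignment. It assumes descriptors are plain lists of numbers, replaces the vl_dsift extraction with whatever list of descriptors is passed in, and uses a basic Lloyd-style k-means. The function names (`sample_descriptors`, `kmeans`, `build_histogram`) and the parameters `iterations` and `seed` are hypothetical and chosen only for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(point, centers):
    """Index of the center closest to point (first one wins on ties)."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def sample_descriptors(descriptors, n=100, seed=0):
    """Randomly keep at most n descriptors from one image."""
    rng = random.Random(seed)
    return rng.sample(descriptors, min(n, len(descriptors)))


def kmeans(points, k, iterations=20, seed=0):
    """Basic Lloyd-style k-means; returns k cluster centers (the vocab)."""
    rng = random.Random(seed)
    # Initialize the centers with k distinct input points.
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        # Assignment step: group every point under its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest_index(p, centers)].append(p)
        # Update step: move each non-empty center to the mean of its group.
        for i, group in enumerate(groups):
            if group:
                dim = len(group[0])
                centers[i] = [sum(p[d] for p in group) / len(group)
                              for d in range(dim)]
    return centers


def build_histogram(descriptors, vocab):
    """Count how many descriptors fall nearest to each vocab entry."""
    hist = [0] * len(vocab)
    for d in descriptors:
        hist[nearest_index(d, vocab)] += 1
    return hist
```

For example, with `vocab = [[0, 0], [10, 10]]` and descriptors `[[1, 1], [9, 9], [0, 2], [11, 10]]`, `build_histogram` returns `[2, 2]`. Real SIFT descriptors are 128-dimensional, but the logic is the same.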
Below are the accuracies for different vocab sizes, followed by the corresponding confusion matrices.
vocab size | accuracy |
---|---|
20 | 0.5507 |
50 | 0.6100 |
100 | 0.6480 |
200 | 0.6560 |
400 | 0.6630 |
confusion matrix for vocab size 20
confusion matrix for vocab size 50
confusion matrix for vocab size 100
confusion matrix for vocab size 200
confusion matrix for vocab size 400
login: wchi