Oh La La~~~ Scene Recognition


1. Intro
This assignment is about scene recognition using bag-of-words features. Bag of words is a common technique in natural language processing and, perhaps surprisingly, is also quite popular in image classification, of which scene recognition is one case.

2. Building visual words vocabulary
When extracting SIFT descriptors for each image, I randomly kept one of every two adjacent descriptors to reduce the number of descriptors that have to be clustered into the vocabulary. I iterated over the entire training image set and then used all the sampled SIFT descriptors to build the vocabulary. For cross-validation, the vocabulary size is set to 400. (I tried 800, but it took far too long, so I had to give up.)
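The sampling and vocabulary steps can be sketched roughly as below. This is only an illustration under assumptions: the write-up does not say which clustering method was used, so plain k-means is assumed, and the function names `subsample_descriptors` and `build_vocabulary`, the iteration count, and the random stand-in data are all hypothetical. The demo uses a tiny vocabulary rather than 400 so it runs quickly.

```python
import numpy as np

def subsample_descriptors(descriptors, group=2, rng=None):
    """Keep one randomly chosen descriptor out of every `group` adjacent rows."""
    rng = np.random.default_rng() if rng is None else rng
    descriptors = np.asarray(descriptors)
    n_groups = len(descriptors) // group
    # One random offset inside each block of `group` adjacent descriptors.
    offsets = rng.integers(0, group, size=n_groups)
    picks = np.arange(n_groups) * group + offsets
    return descriptors[picks]

def build_vocabulary(descriptors, vocab_size, n_iter=10, rng=None):
    """Assumed clustering step: plain k-means (Lloyd's algorithm)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(descriptors, dtype=np.float64)
    # Initialise the centers from distinct, randomly chosen descriptors.
    centers = x[rng.choice(len(x), vocab_size, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distances, shape (n_descriptors, vocab_size).
        d2 = (x ** 2).sum(1)[:, None] - 2.0 * x @ centers.T + (centers ** 2).sum(1)[None, :]
        labels = d2.argmin(axis=1)
        for k in range(vocab_size):
            members = x[labels == k]
            if len(members):  # leave a center unchanged if its cluster is empty
                centers[k] = members.mean(axis=0)
    return centers

# Random stand-ins for the 128-dimensional SIFT descriptors of four images.
rng = np.random.default_rng(0)
per_image = [rng.random((50, 128)) for _ in range(4)]
sampled = np.vstack([subsample_descriptors(d, group=2, rng=rng) for d in per_image])
vocab = build_vocabulary(sampled, vocab_size=8, rng=rng)
```

Halving the descriptors per image halves the number of rows fed to the clustering step, and each k-means iteration costs roughly (number of descriptors) x (vocabulary size) distance computations, which is consistent with a vocabulary of 800 taking noticeably longer than 400.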

3. Making histogram and dividing the training set
After obtaining the vocabulary, I computed the histogram for each training image. I then divided the training set into 10 subsets of 100 images each. Each image appears at most once across the 10 subsets. I didn't divide the testing set in the same way, though, because doing that seems rather unintuitive.
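A rough sketch of the histogram and the subset split follows. The names `bow_histogram` and `split_into_subsets` are hypothetical, normalising the histogram to sum to 1 is an assumption (the write-up does not say whether the counts were normalised), and the training-set size of 1500 in the demo is only a placeholder.

```python
import numpy as np

def bow_histogram(descriptors, vocab):
    """Count how many descriptors fall nearest to each visual word."""
    x = np.asarray(descriptors, dtype=np.float64)
    c = np.asarray(vocab, dtype=np.float64)
    # Squared Euclidean distances, shape (n_descriptors, vocab_size).
    d2 = (x ** 2).sum(1)[:, None] - 2.0 * x @ c.T + (c ** 2).sum(1)[None, :]
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(c)).astype(np.float64)
    # Assumption: normalise so images with different descriptor counts are comparable.
    return hist / max(hist.sum(), 1.0)

def split_into_subsets(n_images, n_subsets=10, subset_size=100, rng=None):
    """Return an (n_subsets, subset_size) array of image indices with no repeats."""
    rng = np.random.default_rng() if rng is None else rng
    needed = n_subsets * subset_size
    if needed > n_images:
        raise ValueError("not enough images for disjoint subsets")
    # A shuffled order, truncated, so each image is used at most once.
    return rng.permutation(n_images)[:needed].reshape(n_subsets, subset_size)

# Tiny example: a 3-word vocabulary and four 3-dimensional "descriptors".
toy_vocab = np.eye(3)
toy_desc = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
toy_hist = bow_histogram(toy_desc, toy_vocab)

# Placeholder training-set size of 1500 images.
subsets = split_into_subsets(1500, rng=np.random.default_rng(0))
```

Drawing the subsets from a single shuffled index list is one simple way to guarantee that no image lands in two subsets; images beyond the first 1000 shuffled indices are simply left out.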

4. Result
The first experiment was done without cross-validation. The vocabulary size was set to 200, and for each image one SIFT descriptor was randomly chosen from every three adjacent descriptors. The accuracy is 0.6100.


Then I increased the vocabulary size to 400 and raised the sampling rate to one out of every two adjacent SIFT descriptors per image. For this run I used the cross-validation method discussed in the previous section.


Tan "Charles" Zhang