Scene Recognition

by Eli Bosworth (eboswort)


First I gathered a large set of SIFT descriptors from an assortment of images and clustered them with k-means to build a vocabulary of 200 visual words to use to classify images.
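The clustering step above could be sketched as follows. This is a minimal illustration and not the code used for the project: the descriptors are random 8-D stand-ins (real SIFT descriptors are 128-D), the vocabulary has 10 words instead of 200 to keep the run short, and the names `kmeans`, `nearest_index`, and `squared_distance` are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns k cluster centers (the visual words)."""
    rng = random.Random(seed)
    # Start from k distinct descriptors picked at random.
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        # Assignment step: group every descriptor under its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_index(p, centers)].append(p)
        # Update step: move each center to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                dim = len(members[0])
                centers[i] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return centers


# Assumption: random vectors stand in for descriptors extracted from images.
rng = random.Random(1)
descriptors = [[rng.random() for _ in range(8)] for _ in range(300)]
vocab = kmeans(descriptors, k=10)
print(len(vocab), len(vocab[0]))
```

Each returned center plays the role of one visual word, so a 200-word vocabulary would correspond to `k=200` run over real descriptors.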

Next I gathered the SIFT descriptors from each training image and generated a histogram of the occurrences of the 200 visual words. These histograms were used to train an SVM to sort images into the categories defined by the training images.
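A rough sketch of the histogram step, under the assumption that each descriptor is counted toward its nearest visual word by Euclidean distance. The tiny 2-D vocabulary and descriptors are made up for illustration, `word_histogram` is a hypothetical name, and the SVM training itself is not shown.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def word_histogram(descriptors, vocab):
    """Count how many descriptors fall nearest to each visual word."""
    counts = [0] * len(vocab)
    for d in descriptors:
        nearest = min(range(len(vocab)), key=lambda i: squared_distance(d, vocab[i]))
        counts[nearest] += 1
    return counts


# Assumption: toy 2-D data; a real vocabulary would hold 200 SIFT-sized centers.
vocab = [[0, 0], [10, 10], [0, 10]]
image_descriptors = [[1, 1], [9, 9], [0, 1], [1, 9]]
histogram = word_histogram(image_descriptors, vocab)
print(histogram)  # [2, 1, 1]
```

Every image then becomes one fixed-length vector (one count per word), which is the form a classifier such as an SVM expects regardless of how many descriptors the image produced.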

Then I ran my SVM on the test images and checked how many it classified correctly.
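The accuracy figure is simply the fraction of test images whose predicted category matches the true one. A minimal sketch, with hypothetical category names and a hypothetical `accuracy` helper:

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one test image")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Assumption: made-up labels, only to show the calculation.
predicted = ["kitchen", "forest", "street", "forest"]
actual = ["kitchen", "forest", "street", "coast"]
print(f"{accuracy(predicted, actual):.2%}")  # 75.00%
```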

I tried an assortment of different vocab sizes to see how they affected accuracy. Accuracy grew with larger vocabs, but the gains tapered off as the vocabs got bigger.

Vocab size of 10, accuracy 42.87%

Vocab size of 50, accuracy 58.00%

Vocab size of 200, accuracy 63.53%

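To put a number on how the gains taper off, the distinct results reported above can be turned into accuracy gained per added visual word. This is only arithmetic on those three figures; `marginal_gains` is a hypothetical helper.

```python
# (vocab size, accuracy in percent), taken from the results listed above.
results = [(10, 42.87), (50, 58.00), (200, 63.53)]


def marginal_gains(results):
    """Accuracy points gained per added word between consecutive vocab sizes."""
    gains = []
    for (size_a, acc_a), (size_b, acc_b) in zip(results, results[1:]):
        gains.append((size_a, size_b, (acc_b - acc_a) / (size_b - size_a)))
    return gains


for size_a, size_b, gain in marginal_gains(results):
    print(f"{size_a} -> {size_b} words: {gain:.3f} points per added word")
```

Going from 10 to 50 words adds about 0.38 percentage points per word, while going from 50 to 200 adds about 0.04, roughly a tenfold drop in return.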