CS 143: Scene recognition with bag of words
This project builds a classifier that assigns each image to one of 15 scene categories. Classification relies on a fixed set of image features, called words, which are searched for in every image. A bag of words model represents an image as a histogram in which each bin counts the occurrences of one particular word. The model only counts word occurrences and stores no information about the relative spatial locations of the visual words, yet it still performs surprisingly well. The classifier was trained on 15 scene categories with 100 images each (1500 images total).
Algorithm
The first step is to create the vocabulary of words. The words were built from SIFT features 4x4 pixels wide, densely extracted every 8 pixels. I first randomly sampled 400 SIFT features from each image, then ran k-means (with k = 200) on the sampled features pooled across all images. The 200 cluster centers found by k-means became the words in the vocabulary.
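The vocabulary step can be sketched in NumPy as follows. This is a minimal illustration, not the project's code: it assumes the dense SIFT descriptors have already been extracted into one `(n_i, d)` array per image, and the function names `squared_distances`, `kmeans`, and `build_vocabulary` are hypothetical. It samples up to 400 descriptors per image, pools them, and runs plain Lloyd's k-means with k = 200; the resulting centers are the words.

```python
import numpy as np


def squared_distances(a, b):
    """Pairwise squared Euclidean distances between rows of a (n, d) and b (k, d)."""
    return (np.sum(a ** 2, axis=1)[:, None]
            - 2.0 * a @ b.T
            + np.sum(b ** 2, axis=1)[None, :])


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means; returns a (k, d) array of cluster centers."""
    rng = np.random.default_rng(seed)
    # Start from k distinct randomly chosen points.
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = squared_distances(points, centers).argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its previous center
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break  # assignments have stabilised
        centers = new_centers
    return centers


def build_vocabulary(descriptors_per_image, samples_per_image=400, k=200, seed=0):
    """Sample descriptors from every image, pool them, and cluster the pool.

    descriptors_per_image: list of (n_i, d) arrays, one per training image
    (hypothetical input format; dense SIFT extraction is assumed to be done).
    """
    rng = np.random.default_rng(seed)
    sampled = []
    for desc in descriptors_per_image:
        n = min(samples_per_image, len(desc))
        sampled.append(desc[rng.choice(len(desc), size=n, replace=False)])
    pool = np.vstack(sampled).astype(float)
    return kmeans(pool, k, seed=seed)  # the cluster centers are the visual words
```

With 1500 images the pool holds 600,000 descriptors, so the `(600000, 200)` distance matrix is large; a library implementation such as scikit-learn's `KMeans` would likely be more practical at full scale.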
The next step was to describe the training images of each scene category in terms of the vocabulary. For each image in the training set, the dense SIFT features were computed again in exactly the same way, and each SIFT feature was assigned to its nearest neighbor among the vocabulary words. I then built a histogram per image that counts the number of occurrences of each word in that image.
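A minimal NumPy sketch of the nearest-word assignment and histogram, assuming one image's descriptors arrive as an `(n, d)` array and the vocabulary as a `(k, d)` array of cluster centers (the function name is hypothetical):

```python
import numpy as np


def bag_of_words_histogram(descriptors, vocab):
    """Count how many of an image's descriptors fall nearest to each visual word.

    descriptors: (n, d) array of one image's dense SIFT descriptors
    vocab:       (k, d) array of cluster centers from the vocabulary step
    """
    descriptors = np.asarray(descriptors, dtype=float)
    vocab = np.asarray(vocab, dtype=float)
    # Squared Euclidean distance from every descriptor to every word.
    d2 = (np.sum(descriptors ** 2, axis=1)[:, None]
          - 2.0 * descriptors @ vocab.T
          + np.sum(vocab ** 2, axis=1)[None, :])
    nearest_word = d2.argmin(axis=1)  # index of the nearest word per descriptor
    # One bin per word; minlength keeps empty bins for words that never occur.
    return np.bincount(nearest_word, minlength=len(vocab))
```

The histogram ignores where each descriptor came from, which is exactly the loss of spatial information described earlier. The text does not say whether the histograms were normalized; dividing by the total count is a common choice when images yield different numbers of descriptors.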
All 1500 histograms were used as the input to a set of 15 one-vs-all SVMs. I used the provided primal SVM code by Olivier Chapelle with the parameters linear = 1 and lambda = 0.1.
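Chapelle's primal SVM code is not reproduced here. As a stand-in, the sketch below trains each binary linear classifier by full-batch gradient descent on an L2-regularized squared hinge loss, then combines 15 such classifiers one-vs-all by taking the highest score. The names `train_linear_svm`, `train_one_vs_all`, and `predict`, the step size, and the epoch count are assumptions; `lam` plays a role similar to lambda = 0.1, but the objective's scaling may not match Chapelle's.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.1, lr=0.1, epochs=500):
    """Binary linear SVM with labels y in {-1, +1}.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))^2)
    by plain gradient descent (a stand-in, not Chapelle's Newton solver).
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        slack = np.maximum(0.0, 1.0 - margins)  # zero for points beyond the margin
        # Gradients of the regularizer plus the mean squared hinge loss.
        grad_w = lam * w - (2.0 / n) * (X.T @ (slack * y))
        grad_b = -(2.0 / n) * np.sum(slack * y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def train_one_vs_all(X, labels, num_classes, lam=0.1):
    """Train one binary SVM per class: that class (+1) against all others (-1)."""
    W = np.zeros((num_classes, X.shape[1]))
    B = np.zeros(num_classes)
    for c in range(num_classes):
        y = np.where(labels == c, 1.0, -1.0)
        W[c], B[c] = train_linear_svm(X, y, lam=lam)
    return W, B


def predict(X, W, B):
    """Assign each row of X to the class whose SVM gives the highest score."""
    return np.argmax(X @ W.T + B, axis=1)
```

Raw word counts can be in the thousands, so with a fixed step size like this the histograms would need to be normalized (or the step size reduced) for gradient descent to stay stable.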
Results
The bag of words classifier was then tested on almost 3000 images from the same 15 categories. The results are visualized in the confusion matrix below, which shows how often images of each category were assigned to each category. Correct classifications brighten the diagonal, while misclassifications brighten the off-diagonal entries.
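A row-normalized confusion matrix like the one described can be computed as below. This is a sketch with a hypothetical function name; it assumes integer class labels and puts the true class on the rows and the predicted class on the columns, so each row sums to 1 and a perfect classifier would give a white diagonal when shown as a grayscale image.

```python
import numpy as np


def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Row-normalized confusion matrix: entry [t, p] is the fraction of
    class-t test images that were predicted as class p."""
    cm = np.zeros((num_classes, num_classes))
    for t, p in zip(true_labels, predicted_labels):
        cm[t, p] += 1
    row_sums = cm.sum(axis=1, keepdims=True)
    # Guard against classes with no test images (avoids division by zero).
    return cm / np.maximum(row_sums, 1)
```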
The confusion matrix
My classifier has an overall accuracy of 61%, compared with a chance level of about 7% (1/15). It had particular trouble with the living room images, which were mostly classified as office, bedroom, or kitchen instead.
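Given a row-normalized confusion matrix, the accuracy, the chance level, and the most common confusions for one category can be read off as sketched below. The helper names are hypothetical, and the mean of the diagonal equals the overall accuracy only when every category has the same number of test images, which the text does not state.

```python
import numpy as np


def mean_per_class_accuracy(cm):
    """Average of the diagonal of a row-normalized confusion matrix."""
    return float(np.mean(np.diag(cm)))


def chance_level(num_classes):
    """Accuracy of uniform random guessing, e.g. 1/15 for 15 scene categories."""
    return 1.0 / num_classes


def top_confusions(cm, class_names, true_class, k=3):
    """The k categories that images of true_class were most often mistaken for."""
    i = class_names.index(true_class)
    row = cm[i].astype(float).copy()
    row[i] = -np.inf  # ignore correct predictions
    order = np.argsort(row)[::-1][:k]
    return [(class_names[j], float(cm[i, j])) for j in order]
```

Applied to the project's matrix, `top_confusions(cm, names, "living room")` would list the categories that absorbed most of the misclassified living room images.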