Project 3: Scene recognition with bag of words

Jihoon Lee

Overview

The goal of this project was to develop an image classifier using a visual vocabulary. The project consisted of visual vocabulary building, histogram counting, classifier training, and classifier evaluation. The data set used was the 15-scene database introduced in Lazebnik et al. 2006.
The results showed that the classifier performed best (about 62% accuracy) with a vocabulary of 200 visual words; accuracy dropped slightly for the larger vocabularies of 400 and 1000 words.

Algorithms

Visual Vocabulary building

To build the visual vocabulary, I collected dense SIFT features from the images and clustered them using k-means. For each image, the dense features were extracted with the vl_dsift function using a bin size of 4 and a step size of 8. Once the descriptors had been collected from all images, they were clustered with the k-means algorithm; the resulting cluster centers form the visual vocabulary.
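The clustering step can be sketched as follows. This is a minimal illustration in plain Python, not the code used for the project: the function names kmeans and squared_distance, the fixed iteration count, the random seed, and the 2-D toy descriptors are all assumptions made for the example (real SIFT descriptors are 128-dimensional and would come from vl_dsift).

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(descriptors, k, iterations=20, seed=0):
    """Cluster descriptors into k centers with Lloyd's algorithm (hypothetical helper)."""
    rng = random.Random(seed)
    # Initialize the centers with k distinct descriptors picked at random.
    centers = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iterations):
        # Assignment step: attach every descriptor to its nearest center.
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            nearest = min(range(k), key=lambda i: squared_distance(d, centers[i]))
            clusters[nearest].append(d)
        # Update step: move each center to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                dim = len(members[0])
                centers[i] = [sum(m[j] for m in members) / len(members)
                              for j in range(dim)]
    return centers

# Toy 2-D "descriptors" forming two well-separated groups (illustrative data only).
descriptors = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
               [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
vocab = kmeans(descriptors, k=2)
print(sorted(vocab))
```

In the project, k corresponds to the vocabulary size (10 to 1000) and the input is the pool of descriptors gathered from all images.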

Histogram Counting

The next step is histogram counting on the training images. First, dense features are collected from each training image, and the distance between each feature and every word in the visual vocabulary is computed. Each feature is then assigned to its nearest visual word, and the occurrences of each visual word are counted per image.
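A minimal sketch of the counting step is shown below, assuming nearest-neighbor assignment with squared Euclidean distance. The name build_histogram, the optional normalize flag, and the toy vocabulary are assumptions for illustration; the report does not say whether the histograms were normalized.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_histogram(descriptors, vocab, normalize=False):
    """Count how often each visual word is the nearest one to a descriptor."""
    counts = [0] * len(vocab)
    for d in descriptors:
        nearest = min(range(len(vocab)),
                      key=lambda i: squared_distance(d, vocab[i]))
        counts[nearest] += 1
    if normalize and descriptors:
        # Optional: divide by the number of descriptors so that images with
        # different feature counts become comparable (an assumption, see above).
        return [c / len(descriptors) for c in counts]
    return counts

# Toy vocabulary with two visual words and five descriptors from one image.
toy_vocab = [[0.0, 0.0], [10.0, 10.0]]
image_descriptors = [[1.0, 0.0], [9.0, 9.0], [0.0, 1.0],
                     [10.0, 11.0], [0.5, 0.5]]
histogram = build_histogram(image_descriptors, toy_vocab)
print(histogram)
```

Each image then ends up with one vector whose length equals the vocabulary size, which is the input the classifier is trained on.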

SVM training

Once the histograms of the training images are ready, they are fed to an SVM to train the classifier.
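The report does not state which SVM formulation or solver was used, so the following is only a sketch under stated assumptions: a linear SVM trained by full-batch subgradient descent on the L2-regularized hinge loss, extended to several classes with a one-vs-rest scheme. The function names, the hyperparameters (lam, lr, iterations), and the toy histograms are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train_binary_svm(features, labels, lam=0.01, lr=0.1, iterations=1000):
    """Linear SVM via subgradient descent on the regularized hinge loss.

    labels must be +1 or -1. Returns the weight vector and the bias.
    """
    n, dim = len(features), len(features[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(iterations):
        grad_w = [lam * wi for wi in w]  # gradient of (lam / 2) * ||w||^2
        grad_b = 0.0
        for x, y in zip(features, labels):
            if y * (dot(w, x) + b) < 1.0:  # sample violates the margin
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def train_one_vs_rest(features, labels):
    """Train one binary SVM per class (that class against all others)."""
    return {c: train_binary_svm(features, [1 if l == c else -1 for l in labels])
            for c in sorted(set(labels))}

def predict(models, x):
    """Return the class whose SVM gives the highest score for x."""
    return max(models, key=lambda c: dot(models[c][0], x) + models[c][1])

# Toy 3-word histograms for three scene categories (illustrative data only).
train_hists = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1],
               [0.1, 0.8, 0.1], [0.2, 0.7, 0.1],
               [0.1, 0.1, 0.8], [0.1, 0.2, 0.7]]
train_labels = ["suburb", "suburb", "highway", "highway",
                "industrial", "industrial"]
models = train_one_vs_rest(train_hists, train_labels)
print(predict(models, [0.9, 0.05, 0.05]))
```

One-vs-rest is a common way to apply binary SVMs to a 15-category problem, but other schemes (for example one-vs-one) or a nonlinear kernel may have been used instead.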

Evaluation

The classifier's performance is evaluated on the test image set, and a confusion matrix is generated. The diagonal values are the numbers of correctly classified images; the off-diagonal values are misclassifications.
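As an illustration of the evaluation step, the sketch below builds a confusion matrix and derives the accuracy from its diagonal. The row/column convention (rows are true classes, columns are predicted classes) is an assumption, as are the function names and the toy labels.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """matrix[i][j] = number of images of class i that were predicted as class j."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Fraction of correct predictions: diagonal sum divided by the total count."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Toy example with three categories and six test images (illustrative data only).
classes = ["suburb", "highway", "industrial"]
true_labels = ["suburb", "suburb", "highway", "highway",
               "industrial", "industrial"]
predicted = ["suburb", "suburb", "highway", "industrial",
             "highway", "industrial"]
matrix = confusion_matrix(true_labels, predicted, classes)
for name, row in zip(classes, matrix):
    print(name, row)
print("accuracy:", accuracy(matrix))
```

Reading a row shows which categories a given true class is confused with, which is how a weak category such as industrial can be spotted.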

Result

The classifier trained on 1500 images with a vocabulary of 200 visual words reaches an accuracy of around 62%. It classified suburb, inside city, and highway well but was not able to classify industrial.

Accuracy

Num of Vocab   10      20      50      100     200     400     1000
Accuracy       0.4147  0.5180  0.5847  0.6107  0.6227  0.6173  0.6180

Confusion Matrix

[Images: confusion matrices for Num of Vocab 10, 20, 50, 100]
[Images: confusion matrices for Num of Vocab 200, 400, 1000]