Onur Ulusel Asgn3

CS143 - Introduction to Computer Vision

Onur Ulusel (oulusel)
Oct 26, 2011

Assignment Description

The aim of this assignment is to implement a bag-of-words model for scene categorization, following the baseline algorithm presented in Lazebnik et al. 2006.

Implementation

The algorithm was implemented entirely in MATLAB. The data set covers 15 scene categories with 200 images per category (100 for training and 100 for testing). First, local features are collected from the training set and clustered with k-means into vocabularies of varying sizes. Each training image is then represented as a distribution of visual words using histogram encoding.
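The two steps above (clustering features into a vocabulary, then histogram encoding) can be sketched as follows. This is a minimal illustration in plain Python, not the MATLAB code used for the assignment. The function names, the fixed iteration count, the random initialization, and the normalization of the histogram to sum to 1 are all assumptions made for the example.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_center(feature, centers):
    """Index of the cluster center (visual word) closest to the feature."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(feature, centers[i]))

def build_vocabulary(features, vocab_size, iterations=20, seed=0):
    """Plain k-means: returns vocab_size cluster centers (hypothetical helper)."""
    rng = random.Random(seed)
    # Assumption: initialize centers from randomly chosen training features.
    centers = [list(f) for f in rng.sample(features, vocab_size)]
    for _ in range(iterations):
        clusters = [[] for _ in range(vocab_size)]
        for f in features:
            clusters[nearest_center(f, centers)].append(f)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                dim = len(members[0])
                centers[i] = [sum(m[d] for m in members) / len(members)
                              for d in range(dim)]
    return centers

def encode_histogram(image_features, vocabulary):
    """Count the nearest visual word of each feature, normalized to sum to 1."""
    counts = [0] * len(vocabulary)
    for f in image_features:
        counts[nearest_center(f, vocabulary)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)

# Toy usage with a hand-written two-word vocabulary and 2-D "features".
toy_vocabulary = [[0.0, 0.0], [5.0, 5.0]]
toy_image = [(0.0, 0.1), (0.2, 0.0), (4.9, 5.0)]
toy_histogram = encode_histogram(toy_image, toy_vocabulary)
# Two of the three features fall in bin 0, one in bin 1.
```

Normalizing the counts makes images with different numbers of features comparable, which matches the description of each image as a distribution of visual words.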

The remaining steps were provided in the stencil code. A 1-vs-all classifier is trained for each scene category. Each test image is then histogram encoded and evaluated against the classifier of every scene category.
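The 1-vs-all scheme can be illustrated with a short sketch. The report does not say which classifier the stencil code trains, so the example below uses a simple perceptron purely as a stand-in. All function names and parameters are hypothetical, and choosing the category with the highest score is an assumed decision rule.

```python
def linear_score(weights, bias, histogram):
    """Score of one linear classifier on one histogram."""
    return sum(w * x for w, x in zip(weights, histogram)) + bias

def train_binary(histograms, targets, epochs=50, learning_rate=0.1):
    """Perceptron stand-in for a binary classifier; targets are +1 or -1."""
    weights = [0.0] * len(histograms[0])
    bias = 0.0
    for _ in range(epochs):
        for h, t in zip(histograms, targets):
            if t * linear_score(weights, bias, h) <= 0:  # misclassified
                weights = [w + learning_rate * t * x
                           for w, x in zip(weights, h)]
                bias += learning_rate * t
    return weights, bias

def train_one_vs_all(histograms, labels):
    """One binary classifier per category: that category vs. all others."""
    classifiers = {}
    for category in sorted(set(labels)):
        targets = [1 if label == category else -1 for label in labels]
        classifiers[category] = train_binary(histograms, targets)
    return classifiers

def classify(histogram, classifiers):
    """Assumed decision rule: pick the category whose classifier scores highest."""
    return max(classifiers,
               key=lambda c: linear_score(*classifiers[c], histogram))

# Toy usage: three categories, three-word vocabulary.
train_histograms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
train_labels = ["a", "b", "c"]
toy_classifiers = train_one_vs_all(train_histograms, train_labels)
prediction = classify([0.8, 0.1, 0.1], toy_classifiers)
```

With 15 scene categories this scheme trains 15 binary classifiers, and every test histogram is scored by all of them.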

Results

One of the parameters affecting classification accuracy is the vocabulary size. The following table and graph show the accuracy obtained for different vocabulary sizes. The y-axis of the given graph is displayed in logarithmic scale.

Vocabulary Size    Accuracy
10                 0.4307
20                 0.5107
50                 0.5707
100                0.6093
200                0.6293
400                0.6247
1000               0.6060
10000              0.5707



Accuracy is highest at a vocabulary size of 200 and drops when the size is either increased or decreased. Smaller vocabularies likely oversimplify the features: distinct feature groups are merged into the same cluster, so there is no separate mean (cluster center) for each group. Larger vocabularies likely split the features too finely and produce too many clusters, so too few features accumulate in each bin during histogram encoding.
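The sparsity argument for large vocabularies can be made concrete with a little arithmetic. The report does not give the number of local features extracted per image, so the value of 500 used below is purely hypothetical. Only the vocabulary sizes come from the table above.

```python
# Vocabulary sizes taken from the results table.
VOCAB_SIZES = [10, 20, 50, 100, 200, 400, 1000, 10000]

# Hypothetical value: the report does not state the features per image.
ASSUMED_FEATURES_PER_IMAGE = 500

def mean_features_per_bin(features_per_image, vocab_size):
    """Average number of features landing in each histogram bin."""
    return features_per_image / vocab_size

def max_nonempty_fraction(features_per_image, vocab_size):
    """Upper bound on the fraction of bins that can be non-empty."""
    return min(features_per_image, vocab_size) / vocab_size

for size in VOCAB_SIZES:
    mean = mean_features_per_bin(ASSUMED_FEATURES_PER_IMAGE, size)
    fraction = max_nonempty_fraction(ASSUMED_FEATURES_PER_IMAGE, size)
    print(f"{size:>6} words: {mean:8.2f} features per bin, "
          f"at most {fraction:.0%} of bins non-empty")
```

Under this assumption, a 10000-word vocabulary leaves at least 95% of each image's bins empty, so two images of the same scene may share few non-empty bins.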

The best accuracy (0.6293) was obtained with a vocabulary size of 200. The confusion matrix from that run is shown below.
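For reference, a confusion matrix and the accuracy derived from it can be computed as in the sketch below. The function names and the toy labels are hypothetical. Taking accuracy as the fraction of test images on the diagonal is an assumption about how the reported figures were obtained; with 100 test images in every category it coincides with the mean per-category accuracy.

```python
def confusion_matrix(true_labels, predicted_labels, categories):
    """matrix[i][j] counts images of category i that were predicted as j."""
    index = {category: i for i, category in enumerate(categories)}
    matrix = [[0] * len(categories) for _ in categories]
    for truth, prediction in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[prediction]] += 1
    return matrix

def accuracy(matrix):
    """Fraction of all images that lie on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0

# Toy usage with two made-up categories and four test images.
toy_categories = ["kitchen", "forest"]
toy_truth = ["kitchen", "kitchen", "forest", "forest"]
toy_predicted = ["kitchen", "forest", "forest", "forest"]
toy_matrix = confusion_matrix(toy_truth, toy_predicted, toy_categories)
toy_accuracy = accuracy(toy_matrix)
```

Off-diagonal entries show which pairs of scene categories are confused with each other, which the single accuracy figure hides.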