CS 143 - Project 3: Scene recognition with bag of words

Arman Uguray (auguray)

In this project, we implemented a scene classifier using the bag-of-words method. The algorithm can be broken down into the following steps:

  1. Given a collection of 1500 training images, collect many local features from them and use k-means clustering to group those features into a vocabulary of visual words.
  2. For each training image, build a histogram of visual-word frequencies: assign each local feature in the image to the cluster center (visual word) with the smallest Euclidean distance, and count the assignments.
  3. Train an SVM on these histograms.
  4. Build the same kind of histogram for each test image and classify it with the trained SVM.
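Steps 1 and 2 can be sketched as follows. This is a minimal, illustrative sketch, not the implementation used for this report: it assumes each local feature is already a plain list of floats, uses basic k-means (Lloyd's algorithm) with random initialization and a fixed number of iterations, and normalizes each histogram so that images with different numbers of features are comparable (the normalization is an assumption; the report does not mention it). The function names `build_vocabulary`, `nearest_word`, and `bag_of_words` are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(feature, vocab):
    """Index of the visual word (cluster center) closest to `feature`.

    Minimizing the squared distance picks the same center as minimizing
    the Euclidean distance, so the square root is skipped.
    """
    return min(range(len(vocab)), key=lambda k: squared_distance(feature, vocab[k]))


def build_vocabulary(features, vocab_size, iterations=20, seed=0):
    """Plain k-means over a list of feature vectors; returns the centers.

    Assumption: a fixed iteration count instead of a convergence test.
    """
    rng = random.Random(seed)
    # Initialize the centers with randomly chosen features.
    centers = [list(f) for f in rng.sample(features, vocab_size)]
    for _ in range(iterations):
        # Assignment step: group features by their nearest center.
        members = [[] for _ in range(vocab_size)]
        for f in features:
            members[nearest_word(f, centers)].append(f)
        # Update step: move each center to the mean of its members.
        for k, group in enumerate(members):
            if group:  # keep the old center if nothing was assigned to it
                centers[k] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def bag_of_words(image_features, vocab):
    """Normalized histogram of visual-word frequencies for one image."""
    hist = [0.0] * len(vocab)
    for f in image_features:
        hist[nearest_word(f, vocab)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

Pure Python loops keep the example self-contained; clustering a realistic number of descriptors would call for a vectorized library instead.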

In my implementation of the algorithm, I discard most of the collected features while building the vocabulary to speed up the clustering: instead of using all of the ~1000 features collected from each image, I initially used a randomly picked set of 50 per image. I also tried running the algorithm with 500 and 1000 random samples per image to see how this affects accuracy.
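The per-image subsampling described above can be sketched like this. It is an illustrative sketch under stated assumptions: features are assumed to be stored as one list of feature vectors per training image, and the function name `sample_features` and its `per_image` and `seed` parameters are hypothetical; `per_image` plays the role of the "50 / 500 / 1000 random features per image" setting.

```python
import random


def sample_features(features_per_image, per_image=50, seed=0):
    """Pool a random subset of each image's local features for k-means.

    `features_per_image` has one entry per training image, each entry being
    that image's list of feature vectors. Features are drawn without
    replacement; an image with fewer than `per_image` features contributes
    all of them.
    """
    rng = random.Random(seed)  # fixed seed so the sampled pool is reproducible
    pooled = []
    for feats in features_per_image:
        pooled.extend(rng.sample(feats, min(per_image, len(feats))))
    return pooled
```

With roughly 1000 features per image, keeping 50 shrinks the clustering input about 20-fold, while keeping 500 shrinks it only about 2-fold.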

The default vocabulary size is 200 words, and I used this size for the initial run. I also tried vocabulary sizes of 20 and 1000 to see how they affect accuracy and running time. Because execution was slow, I ran the algorithm only once for each parameter value.

Here is the set of runs I executed. Apart from the run with a 20-word vocabulary, the results were very similar, and no parameter change gave a substantial improvement.

Runs:

  1. Initial run with the default vocabulary size and 50 random features used per image.

    [Figure: confusion matrix for run 1]

    Vocabulary size | Random features per image | Accuracy
    200             | 50                        | 0.6267
  2. This time, I kept the same vocabulary size but sampled 10 times as many features per image as in the previous run. Accuracy did not improve; it dropped slightly, from 0.6267 to 0.6193.

    [Figure: confusion matrix for run 2]

    Vocabulary size | Random features per image | Accuracy
    200             | 500                       | 0.6193
  3. In this run, I kept the initial number of random features and increased the vocabulary size to 1000. Accuracy improved over the initial run, but only slightly (0.6420 vs. 0.6267).

    [Figure: confusion matrix for run 3]

    Vocabulary size | Random features per image | Accuracy
    1000            | 50                        | 0.6420
  4. In this run, I reduced the vocabulary size to 20. This is a very small vocabulary, and as expected, accuracy dropped noticeably, to 0.5513.

    [Figure: confusion matrix for run 4]

    Vocabulary size | Random features per image | Accuracy
    20              | 50                        | 0.5513
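Each accuracy figure above summarizes a confusion matrix. The report does not say exactly how accuracy was computed; a common convention, assumed in this sketch, is to row-normalize the confusion matrix and average its diagonal (mean per-category recall). The function names and the two-category data below are hypothetical and only illustrate the arithmetic.

```python
def confusion_matrix(actual_labels, predicted_labels, categories):
    """Row-normalized confusion matrix: entry [i][j] is the fraction of
    images of category i that were classified as category j."""
    index = {c: i for i, c in enumerate(categories)}
    n = len(categories)
    counts = [[0] * n for _ in range(n)]
    for actual, pred in zip(actual_labels, predicted_labels):
        counts[index[actual]][index[pred]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def mean_accuracy(matrix):
    """Average of the diagonal, i.e. mean per-category recall."""
    return sum(matrix[i][i] for i in range(len(matrix))) / len(matrix)


# Illustrative data only, not results from this report.
categories = ["kitchen", "forest"]
truth = ["kitchen"] * 4 + ["forest"] * 4
guesses = ["kitchen", "kitchen", "kitchen", "forest",
           "forest", "forest", "kitchen", "kitchen"]
m = confusion_matrix(truth, guesses, categories)
print(m)                 # [[0.75, 0.25], [0.5, 0.5]]
print(mean_accuracy(m))  # 0.625
```

When every category has the same number of test images, this average equals the plain fraction of correctly classified images (here 5 of 8, or 0.625).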