Project 3: Bag of words
Angela Santin

Introduction

Summary

The general flow of this project is as follows:

Notes about my algorithm

My project uses the standard image categories, shown in the image below.



Here is a short description of my algorithm:

Results

I kept the vocabulary size constant at 200 words. I started with 20 images per class and used all of the SIFT descriptors to build the vocabulary with k-means. The results weren't great and the runtime was excessive. Next, I tried sampling 1 out of every 50 SIFT descriptors per image, which reduced the runtime considerably without affecting performance. A sketch of this vocabulary-building step is shown below, followed by the results obtained with 20, 50, and 100 images per class.
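Since the report does not include the code itself, the following is a minimal sketch of the vocabulary-building step described above, assuming OpenCV for SIFT and scikit-learn for k-means; the function and variable names (build_vocabulary, image_paths) are illustrative rather than the project's actual code.

    import cv2
    import numpy as np
    from sklearn.cluster import KMeans

    def build_vocabulary(image_paths, vocab_size=200, sample_rate=50):
        """Cluster a subsample of SIFT descriptors into a visual-word vocabulary."""
        sift = cv2.SIFT_create()
        sampled = []
        for path in image_paths:
            img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
            _, descriptors = sift.detectAndCompute(img, None)
            if descriptors is None:
                continue
            # Keep 1 out of every `sample_rate` descriptors to keep k-means tractable.
            sampled.append(descriptors[::sample_rate])
        all_descriptors = np.vstack(sampled)
        kmeans = KMeans(n_clusters=vocab_size, n_init=10).fit(all_descriptors)
        return kmeans.cluster_centers_  # vocab_size x 128 array of visual words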



Vocabulary size: 200
Images per class: 20
Accuracy = 0.5433

Vocabulary size: 200
Images per class: 50
Accuracy = 0.6227

Vocabulary size: 200
Images per class: 100
Accuracy = 0.6147
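For reference, the stage that produces these accuracies encodes each image as a histogram of visual-word counts and classifies it with a linear SVM. Below is a hedged sketch of that stage, again assuming OpenCV and scikit-learn; encode_histograms and the use of LinearSVC are assumptions for illustration, not the project's exact implementation.

    import cv2
    import numpy as np
    from scipy.spatial.distance import cdist
    from sklearn.svm import LinearSVC

    def encode_histograms(image_paths, vocab):
        """Represent each image as a normalized histogram of visual-word counts."""
        sift = cv2.SIFT_create()
        vocab_size = vocab.shape[0]
        histograms = np.zeros((len(image_paths), vocab_size))
        for i, path in enumerate(image_paths):
            img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
            _, descriptors = sift.detectAndCompute(img, None)
            if descriptors is None:
                continue
            # Assign each descriptor to its nearest visual word and count occurrences.
            words = np.argmin(cdist(descriptors, vocab), axis=1)
            counts = np.bincount(words, minlength=vocab_size)
            histograms[i] = counts / counts.sum()
        return histograms

    # Hypothetical usage with a linear SVM (one-vs-rest in scikit-learn):
    # clf = LinearSVC(C=1.0).fit(encode_histograms(train_paths, vocab), train_labels)
    # accuracy = clf.score(encode_histograms(test_paths, vocab), test_labels)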

Graph comparison

Extra Credit

Experimenting with different vocabulary sizes

From the above tests, I picked a vocabulary size of 200 and 50 training images per class as my "ideal" settings. Next, I experimented with several different vocabulary sizes to measure the performance of my algorithm: 10, 20, 50, 100, 200, 400 and 1000. As mentioned in class, performance did improve with larger vocabularies, but so did the running time. The accuracy results are shown below, followed by a sketch of the sweep.

Vocabulary size: 10
Accuracy = 0.4047

Vocabulary size: 20
Accuracy = 0.5267

Vocabulary size: 50
Accuracy = 0.5847

Vocabulary size: 100
Accuracy = 0.6073

Vocabulary size: 200
Accuracy = 0.6147

Vocabulary size: 400
Accuracy = 0.6373

Vocabulary size: 1000
Accuracy = 0.6447
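The sweep itself is just a loop over vocabulary sizes. A rough sketch, reusing the build_vocabulary and encode_histograms helpers sketched earlier (again illustrative, not the actual project code):

    from sklearn.svm import LinearSVC

    def sweep_vocab_sizes(train_paths, train_labels, test_paths, test_labels,
                          sizes=(10, 20, 50, 100, 200, 400, 1000)):
        """Rebuild the vocabulary at each size and report test accuracy."""
        for vocab_size in sizes:
            vocab = build_vocabulary(train_paths, vocab_size=vocab_size, sample_rate=50)
            X_train = encode_histograms(train_paths, vocab)
            X_test = encode_histograms(test_paths, vocab)
            clf = LinearSVC(C=1.0).fit(X_train, train_labels)
            print(vocab_size, clf.score(X_test, test_labels))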

Experimenting with non-linear kernels

Most datasets are not linearly separable. Thanks to the kernel trick, we can map our dataset into a higher-dimensional space, where the data "magically" becomes linearly separable. A variety of kernels can be used, and the choice of kernel depends on the dataset. It seems like making the right choice is more of an art than a science: either you have a particular gut feeling, or you go for trial and error and hope you get lucky. I wrote up and tried the kernels specified below. If we let d = ||x - y|| denote the Euclidean distance between two feature vectors x and y:

Radial basis (Gaussian): K(x, y) = exp(-d^2 / (2 sigma^2))

Exponential kernel: K(x, y) = exp(-d / (2 sigma^2))

Cauchy kernel: K(x, y) = 1 / (1 + d^2 / sigma^2)
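These kernels are straightforward to compute from pairwise distances. A minimal sketch, assuming NumPy/SciPy and scikit-learn's support for precomputed kernel matrices (the function names are mine, not the project's):

    import numpy as np
    from scipy.spatial.distance import cdist
    from sklearn.svm import SVC

    def rbf_kernel(X, Y, sigma=1.0):
        """Gaussian (radial basis) kernel: exp(-||x - y||^2 / (2 sigma^2))."""
        return np.exp(-cdist(X, Y, metric="sqeuclidean") / (2 * sigma ** 2))

    def exponential_kernel(X, Y, sigma=1.0):
        """Exponential kernel: exp(-||x - y|| / (2 sigma^2))."""
        return np.exp(-cdist(X, Y, metric="euclidean") / (2 * sigma ** 2))

    def cauchy_kernel(X, Y, sigma=1.0):
        """Cauchy kernel: 1 / (1 + ||x - y||^2 / sigma^2)."""
        return 1.0 / (1.0 + cdist(X, Y, metric="sqeuclidean") / sigma ** 2)

    # A precomputed Gram matrix can then be handed to an SVM, e.g.:
    # clf = SVC(kernel="precomputed")
    # clf.fit(rbf_kernel(X_train, X_train, sigma=1.0), y_train)
    # preds = clf.predict(rbf_kernel(X_test, X_train, sigma=1.0))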

Performance

Radial Basis
I first used the following parameters: 20 images per class for training and testing, a 200-word vocabulary, and sigma set to 1.

With a linear SVM and the same parameters, I originally got an accuracy of 0.5433 (shown above). With the radial basis kernel, the accuracy rose to 0.5500.

Next I tried 50 images per class, a 400-word vocabulary, and sigma set to 1, which gave an accuracy of 0.5827. Originally, with a vocabulary size of 200 and 50 images per class, I got an accuracy of 0.6227, so the linear SVM outperformed the non-linear one. One of the main reasons is probably that sigma had not been finely tuned. Next, I tried different sigmas to see which one was most suitable for the dataset.

Testing with different sigmas

I reused the 400-word vocabulary with 50 images per class. Using this vocabulary, I tested the performance of the SVM with the following sigmas: 0.1, 0.5, 1.5 and 3, hoping to identify the most suitable value. Below are the accuracy results:
For sigma = 0.1, accuracy = 0.0973. This is clearly not the right choice of sigma.

For sigma = 0.5, accuracy = 0.2560. Getting better, but still underperforming.

For sigma = 1, we already know the accuracy is 0.5827.

For sigma = 1.5, accuracy = 0.6733. This is the best score so far; things are getting exciting!

Let's try increasing sigma further.
For sigma = 2, accuracy = 0.6600.

Given this decrease in performance, I estimate that the sweet spot for sigma lies somewhere close to 1.5.
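For completeness, the sigma search above amounts to a small grid search. A hedged sketch of such a sweep with a precomputed kernel, reusing rbf_kernel from the earlier sketch (names are illustrative):

    from sklearn.svm import SVC

    def sweep_sigma(kernel_fn, X_train, y_train, X_test, y_test,
                    sigmas=(0.1, 0.5, 1.0, 1.5, 2.0)):
        """Evaluate a precomputed-kernel SVM for each candidate sigma."""
        for sigma in sigmas:
            clf = SVC(kernel="precomputed")
            clf.fit(kernel_fn(X_train, X_train, sigma=sigma), y_train)
            acc = clf.score(kernel_fn(X_test, X_train, sigma=sigma), y_test)
            print("sigma =", sigma, "accuracy =", acc)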