CSCI1430 : Project 3 - Scene recognition with bag of words

rdfong

In this project we train a linear SVM classifier that determines which class an image belongs to, by feeding the classifier a large amount of training data. The pipeline is as follows:

Collecting Features and creating the vocabulary

  1. First we gather feature descriptors from a large number of images. We do this by running a SIFT feature detector on each image and choosing some number of points from the detected features.
  2. We then collapse the feature descriptors for all of our selected points into a smaller set of representative descriptors (the words of the vocabulary), which was done using kmeans.
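
The vocabulary step above can be sketched in Python as follows. This is a minimal illustration, not the code used for this project: `build_vocabulary` and its parameters are hypothetical names, the descriptors are assumed to be already extracted (the SIFT step is not shown), and a plain Lloyd's k-means with random initialization stands in for whatever kmeans implementation was actually used.

```python
import math
import random


def build_vocabulary(descriptors, vocab_size, iterations=20, seed=0):
    """Cluster descriptors with Lloyd's k-means and return the cluster centers.

    Hypothetical helper: `descriptors` is a list of equal-length numeric
    sequences (for example 128-dimensional SIFT descriptors).
    """
    rng = random.Random(seed)
    # Initialize with vocab_size descriptors picked at random without replacement.
    centers = [list(d) for d in rng.sample(descriptors, vocab_size)]
    for _ in range(iterations):
        # Assignment step: group each descriptor with its nearest center.
        clusters = [[] for _ in centers]
        for d in descriptors:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(d, centers[i]))
            clusters[nearest].append(d)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for center, members in zip(centers, clusters):
            if members:
                new_centers.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(center)  # leave an empty cluster's center as is
        if new_centers == centers:
            break  # assignments are stable, so further iterations change nothing
        centers = new_centers
    return centers
```

Each returned center plays the role of one word, so `vocab_size` corresponds to the vocabulary size varied in the results.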

Classifying images

  1. For each image we need to build a histogram of word frequency. First we collect features from the image once again using the SIFT feature detector.
  2. Next, we find the distances from each feature in the image to each word in the vocabulary. For each feature we add one to the histogram bin that corresponds to the word with the smallest distance to that feature.
  3. We then normalize our histogram values so that the results do not vary based on image size.
  4. Finally we train an SVM by feeding it the normalized histograms.
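
Steps 2 and 3 can be sketched as a single function. This is an illustration under stated assumptions: `bag_of_words_histogram` is a hypothetical name, Euclidean distance is assumed for the nearest-word lookup, and the histogram is normalized to sum to 1, which is one common choice (the exact normalization used in the project is not stated here). Feature extraction and SVM training are left out.

```python
import math


def bag_of_words_histogram(features, vocabulary):
    """Return a normalized histogram of nearest-word counts for one image.

    Hypothetical helper: `features` are the image's descriptors and
    `vocabulary` is the list of words (cluster centers).
    """
    counts = [0] * len(vocabulary)
    for f in features:
        # Add one to the bin of the word closest to this feature.
        nearest = min(range(len(vocabulary)),
                      key=lambda i: math.dist(f, vocabulary[i]))
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)  # image with no detected features
    # Assumed normalization: divide by the number of features so that an
    # image with more features does not get a larger histogram.
    return [c / total for c in counts]
```

The resulting vectors, one per training image, are what would be handed to the SVM in step 4.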

Analyzing results

Lastly we want to analyze our results by building a confusion matrix. For each class C we run a number of test images that belong to C through the SVM. We define the result of the SVM to be the class to which the SVM gives the highest confidence.

Each row of the matrix is a histogram for one class C: it counts how many of the test images belonging to C were assigned to each class type.

After normalizing each row of the matrix, we would ideally end up with an identity matrix (100% accuracy for each class type). The results of the above algorithm are listed below and fall within an acceptable range of accuracy. Each accuracy value was determined by taking the average of the diagonal of the confusion matrix.
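
The analysis described above can be sketched as follows. The function names are hypothetical, classes are assumed to be numbered from 0, and `confidences` is assumed to hold one SVM confidence score per class for each test image; how those scores are produced is not shown.

```python
def confusion_matrix(true_labels, confidences, num_classes):
    """Count, for each true class (row), how often each class was predicted."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true, scores in zip(true_labels, confidences):
        # The prediction is the class with the highest confidence.
        predicted = max(range(num_classes), key=lambda c: scores[c])
        matrix[true][predicted] += 1
    return matrix


def normalize_rows(matrix):
    """Divide each row by its sum so that every non-empty row sums to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


def mean_diagonal_accuracy(normalized):
    """Average the diagonal, i.e. the per-class accuracies."""
    n = len(normalized)
    return sum(normalized[i][i] for i in range(n)) / n
```

For example, with two classes and two test images each, one misclassified image of class 0 gives the rows `[0.5, 0.5]` and `[0.0, 1.0]`, and an accuracy of 0.75.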

Note: I ran the program with multiple vocabulary sizes to observe the effect of having more or fewer words.

Confusion Matrices

10 words: 46.73%
20 words: 53.73%
50 words: 61.13%
100 words: 64.07%
200 words: 67.07%
400 words: 67.47%
1000 words: 69.67%

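Note that as we increase the vocabulary size our accuracy increases, although the rate of increase drops off quickly as the size grows: going from 10 to 100 words raises accuracy from 46.73% to 64.07%, while going from 100 to 1000 words only raises it to 69.67%.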