The goal of this project is to classify scenes with a bag of words model: a classifier is trained and then tested on the 15 scene database.
The bag of words model is a popular technique for image classification inspired by models used in natural language processing. It ignores or downplays word arrangement (spatial information in the image) and classifies based only on a histogram of the frequency of visual words. Visual words are identified by clustering a large corpus of example features. The baseline for this technique is discussed in "Beyond Bags of Features" by Lazebnik et al. (2006).
The general steps of the bag of words are:
The steps of this project are:
The features extracted from the images are SIFT features. SIFT stands for scale-invariant feature transform; these features are tolerant to image noise, changes in illumination, uniform scaling, rotation, and minor changes in viewing direction. The features were extracted on a regular grid.
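To illustrate what sampling on a regular grid means, here is a minimal sketch that only computes the grid positions, not the SIFT descriptors themselves. The function name `grid_centers`, the `margin` parameter, and the 256 x 256 image size are assumptions made for illustration; the descriptor size is not modeled here.

```python
def grid_centers(width, height, step, margin=0):
    """Return the (x, y) positions of a regular sampling grid over an image."""
    xs = range(margin, width - margin, step)
    ys = range(margin, height - margin, step)
    return [(x, y) for y in ys for x in xs]


# Hypothetical 256 x 256 image: halving the step quadruples the number
# of grid positions, and therefore the number of descriptors per image.
coarse = grid_centers(256, 256, step=16)
fine = grid_centers(256, 256, step=8)
print(len(coarse), len(fine))  # prints: 256 1024
```

A smaller step gives a denser description of each image at the cost of more descriptors to cluster and quantize.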
Once the features are extracted from all training images, we cluster them into a vocabulary of 200 visual words using k-means.
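The clustering step can be sketched as follows. This is a plain-Python illustration of Lloyd's k-means algorithm on toy 2-D points, not the implementation used in the project; the function names, the random initialization, and the fixed iteration count are assumptions. In the project the points would be 128-dimensional SIFT descriptors and k would be 200.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))


def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for i, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(group) for col in zip(*group)]
    return centers


# Toy data: two well-separated blobs stand in for descriptor clusters.
toy = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11)]
vocab = kmeans(toy, k=2)
```

Each returned center plays the role of one visual word in the vocabulary.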
Each training image is then converted into a histogram representation: every feature in the image is assigned to its nearest visual word, and the histogram records how often each visual word occurs in that image.
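A minimal sketch of the histogram step, using toy 2-D vectors in place of SIFT descriptors. The name `bow_histogram` is hypothetical, and normalizing the counts so they sum to 1 is an assumption added here so that images with different numbers of features stay comparable; the original text does not say whether the histograms were normalized.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bow_histogram(descriptors, vocabulary):
    """Count nearest-word assignments, then normalize to frequencies."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        word = min(range(len(vocabulary)),
                   key=lambda i: sq_dist(d, vocabulary[i]))
        counts[word] += 1
    total = sum(counts)
    # Assumption: L1 normalization; an image with no features keeps zeros.
    return [c / total for c in counts] if total else counts


# Toy vocabulary of three "visual words" and four descriptors.
vocabulary = [(0, 0), (10, 10), (0, 10)]
descriptors = [(1, 0), (0, 1), (9, 9), (1, 1)]
hist = bow_histogram(descriptors, vocabulary)
print(hist)  # prints: [0.75, 0.25, 0.0]
```

The same function applies unchanged to test images, as long as the vocabulary learned from the training set is reused.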
The SVM is then trained on the generated histograms. Each test image is converted into a histogram over the same vocabulary and classified by the SVM, and a confusion matrix is built by comparing the predicted labels with the true labels.
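The evaluation step can be sketched as follows. The SVM itself is not shown; the label lists below are made-up toy values for three classes, not results from this project, and the function names are hypothetical. Rows of the matrix index the true class and columns the predicted class.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """matrix[t][p] counts samples of true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Toy labels for illustration only (assumed, not measured).
true_labels = [0, 0, 1, 1, 2, 2]
predicted_labels = [0, 1, 1, 1, 2, 0]
matrix = confusion_matrix(true_labels, predicted_labels, num_classes=3)
acc = accuracy(matrix)
```

Off-diagonal entries show which scene categories are confused with each other, which a single accuracy number hides.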
The following are the results of varying the size and step values while generating the SIFT features.
Size = 4, Step = 8, Accuracy = 0.6353
Size = 8, Step = 16, Accuracy = 0.6380
Size = 4, Step = 16, Accuracy = 0.5920