For this algorithm I set the size of the downscaled image to 16 (for a 16 x 16 image), preallocated an empty matrix to hold the feature vectors, and then looped through each image path, using imresize to shrink each image to 16 x 16. I then normalized each image with the expression double(vect)/norm(double(vect)) and reshaped it from a matrix into a vector.
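A minimal Python sketch of this tiny-image feature follows, for illustration only. It makes three assumptions:

- The image is grayscale and given as a list of rows of numbers.
- Simple nearest-neighbor sampling replaces MATLAB's imresize, which interpolates.
- The function name tiny_image_feature is hypothetical.

```python
import math


def tiny_image_feature(image, size=16):
    """Shrink a grayscale image to size x size, flatten it, and scale it to unit L2 norm.

    `image` is a list of rows of pixel intensities (an assumption for this sketch).
    Nearest-neighbor sampling stands in for MATLAB's interpolating imresize.
    """
    rows, cols = len(image), len(image[0])
    # For each output cell, take the source pixel nearest to the cell's center.
    small = [
        [float(image[min(rows - 1, int((r + 0.5) * rows / size))]
                    [min(cols - 1, int((c + 0.5) * cols / size))])
         for c in range(size)]
        for r in range(size)
    ]
    # Flatten row by row (MATLAB's (:) is column-major; either order works if used consistently).
    vect = [v for row in small for v in row]
    norm = math.sqrt(sum(v * v for v in vect))
    # An all-zero image has norm 0, so return it unchanged instead of dividing by zero.
    return [v / norm for v in vect] if norm > 0 else vect
```

Scaling every vector to unit length means the later distance comparisons depend on the pattern of intensities rather than on overall image brightness.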
In this function, I transpose the training and test features that are passed in, since vl_alldist2 compares columns while the features are stored one per row. I then use the vl_alldist2 function to compute the L2 distance between every test feature and every training feature. The result is a distance matrix whose rows index the training features and whose columns index the test features. I then find the minimum of each column, which identifies the training feature most similar to that test feature, and use those row indices to look up the corresponding training labels.
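The same 1-nearest-neighbor lookup can be sketched in plain Python. This is an illustration under stated assumptions, not the original MATLAB code:

- Features are lists of numbers.
- nearest_neighbor_classify is a hypothetical name.
- The squared Euclidean distance stands in for the L2 distance, which does not change which training feature is nearest.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbor_classify(train_feats, train_labels, test_feats):
    """Label each test feature with the label of its closest training feature."""
    # dists[i][j] is the distance between training feature i and test feature j,
    # so rows index training features and columns index test features.
    dists = [[squared_l2(tr, te) for te in test_feats] for tr in train_feats]
    predictions = []
    for j in range(len(test_feats)):
        column = [row[j] for row in dists]
        # Row index of the smallest distance in column j = nearest training feature.
        best = min(range(len(column)), key=column.__getitem__)
        predictions.append(train_labels[best])
    return predictions


train = [[0, 0], [10, 10], [0, 10]]
labels = ["a", "b", "c"]
print(nearest_neighbor_classify(train, labels, [[1, 1], [9, 8], [1, 9]]))  # ['a', 'b', 'c']
```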
The program reported an accuracy of 0.20 (20%).
To build the feature vocabulary, I loop through each image, use vl_dsift to extract dense SIFT features from it, and store them all in one large matrix. I found a step size of 20 to give good accuracy. After collecting the SIFT features, I used the vl_kmeans function to find 400 (vocab_size) cluster centroids, which make up the vocabulary.
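A plain Lloyd's k-means sketch in Python shows what the clustering step computes. It is a stand-in for vl_kmeans, not a reimplementation of it. The random initialization, the fixed iteration count, and the name kmeans are assumptions made for this example.

```python
import random


def kmeans(points, k, iterations=50, seed=0):
    """Lloyd's k-means: return k centroids for a list of equal-length vectors."""
    rng = random.Random(seed)
    # Start from k input points chosen at random without replacement.
    centroids = [list(p) for p in rng.sample(points, k)]

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids
```

In the vocabulary setting, `points` would be the collected SIFT descriptors and `k` the vocabulary size (400 above). Each returned centroid is one visual word.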
First, the vocabulary computed in build_vocabulary is loaded. Then, for each image, I extract SIFT features (here with a smaller step size of 5, which yields more features) and use the vl_alldist2 function to compute their distances to each vocabulary feature. For each SIFT feature, the index of the minimum distance identifies the vocabulary word it matches. I build a histogram of these indices, which gives a feature describing how many times each of the 400 vocabulary words appears in the image.
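The histogram step can be sketched as follows. The sketch assumes that descriptors and vocabulary words are lists of numbers of the same length. The name bag_of_words_histogram is hypothetical, and the nearest word is chosen by squared Euclidean distance.

```python
def bag_of_words_histogram(descriptors, vocab):
    """Count how often each vocabulary word is the nearest match for a descriptor."""
    hist = [0] * len(vocab)
    for d in descriptors:
        # Index of the vocabulary word with the smallest squared L2 distance to d.
        nearest = min(range(len(vocab)),
                      key=lambda w: sum((x - y) ** 2 for x, y in zip(d, vocab[w])))
        hist[nearest] += 1
    return hist
```

The description above does not mention normalizing the histogram. Images can yield different numbers of descriptors, so dividing by the total count is a common optional extra step.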
The program reported an accuracy of 0.443 (44.3%).
In this last function, I begin by setting the regularization parameter lambda to 0.00001, which gave the best results. I loop through the 15 categories and, for each one, create a label vector that marks each training image with 1 if it belongs to the current category and -1 otherwise. I then pass the training features, these binary labels, and lambda to vl_svmtrain, which returns the weight vector W and bias B for that category's classifier. Next, I compute W*X + B, where X is the test data, to get a score for each test image under each category. The score is positive on the category's side of the decision boundary and grows with the distance from it. Finally, I take the maximum of each column of the resulting score matrix. Its row index corresponds to one of the 15 categories, which becomes that test image's label.
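The one-vs-all scheme can be sketched in Python as below. vl_svmtrain is not available here, so train_linear_svm is a hypothetical stand-in. It minimizes the L2-regularized hinge loss lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))) with plain full-batch subgradient descent. Its learning rate and epoch count are assumptions, not values from the original. Each category's model is kept as a separate (w, b) pair rather than stacked into one matrix.

```python
def train_linear_svm(X, y, lam, epochs=1000, lr=0.1):
    """Hypothetical stand-in for vl_svmtrain: full-batch subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b)))."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer (b is not regularized)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active, so it contributes a subgradient
                for j in range(dim):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def one_vs_all_classify(train_feats, train_labels, test_feats, categories, lam=1e-5):
    """Train one binary SVM per category, then label each test feature by the top score."""
    models = []
    for cat in categories:
        # +1 for training images of this category, -1 for all other categories.
        binary = [1.0 if label == cat else -1.0 for label in train_labels]
        models.append(train_linear_svm(train_feats, binary, lam))
    predictions = []
    for x in test_feats:
        # One score w . x + b per category; the largest score picks the label.
        scores = [sum(wj * xj for wj, xj in zip(w, x)) + b for w, b in models]
        predictions.append(categories[max(range(len(scores)), key=scores.__getitem__)])
    return predictions
```

Taking the largest score is the Python counterpart of taking the maximum of each column of the category-by-image score matrix described above.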
Category name | Accuracy | Sample training images | Sample true positives | False positives with true label | False negatives with wrong predicted label
---|---|---|---|---|---
Kitchen | 0.730 | ![]() ![]() | ![]() ![]() | ![]() Store, ![]() Store | ![]() InsideCity, ![]() Office
Store | 0.120 | ![]() ![]() | ![]() ![]() | ![]() Bedroom, ![]() Industrial | ![]() Kitchen, ![]() LivingRoom
Bedroom | 0.120 | ![]() ![]() | ![]() ![]() | ![]() Store, ![]() Kitchen | ![]() Office, ![]() Office
LivingRoom | 0.150 | ![]() ![]() | ![]() ![]() | ![]() Bedroom, ![]() Street | ![]() TallBuilding, ![]() TallBuilding
Office | 0.790 | ![]() ![]() | ![]() ![]() | ![]() Kitchen, ![]() InsideCity | ![]() Kitchen, ![]() Kitchen
Industrial | 0.230 | ![]() ![]() | ![]() ![]() | ![]() Highway, ![]() Kitchen | ![]() TallBuilding, ![]() OpenCountry
Suburb | 0.950 | ![]() ![]() | ![]() ![]() | ![]() Industrial, ![]() LivingRoom | ![]() TallBuilding, ![]() Street
InsideCity | 0.330 | ![]() ![]() | ![]() ![]() | ![]() Store, ![]() Kitchen | ![]() Street, ![]() Kitchen
TallBuilding | 0.840 | ![]() ![]() | ![]() ![]() | ![]() InsideCity, ![]() Industrial | ![]() Kitchen, ![]() Street
Street | 0.790 | ![]() ![]() | ![]() ![]() | ![]() InsideCity, ![]() Store | ![]() Highway, ![]() TallBuilding
Highway | 0.820 | ![]() ![]() | ![]() ![]() | ![]() Coast, ![]() OpenCountry | ![]() TallBuilding, ![]() Street
OpenCountry | 0.420 | ![]() ![]() | ![]() ![]() | ![]() Coast, ![]() Coast | ![]() Street, ![]() TallBuilding
Coast | 0.630 | ![]() ![]() | ![]() ![]() | ![]() OpenCountry, ![]() OpenCountry | ![]() Office, ![]() OpenCountry
Mountain | 0.710 | ![]() ![]() | ![]() ![]() | ![]() Highway, ![]() Store | ![]() Forest, ![]() Street
Forest | 0.920 | ![]() ![]() | ![]() ![]() | ![]() Mountain, ![]() Store | ![]() Suburb, ![]() TallBuilding