The goal of this project is scene recognition. The algorithm has two parts: extracting features and classifying them. The following list shows the strategies I used in this project. The data were classified into 15 categories.
The tiny image method is simple. It resizes each image to 16×16 pixels, subtracts the mean, and divides by the norm. This is the baseline feature implementation.
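The tiny image steps can be sketched as follows. This is a minimal illustration rather than my actual implementation: the function name `tiny_image` is hypothetical, the input is assumed to be a 2-D grayscale array at least 16 pixels on each side, and the resize is done by simple block averaging (any other resize method would also fit the description).

```python
import numpy as np

def tiny_image(img, size=16):
    """Shrink a 2-D grayscale image to size x size, then zero-mean and unit-norm it."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    # Assumption: resize by averaging over a size x size grid of blocks,
    # splitting rows and columns as evenly as possible.
    row_edges = np.linspace(0, h, size + 1).astype(int)
    col_edges = np.linspace(0, w, size + 1).astype(int)
    small = np.empty((size, size))
    for i in range(size):
        for j in range(size):
            block = img[row_edges[i]:row_edges[i + 1],
                        col_edges[j]:col_edges[j + 1]]
            small[i, j] = block.mean()
    feat = small.ravel()
    feat = feat - feat.mean()          # subtract the mean
    norm = np.linalg.norm(feat)
    # Divide by the norm; a constant image has norm 0 and is left as zeros.
    return feat / norm if norm > 0 else feat
```

Each image then becomes a 256-dimensional vector that can be passed directly to a classifier.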
The bag of words method first computes a vocabulary and then builds a histogram based on that vocabulary. To build the vocabulary, extract SIFT features from all training images and cluster them with k-means; the cluster centroids form the vocabulary. Then, for each image, extract more SIFT features and assign each one to its nearest k-means centroid. The histogram records how many SIFT features fall into each cluster.
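The two stages above can be sketched as follows. This is only an illustration under stated assumptions: SIFT extraction is left out, so the descriptors are assumed to arrive as rows of a NumPy array, the function names are hypothetical, and the clustering is a plain Lloyd's k-means rather than whatever library routine a real implementation would call.

```python
import numpy as np

def nearest_centroid(descriptors, centroids):
    """Index of the closest centroid (squared Euclidean distance) per descriptor."""
    diff = descriptors[:, None, :] - centroids[None, :, :]
    return (diff ** 2).sum(axis=2).argmin(axis=1)

def build_vocabulary(descriptors, vocab_size, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a (vocab_size, dim) array of centroids."""
    descriptors = np.asarray(descriptors, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # Assumption: initialise from randomly chosen descriptors.
    start = rng.choice(len(descriptors), size=vocab_size, replace=False)
    centroids = descriptors[start].copy()
    for _ in range(iters):
        labels = nearest_centroid(descriptors, centroids)
        for k in range(vocab_size):
            members = descriptors[labels == k]
            if len(members) > 0:       # an empty cluster keeps its old centroid
                centroids[k] = members.mean(axis=0)
    return centroids

def bag_of_words(descriptors, centroids):
    """Count how many descriptors fall into each cluster of the vocabulary."""
    descriptors = np.asarray(descriptors, dtype=np.float64)
    labels = nearest_centroid(descriptors, centroids)
    return np.bincount(labels, minlength=len(centroids))
```

The histogram has one bin per vocabulary entry, so its length equals the vocabulary size. Dividing it by its sum is a common extra step when images yield different numbers of descriptors.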
The nearest neighbour classifier takes the features of both the training and the testing images. For every testing image, it finds the nearest training sample and assigns that sample's category.
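A minimal sketch of this classifier is shown below. The function name is hypothetical, and Euclidean distance is an assumption, since the distance metric is not fixed by the description above.

```python
import numpy as np

def nearest_neighbour_classify(train_feats, train_labels, test_feats):
    """Give each test feature the label of its closest training feature."""
    train_feats = np.asarray(train_feats, dtype=np.float64)
    test_feats = np.asarray(test_feats, dtype=np.float64)
    # Squared Euclidean distances via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2
    d2 = ((test_feats ** 2).sum(axis=1)[:, None]
          - 2.0 * test_feats @ train_feats.T
          + (train_feats ** 2).sum(axis=1)[None, :])
    nearest = d2.argmin(axis=1)        # index of the closest training sample
    return [train_labels[i] for i in nearest]
```

There is no training step: all the work happens at test time, which is why this classifier is simple but sensitive to noisy training samples.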
The 1-vs-all linear support vector machine method trains one binary classifier per category, for example "forest" vs "non-forest" or "bedroom" vs "non-bedroom". Each test case is evaluated with all 15 classifiers, and the predicted category is the one whose classifier returns the highest score.
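The 1-vs-all scheme can be sketched as follows. This is an illustration under assumptions, not the solver used for the reported results: the function names are hypothetical, and each binary SVM is trained with plain subgradient descent on the regularised hinge loss, with the step size `lr` and the number of `epochs` chosen arbitrarily.

```python
import numpy as np

def train_binary_svm(X, y, lam=1e-4, epochs=200, lr=0.1):
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))); y is +1 or -1."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1   # samples that violate the margin
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def train_one_vs_all(X, labels, categories, lam=1e-4):
    """Train one "category vs non-category" classifier per category."""
    X = np.asarray(X, dtype=np.float64)
    labels = np.asarray(labels)
    models = [train_binary_svm(X, np.where(labels == c, 1.0, -1.0), lam)
              for c in categories]
    W = np.stack([w for w, _ in models])       # shape (num_categories, dim)
    b = np.array([bias for _, bias in models])
    return W, b

def predict_one_vs_all(X, W, b, categories):
    """Score every sample with every classifier and keep the highest score."""
    scores = np.asarray(X, dtype=np.float64) @ W.T + b
    return [categories[i] for i in scores.argmax(axis=1)]
```

Here `lam` plays the role of the regularisation parameter: smaller values penalise large weights less, so the classifier fits the training data more closely.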
Here are the accuracies of the algorithm (with vocabulary size 400):
There are many free parameters in this project. I experimented with LAMBDA in the SVM, trying values from 0.00001 to 10, and found that 0.00001 gives the best result with vocabulary size 50.
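A parameter sweep of this kind can be sketched as below. The helper `sweep_lambda` and the callable `evaluate` are hypothetical: `evaluate` stands for "train with this LAMBDA and return the test accuracy". The log-spaced grid is also an assumption, since only the end points of the range are stated above.

```python
import numpy as np

def sweep_lambda(evaluate, lambdas):
    """Score every candidate value and return (best_lambda, best_accuracy, results)."""
    results = [(float(lam), float(evaluate(lam))) for lam in lambdas]
    best_lam, best_acc = max(results, key=lambda r: r[1])
    return best_lam, best_acc, results

# Assumption: seven log-spaced candidates from 1e-5 to 10, one per decade.
lambdas = np.logspace(-5, 1, num=7)
```

Spacing the candidates by decades covers the whole range with only a few training runs.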
I also experimented with the vocabulary size. The results are reported at the end of this webpage.
Category name | Accuracy | Sample training images | Sample true positives | False positives with true label | False negatives with wrong predicted label
---|---|---|---|---|---
Kitchen | 0.080 | ![]() ![]() | ![]() ![]() | ![]() LivingRoom, ![]() Office | ![]() TallBuilding, ![]() Forest
Store | 0.020 | ![]() ![]() | ![]() ![]() | ![]() Office, ![]() TallBuilding | ![]() Forest, ![]() Bedroom
Bedroom | 0.180 | ![]() ![]() | ![]() ![]() | ![]() Kitchen, ![]() Office | ![]() Kitchen, ![]() Mountain
LivingRoom | 0.100 | ![]() ![]() | ![]() ![]() | ![]() Bedroom, ![]() TallBuilding | ![]() Suburb, ![]() TallBuilding
Office | 0.180 | ![]() ![]() | ![]() ![]() | ![]() LivingRoom, ![]() Forest | ![]() Industrial, ![]() Mountain
Industrial | 0.130 | ![]() ![]() | ![]() ![]() | ![]() Kitchen, ![]() InsideCity | ![]() Street, ![]() Suburb
Suburb | 0.370 | ![]() ![]() | ![]() ![]() | ![]() Coast, ![]() Street | ![]() Coast, ![]() Street
InsideCity | 0.060 | ![]() ![]() | ![]() ![]() | ![]() Bedroom, ![]() Mountain | ![]() Industrial, ![]() Suburb
TallBuilding | 0.220 | ![]() ![]() | ![]() ![]() | ![]() Street, ![]() Office | ![]() Suburb, ![]() Street
Street | 0.420 | ![]() ![]() | ![]() ![]() | ![]() LivingRoom, ![]() Mountain | ![]() Industrial, ![]() Forest
Highway | 0.560 | ![]() ![]() | ![]() ![]() | ![]() InsideCity, ![]() LivingRoom | ![]() OpenCountry, ![]() Coast
OpenCountry | 0.350 | ![]() ![]() | ![]() ![]() | ![]() TallBuilding, ![]() Highway | ![]() Coast, ![]() Highway
Coast | 0.390 | ![]() ![]() | ![]() ![]() | ![]() Forest, ![]() Highway | ![]() Highway, ![]() Forest
Mountain | 0.180 | ![]() ![]() | ![]() ![]() | ![]() Street, ![]() Bedroom | ![]() Highway, ![]() Coast
Forest | 0.130 | ![]() ![]() | ![]() ![]() | ![]() Bedroom, ![]() Coast | ![]() Industrial, ![]() Street
Category name | Accuracy | Sample training images | Sample true positives | False positives with true label | False negatives with wrong predicted label
---|---|---|---|---|---
Kitchen | 0.420 | ![]() ![]() | ![]() ![]() | ![]() Bedroom, ![]() InsideCity | ![]() LivingRoom, ![]() Bedroom
Store | 0.560 | ![]() ![]() | ![]() ![]() | ![]() Street, ![]() Suburb | ![]() Forest, ![]() Bedroom
Bedroom | 0.290 | ![]() ![]() | ![]() ![]() | ![]() TallBuilding, ![]() LivingRoom | ![]() Kitchen, ![]() Industrial
LivingRoom | 0.250 | ![]() ![]() | ![]() ![]() | ![]() Office, ![]() Store | ![]() Office, ![]() TallBuilding
Office | 0.820 | ![]() ![]() | ![]() ![]() | ![]() LivingRoom, ![]() Store | ![]() Kitchen, ![]() InsideCity
Industrial | 0.350 | ![]() ![]() | ![]() ![]() | ![]() TallBuilding, ![]() TallBuilding | ![]() Bedroom, ![]() LivingRoom
Suburb | 0.870 | ![]() ![]() | ![]() ![]() | ![]() Forest, ![]() OpenCountry | ![]() Highway, ![]() Store
InsideCity | 0.290 | ![]() ![]() | ![]() ![]() | ![]() Industrial, ![]() Street | ![]() Suburb, ![]() Industrial
TallBuilding | 0.450 | ![]() ![]() | ![]() ![]() | ![]() Industrial, ![]() OpenCountry | ![]() Industrial, ![]() Coast
Street | 0.520 | ![]() ![]() | ![]() ![]() | ![]() TallBuilding, ![]() TallBuilding | ![]() Store, ![]() Highway
Highway | 0.790 | ![]() ![]() | ![]() ![]() | ![]() Industrial, ![]() Bedroom | ![]() Coast, ![]() InsideCity
OpenCountry | 0.430 | ![]() ![]() | ![]() ![]() | ![]() Industrial, ![]() Coast | ![]() TallBuilding, ![]() Highway
Coast | 0.600 | ![]() ![]() | ![]() ![]() | ![]() OpenCountry, ![]() Industrial | ![]() OpenCountry, ![]() Highway
Mountain | 0.600 | ![]() ![]() | ![]() ![]() | ![]() Industrial, ![]() OpenCountry | ![]() Forest, ![]() Forest
Forest | 0.920 | ![]() ![]() | ![]() ![]() | ![]() Mountain, ![]() Mountain | ![]() OpenCountry, ![]() Suburb
Accuracy (mean of the diagonal of the confusion matrix) is 0.649 for the results in the table below.
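The "mean of the diagonal" figure can be computed as sketched below. The function name is hypothetical, and the sketch assumes every category has at least one test image, so that each row of the confusion matrix can be normalised.

```python
import numpy as np

def mean_diagonal_accuracy(true_labels, predicted_labels, categories):
    """Build a row-normalised confusion matrix and average its diagonal."""
    index = {c: i for i, c in enumerate(categories)}
    confusion = np.zeros((len(categories), len(categories)))
    for t, p in zip(true_labels, predicted_labels):
        confusion[index[t], index[p]] += 1
    # After row normalisation, entry (i, j) is the fraction of
    # category i test images that were predicted as category j.
    confusion /= confusion.sum(axis=1, keepdims=True)
    return confusion, confusion.diagonal().mean()
```

Each diagonal entry is the per-category accuracy listed in the table, so the overall figure is simply the average of that column.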
Category name | Accuracy | Sample training images | Sample true positives | False positives with true label | False negatives with wrong predicted label
---|---|---|---|---|---
Kitchen | 0.510 | ![]() ![]() | ![]() ![]() | ![]() InsideCity, ![]() Store | ![]() Store, ![]() InsideCity
Store | 0.510 | ![]() ![]() | ![]() ![]() | ![]() Kitchen, ![]() Street | ![]() Mountain, ![]() Street
Bedroom | 0.440 | ![]() ![]() | ![]() ![]() | ![]() Store, ![]() Kitchen | ![]() Kitchen, ![]() Kitchen
LivingRoom | 0.350 | ![]() ![]() | ![]() ![]() | ![]() Kitchen, ![]() Forest | ![]() Bedroom, ![]() TallBuilding
Office | 0.810 | ![]() ![]() | ![]() ![]() | ![]() Kitchen, ![]() Bedroom | ![]() LivingRoom, ![]() Suburb
Industrial | 0.540 | ![]() ![]() | ![]() ![]() | ![]() Store, ![]() LivingRoom | ![]() LivingRoom, ![]() InsideCity
Suburb | 0.910 | ![]() ![]() | ![]() ![]() | ![]() OpenCountry, ![]() Mountain | ![]() Bedroom, ![]() TallBuilding
InsideCity | 0.550 | ![]() ![]() | ![]() ![]() | ![]() Store, ![]() TallBuilding | ![]() LivingRoom, ![]() TallBuilding
TallBuilding | 0.760 | ![]() ![]() | ![]() ![]() | ![]() Suburb, ![]() Mountain | ![]() Store, ![]() Bedroom
Street | 0.620 | ![]() ![]() | ![]() ![]() | ![]() OpenCountry, ![]() Industrial | ![]() Industrial, ![]() Store
Highway | 0.800 | ![]() ![]() | ![]() ![]() | ![]() Coast, ![]() Street | ![]() Street, ![]() Street
OpenCountry | 0.490 | ![]() ![]() | ![]() ![]() | ![]() Mountain, ![]() Forest | ![]() Bedroom, ![]() Coast
Coast | 0.770 | ![]() ![]() | ![]() ![]() | ![]() Mountain, ![]() OpenCountry | ![]() OpenCountry, ![]() Mountain
Mountain | 0.760 | ![]() ![]() | ![]() ![]() | ![]() OpenCountry, ![]() Street | ![]() Coast, ![]() Coast
Forest | 0.920 | ![]() ![]() | ![]() ![]() | ![]() OpenCountry, ![]() Bedroom | ![]() LivingRoom, ![]() Mountain
EXTRA CREDIT: Below is a graph showing the accuracy for different vocabulary sizes:
I think that when the vocabulary size is relatively small, changing the size makes a big difference to the accuracy. As the size grows, its influence becomes smaller. At that point, improving the performance requires changing other parameters or using a different algorithm.