I implemented three basic pipelines: tiny images with a nearest neighbor classifier, bag of SIFT features with a nearest neighbor classifier, and bag of SIFT features with an SVM classifier. The tiny image/nearest neighbor method reached an accuracy of 0.205 (20.5%), which is within the expected range.
The confusion matrix of the tiny image representation and nearest neighbor classifier. Accuracy is 20.5%.
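As an illustration of the tiny image/nearest neighbor idea, here is a minimal sketch in plain Python. It is not the code used for the results above: the 16x16 target size, the zero-mean/unit-length normalization, and the function names `tiny_image` and `nearest_neighbor_label` are all assumptions made for the example, and the "images" are synthetic gradients.

```python
import math

def tiny_image(image, size=16):
    """Shrink a 2D grayscale image (list of rows) to size x size by block
    averaging, flatten it, and normalize to zero mean and unit length.
    Assumes the image is at least size x size pixels."""
    rows, cols = len(image), len(image[0])
    tiny = []
    for r in range(size):
        r0, r1 = r * rows // size, (r + 1) * rows // size
        for c in range(size):
            c0, c1 = c * cols // size, (c + 1) * cols // size
            block = [image[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            tiny.append(sum(block) / len(block))
    mean = sum(tiny) / len(tiny)
    centered = [v - mean for v in tiny]
    norm = math.sqrt(sum(v * v for v in centered)) or 1.0  # guard flat images
    return [v / norm for v in centered]

def nearest_neighbor_label(query, train_features, train_labels):
    """Return the label of the training feature closest to the query (1-NN)."""
    best = min(range(len(train_features)),
               key=lambda i: math.dist(query, train_features[i]))
    return train_labels[best]

# Synthetic 32x32 "images": brightness ramps left-to-right or top-to-bottom.
horizontal = [[c for c in range(32)] for r in range(32)]
vertical = [[r for c in range(32)] for r in range(32)]
train_features = [tiny_image(horizontal), tiny_image(vertical)]
train_labels = ["horizontal", "vertical"]

# A brighter, higher-contrast horizontal ramp still matches "horizontal"
# because the normalization removes brightness and contrast differences.
query = [[2 * c + 10 for c in range(32)] for r in range(32)]
print(nearest_neighbor_label(tiny_image(query), train_features, train_labels))  # horizontal
```

The representation discards almost all detail, which is consistent with the low accuracy of this baseline.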
The SIFT feature representation with the nearest neighbor classifier performed slightly worse than expected. Without any of the extra credit methods, its accuracy was 38.8%, which is still nearly double the 20.5% of the tiny image method. To build the vocabulary that bag of SIFT uses to form its histograms, I sampled SIFT features from all 1500 images with a step size of 50, also using the fast parameter. This resulted in roughly 200 features per image, for a total of roughly 300,000 features. These features were then grouped into 400 k-means clusters, whose centers make up the vocabulary. My bag of SIFT method sampled features from the test images with a step size of 20, compared each new feature to the vocabulary, and created a normalized histogram for each image. The histograms were then run through the nearest neighbor classifier.
The confusion matrix of the SIFT representation and nearest neighbor classifier. Accuracy is 38.8%.
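The vocabulary and histogram steps described above can be sketched as follows. This is a toy illustration rather than the code behind the reported numbers: it uses plain Lloyd's k-means written by hand, 2-D points standing in for 128-D SIFT descriptors, and a vocabulary of 2 words instead of 400. The helper names (`nearest_index`, `kmeans`, `bag_of_words`) are hypothetical.

```python
import math
import random

def nearest_index(descriptor, centers):
    """Index of the center closest to the descriptor (Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda j: math.dist(descriptor, centers[j]))

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; the returned centers act as the visual words."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest_index(p, centers)].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old center if a cluster ends up empty
                centers[j] = [sum(dim) / len(group) for dim in zip(*group)]
    return centers

def bag_of_words(descriptors, vocabulary):
    """Count the nearest visual word of each descriptor, then normalize
    the counts so the histogram sums to 1."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_index(d, vocabulary)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

# Toy "descriptors": two well-separated blobs of 2-D points.
rng = random.Random(1)
training_descriptors = (
    [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
    + [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(50)]
)
vocabulary = kmeans(training_descriptors, k=2)

# Descriptors sampled from one hypothetical image.
image_descriptors = [(0.1, -0.2), (9.8, 10.3), (10.1, 9.9), (0.0, 0.4)]
histogram = bag_of_words(image_descriptors, vocabulary)
print(histogram)
```

Normalizing the histogram keeps images with different numbers of sampled features comparable, which matters here because the vocabulary and the per-image features were sampled at different step sizes.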
Finally, the SIFT feature representation with the SVM classifier performed better than the previous two methods, though still somewhat below the 60-70% range. Tuning the parameters raised the accuracy to 53-56%. The run shown here had an accuracy of 54.7%, using a lambda value of 0.001. Interestingly, the accuracy varied slightly between trials even when no parameters were changed.
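To make the role of lambda concrete, here is a small sketch of a regularized hinge-loss objective and a one-vs-all decision rule. It assumes the common linear SVM formulation (lambda/2 times the squared weight norm plus the mean hinge loss) and one linear classifier per category. The weights, category names, and function names are invented for the example; they are not the trained classifiers from this project.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def hinge_objective(w, b, features, labels, lam):
    """lam/2 * ||w||^2 + mean hinge loss, with labels in {-1, +1}.
    A larger lam favors small weights; a smaller lam favors fitting the data."""
    regularizer = 0.5 * lam * dot(w, w)
    losses = [max(0.0, 1.0 - y * (dot(w, x) + b))
              for x, y in zip(features, labels)]
    return regularizer + sum(losses) / len(losses)

def predict_one_vs_all(x, classifiers):
    """classifiers maps category -> (w, b). Each classifier scores its own
    category against all others; the highest score wins."""
    return max(classifiers,
               key=lambda cat: dot(classifiers[cat][0], x) + classifiers[cat][1])

# Hypothetical 2-bin histograms and hand-picked classifiers.
features = [[0.9, 0.1], [0.1, 0.9]]
forest_labels = [+1, -1]  # first image is a Forest, second is not
classifiers = {
    "Forest": ([1.0, 0.0], -0.5),
    "Office": ([0.0, 1.0], -0.5),
}

w, b = classifiers["Forest"]
print(hinge_objective(w, b, features, forest_labels, lam=0.001))
print(predict_one_vs_all([0.9, 0.1], classifiers))  # Forest
```

With lambda as small as 0.001, the regularization term contributes very little, so the objective is dominated by how well the training histograms are separated.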
Category name | Accuracy | False positives (true label) | False negatives (wrong predicted label)
---|---|---|---
Kitchen | 0.370 | InsideCity, Industrial | Street, TallBuilding
Store | 0.390 | LivingRoom, Street | LivingRoom, Mountain
Bedroom | 0.220 | Kitchen, InsideCity | Kitchen, Office
LivingRoom | 0.140 | Kitchen, Bedroom | Office, Bedroom
Office | 0.770 | Bedroom, Bedroom | Kitchen, Bedroom
Industrial | 0.290 | Store, Bedroom | Store, InsideCity
Suburb | 0.930 | Street, Mountain | Mountain, Mountain
InsideCity | 0.500 | Highway, Industrial | TallBuilding, Suburb
TallBuilding | 0.650 | Industrial, InsideCity | Store, Bedroom
Street | 0.480 | TallBuilding, InsideCity | InsideCity, Store
Highway | 0.690 | Bedroom, Street | Coast, Coast
OpenCountry | 0.210 | Industrial, Industrial | Coast, Coast
Coast | 0.830 | OpenCountry, OpenCountry | OpenCountry, Suburb
Mountain | 0.800 | OpenCountry, Industrial | Forest, Forest
Forest | 0.930 | OpenCountry, OpenCountry | Mountain, Mountain
The SVM pipeline performed best on images of offices, suburbs, tall buildings, highways, coasts, mountains, and forests, all at 65% accuracy or higher. It performed worst on open country, industrial scenes, living rooms, and bedrooms, all below 30%. With the exception of open country, it performed very well on outdoor scenes, and, apart from offices, not nearly as well on indoor rooms. I would guess that some form of spatial information, keeping track of where features are in relation to each other, would help here. Without it, I would guess that rooms with an abundance of man-made objects look similar to each other.
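As a quick consistency check, the per-category accuracies in the table can be averaged and ranked. The sketch below assumes every category has the same number of test images, so that the overall accuracy equals the unweighted mean of the per-category values. The variable names are only for illustration.

```python
# Per-category accuracies copied from the table above.
per_category_accuracy = {
    "Kitchen": 0.370, "Store": 0.390, "Bedroom": 0.220, "LivingRoom": 0.140,
    "Office": 0.770, "Industrial": 0.290, "Suburb": 0.930, "InsideCity": 0.500,
    "TallBuilding": 0.650, "Street": 0.480, "Highway": 0.690,
    "OpenCountry": 0.210, "Coast": 0.830, "Mountain": 0.800, "Forest": 0.930,
}

# Unweighted mean over the 15 categories (assumes equal test-set sizes).
mean_accuracy = sum(per_category_accuracy.values()) / len(per_category_accuracy)

# Categories ordered from lowest to highest accuracy.
ranked = sorted(per_category_accuracy, key=per_category_accuracy.get)
worst_four = ranked[:4]

print(f"mean accuracy: {mean_accuracy:.3f}")  # mean accuracy: 0.547
print("lowest four:", worst_four)
```

The mean comes out to about 0.547, matching the 54.7% reported for the SVM run, and the four lowest categories are the ones singled out in the discussion above.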