CS 143 / Project 3 / Scene Recognition with Bag of Words

Project Description

This project consists of three levels of scene recognition, which combine two image representations (tiny images and bags of SIFT features) with two classifiers (nearest neighbor search and a linear SVM). The accuracy of these methods is evaluated on a dataset of 15 distinct scene categories, with 100 images in each category. The results for each descriptor and classifier are shown below in the form of a confusion matrix and an accuracy table.

Algorithm Descriptions

Tiny Images

The Tiny Images algorithm uses MATLAB's built-in image resizing to shrink each image to a 16 by 16 matrix, which is then reshaped into a column vector and used as the scene descriptor. This method is quite fast, but the resizing discards a large amount of the original data, which makes it a poor overall representation of a scene for recognition purposes.
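As a rough illustration of this descriptor, the sketch below is written in Python rather than MATLAB, and it uses plain block averaging as a stand-in for MATLAB's resizing, whose interpolation is not reproduced here. The function name `tiny_image` and the toy image are hypothetical. The row-by-row flattening order is also an assumption; any fixed order works as long as it is the same for every image.

```python
def tiny_image(pixels, size=16):
    """Shrink a grayscale image (a list of rows) to size x size by block
    averaging, then flatten it row by row into a single feature vector."""
    height, width = len(pixels), len(pixels[0])
    if height < size or width < size:
        raise ValueError("image must be at least size x size pixels")
    feature = []
    for i in range(size):
        # Source rows that map onto output row i.
        r0, r1 = i * height // size, (i + 1) * height // size
        for j in range(size):
            # Source columns that map onto output column j.
            c0, c1 = j * width // size, (j + 1) * width // size
            block = [pixels[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            feature.append(sum(block) / len(block))
    return feature


# Toy 32 x 48 image whose pixel value is simply row index + column index.
image = [[r + c for c in range(48)] for r in range(32)]
descriptor = tiny_image(image)
print(len(descriptor))  # 16 * 16 = 256 values per image
```

Each output value here averages a 2 by 3 block of the toy image, which makes the loss of detail described above easy to see: six pixels collapse into one number.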

Bags of SIFT Features

This representation extracts SIFT features from each image and then matches each one to the most similar descriptor in a visual "vocabulary", which is created by a k-means clustering operation. The matched vocabulary entries are used to describe a given image within the bounds of a limited vocabulary. This technique loses much less information than tiny images, although it is much slower to compute and is still somewhat limited by the size of the vocabulary.
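The matching step can be sketched as follows, assuming the vocabulary (the k-means cluster centers) and the SIFT descriptors have already been computed elsewhere. The helper names are hypothetical, the toy vectors are 2-dimensional rather than the 128 dimensions of real SIFT descriptors, and normalizing the histogram so it sums to 1 is an assumption rather than something the write-up states.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bag_of_words(descriptors, vocabulary):
    """Count how often each vocabulary entry is the nearest match to a
    descriptor, and return the counts as a normalized histogram."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        nearest = min(range(len(vocabulary)),
                      key=lambda k: squared_distance(d, vocabulary[k]))
        counts[nearest] += 1
    total = sum(counts)
    # Assumption: normalize so images with different feature counts compare.
    return [c / total for c in counts] if total else [0.0] * len(vocabulary)


# Toy vocabulary of three "visual words" and four descriptors from one image.
vocabulary = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
descriptors = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.9], [5.0, 5.2]]
histogram = bag_of_words(descriptors, vocabulary)
print(histogram)
```

The histogram has one entry per vocabulary word, so its length, and with it the level of detail the descriptor can capture, is fixed by the vocabulary size.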

Nearest Neighbor Classification

The nearest neighbor classifier takes the description of a scene (Bags of SIFT or Tiny Images), finds the training example that is closest to it in Euclidean distance, and assigns that example's label. This is a simple way to measure how well a test image fits a training label; however, it is prone to being thrown off by features that are generally non-distinctive.
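A minimal sketch of this classifier, assuming each image has already been converted to a feature vector. The function name and the toy 2-dimensional features are hypothetical.

```python
import math


def nearest_neighbor_classify(train_features, train_labels, test_features):
    """Label each test vector with the label of its closest training vector,
    measured by Euclidean distance."""
    predictions = []
    for t in test_features:
        best = min(range(len(train_features)),
                   key=lambda i: math.dist(t, train_features[i]))
        predictions.append(train_labels[best])
    return predictions


# Toy training set with two scene categories.
train_features = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
train_labels = ["Coast", "Coast", "Forest"]
test_features = [[0.2, 0.1], [3.5, 4.2]]
predictions = nearest_neighbor_classify(train_features, train_labels, test_features)
print(predictions)
```

Because every dimension contributes equally to the distance, a dimension that varies a lot but says little about the category can dominate the match, which is the weakness noted above.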

Linear SVM Classifier

The linear SVM is a simple one-versus-all implementation, which trains (in this case) 15 separate SVMs, one to recognize each type of scene. Each trained SVM is then applied to every test image, and the one that gives the highest response is taken as the most likely scene match.
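The one-versus-all setup can be sketched as below. The sketch covers only the relabeling and prediction steps and assumes the per-category weights and biases have already been produced by some SVM trainer. The function names, the three toy categories, and the weight values are all hypothetical.

```python
def one_vs_all_targets(labels, category):
    """Binary targets for one SVM: +1 for the category, -1 for everything else."""
    return [1 if label == category else -1 for label in labels]


def predict(models, feature):
    """Apply every per-category linear model (weights, bias) to the feature
    vector and return the category with the highest response."""
    def response(category):
        weights, bias = models[category]
        return sum(w * x for w, x in zip(weights, feature)) + bias
    return max(models, key=response)


# Hypothetical, already-trained models: category -> (weights, bias).
models = {
    "Coast": ([1.0, -1.0], 0.0),
    "Forest": ([-1.0, 1.0], 0.0),
    "Street": ([0.5, 0.5], -1.0),
}
targets = one_vs_all_targets(["Coast", "Forest", "Coast", "Street"], "Coast")
labels = [predict(models, f) for f in ([2.0, 0.5], [0.2, 3.0], [3.0, 3.0])]
print(targets, labels)
```

Taking the highest response means a label is always assigned, even when every individual SVM gives a negative response for the image.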

Algorithm Results

CS 143 Project 3 results visualization


Accuracy (mean of diagonal of confusion matrix) is 0.191

Category name  Accuracy  False positives with true label, then false negatives with wrong predicted label
Kitchen        0.050     Bedroom, Highway, Street, Highway
Store          0.020     LivingRoom, Office, Coast, Kitchen
Bedroom        0.080     Store, Suburb, OpenCountry, Coast
LivingRoom     0.080     Store, Bedroom, Coast, Forest
Office         0.050     Kitchen, TallBuilding, LivingRoom, Coast
Industrial     0.020     InsideCity, Forest, Kitchen, Highway
Suburb         0.210     Store, Forest, OpenCountry
InsideCity     0.100     Coast, Suburb, Street, Coast
TallBuilding   0.110     Mountain, Mountain, Forest, Highway
Street         0.400     Suburb, Suburb, Mountain, Kitchen
Highway        0.690     Suburb, Store, Coast, Kitchen
OpenCountry    0.310     Suburb, Mountain, Coast, Mountain
Coast          0.280     Industrial, Bedroom, OpenCountry, Highway
Mountain       0.130     LivingRoom, Industrial, OpenCountry, Highway
Forest         0.340     Store, Store, Coast, OpenCountry
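The overall accuracy quoted with each table is the mean of the diagonal of the confusion matrix. A minimal sketch of that computation is shown here, assuming the confusion matrix is row-normalized so that each diagonal entry is the fraction of a category's test images that were labeled correctly. The function names and the two-category toy data are hypothetical.

```python
def confusion_matrix(true_labels, predicted_labels, categories):
    """Row-normalized confusion matrix: entry [i][j] is the fraction of
    images from category i that were predicted as category j."""
    index = {c: i for i, c in enumerate(categories)}
    counts = [[0] * len(categories) for _ in categories]
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t]][index[p]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def mean_diagonal(matrix):
    """Average of the per-category accuracies on the diagonal."""
    return sum(matrix[i][i] for i in range(len(matrix))) / len(matrix)


categories = ["Coast", "Forest"]
true_labels = ["Coast", "Coast", "Coast", "Coast", "Forest", "Forest"]
predicted = ["Coast", "Coast", "Coast", "Forest", "Forest", "Coast"]
matrix = confusion_matrix(true_labels, predicted, categories)
accuracy = mean_diagonal(matrix)
print(matrix, accuracy)
```

In this toy case the per-category accuracies are 0.75 and 0.5, so the mean of the diagonal is 0.625.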


Accuracy (mean of diagonal of confusion matrix) is 0.512

Category name  Accuracy  False positives with true label, then false negatives with wrong predicted label
Kitchen        0.390     Office, Office, Office, LivingRoom
Store          0.420     TallBuilding, TallBuilding, Highway, LivingRoom
Bedroom        0.270     OpenCountry, InsideCity, LivingRoom, LivingRoom
LivingRoom     0.500     Bedroom, Highway, Industrial, Kitchen
Office         0.720     Kitchen, Bedroom, Bedroom, Bedroom
Industrial     0.260     Street, Street, Store, Highway
Suburb         0.900     OpenCountry, OpenCountry, Street, Industrial
InsideCity     0.290     Street, Industrial, Suburb, Suburb
TallBuilding   0.370     InsideCity, Industrial, Street, Street
Street         0.550     InsideCity, TallBuilding, Store, InsideCity
Highway        0.730     Street, OpenCountry, Street, Mountain
OpenCountry    0.410     Coast, Mountain, Highway, Highway
Coast          0.490     Mountain, Industrial, OpenCountry, OpenCountry
Mountain       0.490     Store, Forest, Bedroom, Suburb
Forest         0.890     Mountain, Mountain, OpenCountry, Suburb


Accuracy (mean of diagonal of confusion matrix) is 0.648

Category name  Accuracy  False positives with true label, then false negatives with wrong predicted label
Kitchen        0.590     LivingRoom, LivingRoom, Office, Store
Store          0.500     Industrial, Bedroom, Industrial, Forest
Bedroom        0.370     Industrial, TallBuilding, Office, Kitchen
LivingRoom     0.300     Industrial, Bedroom, TallBuilding, Bedroom
Office         0.850     TallBuilding, Kitchen, Bedroom, Kitchen
Industrial     0.560     Store, InsideCity, LivingRoom, LivingRoom
Suburb         0.920     Coast, Coast, LivingRoom, InsideCity
InsideCity     0.470     Coast, Street, Kitchen, TallBuilding
TallBuilding   0.830     Highway, LivingRoom, Street, Bedroom
Street         0.600     TallBuilding, TallBuilding, LivingRoom, OpenCountry
Highway        0.790     Coast, OpenCountry, Coast, Coast
OpenCountry    0.460     Mountain, Mountain, Coast, Suburb
Coast          0.770     OpenCountry, Industrial, OpenCountry, Mountain
Mountain       0.800     Coast, Forest, Forest, OpenCountry
Forest         0.910     OpenCountry, OpenCountry, Street, Mountain
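As a consistency check, the three overall accuracies can be recomputed from the per-category values in the tables. The sketch below copies those values in table order and averages them. The per-category figures are already rounded to three decimals, so agreement is only expected after rounding the mean in the same way.

```python
# (reported overall accuracy, per-category accuracies in table order)
results = [
    (0.191, [0.050, 0.020, 0.080, 0.080, 0.050, 0.020, 0.210, 0.100,
             0.110, 0.400, 0.690, 0.310, 0.280, 0.130, 0.340]),
    (0.512, [0.390, 0.420, 0.270, 0.500, 0.720, 0.260, 0.900, 0.290,
             0.370, 0.550, 0.730, 0.410, 0.490, 0.490, 0.890]),
    (0.648, [0.590, 0.500, 0.370, 0.300, 0.850, 0.560, 0.920, 0.470,
             0.830, 0.600, 0.790, 0.460, 0.770, 0.800, 0.910]),
]

checks = []
for reported, accuracies in results:
    mean = sum(accuracies) / len(accuracies)
    checks.append((reported, round(mean, 3)))
    print(f"reported {reported:.3f}, mean of per-category values {mean:.3f}")
```

The sums of the three columns are 2.87, 7.68, and 9.72, which divided by 15 categories give 0.191, 0.512, and 0.648.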