CSCI 1430 Project 4

Face Detection with a Sliding Window
Reese Kuppig (rkuppig)

The objective of this project was to implement a face detection algorithm that uses a sliding window to locate faces in an image. Detection is achieved by training an SVM to classify "crops," square sample regions extracted from images, as either face or non-face. Each crop is described by SIFT features, and the SVM is initially trained on a set of face crops and a set of non-face crops. To improve the classifier, the SVM may also be retrained on "hard negatives," false positives from an earlier round of classification, which capture the examples the SVM finds most difficult to categorize.



Algorithm Pipeline:

The face detection algorithm implemented for this project breaks down into several key steps (sketched in code after the list):

1. Build a set of positive example crops and a set of negative example crops
2. Transform each crop into a SIFT feature descriptor (for negative examples, this happens during crop extraction)
3. Train an SVM on the extracted features
4. If refining with hard negatives, apply the SVM to a set of non-face images to identify hard negatives, incorporate them into the training data, and retrain the SVM
5. Run the SVM over the final test image set with a sliding window and tune performance
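
As a rough sketch, steps 1 through 4 might look like the following in Python (scikit-learn is a stand-in here, and the feature helpers named below are hypothetical, not the actual project code):

    import numpy as np
    from sklearn.svm import LinearSVC

    def train_detector(pos_feats, neg_feats):
        # Steps 1-3: stack positive/negative SIFT features, label them, fit an SVM.
        X = np.vstack([pos_feats, neg_feats])
        y = np.concatenate([np.ones(len(pos_feats)), -np.ones(len(neg_feats))])
        return LinearSVC(C=1.0).fit(X, y)

    # Step 4 (optional): mine false positives from non-face images and retrain.
    # hard = mine_hard_negatives(svm, nonface_feats)   # hypothetical helper
    # svm = train_detector(pos_feats, np.vstack([neg_feats, hard]))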

 

Crop Features:

SIFT features were used to describe each image crop in this experiment, since they encapsulate image gradients, giving a richer yet lower-dimensional representation of each crop. Initially, crops were extracted from the images and then converted to SIFT features, but for faster runtime this middle step was removed: instead of extracting crops, SIFT descriptors were sampled directly from the image at the locations where the crops would have been centered. This eliminates unnecessary crop extraction and the repeated calculation of SIFT features for overlapping crops.
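
As a sketch of this shortcut, one can place a SIFT keypoint at each would-be crop center and compute all descriptors in a single pass (OpenCV's SIFT is used here as a stand-in; the actual feature library may differ):

    import cv2

    def dense_sift(gray, step=8, size=36):
        # One keypoint per sliding-window position, sized to cover the crop.
        h, w = gray.shape
        kps = [cv2.KeyPoint(float(x), float(y), float(size))
               for y in range(size // 2, h - size // 2, step)
               for x in range(size // 2, w - size // 2, step)]
        sift = cv2.SIFT_create()
        _, desc = sift.compute(gray, kps)
        return desc  # (num_positions, 128); no crops are ever extracted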


Hard Negatives:

The baseline training set for the face detection SVM consists of random positive examples and random negative examples. The random negative examples are taken from a database of non-face images, so they are guaranteed to be true negatives. A "hard negative" is a false positive detection, meaning that the classifier marked a non-face crop as a face. In theory, an SVM can be refined by iteratively incorporating these hard negatives into the training set and retraining. To implement this, the SVM is first trained on random negative crops, establishing a baseline. The trained SVM then evaluates a set of strictly non-face images, so that any positives it returns are guaranteed to be false positives. A subset of the collected false positives is then added to the negative training crops, and the SVM is retrained. Each iteration should produce a more refined SVM, since the difficult examples it misclassifies are explicitly labeled and trained into it. In practice, the improvement produced by mining hard negatives turned out to be relatively small.
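
A minimal sketch of one mining round, assuming a trained scikit-learn LinearSVC `svm` and a feature matrix `nonface_feats` drawn only from non-face images (so any positive decision is by construction a false positive):

    import numpy as np

    def mine_hard_negatives(svm, nonface_feats, max_hard=500):
        scores = svm.decision_function(nonface_feats)
        hard = nonface_feats[scores > 0]           # false positives only
        order = np.argsort(-scores[scores > 0])    # most confident mistakes first
        return hard[order[:max_hard]]

The returned features are appended to the negative training matrix and the SVM is refit, as in step 4 of the pipeline above.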

 

Results:

Both linear and non-linear SVMs were trained, and even with far fewer training examples, the non-linear SVM outperformed the linear SVM. Linear SVMs can take advantage of larger training sets, but non-linear SVMs can form more complex classification boundaries, allowing finer distinctions between image features. So even with fewer training examples, the non-linear SVM extracts more discriminative information.
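
In scikit-learn terms (a stand-in for the course's own SVM code), the two classifiers differ mainly in their kernel:

    from sklearn.svm import LinearSVC, SVC

    linear_svm = LinearSVC(C=1.0)             # linear boundary; scales to many examples
    nonlinear_svm = SVC(kernel='rbf', C=1.0)  # curved boundary; training cost grows
                                              # roughly quadratically with the number
                                              # of examples, hence the smaller set

Both expose fit() and decision_function(), so they slot into the same pipeline unchanged.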


(Linear SVM, 2000 positive crops, 2000 random negative crops)

(Non-linear SVM, 250 positive crops, 250 random negative crops)

 

Incorporating the hard negative features into the training set produced little improvement over the same number of random negative examples; in this case, it actually hurt performance.



(Non-linear SVM, 500 positive crops, 1000 random negative crops)



(Non-linear SVM, 500 positive crops, 500 random negative crops, 500 hard negative crops)

 

Two strategies were attempted for incorporating hard negatives: expanding the set of negative crops, or replacing half of the random negative crops. Both performed worse than the baseline without hard negatives, but of the two, expansion worked better, since it preserved the negative crops that were used to train the original SVM and ultimately to produce the hard negatives. In other words, hard negatives worked best as a way to build on the existing trained SVM, refining its decision boundary, rather than as material for a new SVM with new discrimination flaws.
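
The two mixing strategies, sketched with NumPy (the random arrays here are stand-ins for the real feature matrices):

    import numpy as np

    rng = np.random.default_rng(0)
    neg_random = rng.normal(size=(2000, 128))  # stand-in random-negative features
    hard_negs = rng.normal(size=(2000, 128))   # stand-in mined hard negatives

    # Strategy 1 (expand): keep every original negative, append the hard ones.
    neg_expanded = np.vstack([neg_random, hard_negs])

    # Strategy 2 (replace): swap out half the random negatives for hard ones.
    half = len(neg_random) // 2
    neg_replaced = np.vstack([neg_random[:half], hard_negs[:half]])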


(Linear SVM, 2000 positive crops, 2000 random negative crops)


(Linear SVM, 2000 positive crops, 2000 random negative crops, 2000 hard negative crops)

(Linear SVM, 2000 positive crops, 1000 random negative crops (half of original negative crops), 1000 hard negative crops)

 

The best performance achieved hovered around 76% accuracy. Examining the false positives helps reveal the features the SVM is sensitive to: mainly circles, or circular regions with detail concentrated toward the center, as can be seen below.



(Non-linear SVM, 1000 positive crops, 1000 random negative crops)

False Positives