CS 143 / Project 4: Face detection with a sliding window / Andrew Ayer <andrew>

Overview

The goal of this project was to implement face detection. Crops at many scales are extracted from each image using a sliding window. SIFT features are extracted from each crop and fed to a linear or non-linear SVM classifier. Finally, the classifier is run against a test set of images containing no faces, and the resulting false positives are used to further train the classifier.
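The multi-scale sliding window described above can be sketched roughly as follows. This is a minimal Python illustration, not the project's code: the function name sliding_windows, the 36-pixel base window, the step of 10, and the scale_factor are all assumptions chosen for the example, and start_scale here simply multiplies the smallest window size.

```python
def sliding_windows(img_w, img_h, win=36, step=10,
                    start_scale=1.0, scale_factor=1.5):
    """Yield (x, y, size) square windows in image coordinates.

    All defaults are illustrative assumptions, not the values
    used in the project.
    """
    scale = start_scale
    while True:
        size = int(round(win * scale))
        # Stop once the window no longer fits inside the image.
        if size > min(img_w, img_h):
            break
        # Grow the stride with the window so overlap stays similar.
        stride = max(1, int(round(step * scale)))
        for y in range(0, img_h - size + 1, stride):
            for x in range(0, img_w - size + 1, stride):
                yield (x, y, size)
        scale *= scale_factor


if __name__ == "__main__":
    windows = list(sliding_windows(100, 80, scale_factor=2.0))
    print(len(windows), "windows; first:", windows[0], "last:", windows[-1])
```

A smaller start_scale in this sketch produces smaller windows, so smaller faces can be covered at the cost of many more crops to classify.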

Algorithm

Crops to features

In this step, image crops are converted to feature vectors. I chose to use VLFeat's vl_dsift function to extract SIFT descriptors. I used a bin size and step size of 10, and took the first SIFT descriptor returned for each patch. I enabled the "fast" option to vl_dsift, which modestly improved runtime at no noticeable cost to detection performance.

get_hard_negatives

In this step, an image known to contain no faces is fed into the detector. Any crops detected as faces are therefore false positives, and they are returned to further train the classifier. This is very similar to the ordinary detection code, except that non-maximum suppression is not used.
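The idea can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the project's implementation: only the name get_hard_negatives comes from the write-up, while its signature, the featurize and score callables, the linear_score helper, and the threshold of 0 are hypothetical.

```python
def get_hard_negatives(crops, featurize, score, threshold=0.0):
    """Collect features of crops from a face-free image that the
    classifier wrongly scores as faces.

    `featurize` maps a crop to a feature vector and `score` maps a
    feature vector to a classifier confidence; both are hypothetical
    stand-ins for the real feature extractor and SVM.
    """
    hard = []
    for crop in crops:
        feat = featurize(crop)
        # The image contains no faces, so every detection is a false
        # positive. No non-maximum suppression is applied: every
        # offending crop is kept as a training example.
        if score(feat) > threshold:
            hard.append(feat)
    return hard


def linear_score(weights, bias):
    """Build a toy linear scoring function w . x + b."""
    def score(feat):
        return sum(w * f for w, f in zip(weights, feat)) + bias
    return score


if __name__ == "__main__":
    toy_crops = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    scorer = linear_score([1.0, -1.0], -0.5)
    print(get_hard_negatives(toy_crops, lambda c: c, scorer))
```

The returned feature vectors would be appended to the negative training set before retraining, which is what one additional mining stage amounts to.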

Optimizations

As an optimization, I wrote the get_img_feats function, which returns SIFT features for the entire image. It is used instead of calling get_img_crops followed by crops2features.

Detection Parameters

I tuned the following parameters to try to get the best average precision:

Results

Table of results for some parameter values:

Lambda  Sigma  num_neg_examples  start_scale  stages  AP
1       N/A    5000              3            5       0.276
10      N/A    2000              3            1       0.385
1       N/A    5000              3            2       0.390
100     N/A    2000              3            1       0.393
100     N/A    5000              3            1       0.406
1       N/A    5000              3            1       0.425
10      350    1000              1            1       0.725
1       N/A    5000              1            2       0.735
10      350    1000              1            2       0.766
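The AP column reports average precision. As a reminder of what that number measures, here is a minimal Python sketch of one common, non-interpolated definition: the mean of the precision values at each correctly detected face, divided by the number of ground-truth faces. The function name and arguments are assumptions for illustration, and the project's evaluation code may use a different variant, such as an interpolated one.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Non-interpolated average precision for ranked detections.

    scores            -- classifier confidence per detection
    is_true_positive  -- True if that detection matches a real face
    num_ground_truth  -- total number of annotated faces
    """
    if num_ground_truth == 0:
        return 0.0
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if is_true_positive[i]:
            true_positives += 1
            # Precision measured at the rank of this true positive.
            precision_sum += true_positives / rank
    # Missed faces contribute zero, which penalizes low recall.
    return precision_sum / num_ground_truth


if __name__ == "__main__":
    print(average_precision([0.9, 0.8, 0.7, 0.6],
                            [True, False, True, False], 2))
```

Under this definition, both confident false positives and missed faces lower the score, which is why changes that affect the number of candidate windows can move AP substantially.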

The first dramatic improvement came from using SIFT descriptors as features, which raised average precision from barely 0.05 to around 0.393. By comparison, mining hard negatives and switching to a non-linear SVM each provided only a modest improvement. The largest gain beyond using SIFT descriptors came from decreasing start_scale, and this was used to achieve the best average precision of 0.766 with these parameters:

Sample Detections

Although there are some false positives in the photo above, most of the faces are detected despite obstructions like sunglasses and people not staring straight at the camera.

Sample False Positives

Here are some examples of false positives. An upside-down 9 and a G were detected as faces in two images. In general, false positives tended to be roundish shapes.