Overview
The goal of this project was to implement face detection. Image crops at many scales are extracted from each image using a sliding window, SIFT features are extracted from each crop, and the features are fed to a linear or non-linear SVM classifier. Finally, the classifier is run against a test set of images containing no faces; the resulting false positives are used to further train the classifier.
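The multi-scale sliding window can be sketched as follows. This is a minimal illustration, not the project's code: the crop size, step, and scale progression here are assumptions chosen for the sketch.

```python
import numpy as np

def sliding_window_crops(img, crop_size=36, step=4, start_scale=1,
                         scale_factor=1.5):
    """Extract square crops at multiple scales by repeatedly
    downsampling the image and sliding a fixed-size window."""
    crops, boxes = [], []
    scale = start_scale
    while min(img.shape[0] // scale, img.shape[1] // scale) >= crop_size:
        small = img[::scale, ::scale]  # crude nearest-neighbor downsample
        for y in range(0, small.shape[0] - crop_size + 1, step):
            for x in range(0, small.shape[1] - crop_size + 1, step):
                crops.append(small[y:y + crop_size, x:x + crop_size])
                # Box in original-image coordinates: (x, y, side length).
                boxes.append((x * scale, y * scale, crop_size * scale))
        scale = int(round(scale * scale_factor))
    return crops, boxes
```

A larger start_scale skips the finest windows entirely, which is why it trades recall (and AP) for speed in the results below.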
Algorithm
Crops to features
In this step, image crops are converted to feature vectors. I chose VLFeat's vl_dsift function to extract SIFT descriptors, using a bin size and step size of 10 and taking the first descriptor returned for each crop. I also enabled the "fast" option to vl_dsift, which modestly improved runtime at no noticeable cost to accuracy.
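The idea of a crop-to-descriptor step can be sketched as below. Note this is a hypothetical stand-in, not vl_dsift: it builds a SIFT-like grid of gradient-orientation histograms (4×4 cells × 8 orientations = 128 dimensions) so the sketch stays self-contained.

```python
import numpy as np

def crop_to_descriptor(crop, n_bins=8, grid=4):
    """Illustrative stand-in for a SIFT descriptor: a grid of
    magnitude-weighted gradient-orientation histograms."""
    gy, gx = np.gradient(crop.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)  # orientation in [0, 2*pi)
    h, w = crop.shape
    feat = []
    for i in range(grid):
        for j in range(grid):
            ys = slice(i * h // grid, (i + 1) * h // grid)
            xs = slice(j * w // grid, (j + 1) * w // grid)
            hist, _ = np.histogram(ang[ys, xs], bins=n_bins,
                                   range=(0, 2 * np.pi),
                                   weights=mag[ys, xs])
            feat.append(hist)
    feat = np.concatenate(feat)
    norm = np.linalg.norm(feat)
    return feat / norm if norm > 0 else feat
```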
get_hard_negatives
In this step, an image known to contain no faces is fed into the detector. Any crops classified as faces are therefore false positives, and these are returned to further train the classifier. The code is very similar to the ordinary detection code, except that non-maximum suppression is not applied.
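The mining step can be sketched as follows, assuming the crops from a face-free image have already been converted to feature vectors and the linear SVM is given by a weight vector and bias (the function name mirrors the report; the implementation is an assumption):

```python
import numpy as np

def get_hard_negatives(neg_feats, w, b, max_hard=100):
    """Score every crop feature from a face-free image with a linear
    model (score = w.x + b); any crop scored as a face is by
    construction a false positive. No non-maximum suppression is
    applied, so every false-positive crop is kept (up to max_hard)."""
    scores = neg_feats @ w + b
    hard_idx = np.where(scores > 0)[0]
    # Return the hardest (highest-scoring) negatives first.
    hard_idx = hard_idx[np.argsort(-scores[hard_idx])][:max_hard]
    return neg_feats[hard_idx], scores[hard_idx]
```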
Optimizations
As an optimization, I wrote the get_img_feats function, which computes SIFT features over the entire image in one pass. It is used in place of calling get_img_crops followed by crops2features.
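The idea behind this optimization can be sketched as computing one descriptor per grid position over the whole image, rather than re-extracting features crop by crop. The function name follows the report; the descriptor here is a hypothetical two-number stand-in (patch mean and standard deviation), not SIFT.

```python
import numpy as np

def get_img_feats(img, step=10, patch=10, descriptor=None):
    """Compute one descriptor per grid cell over the whole image in a
    single pass, avoiding redundant per-crop extraction."""
    if descriptor is None:
        # Hypothetical stand-in descriptor: mean and std of the patch.
        descriptor = lambda p: np.array([p.mean(), p.std()])
    h, w = img.shape
    feats, locs = [], []
    for y in range(0, h - patch + 1, step):
        for x in range(0, w - patch + 1, step):
            feats.append(descriptor(img[y:y + patch, x:x + patch]))
            locs.append((y, x))
    return np.array(feats), np.array(locs)
```

Because overlapping crops share grid positions, each position's descriptor is computed exactly once here, whereas the per-crop path recomputes it for every crop that covers it.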
Detection Parameters
I tuned the following parameters to try to get the best average precision:
- Lambda for the linear SVM; Lambda and Sigma for the non-linear SVM (RBF kernel)
- Number of negative examples
- Start scale
- Number of hard-negative mining rounds (stages; 1 stage means only random negatives were used)
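The staged training loop these parameters control can be sketched as below. The training function here is a hypothetical stand-in (a mean-difference linear model, not the project's SVM), used only to make the loop self-contained.

```python
import numpy as np

def train_centroid(pos, neg):
    """Hypothetical stand-in for SVM training: a mean-difference
    linear model, just enough to exercise the staged loop."""
    w = pos.mean(axis=0) - neg.mean(axis=0)
    b = -w @ (pos.mean(axis=0) + neg.mean(axis=0)) / 2.0
    return w, b

def train_with_stages(pos_feats, rand_neg_feats, mine_fn, stages=2,
                      train_fn=train_centroid):
    """Stage 1 trains on positives plus random negatives; each later
    stage mines hard negatives with the current model and retrains."""
    neg_feats = rand_neg_feats
    w, b = train_fn(pos_feats, neg_feats)
    for _ in range(stages - 1):
        hard = mine_fn(w, b)  # false positives from face-free images
        if len(hard):
            neg_feats = np.vstack([neg_feats, hard])
        w, b = train_fn(pos_feats, neg_feats)
    return w, b
```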
Results
Table of results for some parameter values:
Lambda | Sigma | num_neg_examples | start_scale | stages | AP |
---|---|---|---|---|---|
1 | N/A | 5000 | 3 | 5 | 0.276 |
10 | N/A | 2000 | 3 | 1 | 0.385 |
1 | N/A | 5000 | 3 | 2 | 0.390 |
100 | N/A | 2000 | 3 | 1 | 0.393 |
100 | N/A | 5000 | 3 | 1 | 0.406 |
1 | N/A | 5000 | 3 | 1 | 0.425 |
10 | 350 | 1000 | 1 | 1 | 0.725 |
1 | N/A | 5000 | 1 | 2 | 0.735 |
10 | 350 | 1000 | 1 | 2 | 0.766 |
The first dramatic improvement came from using SIFT descriptors as features, which raised average precision from barely 0.05 to around 0.393. By comparison, mining hard negatives provided a modest improvement, as did switching to a non-linear SVM. The biggest gain beyond SIFT descriptors came from decreasing start_scale, which was used to achieve the best average precision of 0.766 with these parameters:
- Features: SIFT
- Classifier: Non-linear SVM
- Lambda: 10
- Number of examples: 1000
- Start scale: 1
- Step size: 4
- Number of stages: 2
Sample Detections
Although there are some false positives in the photo above, most of the faces are detected despite occlusions such as sunglasses and subjects not facing the camera directly.
Sample False Positives
Here are some examples of false positives. An upside-down 9 and a G were detected as faces in two of the images. In general, false positives tended to be roundish shapes.