CS 143: Face detection with a sliding window

Evan Wallace

Algorithm

A sliding window face detector takes a brute-force approach: test every patch that might contain a face and return all positive detections. This is done by sliding a rectangular window across the image at different positions and scales. If the same face is detected at multiple positions or scales, non-maximal suppression picks the best match.
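A minimal MATLAB sketch of that scan at a single scale is below; classify() is a hypothetical stand-in for the trained classifier described in the next section, and winSize and step stand in for the actual window and step sizes.

```matlab
% Minimal single-scale sketch of the brute-force scan. classify() is a
% hypothetical stand-in for the trained classifier described below.
function detections = slide_window(im, winSize, step)
    detections = [];                          % each row: [x y confidence]
    [h, w] = size(im);
    for y = 1 : step : h - winSize + 1
        for x = 1 : step : w - winSize + 1
            patch = im(y : y + winSize - 1, x : x + winSize - 1);
            conf = classify(patch);           % positive score = face
            if conf > 0
                detections = [detections; x y conf]; %#ok<AGROW>
            end
        end
    end
end
```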

Classifier

Each patch was represented by a single SIFT feature, computed using the VLFeat library with a patch size of 36 and a bin size of 10. I used a linear classifier, specifically Olivier Chapelle's primal_svm with a lambda of 1. The classifier was initially trained on 5000 randomly chosen positive and negative examples. To reduce the false positive rate, I retrained the classifier twice, each time adding 5000 more negative examples: the ones the previous classifier most confidently misclassified as faces (hard negative mining). I added a constant of 2 to each confidence to improve the recall rate.
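A sketch of that training loop is below. It assumes the version of primal_svm that reads its training matrix from a global variable X; extract_sift() and mine_hard_negatives() are hypothetical helpers for the feature extraction above and for scanning non-face images with the current classifier.

```matlab
% Sketch of training with hard negative mining, assuming Chapelle's
% primal_svm (which takes its data from a global X). extract_sift()
% and mine_hard_negatives() are hypothetical helpers.
global X;
lambda = 1;
pos = extract_sift(positive_crops);           % one 128-D SIFT row per patch
neg = extract_sift(negative_crops);           % initial random negatives
for iter = 1:3
    X = [pos; neg];
    Y = [ones(size(pos, 1), 1); -ones(size(neg, 1), 1)];
    [w, b] = primal_svm(1, Y, lambda);        % 1 selects the linear SVM
    if iter < 3
        % Append the 5000 negatives the current classifier scores
        % most confidently as faces, then retrain.
        neg = [neg; mine_hard_negatives(w, b, 5000)];
    end
end
% Shifting every confidence by +2 trades precision for recall.
confidences = @(D) D * w + b + 2;
```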

Detector

The sliding window was implemented using vl_dsift with a bin size of 10 and a step size of 2. To detect faces at multiple scales, the image was repeatedly downscaled by a factor of 2/3 and the sliding window was run at each level, stopping once the image was less than 50 pixels along either dimension.
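A sketch of that image pyramid, assuming VLFeat's MATLAB interface (vl_dsift expects a single-precision grayscale image) and a hypothetical score_descriptors() helper that applies the learned weights to each descriptor and returns [x y confidence] rows:

```matlab
% Sketch of the multi-scale detector. vl_dsift runs the dense SIFT
% sliding window; score_descriptors() is a hypothetical helper that
% scores each 128-D column of descrs with the learned w and b.
detections = [];
scale = 1;
while min(size(im)) >= 50
    [frames, descrs] = vl_dsift(single(im), 'Size', 10, 'Step', 2);
    % frames(1:2, :) are window centers in this level's coordinates;
    % dividing by scale maps them back into the original image.
    hits = score_descriptors(frames, descrs, w, b);
    detections = [detections; hits(:, 1:2) / scale, hits(:, 3)]; %#ok<AGROW>
    im = imresize(im, 2/3);                   % next pyramid level
    scale = scale * 2/3;
end
```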

Results

My face detector achieved an accuracy of 77.2% on the test set.

The graphs below summarize the detector's performance after 1, 2, and 3 training iterations. The graph on the left shows the positive training examples (green) and the negative training examples (red) sorted by confidence. The graph on the right shows precision (the fraction of reported detections that were actually faces) versus recall (the fraction of true faces that were detected).
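In standard terms, with TP, FP, and FN counting true positives, false positives, and false negatives:

```latex
\text{precision} = \frac{TP}{TP + FP},
\qquad
\text{recall} = \frac{TP}{TP + FN}
```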

Classifications | Precision-Recall

The classifications after 1 iteration

The precision-recall curve after 1 iteration (accuracy of 71.9%)

The classifications after 2 iterations

The precision-recall curve after 2 iterations (accuracy of 58%)

The classifications after 3 iterations

The precision-recall curve after 3 iterations (accuracy of 77.2%)

One interesting way to look at the results is to sort all detections by confidence. The images below show the first ten positive (correct) and negative (incorrect) results in that sorted order. Notice how the first 115 detections are all correct.

Positive | Negative

First correct at #1

First incorrect at #116

Second correct at #2

Second incorrect at #168

Third correct at #3

Third incorrect at #183

Fourth correct at #4

Fourth incorrect at #189

Fifth correct at #5

Fifth incorrect at #198

Sixth correct at #6

Sixth incorrect at #215

Seventh correct at #7

Seventh incorrect at #217

Eighth correct at #8

Eighth incorrect at #221

Ninth correct at #9

Ninth incorrect at #223

Tenth correct at #10

Tenth incorrect at #224

I also tried a non-linear classifier with an RBF kernel, but the results were sensitive to the parameters and I couldn't find a setting for which the accuracy improved over the linear classifier. The best combination I found was a lambda of 1 and a sigma of 100:
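For reference, the RBF kernel replaces the linear dot product with a Gaussian similarity; in the usual parameterization (an assumption here, since the exact form depends on the kernel code), sigma sets the bandwidth, i.e. how quickly similarity decays with feature distance:

```latex
K(\mathbf{x}, \mathbf{y}) =
\exp\!\left(-\frac{\lVert \mathbf{x} - \mathbf{y} \rVert^{2}}{2\sigma^{2}}\right)
```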

Classifications | Precision-Recall

The classifications for the non-linear classifier

The precision-recall curve for the non-linear classifier