CS143: Face Detection

face detection

miya schneider (mmschnei)

overview

This project was completed for CSCI1430: Computational Vision, taught by James Hays at Brown University. The goal of this project was to create a face detection system using a sliding window model and one of two classifiers: linear (based on the work of Dalal and Triggs in 2005) or non-linear (based on the work of Viola and Jones in 2001). The aspects of the detector that I implemented are described below.

Composite of selected results

hard negatives

The purpose of mining hard negatives was to improve the accuracy of the classifier by teaching it which crops to reject, so that it produces fewer false positives. To implement this, I ran the detector over a set of non-face scenes and collected its false positives (up to 10 per image) until 1000 had been gathered. I then added these hard negatives to the classifier's negative training data.
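The mining loop described above can be sketched in Python as follows. This is a minimal illustration, not the project's actual code: `detect` is a hypothetical function that runs the current sliding-window classifier on one image and returns `(score, crop)` pairs, and keeping each image's highest-scoring mistakes first is an assumption (the text only says up to 10 per image).

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# A detection is (score, crop); the crop type is left generic here.
Detection = Tuple[float, object]


def mine_hard_negatives(
    non_face_scenes: Iterable[object],
    detect: Callable[[object], Sequence[Detection]],
    per_image_cap: int = 10,
    total_cap: int = 1000,
) -> List[object]:
    """Collect crops that the current detector wrongly fires on.

    Every detection in a non-face scene is by definition a false positive.
    `detect` is a hypothetical stand-in for running the sliding-window
    classifier over one image and returning its positive detections.
    """
    hard_negatives: List[object] = []
    for scene in non_face_scenes:
        # Assumption: take the most confident mistakes first (the "hardest").
        detections = sorted(detect(scene), key=lambda d: d[0], reverse=True)
        for _score, crop in detections[:per_image_cap]:
            hard_negatives.append(crop)
            if len(hard_negatives) >= total_cap:
                return hard_negatives
    return hard_negatives
```

The returned crops would then be converted to features and appended to the negative training set before retraining.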

feature set

I chose to use a HOG (histogram of oriented gradients) descriptor for the feature representation. Starting from the default binning settings, I experimented with bin sizes and settled on 16 histogram bins and 16 HOG windows per bounding box, for a total feature dimensionality of 16 × 16 = 256. A lower dimensionality reduced accuracy, while a higher dimensionality improved accuracy at the expense of run-time.
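To make the 256-dimensional layout concrete, here is a simplified HOG-style descriptor in Python with NumPy. It assumes the 16 HOG windows form a 4 × 4 grid of cells over a grayscale crop, with 16 unsigned orientation bins per cell. It skips the overlapping block normalization of Dalal and Triggs and simply L2-normalizes the whole vector, so treat it as a sketch of the idea rather than the descriptor the project used.

```python
import numpy as np


def hog_descriptor(patch: np.ndarray, cells_per_side: int = 4, n_bins: int = 16) -> np.ndarray:
    """Simplified HOG: one orientation histogram per cell, concatenated.

    With a 4 x 4 grid (16 cells) and 16 bins this gives 16 * 16 = 256 values.
    Block normalization is omitted; the whole vector is L2-normalized instead.
    """
    patch = patch.astype(np.float64)
    gy, gx = np.gradient(patch)  # derivatives along rows (y) and columns (x)
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation folded into [0, pi).
    orientation = np.mod(np.arctan2(gy, gx), np.pi)
    bin_idx = np.minimum((orientation / np.pi * n_bins).astype(int), n_bins - 1)

    h, w = patch.shape
    ys = np.linspace(0, h, cells_per_side + 1).astype(int)
    xs = np.linspace(0, w, cells_per_side + 1).astype(int)
    features = []
    for i in range(cells_per_side):
        for j in range(cells_per_side):
            cell_bins = bin_idx[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].ravel()
            cell_mag = magnitude[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].ravel()
            # Each pixel votes into its orientation bin, weighted by magnitude.
            features.append(np.bincount(cell_bins, weights=cell_mag, minlength=n_bins))
    vec = np.concatenate(features)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec
```

For a 36 × 36 crop (a hypothetical size), `hog_descriptor(crop).shape` is `(256,)`. Changing `cells_per_side` or `n_bins` changes the dimensionality, which is the accuracy versus run-time trade-off described above.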

training threshold

Given more time, this is something I would have explored much further. I based my stopping criteria on the amount of negative data (1000 negatives) and a fixed number of iterations (to save time, I only tried 1 or 2 iterations). If I were to continue, I would most likely monitor the false positive rate and stop once it fell below a chosen threshold.
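The fixed-iteration criterion and the proposed false-positive-rate criterion can be combined in a single loop. This is a hypothetical sketch: `train_round` (mine hard negatives, then retrain) and `false_positive_rate` (evaluate on held-out non-face data) are placeholder callbacks, and the default `fp_threshold` of 0.01 is an arbitrary illustrative value, not one from the project.

```python
from typing import Callable


def train_until_converged(
    train_round: Callable[[int], None],
    false_positive_rate: Callable[[], float],
    max_iterations: int = 2,
    fp_threshold: float = 0.01,
) -> int:
    """Repeat mining-and-retraining rounds until a stopping criterion fires.

    Stops after `max_iterations` rounds (a fixed budget) or as soon as the
    measured false positive rate drops to `fp_threshold` or below, whichever
    comes first. Returns the number of rounds actually run.
    """
    for iteration in range(1, max_iterations + 1):
        train_round(iteration)  # hypothetical: mine hard negatives, then retrain
        if false_positive_rate() <= fp_threshold:
            return iteration
    return max_iterations
```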

parameters

I devoted much of my time on this project to tuning parameters in order to obtain the best results within a reasonable amount of time.

HOG: dimensionality of 256 (described above)
Lambda: initially 100 for linear, changed to 10; 1 for non-linear
Step size: initially 4, changed to 3
Scale factor: set to 1.5
Start scale: originally 3, changed to 1 (this made a big difference)
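The step size, scale factor, and start scale interact in a multi-scale sliding window. The sketch below assumes one common interpretation, which the text does not spell out: `scale` is the factor by which the image is shrunk before a fixed-size window is slid over it, starting at `start_scale` and multiplying by `scale_factor` until the window no longer fits. The 36-pixel window is a hypothetical default.

```python
from typing import List, Tuple


def sliding_windows(
    image_h: int,
    image_w: int,
    window: int = 36,
    step: int = 3,
    start_scale: float = 1.0,
    scale_factor: float = 1.5,
) -> List[Tuple[int, int, int]]:
    """Enumerate square windows (x, y, size) in original-image pixels.

    At each level the image is conceptually shrunk by `scale`, a fixed
    `window` x `window` crop is slid over it with stride `step`, and the
    crop is mapped back to original coordinates. Levels continue until
    the shrunken image is smaller than the window.
    """
    boxes = []
    scale = start_scale
    while image_h / scale >= window and image_w / scale >= window:
        h, w = int(image_h / scale), int(image_w / scale)
        for y in range(0, h - window + 1, step):
            for x in range(0, w - window + 1, step):
                boxes.append((round(x * scale), round(y * scale), round(window * scale)))
        scale *= scale_factor
    return boxes
```

Under this interpretation, `start_scale=3` means the smallest window covers 3 × 36 = 108 original pixels, so smaller faces are never examined, which is one plausible reason why lowering the start scale to 1 made such a difference. A smaller `step` or `scale_factor` produces more windows and therefore a longer run-time.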

results

Sample face detection

With my final set of parameters (detailed above), I ran tests with both the linear and non-linear classifiers. The results of both tests are summarized below. The non-linear classifier clearly outperformed the linear classifier but, as expected, took more time.

Before implementing hard negative mining, I ran the code using random negatives and found that the baseline accuracy was somewhere between 20 and 30 percent. The results below make it clear that hard negatives significantly improved accuracy.

classifier	accuracy	true positives	false positives	true negatives	false negatives
linear	0.621	0.477	0.004	0.496	0.022
non-linear	0.690	0.492	0.002	0.498	0.007

Summary of results

Hard negative curves: linear (left) and non-linear (right)

errors

False positives

While looking through the results, I found it interesting to examine some of the errors my detector made. The images above are false positives. I thought these were noteworthy because to the human eye they look nothing like a face (with the possible exception of Samuel L. Jackson's moustache). In HOG feature space, however, these crops resemble faces. To reduce such errors, I would probably implement a cascade and mine more hard negatives.
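A cascade, in the spirit of Viola and Jones, chains several classifiers so that a window is reported only if every stage accepts it. The sketch below shows just the control flow; the stage scoring functions and thresholds are hypothetical placeholders, and training the stages is not shown.

```python
from typing import Callable, Sequence, Tuple

# Each stage is (score function, threshold); both are hypothetical here.
Stage = Tuple[Callable[[object], float], float]


def cascade_accepts(window: object, stages: Sequence[Stage]) -> bool:
    """Return True only if every stage scores the window at or above its threshold.

    A false positive has to fool every stage, and cheap early stages can
    reject most non-face windows before the expensive later stages run.
    """
    for score, threshold in stages:
        if score(window) < threshold:
            return False  # rejected early; later stages never run
    return True
```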

The pictures below appear to me to be true positives, but the accuracy analysis counts them as false positives. This might indicate an error somewhere, so I thought they were worth including here.
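One common reason a visually correct detection is scored as a false positive is the overlap criterion used during evaluation. Assuming a PASCAL VOC-style rule (the text does not say which rule the evaluation used), a detection counts as a true positive only if its intersection over union (IoU) with a ground-truth box exceeds 0.5, and duplicate detections of the same face also count as false positives. A minimal IoU check:

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(detection: Box, ground_truth: Box, threshold: float = 0.5) -> bool:
    # Assumed PASCAL VOC-style criterion: overlap must exceed the threshold.
    return iou(detection, ground_truth) > threshold
```

For example, a 72 × 72 box at the same corner as a 36 × 36 annotated face contains the whole face, yet `iou((0, 0, 36, 36), (0, 0, 72, 72))` is 0.25, so under this rule it would be scored as a false positive.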