Project 4: Face Detection — Sam Birch (sbirch)

CS143 11/13

In this project I implemented a couple of variants of a sliding window face detector. In all cases the pipeline starts with a sample of positive training data (36×36 crops of faces) and random negative patches taken from scenes containing no faces. I used a SIFT-based feature to represent each patch, and fed the positive & negative training data to an assortment of classifiers.
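The sliding window step can be sketched as follows. This is a minimal illustration, not the actual detector: the 36×36 patch size matches the training crops described above, but the stride, image, and the absence of a real SIFT descriptor are all stand-ins.

```python
import numpy as np

PATCH = 36  # window size matching the 36x36 training crops

def sliding_windows(image, stride=6):
    """Yield (row, col, patch) for every PATCH x PATCH window in the image.

    In the real pipeline each patch would be converted to a SIFT-based
    feature and scored by the classifier; here we just collect patches.
    """
    h, w = image.shape
    for r in range(0, h - PATCH + 1, stride):
        for c in range(0, w - PATCH + 1, stride):
            yield r, c, image[r:r + PATCH, c:c + PATCH]

image = np.random.rand(100, 100)  # stand-in for a grayscale test image
patches = list(sliding_windows(image))
print(len(patches))  # 11 positions per axis -> 121 windows
```

A real detector would also repeat this at multiple image scales so faces larger than 36×36 are found.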

Linear & non-linear

Linear classifiers are very fast, but the boundary they represent is merely a hyperplane in the feature space, which is unlikely to correspond to a clean boundary between faces & non-faces. By switching to a non-linear classification scheme I improved performance pretty significantly (linear, right, AP=0.385; non-linear, left, AP=0.481).
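The gap between the two regimes can be demonstrated on toy data where the class boundary is curved. This sketch uses scikit-learn's two-moons dataset rather than face features, and an RBF-kernel SVM as the non-linear classifier (the write-up mentions an RBF kernel later):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC, LinearSVC

# Two interleaved half-circles: no hyperplane separates them cleanly.
X, y = make_moons(n_samples=400, noise=0.1, random_state=0)

lin_acc = LinearSVC(C=1.0, max_iter=10000).fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf", C=1.0).fit(X, y).score(X, y)

# The RBF kernel can bend its boundary around the moons; the linear
# classifier is stuck with a straight line.
print(lin_acc, rbf_acc)
```

The same intuition carries over to face features: non-face patches are not all on one side of any single hyperplane.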

Hard negative mining

Hard negative mining uses an initial classifier to find the negative training examples nearest the classification boundary: run the classifier over the negative training scenes (which contain no faces), and feed any positive detections back into a second round of training as additional negative examples. I found that one round of this improved performance modestly (and, oddly, that more rounds decreased performance slightly). On the left, the linear classifier without hard mining; on the right, with it.
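The mining loop can be sketched like this. The data, feature dimension, and classifier here are synthetic stand-ins; only the train → mine → retrain structure reflects the procedure described above.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X_pos = rng.normal(1.5, 1.0, size=(200, 16))       # stand-in face features
neg_pool = rng.normal(-1.5, 1.2, size=(2000, 16))  # patches from face-free scenes

# Round 0: train on positives plus a random sample of negatives.
X_neg0 = neg_pool[:200]
X0 = np.vstack([X_pos, X_neg0])
y0 = np.array([1] * 200 + [0] * 200)
clf = SVC(kernel="rbf").fit(X0, y0)

# Mine: any pool patch the classifier calls a face must be a false
# positive (the pool contains no faces), i.e. a hard negative.
hard = neg_pool[clf.predict(neg_pool) == 1]

# Round 1: retrain with the hard negatives appended as extra negatives.
X1 = np.vstack([X0, hard])
y1 = np.concatenate([y0, np.zeros(len(hard), dtype=int)])
clf2 = SVC(kernel="rbf").fit(X1, y1)
```

Repeating the mine/retrain step gives further rounds; as noted above, in my experiments more than one round actually hurt slightly.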

Experiment with clustered models

I computed the average and standard deviation of the positive examples:

Looking at the large areas of high variance I speculated that I might be able to classify better by clustering the faces, training a classifier for each "type" of face, and then combining those classification confidences (by taking the maximum). I clustered using k-means on the same features that I classified on. The averages of 16 clusters and their standard deviations reveal some distinct classes with much lower variance:

The results were not very impressive. Clustering improved linear classification slightly, which makes sense because the linear boundary representation is so impoverished. It actually decreased performance for the non-linear classifier, probably because an RBF kernel can already represent a more or less arbitrary boundary, and taking the maximum of a number of classifiers made the combination over-confident. I think it could have performed better with a cleverer way of combining the confidence values from the assorted classifiers, but training a linear classifier over those confidences to learn combination weights was computationally infeasible.
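The clustered-model scheme can be sketched as below. Everything here is a synthetic stand-in (two well-separated positive "types", K=2 instead of the 16 clusters used above, an RBF SVM per cluster); only the cluster → per-cluster classifier → max-confidence structure reflects the experiment.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(2)
# Two "types" of positive example, plus one negative blob.
X_pos = np.vstack([rng.normal(m, 0.5, size=(100, 8)) for m in (2.0, 4.0)])
X_neg = rng.normal(0.0, 0.5, size=(200, 8))

K = 2  # number of face "types"; the write-up used 16
labels = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(X_pos)

# One classifier per cluster: that cluster's positives vs. all negatives.
clfs = []
for k in range(K):
    n_k = int(np.sum(labels == k))
    Xk = np.vstack([X_pos[labels == k], X_neg])
    yk = np.array([1] * n_k + [0] * len(X_neg))
    clfs.append(SVC(kernel="rbf").fit(Xk, yk))

def score(x):
    # Combine the per-cluster confidences by taking the maximum.
    return max(c.decision_function(x.reshape(1, -1))[0] for c in clfs)
```

A softer combination (e.g. learned weights over the K confidences) is the "cleverer" alternative mentioned above, which I did not pursue for cost reasons.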

Plain linear classifier, left, & linear clustered models right.

Plain non-linear classifier, left, & non-linear clustered models right.