CSCI 143, Project 4:

Face Detection with a Sliding Window

by Betsy Hilliard (betsy), 11/13/11

Background and Motivation

General object detection is a hard problem in computer vision, but face detection is one of the field's significant successes. Humans easily classify the objects they encounter in the visual world, and even from infancy they are able to recognize others by their faces. The sliding window approach is a commonly used algorithm for object detection.

Algorithm

The basic algorithm is to take a set of positive examples (crops of faces) and a set of negative examples (first sampled randomly, then intelligently mined, from images without faces) and use them to train an SVM. The trained SVM then classifies sliding windows of varying sizes as face or non-face. Finally, non-maximum suppression picks the most confident image crop among overlapping detections.
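The detection stage described above can be sketched roughly as follows. This is a minimal illustration, not the code used for this project: the function names, the square windows, the greedy suppression rule, and the 0.3 overlap threshold are all assumptions, and `score_fn` stands in for whatever trained classifier scores a window.

```python
from typing import Callable, List, Tuple

# (x_min, y_min, x_max, y_max); the max coordinates are exclusive.
Box = Tuple[int, int, int, int]


def sliding_windows(width: int, height: int, sizes: List[int], step: int) -> List[Box]:
    """Enumerate every square window of each size that fits in the image."""
    boxes = []
    for size in sizes:
        for y in range(0, height - size + 1, step):
            for x in range(0, width - size + 1, step):
                boxes.append((x, y, x + size, y + size))
    return boxes


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (0.0 when they do not overlap)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def non_max_suppression(boxes: List[Box], scores: List[float],
                        iou_threshold: float = 0.3) -> List[int]:
    """Greedy NMS: visit boxes from most to least confident and keep a box
    only if it does not overlap an already kept box by more than the threshold.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


def detect(width: int, height: int, score_fn: Callable[[Box], float],
           sizes: List[int], step: int,
           score_threshold: float = 0.0) -> List[Tuple[Box, float]]:
    """Score every window, keep those above the threshold, then apply NMS."""
    scored = [(b, score_fn(b)) for b in sliding_windows(width, height, sizes, step)]
    scored = [(b, s) for b, s in scored if s > score_threshold]
    boxes = [b for b, _ in scored]
    scores = [s for _, s in scored]
    return [(boxes[i], scores[i]) for i in non_max_suppression(boxes, scores)]
```

In a real detector `score_fn` would crop the window, compute its features, and return the SVM decision value; here it is left abstract so the control flow is easy to follow.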

Design Decisions and Results

I chose to use SIFT features as the representation of the image patch. These features gave a marked improvement over using the raw image patch as the feature, raising the AP from about .06 to .38 without mining hard negatives. This was to be expected. To line the features up with the chosen crops, a bin size of 10 and a step size of 4 seemed to work well enough while still being fast enough.

Mining hard negatives did not initially produce better results and actually dropped the AP. As I increased the number of iterations through the mining process, the gains in performance began to plateau. I stopped getting much improvement after 4 iterations.
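For readers unfamiliar with the mining loop, here is a rough sketch of one common way to structure it. It is not the project's implementation: `train` is a hypothetical callback that fits a classifier and returns a scoring function, the threshold of 0 is an assumption, and `midpoint_trainer` is a toy 1-D stand-in for an SVM, included only so the example runs.

```python
from typing import Callable, List, Sequence, Tuple

Feature = Sequence[float]
Scorer = Callable[[Feature], float]
Trainer = Callable[[List[Feature], List[Feature]], Scorer]


def mine_hard_negatives(positives: List[Feature], negatives: List[Feature],
                        negative_pool: List[Feature], train: Trainer,
                        rounds: int = 4,
                        threshold: float = 0.0) -> Tuple[Scorer, List[Feature]]:
    """Repeatedly retrain, adding pool negatives that the current classifier
    wrongly scores as faces (false positives), until none remain or the
    round limit is reached."""
    negatives = list(negatives)
    seen = {tuple(f) for f in negatives}
    scorer = train(positives, negatives)
    for _ in range(rounds):
        hard = [f for f in negative_pool
                if scorer(f) > threshold and tuple(f) not in seen]
        if not hard:
            break  # no false positives left in the pool
        negatives.extend(hard)
        seen.update(tuple(f) for f in hard)
        scorer = train(positives, negatives)
    return scorer, negatives


def midpoint_trainer(pos: List[Feature], neg: List[Feature]) -> Scorer:
    """Toy stand-in for an SVM on 1-D features: place the decision boundary
    halfway between the smallest positive and the largest negative."""
    boundary = (min(p[0] for p in pos) + max(n[0] for n in neg)) / 2.0
    return lambda f: f[0] - boundary


scorer, final_negatives = mine_hard_negatives(
    positives=[[5.0], [6.0]],
    negatives=[[0.0]],
    negative_pool=[[1.0], [3.0], [4.0], [0.5]],
    train=midpoint_trainer,
)
```

In the toy run, the first classifier mislabels the pool values 3.0 and 4.0, which are added as hard negatives; after retraining no false positives remain, so the loop stops early.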

I tried both linear and RBF SVMs and found that the non-linear SVM was more effective, with an increase of almost 10 percentage points over the linear SVM (with a lambda of 1 and a sigma of 250). Also, after about 3 iterations the non-linear RBF SVM no longer found substantial numbers of false positives, whereas it took 5 iterations for the linear SVM to exhaust the false positives.
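To make the difference between the two kernels concrete, the sketch below shows a linear kernel, a Gaussian RBF kernel, and a kernel SVM decision value. The parameterization exp(-||x - y||^2 / (2 sigma^2)) is one common convention and is an assumption here; the SVM code used for this project may scale sigma differently. The function names and the support-vector representation are illustrative only.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def linear_kernel(x: Vector, y: Vector) -> float:
    """Plain dot product."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x: Vector, y: Vector, sigma: float) -> float:
    """Gaussian RBF kernel, assuming the exp(-d^2 / (2 * sigma^2)) convention."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def decision_value(x: Vector, support_vectors: Sequence[Vector],
                   coeffs: Sequence[float], bias: float,
                   kernel: Callable[[Vector, Vector], float]) -> float:
    """Kernel SVM score: a weighted sum of kernel responses plus a bias.
    Each coefficient is assumed to already include the label sign."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, coeffs)) + bias
```

Under this convention, a larger sigma makes the kernel fall off more slowly with distance, so with a sigma of 250 two feature vectors need to be far apart before their similarity drops noticeably.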

To try to speed up the algorithm, I implemented a wrapper that takes an image and turns it directly into a set of SIFT features.

I found that the detector could become quite accurate at returning only true faces (high precision), but it was difficult to get it to find more of the faces (higher recall). The slow speed of my implementation was a distinct barrier to parameter tuning.

The drawback of the algorithm is that parameter tuning is difficult yet crucial to its success. With ill-tuned parameters, the performance can be prohibitively poor.

Page made by Betsy Hilliard, 11/13/2011