Project 4: Face recognition

Hari Narayanan

Algorithm

This project roughly follows the method described in Dalal and Triggs (2005). We use histogram-of-oriented-gradients (HoG) features and a support vector machine (SVM) to train a classifier that decides whether an image crop contains a human face.

First, we compute crops of the training images using a sliding-window model with a 36x36-pixel window. From each training image that contains no faces (a guaranteed negative), we sample a random set of crops to serve as baseline negative examples; a set of ground-truth face crops serves as the positive examples.
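A minimal sketch of the negative sampling, assuming grayscale conversion and images at least 36x36 pixels; image_files and the per-image count are illustrative names, not the project's actual code:

    % Sketch: sample random 36x36 crops from face-free training images.
    % image_files (a cell array of paths to images with no faces) is assumed.
    win = 36;                            % sliding-window side length in pixels
    crops_per_image = 10;                % illustrative sampling density
    neg_crops = {};
    for i = 1:numel(image_files)
        img = imread(image_files{i});
        if size(img, 3) == 3             % convert color images to grayscale
            img = rgb2gray(img);
        end
        img = im2single(img);
        [h, w] = size(img);
        for k = 1:crops_per_image
            r = randi(h - win + 1);      % random in-bounds top-left corner
            c = randi(w - win + 1);
            neg_crops{end+1} = img(r:r+win-1, c:c+win-1); %#ok<AGROW>
        end
    end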

Each of these crops is fed through a HoG feature extractor available on MATLAB Central. The resulting features are used as training data for a linear SVM, which is stored as the initial classifier. We then run this classifier on each test image to obtain an initial accuracy result.
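A sketch of this step. Here extractHOGFeatures (Computer Vision Toolbox) and fitcsvm (Statistics and Machine Learning Toolbox) stand in for the MATLAB Central HoG detector and the SVM solver actually used; pos_crops and neg_crops are assumed to be row cell arrays of 36x36 crops:

    % Sketch: extract HoG features from each crop and train a linear SVM.
    all_crops = [pos_crops, neg_crops];
    labels = [ones(numel(pos_crops), 1); -ones(numel(neg_crops), 1)];
    dim = numel(extractHOGFeatures(all_crops{1}));
    feats = zeros(numel(all_crops), dim);
    for i = 1:numel(all_crops)
        feats(i, :) = extractHOGFeatures(all_crops{i});
    end
    svm = fitcsvm(feats, labels);        % linear kernel is fitcsvm's default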

To improve the result, we make another pass through the pipeline. This time, instead of using random negative crops, we run the initial classifier on the face-free training images. Any crop it detects as a face is by construction a false positive, so we add those crops to the negative training set (hard negative mining). This corrects the cases the initial classifier misclassifies, and it greatly improved classification performance over the basic initial classifier.
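Roughly, the mining pass looks like the following sketch. detect_faces is the multi-scale detector sketched below under Parameter tuning; every window it flags in a face-free image is a false positive:

    % Sketch: mine hard negatives by running the current classifier on the
    % face-free training images.
    hard_negs = {};
    for i = 1:numel(image_files)
        img = imread(image_files{i});
        if size(img, 3) == 3, img = rgb2gray(img); end
        [~, fp_crops] = detect_faces(im2single(img), svm, 1);
        hard_negs = [hard_negs, fp_crops]; %#ok<AGROW>
    end
    % The SVM is then retrained with hard_negs added to the negative set.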

Parameter tuning

Using all false positives from the face-free training images is not feasible, as the volume of data exhausts memory. We therefore subsample the false positives down to a fixed budget of crops (here, 1000). This alone still lets a few images contribute a large share of the false positives, so we also cap the number of false-positive crops taken from any single image, spreading the hard negatives more evenly across the training data. Lowering this cap from 20 to 10 seemed to improve the final accuracy significantly.
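A sketch of the capping and subsampling; fp_crops_per_image (a cell array holding each image's false-positive crops) is an assumed regrouping of the mining loop's output:

    % Sketch: cap false positives per image, then subsample to a global budget.
    max_per_image = 10;      % per-image cap (lowering 20 -> 10 helped)
    budget = 1000;           % total number of hard negatives kept
    capped = {};
    for i = 1:numel(fp_crops_per_image)
        fps = fp_crops_per_image{i};
        idx = randperm(numel(fps), min(numel(fps), max_per_image));
        capped = [capped, fps(idx)]; %#ok<AGROW>
    end
    hard_negs = capped(randperm(numel(capped), min(numel(capped), budget)));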

The detector's initial scale factor also greatly impacted accuracy, at the cost of speed. The default value of 3 gave decent results quickly, while lowering it to 1 gave much better results but took more than five times as long. (Experimenting with many values of the SVM regularization parameter lambda, we settled on lambda = 8.)
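We read the initial scale factor (this is an assumption on our part) as the factor by which the test image is downsampled before the first sliding-window pass, with the detector then rescanning at progressively smaller scales; a start scale of 1 scans at full resolution. A minimal sketch of such a detector, reusing the SVM and features from above:

    function [bboxes, crops] = detect_faces(img, svm, start_scale)
    % Sketch of a multi-scale sliding-window detector. Returns detections
    % as rows of [x y w h] in original-image coordinates, plus the flagged
    % 36x36 windows themselves (used as hard negatives during mining).
    win = 36;
    scale = 1 / start_scale;             % start_scale = 1 means full resolution
    bboxes = [];
    crops = {};
    while min(size(img)) * scale >= win
        scaled = imresize(img, scale);
        for r = 1:6:(size(scaled, 1) - win + 1)      % stride of 6 (illustrative)
            for c = 1:6:(size(scaled, 2) - win + 1)
                window = scaled(r:r+win-1, c:c+win-1);
                if predict(svm, extractHOGFeatures(window)) == 1
                    bboxes(end+1, :) = [c r win win] / scale; %#ok<AGROW>
                    crops{end+1} = window;                    %#ok<AGROW>
                end
            end
        end
        scale = scale * 0.9;             % shrink and rescan to catch larger faces
    end
    end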

In addition to the linear SVM proposed in the original paper, we can use a nonlinear kernel machine, similar to the approach described in Viola and Jones (2001). Compared to the linear SVM, it gave much better results, especially when lambda was set to 1.
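A sketch of the swap, reusing the features from above; the RBF kernel and the fitcsvm call are our stand-ins, since the writeup does not name the kernel actually used:

    % Sketch: train a nonlinear SVM on the same HoG features. The RBF
    % kernel is an assumption; the writeup does not specify the kernel.
    svm_rbf = fitcsvm(feats, labels, 'KernelFunction', 'rbf', ...
                      'KernelScale', 'auto');

The tradeoff is the usual one: the kernel machine must evaluate kernel values against its support vectors at detection time, so it is noticeably slower than the linear classifier.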

Results

The main measure of accuracy we used is average precision (AP).

With the base model and no hard negative sampling (initial classifier only):

With the base model and hard negative mining:

With a nonlinear SVM: