Face detection with a sliding window

Feature representation

I represented each image patch with local histograms of oriented gradients (HoG): the patch is divided into cells, a gradient-orientation histogram is computed for each cell, and the concatenated histograms are normalized. Dalal and Triggs (2005) used more local normalization schemes, but global normalization is faster and did not seem to make much of a difference.
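A minimal sketch of this descriptor, assuming unsigned gradient orientations, 6x6-pixel cells, 9 orientation bins, and a single global L2 normalization (the report does not state its exact cell size or bin count, so these are placeholder values):

```python
import numpy as np

def hog_descriptor(patch, cell=6, bins=9):
    """Globally normalized HoG sketch: per-cell gradient-orientation histograms,
    concatenated and L2-normalized once over the whole descriptor."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180  # unsigned orientation in [0, 180)
    h, w = patch.shape
    hists = []
    for i in range(0, h - cell + 1, cell):
        for j in range(0, w - cell + 1, cell):
            m = mag[i:i + cell, j:j + cell].ravel()
            a = ang[i:i + cell, j:j + cell].ravel()
            cell_hist, _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
            hists.append(cell_hist)
    v = np.concatenate(hists)
    return v / (np.linalg.norm(v) + 1e-6)  # global (not block-local) normalization

# A 36x36 patch yields 6x6 cells * 9 bins = a 324-dimensional descriptor.
desc = hog_descriptor(np.random.rand(36, 36))
```

Replacing the single global normalization with the overlapping block-local normalization of Dalal and Triggs would only change the last line's scope.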

Using linear vs. nonlinear classifiers

Linear classifiers classified patches faster, but they are poor at extracting meaning from impoverished or naive data representations, such as raw pixel values. Nonlinear classifiers were more computationally demanding. Training each was fast and successful: training accuracy fell between .93 and .95, and could be brought above .99 by reducing lambda (i.e., weakening the regularization).
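The linear case can be sketched as a hinge-loss SVM trained by sub-gradient descent; the report does not state its solver, so this is an assumed setup. Here lambda plays the role of the regularization weight, and the bias (b0) shift is applied at detection time to trade precision for recall:

```python
import numpy as np

def train_linear_svm(X, y, lam=1.0, epochs=50, lr=1e-3):
    """Sub-gradient descent on (lam/2)||w||^2 + mean(hinge loss).
    lam is the regularization weight the report calls lambda."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        mask = margins < 1                       # examples inside the margin
        grad_w = lam * w - (y[mask, None] * X[mask]).sum(axis=0) / n
        grad_b = -y[mask].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b, shift=0.0):
    # shifting the bias (b0) moves the decision boundary at detection time
    return np.sign(X @ w + b + shift)
```

A nonlinear SVM would replace the inner products with a kernel evaluation, at a correspondingly higher cost per patch.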

A linear SVM trained on 1000 positive and 1000 random negative examples, with lambda = 100 and a bias (b0) shift of 0.12:



A nonlinear SVM trained on 1000 positive and 1000 random negative examples, with lambda = 100 and a bias (b0) shift of 0.09:

Mining hard negative examples

After a classifier was initially trained, it could be used to mine non-face patches that it found difficult to classify correctly. In practice, mining hard negatives had little impact on the AP: fundamental limitations in the data representation prevented the classifiers from gaining anything from the new data.
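The mining step can be sketched as follows; `clf_score` (a function returning decision values) and the score threshold are placeholders, since the report does not give them:

```python
import numpy as np

def mine_hard_negatives(clf_score, negatives, threshold=0.0, k=200):
    """Score non-face patches with the current model and keep the ones it
    wrongly calls faces (false positives = hard negatives), hardest first."""
    scores = clf_score(negatives)
    mask = scores > threshold
    hard, hard_scores = negatives[mask], scores[mask]
    order = np.argsort(hard_scores)[::-1]        # highest (most confident) first
    return hard[order[:k]]

# Typical loop (5 rounds in the report): train, mine hard negatives from
# face-free scene images, add them to the negative set, and retrain.
```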

A linear SVM trained on 1000 positive and 1000 random negative examples, with lambda = 100 and a bias (b0) shift of 0.12, then retrained on mined hard negatives for 5 iterations: