We implement a face detector in this project. We do this via machine learning, using an SVM classifier. To begin, we assemble a set of faces (this project uses the Caltech Web Faces dataset). Each image in this database is a close-up crop of a single face. We also assemble a set of non-face images to use as negative training data.
Rather than training on raw pixel data, we convert each of our training images to a SIFT descriptor, which is robust to many irrelevant transformations (such as changes in lighting). We then use an SVM to construct a boundary between faces and non-faces in SIFT descriptor space. The result of this work is a (rather poor) classifier for distinguishing between faces and non-faces. I get an average precision of around 28% for this classifier:
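To make the training step concrete, here is a minimal sketch of fitting a linear SVM on descriptor vectors. The data, dimensions, and hyperparameters are toy stand-ins (the real project trains on SIFT descriptors); the optimizer shown is the Pegasos subgradient method, one common way to train a linear SVM, not necessarily the solver used in the project:

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Train a linear SVM with the Pegasos subgradient method.

    X: list of feature vectors (stand-ins for SIFT descriptors).
    y: labels in {-1, +1} (non-face / face).
    Returns a weight vector w; the decision score is dot(w, x)."""
    rng = random.Random(seed)
    d = len(X[0])
    w = [0.0] * d
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # regularization shrinks w every step...
            w = [(1 - eta * lam) * wj for wj in w]
            # ...and points inside the margin push the boundary
            if margin < 1:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

# toy "descriptors": faces cluster near (1, 1), non-faces near (-1, -1)
faces = [[1.0, 1.2], [0.9, 1.0], [1.1, 0.8]]
nonfaces = [[-1.0, -0.9], [-1.2, -1.1], [-0.8, -1.0]]
X = faces + nonfaces
y = [1, 1, 1, -1, -1, -1]

w = train_linear_svm(X, y)
score = lambda x: sum(wj * xj for wj, xj in zip(w, x))
```

A positive score classifies a descriptor as a face, a negative score as a non-face; on real data the descriptors are much higher-dimensional, but the decision rule is the same.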
To improve our results, we use an iterative technique called "mining hard negatives." Using our newly trained classifier, we revisit our non-face dataset. Running the classifier on this dataset will inevitably produce some erroneous face detections. Since we know these detections are false positives, we add them to our training set as non-faces and retrain the SVM. With this additional hard-negative data (data that looks a bit like a face), the SVM can construct a slightly better face/non-face boundary:
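The mining loop itself can be sketched as follows. Both `train` (a deliberately simple nearest-mean scorer standing in for the SVM) and the negative pool are hypothetical placeholders; only the mine-then-retrain structure reflects the technique described above:

```python
def train(X, y):
    """Stand-in trainer: a nearest-mean scorer, NOT the project's SVM.
    Returns score(x), positive when x looks more like a face."""
    pos = [x for x, lbl in zip(X, y) if lbl == 1]
    neg = [x for x, lbl in zip(X, y) if lbl == -1]
    mean = lambda pts: [sum(c) / len(pts) for c in zip(*pts)]
    mp, mn = mean(pos), mean(neg)
    def score(x):
        dp = sum((a - b) ** 2 for a, b in zip(x, mp))
        dn = sum((a - b) ** 2 for a, b in zip(x, mn))
        return dn - dp
    return score

def mine_hard_negatives(score, negative_pool, thresh=0.0):
    # keep the known non-faces that the current model wrongly calls faces
    return [x for x in negative_pool if score(x) > thresh]

# toy descriptors: two faces, two easy non-faces
faces = [[1.0, 1.0], [0.9, 1.1]]
easy_neg = [[-1.0, -1.0], [-1.1, -0.9]]
X, y = faces + easy_neg, [1, 1, -1, -1]
score = train(X, y)

# revisit the non-face pool; some patches will fool the first model
pool = [[0.6, 0.5], [-0.9, -1.0], [0.4, 0.7], [-1.2, -0.8]]
hard = mine_hard_negatives(score, pool)

# retrain with the mined false positives added as non-faces
X2 = X + hard
y2 = y + [-1] * len(hard)
score2 = train(X2, y2)
```

In the real pipeline this loop can be repeated: each round harvests the current model's false positives and folds them back in as negatives.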
Using a non-linear SVM as a classifier improves results a bit more (these results were generated without mining hard negatives; combining the two would likely have improved results further):
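The non-linear variant works by swapping the linear SVM's dot product for a kernel function, which lets the decision boundary bend in descriptor space. The source does not say which kernel was used; assuming an RBF kernel as a common choice, a minimal sketch of the quantity the solver optimizes over:

```python
import math

def rbf_kernel(a, b, gamma=0.5):
    """RBF kernel: exp(-gamma * ||a - b||^2). Nearby descriptors score
    close to 1.0; distant descriptors score close to 0.0."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq)

def gram_matrix(X, gamma=0.5):
    # the kernel (Gram) matrix a non-linear SVM solver trains against
    return [[rbf_kernel(a, b, gamma) for b in X] for a in X]

# toy descriptors, e.g. a face, a borderline patch, and a non-face
X = [[1.0, 1.0], [0.0, 0.0], [-1.0, -1.0]]
K = gram_matrix(X)
```

Each entry of `K` measures similarity between two training descriptors, so the learned boundary is a weighted combination of kernel comparisons against the support vectors rather than a single hyperplane.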