Overview
The sliding window model independently classifies each image patch as object or non-object. In this project I implemented a sliding window face detector. To do so I:
- Improved the feature representation by using the SIFT descriptor rather than raw image patches (the baseline).
- Implemented a strategy for iteratively re-training a classifier with mined hard negatives, and compared this to the accuracy achieved when randomly sampling non-face images.
- Used both linear and non-linear SVMs and compared the results of each.
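To make the sliding window idea concrete, here is a minimal Python sketch, not the project's actual code. It assumes a grayscale image stored as a list of rows, a square window, and that step_size means the stride in pixels between neighboring windows. The names sliding_windows, detect, and mean_brightness are hypothetical, and mean_brightness is only a stand-in for a real patch classifier such as an SVM on SIFT features.

```python
from typing import Callable, Iterator, List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values


def sliding_windows(image: Image, window: int,
                    step_size: int) -> Iterator[Tuple[int, int, Image]]:
    """Yield (row, col, patch) for each window-by-window patch.

    step_size is assumed to be the stride in pixels between windows.
    """
    height, width = len(image), len(image[0])
    for row in range(0, height - window + 1, step_size):
        for col in range(0, width - window + 1, step_size):
            patch = [r[col:col + window] for r in image[row:row + window]]
            yield row, col, patch


def detect(image: Image, window: int, step_size: int,
           score_fn: Callable[[Image], float],
           threshold: float = 0.0) -> List[Tuple[int, int, float]]:
    """Score every patch independently and keep those above threshold."""
    detections = []
    for row, col, patch in sliding_windows(image, window, step_size):
        score = score_fn(patch)
        if score > threshold:
            detections.append((row, col, score))
    return detections


def mean_brightness(patch: Image) -> float:
    """Stand-in scorer for illustration only (not a face classifier)."""
    return sum(map(sum, patch)) / (len(patch) * len(patch[0]))


if __name__ == "__main__":
    # 8x8 dark image with a bright 4x4 block in the lower-right corner.
    img = [[0.0] * 8 for _ in range(8)]
    for r in range(4, 8):
        for c in range(4, 8):
            img[r][c] = 1.0
    print(detect(img, window=4, step_size=2, score_fn=mean_brightness,
                 threshold=0.9))
```

A smaller stride evaluates more windows: on an 8x8 image with a 4x4 window, a stride of 4 visits 4 positions while a stride of 2 visits 9. This is one plausible reason a smaller step can raise recall, at the cost of more classifier evaluations.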
Results
Accuracy varied considerably with the method used and the parameters selected.
Representation Improvement
I improved upon the baseline's accuracy of 0.045 by implementing a SIFT descriptor (and using a linear SVM). This gave an accuracy of 0.23. The precision-recall curve looked like this:

I improved on this further by tuning two parameters. Reducing step_size from 4 to 2 and start_scale from 3 to 1 gave much better results:

Mining Hard Negatives
Augmenting the negative training set with mined hard negatives improved upon this slightly:

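The re-training loop behind hard negative mining can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the code used for the results here. The trainer is passed in as train_fn so the loop stays classifier-agnostic. The train_centroid function is a deliberately simple stand-in (a difference-of-means linear scorer, not an SVM) so that the example runs without external libraries. All names, the number of rounds, and the zero threshold are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Feature = Sequence[float]
Scorer = Callable[[Feature], float]
Trainer = Callable[[List[Feature], List[Feature]], Scorer]


def train_centroid(positives: List[Feature],
                   negatives: List[Feature]) -> Scorer:
    """Toy stand-in trainer: linear scorer from the two class means.

    A real detector would train an SVM here instead.
    """
    dim = len(positives[0])
    mu_p = [sum(x[d] for x in positives) / len(positives) for d in range(dim)]
    mu_n = [sum(x[d] for x in negatives) / len(negatives) for d in range(dim)]
    w = [p - n for p, n in zip(mu_p, mu_n)]
    mid = [(p + n) / 2 for p, n in zip(mu_p, mu_n)]
    return lambda x: sum(wi * (xi - mi) for wi, xi, mi in zip(w, x, mid))


def mine_hard_negatives(positives: List[Feature],
                        negatives: List[Feature],
                        negative_pool: List[Feature],
                        train_fn: Trainer,
                        rounds: int = 3,
                        threshold: float = 0.0
                        ) -> Tuple[Scorer, List[Feature]]:
    """Iteratively re-train, adding false positives from a non-face pool."""
    negatives = list(negatives)  # copy so the caller's list is untouched
    used = set()                 # pool indices already added as negatives
    scorer = train_fn(positives, negatives)
    for _ in range(rounds):
        # A hard negative is a known non-face the current model accepts.
        hard = [i for i, x in enumerate(negative_pool)
                if i not in used and scorer(x) > threshold]
        if not hard:
            break  # no remaining false positives in the pool
        used.update(hard)
        negatives.extend(negative_pool[i] for i in hard)
        scorer = train_fn(positives, negatives)
    return scorer, negatives


if __name__ == "__main__":
    pos = [[2.0], [3.0]]
    neg = [[-5.0]]                   # initial randomly sampled negatives
    pool = [[-4.0], [0.0], [1.0]]    # features from known non-face images
    scorer, final_negs = mine_hard_negatives(pos, neg, pool, train_centroid)
    print(len(final_negs), scorer([0.0]) < 0)
```

In the toy run, the first model wrongly accepts the pool points 0.0 and 1.0. Adding them as negatives and re-training shifts the decision boundary toward the positives, so 0.0 is rejected afterwards.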
Non-linear SVM
First I tested a non-linear SVM without mining hard negatives, using only randomly sampled negatives. This method outperformed the previous ones with an average precision of 0.71:

The most successful results, however, were achieved using a non-linear SVM and mining for hard negatives. Average precision was 0.74.
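For intuition about why a non-linear SVM can outperform a linear one, the sketch below contrasts the two decision functions in plain Python. It assumes an RBF kernel, which may not be the kernel used for the results above. The support vectors and coefficients are hand-picked for illustration rather than obtained by training, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def linear_decision(w: Vector, b: float, x: Vector) -> float:
    """Linear SVM decision value: w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def rbf_kernel(a: Vector, b: Vector, gamma: float) -> float:
    """RBF kernel: exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def kernel_decision(support: List[Tuple[float, int, Vector]],
                    b: float, gamma: float, x: Vector) -> float:
    """Kernel SVM decision value: sum_i alpha_i * y_i * K(sv_i, x) + b."""
    return sum(alpha * y * rbf_kernel(sv, x, gamma)
               for alpha, y, sv in support) + b


# XOR-style data: no single line separates the two classes.
xor_points = [((0.0, 0.0), -1), ((1.0, 1.0), -1),
              ((0.0, 1.0), 1), ((1.0, 0.0), 1)]

# Hand-picked (not trained): every point is a support vector with alpha = 1.
support = [(1.0, label, point) for point, label in xor_points]

if __name__ == "__main__":
    for point, label in xor_points:
        score = kernel_decision(support, b=0.0, gamma=1.0, x=point)
        print(point, label, score > 0)
```

With these coefficients the kernel decision value has the correct sign on all four points, whereas any linear decision function must misclassify at least one of them. Face versus non-face features are not XOR data, so this only illustrates the kind of boundary a kernel makes available.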
