Introduction
Summary
The methods implemented for this project are:
- (1) Utilize a strong feature such as SIFT or HOG to dramatically improve detection accuracy over the baseline of raw image patches.
- (2) Implement a strategy for iteratively re-training a classifier with mined hard negatives and compare this to a baseline with random negatives.
- (3) Utilize and compare linear and non-linear classifiers. Linear classifiers can be trained with large amounts of data, but they may not be expressive enough to actually benefit from that training data. Non-linear classifiers can represent complex decision boundaries, but are limited in the amount of training data they can use at once.
I represent each patch with a single SIFT feature. To accomplish this, I call vl_dsift with a bin size and a step size of 10. I also implemented get_image_feats, which uses vl_dsift to get crops and descriptors in a single step, avoiding the need to call each of these steps individually.
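As a rough sketch, the feature extraction might look like the following, assuming crops is a cell array of grayscale patches (the container type and function signature are assumptions; the bin and step sizes are the ones described above):

```matlab
% Minimal sketch of crops2features: one SIFT descriptor per patch.
function feats = crops2features(crops)
    num_crops = numel(crops);
    feats = cell(num_crops, 1);
    for i = 1:num_crops
        patch = single(crops{i});    % vl_dsift expects single precision
        % Bin size and step of 10, as in the report; when the patch is
        % roughly the size of the descriptor support, this yields a
        % single SIFT descriptor per patch.
        [~, descriptors] = vl_dsift(patch, 'size', 10, 'step', 10);
        feats{i} = single(descriptors(:)');   % flatten to one row vector
    end
    feats = cell2mat(feats);                  % N x 128 feature matrix
end
```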
Initially, the stencil code gave me an average AP of 0.05. After implementing crops2features and using a linear SVM, the AP rose to 0.273, as shown in the image below:

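For reference, a minimal sketch of the linear SVM stage using VLFeat's vl_svmtrain; the variable names pos_feats and neg_feats and the lambda value are assumptions, not the stencil's actual code:

```matlab
% Stack positive (face) and negative (non-face) features and train a
% linear SVM. vl_svmtrain expects data as D x N columns with +1/-1 labels.
X = [pos_feats; neg_feats]';
Y = [ones(size(pos_feats, 1), 1); ...
     -ones(size(neg_feats, 1), 1)];
lambda = 0.0001;                              % assumed regularizer
[w, b] = vl_svmtrain(single(X), Y, lambda);

% Score a new patch feature f (1 x 128): positive scores indicate a face.
confidence = f * w + b;
```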
Initially, I find random negative patches and train the SVM with these non-face patches. To train the SVM iteratively, I use hard negatives: I run the trained SVM on negative (face-free) images and identify the patches that result in false positives -- these are the hard negatives. Finally, I retrain the SVM using these hard negatives as the negative example batch.
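A hedged sketch of this mining loop is below; run_detector is a hypothetical helper that returns sliding-window patch features and their SVM confidences, not necessarily the stencil's actual interface:

```matlab
% Mine hard negatives: any detection on a non-face scene is a false
% positive, so its feature becomes a hard negative for retraining.
hard_negs = [];
for i = 1:numel(non_face_images)
    im = non_face_images{i};
    [patch_feats, confidences] = run_detector(im, w, b);
    hard_negs = [hard_negs; patch_feats(confidences > 0, :)];
end

% Retrain with the mined hard negatives as the negative batch.
[w, b] = vl_svmtrain(single([pos_feats; hard_negs]'), ...
                     [ones(size(pos_feats, 1), 1); ...
                      -ones(size(hard_negs, 1), 1)], lambda);
```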
Implementing hard negatives gave the AP shown below, which can be compared with the AP graph above that was based solely on random negatives. Adding hard negatives does not appear to have a significant impact on performance:

As a result, the AP increased only modestly, to 0.3.

All of the results above use a linear SVM. Here, I used a nonlinear SVM with a radial basis function (RBF) kernel, together with both the crops2features and get_hard_negatives methods. I also lowered start_scale from 3 to 1. The precision-recall graph is shown below, with an AP of 0.562:

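One way to realize the RBF stage is MATLAB's fitcsvm; this is a sketch under the assumption that the kernel SVM is trained on the same SIFT features, and the 'auto' kernel scale is an assumed hyperparameter rather than the stencil's setting:

```matlab
% Train an RBF-kernel SVM on the same features (fitcsvm expects N x D).
X = [pos_feats; hard_negs];
Y = [ones(size(pos_feats, 1), 1); ...
     -ones(size(hard_negs, 1), 1)];
model = fitcsvm(X, Y, 'KernelFunction', 'rbf', 'KernelScale', 'auto');

% At test time, use the signed score for the +1 class as the detection
% confidence (score columns follow the order of model.ClassNames).
[~, scores] = predict(model, test_feats);
confidences = scores(:, 2);
```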