The aim of this assignment is to implement a face detection algorithm using a sliding-window model based on Viola-Jones (2001) and Dalal-Triggs (2005). The implementation uses SIFT features to classify image patches as face or non-face. Both linear and non-linear classifiers are trained and evaluated. Each classifier is first trained with randomly sampled negative data, and its performance is then compared against a version retrained with mined hard negative data.
The two major modifications made to the project were the get_img_feats and get_hard_negatives functions.
The get_img_feats function was implemented to reduce the run-time of the algorithm: the given code extracted patches from each image and then computed the SIFT features of the patches one by one, whereas get_img_feats computes dense features over the whole image so that each feature corresponds to the patch that would otherwise have been extracted.
The get_img_crops function that get_img_feats mimics only returns crops that fit entirely inside the image, while the vl_dsift function used for feature extraction extrapolates parts of the image when the requested region is only partially covered. The bounds option of vl_dsift was used to suppress this extrapolation. The bin size for vl_dsift was also chosen to be one-fourth of the patch size used by the crops, since a SIFT descriptor covers a region of 4-by-4 bins. Finally, the center locations returned by vl_dsift correspond to the starting point of the bins rather than their centers, so an offset was added to the results to obtain an exact match with the output of get_img_crops.
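The sketch below shows one way get_img_feats could be put together based on the description above; the field names of feature_params, the grayscale conversion, and the exact offset value are assumptions rather than the actual implementation.

% Minimal sketch of get_img_feats, assuming feature_params carries the
% patch size and sliding-window step (field names are placeholders).
function [feats, centers] = get_img_feats(img, feature_params)
    patch_size = feature_params.template_size;      % assumed field name
    bin_size   = patch_size / 4;                     % SIFT covers a 4x4 grid of bins
    step       = feature_params.step;                % assumed field name

    if size(img, 3) == 3
        img = rgb2gray(img);
    end
    img = single(img);                               % vl_dsift expects single precision

    [h, w] = size(img);
    % Restrict descriptors to the image so no extrapolated patches are produced
    [frames, descriptors] = vl_dsift(img, ...
        'Size', bin_size, 'Step', step, 'Bounds', [1, 1, w, h]);

    % Shift the reported locations so they line up with the patch centers
    % that get_img_crops would produce (the offset value is an assumption).
    centers = frames(1:2, :) + bin_size / 2;

    feats = single(descriptors');                    % one 128-D row per patch
end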
The other main function implemented, get_hard_negatives, is a mixture of the get_rand_negatives and run_detector functions. It runs the current detector over the non-face images; every "face" it finds is a false positive, and these detections are converted into crops and then into features that are used as negative training data. Retraining on these examples is expected to teach the classifier to avoid the mistakes that produced the false positives.
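A rough sketch of this mining step is given below, assuming a linear classifier with weight vector w and bias b and JPEG scene images; the directory handling and variable names are placeholders rather than the actual code.

% Sketch of get_hard_negatives: collect features of false-positive
% detections on non-face scenes to use as negatives in the next round.
function hard_neg_feats = get_hard_negatives(non_face_dir, w, b, feature_params)
    img_files = dir(fullfile(non_face_dir, '*.jpg'));
    hard_neg_feats = [];

    for i = 1:length(img_files)
        img   = imread(fullfile(non_face_dir, img_files(i).name));
        feats = get_img_feats(img, feature_params);   % one row per candidate patch

        % Any patch the current model labels as a face is a false positive,
        % i.e. a hard negative for retraining.
        confidences = feats * w + b;
        hard_neg_feats = [hard_neg_feats; feats(confidences > 0, :)]; %#ok<AGROW>
    end
end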
The implementation that uses hard negatives runs for four iterations: it trains with random negatives in the first iteration and with hard negatives in the following iterations. The number of iterations was selected experimentally; increasing it further only made performance worse due to overfitting. That said, the performance of the algorithm varies between runs because the negative features are sampled randomly from the detected false positives.
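The retraining schedule described above could be organized roughly as follows; the regularization value, the sample size num_neg, and the surrounding variable names are illustrative assumptions rather than the project's exact code.

% Illustrative retraining loop: random negatives in iteration 1,
% mined hard negatives in iterations 2-4.
num_iters = 4;
lambda    = 0.0001;                                           % assumed SVM regularizer
neg_feats = get_rand_negatives(non_face_dir, feature_params); % first-round negatives

for iter = 1:num_iters
    X = single([pos_feats; neg_feats]');                      % vl_svmtrain expects D x N
    Y = [ones(size(pos_feats, 1), 1); -ones(size(neg_feats, 1), 1)];
    [w, b] = vl_svmtrain(X, Y, lambda);

    if iter < num_iters
        % Mine false positives with the current model and randomly sample
        % from them to form the negative set for the next iteration.
        hard_negs = get_hard_negatives(non_face_dir, w, b, feature_params);
        n_keep    = min(num_neg, size(hard_negs, 1));
        neg_feats = hard_negs(randperm(size(hard_negs, 1), n_keep), :);
    end
end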
Accuracy Percentage Graph using the linear classifier trained with only randomly selected negatives
Accuracy Percentage Graph using the linear classifier retrained with hard negatives
Accuracy Percentage Graph using the non-linear classifier trained with only randomly selected negatives
Accuracy Percentage Graph using the non-linear classifier retrained with hard negatives
The accuracy of the models always improves after mining hard negatives. The non-linear classifier improves accuracy significantly when only random negatives are used, but the linear classifier appears to work better once hard negatives are mined with the classifier.
When the trained classifier (the non-linear classifier with hard negatives) was tested on the class photo, the following detections were obtained. As can be seen, there are many false positives, but all of the faces except three were detected.