CS 1430 Project 4: Face Detection with a Sliding Window

Kumud Nepal
November 13, 2011

Overview

The sliding window method for face detection classifies image patches (or features computed from them) as faces or non-faces. The pipeline generally begins by loading cropped images of faces as positive training examples, along with negative examples drawn from non-face scenes. Negatives can be gathered by random extraction or by hard negative mining, where every image fed to the classifier contains no faces, so only the patches the current classifier wrongly labels as faces are kept. Next, features such as SIFT or HOG are extracted from the crops to form the training set; these carry more information than raw pixel crops. For this project, dense SIFT features were extracted using the off-the-shelf vl_dsift function. The next step is training linear and non-linear classifiers such as an SVM. The project uses a primal SVM approach and implements both linear and non-linear versions for classification. A classifier cascade is made, and particular attention is given to tuning parameters such as lambda (linear and non-linear) and the kernel type and kernel variance (non-linear only). Hard negatives are mined and the classifier is retrained after each mining iteration; the stopping condition for this iterative process can depend on the number of false positives gathered or simply on the number of iterations. Then, a multi-scale detector is run over the images, and non-maximum suppression is performed if necessary to remove redundant detections. Finally, all detections are evaluated on the basis of Accuracy Percentage, which is in turn calculated from the number of detections, false positives, false negatives, and so on.
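The non-maximum suppression step mentioned above can be sketched as follows. This is a minimal illustrative sketch in Python, not the project's actual (MATLAB) code; the box format, overlap threshold, and function names are assumptions made for the example.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter) if inter else 0.0

def non_max_suppression(boxes, scores, overlap_thresh=0.3):
    """Greedily keep the highest-scoring box, discarding any box that
    overlaps an already-kept box by more than overlap_thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= overlap_thresh for j in keep):
            keep.append(i)
    return keep
```

The greedy highest-score-first strategy is the standard way to collapse the overlapping detections a multi-scale sliding window produces around each face.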





Results

Different classifiers, training parameters, and stopping conditions were tried to produce the results. The stencil code initially achieved a paltry Accuracy Percentage of 0.085 in detecting faces from a set of 130 images. The results improved dramatically when SIFT features were used for training instead of raw image crops. Linear and non-linear classifiers gave slightly different results, and tuning the parameters of each classifier changed the results somewhat as well. Results are given below as figures.
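The precision-recall figures below summarize detection quality from the same counts that feed the Accuracy Percentage. As a sketch of how those two axes are computed from detection counts (variable names here are illustrative, not the stencil's):

```python
def precision_recall(num_true_pos, num_false_pos, num_false_neg):
    """Precision: fraction of detections that are real faces.
    Recall: fraction of real faces that were detected."""
    precision = num_true_pos / float(num_true_pos + num_false_pos)
    recall = num_true_pos / float(num_true_pos + num_false_neg)
    return precision, recall
```

Sweeping the classifier's confidence threshold and recomputing these counts at each setting traces out the precision-recall curves shown in the figures.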


Fig. 0: Result of face detection by the implemented algorithm. All faces detected, with a few false positives




Fig. 1: Two faces successfully detected in the same picture; the yellow box denotes ground truth and the green box denotes a detection



Fig. 2: A false positive detected in the same picture




Linear SVM with lambda = 200
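For reference, a primal linear SVM of the kind described above can be trained by subgradient descent on the regularized hinge loss. This minimal NumPy sketch assumes labels in {-1, +1}; it is an illustration of the objective, not the solver actually used in the project, and the learning rate and epoch count are arbitrary choices for the example.

```python
import numpy as np

def train_linear_svm(X, y, lam=200.0, lr=1e-4, epochs=200):
    """Minimize lam * ||w||^2 / 2 + mean hinge loss by subgradient descent.
    X: (n, d) feature matrix; y: (n,) labels in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                      # examples violating the margin
        grad_w = lam * w - (y[active] @ X[active]) / n
        grad_b = -np.sum(y[active]) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

A large lambda such as the 200 used here weights the regularizer heavily, trading training accuracy for a larger margin, which can help when the mined hard negatives are noisy.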



Fig. 3: Precision-recall graph for the linear SVM classifier, run for one iteration of random negatives and one iteration of hard negatives



Fig. 4: Iteration-wise detection evaluation



Fig. 5: Precision-recall graph for the linear SVM classifier, run for one iteration of random negatives and three iterations of hard negatives



Fig. 6: Iteration-wise detection evaluation



Fig. 7: Precision-recall graph for the linear SVM classifier, run for one iteration of random negatives and five iterations of hard negatives



Fig. 8: Iteration-wise detection evaluation




Non-linear SVM with lambda = 0.7, sigma = 1000
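The non-linear variant replaces the inner product with a Gaussian (RBF) kernel whose width is the sigma reported above. A minimal sketch of the kernel evaluation follows; the exact normalization convention used by the stencil (sigma^2 vs. 2*sigma^2 in the denominator) is an assumption here.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1000.0):
    """Gaussian kernel matrix K[i, j] = exp(-||A_i - B_j||^2 / (2 sigma^2)).
    A: (n, d) and B: (m, d) rows are feature vectors."""
    sq_dists = (np.sum(A**2, axis=1)[:, None]
                + np.sum(B**2, axis=1)[None, :]
                - 2 * A @ B.T)
    return np.exp(-sq_dists / (2 * sigma**2))
```

A sigma as large as 1000 makes the kernel vary slowly across SIFT feature space, keeping the decision boundary smooth; a much smaller sigma would fit the training crops more tightly at the risk of overfitting.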



Fig. 9: Precision-recall graph for the non-linear SVM classifier, run for one iteration of random negatives and one iteration of hard negatives



Fig. 10: Iteration-wise detection evaluation



Fig. 11: Precision-recall graph for the non-linear SVM classifier, run for one iteration of random negatives and three iterations of hard negatives



Fig. 12: Iteration-wise detection evaluation



Fig. 13: Precision-recall graph for the non-linear SVM classifier, run for one iteration of random negatives and five iterations of hard negatives



Fig. 14: Iteration-wise detection evaluation



Conclusion

The non-linear SVM classifier outperforms the linear SVM classifier by a small absolute margin, though the relative boost in accuracy is considerable. However, neither is anywhere close to what is published in the literature. I believe it is possible to get close to those numbers simply by tuning some algorithmic parameters further -- like the SVM lambda, and the step size and starting scale used in the detector -- though this comes at a cost of computing time.