Project 4 - Face Detection with a Sliding Window

By: Kaijian Gao

Algorithms

Both linear and non-linear methods are used to achieve face detection in this project. In this report I record the results of face detection with both the linear and the non-linear classifier, using one stage of random negative sampling as well as a second stage of hard negative mining. All of the classifiers start with the same steps. First, I load a set of images to use as positive training examples (images with faces). Then I use crops2features to convert the positive training data into features with vl_dsift, and start a loop of negative mining. In the first iteration of the loop, I obtain random negatives by sampling features from non-face scenes. A sketch of these two steps follows.
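
The sketch below is only an approximation of this stage: it assumes fixed-size grayscale face crops and VLFeat's vl_dsift, and the dense-SIFT parameters, the crop size, and the interfaces (crops2features_sketch, random_negatives_sketch) are placeholders rather than the exact starter code.

    % Feature extraction for the positive (face) crops.
    % Assumes fixed-size grayscale crops; dense-SIFT parameters are placeholders.
    function features = crops2features_sketch(image_paths)
        features = [];
        for i = 1:numel(image_paths)
            img = im2single(imread(image_paths{i}));
            if size(img, 3) > 1
                img = rgb2gray(img);          % vl_dsift needs a single-channel image
            end
            % Dense SIFT over the crop; 'Fast' trades a little accuracy for speed
            [~, descriptors] = vl_dsift(img, 'Step', 10, 'Size', 8, 'Fast');
            % Stack the crop's descriptors into one long feature vector
            features(i, :) = single(descriptors(:)');
        end
    end

    % Random negatives for stage 1: featurize random fixed-size patches
    % cropped from the non-face scenes (patch size and count are assumptions).
    function neg_features = random_negatives_sketch(scene_paths, per_image, crop_size)
        neg_features = [];
        for i = 1:numel(scene_paths)
            img = im2single(imread(scene_paths{i}));
            if size(img, 3) > 1
                img = rgb2gray(img);
            end
            [h, w] = size(img);
            for s = 1:per_image
                r = randi(h - crop_size + 1);   % random top-left corner
                c = randi(w - crop_size + 1);
                patch = img(r:r + crop_size - 1, c:c + crop_size - 1);
                [~, descriptors] = vl_dsift(patch, 'Step', 10, 'Size', 8, 'Fast');
                neg_features(end + 1, :) = single(descriptors(:)');
            end
        end
    end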

After training the classifier on the random negative features together with the positive features, stage 1 is complete. Let us label the linear classifier that only went through stage 1 as the stage 1 linear classifier, and the non-linear one that only went through stage 1 as the stage 1 non-linear classifier. With this initial classifier, I can then mine hard negatives. This is done by running the initial classifier over the non-face scenes, so that any detection is known to be a false positive. After using the newly obtained hard negatives to train the classifier further, stage 2 of classifier training is complete; as above, I label the results the stage 2 linear and stage 2 non-linear classifiers. A sketch of the hard-negative mining stage is given below.

Regardless of how the classifier is trained, it is used for the final face detection in the same way: run_detector uses the classifier to classify image patches as face or non-face, and non_max_supr removes overlapping detections and keeps only the one with the highest confidence.
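
Below is a sketch of the hard-negative mining stage, shown for the linear case (the non-linear case swaps in the kernel SVM training). It assumes run_detector returns parallel lists of bounding boxes, confidences and image ids, and it uses a hypothetical helper, bboxes_to_features, that crops each detected box from its scene and featurizes it the same way as crops2features; neither is meant as the exact starter-code interface.

    % Hard-negative mining sketch for the linear classifier (stage 2).
    % run_detector's signature and bboxes_to_features are assumptions, not
    % the exact starter code.
    function [w, b] = mine_hard_negatives_sketch(w, b, non_face_scn_path, ...
                                                 pos_features, neg_features, lambda)
        % Run the stage 1 detector on scenes that contain no faces at all,
        % so every detection is by definition a false positive.
        [bboxes, confidences, image_ids] = run_detector(non_face_scn_path, w, b);

        % Keep only the confident false positives as hard negatives
        hard = confidences > 0;                                   % threshold is an assumption
        hard_features = bboxes_to_features(bboxes(hard, :), ...   % hypothetical helper
                                           image_ids(hard), non_face_scn_path);

        % Retrain the linear SVM (VLFeat) on positives plus old and new negatives
        X = single([pos_features; neg_features; hard_features])'; % D x N for vl_svmtrain
        Y = [ones(size(pos_features, 1), 1); ...
             -ones(size(neg_features, 1) + size(hard_features, 1), 1)];
        [w, b] = vl_svmtrain(X, Y, lambda);
    end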

Classifier Performance

1. Below is the precision/recall graph for my stage 1 linear classifier. It has an average precision of 35.1%.

2. Below is the precision/recall graph for my stage 2 linear classifier. It has an average precision of 32.1%.

3. Below is the precision/recall graph for my stage 1 non-linear classifier. It has an average precision of 47.7%. This result changes significantly with calibration of the sigma value for the RBF kernel: the result in the graph is obtained with a sigma value of 256, but with a sigma of 128 the average precision is merely 33%, and with a value of 316 it is 44.6%. There appears to be an optimal value for sigma, but due to time constraints I chose 256, which works best among the three values I tried (see the sketch after this list).

4. Below is the precision/recall graph for my stage 2 non-linear classifier. It has an average precision of 45.1%.
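
The sigma calibration in item 3 amounts to re-training the RBF-kernel SVM with different kernel widths. Purely as an illustration (not the project's own non-linear SVM code), the sketch below uses MATLAB's fitcsvm as a stand-in kernel SVM, with KernelScale playing the role of sigma (possibly up to a constant factor), and reuses the stage 1 training features; cross-validated error stands in for the full precision/recall evaluation.

    % Illustrative sigma sweep for the RBF kernel (fitcsvm is a stand-in,
    % not the project's own non-linear SVM implementation).
    X = double([pos_features; neg_features]);          % N x D stage 1 features
    Y = [ones(size(pos_features, 1), 1); -ones(size(neg_features, 1), 1)];

    for sigma = [128, 256, 316]
        model = fitcsvm(X, Y, 'KernelFunction', 'rbf', 'KernelScale', sigma);
        cv = crossval(model, 'KFold', 5);               % rough proxy for the PR evaluation
        fprintf('sigma = %d, cross-validated error = %.3f\n', sigma, kfoldLoss(cv));
    end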

It appears that the stage 2 classifiers perform slightly worse than their stage 1 counterparts, rather than improving on them. This suggests there is probably room for improvement in the hard negative mining; perhaps increasing the amount of negative training data and making the classifier more sensitive would boost the results.

Application Results

Below is the result of the stage 1 non-linear classifier applied to the class photo.

Returning all the detections:

Returning detections with confidence > 0.4:

Returning detections with confidence > 1.0:

Returning detections with confidence > 1.4:

As we can see, most of the false positives are removed once we only keep detections with confidence above 1.0, but some true positives are already lost at a threshold of 0.4. Past a confidence threshold of 1.0, more and more true positives go undetected.
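
The thresholding behind these figures is just a filter on the detector output. The small sketch below assumes run_detector returned parallel arrays bboxes (N x 4) and confidences (N x 1) for the class photo; the variable names are assumptions.

    % Keep only detections above a confidence threshold
    % (0.4, 1.0 and 1.4 were used for the figures above).
    threshold = 1.0;
    keep = confidences > threshold;
    bboxes = bboxes(keep, :);
    confidences = confidences(keep);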