Face Detection

This webpage displays some of my results for the Brown University CS 1430: Introduction to Computer Vision project, Face Detection with a Sliding Window. In this project, I implemented face detection by training a face classifier and scanning a test set for faces using a sliding window.

The face detection pipeline that I implemented goes as follows:

Extract image descriptors from a positive and negative data set of faces
Train a basic classifier using primal svm
Run this classifier on the negative test set to detect hard negatives and retrain the classifier on this new negative data. (Iterate to detect more hard negatives if desired)
Run the final detector on the test data set
Perform non-max suppression on the detections to remove redundancy
Visualize the detections

The steps that I implemented were the mining of hard negatives as well as collecting image features.

The Algorithm

First, I generated the baseline positive and negative training data by taking random crops from data sets containing only faces and only non-faces, respectively. I collected the image features using a HoG descriptor implementation by Oswaldo Ludwig. (This code was developed for the work: O. Ludwig, D. Delgado, V. Goncalves, and U. Nunes, 'Trainable Classifier-Fusion Schemes: An Application To Pedestrian Detection,' In: 12th International IEEE Conference On Intelligent Transportation Systems, 2009, St. Louis, 2009. V. 1. P. 432-437). HoG descriptors capture information about the gradient orientations in localized areas of an image. Using this feature instead of plain image crops increased performance by upwards of 20%. Then, I trained a classifier using primal_svm. In some experiments I used a linear SVM, and in others a nonlinear SVM. The implications of these two choices are noted below.
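To make the idea of a gradient-orientation descriptor concrete, here is a minimal Python sketch. It is not the Ludwig implementation used in this project: it skips block normalization and bin interpolation, and the function name, the 36x36 patch size, the cell size, and the number of bins are all assumptions chosen for illustration.

```python
import numpy as np

def hog_descriptor(patch, cell_size=6, n_bins=9):
    """Simplified HoG: one histogram of unsigned gradient orientation per cell.

    Illustrative only; cell_size and n_bins are assumed values.
    """
    patch = np.asarray(patch, dtype=np.float64)
    gy, gx = np.gradient(patch)                 # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    # Fold orientations into [0, pi) so opposite directions share a bin.
    orientation = np.mod(np.arctan2(gy, gx), np.pi)
    bin_index = np.minimum((orientation / np.pi * n_bins).astype(int), n_bins - 1)

    n_rows = patch.shape[0] // cell_size
    n_cols = patch.shape[1] // cell_size
    features = np.zeros((n_rows, n_cols, n_bins))
    for r in range(n_rows):
        for c in range(n_cols):
            rows = slice(r * cell_size, (r + 1) * cell_size)
            cols = slice(c * cell_size, (c + 1) * cell_size)
            # Each pixel votes for its orientation bin, weighted by magnitude.
            features[r, c] = np.bincount(
                bin_index[rows, cols].ravel(),
                weights=magnitude[rows, cols].ravel(),
                minlength=n_bins,
            )
    vec = features.ravel()
    return vec / (np.linalg.norm(vec) + 1e-8)   # global L2 normalization

# A 36x36 patch (assumed size) yields 6 * 6 * 9 = 324 features.
ramp = np.tile(np.arange(36.0), (36, 1))        # brightness increases left to right
descriptor = hog_descriptor(ramp)
```

Because the descriptor depends on gradient direction rather than raw intensity, two crops of the same face under different lighting tend to land closer together than their pixel values would.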

In cases where I was mining for hard negatives, I went back and ran the classifier that I had just trained on the negative training data. Places where the classifier indicated a face were the hard negatives, because the classifier thought that there was a face where there was none. In certain circumstances, the detector would return bounding boxes that extended outside of the image. In these cases, I threw out the image crop that ran out of bounds. My other options were translating the crop or cropping the crop. Translating it would add a crop that could be radically different from the original crop. Cropping the crop would lead to an unevenly sized matrix that would make calculations unnecessarily complicated. I did not use non-maximum suppression in the mining of hard negatives, so that if a certain image region set off the detector repeatedly, every one of those detections was added to the collection.
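The mining loop described above can be sketched roughly as follows. This is a single-scale simplification in Python, not the project code: `mine_hard_negatives`, `score_fn`, the window size, the step, and the zero threshold are all hypothetical names and values used for illustration.

```python
import numpy as np

def mine_hard_negatives(images, score_fn, window=36, step=6, threshold=0.0):
    """Collect every crop from face-free images that the classifier calls a face.

    score_fn is a stand-in for "extract features, then apply the trained SVM".
    """
    hard_negatives = []
    for image in images:
        height, width = image.shape[:2]
        for top in range(0, height, step):
            for left in range(0, width, step):
                # Discard windows that run past the border instead of
                # translating or truncating them.
                if top + window > height or left + window > width:
                    continue
                crop = image[top:top + window, left:left + window]
                if score_fn(crop) > threshold:
                    # No non-max suppression: overlapping hits are all kept.
                    hard_negatives.append(crop)
    return hard_negatives

# Toy usage with a fake classifier that fires on bright crops.
negatives = [np.zeros((40, 40)), np.ones((40, 50))]
mined = mine_hard_negatives(negatives, score_fn=lambda crop: crop.mean() - 0.5)
```

The mined crops would then be appended to the negative training set before retraining, and the loop repeated for as many rounds as desired.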

Once the classifier has been trained (iteratively, when applicable), it is run on the test data and evaluated. This can take a considerable amount of time for the entire data set.
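Before evaluation, the pipeline applies non-max suppression to the test detections. I did not write that step myself, so the following is only a generic greedy version in Python, assuming boxes given as [x_min, y_min, x_max, y_max] and an overlap threshold of 0.3; both conventions are assumptions rather than details of the project code.

```python
import numpy as np

def non_max_suppression(boxes, scores, overlap_threshold=0.3):
    """Greedy NMS: keep the best box, drop boxes that overlap it, repeat.

    Returns the indices of the kept boxes, highest score first.
    """
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    scores = np.asarray(scores, dtype=np.float64)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(-scores)                  # indices sorted by confidence
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Intersection of the best box with every remaining box.
        inter_w = np.maximum(0.0, np.minimum(boxes[best, 2], boxes[rest, 2])
                             - np.maximum(boxes[best, 0], boxes[rest, 0]))
        inter_h = np.maximum(0.0, np.minimum(boxes[best, 3], boxes[rest, 3])
                             - np.maximum(boxes[best, 1], boxes[rest, 1]))
        intersection = inter_w * inter_h
        iou = intersection / (areas[best] + areas[rest] - intersection)
        order = rest[iou <= overlap_threshold]   # survivors go to the next round
    return keep

detections = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
kept = non_max_suppression(detections, scores=[0.9, 0.8, 0.7])
```

In this toy example the second box overlaps the first heavily and is suppressed, while the third box is far away and survives.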

Linear vs. Non-Linear SVM

Linear SVMs separate the data with a hyperplane, while non-linear SVMs can learn arbitrarily complex boundaries between faces and non-faces. Linear SVMs are faster but do not yield results as strong. However, they can be trained on larger quantities of data, which can give them higher precision for the same amount of training time.
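The speed difference comes from how each classifier scores a window. As a hedged Python illustration (not the primal_svm code): a linear SVM needs one dot product per window, while a kernel SVM sums a kernel evaluation over its training points. The RBF kernel, `gamma`, and the function names below are assumptions; the write-up does not state which kernel was used.

```python
import numpy as np

def linear_score(x, w, b):
    """Linear decision value: cost grows with the feature dimension only."""
    return float(np.dot(w, x) + b)

def rbf_score(x, support_vectors, alphas, b, gamma=1.0):
    """Kernel decision value: cost grows with the number of stored examples.

    Assumes an RBF kernel k(u, v) = exp(-gamma * ||u - v||^2).
    """
    squared_distances = np.sum((support_vectors - x) ** 2, axis=1)
    return float(np.dot(alphas, np.exp(-gamma * squared_distances)) + b)

x = np.array([1.0, 2.0])
s_lin = linear_score(x, w=np.array([0.5, -1.0]), b=0.25)
s_rbf = rbf_score(
    x,
    support_vectors=np.array([[1.0, 2.0], [0.0, 0.0]]),
    alphas=np.array([1.0, -1.0]),
    b=0.0,
)
```

A positive score is read as "face" and a negative one as "non-face" in both cases. With thousands of windows per image, the per-window cost of the kernel sum is what limits how much training data the non-linear classifier can afford.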

Results of linear and non-linear with a start_scale of 3 and 1 iteration of mining hard negatives:

[Figures omitted: results for the linear classifier and for the non-linear classifier]

Mining Random Negatives vs. Mining Hard Negatives

Mining hard negatives did not greatly increase performance. There was only a marginal increase when the same number of negatives was used but some of them were hard negatives.

Results of mining hard negatives vs. mining random negatives for a total of 1462 negatives (using a non-linear SVM):

[Figures omitted: results with random negatives and with hard negatives]

Taking a closer look at mining hard negatives

Let's look at the results of multiple iterations of mining hard negatives using a linear classifier with a start scale of 3. Initial classifier performance on training data after only mining random negatives:

adds 1000 negative images to collection
accuracy: 0.972
true positive rate: 0.479
false positive rate: 0.007
true negative rate: 0.492
false negative rate: 0.021

After one round of mining hard negatives:

adds 462 random negatives to collection
accuracy: 0.978
true positive rate: 0.386
false positive rate: 0.001
true negative rate: 0.593
false negative rate: 0.020

After two rounds of mining hard negatives:

adds 70 random negatives to collection
accuracy: 0.976
true positive rate: 0.374
false positive rate: 0.003
true negative rate: 0.602
false negative rate: 0.021

After three rounds of mining hard negatives:

adds 33 random negatives to collection
accuracy: 0.978
true positive rate: 0.370
false positive rate: 0.002
true negative rate: 0.608
false negative rate: 0.020
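In each block above, the four rates sum to roughly 1 and the accuracy is roughly the true positive rate plus the true negative rate, which suggests that every rate is reported as a fraction of all training examples rather than per class. A minimal Python sketch of that bookkeeping follows; it assumes labels of +1 and -1 and a decision threshold of zero, and `confusion_fractions` is a hypothetical name, not a function from the project.

```python
import numpy as np

def confusion_fractions(labels, scores):
    """Return accuracy and the four outcomes as fractions of ALL examples."""
    labels = np.asarray(labels)
    predicted = np.where(np.asarray(scores) > 0, 1, -1)
    n = labels.size
    tp = np.sum((predicted == 1) & (labels == 1)) / n
    fp = np.sum((predicted == 1) & (labels == -1)) / n
    tn = np.sum((predicted == -1) & (labels == -1)) / n
    fn = np.sum((predicted == -1) & (labels == 1)) / n
    return {
        "accuracy": float(tp + tn),
        "true positive rate": float(tp),
        "false positive rate": float(fp),
        "true negative rate": float(tn),
        "false negative rate": float(fn),
    }

report = confusion_fractions(labels=[1, 1, -1, -1], scores=[2.0, -1.0, -3.0, 0.5])
```

Under this reading, a per-class recall can be recovered from any block as true positive rate / (true positive rate + false negative rate), which separates changes in the classifier from changes in the share of negatives in the training set.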

As we iterate more and more, fewer false positives are found in the negative training data. Our true positive rate on the training data decreases significantly. I hypothesize that, since the evaluation of the face detector does not penalize false positives as long as they have lower confidence than the true positives, the resulting AP on the test data suffers when the classifier becomes more stringent.

The above graph shows the results of the face detection using a nonlinear classifier with one pass of random negatives and 3 iterations of mining hard negatives. The AP decreases by ~2%.
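The hypothesis about average precision can be illustrated with a toy calculation. The Python sketch below uses one common, non-interpolated definition of AP (mean of the precision at each rank where a true face is retrieved, with missed faces counting as zero). The project's evaluation code may use a different variant, so treat the function and its inputs as assumptions.

```python
import numpy as np

def average_precision(scores, is_true_positive, n_ground_truth):
    """Non-interpolated AP over detections ranked by confidence."""
    order = np.argsort(-np.asarray(scores, dtype=np.float64))
    hits = np.asarray(is_true_positive, dtype=bool)[order]
    true_positives = np.cumsum(hits)
    false_positives = np.cumsum(~hits)
    precision = true_positives / (true_positives + false_positives)
    # Sum precision at the ranks of correct detections; faces that are never
    # detected add nothing to the numerator but still count in the denominator.
    return float(np.sum(precision[hits]) / n_ground_truth)

# Lenient detector: finds both faces, plus two low-confidence false positives.
lenient = average_precision([0.9, 0.8, 0.3, 0.2], [True, True, False, False],
                            n_ground_truth=2)
# Stricter detector: no false positives, but it misses one of the two faces.
strict = average_precision([0.9], [True], n_ground_truth=2)
print(lenient, strict)  # 1.0 0.5
```

In this toy case the false positives ranked below the true detections cost nothing, while the single missed face halves the score, which is the trade-off described above.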