![]()
The results of face detection using a nonlinear classifier with a start_scale of 3. Decreasing the start scale to 1 dramatically increases performance, but it also greatly slows down detection when using HoG descriptors.
This webpage displays some of my results for the face detection with a sliding window project in Brown University's CS 1430: Introduction to Computer Vision. In this project, I implemented face detection by training a face classifier and scanning a test set for faces using a sliding window.
The face detection pipeline that I implemented goes as follows:
First, I generated the baseline positive and negative training data by taking random crops from data sets containing only faces and only non-faces, respectively. I computed the image features using a HoG descriptor implementation by Oswaldo Ludwig found here. (This code was developed for the work: O. Ludwig, D. Delgado, V. Goncalves, and U. Nunes, 'Trainable Classifier-Fusion Schemes: An Application To Pedestrian Detection,' In: 12th International IEEE Conference On Intelligent Transportation Systems, 2009, St. Louis, 2009. V. 1. P. 432-437). HoG descriptors capture information about the gradient orientations in localized areas of an image. Using this feature instead of plain image crops increased performance by upwards of 20%. Then, I trained a classifier using primal_svm: a linear SVM in some experiments and a nonlinear SVM in others. The implications of this choice are noted below.
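To make the idea of gradient-orientation histograms concrete, here is a minimal NumPy sketch of a HoG-style descriptor. It is not Ludwig's implementation: the function name `hog_descriptor`, the cell size, and the bin count are assumptions chosen for illustration, and block normalization is reduced to a single global L2 normalization.

```python
import numpy as np

def hog_descriptor(patch, cell=6, bins=9):
    """Simplified HoG: per-cell histograms of unsigned gradient orientation.

    Illustrative only; cell size, bin count, and normalization are assumptions.
    """
    patch = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(patch)                 # derivatives along rows, then columns
    magnitude = np.hypot(gx, gy)
    angle = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned orientation in [0, pi)
    # Map each angle to a bin; clamp guards against rounding at the upper edge.
    bin_idx = np.minimum((angle / np.pi * bins).astype(int), bins - 1)

    n_rows, n_cols = patch.shape[0] // cell, patch.shape[1] // cell
    hist = np.zeros((n_rows, n_cols, bins))
    for r in range(n_rows):
        for c in range(n_cols):
            rows = slice(r * cell, (r + 1) * cell)
            cols = slice(c * cell, (c + 1) * cell)
            # Each pixel votes for its orientation bin, weighted by gradient magnitude.
            hist[r, c] = np.bincount(bin_idx[rows, cols].ravel(),
                                     weights=magnitude[rows, cols].ravel(),
                                     minlength=bins)
    feat = hist.ravel()
    return feat / (np.linalg.norm(feat) + 1e-8)  # global L2 normalization
```

For a 36x36 crop with these settings the descriptor has 6 x 6 x 9 = 324 entries, and that vector, rather than the raw pixels, is what would be passed to the SVM.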
In cases where I was mining for hard negatives, I went back and ran the classifier that I had just trained on the negative training data. Places where the classifier indicated a face were the hard negatives, because the classifier thought that there was a face where there was none. In certain circumstances, the detector would return bounding boxes that extended outside of the image. In these cases, I threw out the image crop that ran out of bounds. My other options were translating the crop or trimming it to the image. Translating it would add a crop that could be radically different from the original crop. Trimming it would yield a crop with different dimensions from the others, which would make calculations unnecessarily complicated. I did not use non-maximum suppression when mining hard negatives, so if a certain image region set off the detector multiple times, all of those detections were added to the collection.
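The collection step described above can be sketched as follows. This is a hedged illustration, not the project's code: `collect_hard_negatives` is a hypothetical helper, and the box convention (0-based pixel coordinates with exclusive maxima) is an assumption. Every detection on a face-free image is a false positive, so each in-bounds box becomes a hard negative, and overlapping boxes are all kept because no non-maximum suppression is applied.

```python
import numpy as np

def collect_hard_negatives(image, boxes):
    """Turn detections on a face-free image into hard-negative crops.

    boxes: iterable of (x_min, y_min, x_max, y_max), assumed 0-based with
    exclusive maxima (an assumption for this sketch).
    """
    image = np.asarray(image)
    height, width = image.shape[:2]
    crops = []
    for x_min, y_min, x_max, y_max in boxes:
        # Discard boxes that overhang the border instead of translating or
        # trimming them, so every kept crop has the full window size.
        if x_min < 0 or y_min < 0 or x_max > width or y_max > height:
            continue
        # No non-maximum suppression: overlapping detections are all kept.
        crops.append(image[y_min:y_max, x_min:x_max])
    return crops
```

The returned crops would then be converted to features and appended to the negative training set before retraining.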
Once the classifier has been trained (iteratively, when applicable), it is run on the test data and evaluated. This can take a considerable amount of time for the entire data set.
Linear SVMs separate data with a hyperplane, while nonlinear SVMs can have arbitrarily complex boundaries between faces and non-faces. Linear SVMs are faster but do not score as well given the same training data. However, they can be trained on larger quantities of data, which can give them higher precision in the same amount of time.
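The difference between the two decision rules can be illustrated on a toy problem. The sketch below uses hand-picked coefficients rather than a trained model, and none of the names come from primal_svm. A linear SVM scores a point with a weighted sum of its coordinates plus a bias, while a kernel SVM scores it with a weighted sum of kernel evaluations against support vectors. XOR-style labels are a classic case that no single line separates but an RBF kernel handles.

```python
import math

def linear_score(w, b, x):
    """Linear decision function: w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def rbf_kernel(a, c, gamma):
    """Gaussian (RBF) kernel between two points."""
    return math.exp(-gamma * sum((ai - ci) ** 2 for ai, ci in zip(a, c)))

def kernel_score(support, coeffs, b, x, gamma=2.0):
    """Kernel decision function: sum_i coeff_i * K(support_i, x) + b."""
    return sum(c * rbf_kernel(s, x, gamma) for s, c in zip(support, coeffs)) + b

# XOR-style toy data: opposite corners share a label.
points = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
labels = [-1, -1, 1, 1]

# One example line (hand-picked); no choice of w and b gets all four right.
linear_preds = [1 if linear_score((1.0, 1.0), -0.5, p) > 0 else -1 for p in points]

# Hand-picked kernel coefficients: each point is a support vector weighted by its label.
kernel_preds = [1 if kernel_score(points, labels, 0.0, p) > 0 else -1 for p in points]
```

Here `kernel_preds` matches `labels` while `linear_preds` misclassifies the point (1, 1). The cost is that every kernel score touches every support vector, which is one reason the nonlinear classifier is slower at detection time.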
| Linear | Non-linear |
| --- | --- |
| ![]() | ![]() |
Mining hard negatives did not greatly increase performance. There was only a marginal increase when the same number of negatives was used but some were hard negatives.
| Random Negatives | Hard Negatives |
| --- | --- |
| ![]() | ![]() |
Let's look at the results of multiple iterations of mining hard negatives using a linear classifier with a start scale of 3. Initial classifier performance on training data after only mining random negatives:
After one round of mining hard negatives:
After two rounds of mining hard negatives:
After three rounds of mining hard negatives:
As we iterate more and more, fewer false positives are found in the negative training data. However, our true positive rate on the training data also decreases significantly. I hypothesize that since the evaluation of the face detector does not penalize false positives as long as they have lower confidence than the true positives, the AP on the test data suffers when the classifier becomes more stringent.
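The reasoning behind this hypothesis can be checked with a small average precision calculation. The sketch below assumes a common non-interpolated definition of AP; the course's evaluation code may differ (for example, by using VOC-style interpolation), and the function name and toy numbers are made up for illustration.

```python
def average_precision(confidences, is_true_positive, num_ground_truth):
    """Non-interpolated AP: mean of the precision at each true positive,
    taken over all ground-truth faces (missed faces contribute zero)."""
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    true_pos = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if is_true_positive[i]:
            true_pos += 1
            ap += (true_pos / rank) / num_ground_truth
    return ap

# Two faces in the ground truth, both found with high confidence.
base = average_precision([0.9, 0.8], [True, True], 2)

# Extra false positives ranked below every true positive cost nothing.
low_fp = average_precision([0.9, 0.8, 0.1, 0.05], [True, True, False, False], 2)

# A stricter detector that drops one of the faces loses recall, and AP with it.
strict = average_precision([0.9], [True], 2)
```

Under this definition `base` and `low_fp` are equal, while `strict` is lower. That is consistent with the hypothesis: suppressing low-confidence false positives earns nothing, but losing true positives is penalized.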
![]()
The above graph shows the results of face detection using a nonlinear classifier with one pass of random negatives and 3 iterations of mining hard negatives. The AP decreases by ~2%.