CSCI1430 : Project 4 - Face Detection with Sliding Windows

rdfong

In this project we attempt to locate faces in an arbitrary scene. The main idea is to train a classifier (either linear or non-linear) on positive training data, random negative training data, and then hard negative training data. We then slide a window over the image and pass the contents of each window to the classifier to decide whether or not a face has been found.

Positive Crops

First we need to train our classifier to know what faces are. We take a set of crops that we know are faces and compute a descriptor for each one. I chose to use a SIFT descriptor for each crop. We then feed these positive descriptors into the SVM.
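To make "a descriptor for the crop" concrete, here is a minimal sketch of a SIFT-like descriptor. It is a simplified stand-in, not real SIFT: it keeps the 4x4 grid of cells with 8 orientation bins each (128 values), but skips Gaussian weighting, interpolation between bins, and clipping. The function name and the list-of-lists image layout are assumptions made for illustration.

```python
import math

def orientation_histogram_descriptor(crop, grid=4, bins=8):
    """Simplified SIFT-like descriptor for a grayscale crop (list of rows).

    Hypothetical helper: accumulates gradient magnitude into a grid x grid
    layout of cells, each with `bins` orientation bins, then L2-normalizes.
    """
    height, width = len(crop), len(crop[0])
    desc = [0.0] * (grid * grid * bins)
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            # Central differences approximate the image gradient.
            dy = crop[r + 1][c] - crop[r - 1][c]
            dx = crop[r][c + 1] - crop[r][c - 1]
            mag = math.hypot(dx, dy)
            if mag == 0.0:
                continue
            angle = math.atan2(dy, dx) % (2 * math.pi)
            b = min(int(angle / (2 * math.pi) * bins), bins - 1)
            cell_r = min(r * grid // height, grid - 1)
            cell_c = min(c * grid // width, grid - 1)
            desc[(cell_r * grid + cell_c) * bins + b] += mag
    norm = math.sqrt(sum(v * v for v in desc))
    return [v / norm for v in desc] if norm > 0 else desc

# Usage: an 8x8 crop whose brightness increases left to right.
ramp = [[c for c in range(8)] for _ in range(8)]
descriptor = orientation_histogram_descriptor(ramp)
print(len(descriptor))
```

Every gradient in the ramp points the same way, so all of the mass lands in the first orientation bin of each cell.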

Mining Random Negatives

Now we need to train our classifier to know what isn't a face. We repeat the procedure used for the positive crops on crops that we know contain no faces, feeding those descriptors into the SVM as well.
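Feeding positive and negative descriptors into a linear SVM can be sketched as below. This is an illustration only: it trains with plain batch subgradient descent on the regularized hinge loss, which is not necessarily the solver used for the results in this writeup, and the function name, hyperparameters, and toy 2-D "descriptors" are all assumptions.

```python
def train_linear_svm(positives, negatives, lam=0.01, lr=0.1, epochs=200):
    """Train a linear SVM by batch subgradient descent (hypothetical helper).

    Returns a scoring function; a score above 0 means "face".
    """
    data = [(x, 1.0) for x in positives] + [(x, -1.0) for x in negatives]
    dim = len(data[0][0])
    n = len(data)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        # The gradient of the L2 regularizer pulls the weights toward zero.
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for x, y in data:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:
                # Only examples inside the margin contribute to the hinge loss.
                for i in range(dim):
                    grad_w[i] -= y * x[i] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b

    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x)) + b

    return score

# Usage with toy 2-D "descriptors" (real SIFT descriptors have 128 values).
face_descriptors = [[2.0, 2.0], [3.0, 2.5]]
non_face_descriptors = [[-2.0, -2.0], [-3.0, -2.5]]
score = train_linear_svm(face_descriptors, non_face_descriptors)
print(score([2.5, 2.0]) > 0)
```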

Mining Hard Negatives

Next we find false positives by running our classifier on images that we know contain no faces. Any crop that the classifier labels as positive is a hard negative: we feed its descriptor back into the training set as a negative example and retrain the classifier. We do this for some number of iterations (I did up to 5) in the hope that our classifier becomes more and more accurate.
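The mining loop described above can be sketched as follows. The trainer is passed in as a parameter, and the nearest-centroid trainer used in the demo is only a stand-in for the SVM so that the example runs without extra libraries. All names are hypothetical, and skipping descriptors that are already in the negative set is a design choice of this sketch, not something stated above.

```python
def mine_hard_negatives(train, positives, negatives, face_free_pool,
                        iterations=5, threshold=0.0):
    """Repeatedly add false positives to the negatives and retrain.

    `train(positives, negatives)` must return a scoring function.
    `face_free_pool` holds descriptors of crops known to contain no face.
    """
    negatives = list(negatives)
    scorer = train(list(positives), negatives)
    history = []  # number of new hard negatives found per iteration
    for _ in range(iterations):
        seen = set(map(tuple, negatives))
        hard = [d for d in face_free_pool
                if scorer(d) > threshold and tuple(d) not in seen]
        history.append(len(hard))
        if not hard:
            break  # no new false positives, so retraining would change nothing
        negatives.extend(hard)
        scorer = train(list(positives), negatives)
    return scorer, negatives, history

def train_centroid(positives, negatives):
    """Stand-in trainer (not an SVM): score by distance to the class means."""
    def mean(vectors):
        return [sum(v[i] for v in vectors) / len(vectors)
                for i in range(len(vectors[0]))]
    mean_pos, mean_neg = mean(positives), mean(negatives)

    def score(x):
        dist_pos = sum((a - b) ** 2 for a, b in zip(x, mean_pos))
        dist_neg = sum((a - b) ** 2 for a, b in zip(x, mean_neg))
        return dist_neg - dist_pos
    return score

# Usage with toy 1-D descriptors.
scorer, negatives, history = mine_hard_negatives(
    train_centroid,
    positives=[[1.0], [1.2]],
    negatives=[[-1.0]],
    face_free_pool=[[0.5], [-0.8], [0.9]],
)
print(history, negatives)
```

In the toy run, two pool descriptors are misclassified as faces in the first pass and are added to the negatives; the second pass finds nothing new and the loop stops early.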

Sliding Windows

Once we are satisfied with our classifier, we need to run our detector on arbitrary images. For each image we use a sliding window approach to generate many crops of the image. We compute the same descriptor used in training (SIFT here) for each of these crops and then use the trained classifier to decide whether or not the crop is a face. Lastly, we can visualize our results using a Precision vs Recall curve (see results below).
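A minimal sketch of the sliding window pass, assuming a single scale, a square window, and an image stored as a list of rows. The window size, stride, threshold, and the `describe` and `scorer` callables are placeholders, not the values used for the results below.

```python
def sliding_windows(height, width, window=36, stride=4):
    """Yield (top, left, bottom, right) boxes that fit inside the image."""
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            yield top, left, top + window, left + window

def detect(image, describe, scorer, window=36, stride=4, threshold=0.0):
    """Score every window and keep those the classifier calls a face."""
    height, width = len(image), len(image[0])
    detections = []
    for top, left, bottom, right in sliding_windows(height, width, window, stride):
        crop = [row[left:right] for row in image[top:bottom]]
        score = scorer(describe(crop))
        if score > threshold:
            detections.append(((top, left, bottom, right), score))
    return detections

# Usage: a 4x4 image with a bright 2x2 block standing in for a "face".
image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
flatten = lambda crop: [v for row in crop for v in row]  # toy descriptor
brightness = lambda d: sum(d) - 3.5                      # toy classifier
print(detect(image, flatten, brightness, window=2, stride=1))
```

A practical detector would also rescale the image and repeat the pass to find faces of different sizes, and would usually suppress overlapping detections; both are left out here to keep the sketch short.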

Results (using SIFT descriptor)

Linear SVM (.213 accuracy)

With only a linear SVM, the results clearly weren't great.

Non-Linear SVM (.372 accuracy)

The non-linear SVM did significantly better than the linear one, as expected.

Hard Negatives (number of iterations = 3, accuracy = .229)

Interestingly, after using 3 iterations of hard negatives on the non-linear SVM, the accuracy actually decreased (from .372 to .229). It had very high precision at low recall (.96 at .15 recall), but the recall itself never got past .25, resulting in a low average precision.
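To see why a cap on recall limits average precision, here is a minimal sketch that computes a precision-recall curve and the area under it from ranked detections. It uses the plain, non-interpolated definition, which may differ from the evaluation code that produced the numbers above; the function names and toy inputs are assumptions.

```python
def precision_recall(scores, is_true_positive, num_ground_truth):
    """Return (precision, recall) after each detection, best score first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_positives = 0
    points = []
    for rank, i in enumerate(order, start=1):
        true_positives += bool(is_true_positive[i])
        points.append((true_positives / rank,
                       true_positives / num_ground_truth))
    return points

def average_precision(points):
    """Area under the curve: precision weighted by each gain in recall."""
    ap = 0.0
    previous_recall = 0.0
    for precision, recall in points:
        ap += precision * (recall - previous_recall)
        previous_recall = recall
    return ap

# Usage: 4 real faces, but the detector fires only once (correctly).
points = precision_recall([0.9], [True], num_ground_truth=4)
print(points, average_precision(points))
```

In the toy case precision is perfect, yet average precision is only 0.25 because three of the four faces are never detected. The same effect applies here: a detector whose recall never passes .25 cannot score above .25 under this definition, however precise it is.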