The purpose of this project is to build a sliding window face detector. Sliding window face detectors have had many successful industry implementations. The overall process is simple: we examine every image patch and label it as a face or not a face. The project required the use of strong features, either SIFT or HOG. I chose HOG image descriptors to boost detection accuracy. Additionally, I iteratively retrained a classifier with mined hard negatives and compared it to a baseline trained with random negatives. Finally, I tested both linear and non-linear classifiers.
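To make the patch-by-patch process concrete, here is a minimal Python sketch. It is illustrative only and not the project's code: the 36-pixel window, the stride of 6, the zero threshold and the `score_fn` callback (standing in for feature extraction plus classifier) are all assumptions, and a real detector would also repeat this over several image scales.

```python
def sliding_windows(height, width, win=36, stride=6):
    """Yield (top, left, bottom, right) for every window fully inside the image."""
    for top in range(0, height - win + 1, stride):
        for left in range(0, width - win + 1, stride):
            yield (top, left, top + win, left + win)


def detect(image, score_fn, win=36, stride=6, threshold=0.0):
    """Score every patch and keep those the classifier labels as a face.

    image    -- 2D list of grayscale values
    score_fn -- hypothetical callback: patch -> classifier confidence
    """
    height, width = len(image), len(image[0])
    detections = []
    for top, left, bottom, right in sliding_windows(height, width, win, stride):
        patch = [row[left:right] for row in image[top:bottom]]
        score = score_fn(patch)
        if score > threshold:  # "face" if the confidence clears the threshold
            detections.append((score, (top, left, bottom, right)))
    return detections
```

Each returned entry pairs a confidence with a bounding box, which is also the form needed later when mining false positives as hard negatives.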
The first design decision I made was to use HOG descriptors as my image features. To do this, I employed the MATLAB HOG script written by Oswaldo Ludwig (BSD License). In the interest of time, I used the script's default settings, which divide each crop into a 3x3 grid of windows with 9 histogram bins per window (3 x 3 x 9 = 81 values per crop). Although I did not have time to tweak the values any further, I found that the descriptor was fast to compute and had decent accuracy.
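The 3x3 windows with 9 bins each can be sketched as follows. This is a simplified Python illustration of the general idea and not a port of Ludwig's script: the central-difference gradients, the signed angle range of [-pi, pi], the non-overlapping windows and the per-window L2 normalization are all assumptions made for the example.

```python
import math


def hog_descriptor(img, nwin_x=3, nwin_y=3, bins=9):
    """Return nwin_x * nwin_y * bins orientation-histogram features.

    img -- 2D list of grayscale values (rows x cols)
    """
    rows, cols = len(img), len(img[0])
    step_x, step_y = cols // nwin_x, rows // nwin_y  # leftover pixels are ignored
    features = []
    for wy in range(nwin_y):
        for wx in range(nwin_x):
            hist = [0.0] * bins
            for r in range(wy * step_y, (wy + 1) * step_y):
                for c in range(wx * step_x, (wx + 1) * step_x):
                    # Central differences, clamped at the image border.
                    gx = img[r][min(c + 1, cols - 1)] - img[r][max(c - 1, 0)]
                    gy = img[min(r + 1, rows - 1)][c] - img[max(r - 1, 0)][c]
                    magnitude = math.hypot(gx, gy)
                    angle = math.atan2(gy, gx)  # in [-pi, pi]
                    # Map the angle to one of `bins` equal-width orientation bins.
                    b = min(int((angle + math.pi) / (2 * math.pi) * bins), bins - 1)
                    hist[b] += magnitude  # vote weighted by gradient strength
            norm = math.sqrt(sum(h * h for h in hist)) + 1e-6
            features.extend(h / norm for h in hist)
    return features
```

With the default arguments, any crop yields a vector of length 81, which is what the classifier would be trained on.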
I tried two different approaches to mining hard negatives. The first approach was to grab a bounded random number of false positive detections from each image and store them in a master list; I then randomly selected num_crops detections from that list and converted them into crops. The second approach was to choose the false positive detections with the highest confidence across all images and save those into a master list. After several initial tests, the first approach seemed to give the best results.
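For contrast with the random strategy, the second approach amounts to a global top-k selection by confidence. The Python sketch below is a hypothetical illustration: the `(confidence, image_id, box)` tuple layout and the function name are assumptions, not the project's data structures.

```python
import heapq


def top_confidence_negatives(detections, num_crops):
    """Keep the num_crops false positives with the highest confidence.

    detections -- iterable of (confidence, image_id, box) tuples gathered
                  from all face-free images (layout assumed for this sketch)
    """
    return heapq.nlargest(num_crops, detections, key=lambda d: d[0])
```

One plausible reason this did worse in my tests is that the highest-confidence detections may cluster in a handful of images, whereas random sampling spreads the negatives across the whole set; I did not verify this.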
A bit more on my first approach. Before running the current classifier on each image without a face, we convert the image to grayscale and normalize it. We then run the current classifier. Since these images contain no faces, every detection is a false positive, and we use these detections to build new hard negative features. We take each detection and find its bounding box; if the bounding box is not completely inside the image, it is thrown out. Next, we keep only a few bounding boxes at random from each image, with the number determined by taking the ceiling of the target crop count divided by the number of images in the directory. The chosen crops are added to a master list, which is later randomly truncated to the correct size.
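The selection logic described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the project's code: boxes are taken to be `(top, left, bottom, right)` tuples, the function and parameter names are hypothetical, and the grayscale conversion, normalization and classifier run are assumed to have already produced the per-image detections.

```python
import math
import random


def box_inside(box, height, width):
    """True if the bounding box lies completely inside the image."""
    top, left, bottom, right = box
    return top >= 0 and left >= 0 and bottom <= height and right <= width


def mine_hard_negatives(detections_per_image, image_sizes, num_crops, rng=None):
    """Randomly pick a bounded number of false positives from each image.

    detections_per_image -- one list of boxes per face-free image
    image_sizes          -- matching list of (height, width) pairs
    num_crops            -- target number of hard negatives
    """
    rng = rng or random.Random()
    # Per-image quota: ceiling of target crop count over number of images.
    per_image = math.ceil(num_crops / len(detections_per_image))
    master = []
    for boxes, (height, width) in zip(detections_per_image, image_sizes):
        valid = [b for b in boxes if box_inside(b, height, width)]
        master.extend(rng.sample(valid, min(per_image, len(valid))))
    # The ceiling can overshoot the target, so truncate at random.
    rng.shuffle(master)
    return master[:num_crops]
```

The returned boxes would then be cropped from their source images, converted to HOG features, and appended to the negative training set before retraining the classifier.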
Some anecdotal results:
Linear SVM Results:
Non-Linear SVM Results: