Ben Freudberg
CSCI1430
Project 4
Introduction
The goal of this project is to develop a method for detecting faces in images. The general steps are: create a set of features that describe the properties of a face; train a face detector on positive and negative examples with a linear or non-linear SVM; run the detector on a set of images that contain no faces and use any positive results as negative training data in a second iteration of training; and finally run the retrained detector on a set of test images with and without faces and quantify the results.
Defining a Face
I standardize every image patch to 36x36 pixels for consistency. To pull features from one of these patches, I run a HOG descriptor on it. I’ve tuned the HOG function to split the patch into 9 sections and then bin the gradient orientations in each section into 9 bins, which produces an 81-dimensional feature description (9 sections x 9 bins). The same HOG descriptor is run on every patch, whether it is used for training or for testing.
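As a rough illustration of where the 81 dimensions come from, here is a minimal Python sketch of a HOG-style descriptor. It assumes a 3x3 grid of sections and unsigned orientations, and it skips the block normalization and interpolation that a full HOG implementation would include. The function name hog_descriptor and its parameters are hypothetical, not the function used in the project.

```python
import math

def hog_descriptor(patch, grid=3, bins=9):
    """Simplified HOG-style descriptor for a square grayscale patch.

    patch is a list of rows of floats. The result has grid * grid * bins
    entries (81 for the defaults) and is L2-normalized.
    """
    size = len(patch)
    cell = size // grid
    hist = [0.0] * (grid * grid * bins)
    for y in range(size):
        for x in range(size):
            # Central differences, clamped at the patch border.
            gx = patch[y][min(x + 1, size - 1)] - patch[y][max(x - 1, 0)]
            gy = patch[min(y + 1, size - 1)][x] - patch[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            # Unsigned orientation in [0, pi), mapped to one of the bins.
            angle = math.atan2(gy, gx) % math.pi
            b = min(int(angle / math.pi * bins), bins - 1)
            cy = min(y // cell, grid - 1)
            cx = min(x // cell, grid - 1)
            # Each pixel votes for its bin with its gradient magnitude.
            hist[(cy * grid + cx) * bins + b] += mag
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]

# A 36x36 horizontal ramp has purely horizontal gradients.
ramp = [[float(x) for x in range(36)] for _ in range(36)]
descriptor = hog_descriptor(ramp)
print(len(descriptor))
```

Each of the 9 sections contributes its own 9-bin histogram, and concatenating them gives the single vector that is passed to the SVM.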
Training the Detector
A set of positive examples and a set of random negative examples are pulled and resized to 36x36 pixels. They are all run through the HOG descriptor and their 81-dimensional descriptions are stored. The positive and negative descriptions are then fed into an SVM (both linear and non-linear were tested). For the first test, this is the extent of the training. The trained detector is then run on a set of test images and the results are tallied and scored.
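To make the training step concrete, the sketch below trains a linear SVM from scratch using Pegasos-style stochastic subgradient descent on the hinge loss. This is an assumption made for illustration: the project most likely relied on an existing SVM library, and the non-linear case is not covered here. The names train_linear_svm and decision_value, and the parameters lam and epochs, are hypothetical.

```python
import random

def train_linear_svm(features, labels, lam=0.01, epochs=50, seed=0):
    """Fit a linear SVM; labels must be +1 (face) or -1 (non-face).

    Returns (w, b). The bias is learned by appending a constant 1.0
    to every feature vector, so it is regularized along with w.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in features]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            x, y = data[i], labels[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Shrink step from the L2 regularizer.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss step, only when the example violates the margin.
            if margin < 1.0:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w[:-1], w[-1]

def decision_value(w, b, x):
    """Signed score: positive means 'face', negative means 'non-face'."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

# Toy 2-D stand-ins for the 81-dimensional descriptions.
X = [[2.0, 2.0], [3.0, 2.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -2.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([decision_value(w, b, x) > 0 for x in X])
```

The magnitude of the decision value can double as a confidence score for each patch.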
The second test involves a more intensive training method. After the detector is trained on a sample of random positives and random negatives, it is run on a set of images that do not contain any faces. All positive detections from this sample must be false positives because there are no faces in the set. These false positives are then added to the set of negative training examples and the detector is retrained. This same method can then be run again as many times as desired. I re-ran the SVM with added false positives just once in order to improve the detector’s accuracy without increasing the program’s runtime by too much.
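The retraining loop described above can be sketched as follows. The function mine_hard_negatives and its train and score callbacks are hypothetical placeholders for whatever training and scoring routines are in use. The one-dimensional toy_train and toy_score functions exist only to make the example runnable and are not a real face classifier.

```python
def mine_hard_negatives(train, score, positives, negatives,
                        face_free_patches, rounds=1, threshold=0.0):
    """Retrain with false positives collected from face-free images.

    train(positives, negatives) returns a model, and score(model, patch)
    returns a confidence. Any face-free patch scoring above the threshold
    is a false positive and is added to the negative set.
    """
    negatives = list(negatives)
    model = train(positives, negatives)
    for _ in range(rounds):
        hard = [p for p in face_free_patches if score(model, p) > threshold]
        if not hard:
            break  # nothing left to mine
        negatives.extend(hard)
        model = train(positives, negatives)
    return model, negatives

# Toy stand-ins: the "model" is the midpoint between the class means.
def toy_train(pos, neg):
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0

def toy_score(model, x):
    return x - model

model, negatives = mine_hard_negatives(
    toy_train, toy_score,
    positives=[4.0, 5.0, 6.0],
    negatives=[0.0, 1.0],
    face_free_patches=[3.5, 0.5, 3.2],
    rounds=1,
)
print(negatives)
```

With rounds=1 the detector is retrained once, which matches the single extra iteration used in this project. Raising rounds trades more runtime for more mined negatives.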
Running the Detector
The first step in finding faces in an image is to pull a set of random patches from the image at various locations and scales. These patches are then scaled to 36x36 pixels and run through the HOG descriptor. Each 81-dimensional description is then classified using the boundary between faces and non-faces learned by the SVM, so that each patch is labeled as a face or not. After all the patches have been categorized, the face patches that overlap with each other are examined to find which one in each overlapping group has the greatest confidence. The patch with the highest confidence is returned and the other patches in the group are rejected. This helps ensure that no face produces more than one bounding box in the detector's output.
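The overlap-resolution step can be sketched as a greedy non-maximum suppression. This assumes boxes are stored as (x_min, y_min, x_max, y_max) tuples and that any overlap at all counts as a conflict. Both choices, along with the names overlaps and non_max_suppression, are assumptions made for illustration rather than details taken from the project code.

```python
def overlaps(a, b):
    """True if two (x_min, y_min, x_max, y_max) boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def non_max_suppression(detections):
    """Keep only the most confident box among overlapping detections.

    detections is a list of (box, confidence) pairs. Boxes are visited
    from most to least confident, and a box is kept only if it does not
    overlap a box that has already been kept.
    """
    kept = []
    for box, conf in sorted(detections, key=lambda d: d[1], reverse=True):
        if not any(overlaps(box, other) for other, _ in kept):
            kept.append((box, conf))
    return kept

detections = [
    ((0, 0, 10, 10), 0.5),    # overlaps the next box and is less confident
    ((5, 5, 15, 15), 0.9),
    ((20, 20, 30, 30), 0.3),  # isolated, so it survives
]
print(non_max_suppression(detections))
# [((5, 5, 15, 15), 0.9), ((20, 20, 30, 30), 0.3)]
```

A stricter overlap test, such as requiring a minimum shared area, would let nearby but distinct faces each keep their own box.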
Results
My results varied depending on whether I used a linear or non-linear SVM and whether or not I used iterative training with hard negatives. The graphs for each of these four runs are displayed below.
Linear SVM without Hard Negatives Training
Score: 0.176
Linear SVM with Hard Negatives Training
Score: 0.231
Non-Linear SVM without Hard Negatives Training
Score: 0.337
Non-Linear SVM with Hard Negatives Training
Score: 0.298