Face Detection with a Sliding Window

by rmcverry
11/13/11

Project Overview

The purpose of this project is to build a sliding window face detector. Sliding window face detectors have had many successful industry implementations. The overall process is very simple: we examine every image patch and label it as a face or not a face. In this project we were required to use strong features, either SIFT or HOG. I chose HOG image descriptors to boost detection accuracy. Additionally, I iteratively retrained a classifier with mined hard negatives and compared the result to a baseline trained with random negatives. Finally, I tested both linear and non-linear classifiers.
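The patch enumeration described above can be sketched as follows. This is a minimal Python illustration, not the MATLAB code used in the project: the function names, the 36-pixel base window, and the exact meaning given to step, start_scale and scale_factor are all assumptions made for the example.

```python
def sliding_windows(height, width, window=36, step=4,
                    start_scale=1.0, scale_factor=1.5):
    """Yield (x, y, side) square boxes covering a height x width image.

    All parameter names and defaults are illustrative assumptions.
    """
    if step < 1 or start_scale <= 0 or scale_factor <= 1.0:
        raise ValueError("need step >= 1, start_scale > 0, scale_factor > 1")
    scale = start_scale
    while True:
        side = max(1, int(round(window * scale)))
        if side > min(height, width):
            break  # the window no longer fits in the image
        stride = max(1, int(round(step * scale)))
        for y in range(0, height - side + 1, stride):
            for x in range(0, width - side + 1, stride):
                yield (x, y, side)
        scale *= scale_factor  # move on to the next, larger window size


def detect_faces(image, score_patch, threshold=0.0, **window_args):
    """Score every window with score_patch and keep those above threshold.

    image is a list of rows of grayscale values; score_patch is any
    callable that maps a patch to a confidence (a hypothetical classifier).
    """
    height, width = len(image), len(image[0])
    detections = []
    for x, y, side in sliding_windows(height, width, **window_args):
        patch = [row[x:x + side] for row in image[y:y + side]]
        confidence = score_patch(patch)
        if confidence > threshold:
            detections.append((x, y, side, confidence))
    return detections
```

A smaller step or a smaller start scale produces more windows per image, which is why those settings trade run time for accuracy.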

Face Detection Workflow:

Get Features From Positive Training Data
1. Load crops of faces
2. Turn crops into features
Train a Linear or Nonlinear SVM with Negative Training Data
1. Load crops without faces
2. Turn crops into features
3. Train SVM with negative and positive data
Adjust SVM Parameters
1. Find False Positive Features
2. Train a new SVM on those features
3. Repeat until satisfied
Run SVM on the test image set.
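The retraining loop in the workflow above can be sketched as follows. This Python outline is only an illustration under stated assumptions: train and find_false_positives are hypothetical callables standing in for the SVM trainer and the detector run on face-free images, and adding the mined features to the existing negatives (rather than training on them alone) is my reading of the steps, not something the writeup confirms.

```python
def mine_and_retrain(positives, negatives, scenes, train,
                     find_false_positives, rounds=2):
    """Train, mine false positives as hard negatives, retrain, repeat.

    train(positives, negatives) -> classifier           (hypothetical)
    find_false_positives(classifier, scenes) -> list    (hypothetical)
    scenes are images known to contain no faces, so every detection
    on them is a false positive.
    """
    negatives = list(negatives)  # copy so the caller's list is untouched
    classifier = train(positives, negatives)
    for _ in range(rounds):
        hard_negatives = find_false_positives(classifier, scenes)
        if not hard_negatives:
            break  # nothing left to learn from these scenes
        negatives.extend(hard_negatives)
        classifier = train(positives, negatives)
    return classifier, negatives
```

As a toy usage example, with one-dimensional "features" and a threshold "classifier" placed halfway between the largest negative and the smallest positive, the loop pushes the threshold up until the face-free scenes no longer trigger detections:

```python
toy_train = lambda pos, neg: (max(neg) + min(pos)) / 2
toy_mine = lambda thr, scenes: [s for s in scenes if s > thr]
clf, negs = mine_and_retrain([5, 6], [0, 1], [2, 3.5, 4], toy_train, toy_mine)
```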

Crops To Features

The first design decision that I made was to use HOG descriptors as my image features. To do this, I employed the MATLAB HOG script written by Oswaldo Ludwig (BSD License). In the interest of time I used the default settings in the HOG script, which divide each crop into a 3x3 grid of windows with 9 orientation bins per window. Although I did not have time to tweak the values any further, I found that the descriptor ran quickly and had decent accuracy.
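To make the 3x3 windows with 9 bins concrete, here is a simplified HOG-style descriptor in Python. It is a stand-in written for illustration, not a port of Ludwig's script: the central-difference gradients, the signed-angle binning and the single global L2 normalization are all assumptions of this sketch.

```python
import math


def hog_descriptor(patch, cells=3, bins=9):
    """Return a cells*cells*bins histogram-of-gradients feature vector.

    patch is a list of rows of grayscale values. Simplified sketch:
    real HOG implementations differ in gradient filters and normalization.
    """
    height, width = len(patch), len(patch[0])
    histograms = [[0.0] * bins for _ in range(cells * cells)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.atan2(gy, gx)  # signed angle in [-pi, pi]
            b = min(int((angle + math.pi) / (2 * math.pi) * bins), bins - 1)
            cell_row = min(y * cells // height, cells - 1)
            cell_col = min(x * cells // width, cells - 1)
            # Each pixel votes for its orientation bin, weighted by magnitude.
            histograms[cell_row * cells + cell_col][b] += magnitude
    feature = [value for hist in histograms for value in hist]
    norm = math.sqrt(sum(value * value for value in feature))
    return [value / norm for value in feature] if norm > 0 else feature
```

With the default arguments every crop becomes a vector of 3 * 3 * 9 = 81 numbers, which is what the classifier is trained on in this sketch.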

Get Hard Negatives

I tried two different approaches to mining hard negatives. The first approach was to grab a bounded random number of false positive detections from each image, store them in a master list, and then randomly keep num_crops of those detections and convert them into crops. The second approach was to choose the false positive detections with the highest confidence across all images and save those into a master list. After several initial tests, the first approach seemed to get the best results.

A bit more on my first approach. Before running the current classifier on each image without a face, we convert the image to grayscale and normalize it. We then run the current classifier. Every detection is a false positive, and we use these detections to build new hard negative features. We take each detection and find its bounding box; if the bounding box is not completely inside the image, it is thrown out. Next, we keep only a few bounding boxes at random from each image, with the per-image count determined by taking the ceiling of the target crop count divided by the number of images in the directory. The chosen crops are added to a master list, which is later truncated at random to the correct size.
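The sampling logic of the first approach can be sketched as follows. This is a Python illustration rather than the project's MATLAB code; the (x0, y0, x1, y1) box format and all function and argument names are assumptions.

```python
import math
import random


def box_inside(box, height, width):
    """True if an (x0, y0, x1, y1) box lies completely inside the image."""
    x0, y0, x1, y1 = box
    return 0 <= x0 < x1 <= width and 0 <= y0 < y1 <= height


def sample_hard_negatives(detections_per_image, image_sizes,
                          target_count, rng=None):
    """Pick up to target_count false-positive boxes across all images.

    detections_per_image: one list of boxes per face-free image.
    image_sizes: matching list of (height, width) pairs.
    Returns a list of (image_index, box) pairs.
    """
    if not detections_per_image:
        return []
    rng = rng or random.Random()
    # Per-image quota: ceiling of target crop count / number of images.
    per_image = math.ceil(target_count / len(detections_per_image))
    master = []
    for index, (boxes, (height, width)) in enumerate(
            zip(detections_per_image, image_sizes)):
        # Throw out boxes that are not completely inside the image.
        valid = [b for b in boxes if box_inside(b, height, width)]
        chosen = rng.sample(valid, min(per_image, len(valid)))
        master.extend((index, box) for box in chosen)
    # Truncate the master list at random to the requested size.
    rng.shuffle(master)
    return master[:target_count]
```

Because the quota is rounded up, the master list can overshoot the target slightly, which is why the final random truncation is needed.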

Conclusions

Tweaking lambda and decreasing the step size and start scale results in massive accuracy gains for linear SVMs, at the cost of run-time performance.

Adding more false-positive training data helped increase the accuracy of the classifier when used with smaller step sizes and start scales.

Switching to a non-linear SVM gave us the greatest accuracy boost overall.

Iteratively retraining a non-linear SVM on mined training data gave a large accuracy boost.

Results

Some anecdotal results:

Linear SVM, lambda set to 100, using the traditional method for mining hard negatives: the final classifier had a dismal accuracy rate of roughly 7% or less.

Linear SVM, lambda set to 100, choosing the highest-confidence false positives as hard negatives: the final classifier had a morbid accuracy rate of 2%, which was worse than the original setup.

Linear SVM Results:

Test 1

No Hard Mining

Lambda set to 10

Default params

AP = 0.214
TPR: 0.471, FPR: 0.017, TNR: 0.483, FNR: 0.029

Test 2

No Hard Mining

Lambda set to 5

Default params

AP = 0.222
TPR: 0.473, FPR: 0.015, TNR: 0.485, FNR: 0.026

Test 3

Hard Mined Once

Lambda set to 5

Start Scale set to 2

AP = 0.421
Stage 1. TPR: 0.476, FPR: 0.015, TNR: 0.484, FNR: 0.024
Stage 2. TPR: 0.400, FPR: 0.013, TNR: 0.566, FNR: 0.021

Test 4

Hard Mined Twice

Lambda set to 5

Start Scale set to 1

Scale Factor set to 1.25

AP = 0.446
Stage 1. TPR: 0.473, FPR: 0.014, TNR: 0.486, FNR: 0.026
Stage 2. TPR: 0.354, FPR: 0.013, TNR: 0.613, FNR: 0.020
Stage 3. TPR: 0.352, FPR: 0.013, TNR: 0.613, FNR: 0.022
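A note on reading the numbers above: the four values reported for each stage sum to roughly 1 (for example, 0.471 + 0.017 + 0.483 + 0.029 = 1.000 in Test 1), which suggests they are fractions of all training samples rather than per-class rates. Assuming that reading, they could be computed along these lines. This is a Python sketch with hypothetical names, not the project's code.

```python
def confusion_fractions(scores, labels, threshold=0.0):
    """Return TP, FP, TN and FN counts as fractions of all samples.

    scores: classifier confidences; labels: +1 for face, -1 for non-face.
    The keys reuse the writeup's labels (TPR, FPR, TNR, FNR), although
    under this assumed reading they are normalized by the total count.
    """
    counts = {"TPR": 0, "FPR": 0, "TNR": 0, "FNR": 0}
    for score, label in zip(scores, labels):
        predicted_face = score > threshold
        if predicted_face:
            counts["TPR" if label > 0 else "FPR"] += 1
        else:
            counts["FNR" if label > 0 else "TNR"] += 1
    total = len(labels)
    return {name: count / total for name, count in counts.items()}
```

Under this reading, the drop in TPR and the rise in TNR between stages would partly reflect the growing share of negatives in the training set after hard negatives are added.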

Non-Linear SVM Results:

Test 1

No Hard Mining

Lambda set to 0.1

Default params

AP = 0.334
MATLAB crashed before I could retrieve more data.

Test 2

Hard Mined Once

Lambda set to 0.1

Start Scale set to 2

AP = 0.603
