CS143 Project 4

Face Detection with a Sliding Window

by rmcverry
11/13/11

Project Overview

The purpose of this project is to build a sliding-window face detector. Sliding-window face detectors have had many successful industry implementations. The overall process is simple: we examine every image patch and label each one as a face or not a face. For this project we were required to use either SIFT or HoG features; I chose HoG image descriptors to boost detection accuracy. Additionally, I iteratively retrained the classifier with mined hard negatives and compared it to a baseline trained on random negatives. Finally, I tested both linear and non-linear classifiers.
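To make the scanning step concrete, below is a minimal single-scale sketch of the sliding-window idea in MATLAB. It assumes a normalized grayscale image, some crop-to-feature function, and trained linear SVM weights (w, b); these names are purely illustrative, not the starter code's API. The actual detector also rescales the image (controlled by the start scale and scale factor parameters discussed later) and rescans at each scale.

    function [bboxes, confidences] = scan_image(img, crop_size, step, w, b)
        % Single-scale sliding-window scan (illustrative sketch).
        % img:       grayscale image, already normalized
        % crop_size: side length of the square window
        % step:      step size in pixels between windows
        % w, b:      trained linear SVM weights and bias
        [img_h, img_w] = size(img);
        bboxes = zeros(0, 4);
        confidences = zeros(0, 1);
        for y = 1:step:(img_h - crop_size + 1)
            for x = 1:step:(img_w - crop_size + 1)
                crop = img(y:y+crop_size-1, x:x+crop_size-1);
                feat = crop_to_feature(crop);      % e.g. a HoG descriptor
                conf = feat(:)' * w + b;           % linear SVM score
                if conf > 0                        % positive side of the hyperplane = "face"
                    bboxes(end+1, :) = [x, y, x+crop_size-1, y+crop_size-1]; %#ok<AGROW>
                    confidences(end+1, 1) = conf;  %#ok<AGROW>
                end
            end
        end
    end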

Face Detection Workflow:

  1. Get Features From Positive Training Data
    1. Load crops of faces
    2. Turn crops into features
  2. Train a Linear or Nonlinear SVM with Negative Training Data
    1. Load crops without faces
    2. Turn crops into features
    3. Train SVM with negative and positive data
  3. Mine Hard Negatives and Retrain the SVM
    1. Find false-positive detections and convert them into features
    2. Retrain the SVM with those features added to the negative set
    3. Repeat until satisfied
  4. Run the SVM on the test image set (see the sketch below for the full loop).
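As a rough sketch, the whole workflow above corresponds to a loop like the one below. All of the helper names (get_positive_features, get_random_negative_features, mine_hard_negatives, train_svm, run_detector) are placeholders standing in for the actual project functions rather than the exact starter-code API.

    % High-level sketch of the workflow above; all function names are placeholders.
    pos_feats = get_positive_features(pos_dir, crop_size);          % step 1
    neg_feats = get_random_negative_features(neg_dir, crop_size);   % step 2
    labels    = [ones(size(pos_feats, 1), 1); -ones(size(neg_feats, 1), 1)];
    [w, b]    = train_svm([pos_feats; neg_feats], labels, lambda);

    for stage = 1:num_mining_stages                                  % step 3
        hard_negs = mine_hard_negatives(neg_dir, w, b, num_crops);   % false-positive features
        neg_feats = [neg_feats; hard_negs];
        labels    = [ones(size(pos_feats, 1), 1); -ones(size(neg_feats, 1), 1)];
        [w, b]    = train_svm([pos_feats; neg_feats], labels, lambda);
    end

    run_detector(test_dir, w, b);                                    % step 4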



Crops To Features

The first design decision I made was to use HOG descriptors for my image features. To do this, I employed the MATLAB HOG script written by Oswaldo Ludwig (BSD license). In the interest of time I used the script's default settings, which divide each crop into 3x3 windows with 9 orientation bins per window (81 values total). Although I did not have time to tweak these values further, I found that the descriptor was fast to compute and had decent accuracy.
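Below is a small sketch of the crop-to-feature step, under the assumption that Ludwig's HOG.m is on the MATLAB path and, with the defaults above, returns an 81-element descriptor (3x3 windows x 9 bins) for a single grayscale crop; the variable names are illustrative.

    % Crop-to-feature sketch. Assumes image_crops is an h x w x N stack of
    % grayscale face crops and HOG.m (Oswaldo Ludwig) is on the path.
    num_crops = size(image_crops, 3);
    features = zeros(num_crops, 81);          % 3x3 windows x 9 bins = 81 values
    for i = 1:num_crops
        crop = im2double(image_crops(:, :, i));
        features(i, :) = HOG(crop)';          % descriptor returned as a column vector
    end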

Get Hard Negatives

I tried two different approaches to mining hard negatives. The first approach was to grab a bounded number of false-positive detections from each image, store them in a master list, randomly permute that list, keep the first num_crops detections, and convert them into crops. The second approach was to choose the false-positive detections with the highest confidence across all images and save those into a master list. After several initial tests, the first approach seemed to give better results.

A bit more on my first approach: before running the current classifier on each image without a face, we convert the image to grayscale and normalize it. We then run the current classifier; every detection is a false positive, and we use these detections to build new hard-negative features. For each detection we find its bounding box and throw it out if it is not completely inside the image. Next, we take only a few bounding boxes at random from each image, the count determined by the ceiling of the target crop count divided by the number of images in the directory. The chosen crops are added to a master list, which is later truncated at random to the correct size.
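A sketch of that sampling step is below. Here detect_faces() is a placeholder for running the current classifier over one image and returning [x1 y1 x2 y2] bounding boxes, and w, b are the current classifier's weights; none of these names are the project's actual functions.

    % Hard-negative sampling sketch (first approach).
    files = dir(fullfile(neg_dir, '*.jpg'));
    per_image = ceil(num_crops / length(files));     % a few boxes per image
    all_crops = zeros(crop_size, crop_size, 0);
    for i = 1:length(files)
        img = im2double(imread(fullfile(neg_dir, files(i).name)));
        if size(img, 3) == 3, img = rgb2gray(img); end
        img = (img - mean(img(:))) / std(img(:));    % normalize
        bboxes = detect_faces(img, w, b);            % every detection is a false positive
        [img_h, img_w] = size(img);
        inside = bboxes(:,1) >= 1 & bboxes(:,2) >= 1 & ...
                 bboxes(:,3) <= img_w & bboxes(:,4) <= img_h;
        bboxes = bboxes(inside, :);                  % drop boxes that leave the image
        pick = randperm(size(bboxes, 1), min(per_image, size(bboxes, 1)));
        for j = pick
            crop = img(bboxes(j,2):bboxes(j,4), bboxes(j,1):bboxes(j,3));
            all_crops = cat(3, all_crops, imresize(crop, [crop_size crop_size]));
        end
    end
    % truncate the master list at random to exactly num_crops
    keep = randperm(size(all_crops, 3), min(num_crops, size(all_crops, 3)));
    all_crops = all_crops(:, :, keep);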

Conclusions

  • Tweaking lambda and decreasing the step size and start scale results in large accuracy gains at the cost of run time for linear SVMs
  • Adding more false-positive training data helped increase the accuracy of the classifier when used with smaller step sizes and start scales.
  • Switching to a non-linear SVM gave us the greatest accuracy boost overall.
  • Iteratively retraining the non-linear SVM on mined hard negatives gave a further large accuracy boost
Results

Some anecdotal results:

  • Linear SVM with lambda set to 100, using the traditional method for mining hard negatives: the final classifier had a dismal accuracy rate of below ~7%.
  • Linear SVM with lambda set to 100, choosing the highest-confidence false positives as hard negatives: the final classifier had an even worse accuracy rate of 2%.

    Linear SVM Results:

    Test 1
  • No Hard Mining
  • Lambda set to 10
  • Default params
  • AP = 0.214
    TPR: 0.471, FPR: 0.017, TNR: 0.483, FNR: 0.029
    Test 2
  • No Hard Mining
  • Lambda set to 5
  • Default params
  • AP = 0.222
    TPR: 0.473, FPR: 0.015, TNR: 0.485, FNR: 0.026
    Test 3
  • Hard Mined Once
  • Lambda set to 5
  • Start Scale set to 2
  • AP = 0.421
    Stage 1: TPR: 0.476, FPR: 0.015, TNR: 0.484, FNR: 0.024
    Stage 2: TPR: 0.400, FPR: 0.013, TNR: 0.566, FNR: 0.021
    Test 4
  • Hard Mined Twice
  • Lambda set to 5
  • Start Scale set to 1
  • Scale Factor set to 1.25
  • AP = 0.446
    Stage 1: TPR: 0.473, FPR: 0.014, TNR: 0.486, FNR: 0.026
    Stage 2: TPR: 0.354, FPR: 0.013, TNR: 0.613, FNR: 0.020
    Stage 3: TPR: 0.352, FPR: 0.013, TNR: 0.613, FNR: 0.022

    Non-Linear SVM Results:

    Test 1
  • No Hard Mining
  • Lambda set to 0.1
  • Default params
  • AP = 0.334
    MATLAB crashed before I could retrieve more data.
    Test 2
  • Hard Mined Once
  • Lambda set to 0.1
  • Start Scale set to 2
  • AP = 0.603