Nicholas Ragosta

CSCI 1430 Project 4

Goal:
Our goal is to implement a sliding window face detector and to improve its accuracy by mining hard negatives.

Method:

Obtaining a Feature Representation of Cropped Images

  1. Crops2features.m takes in a cropped image and outputs a feature space representation of the image patch. Our classifier is trained with these feature space representations of our training images and then classifies test images based on their feature space representations.
  2. An off-the-shelf HOG descriptor was used to generate feature space representations of our image patches. The descriptor used allows us to modify the number of windows per bounding box and the number of bins used.
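To make the two tunable parameters concrete, here is a minimal sketch of a HOG-style descriptor. It is not the off-the-shelf descriptor used in this project: the function name `hog_features` and the parameters `n_windows` and `n_bins` are hypothetical, and the per-cell L2 normalization is a simplification (full HOG implementations typically normalize over overlapping blocks).

```python
import math

def hog_features(patch, n_windows=3, n_bins=9):
    """Simplified HOG-style descriptor for a 2-D grayscale patch (list of rows).

    Assumption: the patch is split into n_windows x n_windows cells, and each
    cell gets one magnitude-weighted histogram of unsigned gradient orientations.
    """
    height, width = len(patch), len(patch[0])
    hists = [[0.0] * n_bins for _ in range(n_windows * n_windows)]
    for y in range(height):
        for x in range(width):
            # Central differences, clamped at the patch border.
            gx = patch[y][min(x + 1, width - 1)] - patch[y][max(x - 1, 0)]
            gy = patch[min(y + 1, height - 1)][x] - patch[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            # Unsigned orientation in [0, pi), mapped to one of n_bins bins.
            angle = math.atan2(gy, gx) % math.pi
            bin_index = min(int(angle / math.pi * n_bins), n_bins - 1)
            # Cell (window) that this pixel falls into.
            cell_row = min(y * n_windows // height, n_windows - 1)
            cell_col = min(x * n_windows // width, n_windows - 1)
            hists[cell_row * n_windows + cell_col][bin_index] += magnitude
    features = []
    for hist in hists:
        # L2-normalize each cell histogram; empty cells stay all zero.
        norm = math.sqrt(sum(v * v for v in hist)) or 1.0
        features.extend(v / norm for v in hist)
    return features
```

In this sketch the feature length is n_windows * n_windows * n_bins, so 3 x 3 windows with 9 bins give 81 values and 9 x 9 windows give 729. The actual descriptor may lay out its output differently.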

Initial Training: Learning from Positives and Random Negatives

  1. First our classifier is trained with known face images. This supplies our classifier with a set of feature space representations of positive examples.
  2. Next we train with random images that do not contain faces, which supplies our classifier with a set of negative examples.
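The initial training step can be sketched as fitting a linear SVM to positive and negative feature vectors. The code below is an illustration under stated assumptions, not the SVM package used in the project: `train_linear_svm`, `svm_score`, the fixed learning rate, and the 2-D toy features are all hypothetical stand-ins for real HOG features.

```python
def train_linear_svm(features, labels, lam=0.01, lr=0.1, epochs=200):
    """Fit a linear SVM (hinge loss + L2 penalty) by subgradient descent.

    labels are +1 (face) or -1 (non-face). Returns (weights, bias).
    lam, lr and epochs are illustrative values, not tuned settings.
    """
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The L2 penalty shrinks the weights on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # Hinge loss is active: move the boundary toward this example.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def svm_score(w, b, x):
    """Signed decision value; positive means 'face' in this sketch."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Toy 2-D "features": three positives and three random negatives.
positives = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0]]
negatives = [[-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
train_x = positives + negatives
train_y = [1] * len(positives) + [-1] * len(negatives)
weights, bias = train_linear_svm(train_x, train_y)
```

After training, `svm_score(weights, bias, x)` is positive for points near the positive examples and negative for points near the negative ones.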

Mining Hard Negatives

  1. The accuracy of our classifier is increased with several rounds of training by mining hard negatives.
  2. get_hard_negatives.m finds false positives, which are then fed into the SVM. Hard negatives are obtained by running our classifier on a set of scenes known to contain no faces, so any face detection the classifier returns is a false positive.
  3. Specifics of get_hard_negatives.m: The directory of non-face scenes is passed to get_hard_negatives.m, and each image is run through get_detections_all_scales.m, which returns the coordinates of each detected face. These coordinates are converted into a bounding box with detection2bbox.m, which returns the xmin, xmax, ymin and ymax coordinates that define the box. xmin and ymin are checked to ensure that they are not negative and that they lie within the bounds of the image; if the bounding box lies outside the image, it is shifted to lie within the bounds. Finally, the bounds of each detection are fed into bboxes2crops.m, which returns a crop of each detection of size patch_size.
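The mining steps above can be sketched as follows. This is a simplified illustration, not a port of get_hard_negatives.m: the names `shift_bbox_into_image`, `crop_patch` and `mine_hard_negatives` are hypothetical, coordinates are assumed to be 0-based and inclusive, each box is assumed to be no larger than the image, and the resize to patch_size done by bboxes2crops.m is left out.

```python
def shift_bbox_into_image(bbox, img_width, img_height):
    """Translate a box so it lies inside the image, keeping its size.

    bbox is (xmin, ymin, xmax, ymax), 0-based and inclusive (an assumption).
    Assumes the box is no larger than the image.
    """
    xmin, ymin, xmax, ymax = bbox
    # Shift right/down if the box starts before the first pixel.
    dx = -xmin if xmin < 0 else 0
    dy = -ymin if ymin < 0 else 0
    # Shift left/up if the box ends past the last pixel.
    if xmax + dx > img_width - 1:
        dx = (img_width - 1) - xmax
    if ymax + dy > img_height - 1:
        dy = (img_height - 1) - ymax
    return (xmin + dx, ymin + dy, xmax + dx, ymax + dy)

def crop_patch(scene, bbox):
    """Cut the inclusive box out of a scene stored as a list of rows."""
    xmin, ymin, xmax, ymax = bbox
    return [row[xmin:xmax + 1] for row in scene[ymin:ymax + 1]]

def mine_hard_negatives(scenes, detect):
    """Collect a crop for every detection in scenes known to contain no faces.

    detect(scene) is a hypothetical callable returning a list of boxes;
    every box it returns is a false positive by construction.
    """
    hard_negatives = []
    for scene in scenes:
        height, width = len(scene), len(scene[0])
        for bbox in detect(scene):
            bbox = shift_bbox_into_image(bbox, width, height)
            hard_negatives.append(crop_patch(scene, bbox))
    return hard_negatives
```

The returned crops would then be converted to features and added to the negative training set before retraining.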

Testing Time: Sliding Window Classifier

  1. During testing, a window of patch_size is moved across each test image. The feature space of each patch is fed into our classifier to search for faces in the scene.
  2. Our classifier compares the feature space representation of the patch to what it has been trained to believe is the feature space representation of a face and non-face and makes a decision about whether or not the patch contains a face.
  3. The accuracy of our classifier is visualized with a precision-recall curve.
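As a rough sketch of the test-time procedure, the code below enumerates window positions at a single scale and computes precision and recall from scored detections. The names `sliding_windows` and `precision_recall_curve` and the `stride` parameter are assumptions for illustration; the project itself detects at multiple scales via get_detections_all_scales.m, which this sketch does not reproduce.

```python
def sliding_windows(img_width, img_height, patch_size, stride=1):
    """Yield the top-left corner (xmin, ymin) of every window that fits.

    stride is a hypothetical parameter; the step size is not given in the text.
    """
    for ymin in range(0, img_height - patch_size + 1, stride):
        for xmin in range(0, img_width - patch_size + 1, stride):
            yield xmin, ymin

def precision_recall_curve(scores, is_true_face, n_faces):
    """Precision and recall after each detection, in descending score order.

    scores[i] is the classifier confidence of detection i, is_true_face[i]
    says whether it matches a ground-truth face, and n_faces is the total
    number of ground-truth faces.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    precisions, recalls = [], []
    true_positives = 0
    for rank, i in enumerate(order, start=1):
        if is_true_face[i]:
            true_positives += 1
        precisions.append(true_positives / rank)
        recalls.append(true_positives / n_faces)
    return precisions, recalls
```

Each window position would be cropped, converted to features, and scored by the classifier; the scored detections then feed the precision and recall computation.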

Discussion:

Altering HOG Parameters
Modifying the HOG parameters produced a modest increase in performance. The effect of modifying the number of bins was not investigated. The results of changing the number of windows per bounding box can be seen below. The accuracies reported are the results of testing with a linear classifier, without mining hard negatives. Increasing the number of windows per bounding box from 3 x 3 to 9 x 9 raised accuracy from 17.1% to 37.6%.

Linear Classifier. No hard negatives mined

9 bins, 3 x 3 Windows. 9 bins, 5 x 5 Windows. 9 bins, 9 x 5 Windows.

Linear vs. Nonlinear
The results of using a linear and a nonlinear classifier for different HOG parameters are displayed below. A large increase in performance was achieved by switching from linear to nonlinear when the HOG parameters were 3 x 3 windows and 9 bins.

3 x 3 Windows, 9 Bins. No Hard Negatives Mined.

Linear Classifier. Nonlinear Classifier.
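One way to see why a nonlinear classifier can outperform a linear one is to compare kernel decision functions on data that no straight line can separate. The sketch below is a toy illustration only: the write-up does not say which nonlinear kernel was used, so the RBF kernel, the XOR-style points, and the hand-picked coefficients are all assumptions.

```python
import math

def linear_kernel(a, b):
    """Plain dot product."""
    return sum(ai * bi for ai, bi in zip(a, b))

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel; gamma is an illustrative value."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)

def kernel_svm_score(x, support_vectors, labels, alphas, bias, kernel):
    """Decision value: bias + sum_i alpha_i * y_i * K(sv_i, x)."""
    return bias + sum(a * y * kernel(sv, x)
                      for sv, y, a in zip(support_vectors, labels, alphas))

# XOR-style toy data: opposite corners share a label.
points = [(0, 0), (1, 1), (0, 1), (1, 0)]
labels = [-1, -1, 1, 1]
alphas = [1.0, 1.0, 1.0, 1.0]  # hand-picked for illustration, not trained

rbf_scores = [kernel_svm_score(p, points, labels, alphas, 0.0, rbf_kernel)
              for p in points]
linear_scores = [kernel_svm_score(p, points, labels, alphas, 0.0, linear_kernel)
                 for p in points]
```

With these coefficients every RBF score has the same sign as its label, while every linear score is exactly zero. More generally, no linear decision boundary separates this point set.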

Mining Hard Negatives
Mining hard negatives further increased the accuracy of our classifier. A single round of mining hard negatives increased the accuracy of the linear classifier from 17.1% to 22.1%. Mining hard negatives seemed to have a less significant effect on the performance of the nonlinear classifier. The results of mining for 1 and 3 rounds are shown below.

Linear Classifier. 3 x 3 Windows, 9 Bins.

Without Mining Hard Negatives. One Round of Mining Hard Negatives.

Nonlinear Classifier. 3 x 3 Windows, 9 Bins.

Without Mining Hard Negatives. One Round of Mining Hard Negatives. Three Rounds of Mining Hard Negatives.