CS1430 Project 4:
Face detection with a sliding window

Bryan Tyler Parker

Overview

Our assignment was to use a sliding window model for face detection: independently classify every image patch as either a face or a non-face. This was done by first learning features from a set of known face crops, then learning negative features from sets of images containing no faces. The resulting classifier was then used to detect faces in arbitrary images.
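The sliding window idea can be sketched in a few lines. This is a minimal illustration, not the assignment's actual code: `classify` stands in for any trained classifier that maps a patch to a real-valued confidence, and the window size, step, and threshold names are all illustrative.

```python
import numpy as np

def sliding_window_detect(image, classify, win=36, step=6, threshold=0.0):
    """Slide a win x win window over the image and classify each patch.

    `classify` is any function mapping a (win, win) patch to a confidence
    score; patches scoring above `threshold` are kept as detections.
    """
    detections = []
    h, w = image.shape
    for y in range(0, h - win + 1, step):
        for x in range(0, w - win + 1, step):
            patch = image[y:y + win, x:x + win]
            score = classify(patch)
            if score > threshold:
                detections.append((x, y, score))
    return detections
```

In practice this loop is also run at multiple image scales so that faces larger than the window size can be found.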

Algorithm

Baseline

First, I extracted features from a set of training face crops (using my own implementation of HoG; more on that later). Then, negative training examples were learned. On the first stage pass, random crops from images known to contain no faces were used. On later stages, hard negatives were extracted by running the current cascade detector on the negative images, and any detections above a certain confidence threshold were used as training examples.
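The hard-negative step above amounts to a short mining loop. This is a hedged sketch, not the report's actual code: `detect` stands in for the current-stage detector and is assumed to return (patch, score) pairs.

```python
def mine_hard_negatives(negative_images, detect, confidence_threshold=0.9):
    """Run the current detector on face-free images; any windows it still
    fires on above the threshold are 'hard negatives' for the next stage.

    `detect` is assumed to return (patch, score) pairs for one image.
    """
    hard = []
    for img in negative_images:
        for patch, score in detect(img):
            if score > confidence_threshold:
                hard.append(patch)  # detector was confidently wrong here
    return hard
```

Retraining on these mistakes focuses the next stage on exactly the background patterns the current detector confuses with faces.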

Random Negatives - Stage 1

Hard Negatives - Stage 2

The first image shows the cascade trained with just random negatives as the negative training set, and the second shows another pass with hard negatives (note: for visualization and ease of testing, I used the naive implementation where features = raw crops). Notice how the face becomes more defined, but the background becomes noisier.

Histogram of Oriented Gradients: Extra/Grad Credit

For extra/grad credit, I implemented my own HoG (Histogram of Oriented Gradients) feature descriptor. Given a crop, it first runs gradient filters to get a derivative image, which is then further processed into a magnitude image (how strong each edge is) and an angle image (what direction each edge points in). The crop is then subdivided into blocks of cells, e.g. blocks of 3x3 cells of a certain pixel size. Over each cell, the corresponding subsection of the angle image is binned into orientation channels, and the cell casts a 'vote' per channel by summing the magnitudes of the pixels falling in that bin. Finally, these histograms are normalized per cell and concatenated into the HoG descriptor.

HoG Descriptor visualized

Results

num_negative_examples values

First, I experimented with different values of num_negative_examples, using 2 stages and a linear classifier with lambda = 100. Here are my results:


num_negative_examples = 10 - cascade

num_negative_examples = 10 - result

num_negative_examples = 100 - cascade

num_negative_examples = 100 - result

num_negative_examples = 1000 - cascade

num_negative_examples = 1000 - result

WINNER num_negative_examples = 10000 - cascade

num_negative_examples = 10000 - result

As one can see, the cascade becomes more and more defined, but with diminishing returns.

Linear vs. Non-linear

Here are my results (num_negative_examples = 100, stages = 2, naive feature detector):


Linear lambda = 50

Linear lambda = 100

Linear lambda = 200



Non-Linear lambda = 1, sig = 0.5

WINNER Non-Linear lambda = 1, sig = 1

Non-Linear lambda = 1, sig = 2

Non-Linear lambda = 10, sig = 1

Testing stages (random negatives and hard negatives)

Here are my results (num_negative_examples = 1000, confidence_threshold = 0.9, naive feature detector):


Stage 1 (random negatives, no confidence threshold)

Stage 1 (random negatives, no confidence threshold)







Stage 2

Stage 2







Stage 3

Stage 3

Judging from these (and from some other tests I did with the HoG), hard negatives make the features more defined but add noise. Here, adding a third stage was outright detrimental, though I found that with 2 stages it at least defines the face better.







Histogram of Oriented Gradients: Extra/Grad Credit

Now for the good stuff:


Overlap: 0.5, Pixels in Cell: 6x6, Cells per block: 3x3

Overlap: 0.5, Pixels in Cell: 6x6, Cells per block: 3x3







Overlap: 0.0, Pixels in Cell: 6x6, Cells per block: 3x3

Overlap: 0.0, Pixels in Cell: 6x6, Cells per block: 3x3







Overlap: 0.0, Pixels in Cell: 3x3, Cells per block: 3x3

Overlap: 0.0, Pixels in Cell: 3x3, Cells per block: 3x3






WINNER


Overlap: 0.0, Pixels in Cell: 3x3, Cells per block: 4x4

Overlap: 0.0, Pixels in Cell: 3x3, Cells per block: 4x4







Overlap: 0.0, Pixels in Cell: 3x3, Cells per block: 5x5

Overlap: 0.0, Pixels in Cell: 3x3, Cells per block: 5x5






Here, the winner is no overlap between blocks, a relatively small cell of 3x3 pixels, and a somewhat larger block of 4x4 cells. This contrasts with the 0.5 overlap, 6x6-pixel cells, and 3x3-cell blocks that the paper recommends, but my implementation likely differs slightly from theirs, and they used 24x24-pixel crops rather than 36x36.