CS1430 Project 4:
Face detection with a sliding window
Overview
Our assignment was to use a sliding window model for face detection: independently classify every image patch as face or non-face. This was done by first learning features from a set of known face crops, then learning negative examples from sets of images containing no faces. The resulting classifier was then used to detect faces in arbitrary images.
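To make the detection loop concrete, here is a minimal single-scale sketch in Python/NumPy. The names `classifier` and `extract_features` are hypothetical stand-ins for the trained classifier and the feature pipeline described below; the window size, step, and threshold are illustrative.

```python
import numpy as np

def sliding_window_detect(image, classifier, extract_features,
                          win_size=36, step=4, threshold=0.0):
    """Classify every window of a grayscale image as face / non-face.

    `classifier` and `extract_features` are hypothetical stand-ins
    for the trained model and feature extractor described in the text.
    """
    H, W = image.shape
    detections = []  # (row, col, confidence) for each accepted window
    for r in range(0, H - win_size + 1, step):
        for c in range(0, W - win_size + 1, step):
            crop = image[r:r + win_size, c:c + win_size]
            feat = extract_features(crop)
            conf = classifier.decision_function(feat[None, :])[0]
            if conf > threshold:
                detections.append((r, c, conf))
    return detections
```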
Algorithm
Baseline
First, I extracted features from a set of training face crops (for which I used my own implementation of HoG, more on that later). Then, negative training examples were gathered. On the first stage pass, random crops from images known to contain no faces were used. On later passes, hard negatives were extracted by running the current cascade detector on those negative images, and any detections above a certain confidence threshold were added to the training set.
![]() Random Negatives - Stage 1
![]() Hard Negatives - Stage 2
The first image shows the cascade trained with only random negatives as the negative training set, and the second shows another pass with hard negatives (note: for visualization and ease of testing, I used the naive feature representation, i.e. raw crops as features). Notice how the face becomes more defined, but the background becomes noisier.
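To make the two negative-mining regimes concrete, here is a sketch of the logic; `detector` and `extract_features` are hypothetical stand-ins, and the default window size and threshold mirror the numbers used elsewhere in this report.

```python
import numpy as np

def mine_negatives(stage, nonface_images, detector, extract_features,
                   num_negative_examples, win_size=36, conf_thresh=0.9):
    """Stage 1: random crops from face-free images.
    Later stages: hard negatives, i.e. confident false detections."""
    rng = np.random.default_rng(0)
    feats = []
    if stage == 1:
        # random crops: any window from a face-free image is a negative
        while len(feats) < num_negative_examples:
            img = nonface_images[rng.integers(len(nonface_images))]
            H, W = img.shape
            r = rng.integers(H - win_size + 1)
            c = rng.integers(W - win_size + 1)
            feats.append(extract_features(img[r:r + win_size, c:c + win_size]))
    else:
        # hard negatives: run the current detector; every confident hit
        # on a face-free image is by construction a false positive
        for img in nonface_images:
            for (r, c, conf) in detector(img):
                if conf > conf_thresh:
                    feats.append(extract_features(img[r:r + win_size, c:c + win_size]))
    return np.vstack(feats[:num_negative_examples])
```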
Histogram of Oriented Gradients: Extra/Grad Credit
For extra/grad credit, I implemented my own HoG (Histogram of Oriented Gradients) feature descriptor. Given a crop, it first runs gradient filters to get derivative images, which are further processed into a magnitude image (how strong the edge is) and an angle image (what direction the edge points in). The image is then subdivided into blocks of cells, for instance blocks of 3x3 cells of a certain pixel size. Over each cell, the angle values are binned into histogram channels, with each pixel casting a 'vote' for its bin weighted by its gradient magnitude. Finally, these histograms are normalized per cell and appended to the HoG descriptor.
![]() HoG Descriptor visualized
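Below is a minimal NumPy sketch along the lines described above; the gradient filter, binning, and normalization details are my own assumptions rather than a transcription of the actual project code, and the default parameters match one of the configurations tested later.

```python
import numpy as np

def hog_descriptor(crop, cell_px=6, cells_per_block=3, n_bins=9, overlap=0.5):
    """Minimal HoG sketch: gradients -> magnitude/angle -> per-cell
    magnitude-weighted orientation histograms, normalized per cell."""
    crop = crop.astype(float)
    # gradient filters -> derivative images
    gx = np.zeros_like(crop)
    gy = np.zeros_like(crop)
    gx[:, 1:-1] = crop[:, 2:] - crop[:, :-2]
    gy[1:-1, :] = crop[2:, :] - crop[:-2, :]
    mag = np.hypot(gx, gy)                      # how strong the edge is
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180  # unsigned edge direction

    block_px = cell_px * cells_per_block
    stride = max(1, int(round(block_px * (1 - overlap))))
    descriptor = []
    H, W = crop.shape
    for br in range(0, H - block_px + 1, stride):      # slide blocks
        for bc in range(0, W - block_px + 1, stride):
            for cr in range(cells_per_block):          # cells per block
                for cc in range(cells_per_block):
                    r0, c0 = br + cr * cell_px, bc + cc * cell_px
                    m = mag[r0:r0 + cell_px, c0:c0 + cell_px].ravel()
                    a = ang[r0:r0 + cell_px, c0:c0 + cell_px].ravel()
                    # each pixel votes for its orientation bin,
                    # weighted by its gradient magnitude
                    hist, _ = np.histogram(a, bins=n_bins,
                                           range=(0, 180), weights=m)
                    hist /= np.linalg.norm(hist) + 1e-6  # per-cell norm
                    descriptor.append(hist)
    return np.concatenate(descriptor)
```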
Results
num_negative_examples sizes
First, I experimented with different num_negative_examples sizes, using 2 stages and a linear classifier with lambda = 100. Here are my results:
![]() num_negative_examples = 10 - cascade
![]() num_negative_examples = 10 - result
![]() num_negative_examples = 100 - cascade
![]() num_negative_examples = 100 - result
![]() num_negative_examples = 1000 - cascade
![]() num_negative_examples = 1000 - result
![]() WINNER num_negative_examples = 10000 - cascade
![]() num_negative_examples = 10000 - result
As one can see, the cascade becomes more and more defined, but with diminishing returns.
Linear vs. Non-linear
Here are my results (num_negative_examples = 100, stages = 2, naive feature detector); a sketch of how the two classifier types could be trained follows the figures:
![]() Linear lambda = 50
![]() Linear lambda = 100
![]() Linear lambda = 200
![]() Non-Linear lambda = 1, sig = 0.5
![]() WINNER Non-Linear lambda = 1, sig = 1
![]() Non-Linear lambda = 1, sig = 2
![]() Non-Linear lambda = 10, sig = 1
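As a rough sketch of the two setups, here is how the linear and non-linear (RBF-kernel) classifiers could be trained with scikit-learn. The mapping from this report's lambda to C = 1/lambda, and from sig to gamma = 1/(2*sig^2), is my assumption about how the regularization and bandwidth parameters correspond, not the actual project code.

```python
from sklearn.svm import SVC, LinearSVC

def train_classifier(X, y, kind="linear", lam=100.0, sig=1.0):
    """X: (n_samples, n_features) feature matrix; y: +1/-1 labels.

    Assumed parameter mapping (hypothetical): lambda -> C = 1/lambda,
    sig -> RBF bandwidth, i.e. gamma = 1/(2 * sig**2)."""
    if kind == "linear":
        clf = LinearSVC(C=1.0 / lam)
    else:
        clf = SVC(kernel="rbf", C=1.0 / lam, gamma=1.0 / (2 * sig ** 2))
    return clf.fit(X, y)
```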
Testing stages (random negatives and hard negatives)
Here are my results (num_negative_examples = 1000, confidence_threshold = 0.9, naive feature detector):
![]() Stage 1 (random negatives, no confidence threshold)
![]() Stage 1 (random negatives, no confidence threshold)
![]() Stage 2
![]() Stage 2
![]() Stage 3
![]() Stage 3
Judging from these (and from some other tests I did with the HoG features), hard negatives make the features more defined, but add noise. Here it was outright detrimental, though I found that with 2 stages the hard negatives at least define the face better.
Histogram of Oriented Gradients: Extra/Grad Credit
Now for the good stuff:
![]() Overlap: 0.5, Pixels in Cell: 6x6, Cells per block: 3x3
![]() Overlap: 0.5, Pixels in Cell: 6x6, Cells per block: 3x3
![]() Overlap: 0.0, Pixels in Cell: 6x6, Cells per block: 3x3
![]() Overlap: 0.0, Pixels in Cell: 6x6, Cells per block: 3x3
![]() Overlap: 0.0, Pixels in Cell: 3x3, Cells per block: 3x3
![]() Overlap: 0.0, Pixels in Cell: 3x3, Cells per block: 3x3
![]() Overlap: 0.0, Pixels in Cell: 3x3, Cells per block: 4x4
![]() Overlap: 0.0, Pixels in Cell: 3x3, Cells per block: 4x4
![]() Overlap: 0.0, Pixels in Cell: 3x3, Cells per block: 5x5
![]() Overlap: 0.0, Pixels in Cell: 3x3, Cells per block: 5x5
Here, the winner is no block overlap, a relatively small cell of 3x3 pixels, and a somewhat larger block of 4x4 cells. This contrasts with the 0.5 overlap, 6x6-pixel cells, and 3x3-cell blocks that the paper recommends, but of course my implementation likely differs slightly, and I used 36x36-pixel crops rather than the paper's 24x24.