Example face detection results from this project. The neutral poses are intended to be easy for a face detector, but there are still three false negatives.

Project 4: Face detection with a sliding window
CS 143: Introduction to Computer Vision



The sliding window model is conceptually simple: independently classify all image patches as being object or non-object. Sliding window classification is the dominant paradigm in object detection and for one object category in particular -- faces -- it is one of the most noticeable successes of computer vision. For example, modern cameras and photo organization tools have prominent face detection capabilities.

For this project you will implement a sliding window face detector, incorporating concepts from two high-impact papers in object detection. It is recommended that you look over the following papers, as well as the earlier Rowley et al. 1998. These papers are all very influential and relatively easy to read.

Not surprisingly, the pipelines are complementary. Using the strong classifiers and strong features together will result in better performance. Common to all three of the referenced papers is the concept of "mining" hard negatives to improve detection accuracy. You will implement a hard negative mining strategy and assess its impact.


An object detection system is more complex than the previous projects and thus the stencil code is more complete. Your requirements will focus on the three key elements of object detection systems: (1) representation, (2) strategies for utilizing training data, and (3) classification methods. In particular you are required to:

Details and Starter Code

The following is an outline of the stencil code:

Step 2 is where hard negatives are mined except on the first pass where there is no initial classifier and thus random negatives are returned. Mining hard negatives is conceptually simple -- run the classifier on scenes which have no faces and every detection is a false positive. You will often find more false positives than your classifier can use. In this case, randomly subsample the hard negatives.
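The mining step described above can be sketched in a few lines. This is a minimal illustration in Python rather than the MATLAB stencil code; `mine_hard_negatives`, `score_patch`, `max_negatives`, and the zero decision threshold are hypothetical names and values chosen for the example, not part of the starter code.

```python
import random

def mine_hard_negatives(scene_patches, score_patch, max_negatives,
                        threshold=0.0, seed=0):
    """Return at most max_negatives negative training patches.

    scene_patches: patches cut from face-free scenes, so all are true negatives.
    score_patch:   hypothetical callable giving a classifier confidence,
                   or None on the first pass when no classifier exists yet.
    """
    rng = random.Random(seed)
    if score_patch is None:
        # First pass: no classifier, so any patch is a candidate (random negatives).
        candidates = list(scene_patches)
    else:
        # Every detection in a face-free scene is a false positive.
        candidates = [p for p in scene_patches if score_patch(p) > threshold]
    if len(candidates) > max_negatives:
        # More hard negatives than the classifier can use: subsample randomly.
        candidates = rng.sample(candidates, max_negatives)
    return candidates
```

Passing `score_patch=None` reproduces the first-pass behavior, where random negatives are returned.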

Step 3 is where you build a feature representation for arbitrary image patches. For a quick improvement to the stencil code detector, try normalizing the image patches in crops2features.m. As with any feature change, you may then need to adjust the learning parameters. To get full credit and high accuracy you must use a feature such as SIFT or HoG. You can implement such features yourself for extra credit, but you can also use off-the-shelf implementations such as those in vl_feat. You do not necessarily need to use crops as an intermediate representation before building your strong features. For example, you can use vl_dsift to get all SIFT descriptors for your detector, as long as the SIFT parameters match those used for the cropped positive examples.
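One common way to normalize a patch is to shift it to zero mean and scale it to unit variance, which makes the feature insensitive to brightness and contrast changes. The handout does not specify which normalization to use, so treat the following Python sketch as one assumption among several possibilities; `normalize_patch` and `eps` are hypothetical names.

```python
import math

def normalize_patch(patch, eps=1e-8):
    """Flatten a grayscale patch (list of rows) into a zero-mean,
    unit-variance feature vector."""
    pixels = [float(v) for row in patch for v in row]
    mean = sum(pixels) / len(pixels)
    var = sum((v - mean) ** 2 for v in pixels) / len(pixels)
    # eps guards against division by zero on constant patches.
    scale = 1.0 / (math.sqrt(var) + eps)
    return [(v - mean) * scale for v in pixels]
```

With this choice, a patch and a brighter, higher-contrast copy of it map to nearly the same feature vector.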

For step 4, code is provided to train linear and non-linear classifiers. A linear SVM can be trained with huge amounts of positive and negative examples (hundreds of thousands), but the classifier it learns is simple -- effectively a hyperplane in high-dimensional space. A non-linear SVM can be trained on only a few thousand positive and negative examples (memory consumption is quadratic with respect to the number of training examples because of the need to construct a kernel matrix). However, the decision boundary learned by the non-linear SVM can be arbitrarily complex and the classifier will perform much better than a linear classifier if tuned correctly.
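To see why the quadratic memory cost limits the non-linear SVM to a few thousand examples, it helps to work through the arithmetic. The sketch below assumes a dense kernel matrix of 8-byte double-precision entries; the provided training code may store it differently.

```python
def kernel_matrix_bytes(num_examples, bytes_per_entry=8):
    """Memory for a dense num_examples x num_examples kernel matrix."""
    return num_examples ** 2 * bytes_per_entry

for n in (5_000, 50_000, 500_000):
    gigabytes = kernel_matrix_bytes(n) / 1e9
    print(f"{n:>7} examples -> {gigabytes:,.1f} GB")
```

Under these assumptions the matrix needs roughly 0.2 GB for 5,000 examples but 20 GB for 50,000, so a tenfold increase in training data costs a hundredfold increase in memory.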

Step 6 will depend on your particular strategy for mining hard negatives or building a classifier cascade. You are free to experiment with any stopping criteria. For instance, Dalal-Triggs only mines hard negatives once. Viola-Jones iterates many more times, adding cascade stages until no more hard negatives can be found.
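One possible stopping rule is to retrain until mining finds no new hard negatives or a round limit is reached. The Python sketch below retrains a single classifier each round and is not a cascade; `train`, `mine`, and `max_rounds` are hypothetical placeholders for your own training and mining code.

```python
def train_with_mining(train, mine, initial_negatives, max_rounds=5):
    """Alternate training and hard negative mining.

    train: callable(negatives) -> classifier (positives are assumed fixed).
    mine:  callable(classifier) -> list of new hard negatives.
    Returns the final classifier and the number of mining rounds used.
    """
    negatives = list(initial_negatives)
    classifier = train(negatives)
    rounds = 0
    for _ in range(max_rounds):
        hard = mine(classifier)
        if not hard:
            break  # no false positives left to learn from
        negatives.extend(hard)
        classifier = train(negatives)
        rounds += 1
    return classifier, rounds
```

Setting `max_rounds=1` mines only once, while a large `max_rounds` keeps iterating until no more hard negatives can be found.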

For step 7, the starter code provides a multi-scale detector. The detector breaks an image into patches and runs the trained classifier on each patch. There are parameters for step size (or stride) and the ratio between scales. You can modify this function to mine hard negatives.
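The roles of the stride and scale ratio can be illustrated by enumerating the windows a multi-scale detector would visit. This Python sketch is not the starter code's detector: the function name and the default stride and scale ratio are assumptions, and only the 36-pixel patch size comes from the training crops.

```python
def sliding_windows(img_h, img_w, patch=36, stride=4, scale_ratio=0.8):
    """Yield (top, left, size) boxes in original-image pixels.

    The image is conceptually shrunk by scale_ratio at each level, so a
    fixed patch x patch window covers a progressively larger region of
    the original image.
    """
    scale = 1.0
    while min(img_h, img_w) * scale >= patch:
        h, w = int(img_h * scale), int(img_w * scale)
        for top in range(0, h - patch + 1, stride):
            for left in range(0, w - patch + 1, stride):
                # Map the window back to original-image coordinates.
                yield (round(top / scale), round(left / scale),
                       round(patch / scale))
        scale *= scale_ratio
```

A smaller stride or a scale ratio closer to 1 gives denser coverage at the cost of more patches to classify.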

Steps 8, 9, and 10 are provided for you and you are unlikely to need to modify them.

The stencil code also contains a script, detect_class_photos.m, to run a classifier on the class photos.


The choice of training data is critical for this task. While an object detection system would ideally be trained and tested on a single database (as in the Pascal VOC challenge), face detection papers have traditionally trained on heterogeneous, even proprietary, datasets. As with most of the literature, we will use three databases: (1) positive training crops, (2) non-face scenes to mine for negative training data, and (3) test scenes with ground truth face locations.

You are provided with a positive training database of 6,713 cropped 36x36 faces from the Caltech Web Faces project. We arrived at this subset by filtering away faces which were not high enough resolution, upright, or front facing. There are many additional databases available. For example, see Figure 3 in Huang et al. and the LFW database described in the paper. You are free to experiment with additional or alternative training data for extra credit.

Non-face scenes are easy to collect. We provide a small database of such scenes from Wu et al. and the SUN scene database. You can add more non-face training scenes, although you are unlikely to need more negative training data unless you are building a cascade architecture.

The most common benchmark for face detection is the CMU+MIT test set. This test set contains 130 images with 511 faces. The test set is challenging because the images are highly compressed and quantized. Some of the faces are illustrated faces, not human faces. For this project, we have converted the test set's ground truth landmark points into Pascal VOC style bounding boxes. We have inflated these bounding boxes to cover most of the head, as the provided training data does. For this reason, you are arguably training a "head detector" not a "face detector" for this project.

Copies of these data sets are available in /course/cs143/data/proj4/. You may want to make a local copy of these to speed up training and testing, but please do not include them in your handin.

Write up

For this project, and all other projects, you must write a project report in HTML. In the report you will describe your algorithm and any decisions you made to write your algorithm a particular way. Then you will show and discuss the results of your algorithm. Discuss any extra credit you did, and clearly show what effect it had on the results (e.g. performance with and without each extra credit component).

It would be interesting to see how your detector performs on additional images beyond those in the test set such as the class photo, the CS department faculty page, etc.

For this project you should include the precision-recall curve of your final classifier and any interesting variants of your algorithm.
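As a reminder of what the curve measures, the Python sketch below computes precision and recall at each confidence threshold. It is not the provided evaluation code: it assumes each detection has already been labeled as a true or false positive against the ground truth, and `precision_recall` is a hypothetical name.

```python
def precision_recall(scored_detections, num_ground_truth):
    """Return (precision, recall) points, one per detection, as the
    confidence threshold is lowered.

    scored_detections: list of (confidence, is_true_positive) pairs.
    num_ground_truth:  total number of annotated faces in the test set.
    """
    ranked = sorted(scored_detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    curve = []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        curve.append((tp / (tp + fp), tp / num_ground_truth))
    return curve
```

Precision falls whenever a false positive is admitted, while recall can only rise as the threshold drops.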

Extra Credit

For all extra credit, be sure to analyze on your web page whether your extra credit has improved classification accuracy. Each item is "up to" some amount of points because trivial implementations may not be worthy of full extra credit.

Some ideas:


Finally, there will be extra credit and recognition for the students who achieve the highest average precision. You aren't allowed to modify evaluate_all_detections.m, which measures your accuracy.

Graduate Credit

To get graduate credit on this project you must do 10 points worth of extra credit. Those 10 points will not be added to your grade, but additional extra credit will be.

Web-Publishing Results

All the results for each project will be put on the course website so that the students can see each other's results. In class we will highlight the best projects as determined by the professor and TAs. If you do not want your results published to the web, you can choose to opt out. If you want to opt out, email cs143tas[at]cs.brown.edu saying so.

Handing in

This is very important as you will lose points if you do not follow instructions. Every time after the first that you do not follow instructions, you will lose 5 points. The folder you hand in must contain the following:

Then run: cs143_handin proj4
If it is not in your path, you can run it directly: /course/cs143/bin/cs143_handin proj4


Final Advice


Project description and code by James Hays. Figures in this handout are from Dalal and Triggs and Wu et al. Thanks to Jianxin Wu and Jim Rehg for suggestions.

Students effectively demonstrate how not to be seen by a robot.