CS143: Introduction to Computer Vision

Homework Assignment 4

Tracking using Particle Filtering



Due: November 16 at 10:59am -- Problem 1

         November 23 at 10:59am -- Problem 2

The assignment is worth 15% of your total grade.
It is graded out of a total of 100 (plus 10 possible extra points).


The assignment on eigenfaces dealt with detection in static images, while the assignment on affine motion dealt only with motion between pairs of frames. Here we consider tracking an object across a sequence of images. While one could "track" simply by finding the object again in each image, it is more efficient to take into account information about object motion over time in a recursive fashion. In particular, we consider tracking objects whose appearance changes due to specular reflections, shadowing, non-rigid motion, and changes in pose.

For this assignment you will take a Bayesian approach and will implement a Particle Filter tracker as described in the lecture slides.  The assignment will also exploit some of your earlier work on image filtering and combine this with some probabilistic modeling.

Please start early as this assignment is definitely harder than previous ones.

Files you need

Your task is to track the face of Gregory Dudek, a computer vision professor from McGill University. You will need the image sequence in /course/cs143/data/dudek. The images are in PGM format. The data also includes seven ground truth points marked by hand on the face in each frame in which they were visible. These points can be used to extract the face for learning the likelihood model. They will also be used to quantitatively evaluate your tracker. The sequence is long; you will only need 300 frames or so. Because of its length, read in only two frames at a time (not the whole sequence!).

Additionally, we have provided two Matlab scripts, problem[1-2].m, in the directory /course/cs143/asgn/asgn4/, which you should start from. Their main purpose is to make it easier for us to grade your homework. Please copy those files and modify them so that they perform the appropriate task and display your results. You are of course free to write your own scripts and/or functions and simply add calls to them in the scripts we provided.

What to hand in

Please hand in enough information for us to understand what you did, what things you tried, and how it worked!

You will hand in your Matlab code, which is supposed to show all your results, and any text or explanations that are required. Please hand in the modified problem[1-2].m scripts as well as any other Matlab scripts or functions that your solution requires. Make sure that all your results are displayed properly!

You will also hand in any text or explanations electronically. They should be in a file separate from the source code, in plain text, PostScript, or PDF format.

To hand in, please follow the general instructions posted to the newsgroup. The commands to hand in this particular homework assignment are:
    /course/cs143/bin/cs143_handin asgn4_p1

    /course/cs143/bin/cs143_handin asgn4all

Do all the following problems:

Problem 1 (40 points)

The first step is to learn a statistical model of Greg Dudek's face. You will do so by learning probability distributions over the filter responses for the face, using image filters that capture some of the texture in the image region. Choose a good starting frame (you can just choose frame0001.pgm). Manually select a point in the middle of Greg's face and a rectangular region around it. Alternatively, you can use the ground truth data points provided to do this automatically. To see the data and how to load the ground truth data points, run README.m from /course/cs143/data/dudek/.

For the selected image region, compute the derivatives in the horizontal and vertical directions. Now build discrete histograms for the grayscale appearance and for the derivatives in the x and y directions. Normalize the histograms to get discrete probability distributions: $H_v^*$, $H_x^*$ and $H_y^*$. Choose a reasonable bin size for the histograms.
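
A minimal sketch of the histogram step, written in Python purely for illustration (the hand-in itself must be Matlab). It assumes the region is a list of rows of grayscale values in 0..255, uses central differences as the derivative filter, and picks 16 bins. The helper names `central_differences` and `normalized_histogram`, the choice of filter, the derivative range [-255, 255], and the bin count are all assumptions, not part of the assignment.

```python
def central_differences(region):
    """Return (dx, dy) lists of central-difference derivatives over interior pixels."""
    rows, cols = len(region), len(region[0])
    dx, dy = [], []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dx.append((region[r][c + 1] - region[r][c - 1]) / 2.0)
            dy.append((region[r + 1][c] - region[r - 1][c]) / 2.0)
    return dx, dy


def normalized_histogram(values, lo, hi, n_bins=16):
    """Histogram of values over [lo, hi], normalized so the bins sum to 1."""
    if not values:
        raise ValueError("cannot build a histogram from an empty list")
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        b = int((v - lo) / width)
        b = min(max(b, 0), n_bins - 1)  # clamp; v == hi falls in the last bin
        counts[b] += 1
    total = float(len(values))
    return [c / total for c in counts]


# Toy 4x4 region with a horizontal intensity ramp (assumed data).
region = [[0, 50, 100, 150] for _ in range(4)]
dx, dy = central_differences(region)
gray = [v for row in region for v in row]
H_v = normalized_histogram(gray, 0, 255)
H_x = normalized_histogram(dx, -255, 255)
H_y = normalized_histogram(dy, -255, 255)
```

The bin count trades resolution against noise: with few pixels in the region, too many bins leave most of them empty, which makes histogram comparisons unstable.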

Show the discrete probability distributions corresponding to the horizontal and vertical filter responses and grayscale.

The likelihood that you are going to use is based on the statistical model you just learned. The idea is to define a likelihood that favors image regions where the derivative filter statistics and grayscale values match the trained statistics. For each proposed rectangular region centered at (x, y), build normalized discrete histograms $H_v^{(i)}$, $H_x^{(i)}$ and $H_y^{(i)}$ for the grayscale values and the derivatives in the x and y directions. We can compare these to our model of Greg by computing the distance between the corresponding histograms. We will use the Bhattacharyya distance between two histograms, which is defined as
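
One common form of this distance for normalized histograms, widely used in the tracking literature and given here as an assumption rather than as the course's official definition, is based on the Bhattacharyya coefficient $\rho$:

```latex
\rho\left(H^{*}, H^{(i)}\right) = \sum_{b=1}^{B} \sqrt{H^{*}(b)\, H^{(i)}(b)},
\qquad
d\left(H^{*}, H^{(i)}\right) = \sqrt{1 - \rho\left(H^{*}, H^{(i)}\right)}
```

Here $b$ indexes the $B$ histogram bins. Identical histograms give $\rho = 1$ and $d = 0$; histograms with no overlapping bins give $\rho = 0$ and $d = 1$. Some texts instead define the distance as $-\ln \rho$.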

Assume conditional independence between grayscale values and derivatives in the x and y directions. This allows us to multiply the model probability distributions based on the three filters. Taking the log of the product, we get       
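
As a sketch of how the three channels might be combined: if each channel's likelihood is modeled as proportional to exp(-d^2 / (2 sigma^2)), where d is the Bhattacharyya distance for that channel, then the log of the product is the sum -(d_v^2 + d_x^2 + d_y^2) / (2 sigma^2), up to an additive constant. This exponential form, the distance d = sqrt(1 - rho), the bandwidth `sigma`, and the function names below are assumptions made for illustration (in Python rather than Matlab), not the assignment's official formula.

```python
import math


def bhattacharyya_distance(p, q):
    """Distance sqrt(1 - rho) between two normalized histograms of equal length."""
    rho = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - rho))  # clamp guards against round-off


def log_likelihood(model, candidate, sigma=0.2):
    """Sum of per-channel log-likelihoods (up to an additive constant).

    model and candidate are (H_v, H_x, H_y) tuples of normalized histograms.
    sigma is a hypothetical bandwidth controlling how sharply the score peaks.
    """
    total = 0.0
    for h_model, h_cand in zip(model, candidate):
        d = bhattacharyya_distance(h_model, h_cand)
        total += -(d * d) / (2.0 * sigma * sigma)
    return total


# Toy histograms with three bins per channel (assumed data).
model = ([0.5, 0.5, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
same = log_likelihood(model, model)       # perfect match in every channel
shifted = ([0.0, 0.5, 0.5], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0])
worse = log_likelihood(model, shifted)    # mismatch lowers the score
```

A smaller `sigma` makes the likelihood more peaked, so a few particles dominate the weights; a larger one flattens it and lets the tracker drift more easily.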


To test the likelihood model, do the following: keeping the size of the rectangular region fixed, shift it across the image from left to right and top to bottom with some reasonable increment, and evaluate the likelihood at each location (there is no need to do this for the whole image; just choose a suitable neighborhood around the true face). Plot the likelihood as a 3D surface, where the x- and y-axes correspond to the x- and y-position in the image and the z-axis to the likelihood. Where does the likelihood peak? Is the true position of the face a global maximum? Is it a local maximum?
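
The scan described above could be organized as follows. This Python sketch (illustrative only; the assignment requires Matlab) takes any scoring function of a candidate center and evaluates it on a regular grid around a reference point. The names `scan_likelihood`, `argmax_2d`, and `toy_score`, the neighborhood radius, and the step size are hypothetical, and the toy score stands in for the real log likelihood.

```python
def scan_likelihood(score, center_x, center_y, radius, step):
    """Evaluate score(x, y) on a grid of candidate centers around (center_x, center_y).

    Returns (xs, ys, surface) where surface[j][i] = score(xs[i], ys[j]).
    """
    xs = list(range(center_x - radius, center_x + radius + 1, step))
    ys = list(range(center_y - radius, center_y + radius + 1, step))
    surface = [[score(x, y) for x in xs] for y in ys]
    return xs, ys, surface


def argmax_2d(xs, ys, surface):
    """Return the (x, y) grid location with the highest score."""
    best = max(
        ((surface[j][i], xs[i], ys[j]) for j in range(len(ys)) for i in range(len(xs))),
        key=lambda t: t[0],
    )
    return best[1], best[2]


# Toy stand-in for the real log likelihood: peaks at (120, 80).
def toy_score(x, y):
    return -((x - 120) ** 2 + (y - 80) ** 2)


xs, ys, surface = scan_likelihood(toy_score, 118, 82, radius=10, step=2)
peak = argmax_2d(xs, ys, surface)
```

Comparing the grid maximum with the hand-marked face position answers the global-maximum question directly; in Matlab the same surface can be displayed with `surf` or `mesh`.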

Now load frame0010.pgm and frame0205.pgm of the Greg Dudek sequence and repeat the analysis. Show the plots.

Problem 2 (60 points)

Implement a particle filtering tracker for Greg's face. The state variable you want to estimate is the 2D (x, y) position of Greg's face. Initialize the particle set with the true position of the face in the chosen starting frame for all N particles.

To keep things simple, define the temporal prior as a Gaussian probability density centered on the previous state, where ~ means "is distributed according to":

$$ x_t \sim \mathcal{N}\left(x_{t-1},\ \sigma_{\text{prior}}^{2} I\right) $$

which says that estimates close to the previous state estimate are more likely than ones that are farther away. This step corresponds to adding noise to the particles sampled from the posterior distribution. You should experiment with different values of σprior, report the one you chose, and explain why.
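
The noise-adding step can be sketched as below, in Python for illustration only (the assignment requires Matlab). It assumes the state is an (x, y) tuple and that the same standard deviation applies independently to both coordinates; the function name `diffuse` and the values used are hypothetical.

```python
import random


def diffuse(particles, sigma_prior, rng=random):
    """Add independent zero-mean Gaussian noise to each (x, y) particle."""
    return [
        (x + rng.gauss(0.0, sigma_prior), y + rng.gauss(0.0, sigma_prior))
        for (x, y) in particles
    ]


rng = random.Random(0)                 # fixed seed so the example is repeatable
particles = [(100.0, 60.0)] * 500      # all particles start at the same position
moved = diffuse(particles, sigma_prior=5.0, rng=rng)
mean_x = sum(p[0] for p in moved) / len(moved)
mean_y = sum(p[1] for p in moved) / len(moved)
```

The spread should roughly match how far the face moves between frames: too small and the particles cannot follow fast motion, too large and they scatter onto background clutter.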

Track Greg's face using your particle filter with the likelihood defined in Problem 1. Make sure you work with the log likelihood in your code until you need to normalize the particle weights so that they sum to one (see lecture slides).
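
One way to go from log likelihoods to normalized weights without numerical underflow is to subtract the largest log value before exponentiating. The Python sketch below (illustrative only; the assignment requires Matlab) shows this together with a simple resampling step. The function names and the choice of multinomial resampling are assumptions; the lecture slides may prescribe a different scheme.

```python
import math
import random


def normalize_log_weights(log_w):
    """Convert log likelihoods to weights that sum to 1 (log-sum-exp trick)."""
    m = max(log_w)
    unnorm = [math.exp(lw - m) for lw in log_w]  # largest term becomes exp(0) = 1
    total = sum(unnorm)
    return [u / total for u in unnorm]


def resample(particles, weights, rng=random):
    """Draw len(particles) particles with replacement, proportional to weight."""
    return rng.choices(particles, weights=weights, k=len(particles))


log_w = [-1000.0, -1001.0, -1002.0]    # exp() of these underflows to 0 directly
weights = normalize_log_weights(log_w)
new_particles = resample([(10, 10), (20, 20), (30, 30)], weights, random.Random(0))
```

Exponentiating the raw values here would give all-zero weights and a division by zero; shifting by the maximum leaves the ratios between weights unchanged.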

Show some representative frames from your tracking results -- plot the particles as colored points on the image.  Also show the expected location (weighted mean) and draw the bounding box. Show where it fails (if it does). Discuss your experimental results. How well does this likelihood work? What confuses the tracker?
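
The expected location and its bounding box can be computed as in this Python sketch (illustrative only; the assignment requires Matlab). It assumes the weights have already been normalized to sum to one and that the box keeps the size of the region selected in the starting frame; the function names and the numbers are hypothetical.

```python
def expected_location(particles, weights):
    """Weighted mean of (x, y) particles; weights are assumed to sum to 1."""
    ex = sum(w * x for (x, _), w in zip(particles, weights))
    ey = sum(w * y for (_, y), w in zip(particles, weights))
    return ex, ey


def bounding_box(center, width, height):
    """Axis-aligned box (left, top, right, bottom) centered on the estimate."""
    cx, cy = center
    return (cx - width / 2.0, cy - height / 2.0, cx + width / 2.0, cy + height / 2.0)


particles = [(100.0, 50.0), (110.0, 50.0), (120.0, 70.0)]
weights = [0.5, 0.25, 0.25]
center = expected_location(particles, weights)
box = bounding_box(center, width=40, height=60)
```

When the particle cloud splits into two clusters, the weighted mean can fall between them on a region that matches neither, which is worth looking for when analyzing failures.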

Note that vision algorithms always fail! Your algorithm will likely not track Dudek over many frames. That is ok. Vision is also an experimental field and you learn a lot when things break. So tell us where it breaks and why you think it breaks.

We want to see representative results, an analysis of the results, and your code.

Extra Credit (10 points)

Add a scale and/or rotation parameter (5 points each) to the state space and estimate it over time. Plot the expected scale/rotation as a function of time. Also show the tracking result on the image by displaying the bounding box, which will now be scaled and rotated.
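
For displaying the scaled and rotated box, the four corners can be obtained by scaling the half-extents and applying a 2D rotation about the box center. This Python sketch is illustrative only (the assignment requires Matlab); the function name `box_corners`, the use of radians, and the corner ordering are assumptions.

```python
import math


def box_corners(cx, cy, width, height, scale=1.0, angle=0.0):
    """Corners of a box centered at (cx, cy), scaled and rotated by angle (radians)."""
    hw, hh = scale * width / 2.0, scale * height / 2.0
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    corners = []
    for dx, dy in [(-hw, -hh), (hw, -hh), (hw, hh), (-hw, hh)]:
        # Standard 2D rotation of the offset (dx, dy), then shift to the center.
        corners.append((cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a))
    return corners


corners = box_corners(100.0, 50.0, width=40, height=20, scale=2.0, angle=math.pi / 2)
```

Joining the corners in order, and closing the loop back to the first one, draws the box outline on the frame.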

Last modified: 11/8/2009