A 3D model can be reconstructed from a sequence of images

Project 5: Tracking and Structure from Motion
CS 143: Introduction to Computer Vision



In this assignment you'll be reconstructing the 3D shape of an object from a series of observations (a video). There are three essential components to this: keypoint selection, feature tracking, and structure from motion.

Keypoint Selection

The stencil code includes a Harris corner detector in harris.m, which you should use to select your keypoints. You can implement a better keypoint selection method for extra credit. Good features to track are ones whose motion can be estimated reliably (although note that for the tracker you'll be implementing Harris corners are optimal).

Writeup requirements

  1. Include an image of the selected keypoints overlayed over the first video frame.

Feature Tracking

You'll be implementing a Kanade-Lucas-Tomasi tracker for the keypoints you detect. This essentially involves computing optical flow between successive video frames and moving the selected keypoints from the first frame along the flow field. You do not have to implement coarse-to-fine KLT as the video doesn't have any large movements.

The first assumption the KLT tracker makes is brightness constancy. A point should have the same value after translation in the next frame (where I is the image function):

Take the Taylor expansion of I(x + u, y + v, t + 1), where Ix/Iy is the x/y-derivative of the image and It is the temporal gradient:


And by the first equation:

This is only one constraint when we have two unknowns (u and v). We get more by assuming that nearby pixels at points pi, i ∈ [1, 225] (in a 15×15 box around the pixel) move with the same u and v:

We can solve this overconstrained linear system via least-squares (abbreviating the above to Ad = b):

This results in two equations with two unknowns for each pixel. You can solve for u and v by inverting the 2x2 matrix on the left-hand side and multiplying it by the vector on the right-hand side. Once you have the per-pixel optical flow, you can easily track a point through the flow by looking up its displacement in the frame and adding that to its position to get the position for the next frame. Use interp2 to do the lookup or your points will drift significantly. Discard any keypoints if their predicted location moves out-of-frame (or anywhere near the borders, to be safe).

Useful functions: fspecial, conv2, gradient, interp2, isnan.

Writeup requirements

  1. For 20 random keypoints, draw the 2D path over the sequence of frames using line segments.
  2. On top of the first frame, plot the points which have moved out of frame at some point along the sequence.

Structure from Motion

Now, use the discovered keypoint tracks as input for the affine structure from motion procedure described in Shape and Motion from Image Streams under Orthography: a Factorization Method (Tomasi and Kanade 1992). See section 3.4 of this paper for an overview of the algorithm. Note that there should be a suffcient number of tracks that persist throughout the sequence to perform the factorization on a dense matrix. There is no need to fill in missing data for this problem.

To eliminate the affine ambiguity (i.e. apply the metric constraints) by discovering QQT (eq. 16 of Tomasi and Kanade), let L = QQT and solve for L using least squares. Finally, compute the Cholesky decomposition of L = QQT to recover the appropriate Q. A more detailed explanation of this "metric transformation" can be found in Section 2.3 of A Sequential Factorization Method for Recovering Shape and Motion from Image Streams, Morita and Kanade, 1997.

Useful functions: svd, chol.

Writeup requirements

  1. Plot the predicted 3D locations of the tracked points for 3 different viewpoints. Choose several viewpoints so that the 3D structure is clearly visible.
  2. Plot the predicted 3D path of the cameras. The camera position for each frame is given by the cross product kf = if × jf. For consistent results, normalize all kf to be unit vectors. Give three plots, one for each dimension of kf.


Write up

For this project, and all other projects, you must do a project report in HTML. In the report you will describe your algorithm and any decisions you made to write your algorithm a particular way. Then you will show and discuss the results of your algorithm. Discuss any extra credit you did, and clearly show what contribution it had on the results (e.g. performance with and without each extra credit component).

Extra Credit

For all extra credit, be sure to analyze on your web page cases whether your extra credit has improved your results. Each item is "up to" some amount of points because trivial implementations may not be worthy of full extra credit.

Some ideas:


Graduate Credit

To get graduate credit on this project you must do 10 points worth of extra credit. Those 10 points will not be added to your grade, but additional extra credit will be.

Web-Publishing Results

All the results for each project will be put on the course website so that the students can see each other's results. In class we will have presentations of the projects and the students will vote on who got the best results. If you do not want your results published to the web, you can choose to opt out. If you want to opt out, email cs143tas[at]cs.brown.edu saying so.

Handing in

This is very important as you will lose points if you do not follow instructions. Every time after the first that you do not follow instructions, you will lose 5 points. The folder you hand in must contain the following:

Then run: cs143_handin proj5
If it is not in your path, you can run it directly: /course/cs143/bin/cs143_handin proj5


This project is adapted from Derek Hoiem's CS 543/ECE 549 course at the University of Illinois at Urbana-Champaign.