The first step in the algorithm is to find the interest points which will be tracked. Good points to track are those whose motion is unambiguous in every direction, as these are the points that can be tracked most reliably. The Harris corner detector is therefore used to generate interest points, and the 500 with the strongest responses (the 500 most corner-like points) are used as the keypoints to be tracked through the video frames. See below for a plot of the interest points.
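As a rough illustration of the selection step, here is a minimal NumPy sketch of a Harris response followed by picking the strongest points. The function names, the box-shaped window, the window radius and the constant k = 0.04 are all assumptions made for this example and are not taken from the project code. It also skips non-maximum suppression, so on a real image the strongest responses would tend to cluster around the same few corners.

```python
import numpy as np

def box_filter(img, radius):
    """Sum each pixel's (2r+1)x(2r+1) neighbourhood via an integral image."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    # Integral image with a leading row and column of zeros.
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    return ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]

def harris_response(img, radius=2, k=0.04):
    """Harris corner response R = det(S) - k * trace(S)^2 at every pixel."""
    Iy, Ix = np.gradient(img.astype(float))  # derivatives along rows, columns
    Sxx = box_filter(Ix * Ix, radius)
    Syy = box_filter(Iy * Iy, radius)
    Sxy = box_filter(Ix * Iy, radius)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2

def strongest_corners(img, n=500, radius=2):
    """Return the n pixels with the largest Harris response as (x, y) rows."""
    R = harris_response(img, radius)
    order = np.argsort(R, axis=None)[::-1][:n]
    rows, cols = np.unravel_index(order, R.shape)
    return np.stack([cols, rows], axis=1)
```

Calling `strongest_corners(frame, n=500)` on a grayscale frame would give a 500 x 2 array of candidate keypoints. Edges score negatively and flat regions score near zero, which is why only corner-like points survive the ranking.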
The next step is to track the motion of the points through the frames of the video. The Kanade-Lucas-Tomasi tracker is used to track the selected points. This involves computing the optical flow between successive frames (for each pixel), and then moving the keypoints the appropriate amount each time according to the computed flow field. Below are the paths for 20 random keypoints:
And here's a pretty picture of the tracks for every keypoint:
Keypoints from the original 500 that end up moving out of view are discarded. Here's a plot of the original points that didn't make the final cut:
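The tracking and culling steps described above can be sketched roughly as follows. This is a simplified stand-in rather than the project's tracker: it solves the Lucas-Kanade equations only in a window around each keypoint instead of computing a dense flow field, and it uses a single scale and a single iteration, so it is only reasonable for sub-pixel to small motions. The names `lk_step`, `in_view` and `track`, and the window size, are assumptions made for this example.

```python
import numpy as np

def in_view(pts, shape, margin=0):
    """Boolean mask of (x, y) points at least `margin` pixels inside the image."""
    h, w = shape
    x, y = pts[:, 0], pts[:, 1]
    return (x >= margin) & (x <= w - 1 - margin) & (y >= margin) & (y <= h - 1 - margin)

def lk_step(prev, curr, pts, half_win=7):
    """Move each point by the least-squares flow of the window around it.

    Assumes every point is at least `half_win` pixels from the image border.
    """
    Iy, Ix = np.gradient(prev)   # spatial derivatives (rows, then columns)
    It = curr - prev             # temporal derivative
    new_pts = pts.astype(float).copy()
    for i, (x, y) in enumerate(pts):
        c, r = int(round(x)), int(round(y))
        win = (slice(r - half_win, r + half_win + 1),
               slice(c - half_win, c + half_win + 1))
        # Brightness constancy: Ix*u + Iy*v = -It for every pixel in the window.
        A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)
        b = -It[win].ravel()
        flow, *_ = np.linalg.lstsq(A, b, rcond=None)
        new_pts[i] += flow
    return new_pts

def track(frames, pts, half_win=7):
    """Track points through all frames; `alive` marks points that stayed in view."""
    pts = pts.astype(float)
    alive = in_view(pts, frames[0].shape, half_win)
    tracks = [pts.copy()]
    for prev, curr in zip(frames[:-1], frames[1:]):
        pts[alive] = lk_step(prev, curr, pts[alive], half_win)
        alive &= in_view(pts, curr.shape, half_win)  # drop points that left the view
        tracks.append(pts.copy())
    return np.stack(tracks), alive
```

Only the columns of `tracks` where `alive` is true would be passed on to the reconstruction, since the factorization needs every point to be visible in every frame.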
The final step is to use the collected information about the motion of the keypoints to actually recreate the 3D structure. The algorithm, described by Tomasi and Kanade in "Shape and Motion from Image Streams under Orthography: a Factorization Method," uses the tracking data to estimate the camera parameters for each frame as well as the 3D coordinates of each of the interest points. It first centers the points in each frame relative to the average location of all points in that frame. This centered measurement matrix is then factorized using singular value decomposition (SVD), and the results of the factorization are manipulated to construct preliminary motion (camera) and shape (3D points) matrices. The final step is to eliminate the affine ambiguity of these matrices, as described by Morita and Kanade in "A Sequential Factorization Method for Recovering Shape and Motion From Image Streams," to recover the particular motion and shape matrices which give the object's 3D structure.
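To make the factorization concrete, here is a minimal NumPy sketch of the steps just described: centering, a rank-3 SVD, and removal of the affine ambiguity. For the last step it imposes the usual orthonormality constraints on each frame's two camera axes and solves for the correction linearly; treating that as equivalent to the cited procedure is an assumption, as are the function name `factorize` and the (frames, points, 2) layout of the input.

```python
import numpy as np

def factorize(tracks):
    """Recover motion (2F x 3) and shape (3 x P) from tracks of shape (F, P, 2)."""
    F = tracks.shape[0]
    # Measurement matrix: F rows of x coordinates stacked over F rows of y.
    W = np.concatenate([tracks[:, :, 0], tracks[:, :, 1]], axis=0)
    # Center each frame's points on their mean location.
    W = W - W.mean(axis=1, keepdims=True)

    # Rank-3 factorization W ~ M_hat @ S_hat, defined only up to a 3x3 matrix Q.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    root = np.sqrt(s[:3])
    M_hat = U[:, :3] * root
    S_hat = root[:, None] * Vt[:3]

    def coeffs(a, b):
        # Coefficients of a^T L b in the 6 unknowns of the symmetric matrix L.
        return np.array([a[0] * b[0], a[0] * b[1] + a[1] * b[0],
                         a[0] * b[2] + a[2] * b[0], a[1] * b[1],
                         a[1] * b[2] + a[2] * b[1], a[2] * b[2]])

    # Constraints on L = Q Q^T: unit-length, mutually orthogonal camera axes.
    I, J = M_hat[:F], M_hat[F:]
    A = np.array([coeffs(i, i) for i in I] + [coeffs(j, j) for j in J]
                 + [coeffs(i, j) for i, j in zip(I, J)])
    b = np.concatenate([np.ones(2 * F), np.zeros(F)])
    l = np.linalg.lstsq(A, b, rcond=None)[0]
    L = np.array([[l[0], l[1], l[2]],
                  [l[1], l[3], l[4]],
                  [l[2], l[4], l[5]]])
    # Cholesky raises LinAlgError if noise makes L not positive definite.
    Q = np.linalg.cholesky(L)
    return M_hat @ Q, np.linalg.inv(Q) @ S_hat
```

The result is still only defined up to a global rotation or reflection, so a full pipeline would typically also align the axes to a reference frame, for example the first camera.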
The following is a series of pictures of the 3D reconstruction, viewed from different angles. The red lines in each shot indicate the 3D path of the camera.
Though the reconstruction of the house is a little crude, the algorithm does a good job of recovering the general structure from the frame sequence.