Computer Vision, Project 5: Tracking and Structure from Motion
Bryce Richards
Step 1: Select Keypoints
We used a Harris corner detector to select points to track, keeping the 300 keypoints with the highest Harris corner strengths. This was enough points to provide a detailed 3-D reconstruction, but few enough that every point was easy for our tracking algorithm to follow.
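As a rough illustration of this selection step, the sketch below computes a Harris corner strength at each pixel and keeps the strongest n. It is not the project's code: the function names, the constant k = 0.05, the 3x3 summation window, and the central-difference gradients are all assumptions chosen for brevity, and no non-maximum suppression is applied.

```python
def harris_response(img, k=0.05, radius=1):
    """Harris corner strength for a 2-D list of floats (assumed parameters)."""
    h, w = len(img), len(img[0])
    # Central-difference gradients; border pixels are left at zero.
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    resp = [[0.0] * w for _ in range(h)]
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            # Sum the structure tensor entries over the window around (x, y).
            sxx = sxy = syy = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    sxy += gx * gy
                    syy += gy * gy
            # Harris strength: det(M) - k * trace(M)^2.
            resp[y][x] = sxx * syy - sxy * sxy - k * (sxx + syy) ** 2
    return resp


def top_keypoints(resp, n):
    """Return the (x, y) positions of the n strongest responses."""
    scored = [(resp[y][x], x, y)
              for y in range(len(resp)) for x in range(len(resp[0]))]
    scored.sort(reverse=True)
    return [(x, y) for _, x, y in scored[:n]]
```

With this sketch, the selection described above would correspond to calling top_keypoints(harris_response(frame), 300) on the first frame, where frame is a hypothetical grayscale image stored as a list of rows.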
Step 2: Feature Tracking
We implemented a Kanade-Lucas-Tomasi tracker for the detected keypoints. The tracker uses the x-, y-, and t-gradients of each pair of successive video frames to predict where a point in the first frame will appear in the second. More specifically, if I is the image intensity function, then a first-order Taylor expansion gives the approximate relation

    I(x+u, y+v, t+1) ~= I(x,y,t) + Ix*u + Iy*v + It

where u and v are the point's x- and y-displacements, and Ix, Iy, and It are the x-, y-, and t-gradients of the image. Assuming a point keeps its brightness from one frame to the next, the left-hand side equals I(x,y,t), which leaves Ix*u + Iy*v + It ~= 0. This single equation has two unknowns, so we add the assumption that a point's movement matches that of its neighbors; together, these allow us to calculate u and v for every keypoint.
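To make the last step concrete: if every pixel in a small window around a keypoint shares the same (u, v) and keeps its brightness between frames, each pixel contributes one equation Ix*u + Iy*v + It = 0, and the stack of equations can be solved by least squares. The sketch below solves the resulting 2x2 normal equations for one window. The function name and the flat-list input format are assumptions for illustration, not the project's implementation.

```python
def lk_displacement(ix, iy, it):
    """Least-squares (u, v) for one window.

    ix, iy, it: equal-length sequences holding the x-, y-, and t-gradient
    at each pixel of the window around a keypoint.
    """
    # Entries of the normal equations  [sxx sxy; sxy syy] [u v]^T = -[sxt syt]^T.
    sxx = sum(gx * gx for gx in ix)
    sxy = sum(gx * gy for gx, gy in zip(ix, iy))
    syy = sum(gy * gy for gy in iy)
    sxt = sum(gx * gt for gx, gt in zip(ix, it))
    syt = sum(gy * gt for gy, gt in zip(iy, it))
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        # Flat or purely edge-like windows do not determine both u and v.
        raise ValueError("window has no 2-D texture; (u, v) is not determined")
    # Closed-form inverse of the 2x2 system.
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v
```

The 2x2 matrix here is the same one whose eigenvalues drive the Harris corner strength, which is why strong corners tend to be the easiest points to track. A full tracker usually repeats this solve a few times per frame, re-sampling the second image at the updated position.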
Step 3: Structure from Motion
Using the keypoint tracks from step 2, we ran the affine structure from motion procedure described in "Shape and Motion from Image Streams under Orthography: a Factorization Method" (Tomasi and Kanade 1992).
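The core of the factorization method can be sketched in a few lines: stack the tracked coordinates into a measurement matrix, subtract each row's mean, and take a rank-3 SVD to split it into a motion matrix and a shape matrix. The version below is a minimal sketch using NumPy, with an assumed function name and an assumed (frames x points) input layout. It recovers structure only up to an affine ambiguity; the paper's additional step of enforcing orthonormal camera axes is omitted here.

```python
import numpy as np


def affine_factorization(tracks_x, tracks_y):
    """Rank-3 factorization of point tracks (assumed input layout).

    tracks_x, tracks_y: (F, P) arrays with the x and y image coordinates
    of P points over F frames. Returns M (2F x 3) and S (3 x P).
    """
    # Measurement matrix: x rows for every frame, then y rows.
    W = np.vstack([tracks_x, tracks_y]).astype(float)
    # Subtracting each row's mean removes the per-frame translation.
    W -= W.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Keep the three largest singular values and split them evenly.
    root = np.sqrt(s[:3])
    M = U[:, :3] * root          # camera (motion) matrix
    S = root[:, None] * Vt[:3]   # 3-D shape, one column per point
    return M, S
```

Each column of S is one reconstructed point, and rows f and F + f of M describe the camera's image axes in frame f.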
Below are several views of the resulting 3-D reconstruction of the house. The red lines indicate the reconstructed view of the camera from frame to frame.