CS143 Project 5: Tracking and Structure from Motion

Angela Santin

Overview

First, I used a Harris corner detector to find interest points in the initial image. Next, I implemented a Kanade-Lucas-Tomasi (KLT) tracker to follow those keypoints across frames. The KLT tracker computes the optical flow between successive video frames and moves the selected keypoints from the current frame along the flow field to estimate their locations in the next frame. Finally, I used the tracked points to recover the 3D structure of the object in the image.

Keypoint Detection

The Harris corner detector allows me to find interest points that are well suited to tracking across subsequent frames. Here I show 200 interest points plotted over the initial frame:

Interest points over initial frame
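The detector itself is standard; as a rough illustration of the idea, here is a NumPy sketch of Harris corner selection (the smoothing sigma, the 5x5 suppression neighborhood, and k = 0.04 are illustrative choices, not values taken from my MATLAB code):

```python
import numpy as np
from scipy import ndimage

def harris_points(img, n_points=200, sigma=1.0, k=0.04):
    """Return the (row, col) coordinates of the n_points strongest
    Harris corner responses, keeping only local maxima."""
    # Image gradients (np.gradient returns d/drow, d/dcol)
    Iy, Ix = np.gradient(img.astype(float))
    # Structure-tensor entries, accumulated over a Gaussian window
    Ixx = ndimage.gaussian_filter(Ix * Ix, sigma)
    Iyy = ndimage.gaussian_filter(Iy * Iy, sigma)
    Ixy = ndimage.gaussian_filter(Ix * Iy, sigma)
    # Harris response: det(M) - k * trace(M)^2
    R = Ixx * Iyy - Ixy ** 2 - k * (Ixx + Iyy) ** 2
    # Non-maximum suppression over a 5x5 neighborhood
    R[R < ndimage.maximum_filter(R, size=5)] = -np.inf
    idx = np.argsort(R, axis=None)[::-1][:n_points]
    rows, cols = np.unravel_index(idx, R.shape)
    return np.stack([rows, cols], axis=1)
```

Corners score highly because both eigenvalues of the structure tensor are large there, which is exactly what makes a point trackable.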

Optical Flow (optical_flow.m)

In optical_flow.m I calculate the flow vector for every pixel, in both the x and y directions, and store the per-pixel values in the matrices u and v. The calculation uses the following formula:

Vector Flow Equation

The formula above sums over a window around each pixel so that the two unknowns, u and v, are not underconstrained. Once the windowed sums are computed, we solve the resulting 2x2 linear system (the last equation above) for u and v at each pixel. Given the per-pixel optical flow, I calculate the new location of each interest point in the next frame by looking up its displacement in the u and v matrices and adding du and dv to its position in the current frame. Note that these displacements are often smaller than a whole pixel, so tracked points often fall between the pixels of the image. To handle this, I use interp2 to interpolate the flow from the 4 pixels surrounding each tracked point.
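In NumPy rather than my actual MATLAB, the per-pixel solve and the subpixel lookup can be sketched as follows (the 7x7 window is an illustrative choice, and scipy's map_coordinates stands in for interp2):

```python
import numpy as np
from scipy import ndimage

def lucas_kanade(I1, I2, win=7):
    """Dense Lucas-Kanade flow: at every pixel, solve the 2x2 normal
    equations built from win x win windowed sums of the gradients."""
    I1, I2 = I1.astype(float), I2.astype(float)
    Iy, Ix = np.gradient(I1)          # spatial gradients of the first frame
    It = I2 - I1                      # temporal gradient
    S = lambda a: ndimage.uniform_filter(a, win)  # windowed sum (up to scale)
    Ixx, Iyy, Ixy = S(Ix*Ix), S(Iy*Iy), S(Ix*Iy)
    Ixt, Iyt = S(Ix*It), S(Iy*It)
    det = Ixx*Iyy - Ixy**2
    det[det == 0] = np.finfo(float).eps  # guard flat, textureless regions
    # Closed-form solve of [[Ixx Ixy],[Ixy Iyy]] [u v]' = [-Ixt -Iyt]'
    u = (-Iyy*Ixt + Ixy*Iyt) / det
    v = ( Ixy*Ixt - Ixx*Iyt) / det
    return u, v

def track(points, u, v):
    """Move (row, col) keypoints along the flow field; bilinear
    interpolation plays the role of interp2 for subpixel lookups."""
    rows, cols = points[:, 0], points[:, 1]
    du = ndimage.map_coordinates(u, [rows, cols], order=1)
    dv = ndimage.map_coordinates(v, [rows, cols], order=1)
    return points + np.stack([dv, du], axis=1)  # v moves rows, u moves cols
```

Because the flow fields are only valid for small motions, the tracker applies this update frame by frame rather than matching the first frame directly against the last.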

Most of the points remained inside the image and were tracked across all frames. The following link shows the displacement of the initial keypoints over a selection of 24 of the 51 total frames:

Displacement of the initial keypoints

Below I show the initial and final frames for comparison:

Initial frame with keypoints Final frame with keypoints

For 20 random keypoints, I traced their 2D paths over the sequence of frames using line segments. Click the link below to visualize them:

2D path over the sequence of 51 frames using line segments

As the tracked points move through the sequence, some of them drift off the image. In this case, only 2 points fell off the image; I show them on top of the first frame below:

Points that moved off the image
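Flagging the points that have left the frame is just a bounds check on the tracked coordinates; a small NumPy sketch (a hypothetical helper, not code from my project):

```python
import numpy as np

def valid_mask(points, shape):
    """True for (row, col) keypoints still inside an image of the given shape."""
    h, w = shape
    rows, cols = points[:, 0], points[:, 1]
    return (rows >= 0) & (rows <= h - 1) & (cols >= 0) & (cols <= w - 1)
```

Points that fail this check in any frame must be dropped from every frame, since the factorization below needs each point observed in all frames.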

Structure from Motion (project5.m)

I used the keypoint tracks as input to the affine structure-from-motion procedure described in "Shape and Motion from Image Streams under Orthography: a Factorization Method" (Tomasi and Kanade, 1992). This recovers the camera positions as well as the 3D locations of all tracked points. To eliminate the affine ambiguity, I follow the steps described in "A Sequential Factorization Method for Recovering Shape and Motion From Image Streams" (Morita and Kanade).
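The two steps can be sketched as follows in NumPy (an illustration of the factorization and the metric upgrade, not my project5.m; W is the 2F x P measurement matrix stacking each frame's tracked x and y coordinates):

```python
import numpy as np

def affine_sfm(W):
    """Tomasi-Kanade factorization: center the measurement matrix and
    split its rank-3 SVD into motion (2F x 3) and structure (3 x P)."""
    W = W - W.mean(axis=1, keepdims=True)   # subtract per-row centroids
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    M = U[:, :3] * np.sqrt(s[:3])           # camera (motion) matrix
    S = np.sqrt(s[:3])[:, None] * Vt[:3]    # 3D point (structure) matrix
    return M, S

def metric_upgrade(M, S):
    """Remove the affine ambiguity: find Q such that each frame's rows of
    M @ Q are unit length and mutually orthogonal (metric constraints)."""
    def coeffs(a, b):
        # Coefficients of the 6 unknowns of the symmetric 3x3 matrix L
        return np.array([a[0]*b[0], a[1]*b[1], a[2]*b[2],
                         a[0]*b[1] + a[1]*b[0],
                         a[0]*b[2] + a[2]*b[0],
                         a[1]*b[2] + a[2]*b[1]])
    A, b = [], []
    for f in range(M.shape[0] // 2):
        i, j = M[2*f], M[2*f + 1]
        A += [coeffs(i, i), coeffs(j, j), coeffs(i, j)]
        b += [1.0, 1.0, 0.0]
    x = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)[0]
    L = np.array([[x[0], x[3], x[4]],
                  [x[3], x[1], x[5]],
                  [x[4], x[5], x[2]]])
    Q = np.linalg.cholesky(L)               # assumes L is positive definite
    return M @ Q, np.linalg.inv(Q) @ S
```

The metric constraints (unit-length, orthogonal camera-axis rows per frame) are linear in the entries of L = QQ^T, which is why a least-squares solve followed by a Cholesky factorization recovers Q.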

Predicted 3D locations of the tracked points from 7 different viewpoints.

View 1 View 2
View 3 View 4
View 5 View 6
View 7

Plots of the predicted 3D path of the cameras.
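As a note on how such plots are produced: each pair of rows of the recovered motion matrix gives that frame's image axes, and their cross product gives the camera's viewing direction, whose trace over the frames is the plotted camera path. A NumPy sketch (assuming M is the 2F x 3 motion matrix after the metric upgrade):

```python
import numpy as np

def camera_directions(M):
    """Viewing direction per frame: the cross product of the two camera
    axis rows (rows 2f and 2f+1) of the 2F x 3 motion matrix."""
    return np.array([np.cross(M[2*f], M[2*f + 1])
                     for f in range(M.shape[0] // 2)])
```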