Overview
The goal of this project was to use a sequence of 2D images showing different perspectives of an object to reconstruct the object's 3D shape. First, a Harris corner detector is used to select keypoints. Then, a Kanade-Lucas-Tomasi tracker is used to track the keypoints through the sequence of video frames. Finally, the keypoint tracks are used to reconstruct the 3D shape as described by Tomasi and Kanade.
Here is the first frame of the sequence to be evaluated:
Keypoint Selection
A Harris corner detector is used to detect keypoints, and only the 500 strongest responses are kept. Here are the selected points, overlaid on the first frame of the sequence:
In general, these are good points. However, noise in the image background caused several points to be selected in the background, and these points cause problems in the tracking and reconstruction stages.
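The keypoint selection step can be sketched in NumPy alone, assuming a grayscale image array. The function names (`harris_keypoints`, `smooth`, `gaussian_kernel`), the smoothing scale `sigma`, the sensitivity constant `k = 0.04`, the 3x3 non-maximum suppression, and the `border` margin are all illustrative assumptions rather than the project's actual code or settings; only the cap of 500 points comes from the text.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    return kernel / kernel.sum()


def smooth(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur: filter every row, then every column."""
    kernel = gaussian_kernel(sigma)
    rows = np.apply_along_axis(np.convolve, 1, img, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="same")


def harris_keypoints(img, n_points=500, sigma=1.5, k=0.04, border=8):
    """Return up to n_points (x, y) corners, strongest Harris response first."""
    img = np.asarray(img, dtype=float)
    Iy, Ix = np.gradient(img)            # axis 0 is y (rows), axis 1 is x
    Sxx = smooth(Ix * Ix, sigma)         # structure-tensor entries, each
    Syy = smooth(Iy * Iy, sigma)         # averaged over a Gaussian window
    Sxy = smooth(Ix * Iy, sigma)
    R = Sxx * Syy - Sxy**2 - k * (Sxx + Syy) ** 2   # det - k * trace^2

    # Keep only pixels that are the maximum of their 3x3 neighborhood.
    h, w = R.shape
    padded = np.pad(R, 1, mode="constant", constant_values=-np.inf)
    neighborhood_max = np.max(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)],
        axis=0)
    peaks = (R == neighborhood_max) & (R > 0)

    # Ignore a margin so that later tracking windows fit inside the image.
    if border > 0:
        peaks[:border, :] = False
        peaks[-border:, :] = False
        peaks[:, :border] = False
        peaks[:, -border:] = False

    ys, xs = np.nonzero(peaks)
    order = np.argsort(R[ys, xs])[::-1][:n_points]
    return np.stack([xs[order], ys[order]], axis=1).astype(float)
```

Sorting the surviving peaks by response and truncating is what implements "only the 500 strongest points." In a cluttered background, many weak but nonzero peaks still survive, which is consistent with the stray background points described above.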
Feature Tracking
A Kanade-Lucas-Tomasi tracker is used to track the keypoints throughout the sequence. Each pair of successive frames is fed to an optical flow function, which uses a 15x15 window and returns a displacement for every pixel. Each keypoint is then moved by the displacement at its location, with sub-pixel accuracy.
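One tracking step can be sketched as dense Lucas-Kanade flow over a 15x15 window, followed by moving each keypoint by the flow at its position. This is a minimal NumPy sketch, not the project's code: the function names are hypothetical, and using bilinear interpolation of the flow field to obtain sub-pixel displacements is our assumption about how sub-pixel accuracy is achieved. The step also flags keypoints that leave the frame so they can be discarded.

```python
import numpy as np


def box_sum(a: np.ndarray, size: int) -> np.ndarray:
    """Sum of `a` over an odd size x size window centered on each pixel (zero padding)."""
    r = size // 2
    padded = np.pad(a, ((r + 1, r), (r + 1, r)))
    c = padded.cumsum(axis=0).cumsum(axis=1)          # integral image
    h, w = a.shape
    return (c[size:size + h, size:size + w] - c[:h, size:size + w]
            - c[size:size + h, :w] + c[:h, :w])


def dense_lucas_kanade(im0, im1, window=15, eps=1e-6):
    """Per-pixel flow (u, v) solving sum(Ix*u + Iy*v + It)^2 -> min over each window."""
    im0 = np.asarray(im0, dtype=float)
    im1 = np.asarray(im1, dtype=float)
    Iy, Ix = np.gradient(im0)
    It = im1 - im0
    Sxx, Syy, Sxy = (box_sum(p, window) for p in (Ix * Ix, Iy * Iy, Ix * Iy))
    Sxt, Syt = box_sum(Ix * It, window), box_sum(Iy * It, window)
    det = Sxx * Syy - Sxy**2
    ok = det > eps                       # near-singular windows get zero flow
    safe = np.where(ok, det, 1.0)
    # Cramer's rule for [[Sxx, Sxy], [Sxy, Syy]] @ [u, v] = -[Sxt, Syt]
    u = np.where(ok, (-Syy * Sxt + Sxy * Syt) / safe, 0.0)
    v = np.where(ok, (Sxy * Sxt - Sxx * Syt) / safe, 0.0)
    return u, v


def bilinear(f, x, y):
    """Sample image f at real-valued (x, y) positions inside the frame."""
    h, w = f.shape
    x0 = np.clip(np.floor(x).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 2)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * f[y0, x0] + ax * f[y0, x0 + 1]
    bottom = (1 - ax) * f[y0 + 1, x0] + ax * f[y0 + 1, x0 + 1]
    return (1 - ay) * top + ay * bottom


def track_step(points, im0, im1, window=15):
    """Move (x, y) keypoints from im0 to im1; also return an in-frame mask."""
    u, v = dense_lucas_kanade(im0, im1, window)
    x, y = points[:, 0], points[:, 1]
    new = np.stack([x + bilinear(u, x, y), y + bilinear(v, x, y)], axis=1)
    h, w = im0.shape
    inside = ((new[:, 0] >= 0) & (new[:, 0] <= w - 1)
              & (new[:, 1] >= 0) & (new[:, 1] <= h - 1))
    return new, inside
```

Over a whole sequence, one would call `track_step` on each consecutive frame pair and drop every keypoint whose `inside` flag ever becomes false. Because this single-level version linearizes the image around the current position, it only handles small motions, which fits the remark later in this section that the video has no large movements.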
Keypoints that move outside the frame are discarded entirely. Here are the discarded keypoints, overlaid on the first frame of the sequence:
Here are the movements of 20 random keypoints:
The keypoints in the background noise did not track well:
Observe where they ended up in the last frame of the sequence:
Coarse-to-fine tracking might resolve this problem, but the assignment did not call for it because the video otherwise contains no large motions.
Structure from Motion
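The keypoint tracks are used to build a "measurement matrix," D, which is factorized with a singular value decomposition into motion (affine) and shape (3D) matrices. Because this factorization is only determined up to an affine transformation, we then apply metric constraints (each camera's two image axes must be orthogonal unit vectors) to eliminate the affine ambiguity and recover the true motion and shape matrices.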
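The factorization and metric upgrade can be sketched in NumPy under the orthographic model of Tomasi and Kanade. This is a minimal sketch, not the project's implementation: the function name `factorize` is hypothetical, and we assume the tracks are stored as two F x P arrays of x and y coordinates (F frames, P keypoints), stacked so that the first F rows of D hold x coordinates and the last F rows hold y coordinates.

```python
import numpy as np


def factorize(x, y):
    """x, y: (F, P) track coordinates. Returns motion M (2F x 3) and shape S (3 x P)."""
    F = x.shape[0]
    D = np.vstack([x, y])                        # 2F x P measurement matrix
    D = D - D.mean(axis=1, keepdims=True)        # remove per-frame translation
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    root = np.diag(np.sqrt(s[:3]))               # keep the rank-3 part
    M_hat = U[:, :3] @ root                      # affine motion
    S_hat = root @ Vt[:3]                        # affine shape

    # Metric constraints on the symmetric matrix L = Q Q^T (6 unknowns):
    # i_f^T L i_f = 1, j_f^T L j_f = 1, i_f^T L j_f = 0 for every frame f.
    def row(a, b):
        """Coefficients of a^T L b in terms of L's upper triangle."""
        return np.array([a[0] * b[0], a[0] * b[1] + a[1] * b[0],
                         a[0] * b[2] + a[2] * b[0], a[1] * b[1],
                         a[1] * b[2] + a[2] * b[1], a[2] * b[2]])

    A, rhs = [], []
    for f in range(F):
        i_f, j_f = M_hat[f], M_hat[F + f]
        A += [row(i_f, i_f), row(j_f, j_f), row(i_f, j_f)]
        rhs += [1.0, 1.0, 0.0]
    l = np.linalg.lstsq(np.array(A), np.array(rhs), rcond=None)[0]
    L = np.array([[l[0], l[1], l[2]],
                  [l[1], l[3], l[4]],
                  [l[2], l[4], l[5]]])
    Q = np.linalg.cholesky(L)                    # raises if L is not positive definite
    M = M_hat @ Q                                # rows: i_1..i_F, then j_1..j_F
    S = np.linalg.inv(Q) @ S_hat
    return M, S
```

With noise-free tracks the recovered shape matches the true shape up to a rotation (and possibly a reflection). With real, noisy tracks such as those from the background points, L can come out indefinite, in which case `np.linalg.cholesky` raises `LinAlgError`; clipping L's negative eigenvalues before taking a square root is one common workaround.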
Below are four viewpoints of the reconstructed 3D structure. The red lines represent the predicted 3D path of the camera. For the most part, the structure is reconstructed correctly, except for the large black area off to the side of the hotel. This is an artifact of the background noise keypoints discussed above.
Here is a plot of the predicted camera path:
Here are the X, Y, and Z dimensions of the predicted camera path:
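One common way to obtain a camera path from the metric motion matrix, which we assume here rather than take from the project, is to use each frame's viewing direction k_f = i_f x j_f, normalized to unit length; its X, Y, and Z components over the frames give per-axis plots like those above. The function name `camera_directions` and the row layout of M (all i_f rows, then all j_f rows) are hypothetical.

```python
import numpy as np


def camera_directions(M):
    """M: 2F x 3 motion matrix; rows 0..F-1 are i_f, rows F..2F-1 are j_f."""
    F = M.shape[0] // 2
    i, j = M[:F], M[F:]
    k = np.cross(i, j)                                   # optical axis per frame
    return k / np.linalg.norm(k, axis=1, keepdims=True)  # unit viewing directions
```

Since the metric reconstruction is only fixed up to a global rotation and possibly a reflection, the recovered path may appear rotated or mirrored relative to the true one while keeping the same shape.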