Project 5: Tracking and Structure from Motion

Varun Singh

For this assignment I reconstructed the 3D shape of a hotel from a sequence of images, following the procedure documented in Shape and Motion from Image Streams under Orthography: a Factorization Method (Tomasi and Kanade 1992).

The following animation belongs with the feature tracking section, but it only renders on this tab, so I'm placing it here:

Keypoint Selection

First, I used a Harris corner detector to select keypoints in the image that would be easy to track. Here are all of the detected Harris corners, overlaid on the first image in the sequence:

As you can see, many of the detected 'corners' are poor features to track. So, for the rest of the assignment, I kept only the 500 corners with the strongest Harris response. Here are those best 500 keypoints/corners:
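The selection step can be sketched in NumPy roughly as follows. This is a toy Harris detector, not the assignment's actual code; the 3×3 box window and k = 0.04 are illustrative choices:

```python
import numpy as np

def box3(a):
    """Sum each pixel's 3x3 neighborhood (zero-padded at the borders)."""
    p = np.pad(a, 1)
    h, w = a.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3))

def harris_response(img, k=0.04):
    """Per-pixel Harris corner response R = det(M) - k * trace(M)^2,
    where M is the 3x3-windowed second-moment matrix of the gradients."""
    Iy, Ix = np.gradient(img.astype(float))
    Sxx, Syy, Sxy = box3(Ix * Ix), box3(Iy * Iy), box3(Ix * Iy)
    return Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2

def best_corners(img, n=500):
    """Return (row, col) coordinates of the n strongest Harris responses."""
    R = harris_response(img)
    idx = np.argpartition(R.ravel(), -n)[-n:]
    return np.stack(np.unravel_index(idx, R.shape), axis=1)
```

Note that a practical detector would also apply non-maximum suppression so the n corners don't cluster on a few strong features; this sketch skips that.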

Feature Tracking

Then, I tracked these features through the sequence of 51 images using optical flow. Starting from the keypoint locations in the first image, I computed the per-pixel optical flow between consecutive frames and accumulated the displacement at each feature's position. Because tracked positions quickly become sub-pixel, I sampled every displacement with interp2 rather than rounding to the nearest pixel, which keeps the features from drifting. Here is an animation of the locations of all of my tracked features over the entire sequence:
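The accumulation loop can be sketched as follows, with a hand-rolled bilinear sampler standing in for MATLAB's interp2. The flow fields are assumed to be precomputed, one (u, v) pair per consecutive frame pair:

```python
import numpy as np

def bilinear(field, x, y):
    """Sample a 2D field at sub-pixel (x, y) locations
    (a NumPy stand-in for MATLAB's interp2)."""
    x0 = np.clip(np.floor(x).astype(int), 0, field.shape[1] - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, field.shape[0] - 2)
    dx, dy = x - x0, y - y0
    return (field[y0, x0] * (1 - dx) * (1 - dy)
            + field[y0, x0 + 1] * dx * (1 - dy)
            + field[y0 + 1, x0] * (1 - dx) * dy
            + field[y0 + 1, x0 + 1] * dx * dy)

def track(points, flows):
    """points: (N, 2) array of (x, y) start positions.
    flows: list of (u, v) per-frame-pair flow fields, each (H, W).
    Returns positions at every frame, shape (len(flows) + 1, N, 2)."""
    traj = [points.astype(float)]
    for u, v in flows:
        x, y = traj[-1][:, 0], traj[-1][:, 1]
        # Sample the flow at the current sub-pixel position and step forward
        traj.append(np.stack([x + bilinear(u, x, y),
                              y + bilinear(v, x, y)], axis=1))
    return np.stack(traj)
```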

Here is the 2D path over all 51 frames, overlaid on the first image, for 20 random keypoints. Each feature starts red in the first frame and fades to blue by the last.

I also have a visualization of a different 20 random points, tracked through all of the images, with lines connecting the same feature across frames. For some reason it only displays on the first tab, so go back to the first page to see it.

Then, I discarded all the points that drifted off the image at some point during the sequence. Here are all of these points, overlaid on the first image in the sequence:
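The pruning step amounts to a mask over the trajectory array. A sketch, assuming trajectories are stored as a (frames, points, 2) array of (x, y) positions:

```python
import numpy as np

def keep_in_bounds(traj, width, height):
    """traj: (F, N, 2) trajectories of (x, y) positions.
    Keep only the features that stay inside the image in every frame.
    Returns the filtered trajectories and the boolean keep-mask."""
    x, y = traj[..., 0], traj[..., 1]
    ok = ((x >= 0) & (x <= width - 1)
          & (y >= 0) & (y <= height - 1)).all(axis=0)
    return traj[:, ok], ok
```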

Structure from Motion

Finally, I followed the procedure described in the Tomasi and Kanade paper to generate 3D structure from motion.
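The core of the Tomasi–Kanade method is a rank-3 factorization of the centered measurement matrix via SVD, since an orthographic image stream of a rigid scene yields a measurement matrix of rank at most 3. A sketch of just that step — it stops at the affine reconstruction and omits the paper's metric upgrade, which solves for the orthonormality constraints on the motion rows:

```python
import numpy as np

def factorize(W):
    """Tomasi-Kanade factorization under orthography.
    W: (2F, N) measurement matrix of N feature tracks over F frames.
    Returns motion M (2F, 3) and structure S (3, N), up to an
    affine ambiguity (the metric upgrade is omitted here)."""
    # Center each row: subtracting the row mean removes the
    # per-frame translation component
    Wc = W - W.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(Wc, full_matrices=False)
    # Rank-3 truncation, splitting the singular values between M and S
    root = np.sqrt(s[:3])
    M = U[:, :3] * root
    S = root[:, None] * Vt[:3]
    return M, S
```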

Here is the generated 3D mesh of the hotel, from three different viewpoints. For some reason the 3D mesh is a mirror image of what is actually seen in the photos (the chimney and ground are to the right instead of to the left), but I figured this was simply an artifact of the conversion to 3D coordinates. Interestingly, either swapping the x and y columns of the points matrix or swapping the top and bottom (i and j) halves of the D or M matrix fixes the problem, but for the writeup I left it as the support code rendered it:

Here's a better picture of the 3D positions of the cameras, denoted by the red lines:

Here are plots of the 3D position of the camera. Each graph shows the x, y, or z coordinate of the camera across the 51 frames of the sequence.
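Under orthography, the rows of the motion matrix M are each frame's image-plane axes i and j, so the per-frame viewing direction is their cross product. A sketch, assuming M stacks all the i rows above all the j rows (the row layout is an assumption, not something fixed by the method):

```python
import numpy as np

def camera_axes(M):
    """M: (2F, 3) motion matrix, with frame f's image-plane axes in
    rows f (the i row) and F + f (the j row).
    Returns each frame's viewing direction as a unit vector, shape (F, 3)."""
    F = M.shape[0] // 2
    i, j = M[:F], M[F:]
    k = np.cross(i, j)  # optical axis is perpendicular to the image plane
    return k / np.linalg.norm(k, axis=1, keepdims=True)
```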

X Coordinates:

Y Coordinates:

Z Coordinates: