Summary

(results videos at bottom of page!)
For this project, we implemented Structure from Motion, which aims to recover the 3D structure of an object from a sequence of 2D images. The pipeline is as follows:

1: Acquire a sequence of images.
2: For the first image, detect some feature points of interest. In this case, we used feature locations based on Harris corners, keeping the top 500 (a number of which were later culled after being lost during tracking).
3: For each image pair (i-1, i), compute the optical flow at each pixel by doing the following:

     Compute the temporal difference It between images Ia and Ib.
     For every pixel in image Ia, accumulate the following sums over a window around it:
     ΣIxIx = sum of the products of the x components of ∇Ia with themselves
     ΣIxIy = sum of the products of the x components of ∇Ia with the y components of ∇Ia
     ΣIyIy = sum of the products of the y components of ∇Ia with themselves
     ΣIxIt = sum of the products of the x components of ∇Ia with the corresponding values of It
     ΣIyIt = sum of the products of the y components of ∇Ia with the corresponding values of It

     Then, we solve the following 2x2 linear system for u and v:

     [ ΣIxIx  ΣIxIy ] [ u ]      [ ΣIxIt ]
     [ ΣIxIy  ΣIyIy ] [ v ]  = - [ ΣIyIt ]

     where u and v are the x and y translations of pixels from Ia to Ib.

4: Use the resulting pixel translations to move the interest points from Ia to Ib, i.e., feature(Ib) = feature(Ia) + (u, v).
5: Remove any features that have gone outside the image area.
6: Reconstruct the 3D geometry from the resulting tracked points by doing the following:
     Center the features for each frame (subtract the mean x from all x's, and the mean y from all y's).
     Use all the points we've gathered to compute the matrix L, a symmetric matrix that transforms the motion and shape matrices derived from the points (the U and V of an SVD of our point matrix) into the "real" motion and shape matrices (the shape matrix holds the 3D points that define the final form).
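The per-pixel solve in step 3 is only a few lines of code. Here is a minimal pure-Python sketch on a synthetic image pair (the intensity function f, the window size, and the test point are illustrative, not from the original code):

```python
import math

def f(x, y):
    # smooth synthetic intensity pattern (hypothetical test image)
    return math.exp(-((x - 8.0) ** 2 + (y - 8.0) ** 2) / 20.0)

W = 16
Ia = [[f(x, y) for x in range(W)] for y in range(W)]
Ib = [[f(x - 0.3, y) for x in range(W)] for y in range(W)]  # true flow: u = 0.3, v = 0

def lk_flow(Ia, Ib, cx, cy, r=2):
    """One Lucas-Kanade step: accumulate the gradient sums over a
    (2r+1)x(2r+1) window around (cx, cy) and solve the 2x2 system."""
    Sxx = Sxy = Syy = Sxt = Syt = 0.0
    for y in range(cy - r, cy + r + 1):
        for x in range(cx - r, cx + r + 1):
            Ix = (Ia[y][x + 1] - Ia[y][x - 1]) / 2.0  # central differences
            Iy = (Ia[y + 1][x] - Ia[y - 1][x]) / 2.0
            It = Ib[y][x] - Ia[y][x]
            Sxx += Ix * Ix; Sxy += Ix * Iy; Syy += Iy * Iy
            Sxt += Ix * It; Syt += Iy * It
    det = Sxx * Syy - Sxy * Sxy
    # solve [Sxx Sxy; Sxy Syy][u; v] = -[Sxt; Syt] by Cramer's rule
    u = (-Sxt * Syy + Syt * Sxy) / det
    v = (-Syt * Sxx + Sxt * Sxy) / det
    return u, v

u, v = lk_flow(Ia, Ib, 6, 8)
```

A feature at (6, 8) would then be moved to (6 + u, 8 + v) for the next frame, and dropped if it left the image, as in steps 4 and 5.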
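Step 6 is essentially a Tomasi-Kanade-style factorization. Below is a minimal numpy sketch on synthetic orthographic data; the constraint setup for the symmetric matrix L follows the standard metric-upgrade formulation, and the names are illustrative rather than taken from the original code:

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic data (hypothetical): P points tracked over F frames
P, F = 30, 8
S_true = rng.standard_normal((3, P))          # 3D shape
W = np.zeros((2 * F, P))                      # measurement matrix (x rows, y rows)
for f in range(F):
    # random rotation via QR; first two rows act as an orthographic camera
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    W[2*f:2*f+2] = Q[:2] @ S_true + rng.standard_normal((2, 1))  # + translation

# center each frame's coordinates (removes the per-frame translation)
Wc = W - W.mean(axis=1, keepdims=True)

# rank-3 factorization via SVD -> affine motion Mhat and shape Shat
U, s, Vt = np.linalg.svd(Wc, full_matrices=False)
Mhat = U[:, :3] * np.sqrt(s[:3])
Shat = np.sqrt(s[:3])[:, None] * Vt[:3]

# metric upgrade: solve for symmetric L with
# i_f L i_f^T = j_f L j_f^T = 1 and i_f L j_f^T = 0 for every frame
def sym_row(a, b):
    # coefficients of the 6 unknowns (L11,L12,L13,L22,L23,L33) in a L b^T
    return [a[0]*b[0], a[0]*b[1] + a[1]*b[0], a[0]*b[2] + a[2]*b[0],
            a[1]*b[1], a[1]*b[2] + a[2]*b[1], a[2]*b[2]]

A, b = [], []
for f in range(F):
    i, j = Mhat[2*f], Mhat[2*f + 1]
    A += [sym_row(i, i), sym_row(j, j), sym_row(i, j)]
    b += [1.0, 1.0, 0.0]
l = np.linalg.lstsq(np.asarray(A), np.asarray(b), rcond=None)[0]
L = np.array([[l[0], l[1], l[2]],
              [l[1], l[3], l[4]],
              [l[2], l[4], l[5]]])

# factor L = G G^T; the "real" motion and shape are M = Mhat G, S = G^-1 Shat
G = np.linalg.cholesky(L)
M = Mhat @ G
S = np.linalg.solve(G, Shat)
```

After the upgrade, each frame's two rows of M are orthonormal (they really are camera axes), and S holds the recovered 3D points up to a rotation of the whole scene.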

First frame with all points

First frame, with all points that remained at the end

First frame with points that got KILLED (by a camera)

First frame with the paths of 20 random points

Points, tracked by frame


Watch in HD, fullscreen to see details.

Paths of 20 randomly selected points


Watch in HD, fullscreen to see details.

3D object result, imported and rendered with Maya


Watch in HD, fullscreen to see details.

Extra credit (read on for sorrow)

8-point algorithm for scene reconstruction from two views. I tried this. I have a function that retrieves the fundamental matrix F and the two epipoles (the function is gloriously called eightpoint). I then generated the camera matrices according to the paper (MATLAB code):
P  = [1 0 0 0; 0 1 0 0; 0 0 1 0];
ex = [0 -c(3) c(2); c(3) 0 -c(1); -c(2) c(1) 0];
Pp = [ex*a c];

(in this case, a is F, and c is the second epipole). To test my code, I used some reconstruction code I found that *did not work* for me, although it did work in its authors' examples with the same form of input data, so I think my fundamental matrix code is... well, very wrong. The kind of results I was getting were in the 1*10^20 range. Unfortunately, I ran out of time having completed only the 8-point portion itself (the retrieval of F), and not the reconstruction...