(results videos at bottom of page!)
For this project, we implemented Structure from Motion, which aims to recover the 3D structure of an object from a sequence of 2D images. The pipeline is as follows:
1: Acquire a sequence of images.
2: For the first image, acquire some feature points of interest. In this case, we used feature locations based on Harris corners and kept the top 500 of these (a number of which were later culled after being lost during tracking).
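As a rough illustration of step 2, here is a minimal Python/NumPy sketch of keeping the top-N Harris responses. The function names, the 5x5 summation window, and k = 0.04 are illustrative assumptions rather than the project's actual settings, and there is no non-maximum suppression, so the strongest points can cluster around the same corner.

```python
import numpy as np

def window_sum(a, radius):
    """Sum `a` over a (2*radius+1)^2 window around each pixel (edge-padded)."""
    h, w = a.shape
    p = np.pad(a, radius, mode="edge")
    out = np.zeros_like(a)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out += p[dy:dy + h, dx:dx + w]
    return out

def harris_top_n(img, n=500, radius=2, k=0.04):
    """Return the n strongest Harris responses as an (n, 2) array of (x, y)."""
    img = np.asarray(img, dtype=float)
    Iy, Ix = np.gradient(img)  # np.gradient returns d/drow first, then d/dcol
    Sxx = window_sum(Ix * Ix, radius)
    Sxy = window_sum(Ix * Iy, radius)
    Syy = window_sum(Iy * Iy, radius)
    # Harris response: det(M) - k * trace(M)^2 for the windowed gradient matrix M
    R = Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2
    order = np.argsort(R, axis=None)[::-1][:n]
    rows, cols = np.unravel_index(order, R.shape)
    return np.stack([cols, rows], axis=1).astype(float)
```

Features are returned as (x, y) = (column, row), which is the convention assumed for the tracking steps.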
3: For each image pair (i-1, i), compute the optical flow for every pixel by doing the following:
Compute the difference It between images a and b
For every pixel in image a, compute the following sums over a window around that pixel:
ΣIxIx = sum of the squared x components of ∇Ia
ΣIxIy = sum of the products of the x components of ∇Ia with the y components of ∇Ia
ΣIyIy = sum of the squared y components of ∇Ia
ΣIxIt = sum of the products of the x components of ∇Ia with the corresponding values of It
ΣIyIt = sum of the products of the y components of ∇Ia with the corresponding values of It
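Then, for each pixel, we solve the following pair of equations for u and v (taking It = Ib - Ia):
ΣIxIx·u + ΣIxIy·v = -ΣIxIt
ΣIxIy·u + ΣIyIy·v = -ΣIyIt
where u and v are the x and y translations of pixels from Ia to Ib.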
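Step 3 can be sketched in Python/NumPy roughly as follows. This is a minimal Lucas-Kanade style sketch, not the project's actual code: the function names, the 15x15 summation window (radius 7), the sign convention It = Ib - Ia, and the determinant threshold are all assumptions made for illustration.

```python
import numpy as np

def box_sum(a, radius):
    """Sum `a` over a (2*radius+1)^2 window around each pixel (edge-padded)."""
    k = 2 * radius + 1
    p = np.pad(a, radius, mode="edge")
    # Integral image with a leading row and column of zeros
    c = np.zeros((p.shape[0] + 1, p.shape[1] + 1))
    c[1:, 1:] = p.cumsum(axis=0).cumsum(axis=1)
    return c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]

def lucas_kanade(img_a, img_b, radius=7, eps=1e-6):
    """Dense flow (u, v) from img_a to img_b, one 2x2 solve per pixel."""
    img_a = np.asarray(img_a, dtype=float)
    img_b = np.asarray(img_b, dtype=float)
    Iy, Ix = np.gradient(img_a)  # d/drow first, then d/dcol
    It = img_b - img_a           # assumed sign convention
    Sxx = box_sum(Ix * Ix, radius)
    Sxy = box_sum(Ix * Iy, radius)
    Syy = box_sum(Iy * Iy, radius)
    Sxt = box_sum(Ix * It, radius)
    Syt = box_sum(Iy * It, radius)
    det = Sxx * Syy - Sxy ** 2
    ok = det > eps                   # skip windows with no usable texture
    safe = np.where(ok, det, 1.0)
    # Cramer's rule for [Sxx Sxy; Sxy Syy] [u; v] = -[Sxt; Syt]
    u = np.where(ok, (-Syy * Sxt + Sxy * Syt) / safe, 0.0)
    v = np.where(ok, (Sxy * Sxt - Sxx * Syt) / safe, 0.0)
    return u, v
```

Pixels whose window has a near-singular gradient matrix (flat regions or straight edges) get zero flow in this sketch; how the original handled those cases is not stated.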
4: Use the resulting pixel translations to move the interest points from Ia to Ib, i.e., feature(Ib) = feature(Ia) + (u,v), where (u,v) is the flow at the feature's location.
5: Remove any features that have gone outside the image area.
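Steps 4 and 5 together might look like the following minimal Python/NumPy sketch. The function name and the nearest-pixel lookup of the flow are assumptions for illustration; interpolating the flow at sub-pixel feature positions would be a reasonable alternative.

```python
import numpy as np

def advance_features(points, u, v):
    """Move (x, y) features by the flow at their nearest pixel.

    points: (N, 2) array of (x, y) positions inside the image.
    u, v:   (H, W) flow fields from the previous step.
    Returns the surviving moved points and a boolean keep-mask of length N.
    """
    h, w = u.shape
    cols = np.rint(points[:, 0]).astype(int)
    rows = np.rint(points[:, 1]).astype(int)
    moved = points + np.stack([u[rows, cols], v[rows, cols]], axis=1)
    # A feature is kept only if it is still inside the image area
    inside = ((moved[:, 0] >= 0) & (moved[:, 0] <= w - 1) &
              (moved[:, 1] >= 0) & (moved[:, 1] <= h - 1))
    return moved[inside], inside
```

The returned mask can also be applied to the positions stored for earlier frames, so that every surviving feature has a complete track for the reconstruction step.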
6: Reconstruct the 3D geometry from the tracked points by doing the following:
Center the features for each frame (subtract the mean x from all x's, and the mean y from all y's).
We use all the points we've gathered to compute the matrix L, a symmetric matrix used to transform the motion and shape matrices derived from the points (from the U and V factors of an SVD of our points) into the "real" motion and shape matrices. The shape matrix contains all the 3D points that define the final form.
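The reconstruction step can be sketched in Python/NumPy as below. It follows the standard Tomasi-Kanade factorization under an orthographic camera, which the description above appears to match, but the function name, the (frames, points, 2) input layout, and the way L is turned into a transform (via its eigendecomposition) are assumptions for illustration, not the project's actual code.

```python
import numpy as np

def factorize(tracks):
    """tracks: (F, P, 2) array of (x, y) for P features tracked over F frames.
    Returns motion (2F, 3), shape (3, P), and the symmetric matrix L (3, 3)."""
    F = tracks.shape[0]
    # Center the features in each frame
    centered = tracks - tracks.mean(axis=1, keepdims=True)
    # Measurement matrix: F rows of x coordinates stacked on F rows of y
    W = np.vstack([centered[:, :, 0], centered[:, :, 1]])
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    root = np.sqrt(s[:3])
    M_hat = U[:, :3] * root          # motion, up to a 3x3 ambiguity
    S_hat = root[:, None] * Vt[:3]   # shape, up to the same ambiguity

    def coeffs(a, b):
        # Coefficients of a^T L b in the six unique entries of symmetric L
        return [a[0] * b[0], a[0] * b[1] + a[1] * b[0], a[0] * b[2] + a[2] * b[0],
                a[1] * b[1], a[1] * b[2] + a[2] * b[1], a[2] * b[2]]

    A, rhs = [], []
    for f in range(F):
        i, j = M_hat[f], M_hat[F + f]
        # Each frame's two camera axes should be unit length and orthogonal
        A += [coeffs(i, i), coeffs(j, j), coeffs(i, j)]
        rhs += [1.0, 1.0, 0.0]
    l = np.linalg.lstsq(np.array(A), np.array(rhs), rcond=None)[0]
    L = np.array([[l[0], l[1], l[2]],
                  [l[1], l[3], l[4]],
                  [l[2], l[4], l[5]]])
    # Any Q with Q Q^T = L works; clip eigenvalues in case noise makes L indefinite
    evals, evecs = np.linalg.eigh(L)
    Q = evecs * np.sqrt(np.clip(evals, 1e-12, None))
    motion = M_hat @ Q
    shape = np.linalg.solve(Q, S_hat)
    return motion, shape, L
```

The recovered shape is only defined up to a global rotation (and possibly a reflection), so it generally needs to be re-oriented before display. At least three frames with distinct viewpoints are needed for L to be determined.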