structure from motion

miya schneider (mmschnei)

overview

This project was completed for CSCI1430: Computational Vision, taught by James Hays at Brown University. The goal of this project was to reconstruct a 3D object, a hotel, based on its movement across video frames. I relied heavily on the papers by Tomasi and Kanade (1992) and Morita and Kanade (1997). The first part of the project entailed determining the movements between frames, which I did by selecting distinctive features and computing the optical flow between frames. After calculating the optical flow, I used the equations provided by Morita and Kanade to recover the structure.

feature selection

In order to track points between frames, the points must be distinctive enough that they can be detected reliably from frame to frame. I used the Harris corner detection algorithm to find such interest points. Because of the simplicity of the detector and the relatively small number of detections, Harris corners are a good choice. However, the algorithm still returns a large number of points, each with a confidence score, so to reduce the size of the feature set I kept only the 500 features with the highest confidences.


Interest points selected from first frame
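
As a rough illustration of this step, here is a minimal sketch using OpenCV's Harris-based corner detector to keep the 500 strongest interest points. The image path is an assumption, and the actual assignment code may differ.

```python
import cv2
import numpy as np

# Load the first frame of the sequence (path is an assumption).
frame0 = cv2.imread("hotel/frame0.png", cv2.IMREAD_GRAYSCALE)

# Detect Harris corners and keep only the 500 strongest responses,
# mirroring the confidence-based pruning described above.
corners = cv2.goodFeaturesToTrack(
    frame0,
    maxCorners=500,        # keep the 500 highest-confidence features
    qualityLevel=0.01,     # discard very weak responses
    minDistance=8,         # enforce spacing between detections
    useHarrisDetector=True,
    k=0.04,                # standard Harris sensitivity parameter
)
points = corners.reshape(-1, 2)   # (N, 2) array of (x, y) feature positions
```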

feature tracking

In order to track the features, I implemented the Kanade-Lucas-Tomasi (KLT) algorithm: between each pair of consecutive frames I computed the optical flow and added each feature's displacement to its position in the next frame. By the end, the position of each chosen feature is known in every frame. The trajectories of 20 randomly selected features are shown below.


Sample of tracked features
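
A minimal sketch of this kind of tracking, assuming the frames live in `hotel/frame*.png` and that `points` comes from the feature-selection sketch above. It uses OpenCV's pyramidal Lucas-Kanade flow rather than my own KLT implementation.

```python
import glob
import cv2
import numpy as np

# Frame paths and the initial `points` array are assumptions carried over
# from the feature-selection sketch above.
frame_paths = sorted(glob.glob("hotel/frame*.png"))
frames = [cv2.imread(p, cv2.IMREAD_GRAYSCALE) for p in frame_paths]

prev_pts = points.astype(np.float32).reshape(-1, 1, 2)
tracks = [prev_pts.reshape(-1, 2).copy()]   # per-frame positions of every feature

for prev, curr in zip(frames[:-1], frames[1:]):
    # Pyramidal Lucas-Kanade flow gives each feature's displacement, which is
    # folded into its position to obtain its location in the next frame.
    next_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev, curr, prev_pts, None)
    tracks.append(next_pts.reshape(-1, 2).copy())
    prev_pts = next_pts

tracks = np.stack(tracks)   # shape (num_frames, num_features, 2)
```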

Some of the original interest points move out of the camera view as the frames change and the hotel moves. These points are no longer of use, so I removed them from the feature data. The trajectories of these features are shown below. You can see that most of the removed points are near the bottom of the image. This makes sense given the apparent downward motion of the camera.


Trajectories of "bad" features
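
One way to express this pruning, assuming the `tracks` and `frames` arrays from the sketch above, is to drop any feature that ever falls outside the image bounds:

```python
import numpy as np

# `tracks` has shape (num_frames, num_features, 2) as in the sketch above;
# the image dimensions come from the first frame.
h, w = frames[0].shape

# A feature is "bad" if it ever leaves the image bounds during the sequence.
x, y = tracks[..., 0], tracks[..., 1]
in_view = (x >= 0) & (x < w) & (y >= 0) & (y < h)
good = in_view.all(axis=0)          # True only for features visible in every frame

good_tracks = tracks[:, good, :]    # kept for the factorization step
bad_tracks = tracks[:, ~good, :]    # plotted as the "bad" trajectories above
```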

structure from motion

Using the tracked feature positions, I approximated the 3D structure of the hotel with the equations detailed in the two papers mentioned above. This involved centering the features in each image and stacking them into a single measurement matrix combining all the features. I then decomposed that matrix using singular value decomposition (SVD), which I subsequently used to find the motion and shape matrices. However, in order to ensure uniqueness of the decomposition it was necessary to remove the affine ambiguity; this procedure is described further in the second paper.
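
A compact sketch of this factorization, assuming the `good_tracks` array from the pruning step above. It follows the Tomasi-Kanade recipe (centered measurement matrix, rank-3 SVD, then metric constraints to remove the affine ambiguity); the details of my actual implementation may differ.

```python
import numpy as np

# `good_tracks` (num_frames, num_features, 2) comes from the previous sketch.
F, P, _ = good_tracks.shape

# Center each frame's features about their centroid and stack them into the
# 2F x P measurement matrix (x rows on top, y rows on the bottom).
centered = good_tracks - good_tracks.mean(axis=1, keepdims=True)
W = np.vstack([centered[..., 0], centered[..., 1]])   # shape (2F, P)

# Rank-3 factorization via SVD, as in Tomasi and Kanade.
U, S, Vt = np.linalg.svd(W, full_matrices=False)
M_hat = U[:, :3] @ np.diag(np.sqrt(S[:3]))   # affine motion (2F x 3)
S_hat = np.diag(np.sqrt(S[:3])) @ Vt[:3]     # affine shape  (3 x P)

# Remove the affine ambiguity with the metric constraints: each frame's image
# axes must be unit length and orthogonal. Solve A vec(L) = b for the
# symmetric matrix L = Q Q^T, then recover Q by Cholesky (assuming L is
# positive definite).
rows_i, rows_j = M_hat[:F], M_hat[F:]
A, b = [], []
for i_f, j_f in zip(rows_i, rows_j):
    A.append(np.outer(i_f, i_f).ravel()); b.append(1.0)
    A.append(np.outer(j_f, j_f).ravel()); b.append(1.0)
    A.append(np.outer(i_f, j_f).ravel()); b.append(0.0)
L_vec, *_ = np.linalg.lstsq(np.asarray(A), np.asarray(b), rcond=None)
L = L_vec.reshape(3, 3)
L = (L + L.T) / 2                     # symmetrize before factoring
Q = np.linalg.cholesky(L)

M = M_hat @ Q                          # metric camera motion
S3d = np.linalg.inv(Q) @ S_hat         # metric 3D shape (3 x P)
```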



3D locations of tracked points

The 3D hotel reconstruction is shown above. It looks fairly accurate, with the exception of the overhang (pictured in the top-right view). The graphs below depict the predicted 3D path of the camera; it is clear from these that the camera moved quite a bit in the x-direction.


Path of camera in x, y, and z directions, respectively