CSCI1430 : Project 5 - Tracking and Structure from Motion

rdfong

In this project we use several images of the same scene to recover the scene's 3D structure. The algorithm is described below:

Identifying Key Features

    To start, we need to identify a set of key feature points of interest in the scene to track. We use the Harris corner detector to find candidate points, sort them by corner strength, and keep the top 500.
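
    As a rough illustration, here is a minimal Python sketch of this selection step using scikit-image. The project itself was not necessarily written this way; the function name pick_keypoints and the min_distance spacing are assumptions made for the example.

        import numpy as np
        from skimage.feature import corner_harris, peak_local_max

        def pick_keypoints(frame, n_points=500):
            """Return the n_points strongest Harris corners as (row, col) pairs."""
            response = corner_harris(frame)            # corner strength map
            # Keep local maxima only, spaced apart so corners do not pile up
            peaks = peak_local_max(response, min_distance=5)
            # Sort candidates by strength and keep the strongest n_points
            strengths = response[peaks[:, 0], peaks[:, 1]]
            order = np.argsort(strengths)[::-1]
            return peaks[order[:n_points]]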

Optical flow

  1. Now that our key features are identified, we track them frame by frame through our sequence of images. Our goal is to keep track of the motion of these features across all frames using the Lucas-Kanade optical flow method.
  2. First we make some assumptions. We assume that feature points never move far between frames (small motion). We assume that the brightness of the neighbourhood around a feature point is constant across two sequential frames (brightness constancy). We also assume that all points move like their neighbours do (spatial coherence).
  3. Using these constraints we can solve the Lucas-Kanade equations to get the predicted motion at each pixel of the image, and from that the predicted motion of each feature point.
  4. We then update each feature point by sampling the computed optical flow at the point's current location and adding the corresponding flow vector to its coordinates. (A code sketch of one tracking step follows this list.) Below are images of 20 of our feature points and the paths they take throughout the sequence of images:

  5. As we calculate the feature point locations for each successive frame, any point that moves beyond the image boundary is discarded.
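
  Here is a minimal Python sketch of one tracking step, assuming grayscale float images. The Gaussian-weighted neighbourhood (sigma), the dense-flow formulation, and the helper names lucas_kanade_flow and step_points are illustrative choices, not the project's actual code.

      import numpy as np
      from scipy.ndimage import gaussian_filter, map_coordinates

      def lucas_kanade_flow(im1, im2, sigma=3.0):
          """Dense Lucas-Kanade flow from frame im1 to frame im2."""
          Iy, Ix = np.gradient(im1)          # spatial gradients (rows, cols)
          It = im2 - im1                     # temporal gradient
          # Gaussian-weighted sums over each pixel's neighbourhood implement
          # the assumption that points move like their neighbours.
          w = lambda a: gaussian_filter(a, sigma)
          Ixx, Iyy, Ixy = w(Ix * Ix), w(Iy * Iy), w(Ix * Iy)
          Ixt, Iyt = w(Ix * It), w(Iy * It)
          # Per pixel, solve [Ixx Ixy; Ixy Iyy] [u; v] = -[Ixt; Iyt]
          det = Ixx * Iyy - Ixy ** 2
          det[det == 0] = 1e-12              # guard against flat neighbourhoods
          u = (-Iyy * Ixt + Ixy * Iyt) / det # horizontal flow
          v = (Ixy * Ixt - Ixx * Iyt) / det  # vertical flow
          return u, v

      def step_points(points, u, v, shape):
          """Move (row, col) feature points by the flow sampled at each point,
          discarding any point that leaves the image."""
          rows, cols = points[:, 0], points[:, 1]
          du = map_coordinates(u, [rows, cols], order=1)  # bilinear sampling
          dv = map_coordinates(v, [rows, cols], order=1)
          moved = np.column_stack([rows + dv, cols + du])
          inside = ((moved[:, 0] >= 0) & (moved[:, 0] <= shape[0] - 1) &
                    (moved[:, 1] >= 0) & (moved[:, 1] <= shape[1] - 1))
          return moved[inside]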

Optical Flow/Feature tracking results

[Figures: the 20 tracked feature points, the paths they trace through the sequence, and the discarded points shown on the original and last frames]

Structure from Motion

We use the method presented in "Shape and Motion from Image Streams under Orthography: A Factorization Method" (Tomasi and Kanade, 1992) to recover the structure of our scene from the motion of our feature points.
  1. We collect the tracked locations of each feature point in each frame into a measurement matrix, D, with two rows per frame (the x and y coordinates of every point) and one column per point.
  2. Using the SVD we can decompose D and use submatrices of the resulting factors to define a motion matrix, M, and a shape matrix, S, such that D = M*S. The motion matrix represents the motion of the camera, providing two vectors per frame that span the camera's image plane, from which we can ascertain the direction the camera is pointing. The shape matrix contains the 3D location of each feature point.
  3. However, this pair of matrices is not unique: for any invertible 3x3 matrix Q, the pair M*Q and Q^-1*S multiplies to the same D. We can make the factorization unique by placing metric constraints on our scene, namely that the two image-axis vectors in each frame are orthogonal and of unit length (image scale equal to 1). The method of applying these constraints is also described in the paper. Applying them modifies our motion and shape matrices into a pair that uniquely describes our constrained scene. (A sketch of the whole factorization follows this list.)
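
  For concreteness, here is a compact Python sketch of the factorization. The row layout of D (x rows stacked above y rows), the least-squares solve for L = Q*Q^T, and the function name factorize are assumptions made for this example; the original project may have organized things differently.

      import numpy as np

      def factorize(D):
          """Tomasi-Kanade factorization of a 2F x P measurement matrix D,
          assuming the F rows of x coordinates are stacked above the F rows
          of y coordinates (one column per tracked point)."""
          F = D.shape[0] // 2
          # 1. Register: subtract each row's mean, i.e. the per-frame centroid.
          D = D - D.mean(axis=1, keepdims=True)
          # 2. Rank-3 truncated SVD gives an affine motion/shape pair.
          U, s, Vt = np.linalg.svd(D, full_matrices=False)
          M = U[:, :3] * np.sqrt(s[:3])            # 2F x 3 motion
          S = np.sqrt(s[:3])[:, None] * Vt[:3]     # 3 x P shape
          # 3. Metric upgrade: find Q so the image-axis rows of M*Q are
          #    orthonormal. Solve linearly for the symmetric L = Q*Q^T using
          #    the constraints a_i L a_i^T = b_i L b_i^T = 1, a_i L b_i^T = 0.
          def coeffs(p, q):  # coefficients of p^T L q in the 6 unknowns of L
              return np.array([p[0]*q[0], p[0]*q[1] + p[1]*q[0],
                               p[0]*q[2] + p[2]*q[0], p[1]*q[1],
                               p[1]*q[2] + p[2]*q[1], p[2]*q[2]])
          A, rhs = [], []
          for i in range(F):
              a, b = M[i], M[F + i]
              A += [coeffs(a, a), coeffs(b, b), coeffs(a, b)]
              rhs += [1.0, 1.0, 0.0]
          l = np.linalg.lstsq(np.array(A), np.array(rhs), rcond=None)[0]
          L = np.array([[l[0], l[1], l[2]],
                        [l[1], l[3], l[4]],
                        [l[2], l[4], l[5]]])
          Q = np.linalg.cholesky(L)  # assumes L is positive definite
          return M @ Q, np.linalg.inv(Q) @ S  # metric motion and shape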

Visualizing Results

  1. Using the motion and shape matrices, we visualize the results by plotting our points in 3D. Here are my results from a few different viewpoints. From the images we can see that our reconstruction has captured the rough form of the house.
  2. We also create a textured triangle mesh from the points using Delaunay triangulation; see below. Note the red lines pointing towards the house: these represent the motion and direction of the camera through the sequence of frames, obtained from the motion matrix. (A plotting sketch follows the figures.)
    Path of the camera (in order: x, y, z):
    Final Results:
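
    As a final illustration, here is a minimal Python sketch of this visualization using matplotlib and scipy's Delaunay triangulation. The arrow placement for the camera (offsetting each viewing direction away from the points) is purely illustrative, since the orthographic model does not define a camera position.

        import numpy as np
        import matplotlib.pyplot as plt
        from scipy.spatial import Delaunay

        def show_reconstruction(M, S):
            """Plot the recovered 3D points as a mesh plus per-frame camera axes."""
            F = M.shape[0] // 2
            fig = plt.figure()
            ax = fig.add_subplot(projection='3d')
            # Triangulate the points in the image (x, y) plane, then lift the
            # resulting mesh onto the 3D coordinates for display.
            tri = Delaunay(S[:2].T)
            ax.plot_trisurf(S[0], S[1], S[2], triangles=tri.simplices, alpha=0.7)
            # The camera's viewing direction in frame i is the cross product
            # of its two image-axis rows in the motion matrix.
            for i in range(F):
                k = np.cross(M[i], M[F + i])
                # Draw each direction as a red arrow offset away from the scene
                # (illustrative only: orthography has no true camera position).
                ax.quiver(*(-2 * k), *k, color='red')
            plt.show()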