CSCI1430 : Project 5 - Tracking and Structure from Motion
rdfong
In this project we use several images of the same scene to recover the scene's 3D structure.
The algorithm is described below:
Identifying Key Features
To start, we identify a set of key feature points of interest in the scene to track.
To do this we run the Harris corner detector, sort the detected points by corner strength,
and keep the top 500.
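Below is a minimal sketch of this selection step in Python with OpenCV. It is illustrative only: the file name and parameter values are assumptions, and it picks raw response maxima without non-maximum suppression, which the project's actual code may have handled differently.

    import cv2
    import numpy as np

    # Harris corner response over the first frame of the sequence
    # (file name is a placeholder).
    img = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
    response = cv2.cornerHarris(img, blockSize=3, ksize=3, k=0.04)

    # Sort pixels by corner strength and keep the 500 strongest as key points.
    strongest = np.argsort(response.ravel())[::-1][:500]
    ys, xs = np.unravel_index(strongest, response.shape)
    points = np.stack([xs, ys], axis=1).astype(np.float64)  # (500, 2) (x, y)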
Optical Flow
Now that we have identified our key features, we track them frame by frame through our sequence of images.
Our goal is to follow the motion of these features across all frames using the Lucas-Kanade optical flow method.
First we make some assumptions. We assume that feature points never move far between frames (small motion).
We assume that the brightness of the neighbourhood around a feature point is constant across two sequential frames (brightness constancy).
We also assume that all points move like their neighbours do (spatial coherence).
Using these constraints we can solve the Lucas-Kanade equations to get the predicted motion at each pixel of the image.
From that we can calculate the predicted motion of each feature point.
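Concretely, linearizing the brightness-constancy assumption gives one equation per pixel, I_x u + I_y v + I_t = 0, and the spatial-coherence assumption lets us aggregate these equations over a window W and solve the 2x2 least-squares system

    \begin{bmatrix} \sum_W I_x^2 & \sum_W I_x I_y \\ \sum_W I_x I_y & \sum_W I_y^2 \end{bmatrix}
    \begin{bmatrix} u \\ v \end{bmatrix}
    = -\begin{bmatrix} \sum_W I_x I_t \\ \sum_W I_y I_t \end{bmatrix}

where I_x and I_y are the spatial image gradients, I_t is the temporal difference between the two frames, and (u, v) is the flow vector.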
We then update the location of each feature point by sampling the computed optical flow
at the point's current position and moving the point by the corresponding flow vector.
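A minimal sketch of this update step, assuming per-pixel flow fields u and v and an (N, 2) array of (x, y) point locations (the names and layout are illustrative, not the project's actual code):

    import numpy as np
    from scipy.ndimage import map_coordinates

    def track_points(points, u, v):
        # Bilinearly sample the flow fields at each (sub-pixel) point
        # location, then move the point by the sampled flow vector.
        xs, ys = points[:, 0], points[:, 1]
        du = map_coordinates(u, [ys, xs], order=1)
        dv = map_coordinates(v, [ys, xs], order=1)
        return points + np.stack([du, dv], axis=1)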
As we calculate the feature point locations for each successive frame, any point that moves
outside the image boundaries is discarded.
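A matching sketch of the boundary check, under the same assumed data layout:

    def keep_in_bounds(points, image_shape):
        # Drop any tracked point that has left the image frame.
        h, w = image_shape
        xs, ys = points[:, 0], points[:, 1]
        ok = (xs >= 0) & (xs <= w - 1) & (ys >= 0) & (ys <= h - 1)
        return points[ok]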
Below are images of 20 of our tracked feature points, the paths they take through the sequence, and the points that were discarded (shown on the first and last frames):
[Figures: 20 tracked points; their paths through the sequence; discarded points (first and last frames)]
Structure from Motion
We use the method presented in "Shape and Motion from Image Streams under Orthography:
A Factorization Method" (Tomasi and Kanade, 1992) to recover the structure of our scene from the motion of our feature points.
We can collect the tracked path of every feature point, frame by frame,
into a single measurement matrix, D.
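With F frames and P tracked points, D is 2F x P; in the paper the x coordinates are stacked above the y coordinates, and each row has its mean subtracted (registration) before factoring:

    D = \begin{bmatrix}
        x_{11} & \cdots & x_{1P} \\
        \vdots &        & \vdots \\
        x_{F1} & \cdots & x_{FP} \\
        y_{11} & \cdots & y_{1P} \\
        \vdots &        & \vdots \\
        y_{F1} & \cdots & y_{FP}
        \end{bmatrix}

where (x_{fp}, y_{fp}) is the tracked image position of point p in frame f.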
Using SVD we can decompose this matrix D and use submatrices of the resulting components
to define a motion matrix, M, and a shape matrix, S, such that D = M*S. The motion matrix represents the motion of the camera, providing
two vectors per frame (the camera's image axes), from which we can ascertain the direction the camera is pointing.
The shape matrix contains the 3D locations of the feature points.
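A sketch of the factorization with NumPy, assuming D is the registered 2F x P measurement matrix (splitting the square roots of the singular values between the two factors is one common convention, not the only one):

    import numpy as np

    # Rank-3 truncated SVD of D, as in Tomasi-Kanade.
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    sqrt_W = np.diag(np.sqrt(s[:3]))
    M = U[:, :3] @ sqrt_W    # (2F, 3) motion matrix: two rows per frame
    S = sqrt_W @ Vt[:3, :]   # (3, P) shape matrix: one column per 3D point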
However, this pair of matrices is not unique: any invertible 3x3 matrix Q gives another valid
factorization of the same dimensions, D = (MQ)(Q^{-1}S). We can make the factorization unique by placing
constraints on our scene, for example by requiring that the image axes in each frame be orthogonal
and that the image scale equal 1. The method of applying these constraints is
also described in the paper. By applying them we correct our motion and shape
matrices, obtaining a pair that uniquely describes our constrained scene.
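Concretely, writing a_i^T and b_i^T for the two rows of M belonging to frame i, the constraints require

    a_i^\top Q Q^\top a_i = 1, \qquad
    b_i^\top Q Q^\top b_i = 1, \qquad
    a_i^\top Q Q^\top b_i = 0

for every frame i. Solving these equations linearly for L = QQ^T and then factoring L (e.g. by Cholesky decomposition) gives Q; replacing M with MQ and S with Q^{-1}S leaves the product D = MS unchanged.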
Visualizing Results
Using the motion and shape matrices we visualize the results by plotting our
points in 3D. Here are my results from a few different viewpoints.
From the images we can see that our reconstruction has captured the rough form of the house.
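A minimal plotting sketch with Matplotlib, assuming S is the 3 x P shape matrix from the factorization:

    import matplotlib.pyplot as plt

    # Scatter the recovered 3D points; vary elev/azim for other viewpoints.
    ax = plt.figure().add_subplot(projection="3d")
    ax.scatter(S[0], S[1], S[2], s=5)
    ax.view_init(elev=30, azim=45)
    plt.show()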
We also create a textured triangle mesh from the points using Delaunay triangulation;
see below. Note the red lines pointing towards the house: they represent
the position and viewing direction of the camera through the sequence of image frames,
obtained from the motion matrix.
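A sketch of the meshing step, assuming points2d holds the feature locations in the first frame (we triangulate in the 2D image plane and lift the triangles onto the recovered 3D points):

    from scipy.spatial import Delaunay
    import matplotlib.pyplot as plt

    tri = Delaunay(points2d)  # (P, 2) first-frame (x, y) locations
    ax = plt.figure().add_subplot(projection="3d")
    ax.plot_trisurf(S[0], S[1], S[2], triangles=tri.simplices)
    plt.show()

For the camera-direction lines, one common choice is the per-frame cross product k_i = a_i x b_i of the two motion-matrix rows, which points along the optical axis under orthography.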
Path of the camera (in order: x, y, z):
Final Results: