Ben Freudberg

CSCI1430

Project 5

 

Introduction

            The goal of this project is to recover the 3D geometry of an object from its motion in a video. The general steps are to select a set of interest points, track the 2D locations of those points through the video, and use their motion to determine the structure of the object.

Selecting Interest Points

            I run a Harris detector on the image and sort the resulting corners by strength. I then select the top 500 points to use as my keypoints for the Kanade-Lucas-Tomasi (KLT) tracker. The following is the first frame of the video overlaid with the selected interest points:

Figure 1: Image Keypoints
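The corner-selection step can be sketched as follows. This is a minimal NumPy-only illustration, not the implementation used in the project: the window size, k = 0.04, and box-filter smoothing are assumptions, and no non-maximum suppression is applied.

```python
import numpy as np

def box_sum(a, w):
    """Sum of a over each w-by-w window (same-size output, zero-padded borders)."""
    p = w // 2
    ap = np.pad(a, p, mode='constant')
    ii = np.cumsum(np.cumsum(ap, axis=0), axis=1)
    ii = np.pad(ii, ((1, 0), (1, 0)), mode='constant')
    h, wd = a.shape
    return ii[w:w+h, w:w+wd] - ii[:h, w:w+wd] - ii[w:w+h, :wd] + ii[:h, :wd]

def harris_corners(img, n_points=500, k=0.04, window=5):
    """Return the n_points strongest Harris corners as (row, col) pairs."""
    Iy, Ix = np.gradient(img.astype(float))          # image gradients
    # Structure-tensor entries, accumulated over a local window
    Sxx = box_sum(Ix * Ix, window)
    Syy = box_sum(Iy * Iy, window)
    Sxy = box_sum(Ix * Iy, window)
    # Harris response R = det(M) - k * trace(M)^2
    R = Sxx * Syy - Sxy**2 - k * (Sxx + Syy)**2
    # Sort by strength (descending) and keep the strongest n_points
    idx = np.argsort(R, axis=None)[::-1][:n_points]
    rows, cols = np.unravel_index(idx, R.shape)
    return np.stack([rows, cols], axis=1)
```

On a real frame one would typically also suppress near-duplicate detections so the 500 keypoints are spatially spread out.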

Optical Flow

            I then implement a KLT tracker to record the motion of my keypoints. The tracker uses spatial and temporal image gradients to compute the displacement of each feature point, under the assumptions that brightness is constant between frames and that displacements are small. The paths of 20 randomly chosen keypoints are shown overlaid on the first frame of the video:

Figure 2: Keypoint Paths
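The per-point update behind the tracker can be sketched as a single Lucas-Kanade step: the brightness-constancy assumption gives one linear constraint per pixel, and summing over a small window yields a 2x2 system for the displacement. This NumPy sketch assumes one point, one iteration, and no pyramid; the window size is an assumption, and the point must lie far enough from the image border for the window to fit.

```python
import numpy as np

def lk_displacement(im1, im2, pt, win=9):
    """Estimate the (dy, dx) displacement of the window around pt (row, col)
    between frames im1 and im2 with one Lucas-Kanade step.
    Assumes pt is at least win//2 pixels from every image border."""
    r, c = int(pt[0]), int(pt[1])
    h = win // 2
    patch1 = im1[r-h:r+h+1, c-h:c+h+1].astype(float)
    patch2 = im2[r-h:r+h+1, c-h:c+h+1].astype(float)
    Iy, Ix = np.gradient(patch1)   # spatial gradients
    It = patch2 - patch1           # temporal gradient
    # Least-squares solution of  Ix*dx + Iy*dy + It = 0  over the window:
    # [sum Ix^2   sum IxIy] [dx]     [sum IxIt]
    # [sum IxIy   sum Iy^2] [dy] = - [sum IyIt]
    A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
    dx, dy = np.linalg.solve(A, b)
    return dy, dx
```

In practice the step is iterated (and often run coarse-to-fine) so that the small-displacement assumption holds at each level.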

            When we attempt to compute the structure of the object, we will need all of our feature points to have remained within the image frame for the duration of the video. Therefore, we structure our keypoint tracker to segregate points based on whether or not they stay onscreen. Any points that leave the frame are tagged so that they will not be used by the structure algorithm later. The points that drift off-screen are shown below:

Figure 3: Keypoints that don't stay in-frame
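The tagging step amounts to a bounds check applied to every tracked position. A minimal sketch, assuming the tracks are stored as a (frames, points, 2) array of (row, col) coordinates:

```python
import numpy as np

def in_frame_mask(tracks, height, width):
    """tracks: (n_frames, n_points, 2) array of (row, col) positions.
    Returns a boolean mask marking points that stay in-frame in every frame."""
    rows, cols = tracks[..., 0], tracks[..., 1]
    inside = (rows >= 0) & (rows < height) & (cols >= 0) & (cols < width)
    # A point is kept only if it is inside the frame at every time step
    return inside.all(axis=0)
```

The structure-from-motion stage then simply indexes the track array with this mask.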

Structure from Motion

            The final step in the project is to compute 3D coordinates of the feature points based on their paths of motion within the image frame. We stack these paths into a measurement matrix, D, and use the singular value decomposition to factor it into one matrix that represents the shape of the object and one that represents the motion of the camera. The issue is that this factorization is only an affine representation of the object, determined up to an ambiguity. To get a true metric 3D representation, we must make several assumptions that introduce constraints and reduce the number of degrees of freedom: we assume the image scale is constant and that the camera's image axes are orthogonal in every frame. With these assumptions, we may compute the 3D geometry. The final results are as follows:

Figure 4: View 1

Figure 5: View 2

Figure 6: View 3
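The factorization step above can be sketched as follows. This is a minimal Tomasi-Kanade-style rank-3 factorization in NumPy: it centers the measurement matrix, takes the SVD, and keeps the top three singular values. The metric upgrade that enforces the orthogonality constraints is deliberately omitted here, so the returned motion and shape are only defined up to an affine ambiguity.

```python
import numpy as np

def affine_factorize(D):
    """Rank-3 factorization of a 2F-by-P measurement matrix D (rows are the
    tracked u- then v-coordinates per frame). Returns motion M (2F x 3) and
    shape S (3 x P) with D_centered ~= M @ S, up to an affine ambiguity."""
    # Center each row: subtract the mean track position per coordinate
    Dc = D - D.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(Dc, full_matrices=False)
    # Keep only the three largest singular values, splitting them evenly
    root = np.sqrt(s[:3])
    M = U[:, :3] * root          # camera motion
    S = root[:, None] * Vt[:3]   # object shape
    return M, S
```

A full implementation would follow this with the metric constraints (orthogonal, equal-length image axes per frame) to resolve the affine ambiguity and produce the views shown in Figures 4 through 6.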