Humans, unless actively being tricked, can easily perceive basic depth even when given only a 2D image. Computers, however, find this task difficult. One way to address the problem is to give the computer multiple 2D views of the 3D world. In theory this grants it stereo vision, the same mechanism that helps humans perceive depth, and lets it piece together the structure of the 3D scene. Repeating this process over many views is equivalent to the computer taking a video, or motion, and deriving the structure from it.
The first step is to find points that can be tracked reliably. The algorithm uses a Harris Corner Detector to find distinctive points, then uses optical flow to track those points between images. Once it knows where the same points are in all images, it calculates the structure of the object in the scene.
One way of selecting a distinctive point is to find points where the average intensity of the surrounding neighborhood changes dramatically under any relatively small shift. This definition tends to pick out things that look much like corners. The main drawback of this algorithm (losing a corner under a large change in scale) does not apply in this case, as there are no major, rapid changes in scale in the image sequence. My implementation keeps the 500 points that are most "cornery" (have the highest change in intensity under a shift) according to the Harris Corner Detector.
*[Figure: Corner Points Detected in Frame 1]*
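As a rough sketch of the corner-detection step (not the original implementation), a Harris detector can be written with NumPy alone. The function name, the Harris constant `k`, the window size, and the `box_sum` helper are conventional choices of mine, not details taken from the page above:

```python
import numpy as np

def harris_corners(img, k=0.04, window=5, n_points=500):
    """Return (row, col) coordinates of the n_points strongest
    Harris corner responses in a grayscale image (2-D array)."""
    # Image gradients (central differences).
    Iy, Ix = np.gradient(img.astype(float))
    # Products of gradients that enter the structure tensor.
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy

    def box_sum(a, w):
        # Sum each pixel's w x w neighborhood via an integral image.
        pad = w // 2
        a = np.pad(a, pad, mode="edge")
        c = np.cumsum(np.cumsum(a, axis=0), axis=1)
        c = np.pad(c, ((1, 0), (1, 0)))  # zero row/col for the formula
        return c[w:, w:] - c[:-w, w:] - c[w:, :-w] + c[:-w, :-w]

    Sxx, Syy, Sxy = (box_sum(a, window) for a in (Ixx, Iyy, Ixy))
    # Harris response: det(M) - k * trace(M)^2.
    R = (Sxx * Syy - Sxy**2) - k * (Sxx + Syy) ** 2
    # Indices of the strongest responses, best first.
    idx = np.argsort(R.ravel())[::-1][:n_points]
    return np.column_stack(np.unravel_index(idx, R.shape))
```

On a synthetic image containing a bright square, the strongest responses land near the square's four corners, which is the behavior the detector relies on.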
Optical flow is based on the assumption that, given a small motion, the intensity of a point in one image is unchanged at the corresponding point in the next image. Using this fact it is possible to determine the direction of motion of all points between the two images. This implementation observes the intensity in 15x15 windows centered at each point, together with the gradients between the images, to calculate a motion vector for each point. It then takes the points found with the Harris Corner Detector and uses the motion vectors at those points to find the same points in the next image; because the intensity characteristics of corner points are more distinctive, optical flow can track them fairly accurately. Applied iteratively to each pair of images, the algorithm generates a list of the locations of the corner points in each successive image. Some points leave the image space, but since these lie on the edge of the image, where intensity calculations can be unreliable, they are poor points to track anyway.
The points tracked below start at the green circles and travel to the red circles along the drawn paths.
*[Figure: Path of 30 Random Corner Points]*
*[Figure: Points That Leave the Image Space]*
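The per-point tracking step above amounts to solving a small least-squares system (the Lucas-Kanade method). The sketch below is my own minimal version, assuming small motion and using the same 15x15 window mentioned above; the function name and point format are illustrative choices:

```python
import numpy as np

def lucas_kanade(im0, im1, points, win=15):
    """Estimate a (dy, dx) motion vector for each (row, col) point by
    solving the Lucas-Kanade least-squares system over a win x win
    patch: [Iy Ix] d = -It, where It is the temporal difference."""
    Iy, Ix = np.gradient(im0.astype(float))
    It = im1.astype(float) - im0.astype(float)
    half = win // 2
    flow = np.zeros((len(points), 2))
    for i, (r, c) in enumerate(points):
        r0, r1 = max(r - half, 0), r + half + 1
        c0, c1 = max(c - half, 0), c + half + 1
        # One equation per pixel in the window.
        A = np.column_stack([Iy[r0:r1, c0:c1].ravel(),
                             Ix[r0:r1, c0:c1].ravel()])
        b = -It[r0:r1, c0:c1].ravel()
        # lstsq also handles rank-deficient (textureless) patches.
        flow[i], *_ = np.linalg.lstsq(A, b, rcond=None)
    return flow  # flow[i] = (dy, dx) at points[i]
```

Shifting a smooth test pattern down by one pixel and tracking a point near the center recovers a motion vector close to (1, 0), which is the sanity check one would run before trusting the tracker on real frames.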
To construct structure from motion, follow the algorithm designed by Tomasi and Kanade. First, center the feature coordinates. Then build a measurement matrix of all points tracked across all images. Next, factorize this matrix into motion and shape matrices. Finally, eliminate the affine ambiguity of the motion matrix.
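The centering and factorization steps can be sketched with an SVD, as below. This is a simplified illustration, not the original code: the final metric upgrade that removes the affine ambiguity is omitted, so motion and shape are recovered only up to an affine transform, and the matrix layout is my own convention:

```python
import numpy as np

def factorize(W):
    """Tomasi-Kanade factorization sketch. W is the 2F x P measurement
    matrix holding the image coordinates of P points tracked over F
    frames. Returns (M, S): a 2F x 3 motion matrix and a 3 x P shape
    matrix, up to an affine ambiguity (metric upgrade omitted)."""
    # Step 1: center the feature coordinates in each frame.
    W = W - W.mean(axis=1, keepdims=True)
    # Step 2: rank-3 factorization via the SVD.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    root = np.sqrt(np.diag(s[:3]))  # split singular values evenly
    M = U[:, :3] @ root             # camera motion
    S = root @ Vt[:3, :]            # 3-D shape
    return M, S
```

Because centering cancels the per-frame translation, a measurement matrix generated by an affine camera has rank 3, and the product `M @ S` reproduces the centered measurements exactly.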
Below are two views showing the 3D structure of the house. The red lines are vectors indicating the direction of the camera for each image in the sequence.
*[Figure: 3D View 1]*
*[Figure: 3D View 2]*
This algorithm works well on image sequences with slow, smooth motion, since both the Harris Corner Detector and the optical flow computation rely on these assumptions to work properly.
Page made by Betsy Hilliard, 12/2011