Project 5: Tracking and Structure from Motion

Name: Chen Xu
login: chenx

Algorithm Design

The basic flow is as follows:

Use the Harris corner detector to select keypoints from the first frame.
Use the KLT tracker to estimate the motion of the keypoints and track them through the sequence.
Factorize the measurement matrix to recover shape and motion from the image stream.

Results of Basic Part

Keypoints selection

Figure 1 (left) shows the 500 points with the largest confidence overlaid on the first frame, and Figure 1 (right) shows the 100 points with the largest confidence overlaid on the same frame. When 500 points are selected, some undesired feature points that do not lie on the hotel are detected. When the number of feature points is reduced to 100, no undesired corner points are detected.

500 Points	100 Points

Figure 1: (Left) 500 points and (right) 100 points from the Harris corner detector.
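The keypoint selection described above can be sketched as follows. This is a minimal NumPy illustration, not the code used for the figures: the function names, the Gaussian window (`sigma`), the constant `k = 0.04`, and the 3x3 non-maximum suppression are all assumptions chosen for the example.

```python
import numpy as np

def smooth(img, sigma=1.5):
    """Separable Gaussian blur using only NumPy (zero padding at the borders)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    rows = np.apply_along_axis(np.convolve, 1, img, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="same")

def harris_keypoints(img, n_points=100, k=0.04, sigma=1.5):
    """Return (xs, ys, confidence) of the n_points strongest Harris corners."""
    img = img.astype(float)
    Iy, Ix = np.gradient(img)              # axis 0 is y (rows), axis 1 is x (columns)
    Sxx = smooth(Ix * Ix, sigma)           # entries of the windowed second-moment matrix
    Syy = smooth(Iy * Iy, sigma)
    Sxy = smooth(Ix * Iy, sigma)
    response = (Sxx * Syy - Sxy**2) - k * (Sxx + Syy)**2

    # Keep only strict local maxima of the response in a 3x3 neighbourhood.
    h, w = response.shape
    padded = np.pad(response, 1, mode="constant", constant_values=-np.inf)
    is_peak = np.ones((h, w), dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            is_peak &= response > padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]

    ys, xs = np.nonzero(is_peak)
    order = np.argsort(response[ys, xs])[::-1][:n_points]   # strongest first
    return xs[order], ys[order], response[ys[order], xs[order]]
```

Calling `harris_keypoints(frame, n_points=500)` or `n_points=100` on the first frame would correspond to the two settings compared in Figure 1.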

KLT Tracker

Trajectories of 20 feature points over whole image stream.

Figure 2 (left) shows the trajectories, all drawn over the last frame of the image stream. Each trajectory starts at the tail of its tadpole-like path, and the red circle marks its end.

Outside Points

The green circles in Figure 2 (right) mark the points that move out of the frame at some point in the image sequence when 500 corners are initially selected.

Figure 2: (Left) Trajectories of 20 points drawn over the last frame. (Right) Green circles indicate the points that move out of the frame.
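One way to find the points that leave the frame is to test every tracked position against the image bounds. The sketch below assumes the tracks are stored as an array of shape (F, P, 2) holding (x, y) per frame and point; that layout and the function name are illustrative, not taken from the original implementation.

```python
import numpy as np

def points_leaving_frame(tracks, width, height):
    """Return a boolean mask of length P: True if the point is ever outside the image.

    tracks: array of shape (F, P, 2) with (x, y) positions (assumed layout).
    """
    x, y = tracks[..., 0], tracks[..., 1]
    outside = (x < 0) | (x > width - 1) | (y < 0) | (y > height - 1)
    return outside.any(axis=0)   # collapse over frames
```

Points flagged by the mask can be dropped before building the measurement matrix, since their later positions are not reliable.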

Structure From Motion

Viewpoints of Recovered 3D Structure

Figure 3 shows three viewpoints of the 3D structure of the hotel recovered from 500 points. The implementation of the structure from motion algorithm has to distinguish the cases 2F > P and 2F < P, where F is the number of frames and P is the number of tracked points. Otherwise the recovered structure comes out flipped left to right (a mirror-image flip, referred to here as the bas-relief ambiguity). If 2F < P, for example F = 51 and P = 500, the measurement matrix should be transposed before the SVD, and R and S should be modified accordingly. Figure 4 shows the result for 2F > P (F = 51, P = 100), which also shows no left-right flip.

Figure 3: Three viewpoints of structure from motion, 2F < P (F = 51, P = 500), no left-right transposition.

Figure 4: Structure from motion, 2F > P (F = 51, P = 100), no left-right transposition.
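The factorization step can be sketched in NumPy as below. This is an illustration under stated assumptions rather than the implementation behind Figures 3 and 4: the measurement matrix is assumed to stack the x rows of all F frames above the y rows, the metric constraints are solved by linear least squares followed by a Cholesky factorization, and the names `factorize` and `_sym_coeffs` are made up for the example. With `np.linalg.svd(..., full_matrices=False)` the same code path handles both 2F > P and 2F < P, so no explicit transposition appears here. The result is still only defined up to a rotation or reflection of the whole scene, which is one reason a mirrored structure can appear.

```python
import numpy as np

def _sym_coeffs(a, b):
    """Coefficients of a^T L b for symmetric L, in the order l00, l01, l02, l11, l12, l22."""
    return np.array([a[0] * b[0],
                     a[0] * b[1] + a[1] * b[0],
                     a[0] * b[2] + a[2] * b[0],
                     a[1] * b[1],
                     a[1] * b[2] + a[2] * b[1],
                     a[2] * b[2]])

def factorize(W):
    """Factor a (2F, P) measurement matrix into motion R (2F, 3) and shape S (3, P).

    Assumed row layout: rows 0..F-1 hold x coordinates, rows F..2F-1 hold y coordinates.
    """
    F = W.shape[0] // 2
    centroid = W.mean(axis=1, keepdims=True)        # per-row translation
    U, s, Vt = np.linalg.svd(W - centroid, full_matrices=False)
    root = np.sqrt(s[:3])                           # keep the rank-3 part
    M_hat = U[:, :3] * root                         # affine motion
    S_hat = root[:, None] * Vt[:3]                  # affine shape

    # Metric constraints: |i_f| = |j_f| = 1 and i_f . j_f = 0, linear in L = Q Q^T.
    A, b = [], []
    for f in range(F):
        i_f, j_f = M_hat[f], M_hat[F + f]
        A += [_sym_coeffs(i_f, i_f), _sym_coeffs(j_f, j_f), _sym_coeffs(i_f, j_f)]
        b += [1.0, 1.0, 0.0]
    l = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)[0]
    L = np.array([[l[0], l[1], l[2]],
                  [l[1], l[3], l[4]],
                  [l[2], l[4], l[5]]])
    Q = np.linalg.cholesky(L)                       # fails if L is not positive definite
    R = M_hat @ Q
    S = np.linalg.solve(Q, S_hat)
    return R, S, centroid
```

On noisy tracks `L` may fail to be positive definite, in which case the Cholesky step raises an error and a different way of extracting `Q` would be needed.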

Predicted 3D path of camera

The red, green, and blue curves in Figure 5 show the three components of Kf, respectively.

Figure 5: Camera path in three dimensions.
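A small sketch of how a per-frame camera direction could be computed from the motion matrix follows. It assumes that Kf denotes the unit vector k_f = i_f × j_f, where i_f and j_f are the two image-axis rows of R for frame f, and that R stores all i_f rows first and all j_f rows after them. Both assumptions, and the function name, are illustrative and not confirmed by the text above.

```python
import numpy as np

def camera_directions(R):
    """Return an (F, 3) array of unit vectors k_f = i_f x j_f.

    R: (2F, 3) motion matrix, rows 0..F-1 are i_f and rows F..2F-1 are j_f (assumed layout).
    """
    F = R.shape[0] // 2
    k = np.cross(R[:F], R[F:])
    return k / np.linalg.norm(k, axis=1, keepdims=True)
```

Plotting the three columns of the returned array against the frame index gives one curve per component, in the spirit of Figure 5.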

Extra Points

I implemented iterative refinement of the KLT tracker for image sequences with large motion. To create such a sequence, I kept every third image of the hotel image stream, so the average motion between frames is three times larger than in the original stream. I then compared the tracking performance of the basic KLT tracker and the iterative KLT tracker on 50 feature points. Figure 6 (left) shows the result of the non-iterative tracker, and Figure 6 (right) shows the result of the iterative tracker. Each trajectory starts at the tail of its tadpole-like path and ends at the circle. Without iterative refinement, the points are clearly not tracked correctly: the trajectories do not end at corner points, especially for the points on the right roof of the hotel. With iterative refinement (right), the corner points are tracked correctly.

Figure 6: (Left) KLT tracker without iterative refinement and (right) with iterative refinement, on a sequence with three times larger motion than the original.
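The difference between the two trackers can be illustrated with a single-point sketch. The code below is an assumption-laden example, not the implementation evaluated in Figure 6: the window size, iteration limit, stopping tolerance, bilinear sampling, and the names `bilinear` and `klt_track` are all chosen for illustration. Setting `max_iters=1` gives the basic one-step tracker. Larger values re-sample the second image at the current estimate and solve again until the update is small.

```python
import numpy as np

def bilinear(img, xs, ys):
    """Sample img at float coordinates (xs, ys) with bilinear interpolation."""
    h, w = img.shape
    xs = np.clip(xs, 0, w - 1.001)
    ys = np.clip(ys, 0, h - 1.001)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    ax, ay = xs - x0, ys - y0
    return ((1 - ax) * (1 - ay) * img[y0, x0] + ax * (1 - ay) * img[y0, x0 + 1]
            + (1 - ax) * ay * img[y0 + 1, x0] + ax * ay * img[y0 + 1, x0 + 1])

def klt_track(img0, img1, x, y, half_win=7, max_iters=20, tol=0.01):
    """Track the point (x, y) from img0 to img1 and return its new position."""
    img0 = img0.astype(float)
    img1 = img1.astype(float)
    Iy, Ix = np.gradient(img0)                     # axis 0 is y, axis 1 is x
    oy, ox = np.mgrid[-half_win:half_win + 1, -half_win:half_win + 1]
    wx, wy = x + ox, y + oy                        # window coordinates in img0
    A = np.stack([bilinear(Ix, wx, wy).ravel(),
                  bilinear(Iy, wx, wy).ravel()], axis=1)
    G = A.T @ A                                    # 2x2 second-moment matrix
    patch0 = bilinear(img0, wx, wy).ravel()

    u = v = 0.0
    for _ in range(max_iters):
        # Temporal difference at the current displacement estimate.
        It = bilinear(img1, wx + u, wy + v).ravel() - patch0
        du, dv = np.linalg.solve(G, -A.T @ It)
        u += du
        v += dv
        if np.hypot(du, dv) < tol:                 # update is small enough: stop
            break
    return x + u, y + v
```

A single step relies on a first-order approximation that degrades as the displacement grows, which is consistent with the drift seen in Figure 6 (left). Repeating the step around the updated position keeps each remaining correction small.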