First, I used a Harris corner detector to find interest points in the initial image. Next, I implemented a Kanade-Lucas-Tomasi (KLT) tracker to follow those keypoints across frames. The KLT tracker computes the optical flow between successive video frames and moves the selected keypoints from the current frame along the flow field to estimate their locations in the next frame. Finally, I used the tracked points to recover the 3D structure of the object in the scene.
The Harris corner detector allowed me to find distinctive interest points that are well suited for tracking across subsequent frames. Here I show the 200 strongest interest points plotted over the initial frame:
Interest points over initial frame |
---|
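A Harris detector along these lines can be sketched in Python (my implementation was in MATLAB; `harris_points`, the smoothing width, and `k = 0.04` are illustrative choices here, and this sketch omits the non-maximum suppression a full detector would add):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_points(img, n_points=200, sigma=1.0, k=0.04):
    """Return (rows, cols) of the n_points strongest Harris responses."""
    # Image gradients (axis 0 is y, axis 1 is x)
    Iy, Ix = np.gradient(img.astype(float))
    # Structure-tensor entries, smoothed over a Gaussian window
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    # Harris corner response: det(M) - k * trace(M)^2
    R = (Sxx * Syy - Sxy ** 2) - k * (Sxx + Syy) ** 2
    # Keep the n_points strongest responses
    idx = np.argsort(R.ravel())[::-1][:n_points]
    rows, cols = np.unravel_index(idx, R.shape)
    return rows, cols
```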
In the optical_flow.m file I calculate the flow vector, in both the x and y directions, for every pixel. I store the per-pixel values in the matrices u and v. The calculation uses the following formula:
Vector Flow Equation |
---|
The formula above sums over a window around each pixel so that the two unknowns, u and v, are not underconstrained. Once these window sums are computed, we solve the resulting 2x2 linear system (the last equation above) at every pixel to obtain u and v. With the per-pixel optical flow in hand, I calculate the new location of each interest point in the next frame by looking up its displacement in the u and v matrices and adding du and dv to its position in the current frame. Note that these displacements are often smaller than a whole pixel, so tracked points frequently fall between pixels. To estimate the displacement at such subpixel locations, I use interp2, which bilinearly interpolates from the 4 pixels surrounding the point being tracked.
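The per-pixel flow computation and the subpixel point update can be sketched in Python (my implementation was in MATLAB; here scipy's `map_coordinates` stands in for MATLAB's `interp2`, and the box window size and function names are illustrative):

```python
import numpy as np
from scipy.ndimage import uniform_filter, map_coordinates

def lucas_kanade(I1, I2, win=15):
    """Dense Lucas-Kanade flow: solve a 2x2 linear system per pixel."""
    I1 = I1.astype(float); I2 = I2.astype(float)
    Iy, Ix = np.gradient(I1)     # spatial gradients
    It = I2 - I1                 # temporal gradient
    # Window sums of the gradient products (box window of size `win`)
    Sxx = uniform_filter(Ix * Ix, win)
    Syy = uniform_filter(Iy * Iy, win)
    Sxy = uniform_filter(Ix * Iy, win)
    Sxt = uniform_filter(Ix * It, win)
    Syt = uniform_filter(Iy * It, win)
    # Solve [Sxx Sxy; Sxy Syy][u; v] = -[Sxt; Syt] per pixel (Cramer's rule)
    det = Sxx * Syy - Sxy ** 2
    det[det == 0] = 1e-9         # guard against flat, textureless regions
    u = (-Syy * Sxt + Sxy * Syt) / det
    v = ( Sxy * Sxt - Sxx * Syt) / det
    return u, v

def track_points(xs, ys, u, v):
    """Move subpixel keypoints along the flow field. map_coordinates with
    order=1 bilinearly interpolates u and v at fractional positions,
    playing the role of interp2 in the MATLAB code."""
    du = map_coordinates(u, [ys, xs], order=1)
    dv = map_coordinates(v, [ys, xs], order=1)
    return xs + du, ys + dv
```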
Most of the points remained within the image and were tracked across all frames. The following link shows the displacement of the initial keypoints across the frames; it is a selection of 24 frames out of the total 51:
Below I show the initial and final frames, for comparison:
Initial frame with keypoints | Final frame with keypoints |
---|---|
For 20 random keypoints, I traced their 2D paths over the sequence of frames using line segments. Click the link below to visualize them:
As the tracked points move across the sequence, some of them drift off the image. In this case, only 2 points fell off the image. I show these on top of the first frame in the image below:
Points that moved off the image |
---|
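The off-image check can be sketched as follows (`split_offscreen` is a hypothetical helper name; the test simply asks whether a tracked coordinate is still inside the frame bounds):

```python
import numpy as np

def split_offscreen(xs, ys, h, w):
    """Partition tracked points into those still inside the h x w frame
    and those that have drifted outside it."""
    inside = (xs >= 0) & (xs <= w - 1) & (ys >= 0) & (ys <= h - 1)
    return inside, ~inside
```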
I used the keypoint tracks as input to the affine structure from motion procedure described in "Shape and Motion from Image Streams under Orthography: a Factorization Method" (Tomasi and Kanade, 1992). This recovers the camera positions as well as the 3D locations of all tracked points. To eliminate the affine ambiguity, I follow the steps described in "A Sequential Factorization Method for Recovering Shape and Motion from Image Streams" (Morita and Kanade).
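The factorization can be sketched in Python under the papers' assumptions (`affine_sfm` is a hypothetical name; the metric upgrade solves for L = QQᵀ by least squares and assumes L comes out positive definite, which holds for clean tracks):

```python
import numpy as np

def affine_sfm(W):
    """Affine SfM by factorization (Tomasi & Kanade 1992). W is the
    2F x P measurement matrix: row f holds the x-coordinates of all P
    tracks in frame f, row F+f the y-coordinates."""
    F = W.shape[0] // 2
    # 1. Center each row about the point centroid (removes translation).
    W = W - W.mean(axis=1, keepdims=True)
    # 2. Rank-3 factorization via SVD: W ~ M @ S.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    M = U[:, :3] * np.sqrt(s[:3])          # 2F x 3 motion (camera) matrix
    S = np.sqrt(s[:3])[:, None] * Vt[:3]   # 3 x P structure matrix
    # 3. Metric constraints: find symmetric L = Q Q^T such that, per frame,
    #    i_f L i_f^T = 1, j_f L j_f^T = 1, i_f L j_f^T = 0.
    def row(a, b):
        # Coefficients of the 6 unique entries of symmetric L in a L b^T
        return np.array([a[0]*b[0], a[1]*b[1], a[2]*b[2],
                         a[0]*b[1] + a[1]*b[0],
                         a[0]*b[2] + a[2]*b[0],
                         a[1]*b[2] + a[2]*b[1]])
    A, rhs = [], []
    for f in range(F):
        i, j = M[f], M[F + f]
        A += [row(i, i), row(j, j), row(i, j)]
        rhs += [1.0, 1.0, 0.0]
    l = np.linalg.lstsq(np.array(A), np.array(rhs), rcond=None)[0]
    L = np.array([[l[0], l[3], l[4]],
                  [l[3], l[1], l[5]],
                  [l[4], l[5], l[2]]])
    Q = np.linalg.cholesky(L)  # assumes L is positive definite
    # Apply the upgrade; the product M @ S is unchanged.
    return M @ Q, np.linalg.inv(Q) @ S
```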
View 1 | View 2 | |
---|---|---|
View 3 | View 4 | |
View 5 | View 6 | |
View 7 |