The aim of this assignment is to reconstruct the 3D shape of an object from a given video sequence. The implementation follows two papers: "Shape and Motion from Image Streams under Orthography: a Factorization Method" (Tomasi and Kanade) and "A Sequential Factorization Method for Recovering Shape and Motion from Image Streams" (Morita and Kanade).
The implementation has three major components: keypoint selection, feature tracking, and structure from motion. Keypoints are selected with the Harris corner detector and tracked with the Kanade-Lucas-Tomasi (KLT) tracker. Keypoints that come too close to the border of a frame are removed from the keypoint list. The algorithm is tested on the hotel test sequence. Figure 1 shows the keypoints selected from the first frame by the Harris corner detector. Figure 2 shows the keypoints discarded during KLT tracking. Figure 3 shows the tracks of randomly chosen keypoints over the sequence: the left image marks where the tracks start on the first frame, and the right image marks where they end on the last frame.
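The keypoint selection and border removal steps can be illustrated with a minimal NumPy sketch. It is not the code used for the figures: the function names (box_filter, harris_response, select_keypoints, inside_border), the box window, and the parameters k, radius, rel_threshold and margin are all assumptions made for illustration.

```python
import numpy as np

def box_filter(img, radius):
    """Sum each pixel's (2*radius+1)^2 neighbourhood, replicating edge pixels."""
    size = 2 * radius + 1
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros((h, w), dtype=float)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + h, dx:dx + w]
    return out

def harris_response(img, k=0.05, radius=2):
    """Harris corner response R = det(A) - k * trace(A)^2 at every pixel.

    A is the structure tensor accumulated over a box window; k and
    radius are illustrative values, not taken from the report.
    """
    img = img.astype(float)
    iy, ix = np.gradient(img)           # derivatives along rows, columns
    sxx = box_filter(ix * ix, radius)
    syy = box_filter(iy * iy, radius)
    sxy = box_filter(ix * iy, radius)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2

def select_keypoints(response, rel_threshold=0.01):
    """Return (x, y) positions of 3x3 local maxima above a relative threshold."""
    h, w = response.shape
    padded = np.pad(response, 1, mode="constant", constant_values=-np.inf)
    neighbours = np.stack([padded[dy:dy + h, dx:dx + w]
                           for dy in range(3) for dx in range(3)])
    is_peak = response >= neighbours.max(axis=0)
    is_strong = response > rel_threshold * response.max()
    rows, cols = np.nonzero(is_peak & is_strong)
    return np.column_stack([cols, rows]).astype(float)

def inside_border(points, shape, margin):
    """Boolean mask of (x, y) points at least `margin` pixels from every edge."""
    h, w = shape
    x, y = points[:, 0], points[:, 1]
    return (x >= margin) & (x < w - margin) & (y >= margin) & (y < h - margin)
```

During tracking, a mask such as inside_border(points, frame.shape, margin) could be evaluated after every frame, keeping only the tracks for which it is true. The margin would typically be at least half the tracking window, although the value used in this assignment is not stated.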
Once the keypoint tracks are stored, they serve as input to affine structure from motion, which decomposes the track (measurement) matrix with SVD into a motion matrix and a shape matrix. This decomposition is unique only up to an affine transformation. The affine ambiguity is eliminated by finding a non-singular 3x3 matrix Q, which is then used to transform the motion and shape matrices to their metric solutions. The metric transformation described by Morita and Kanade is used for this step. Figure 4 shows the resulting 3D mesh obtained from the shape matrix. Figure 5 shows the estimated 3D path of the camera over the frames of the sequence.
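The factorization and the removal of the affine ambiguity can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the assignment's code: it assumes the track matrix W has shape (2F, P) with the x coordinates of all F frames stacked above the y coordinates, that every point is tracked through every frame, and that the function names are hypothetical. The metric constraints are the usual orthonormality conditions on each frame's two camera axes, solved by linear least squares for the symmetric matrix L = Q Q^T.

```python
import numpy as np

def affine_factorization(W):
    """Rank-3 factorization of a centred (2F, P) track matrix via SVD.

    Returns the affine motion factor (2F, 3) and shape factor (3, P).
    """
    centred = W - W.mean(axis=1, keepdims=True)   # remove per-row translation
    U, s, Vt = np.linalg.svd(centred, full_matrices=False)
    root = np.sqrt(s[:3])
    M_hat = U[:, :3] * root             # motion, up to an affine transform
    S_hat = root[:, None] * Vt[:3]      # shape, up to the inverse transform
    return M_hat, S_hat

def _constraint_row(a, b):
    """Coefficients of the 6 unique entries of symmetric L in a^T L b."""
    return np.array([a[0] * b[0],
                     a[0] * b[1] + a[1] * b[0],
                     a[0] * b[2] + a[2] * b[0],
                     a[1] * b[1],
                     a[1] * b[2] + a[2] * b[1],
                     a[2] * b[2]])

def metric_upgrade(M_hat, S_hat):
    """Find Q so that each frame's two rows of M_hat @ Q are orthonormal."""
    F = M_hat.shape[0] // 2
    I, J = M_hat[:F], M_hat[F:]         # x-axis rows, y-axis rows
    A = np.array([_constraint_row(i, i) for i in I] +
                 [_constraint_row(j, j) for j in J] +
                 [_constraint_row(i, j) for i, j in zip(I, J)])
    b = np.concatenate([np.ones(2 * F), np.zeros(F)])
    l = np.linalg.lstsq(A, b, rcond=None)[0]
    L = np.array([[l[0], l[1], l[2]],
                  [l[1], l[3], l[4]],
                  [l[2], l[4], l[5]]])
    # Cholesky gives a lower-triangular Q with Q @ Q.T == L. It raises
    # LinAlgError if noise makes L non-positive-definite; this sketch
    # does not handle that case.
    Q = np.linalg.cholesky(L)
    M = M_hat @ Q
    S = np.linalg.solve(Q, S_hat)
    return M, S, Q
```

Q is determined only up to a rotation or reflection, so the recovered shape is expressed in an arbitrary world frame. With the metric motion matrix M, one way to obtain a per-frame viewing direction for a camera path is the cross product of each frame's two axis rows, np.cross(M[:F], M[F:]), with F = M.shape[0] // 2.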