In this project, I implemented both optical flow and structure from motion. The overarching idea is: given a series of images (i.e. a video) of a 3D scene, try to reconstruct the 3D structure of the scene. In the optical flow part, we determine the motion between two successive frames of the video. We start by using a Harris corner detector to find interest points in the first image. Optical flow then gives us a way to determine how much each pixel moves from one frame to the next. We can then use this flow information to track our original interest points across all of the frames and determine where they are in each. Next, we wish to recover both the position of the camera in each frame and the real 3D position of all of these points. To do this, essentially all we have to do is take an SVD of the matrix of our tracked interest point coordinates and keep a rank-3 approximation. The problem is that there are infinitely many combinations of cameras and world coordinates that produce an equivalent reconstruction, each differing from the others by an arbitrary invertible 3x3 transformation. We remove this affine ambiguity by post-multiplying the cameras and pre-multiplying the points by a new matrix (and its inverse, respectively) that enforces the constraint that each camera's x and y axes should be perpendicular to one another and of equal length. From this, we can easily recover the camera's location and the real location of all of our tracked points.
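For reference, here is a minimal sketch of the Harris detector step (Python with numpy/scipy). The parameter values are illustrative choices, not the exact settings used in the project, and it skips the non-maximum suppression a full implementation would add:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def harris_corners(img, k=0.05, sigma=1.0, num_points=500):
    """Return (row, col) coordinates of the strongest Harris corners.

    A minimal sketch: img is assumed to be a 2D grayscale float array.
    No non-maximum suppression is performed, so strong corners may
    appear as small clusters.
    """
    # Image gradients.
    Ix = sobel(img, axis=1)
    Iy = sobel(img, axis=0)

    # Smoothed products of gradients: the entries of the structure tensor.
    Ixx = gaussian_filter(Ix * Ix, sigma)
    Iyy = gaussian_filter(Iy * Iy, sigma)
    Ixy = gaussian_filter(Ix * Iy, sigma)

    # Harris response: det(M) - k * trace(M)^2 at every pixel.
    R = (Ixx * Iyy - Ixy ** 2) - k * (Ixx + Iyy) ** 2

    # Keep the num_points strongest responses.
    idx = np.argsort(R, axis=None)[::-1][:num_points]
    return np.column_stack(np.unravel_index(idx, R.shape))
```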

Below is an animation of the 500 most significant interest points for the house video, tracked over 51 frames.

As can be seen, most of the points are tracked extremely well. The front of the house is almost perfectly tracked, and the side has only a few points that drift slightly. The only real problem is the points on the ground, which almost completely fail to be tracked, and this is somewhat expected. The ground is highly self-similar and moves extremely fast between frames, which makes optical flow very bad at actually tracking its motion. Let's look at what the motion as a whole looks like.
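That failure mode is easier to see in code. Below is a minimal single-level Lucas-Kanade sketch (Python/numpy; the window size and the lack of an image pyramid are simplifications). The least-squares system comes from linearizing the brightness-constancy assumption, which only holds for small displacements, so large motions like the ground's break it:

```python
import numpy as np

def lucas_kanade_at(im1, im2, x, y, win=7):
    """Estimate the optical flow (u, v) at integer pixel (x, y).

    A minimal sketch: single level, no iterative refinement, and
    (x, y) is assumed to lie at least win//2 pixels from the border.
    """
    # Spatial gradients of the first frame and the temporal difference.
    Iy, Ix = np.gradient(im1.astype(float))
    It = im2.astype(float) - im1.astype(float)

    r = win // 2
    sl = (slice(y - r, y + r + 1), slice(x - r, x + r + 1))

    # Each pixel in the window contributes one equation Ix*u + Iy*v = -It.
    A = np.column_stack([Ix[sl].ravel(), Iy[sl].ravel()])
    b = -It[sl].ravel()

    # Least-squares solution of the over-determined system.
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v
```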

As can be seen from this image, the house is rotating clockwise, which matches what we would conclude from just watching the original animation of the house.
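A plot like the one above can be produced with a quiver plot; here is a minimal sketch, where `pts` and `flow` are hypothetical names standing in for the tracked first-frame positions and their total displacements:

```python
import matplotlib.pyplot as plt

def plot_motion(pts, flow):
    """Draw each tracked point's total displacement as an arrow.

    pts is an (N, 2) array of (x, y) positions in the first frame and
    flow an (N, 2) array of total displacements over the video.
    """
    plt.quiver(pts[:, 0], pts[:, 1], flow[:, 0], flow[:, 1],
               angles='xy', scale_units='xy', scale=1, width=0.002)
    plt.gca().invert_yaxis()  # image coordinates: y grows downward
    plt.title('Aggregate motion of tracked points')
    plt.show()
```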

While tracking the points, some of them moved outside the image and had to be dropped. Let's look at which ones these were.

All of the points that we lost were at the bottom of the house. This makes sense, because these points rotate out of view below the visible frame, making them untrackable.
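The bookkeeping involved is simple; here is a sketch (all names illustrative): once a point's tracking window would leave the image, its track has to be discarded for all subsequent frames.

```python
import numpy as np

def prune_tracks(pts, h, w, margin=0):
    """Return a boolean mask of tracks still inside an h x w image.

    pts is an (N, 2) array of (x, y) positions after one tracking step;
    points within margin pixels of the border (or beyond it) are dropped,
    since the flow window around them would fall outside the frame.
    """
    x, y = pts[:, 0], pts[:, 1]
    return (x >= margin) & (x < w - margin) & (y >= margin) & (y < h - margin)
```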

Now let's take a look at the 3D scene we reconstruct using these tracked points.
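Under the hood, this reconstruction is the rank-3 factorization described at the top. Here is a minimal sketch (Python/numpy), assuming the surviving tracks have been stacked into a single 2F x N measurement matrix:

```python
import numpy as np

def factorize(W):
    """Affine structure from motion via rank-3 factorization.

    W is the 2F x N measurement matrix: rows 2i and 2i+1 hold the x and y
    coordinates of all N tracked points in frame i (only points tracked in
    every frame are included). Returns the 2F x 3 camera matrix M and the
    3 x N structure matrix S, up to an affine ambiguity.
    """
    # Center each row: subtracting the centroid of the points in every
    # frame removes the translation component of each camera.
    W = W - W.mean(axis=1, keepdims=True)

    # Rank-3 approximation via SVD, splitting the singular values evenly.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    M = U[:, :3] * np.sqrt(s[:3])         # cameras
    S = np.sqrt(s[:3])[:, None] * Vt[:3]  # world points
    return M, S
```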

As can be seen, we seem to have reconstructed the scene reasonably well, aside from one small problem: the entire scene is backwards. Let's take a look at the position of the camera to get a better idea of what is going on.

As expected, the y position of the camera is decreasing because it is moving down, and the z position is mostly unchanged, but the x position is backwards, moving from left to right as opposed to right to left, which is what it actually does. While this may seem problematic, it is understandable. If the points are mirrored and the camera is mirrored (in the x direction), we will recover the same images as if both of them were in their expected locations. We might think that the affine ambiguity transform would fix this, but it won't. All that step does is enforce that each camera's x and y axes are orthogonal to each other. Even if the x axis is flipped, it is still orthogonal to the y axis.
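To make that concrete, here is a sketch of the ambiguity-removal step itself (Python/numpy, continuing from the factorization sketch above). It enforces the full metric constraints, unit length and mutual perpendicularity of each camera's two rows, and the comments note exactly why a mirrored solution survives:

```python
import numpy as np

def metric_upgrade(M, S):
    """Resolve the affine ambiguity via the metric constraints.

    We solve for the symmetric matrix L = Q Q^T that makes each camera's
    two rows unit length and perpendicular, recover Q by Cholesky, and
    apply it to M and S. Note that for any rotation or reflection R,
    (QR)(QR)^T = L as well, which is why a mirrored reconstruction
    satisfies these constraints just as well as the true one.
    """
    F = M.shape[0] // 2

    def row(a, b):
        # Coefficients of a^T L b in the 6 unknowns of symmetric L.
        return [a[0]*b[0],
                a[0]*b[1] + a[1]*b[0],
                a[0]*b[2] + a[2]*b[0],
                a[1]*b[1],
                a[1]*b[2] + a[2]*b[1],
                a[2]*b[2]]

    A, rhs = [], []
    for i in range(F):
        m, n = M[2*i], M[2*i + 1]
        A += [row(m, m), row(n, n), row(m, n)]
        rhs += [1.0, 1.0, 0.0]

    l, *_ = np.linalg.lstsq(np.array(A), np.array(rhs), rcond=None)
    L = np.array([[l[0], l[1], l[2]],
                  [l[1], l[3], l[4]],
                  [l[2], l[4], l[5]]])

    Q = np.linalg.cholesky(L)  # assumes the recovered L is positive definite
    return M @ Q, np.linalg.inv(Q) @ S
```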