There are three things that we need to solve for, when computing structure from motion. The first is the points that we are reconstructing, the second is the camera's x direction, and the third is the camera's y direction.
First, we compute the matrix D, which is the centered x and y points that we tracked back in step one. Then we factorize D into three matrices, U, V, and W. These are then cropped so that they are each of rank three. Then, we create the affine and 3D matrices. This gives us matrices M_hat and S_hat, which resemble the points of the reconstruction and the camera directions of the reconstruction, but there is some ambiguity. There is some matrix A, which, when multiplied by M_hat, and when its inverse is multipled by S_hat gives us the initial points S and camera directions M. The remainder of the code is just the math that solves for A.