Steve Gomez (steveg)
April 17, 2010
In this project, we are finding image correspondences automatically and robustly recovering homographies that map the images into a common coordinate space. This allows us to glue together panoramas and other overlapping images without manual feature alignment.
The theoretical components of this project are described here, including the math behind projective mappings, adaptive non-maximal suppression of image features, feature matching and outlier rejection, and RANSAC for robustly recovering homographies. The results of my implementations are shown below.
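As a sketch of the adaptive non-maximal suppression step mentioned above: for each detected corner, compute the distance to the nearest corner that is sufficiently stronger, then keep the corners with the largest suppression radii. This is a minimal Python illustration of the idea (the function and parameter names are my own, not from the original code, which was likely MATLAB):

```python
import numpy as np

def anms(corners, strengths, n_keep=500, c_robust=0.9):
    """Adaptive non-maximal suppression: keep corners that are both
    strong and spatially well spread out. For each corner, find the
    distance to the nearest corner that is sufficiently stronger
    (robustified by c_robust), then keep the n_keep corners with the
    largest suppression radii."""
    corners = np.asarray(corners, dtype=float)
    strengths = np.asarray(strengths, dtype=float)
    radii = np.full(len(corners), np.inf)
    for i in range(len(corners)):
        # corner j suppresses corner i if strengths[i] < c_robust * strengths[j]
        stronger = strengths > strengths[i] / c_robust
        if stronger.any():
            d = np.linalg.norm(corners[stronger] - corners[i], axis=1)
            radii[i] = d.min()
    keep = np.argsort(-radii)[:n_keep]
    return corners[keep]
```

The strongest corner gets an infinite radius and is always kept; weaker corners survive only if they are far from anything stronger, which spreads features evenly over the image.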
We've put together the tools to automatically create arbitrarily-sized panoramas. Some of my successes and failures are shown below. All of these images are stitched automatically: features are detected (Harris corners), matched, filtered, and aligned with homographies. I composite multiple images using Poisson blending, which hides the seams in most mosaics (e.g. kitchen, office). Mosaics with more than two photos are built iteratively by warping each new image into a growing mosaic.
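The homography-recovery step in the pipeline above can be sketched roughly as follows: fit a homography to four random correspondences, count inliers, and refit on the best inlier set. This is a simplified Python illustration of the standard DLT + RANSAC approach, not the original implementation; the names and thresholds are my own:

```python
import numpy as np

def fit_homography(src, dst):
    """Direct linear transform: solve for H mapping src -> dst as a
    homogeneous least-squares problem via SVD. src, dst are (N, 2)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def ransac_homography(src, dst, n_iters=1000, tol=3.0, rng=None):
    """RANSAC: repeatedly fit H to 4 random correspondences, keep the
    model with the most inliers (reprojection error < tol pixels),
    then refit on all inliers."""
    rng = np.random.default_rng(rng)
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    best_inliers = np.zeros(len(src), bool)
    ones = np.ones((len(src), 1))
    src_h = np.hstack([src, ones])
    for _ in range(n_iters):
        idx = rng.choice(len(src), 4, replace=False)
        H = fit_homography(src[idx], dst[idx])
        proj = (H @ src_h.T).T
        proj = proj[:, :2] / proj[:, 2:3]
        inliers = np.linalg.norm(proj - dst, axis=1) < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # final least-squares fit on the consensus set
    return fit_homography(src[best_inliers], dst[best_inliers]), best_inliers
```

With enough iterations, the probability of never sampling four clean correspondences becomes negligible, which is what makes the recovery robust to mismatched features.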
Failures. Seams are still visible in the mountain photos, where there may be some color aberration at the photo borders. There are also "semantic" seams in both the kitchen and Brown panoramas, where even the best homography leaves some things unaligned (e.g. cabinet edges). ALL of my panoramas were taken while holding and moving the camera by hand, without a tripod or other rig, so steadier shots would in general give less distorted, better-aligned mosaics. I noticed this distortion really hindered my ability to create many-photo mosaics, because my incremental process recomputes feature locations and descriptors each time an image is warped. At some point, the warping distortion causes the descriptors to fail. This could be improved in the long run by using scale- and rotation-invariant feature descriptors.
Using Homographies in Video
One fun application is that we can project movies or still images into other images given some correspondences. My first attempt was to try for a "Harry Potter"-esque newspaper (with embedded video where photos are normally printed). A couple of YouTube clips below show my results, which were generally good.
An issue I quickly discovered was inconsistency in the control points I selected manually, however steady-handed I tried to be. I imagine an automatic tracker over the video frames (i.e. using descriptors to find the corners of the newspaper automatically) would also produce substantial shakiness. To solve this, I wrote a script to smooth my point locations over time (triangle filter, width of 5 frames, 'replicate' behavior at the borders), which significantly improved the 'realism' of the video (though some wobble is still obvious when the camera zooms/rotates quickly). Overall, I felt this came out well and gives a bit of an eerie effect (esp. with Poisson blending turned on!).
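The temporal smoothing described above (triangle filter, width 5, replicate borders) can be sketched like this. The original script was likely MATLAB; this is a hedged Python equivalent with names of my own choosing, assuming an odd filter width:

```python
import numpy as np

def smooth_points(points, width=5):
    """Smooth per-frame control-point locations over time with a
    triangle filter, replicating the first/last frames at the borders
    ('replicate' padding). points: (n_frames, ...) array; width is
    assumed odd."""
    points = np.asarray(points, float)
    half = width // 2
    # triangle weights, e.g. width 5 -> [1, 2, 3, 2, 1] / 9
    w = np.concatenate([np.arange(1, half + 2), np.arange(half, 0, -1)])
    w = w / w.sum()
    # replicate the border frames so the output keeps the same length
    padded = np.concatenate([np.repeat(points[:1], half, axis=0),
                             points,
                             np.repeat(points[-1:], half, axis=0)])
    out = np.zeros_like(points)
    for i in range(len(points)):
        out[i] = np.tensordot(w, padded[i:i + width], axes=1)
    return out
```

A nice property of the triangle filter is that points moving at constant velocity pass through unchanged in the interior, so smooth camera motion is preserved while frame-to-frame click jitter is averaged away.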
Automatic panorama detection
Another tool I put together finds panoramic pairs inside a directory of random photos and stitches them together. I imagine this could be useful if one were to dump all photos from a recent trip into a directory -- some of them captured as panorama segments -- without wanting to pick through the files.
My basic pipeline for this is to read in all images, build a descriptor for each image based on color regions, and find pairs with a low descriptor comparison error, using the provided dist2 code. My descriptors are tiny images -- photos resized to 32x32 with anti-aliasing. An assumption I make is that there exists one pair of panoramic photos (could be extended to multiple panoramas, or panoramas containing more than two images). This allows me to find the minimum error in the dist2 table and identify its image pair by the table index.
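A minimal sketch of this descriptor-comparison pipeline, in Python rather than the implied MATLAB: shrink each photo to a tiny image, compute the pairwise squared-distance table (my understanding of what the provided dist2 code produces), and take the off-diagonal minimum as the panoramic pair. Function names are my own, and the block-average resize is a simple stand-in for an anti-aliased resize:

```python
import numpy as np

def tiny_descriptor(img, size=32):
    """Shrink an image to size x size by block averaging and flatten.
    (A stand-in for the anti-aliased resize described above; assumes
    the image is at least size x size.)"""
    h, w = img.shape[:2]
    img = img[: h - h % size, : w - w % size]  # crop to a multiple of size
    bh, bw = img.shape[0] // size, img.shape[1] // size
    blocks = img.reshape(size, bh, size, bw, -1)
    return blocks.mean(axis=(1, 3)).ravel()

def dist2(A, B):
    """Pairwise squared Euclidean distances between rows of A and B."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)

def find_panorama_pair(descriptors):
    """Return the indices of the most similar distinct pair of
    descriptors -- assumed, as above, to be the one panoramic pair."""
    D = dist2(descriptors, descriptors)
    np.fill_diagonal(D, np.inf)  # ignore self-matches
    i, j = np.unravel_index(np.argmin(D), D.shape)
    return int(min(i, j)), int(max(i, j))
```

Because only the single minimum of the table is used, this inherits the one-pair assumption stated above; extending it to multiple panoramas would mean thresholding the table instead of taking its argmin.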
"Tiny images" used as descriptors in test.
From here, I group the pair into a cell array and hand it off to my mosaic code. The ordering is unimportant because my feature matching makes no assumptions about spatial overlap, and will uncover which image is left/right automatically. My test result is found below. Notice that it found the opposite 'ordering' from my above result, so the left-most image is warped into the right's space, as opposed to above. Both produce valid mosaics.