Computer Vision, Project 1: Hybrid Images
Bryce Richards
Project Description: Hybrid images are images that have two distinct interpretations: they look like one thing when viewed up close, and like another when viewed from a distance. Hybrid images are created by combining the high-frequency part of one picture with the low-frequency part of another. Since high-frequency signals dominate perception when they're available, we see the high-frequency image from up close. When we back up (or, equivalently, shrink the image), only the smoother, low-frequency part of the image is visible, so we see the second picture. The purpose of this project was to generate several of these hybrid images.
Algorithm Design: At a high level, the algorithm proceeds as follows. Two images are loaded, aligned (using two points that the user selects on each image), and cropped to the same size. Next, the Gaussian and Laplacian image pyramids of the two images are generated. (The Gaussian pyramid of an image is formed by successively applying a Gaussian filter and downsizing. The image in the Laplacian pyramid at level i is defined as follows: take the image in the Gaussian pyramid at level i, apply a Gaussian filter, and subtract the resulting filtered image from the original image at Gaussian level i.) After generating the pyramids, the first L levels of one image's Laplacian pyramid (this will be the dominating, high-frequency image) are added to the last N-L levels of the second image's Laplacian pyramid, where N is the number of pyramid levels, and then the last image from the second image's Gaussian pyramid is added. Because the levels have different sizes, they must all be resized to the same dimensions before being added together.
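The steps above can be sketched in a few dozen lines. This is only an illustration in plain Python, not the MATLAB code used for the project: it works on grayscale images stored as lists of rows, and every name in it (gaussian_kernel, blur, resize, pyramids, hybrid, and the levels, cutoff, and scale parameters) is hypothetical. It also assumes that the two images are already aligned and cropped to the same size, that the filter has a fixed radius of 2 and sigma of 1, and that the Gaussian pyramid holds one more image than the Laplacian pyramid, so that the last Gaussian image serves as the low-frequency residual.

```python
import math

def gaussian_kernel(radius, sigma):
    """1-D Gaussian weights, normalized so they sum to 1."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(img, radius=2, sigma=1.0):
    """Separable Gaussian blur; pixels past the border repeat the edge value."""
    kern = gaussian_kernel(radius, sigma)
    h, w = len(img), len(img[0])
    offsets = range(-radius, radius + 1)
    # Horizontal pass, then a vertical pass over its result.
    horiz = [[sum(kern[k + radius] * img[y][min(max(x + k, 0), w - 1)]
                  for k in offsets) for x in range(w)] for y in range(h)]
    return [[sum(kern[k + radius] * horiz[min(max(y + k, 0), h - 1)][x]
                 for k in offsets) for x in range(w)] for y in range(h)]

def resize(img, new_h, new_w):
    """Bilinear resize of a 2-D list to exactly new_h x new_w."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(new_h):
        # Map the output pixel center back into input coordinates.
        y = min(max((i + 0.5) * h / new_h - 0.5, 0.0), h - 1.0)
        y0 = int(y)
        y1, fy = min(y0 + 1, h - 1), y - y0
        row = []
        for j in range(new_w):
            x = min(max((j + 0.5) * w / new_w - 0.5, 0.0), w - 1.0)
            x0 = int(x)
            x1, fx = min(x0 + 1, w - 1), x - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out

def pyramids(img, levels, scale):
    """Return (gaussian, laplacian): levels + 1 Gaussian images, levels Laplacian images."""
    gaussian, laplacian = [img], []
    for _ in range(levels):
        current = gaussian[-1]
        smooth = blur(current)
        # Laplacian level: the detail that the Gaussian filter removes.
        laplacian.append([[c - s for c, s in zip(cr, sr)]
                          for cr, sr in zip(current, smooth)])
        new_h = max(1, round(len(current) * scale))
        new_w = max(1, round(len(current[0]) * scale))
        gaussian.append(resize(smooth, new_h, new_w))
    return gaussian, laplacian

def hybrid(img_high, img_low, levels, cutoff, scale):
    """Fine Laplacian levels of img_high plus coarse levels and residual of img_low."""
    h, w = len(img_high), len(img_high[0])
    _, lap_high = pyramids(img_high, levels, scale)
    gauss_low, lap_low = pyramids(img_low, levels, scale)
    parts = lap_high[:cutoff] + lap_low[cutoff:] + [gauss_low[-1]]
    out = [[0.0] * w for _ in range(h)]
    for part in parts:
        full = resize(part, h, w)  # bring every level back to full size
        out = [[a + b for a, b in zip(ro, rf)] for ro, rf in zip(out, full)]
    return out
```

A call such as hybrid(a, b, levels=4, cutoff=2, scale=0.5) would take the two finest Laplacian levels from a and everything coarser from b. Here cutoff plays the role of L and levels the role of N in the description above. Because each level is resampled on the way down and back up, the sum only approximates the original images, which is acceptable for a visual effect like this one.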
Most of the creativity in the coding of this project came in deciding how to generate the pyramids. The size and variance of the Gaussian filter, the number of levels in the pyramids, and the downsizing ratio for each step of the pyramid all had to be chosen. I had no systematic way of setting these parameters; it was all trial and error, and they varied from one hybrid image to the next. When downsizing, I typically chose a factor of about (.02)^(1/N), where N is the number of levels in the Gaussian pyramid. This way, the final image of the Gaussian pyramid would always be about one-fiftieth the size of the first image, no matter how many levels I chose. When downsizing I usually used 'bilinear' interpolation, since this seemed to smooth out the downsizing somewhat; I also used it when upsizing, for the sake of consistency. Throughout this program, I made heavy use of predefined MATLAB functions. It might be possible to get slightly better results by hard-coding some of the functions myself, but a) I doubt it and b) that would take forever.
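The arithmetic behind the downsizing factor can be checked with a short sketch. The function names and the starting width of 500 pixels are made up for illustration, and the code is Python rather than the MATLAB used in the project. It assumes the factor is applied once per downsizing step, so N steps multiply the size by exactly .02. If N instead counts the images in the pyramid, including the first, there are only N-1 steps and the last image comes out slightly larger than one-fiftieth.

```python
def downsize_factor(num_steps, final_ratio=0.02):
    """Per-step scale factor so that num_steps downsizings shrink by final_ratio overall."""
    return final_ratio ** (1.0 / num_steps)

def level_sizes(width, num_steps, final_ratio=0.02):
    """Width of the image after 0, 1, ..., num_steps downsizing steps (unrounded)."""
    factor = downsize_factor(num_steps, final_ratio)
    return [width * factor ** k for k in range(num_steps + 1)]

# Hypothetical example: a 500-pixel-wide image and three different step counts.
for steps in (3, 5, 8):
    sizes = level_sizes(500, steps)
    print(steps, round(downsize_factor(steps), 3), round(sizes[-1], 2))
```

With more steps the per-step factor moves closer to 1, so each level shrinks more gently while the overall reduction stays fixed.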