Computer Vision, Project 1: Hybrid Images

Bryce Richards

Project Description: Hybrid images are images that have two distinct interpretations: they look like one thing when viewed up close, and like another when viewed from a distance. Hybrid images are created by combining the high-frequency part of one picture with the low-frequency part of another. Since high-frequency signals dominate perception when they're available, we see the high-frequency image from up close. When we back up (or, equivalently, shrink the image), only the smoother, low-frequency part of the image is visible, so we see the second picture. The purpose of this project was to generate several of these hybrid images.

Algorithm Design: At a high level, the algorithm proceeds as follows. Two images are loaded, aligned (using two points that the user selects on each image), and cropped to the same size. Next, the Gaussian and Laplacian image pyramids of the two images are generated. (The Gaussian pyramid of an image is formed by successively applying a Gaussian filter and downsizing. The image in the Laplacian pyramid at level i is defined as follows: take the image in the Gaussian pyramid at level i, apply a Gaussian filter, and subtract the resulting filtered image from the original image at Gaussian level i.) After generating the pyramids, the first L levels of one image's Laplacian pyramid (this will be the dominating, high-frequency image) are added to the last N-L levels of the second image's Laplacian pyramid, where N is the number of levels in each pyramid, and then the last image from the second image's Gaussian pyramid is added. Because the pyramid levels have different sizes, the images must all be resized to the same dimensions before they are added together.
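The steps above can be sketched in a few dozen lines. This is only an illustration under stated assumptions, not the code used for this project (which relied on MATLAB's built-in functions): it is written in plain Python on grayscale images stored as lists of rows, it uses a small [1, 2, 1]/4 binomial filter as a stand-in for the Gaussian filter, and the names blur, resize, pyramids and hybrid are hypothetical. Both input images are assumed to be already aligned and cropped to the same size.

```python
def blur(img):
    """Separable [1, 2, 1]/4 binomial filter (an assumed stand-in for a Gaussian).
    Border pixels are handled by replicating the nearest edge pixel."""
    h, w = len(img), len(img[0])

    def clamp(i, n):
        return max(0, min(n - 1, i))

    rows = [[(img[y][clamp(x - 1, w)] + 2 * img[y][x] + img[y][clamp(x + 1, w)]) / 4.0
             for x in range(w)] for y in range(h)]
    return [[(rows[clamp(y - 1, h)][x] + 2 * rows[y][x] + rows[clamp(y + 1, h)][x]) / 4.0
             for x in range(w)] for y in range(h)]


def resize(img, new_h, new_w):
    """Bilinear resize of a 2D list to new_h rows and new_w columns."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(new_h):
        # Map the centre of output row i back into input coordinates.
        y = min(max((i + 0.5) * h / new_h - 0.5, 0.0), h - 1.0)
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for j in range(new_w):
            x = min(max((j + 0.5) * w / new_w - 0.5, 0.0), w - 1.0)
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def pyramids(img, levels, factor):
    """Return (gaussian, laplacian), each a list with `levels` images.
    Laplacian level i is Gaussian level i minus its filtered version."""
    gaussian, laplacian = [], []
    current = img
    for level in range(levels):
        blurred = blur(current)
        gaussian.append(current)
        laplacian.append([[a - b for a, b in zip(row_a, row_b)]
                          for row_a, row_b in zip(current, blurred)])
        if level < levels - 1:
            # Filter, then downsize, to get the next Gaussian level.
            new_h = max(1, round(len(current) * factor))
            new_w = max(1, round(len(current[0]) * factor))
            current = resize(blurred, new_h, new_w)
    return gaussian, laplacian


def hybrid(high_img, low_img, levels, split, factor):
    """Sum the first `split` Laplacian levels of high_img, the remaining
    Laplacian levels of low_img, and the last Gaussian level of low_img,
    resizing every part to the full image size before adding."""
    h, w = len(high_img), len(high_img[0])
    _, lap_high = pyramids(high_img, levels, factor)
    gauss_low, lap_low = pyramids(low_img, levels, factor)
    parts = lap_high[:split] + lap_low[split:] + [gauss_low[-1]]
    out = [[0.0] * w for _ in range(h)]
    for part in parts:
        upsized = resize(part, h, w)
        out = [[a + b for a, b in zip(row_a, row_b)]
               for row_a, row_b in zip(out, upsized)]
    return out
```

Here split plays the role of L and levels the role of N. A call such as hybrid(cat, dog, 5, 2, 0.5) would take two Laplacian levels from cat and everything else from dog; those particular values are only placeholders, since the actual settings were chosen per image by trial and error.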

Most of the creativity in the coding of this project came in how to generate the pyramids. The size and variance of the Gaussian filter, the number of levels in the pyramids, the downsizing ratio for each step of the pyramid: all of these had to be chosen. I had no systematic way of setting these parameters; it was all trial and error, and they varied from one hybrid image to the next. When downsizing, I typically chose a factor of about (0.02)^(1/N), where N is the number of levels in the Gaussian pyramid. This way, the final image of the Gaussian pyramid would always be about one-fiftieth the size of the first image, no matter how many levels I chose. Also, when downsizing I usually used 'bilinear' interpolation, since this seemed to smooth out the downsizing somewhat. I also used it when upsizing, for the sake of consistency. Throughout this program, I made heavy use of predefined MATLAB functions. It might be possible to get slightly better results by implementing some of the functions myself, but a) I doubt it and b) that would take forever.
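The arithmetic behind the downsizing factor is easy to check. The short Python sketch below is an illustration only; the function names downsize_factor and relative_sizes are hypothetical, and the default ratio of 0.02 is the one-fiftieth figure quoted above.

```python
def downsize_factor(levels, final_ratio=0.02):
    """Per-step factor f such that f applied `levels` times gives final_ratio."""
    return final_ratio ** (1.0 / levels)


def relative_sizes(levels, final_ratio=0.02):
    """Size of the image relative to the original after 0, 1, ..., `levels` downsizing steps."""
    f = downsize_factor(levels, final_ratio)
    return [f ** k for k in range(levels + 1)]


# Example: with 5 levels the factor is roughly 0.457 per step.
factor = downsize_factor(5)
sizes = relative_sizes(5)
```

Applying the factor N times lands on exactly one-fiftieth. If a pyramid with N levels involves only N-1 downsizing steps, its last level is a little larger than that, at (0.02)^((N-1)/N) of the original, which is consistent with the factor being described as approximate.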

Results

CATDOG. The example on the class's project webpage uses the same two pictures, but with the cat as the dominating high-frequency picture.

DOGGYSTEIN. I left it uncropped to make it artistic, not because I forgot. Duh.

CATMAN. The man's hair blends in with the cat's ear, and his face is directly over the cat's face. This image is effective because the low-frequency man could be interpreted as just different colors of the cat's fur.

NEW PHONE, OLD PHONE. I tried to make the low-frequency rotary phone's receiver blend in with the top of the high-frequency Blackberry. This was honestly my worst hybrid image, but I liked the future-to-past concept.

GHOST CLUTTER. The other images are dominated by the high-frequency image. Even in the smaller images, the high-frequency objects are still subtly perceptible. In this hybrid image, I tried to make the high-frequency image faint and ghostly.