The Empire State Building? Look closely.
In this project, I implemented "Hybrid Images", a paper published at SIGGRAPH 2006 by Aude Oliva, Antonio Torralba and Philippe G. Schyns. The paper describes a simple method for combining two images into a single image that viewers perceive differently depending on viewing distance or image size. For the full paper, please click here.
I'll be describing the algorithm as well as showcasing some of the images I've created in this web report. Because it is hard to predict your screen size and zoom level, the best way to view these hybrid images is to get real close to a zoomed-in version of the image (press Ctrl and + on Windows) and then move a little further from your screen to view the zoomed-out version. You should see two entirely different images if you're doing it right!
First I align both images through a series of transformations such that a user-selected point on image 1 corresponds to a user-selected point on image 2. Initially I modified my code from project 6 to find a homography, but my results were poor, so I switched to the source code for image alignment found here instead. Perhaps a full homography was overkill in this case. The third image below was obtained from the homography; the fourth image was obtained from the simpler image alignment.
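Since the alignment only needs to map two chosen points onto two other chosen points, a similarity transform (translation, rotation, uniform scale) is enough, and two correspondences determine it exactly. A minimal sketch of that step, assuming (x, y) point pairs (the function name and setup are illustrative, not the project's actual code):

```python
import numpy as np

def similarity_from_two_points(src, dst):
    """Solve for the 2x3 similarity transform (rotation, uniform scale,
    translation) that maps the two src points onto the two dst points.

    The transform has the form:
        u = a*x - b*y + tx
        v = b*x + a*y + ty
    which is linear in the unknowns (a, b, tx, ty), so two point
    correspondences give four equations in four unknowns.
    """
    (x1, y1), (x2, y2) = src
    (u1, v1), (u2, v2) = dst
    A = np.array([
        [x1, -y1, 1, 0],
        [y1,  x1, 0, 1],
        [x2, -y2, 1, 0],
        [y2,  x2, 0, 1],
    ], dtype=float)
    rhs = np.array([u1, v1, u2, v2], dtype=float)
    a, b, tx, ty = np.linalg.solve(A, rhs)
    return np.array([[a, -b, tx],
                     [b,  a, ty]])
```

Once the transform is known, one image is warped into the other's frame before building the pyramids.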
After aligning the images, I created two pyramids (each consisting of several downsampled images) for each image. The first is a Gaussian pyramid: each level is the previous level, Gaussian-filtered and downsampled. The second is a difference (Laplacian-style) pyramid: each level holds the difference between a level of the Gaussian pyramid and the next, blurrier level, so each level isolates one band of spatial frequencies.
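The two pyramids can be sketched as follows. This variant computes each difference level at the same resolution as its Gaussian level (the difference before downsampling); the names and parameters are illustrative, not the project's actual code:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def build_pyramids(img, levels=5, sigma=1.0):
    """Return (gaussian, difference) pyramids as lists of arrays.

    Each Gaussian level is the previous level blurred and downsampled
    by 2; each difference level is a Gaussian level minus its blurred
    version, i.e. one band of spatial frequencies.
    """
    gaussian = [img.astype(float)]
    difference = []
    for _ in range(levels - 1):
        blurred = gaussian_filter(gaussian[-1], sigma)
        difference.append(gaussian[-1] - blurred)  # band-pass residual
        gaussian.append(blurred[::2, ::2])         # downsample by 2
    difference.append(gaussian[-1])  # coarsest level keeps the low frequencies
    return gaussian, difference
```

The finest difference levels carry the detail you see up close; the coarsest carry the blurry structure you see from far away.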
The hybrid image is obtained by taking the top few levels of the "difference" pyramid for image 1 (its high-frequency bands), the bottom few levels of the "difference" pyramid for image 2 (its low-frequency bands), plus the very top level of the Gaussian pyramid for image 2. I vary the "cutoff" level for each picture to obtain the best results.
The first two columns are the original images, and the last two columns are the combined image with the illusion at different sizes.
Some images do not look as good, probably because of the transformation that aligns the two images; it can produce artifacts like an image border cutting across the final image, for example. Color images also tend to look worse, probably because the colors provide important visual cues to viewers. Finally, having to vary the cutoff for each image pair by hand was annoying. One example of a failure case is provided below.
A possible application not mentioned in the paper might be clever advertising, where the message changes based on the viewer's proximity to the ad. This could be used along highways, for example, to create a "dynamic" billboard that costs less than an electronic one. Some examples of these ads are provided below.