Hybrid Images Report

Vazheh Moussavi (vmoussav)


Project Description

Simply put, a hybrid image is an image that appears to represent one entity from up close and another from afar, made by mixing the low frequencies of one image with the high frequencies of another. In this assignment, we had to come up with an implementation of hybrid images based on the one described in the paper by Oliva, Torralba, and Schyns. The procedure begins by building a Laplacian Pyramid, as first described by Burt and Adelson. Specifically, the Gaussian-like blur proposed in their paper is represented (in one dimension) as the binomial kernel:
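Presumably this is the standard 5-tap binomial filter (the same kernel OpenCV's pyrDown is built from):

\[ w = \frac{1}{16}\,\begin{bmatrix} 1 & 4 & 6 & 4 & 1 \end{bmatrix} \]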
Using this kernel, the building blocks of the Laplacian Pyramid procedure (pyrUp & pyrDown) can be conceived (downsampling is just removing even rows and columns, and upsampling is just filling them back in with zeros, à la OpenCV):
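A minimal sketch of those two building blocks, assuming the 5-tap binomial above and NumPy/SciPy (the actual assignment code may differ):

```python
import numpy as np
from scipy.ndimage import convolve

# 5-tap binomial kernel (assumed), made separable into 2-D
k = np.array([1., 4., 6., 4., 1.]) / 16.
kernel2d = np.outer(k, k)

def pyr_down(img):
    """Blur with the binomial kernel, then drop every other row and column."""
    blurred = convolve(img, kernel2d, mode='reflect')
    return blurred[::2, ::2]

def pyr_up(img, out_shape):
    """Insert zeros between samples, then blur; the 4x scale restores the brightness
    lost to the three-quarters of samples that are zero (as OpenCV's pyrUp does)."""
    up = np.zeros(out_shape, dtype=float)
    up[::2, ::2] = img
    return convolve(up, 4.0 * kernel2d, mode='reflect')
```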
From here, we can describe algorithms for computing the Laplacian of the image at each resolution (together making up the pyramid) and for losslessly reconstructing the original image from it:
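Continuing the sketch, one way to build the pyramid and then invert it exactly (each level stores the detail that the blur/downsample step would discard):

```python
def laplacian_pyramid(img, levels):
    """List of detail images, finest first, with the small Gaussian residual last."""
    pyr, current = [], img.astype(float)
    for _ in range(levels):
        down = pyr_down(current)
        pyr.append(current - pyr_up(down, current.shape))  # detail lost at this level
        current = down
    pyr.append(current)                                    # low-frequency residual
    return pyr

def reconstruct(pyr):
    """Exactly undo laplacian_pyramid: upsample the residual, add back each detail layer."""
    img = pyr[-1]
    for detail in reversed(pyr[:-1]):
        img = pyr_up(img, detail.shape) + detail
    return img
```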
To make a hybrid image, we just need the Laplacian Pyramids of two different images; given a "cutoff frequency", we only need to figure out which resolution level that frequency corresponds to, and switch pyramids at that level during reconstruction.
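Putting it together, a hedged sketch of the swap itself (here the cutoff is already expressed as a pyramid level; mapping a cutoff frequency to a level is omitted):

```python
def hybrid_image(far_img, near_img, cutoff_level, levels=6):
    """Coarse levels (low frequencies) from far_img, fine levels from near_img."""
    far_pyr = laplacian_pyramid(far_img, levels)
    near_pyr = laplacian_pyramid(near_img, levels)
    mixed = near_pyr[:cutoff_level] + far_pyr[cutoff_level:]  # switch pyramids at the cutoff
    return reconstruct(mixed)
```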

Walkthrough (Cat/Dog Example)

Let's say we want to make a hybrid image of a cat and dog (in grayscale for now :P ).
We begin by first computing the Gaussian and Laplacian Pyramids for both cat and dog:
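In terms of the earlier sketch, the Gaussian pyramid is just the successively blurred and downsampled copies of each image:

```python
def gaussian_pyramid(img, levels):
    """Successively blurred and downsampled copies of the image."""
    pyr = [img.astype(float)]
    for _ in range(levels):
        pyr.append(pyr_down(pyr[-1]))
    return pyr

# Hypothetical usage, with `cat` and `dog` as same-sized grayscale arrays:
#   cat_gauss, cat_lap = gaussian_pyramid(cat, 5), laplacian_pyramid(cat, 5)
#   dog_gauss, dog_lap = gaussian_pyramid(dog, 5), laplacian_pyramid(dog, 5)
```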
Because we want to take the high frequencies of one image and the low frequencies of the other, we're hoping that the cat's frequencies don't interfere too heavily with the dog's (or at least that they have a decent non-overlapping area). Let's take a look at the two images in the frequency domain (cat left, dog right):
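One quick way to produce spectra like these (a sketch, assuming NumPy and matplotlib; no fftshift, so high frequencies land at the corners):

```python
import numpy as np
import matplotlib.pyplot as plt

def log_spectrum(img):
    """Log-magnitude of the 2-D FFT; without fftshift, high frequencies sit at the corners."""
    return np.log1p(np.abs(np.fft.fft2(img)))

# Hypothetical usage with the grayscale cat and dog images:
#   plt.subplot(1, 2, 1); plt.imshow(log_spectrum(cat), cmap='gray'); plt.title('cat')
#   plt.subplot(1, 2, 2); plt.imshow(log_spectrum(dog), cmap='gray'); plt.title('dog')
#   plt.show()
```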
Clearly, there's some non-overlapping area towards the corners (high frequencies). This is good news, and we should get a decent hybrid image out of this:

Some Other Results

Marilynstein turned out alright too, the one problem being that you have to be significantly far away to see Albert's face clearly.
Because of their somewhat similar shape and size, I tried mixing the rhino with the car. Again, you have to be pretty far to clearly see the car without interference from the rhino. Also, the blue from the car kinda looks awkward on the rhino. I guess I wasn't clever enough.
Sometimes seeing doesn't tell the whole story.

(Attempted) Extension: Private Text

In the Hybrid Images paper, an extension to images of text is described. Here is the example given in the paper:
For text, there are two major differences from natural images: the high frequencies inherent to text (and the corresponding change of blurring kernel), and the need for an equivalent low-pass version of the text. For the kernel, it is claimed that the sigma value used in the Gaussian should be less than the number of pixels in a stroke of a letter. Citing Portilla and Simoncelli, they go on to generate a low-pass "texture mask", which should essentially be blobs similar to text. I was able to find some software from the project page that attempts to generate such a texture. I played around with several samples of text, and nothing came out quite as "blobby" as what appears to be used in the image above.
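As a rough illustration of the sigma rule (a sketch; the stroke width and the 0.8 factor are made-up values, not from the paper):

```python
from scipy.ndimage import gaussian_filter

def low_pass_text(text_img, stroke_px):
    """Blur rendered text with a sigma kept below the stroke width, per the paper's rule of thumb."""
    sigma = 0.8 * stroke_px  # arbitrary factor, just keeping sigma < stroke width in pixels
    return gaussian_filter(text_img, sigma)
```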
So the text "texture" still has high frequencies, and post-process blurring did not do much to help make it look any "blobbier". If you're curious, here's what the Gaussian and Laplacian Pyramids look like:
Sticking it into my hybrid image code with different sigma supports, this was the closest to "private" that I got:
, which is only partly true.