CS129 / Project 2 / Image Blending

Did you know that Bill Gates is a cat lover? (OK, maybe he isn't. But Einstein indeed is.)

Overview

In this project, a gradient domain image blending method has been implemented and tested. Three extra composites have been created to demonstrate the implementation and illustrate its shortcomings. A Laplacian Pyramid method has also been implemented and compared with the gradient domain method.

Implementation

Gradient domain method

The intuition behind the gradient domain method is that when combining two images that could potentially differ a lot in overall intensity, we try to preserve the gradients of the source pixels rather than their original intensities, so that the source image blends into the target image more smoothly. This intuition can be formulated as a system of equations, in which each equation puts a constraint on one pixel of the result image:

  1. if that pixel is not covered by the mask, then its value equals the value of the corresponding pixel in the target image
  2. otherwise, the sum of the differences between that pixel and each of its four neighbors equals the sum of the differences between the corresponding pixel in the source image and its four neighbors in the source image
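
Written out, the two cases above give one linear equation per pixel. The symbols are introduced here only for illustration and do not come from the implementation: v is the unknown result image, s the source image, t the target image, M the set of pixels covered by the mask, and N(p) the four neighbors of pixel p.

```latex
v_p = t_p, \qquad p \notin M

\sum_{q \in N(p)} (v_p - v_q) = \sum_{q \in N(p)} (s_p - s_q), \qquad p \in M
```

For a masked pixel with all four neighbors inside the image, the second equation can be rearranged to 4 v_p - \sum_{q \in N(p)} v_q = 4 s_p - \sum_{q \in N(p)} s_q, which is why each row of the system has at most five non-zero coefficients.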

Solving this system of equations yields the pixel values of the result image. In the MATLAB implementation, the function sparse is used to construct the matrix A that corresponds to the system of equations. To construct A, two vectors are created to hold the row indices and column indices of the possibly non-zero elements.

To create the row index vector, a vector holding the consecutive numbers from 1 to the number of pixels in the image is created, and repmat is called to replicate it 5 times. The resulting matrix is then collapsed into the row index vector. Five copies are needed because the equation for each pixel contains at most 5 unknowns: the pixel itself and its four neighbors. Similarly, the column index vector is created by first creating a vector whose length equals the number of pixels in the image, replicating it, shifting the resulting copies to get the column positions of the surrounding pixels for each pixel, and finally collapsing the matrix into a column index vector. The values of the elements indexed by these two vectors are obtained in a similar fashion.

This way, the matrix A can be constructed efficiently, and the program takes less than about one second to process each test image. The vector b, which is the right-hand side of the system of equations, is constructed directly by copying the non-masked pixels from the target image and, for each masked pixel, taking the sum of differences between the corresponding source pixel and its neighbors.
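
To make the construction of A and b concrete, here is a minimal sketch in plain Python rather than MATLAB. It is not the project code: it uses explicit loops instead of the vectorized repmat construction, it works on a single channel, and the names build_system and solve_gauss_seidel are made up for this illustration. The tiny Gauss-Seidel solver is only a stand-in for a real sparse solver and assumes that at least one pixel lies outside the mask.

```python
def build_system(source, target, mask):
    """Return the (row, col, value) triplets of A and the right-hand side b.

    source, target: 2-D lists of numbers with the same shape (one channel).
    mask: 2-D list of booleans; True marks pixels taken from the source.
    """
    h, w = len(target), len(target[0])
    triplets = []  # non-zero entries of A, at most 5 per row
    b = []
    for y in range(h):
        for x in range(w):
            p = y * w + x  # row-major flat index of this pixel
            if not mask[y][x]:
                # Case 1: the result pixel equals the target pixel.
                triplets.append((p, p, 1.0))
                b.append(float(target[y][x]))
                continue
            # Case 2: match the source's sum of differences to the
            # neighbors that actually exist inside the image.
            nbrs = [(y + dy, x + dx)
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= y + dy < h and 0 <= x + dx < w]
            triplets.append((p, p, float(len(nbrs))))
            rhs = 0.0
            for ny, nx in nbrs:
                triplets.append((p, ny * w + nx, -1.0))
                rhs += source[y][x] - source[ny][nx]
            b.append(rhs)
    return triplets, b


def solve_gauss_seidel(triplets, b, sweeps=200):
    """Toy iterative solver for A v = b; fine for very small images only."""
    n = len(b)
    diag = [0.0] * n
    off_diag = [[] for _ in range(n)]
    for r, c, val in triplets:
        if r == c:
            diag[r] = val
        else:
            off_diag[r].append((c, val))
    v = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            v[i] = (b[i] - sum(val * v[c] for c, val in off_diag[i])) / diag[i]
    return v


# A 3x3 example with only the center pixel masked.
source = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
target = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
mask = [[False, False, False], [False, True, False], [False, False, False]]
triplets, b = build_system(source, target, mask)
result = solve_gauss_seidel(triplets, b)
```

The triplets have the same (row, column, value) form that MATLAB's sparse accepts, apart from MATLAB's 1-based indexing. For a color image, the same A can be reused and only b changes per channel.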

In addition, the implementation identifies the boundary rows and columns of the images, so that for a pixel on the image boundary the program considers only three neighbors. This avoids both an out-of-bounds error and the mistake of taking the first pixel of the next row as the right neighbor of a rightmost pixel.
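
The wrap-around mistake is easy to see with flat pixel indices. The following sketch assumes row-major indexing starting at 0 and a hypothetical helper named neighbors; it illustrates the boundary check and is not the project code. Note that in this sketch a corner pixel ends up with two neighbors.

```python
def neighbors(p, h, w):
    """Flat row-major indices of the in-bounds 4-neighbors of pixel p."""
    y, x = divmod(p, w)
    out = []
    if y > 0:
        out.append(p - w)  # up
    if y < h - 1:
        out.append(p + w)  # down
    if x > 0:
        out.append(p - 1)  # left
    if x < w - 1:
        out.append(p + 1)  # right; skipped for a rightmost pixel
    return out


# In a 3x3 image, pixel 5 is the rightmost pixel of the middle row.
# Without the last check, p + 1 = 6 (the first pixel of the next row)
# would wrongly be treated as its right neighbor.
edge_neighbors = neighbors(5, 3, 3)
```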

Laplacian Pyramid method

In the implementation of the Laplacian Pyramid method, a Laplacian pyramid is constructed for each of the source and target images. The two pyramids are then blended level by level using a corresponding Gaussian pyramid of the mask. To construct the blended image, the algorithm starts from the top level of the blended Laplacian pyramid, combines the topmost two levels, combines the result with the third level, and so on, until all levels of the blended Laplacian pyramid have been combined. I experimented with different sizes for the Gaussian filter and settled on a size of 20. The Laplacian pyramids have 20 levels.
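
The blend-and-collapse procedure can be sketched on a one-dimensional signal. This is a simplified illustration under several assumptions that are mine and not taken from the project: it uses a Laplacian stack (no downsampling, so combining two levels is a plain addition), a fixed 5-tap binomial filter instead of the Gaussian filter of size 20, 4 levels by default, and it treats the coarsest level as the "top" level. All function names are hypothetical.

```python
def blur(signal):
    """Smooth with the kernel [1, 4, 6, 4, 1] / 16, clamping at both ends."""
    n = len(signal)
    kernel = (1, 4, 6, 4, 1)
    out = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - 2, 0), n - 1)  # clamp index to the signal
            acc += weight * signal[j]
        out.append(acc / 16.0)
    return out


def gaussian_stack(signal, levels):
    """Level 0 is the input; each further level is a blurred copy."""
    stack = [[float(s) for s in signal]]
    for _ in range(levels - 1):
        stack.append(blur(stack[-1]))
    return stack


def laplacian_stack(signal, levels):
    """Differences of consecutive Gaussian levels plus the coarsest level."""
    g = gaussian_stack(signal, levels)
    lap = [[a - c for a, c in zip(g[i], g[i + 1])] for i in range(levels - 1)]
    lap.append(g[-1])  # low-pass residual
    return lap


def blend(source, target, mask, levels=4):
    """Blend source into target; mask is 1 where the source should show."""
    ls = laplacian_stack(source, levels)
    lt = laplacian_stack(target, levels)
    gm = gaussian_stack(mask, levels)
    blended = [[m * s + (1.0 - m) * t
                for s, t, m in zip(ls[i], lt[i], gm[i])]
               for i in range(levels)]
    # Collapse: start at the coarsest level and add the finer ones in turn.
    result = blended[-1]
    for level in reversed(blended[:-1]):
        result = [r + d for r, d in zip(result, level)]
    return result


# A bright source blended into a dark target across a step mask.
source = [100.0] * 16
target = [0.0] * 16
mask = [1.0] * 8 + [0.0] * 8
blended_signal = blend(source, target, mask, levels=3)
```

In this sketch, a larger filter or more levels widens the transition zone around the mask boundary, which is the knob that the filter size controls in the discussion of results.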

Results

Target | Mask | Source | Copy and Paste | Gradient domain | Laplacian Pyramid

Comparison of the two methods

When the cat is positioned next to the dark night sky, it is turned into a black, shadowy cat.

Gradient domain image blending does a good job on most of the images tested. One downside is that the color of the source image is easily influenced by the color of the surrounding target image pixels, which may result in highly undesirable effects (like the shadowy cat on the left).

One advantage of the Laplacian Pyramid method is that it preserves the color of the source image. For the chipmunk composite, the Laplacian Pyramid method gives a significantly better result than the gradient domain method. This property may not always be desirable, though. For the Bill Gates composite, I personally think that although the gradient domain method has changed the color of the source image, it actually makes the cat look better, as the lighting on the cat seems closer to that of the office.

Another property of the Laplacian Pyramid method is that the result depends on the size of the Gaussian filter, but not monotonically. If the filter is too small, you might get a clearly visible boundary between the source and the target image. If the filter is too large, you might get artifacts along the boundary of the source image (as in the bear image and the plane image).