This algorithm explores image blending through gradient-domain processing, which lets a user implant a region of a source image into a target image. The most basic implementation would copy the source pixels directly into the target image, but the result is rarely convincing: pixel copying creates conspicuous seams, which are abrupt, high-frequency intensity changes, at the edges of the copied region. A more perceptually subtle blend relies on the observation that human visual perception is more sensitive to gradients than to absolute intensities. Therefore, to create a seamless blend, the original pixel gradients of both the target image and the copied source region must be preserved as much as possible.
The main steps of the algorithm are laid out below, with additional detail given for the inner steps of the imblend function, which performs the image blending. The objective of the imblend function is to find the vector x in the equation Ax = b, where A is a matrix encoding the gradient constraints among the unknown pixels, and b is a vector holding the desired output gradients and known pixel values. The solution vector x holds the pixel values of the final blended image.
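As a rough illustration of how such a system can be assembled, the sketch below builds A and b for a single color channel and solves for the pixels under the mask. It is not the write-up's actual imblend code: the name imblend_channel, the 4-neighbor stencil, the choice to solve only for masked pixels, and the dense matrix are all assumptions made for brevity. A full-size image would need a sparse matrix and a sparse solver.

```python
import numpy as np

def imblend_channel(source, target, mask):
    """Blend one channel by solving A x = b for the pixels where mask is True.

    source, target: 2-D float arrays of equal shape, already aligned.
    mask: 2-D bool array; assumed not to touch the image border.
    """
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    n = len(ys)

    # index[y, x] is the position of pixel (y, x) in the unknown vector x.
    index = np.full((h, w), -1, dtype=int)
    index[ys, xs] = np.arange(n)

    A = np.zeros((n, n))  # dense for clarity only (hypothetical choice)
    b = np.zeros(n)
    for k, (y, x) in enumerate(zip(ys, xs)):
        A[k, k] = 4.0
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            # Desired gradient between this pixel and its neighbor,
            # taken from the source image.
            b[k] += source[y, x] - source[ny, nx]
            if mask[ny, nx]:
                A[k, index[ny, nx]] = -1.0   # neighbor is also unknown
            else:
                b[k] += target[ny, nx]       # neighbor is a known target value

    solution = np.linalg.solve(A, b)
    out = target.astype(float).copy()
    out[ys, xs] = solution
    return out
```

Each row states that the sum of differences between a pixel and its four neighbors in the output should match the same sum in the source, with target pixels outside the mask acting as fixed boundary values. A color image would run this once per channel.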
To increase efficiency and decrease runtime, the values used to build the A matrix were precomputed. To that end:
A realignment step was added to the basic pipeline to fine-tune the hard-coded offsets through a greedy local-minimum search. Starting from the hard-coded alignment, each one-pixel move in the 8 possible directions was scored by the sum of squared differences between the Laplacians of the source and target images within the mask region. At each iteration, the best-scoring move became the new offset, and this refinement was repeated until a local minimum was found. In general, the realignment produced relatively small adjustments, but in some cases the effects were more noticeable, better aligning edges in the mask region:
Left: original offset. Right: after realignment, the rainbow aligns better with the left edge.
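The greedy search described above can be sketched as follows. This is a minimal illustration under several assumptions, not the original implementation: the helper names (laplacian, score, realign) are hypothetical, the mask is taken to be in source coordinates, the offset is a (row, column) shift from source to target, and any offset that pushes the mask outside the target is scored as infinitely bad.

```python
import numpy as np

def laplacian(img):
    """4-neighbor Laplacian; border pixels are left at zero."""
    lap = np.zeros_like(img, dtype=float)
    lap[1:-1, 1:-1] = (4.0 * img[1:-1, 1:-1]
                       - img[:-2, 1:-1] - img[2:, 1:-1]
                       - img[1:-1, :-2] - img[1:-1, 2:])
    return lap

def score(src_lap, tgt_lap, mask, offset):
    """Sum of squared Laplacian differences over the mask at a given offset."""
    dy, dx = offset
    ys, xs = np.nonzero(mask)          # mask is in source coordinates
    ty, tx = ys + dy, xs + dx          # matching target coordinates
    h, w = tgt_lap.shape
    if ty.min() < 0 or tx.min() < 0 or ty.max() >= h or tx.max() >= w:
        return float("inf")            # offset leaves the target image
    diff = src_lap[ys, xs] - tgt_lap[ty, tx]
    return float(np.sum(diff ** 2))

def realign(source, target, mask, offset):
    """Greedy descent over one-pixel moves until no move improves the score."""
    src_lap, tgt_lap = laplacian(source), laplacian(target)
    moves = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
             if (dy, dx) != (0, 0)]    # the 8 neighboring offsets
    best = score(src_lap, tgt_lap, mask, offset)
    while True:
        candidates = []
        for dy, dx in moves:
            cand = (offset[0] + dy, offset[1] + dx)
            candidates.append((score(src_lap, tgt_lap, mask, cand), cand))
        cand_score, cand_offset = min(candidates, key=lambda c: c[0])
        if cand_score >= best:         # local minimum reached
            return offset
        best, offset = cand_score, cand_offset
```

Because each accepted move strictly lowers the score, the loop terminates, but like any greedy search it can settle in a local minimum that is not the best alignment overall.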
The results of the algorithm's blending operation are displayed below. The target, mask, and source images are shown on the left, followed by the output of the direct pixel-copying strategy; the final image, after blending and realignment, is on the far right. Failure cases are presented and discussed afterward.
In this first failure case, the blending operation replicates the overall background color fairly well, but the blended region covers some high-frequency areas in the target image, creating obvious visual defects. The algorithm fails to recognize and preserve these areas of the target image and instead preserves only the gradient of the source image. Since the algorithm is founded on human perceptual sensitivity to gradients, destroying the sharp gradients of strong edges in the target image undermines its performance. One possible remedy would be to keep, at each location, whichever of the target and source gradients is stronger.
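The remedy suggested above is sometimes called mixing gradients. The sketch below shows one way the selection could look for horizontal differences in a single channel. It was not part of the implemented pipeline, and the function name mixed_gradients_x is hypothetical. In a full solver, the selected values would stand in for the source-only gradients when building b, with the same selection applied to vertical differences.

```python
import numpy as np

def mixed_gradients_x(source, target):
    """Horizontal pixel differences, keeping whichever image's is larger in magnitude."""
    gs = np.diff(source, axis=1)   # source[y, x+1] - source[y, x]
    gt = np.diff(target, axis=1)   # target[y, x+1] - target[y, x]
    # Where the source gradient is strictly stronger, use it;
    # otherwise fall back to the target gradient.
    return np.where(np.abs(gs) > np.abs(gt), gs, gt)

# A strong target edge (0 -> 1) survives where the source is flat,
# and a strong source edge (0 -> 5) survives where the target is flat.
source = np.array([[0.0, 0.0, 5.0, 5.0]])
target = np.array([[0.0, 1.0, 1.0, 1.0]])
guidance = mixed_gradients_x(source, target)
```

With this choice, a strong edge in the target shows through flat source regions instead of being smoothed away, at the cost of making the pasted content look partly transparent over textured backgrounds.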
In this second failure case, the source image blends too much and takes on a ghostly appearance. Fortunately, that effect happens to suit the particular image contents. However, this case illustrates a potential limitation of the algorithm: in most of the results, the algorithm shifts the overall hue of the source content towards that of the surrounding target image. In the trebuchet image above, the wood of the source image takes on a green hue from the grass of the target image. In the target image below, the dominant color is blue, so the blue channel of the source image is emphasized. The source image does not have a strongly expressed blue channel, so when that channel is raised to meet the intensity of the blue channel in the target image, it dominates the other channels, which are themselves receding because the red and green channels of the target image are weak. This weakness could possibly be addressed by taking the gradient of all channels together, instead of one channel at a time, and minimizing the overall difference between gradients.