CS129 Final Project: Image Colorization with Image Analogies

Bryce Aebi
December 20, 2012
[Figure: input training pair (A, A'), input target (B), and colorized output (B')]

Introduction

Big is to little as tall is to short: we are all familiar with written analogies, and the idea extends to computational photography as well. Given a training pair A and A', where A' is a filtered version of A, and a target image B, a new image B' can be synthesized that relates to B the way A' relates to A. In the notation of written analogies: A : A' :: B : B'.



Essentially, image analogies apply an example transformation to a target image. Applications include artistic filtering (for example, synthesizing a watercolor painting), de-blurring through detail hallucination, texture transfer, color transfer, and more. For this project the primary application was color transfer: a color image and its greyscale variant serve as training data for colorizing a target greyscale image.
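For colorization the training pair is cheap to construct: any color image can serve as A', with A derived from it by discarding color. A minimal setup sketch in Python (file names are placeholders; OpenCV and NumPy assumed):

    import numpy as np
    import cv2

    def luminance(img):
        """Rec. 601 luminance of an RGB image with float channels in [0, 1]."""
        return img @ np.array([0.299, 0.587, 0.114])

    # A' is any color image; A is its greyscale version; B is the greyscale target.
    A_prime = cv2.imread("training.png").astype(np.float64)[..., ::-1] / 255.0  # BGR -> RGB
    A = luminance(A_prime)
    B = luminance(cv2.imread("target.png").astype(np.float64)[..., ::-1] / 255.0)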

Algorithm

A feature vector is computed for each pixel of every input image. Features can include RGB channels, luminance, directional gradients, and so on; this implementation uses luminance alone. For each pixel in B, its features are compared against those of every A pixel and the best-matching A pixel is chosen. The value at the corresponding location in A' is then copied to the corresponding pixel of B'.
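In outline, the synthesis pass might look like the sketch below, where best_match is a placeholder for the pixel-selection step described next:

    import numpy as np

    def synthesize(FA, A_prime, FB, best_match):
        """Row-major synthesis of B': for each B pixel, find the best-matching
        A pixel by comparing features, then copy the co-located A' value."""
        H, W = FB.shape
        B_prime = np.zeros((H, W) + A_prime.shape[2:], dtype=A_prime.dtype)
        s = np.zeros((H, W, 2), dtype=int)  # source map: A coords used per B' pixel
        for y in range(H):
            for x in range(W):
                ya, xa = best_match(FA, FB, s, y, x)
                B_prime[y, x] = A_prime[ya, xa]
                s[y, x] = (ya, xa)
        return B_prime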

The ‘best’ pixel is chosen in one of two ways, sketched in code below. In the first, called “best approximate match”, each B pixel’s neighborhood (the 5x5 block of pixels centered on it) is compared against every pixel neighborhood in A, and the A pixel whose neighborhood has the most similar feature vectors is chosen. Brute force would take an unreasonable amount of time, since every neighborhood in B must be compared to every neighborhood in A, so an approximate-nearest-neighbor search is used to find a good match quickly; hence the name. In the second, called “best coherence match”, the candidate is instead drawn from the already-synthesized portion of the B pixel’s neighborhood (pixels are synthesized in row-major order). Although approximate matches are numerically better, coherence matches often look better to a human viewer because they keep neighboring pixels consistent.
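Both strategies can be sketched as follows. This is a minimal illustration, not the exact implementation: a SciPy KD-tree stands in for the ANN library (cKDTree.query is an exact search, but it plays the same role), and image-boundary handling is omitted.

    import numpy as np
    from scipy.spatial import cKDTree

    R = 2  # 5x5 neighborhoods

    def neighborhood(F, y, x):
        """Flattened (2R+1) x (2R+1) feature window centered on (y, x)."""
        return F[y - R:y + R + 1, x - R:x + R + 1].ravel()

    def build_index(FA):
        """Index every full 5x5 neighborhood of A for nearest-neighbor search."""
        H, W = FA.shape
        coords = [(y, x) for y in range(R, H - R) for x in range(R, W - R)]
        vectors = np.array([neighborhood(FA, y, x) for (y, x) in coords])
        return cKDTree(vectors), coords

    def best_approximate_match(tree, coords, FB, y, x):
        """Numerically best match, found via nearest-neighbor search."""
        _, i = tree.query(neighborhood(FB, y, x))
        return coords[i]

    def best_coherence_match(FA, FB, s, y, x):
        """Consider only candidates that extend an already-synthesized
        neighbor's source region in A; boundary checks omitted for brevity."""
        nb = neighborhood(FB, y, x)
        best, best_d = None, np.inf
        for dy in range(-R, 1):
            for dx in range(-R, R + 1):
                if dy == 0 and dx >= 0:
                    break  # pixels at and after (y, x) are not yet synthesized
                ya, xa = s[y + dy, x + dx]
                ca, cx = ya - dy, xa - dx  # shift the neighbor's source by the offset
                d = np.sum((neighborhood(FA, ca, cx) - nb) ** 2)
                if d < best_d:
                    best, best_d = (ca, cx), d
        return best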

[Figure: input training pair (A, A'), input target (B), and colorized output (B')]

Finally, the entire pixel-synthesis algorithm is run at multiple scales. For each input image an “image pyramid” is built, with each level a scaled copy of the image, progressively increasing in size up to the original resolution. B' is synthesized level by level, starting at the smallest scale, so that pixel neighborhoods at each level can also include pixels from the corresponding neighborhoods at the coarser levels below. The full-resolution B' is synthesized last, at the top level of the pyramid.
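A sketch of the multiscale driver, assuming OpenCV's pyrDown for the Gaussian downsampling; feeding each coarser level's neighborhoods into the finer levels is noted in a comment but omitted for brevity:

    import cv2

    def gaussian_pyramid(img, levels):
        """Coarse-to-fine pyramid: pyr[0] is the smallest, pyr[-1] the original."""
        pyr = [img]
        for _ in range(levels - 1):
            pyr.append(cv2.pyrDown(pyr[-1]))
        return pyr[::-1]

    def synthesize_multiscale(A_pyr, Ap_pyr, B_pyr, best_match):
        """Run the synthesis at every scale, coarsest first; in the full
        algorithm each level's neighborhoods would also draw on the
        level synthesized just below it."""
        for FA, Ap, FB in zip(A_pyr, Ap_pyr, B_pyr):
            B_prime = synthesize(FA, Ap, FB, best_match)
        return B_prime  # the top, full-scale level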
[Figure: input training pair (A, A'), input target (B), and colorized output (B')]

Results

Although most results on this page look good, the input images were carefully chosen; colorizing greyscale images with image analogies has several drawbacks. First, the color training image A' may simply not contain colors suitable for B'. Second, since luminance is used to match pixels, differently colored pixels in A' can share similar luminance values and become indistinguishable. Finally, the color associated with a given luminance in A' may be the wrong color for that luminance in B. To achieve reasonable results, images containing few colors were chosen, with those colors strongly tied to distinct luminance values. Training images similar to the target image consistently gave the best results.

Gallery

(Each row contains A, A', B, and B' in that order)