CS129 / Final Project / Colorization by Example

Colorization is the process of generating a color image from a grayscale input image. In this project, the colorization process is implemented following Colorization by Example by Irony et al. The pipeline requires two inputs: a grayscale image (the target) to be colored, and a color image that serves as the reference. An overview of the pipeline is shown below.

Image Segmentation

The original paper by Irony et al. assumes that the reference image has already been segmented and that the segmentation is available as a program input. To automate the process, we used an existing image segmentation implementation based on Statistical Region Merging (SRM) by Nock and Nielsen.
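At the core of SRM is a statistical test that decides whether two adjacent regions should be merged. As a rough illustration only (not the implementation we used, whose predicate is more refined), one common simplified form of the merging predicate in Matlab is:

    % Rough sketch of a simplified SRM merging predicate (Nock & Nielsen).
    % Q controls segmentation granularity, g is the number of gray levels,
    % and n is the number of image pixels; all names are illustrative.
    function ok = srmMerge(mean1, size1, mean2, size2, Q, g, n)
        delta = 1 / (6 * n^2);                             % from the SRM analysis
        b = @(sz) g * sqrt(log(1 / delta) / (2 * Q * sz)); % per-region bound
        ok = abs(mean1 - mean2) <= sqrt(b(size1)^2 + b(size2)^2);
    end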

[Figures: reference image | segmentation of the reference image]

Feature Generation

Feature generation consists of two steps: 1) transform each pixel into a feature vector; 2) reduce the dimensionality of the generated features. As suggested by the paper, the Discrete Cosine Transform (DCT) coefficients of a k x k neighborhood around each pixel are used as its feature vector. The advantage of DCT coefficients is that they are not overly sensitive to translations and rotations. The DCT is applied only to the luminance channel of the reference image. The feature vectors are then projected onto a lower-dimensional subspace using Principal Component Analysis (PCA), as the paper suggests dimensionality reduction as a way to improve classification accuracy. In our tests, classifying the reduced features usually gives a better result than classifying the original feature set. Matlab's dct2 function (Image Processing Toolbox) is used to generate the DCT coefficients, and the princomp function is used to perform PCA. A 5 x 5 neighborhood is used for the DCT. For classification, usually the first half of the principal components are used, though this varies across image sets.
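A Matlab sketch of this stage under the settings above (a 5 x 5 window, keeping half of the principal components); file and variable names are illustrative:

    % Sketch: DCT feature vectors per pixel, then PCA.
    gray = im2double(rgb2gray(imread('reference.png'))); % luminance only
    k = 5;                                      % neighborhood size
    padded = padarray(gray, [(k-1)/2 (k-1)/2], 'symmetric');
    patches = im2col(padded, [k k], 'sliding'); % one k*k column per pixel
    feats = zeros(size(patches, 2), k * k);
    for i = 1:size(patches, 2)
        block = reshape(patches(:, i), k, k);
        coeffs = dct2(block);                   % 2-D DCT of the neighborhood
        feats(i, :) = coeffs(:)';
    end
    [coeff, score] = princomp(feats);           % PCA; rows are observations
    nKeep = floor(size(feats, 2) / 2);          % keep the first half of the PCs
    reduced = score(:, 1:nKeep);                % reduced feature set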

Classification

After stages I and II, we obtain a training data set (features corresponding to pixels of the reference image) with labels (from the segmentation of the reference image), and a set of unlabeled data points corresponding to pixels of the target image. In this stage, the data points are first classified using a k-nearest-neighbor (knn) classification algorithm. A naive knn, however, misclassifies many data points. To alleviate this problem, voting in image space is performed. The intuition behind image-space voting is that the labels of the pixels around a given pixel p should be considered when deciding the label of p. More precisely, we define the neighborhood of p as N(p), examine each pixel, and replace its label with the dominant label in N(p). We further define the "confidence score" for a pixel p to have label l:

    conf(p, l) = \frac{\sum_{q \in N(p, l)} W_q}{\sum_{q \in N(p)} W_q}

where N(p, l) is the set of pixels in N(p) with label l, and

    W_q = \exp(-D(q, M_q))

is the weight of each pixel. Here M_q is the nearest neighbor of q in feature space that has the same label as q, and D(q, M_q) is the distance between q and this best match, so pixels with closer matches vote with larger weights. The weights are computed and stored while performing knn.
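A Matlab sketch of this stage, assuming the Statistics Toolbox's knnsearch; refFeats and refLabels come from the reference image, tgtFeats from the target, and h x w is the target size (all names, and the voting radius, are illustrative):

    % Sketch: knn classification followed by image-space voting.
    [idx, dist] = knnsearch(refFeats, tgtFeats); % best match per target pixel
    labelImg = reshape(refLabels(idx), h, w);    % initial labels
    wImg = reshape(exp(-dist), h, w);            % weights W_q = exp(-D(q, M_q))
    r = 3;                                       % voting radius (assumption)
    voted = labelImg;
    confImg = ones(h, w);
    for y = 1+r : h-r
        for x = 1+r : w-r
            nl = labelImg(y-r:y+r, x-r:x+r);     % labels in N(p)
            nw = wImg(y-r:y+r, x-r:x+r);         % weights in N(p)
            best = nl(r+1, r+1); bestSum = 0;
            for l = unique(nl)'                  % candidate labels
                s = sum(nw(nl == l));            % summed weight for label l
                if s > bestSum, bestSum = s; best = l; end
            end
            voted(y, x) = best;                   % dominant label wins
            confImg(y, x) = bestSum / sum(nw(:)); % confidence score conf(p, l)
        end
    end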

[Figures: labeled target image after knn | labeled target image after image-space voting]

Colorization

The final stage colors the grayscale input image once the label of every pixel has been finalized. The first step in this stage is to generate "micro-scribbles": colors assigned to pixels with high confidence scores. Generating micro-scribbles before performing the full colorization further alleviates the problem of misclassification, since only reliably classified pixels contribute color. When colorizing the test images, the threshold on the confidence score ranges from 0.5 to 0.9. The color assignment C(p) for a pixel p is calculated by:

    C(p) = \frac{\sum_{q \in N(p)} W_q \, C(M_q(p))}{\sum_{q \in N(p)} W_q}

where M_q(p) denotes the pixel in the reference image whose position with respect to M_q is the same as the position of p with respect to q.
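A simplified Matlab sketch of the micro-scribble step; for brevity it assigns each high-confidence pixel the chrominance of its single best match instead of the weighted average above, and uses Lab chrominance (the choice of color space, the threshold value, and all names are illustrative):

    % Sketch: micro-scribbles from high-confidence pixels. matchIdx maps
    % each target pixel to the linear index of its best match M_q in the
    % reference image (saved during knn).
    tau = 0.7;                        % threshold within the 0.5-0.9 range
    scribbleMask = confImg > tau;
    refLab = rgb2lab(refImg);         % keep the target's luminance;
    refA = refLab(:, :, 2);           % transfer only the a* and b*
    refB = refLab(:, :, 3);           % chrominance channels
    scribbleA = zeros(h, w);
    scribbleB = zeros(h, w);
    scribbleA(scribbleMask) = refA(matchIdx(scribbleMask));
    scribbleB(scribbleMask) = refB(matchIdx(scribbleMask));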

After generating the micro-scribbles, the optimization method of Levin et al. 2004 is used to propagate the colors in the micro-scribbles to the entire image. The intuition is that we would like to minimize the difference between the color of each pixel p and the weighted average of the colors of its neighbors. Formally, we would like to minimize the following objective:

    J(C) = \sum_p \Big( C(p) - \sum_{q \in N(p)} w_{pq} C(q) \Big)^2

where

    w_{pq} \propto \exp\big( -(Y(p) - Y(q))^2 / (2 \sigma_p^2) \big)

and the weights are normalized to sum to one over N(p); Y denotes luminance and \sigma_p^2 is the variance of the luminance in a window around p. Subject to the micro-scribble constraints, minimizing this objective boils down to solving a sparse system of linear equations with one unknown per pixel of the target image.
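A Matlab sketch of the propagation step for one chrominance channel, treating the micro-scribbles as hard constraints (a 3 x 3 neighborhood and illustrative names; a least-squares variant works similarly):

    % Sketch: Levin-style color propagation for one chrominance channel.
    % Y is the target luminance; scribbleMask and scribbleChroma come
    % from the micro-scribble step.
    [h, w] = size(Y);
    n = h * w;
    rows = zeros(0, 1); cols = zeros(0, 1); vals = zeros(0, 1);
    for x = 1:w
        for y = 1:h
            p = (x - 1) * h + y;               % column-major linear index
            if scribbleMask(y, x)              % scribble: hard constraint
                rows = [rows; p]; cols = [cols; p]; vals = [vals; 1];
                continue;
            end
            ys = max(1, y-1):min(h, y+1);      % 3 x 3 neighborhood N(p)
            xs = max(1, x-1):min(w, x+1);
            [yy, xx] = ndgrid(ys, xs);
            q = (xx(:) - 1) * h + yy(:);
            keep = (q ~= p);                   % exclude p itself
            q = q(keep);
            nbrY = Y(q);
            sigma2 = max(var([nbrY; Y(y, x)]), 1e-6);
            wgt = exp(-(nbrY - Y(y, x)).^2 / (2 * sigma2));
            wgt = wgt / sum(wgt);              % w_pq sums to one over N(p)
            rows = [rows; p; repmat(p, numel(q), 1)];
            cols = [cols; p; q];
            vals = [vals; 1; -wgt];            % row p: C(p) - sum w_pq C(q)
        end
    end
    A = sparse(rows, cols, vals, n, n);
    b = zeros(n, 1);
    b(scribbleMask(:)) = scribbleChroma(scribbleMask(:));
    chroma = reshape(A \ b, h, w);             % propagated chrominance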

[Figures: confidence scores | target image with micro-scribbles | final colored target image]

More results

[Figures: target | reference | labeled target | result]

Failure cases

The algorithm does not work well when the overall intensities of the target and reference images do not match. Complex scenes are also likely to fail, because they make both good segmentation and good classification difficult to achieve.

[Figures: target | reference | labeled target | result]