CS 129 Project 1 Writeup

Jeroen Chua (jchua)
September 17, 2012

Outline

I describe a method for aligning three grey-scale images of a single scene, each taken through a different color filter (red, green, and blue), to produce a realistic RGB color image. The method uses a coarse-to-fine strategy to estimate the alignment between channels as a similarity transform, parameterized as a 4-vector [y-translation, x-translation, rotation, scaling]. Alignment quality is measured with the SSD (sum of squared differences) objective computed on gradient-based features rather than on raw intensities.

The algorithm operates as follows:

1. Estimate the alignment of the channels using a coarse-to-fine strategy, assuming a similarity transformation between channels.
2. Automatically crop the image borders.
3. Enhance contrast.
4. Perform white balancing.

Gradient features

Instead of working with raw pixel intensities, I use gradient statistics. The image is filtered with two Sobel filters (one for horizontal and one for vertical gradients), and the filter responses are thresholded to produce binary images. Thresholding serves two purposes: 1) I want to align only salient (high-response) regions, but the image contains many small responses that tend to dominate the SSD objective; thresholding concentrates the objective on the most salient edge features. 2) Binary images can be more efficient to work with (although I do not exploit the binary format here).
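A minimal NumPy sketch of the gradient features follows. The writeup does not say how the threshold is chosen, so the hypothetical parameter `frac` sets it as a fraction of the strongest Sobel response; the helper names (`filter3x3`, `gradient_features`, `feature_ssd`) are also illustrative, not taken from the original code.

```python
import numpy as np

# Sobel kernels: SOBEL_X responds to horizontal intensity changes, SOBEL_Y to vertical ones.
SOBEL_X = np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T

def filter3x3(img, kernel):
    """Correlate img with a 3x3 kernel, replicating edge pixels at the border."""
    h, w = img.shape
    padded = np.pad(img.astype(float), 1, mode="edge")
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def gradient_features(img, frac=0.5):
    """Binary maps of salient horizontal/vertical gradients.
    `frac` (hypothetical) sets the threshold as a fraction of the strongest response."""
    gx = np.abs(filter3x3(img, SOBEL_X))
    gy = np.abs(filter3x3(img, SOBEL_Y))
    thresh = frac * max(gx.max(), gy.max())
    return (gx > thresh).astype(float), (gy > thresh).astype(float)

def feature_ssd(feats_a, feats_b):
    """SSD between two channels' binary gradient maps (lower means better aligned)."""
    return sum(np.sum((fa - fb) ** 2) for fa, fb in zip(feats_a, feats_b))
```

On a vertical step edge only the two columns adjacent to the step survive thresholding, and shifting one image by a pixel makes the feature SSD nonzero.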

Similarity transform estimation

The alignment between two channels is estimated as a similarity transform (translation, rotation, and scaling) using a coarse-to-fine strategy: an alignment is first estimated at the coarsest level of an image pyramid, and that estimate initializes the alignment at the next (finer) level. The finest alignment is performed at 1/4 of the original resolution, not at full resolution. Aligning at full resolution is possible, but it takes roughly 15 minutes per large image instead of about 15 seconds, and I found the results to be qualitatively similar.
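The coarse-to-fine bookkeeping can be sketched as below. To keep the example short, the per-level estimator is swapped for a plain stand-in, an exhaustive integer-translation SSD search in a small window, rather than the iterative similarity fit; the pyramid uses 2x2 block averaging, which is an assumption. Unlike the writeup, this sketch refines all the way down to full resolution. The function names are hypothetical.

```python
import numpy as np

def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (drops an odd last row/column)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2])

def ssd_search(img, ref, init, radius):
    """Stand-in per-level estimator: try every integer shift within `radius` of `init`
    and keep the one whose rolled image has the lowest SSD against `ref`."""
    best, best_cost = init, np.inf
    for dy in range(init[0] - radius, init[0] + radius + 1):
        for dx in range(init[1] - radius, init[1] + radius + 1):
            cost = np.sum((np.roll(img, (dy, dx), axis=(0, 1)) - ref) ** 2)
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best

def coarse_to_fine(img, ref, levels=4, radius=2):
    """Return the (dy, dx) roll that aligns img to ref, estimated coarse to fine."""
    pyramid = [(img, ref)]
    for _ in range(levels - 1):
        pyramid.append((downsample(pyramid[-1][0]), downsample(pyramid[-1][1])))
    shift = (0, 0)
    for i, (im, rf) in enumerate(reversed(pyramid)):  # coarsest level first
        if i > 0:
            shift = (2 * shift[0], 2 * shift[1])  # pixel offsets double at the next finer level
        shift = ssd_search(im, rf, shift, radius)
    return shift
```

Each level only searches a 5x5 window, yet a 13-pixel offset is recovered because the coarse estimate is doubled and refined at every finer level; rotation and scale, if estimated, would carry over between levels unchanged.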

An iterative algorithm estimates the warp at each level of the pyramid. The SSD objective is linearized around the current warp estimate (i.e., a first-order Taylor expansion in the warp parameters). The parameter update is found by setting the gradient of the linearized objective to zero and solving the resulting linear system, which has a closed-form solution. Each iteration requires only the spatial image derivatives and the warp Jacobian at each pixel, both of which are cheap to compute. The algorithm is closely related to Lucas-Kanade optical flow estimation.
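The following is a minimal NumPy sketch of one pyramid level's Gauss-Newton loop. It is not the original implementation. It assumes bilinear interpolation, a warp centred on the image centre, and a linear-in-parameters similarity form q = [ty, tx, a, b] with a = s cos(theta) - 1 and b = s sin(theta), which is converted back to [y-translation, x-translation, rotation, scaling] at the end. The function names are illustrative.

```python
import numpy as np

def bilinear(img, ys, xs):
    """Sample img at float coordinates; also return a mask of in-bounds samples."""
    h, w = img.shape
    valid = (ys >= 0) & (ys <= h - 1) & (xs >= 0) & (xs <= w - 1)
    ys, xs = np.clip(ys, 0, h - 1), np.clip(xs, 0, w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    fy, fx = ys - y0, xs - x0
    top = img[y0, x0] * (1 - fx) + img[y0, x1] * fx
    bot = img[y1, x0] * (1 - fx) + img[y1, x1] * fx
    return top * (1 - fy) + bot * fy, valid

def align_similarity(img, tmpl, n_iters=100, tol=1e-6):
    """Gauss-Newton fit of q = [ty, tx, a, b] so that img(W(x; q)) ~ tmpl(x), where
    x' = cx + (1+a)(x-cx) - b(y-cy) + tx and y' = cy + b(x-cx) + (1+a)(y-cy) + ty.
    Returns [ty, tx, rotation (radians), scale]."""
    h, w = tmpl.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    cy, cx = (h - 1) / 2, (w - 1) / 2
    yc, xc = yy - cy, xx - cx                  # coordinates relative to the image centre
    gy, gx = np.gradient(img.astype(float))    # spatial derivatives of the moving image
    q = np.zeros(4)
    for _ in range(n_iters):
        ty, tx, a, b = q
        ys = cy + b * xc + (1 + a) * yc + ty
        xs = cx + (1 + a) * xc - b * yc + tx
        warped, valid = bilinear(img, ys, xs)
        Iy, _ = bilinear(gy, ys, xs)
        Ix, _ = bilinear(gx, ys, xs)
        # Image gradient times the warp Jacobian dW/dq, one column per parameter.
        sd = np.stack([Iy, Ix, Ix * xc + Iy * yc, Iy * xc - Ix * yc], axis=-1)[valid]
        err = (tmpl - warped)[valid]
        # Zero gradient of the linearized SSD gives 4x4 normal equations.
        dq = np.linalg.solve(sd.T @ sd, sd.T @ err)
        q += dq
        if np.linalg.norm(dq) < tol:
            break
    ty, tx, a, b = q
    return np.array([ty, tx, np.arctan2(b, 1 + a), np.hypot(1 + a, b)])
```

Keeping the warp linear in q makes the Jacobian constant in the parameters, so only the interpolated gradients change between iterations; centring the coordinates keeps rotation and scaling from being confused with translation.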

The fitted transforms show no significant rotation or scaling between image channels (largest rotation found: 0.1 degrees; largest scaling: a factor of 0.993). For this dataset, therefore, aligning with a similarity transform gives results close to those of a translation-only alignment.

Automatic cropping

First, the vertical and horizontal edge responses at each pixel are computed by convolving the image with Sobel filters. The vertical borders are located by finding the x-positions of highest horizontal edge response within 10% of the image width from the left and right edges. The horizontal borders are found in the same way along the y-axis.
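A minimal sketch of the border search is shown below. The writeup does not say how per-pixel responses are aggregated, so this assumes each column (or row) is scored by the sum of its absolute Sobel responses, and that the crop keeps only the pixels strictly inside the detected border lines. The names `sobel_abs`, `find_borders`, and `crop` are hypothetical.

```python
import numpy as np

def sobel_abs(img, axis):
    """|Sobel response|: axis=1 gives the horizontal gradient, axis=0 the vertical one."""
    p = np.pad(img.astype(float), 1, mode="edge")
    if axis == 1:
        d = p[:, 2:] - p[:, :-2]                 # [-1, 0, 1] difference along x
        s = d[:-2] + 2 * d[1:-1] + d[2:]         # [1, 2, 1] smoothing along y
    else:
        d = p[2:, :] - p[:-2, :]                 # [-1, 0, 1] difference along y
        s = d[:, :-2] + 2 * d[:, 1:-1] + d[:, 2:]  # [1, 2, 1] smoothing along x
    return np.abs(s)

def find_borders(img, margin=0.10):
    """Strongest edge row/column within `margin` of each side (assumed aggregation: sums)."""
    h, w = img.shape
    col_score = sobel_abs(img, axis=1).sum(axis=0)
    row_score = sobel_abs(img, axis=0).sum(axis=1)
    mw, mh = max(1, int(margin * w)), max(1, int(margin * h))
    left = int(np.argmax(col_score[:mw]))
    right = w - mw + int(np.argmax(col_score[w - mw:]))
    top = int(np.argmax(row_score[:mh]))
    bottom = h - mh + int(np.argmax(row_score[h - mh:]))
    return top, bottom, left, right

def crop(img, margin=0.10):
    """Keep only the pixels strictly inside the detected border lines."""
    top, bottom, left, right = find_borders(img, margin)
    return img[top + 1:bottom, left + 1:right]
```

Restricting the search to a 10% margin keeps strong edges in the scene content from being mistaken for borders.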

Before/after cropping

Contrast enhancement

The image is first convolved with a Gaussian kernel to blur it, and then with a Laplacian filter to accentuate high-frequency components. Finally, pixel values are linearly rescaled to expand the middle of the image's intensity range.
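A minimal NumPy sketch of this step follows. It assumes, since the writeup does not say, that the Laplacian-of-Gaussian response is subtracted from the original image (a standard sharpening form), and that the linear rescaling maps the 2nd to 98th intensity percentiles onto [0, 1]. The weights `sigma`, `alpha`, and the percentile bounds are hypothetical parameters.

```python
import numpy as np

def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian blur with edge-replicated borders."""
    r = max(1, int(3 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    h, w = img.shape
    p = np.pad(img.astype(float), r, mode="edge")
    rows = sum(k[i] * p[:, i:i + w] for i in range(2 * r + 1))   # blur along x
    return sum(k[i] * rows[i:i + h, :] for i in range(2 * r + 1))  # then along y

def laplacian(img):
    """4-neighbour Laplacian with edge-replicated borders."""
    p = np.pad(img, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * img

def sharpen(img, sigma=1.0, alpha=1.0):
    """Assumed sharpening: subtract the Laplacian of the blurred image."""
    return img - alpha * laplacian(gaussian_blur(img, sigma))

def enhance_contrast(img, sigma=1.0, alpha=1.0, lo_pct=2, hi_pct=98):
    """Sharpen, then linearly stretch the [lo_pct, hi_pct] percentile range to [0, 1]."""
    sharp = sharpen(img, sigma, alpha)
    lo, hi = np.percentile(sharp, [lo_pct, hi_pct])
    return np.clip((sharp - lo) / (hi - lo + 1e-12), 0.0, 1.0)
```

Blurring before the Laplacian keeps the second derivative from amplifying pixel noise; on a step edge, the sharpened result overshoots on the bright side, which is what makes edges look crisper.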

Before/after contrast enhancement

Note the sharpened vertical edges on the train and the accentuated thin structures of the chalice and the ornate book cover.

Automatic white-balance

Automatic white balance is performed as proposed in "Automatic White Balance for Digital Still Cameras". In short, the image is first converted to the YCrCb color space, and for each pixel the chroma magnitude sqrt(Cr^2 + Cb^2) is compared to a specified threshold. The RGB values of the pixels below this threshold (i.e., near-grey pixels) are averaged, giving a 3-vector of average R, G, and B values, which is then used to normalize each of the RGB channels.
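A minimal NumPy sketch of this procedure is given below. It assumes RGB values in [0, 1], zero-centred BT.601 chroma (Cb = 0.564(B - Y), Cr = 0.713(R - Y)), a hypothetical threshold of 0.05, and a normalization that scales each channel so the near-grey pixels average to a neutral grey; the paper's exact normalization may differ.

```python
import numpy as np

def rgb_to_cbcr(rgb):
    """Zero-centred BT.601 chroma channels for an (H, W, 3) image in [0, 1]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return 0.564 * (b - y), 0.713 * (r - y)

def white_balance(rgb, thresh=0.05):
    """Average the near-grey pixels and rescale channels so that average becomes neutral."""
    cb, cr = rgb_to_cbcr(rgb)
    near_grey = np.sqrt(cb ** 2 + cr ** 2) < thresh
    if not near_grey.any():
        return rgb.copy()                     # nothing to calibrate against
    avg = rgb[near_grey].mean(axis=0)         # 3-vector of mean R, G, B
    gains = avg.mean() / avg                  # assumed normalization: equalize channel means
    return np.clip(rgb * gains, 0.0, 1.0)
```

Saturated pixels have large chroma and are excluded, so a strongly colored object does not pull the estimate away from the true illuminant cast.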

Before/after white balancing

Note that the image before balancing is too yellow; after balancing, the yellow cast is reduced.

Result images