CS 129 Course Project: White balancing with flash/no-flash image pairs

Jeroen Chua (jchua)
December 21 2012

Outline

I describe a simple method for automatically white-balancing a base image using a second image of the same scene taken with flash. In many cases, the difference between the flash image and the no-flash image contains enough information to directly estimate the spectrum of a single light source. The method can be adapted to handle multiple light sources, but here we study the single-light-source case. Access to the RAW image, or any other pixel measurement that is directly proportional to radiance, is required for this algorithm.

The algorithm uses the additive property of light to first estimate the surface reflectance at each pixel, and then uses this information to estimate the scene illuminant. In particular, the difference between the flash and no-flash images should roughly show what the scene would look like under the flash alone (i.e., without the scene illuminant), and if the spectrum of the flash is known, the surface reflectance at each pixel can be estimated. The scene illuminant can then be estimated by comparing the estimated surface reflectances at each pixel to the observed pixel values in the base image. The algorithm is as follows, with a short explanation of each step given in italics:
  1. Define the base image (no flash), B, and the flash image, F, each comprising C colour channels (e.g., for RGB images, C=3).
  2. Take the difference image, D = F-B. This step estimates what the scene would look like under the flash alone.
  3. Create the difference image mask, M = (D > threshold). This step estimates where in the scene the camera flash "hit".
  4. For colour channels c={2...C}, compute the colour channel ratios Rc = D1/Dc. This step estimates the surface reflectance ratios at each pixel.
  5. For each colour channel, compute Sc = B1/(Bc·Rc). This step produces a per-pixel estimate of the scene illuminant, as the ratio of the illuminant in channel 1 to that in channel c.
  6. For each colour channel, build a histogram of the values of Sc, using only the pixel locations for which the mask M is True, and find the location of the maximum, Ec. This step combines the per-pixel estimates of the scene illuminant into a single estimate.
  7. For colour channels c={2...C}, set Nc = Bc·Ec, and set N1 = B1. This step balances out the scene illuminant.
  8. Normalize N so that it has the same overall intensity as the base image, B. N is the white-balanced image. This step ensures the resulting image has the same brightness as the original image.

Note that the scene illuminant in channel c is estimated as 1/Ec, and E1 is defined to be 1. The steps above assume the flash is pure white, and that both the flash and no-flash images were taken using the same exposure/aperture settings. It is possible to modify the algorithm to handle a flash that is not pure white, and flash/no-flash pairs whose exposure/aperture/ISO settings differ. These modifications were omitted to simplify the explanation of the method.
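The steps above can be sketched in NumPy as follows. This is a minimal illustration, not the exact implementation used here: it assumes linear (RAW-like) floating-point images with matched exposure, a pure-white flash, and channel 0 playing the role of channel 1 in the description; the function name, threshold value, and bin count are illustrative choices.

```python
import numpy as np

def white_balance_flash_pair(B, F, threshold=0.02, nbins=256):
    """Estimate and remove the scene illuminant from a no-flash image B
    using a flash image F. Both are H x W x C float arrays, linear in
    radiance, taken with identical exposure settings. Channel 0 is the
    reference channel (channel 1 in the write-up). The threshold and
    bin count are illustrative, not values from the write-up."""
    eps = 1e-8
    D = F - B                          # step 2: flash-only appearance
    M = np.all(D > threshold, axis=2)  # step 3: pixels the flash hit
    N = B.copy()
    for c in range(1, B.shape[2]):
        # step 4: per-pixel reflectance ratio, refl_0 / refl_c
        R = D[..., 0] / (D[..., c] + eps)
        # step 5: per-pixel illuminant ratio, L_0 / L_c
        S = B[..., 0] / (B[..., c] * R + eps)
        # step 6: mode of the histogram over flash-hit pixels
        hist, edges = np.histogram(S[M], bins=nbins)
        k = np.argmax(hist)
        E = 0.5 * (edges[k] + edges[k + 1])
        # step 7: balance channel c
        N[..., c] = B[..., c] * E
    # step 8: match the overall intensity of the base image
    N *= B.sum() / (N.sum() + eps)
    return N
```

On synthetic data with a known illuminant and a pure-white flash, the channel ratios of the output match the reflectance ratios, which is the intended behaviour of the balancing step.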

Results

Below are examples of the operation of the above algorithm. Note that some of the flash/no-flash pairs were taken using different exposure times/ISO settings, and that these were adjusted for using a slight modification of the algorithm mentioned above. The images are shown in RGB space, and the algorithm was run with channel 1 taken to be the red channel. The images, from left to right, are:
  1. Base image, B
  2. White-balanced image, N
  3. Flash image, F
  4. Histogram of red-to-green ratios, S2
  5. Histogram of red-to-blue ratios, S3
The results are discussed after the displayed images.

[Image grid omitted: each row shows, left to right, the base image B, the white-balanced image N, the flash image F, and the histograms of S2 and S3. A final group of rows shows especially bad results.]
First, note that the algorithm generally produces balanced-looking images, although some artifacts are visible, the most prevalent being artifacts due to specular reflections. For example, in the violin example (row 4), the white-balanced image has a specular blue spot on the chin rest that should not be there. Another example is the girl in the white jacket (row 7); here the brick wall should be totally white, but it retains tinges of red from the unbalanced image. The algorithm also has trouble with multiple light sources. For example, in the picture of the escalator (third-to-last row), the red-to-green histogram (column 4) is clearly bimodal, which suggests that multiple light sources illuminate the scene. The book picture (last row) also demonstrates the problem of multiple light sources, and again, the histograms are not strongly unimodal. Although the algorithm could be adapted to handle such cases by clustering or mixture-estimation methods, this was not explored here. Lastly, if the camera flash does not hit a significant portion of the image (last row), the white balancing cannot be done reliably.

Note that the algorithm is able to balance images even when no pixels correspond to pure white (row 1), and when the gray-world assumption is grossly violated (row 1). Also note that image alignment, though it would be helpful, was not necessary here: some of the images were taken free-hand with no tripod, and no attempt was made to align any of the images. The algorithm is also tolerant to some amount of sensor noise (rows 10 and 11, the pictures of stairs and a hallway).

The channel histograms (columns 4 and 5) are sharply unimodal with heavy tails, which I argue makes picking the mode of the histogram as the illuminant estimate reasonable. If there were multiple light sources, the histograms would not be sharply unimodal; if the light sources are sufficiently distinct and spatially separated, one may see multiple modes in the histograms. These modes could be used to estimate the spectra of all the light sources, which in turn could be used to white-balance the image.
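As a rough illustration of how multiple modes might be detected, the following sketch finds local maxima in a smoothed histogram of the masked per-pixel estimates Sc. The function name, smoothing kernel, and thresholds are all illustrative assumptions, not part of the method above.

```python
import numpy as np

def illuminant_modes(S_vals, nbins=128, min_fraction=0.1):
    """Find candidate illuminant-ratio modes in S_vals, a 1-D array of
    masked per-pixel estimates Sc. A bin counts as a mode if it is a
    local maximum of the smoothed histogram and is at least min_fraction
    as tall as the tallest peak. Returns bin-centre values, tallest first."""
    hist, edges = np.histogram(S_vals, bins=nbins)
    # smooth lightly so bin-level noise does not create spurious peaks
    kernel = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
    smooth = np.convolve(hist, kernel / kernel.sum(), mode="same")
    centres = 0.5 * (edges[:-1] + edges[1:])
    peaks = []
    for i in range(1, nbins - 1):
        if smooth[i] > smooth[i - 1] and smooth[i] >= smooth[i + 1]:
            peaks.append((smooth[i], centres[i]))
    if not peaks:
        return np.array([centres[np.argmax(smooth)]])
    tallest = max(h for h, _ in peaks)
    return np.array([c for h, c in sorted(peaks, reverse=True)
                     if h >= min_fraction * tallest])
```

For a single-illuminant scene this returns one dominant mode, which reduces to the mode-picking of step 6; for a bimodal histogram like the escalator example, it would return one candidate per light source.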

Future Work/Model Weaknesses

Although the results above show that this method works well in some cases, its uses are limited. For instance, the method only works on indoor scenes: it requires that the camera flash hit a sufficient amount of the scene, since the more pixels hit by the flash, the better the estimate of the scene illuminant. In outdoor settings, the objects of interest are typically far away, so there is little hope of a portable camera flash illuminating much of the scene.

It may be possible to detect multiple light sources with a flash/no-flash image pair and balance accordingly. In particular, given the Sc's, which give per-pixel estimates of the scene illuminant, one might instead estimate a mixture model in which each component represents a light source. In general, the total illuminant falling on a pixel is a mixture of the scene illuminants, so one may imagine an algorithm that iterates between solving for the mixing proportions of the scene illuminants at each pixel and estimating the scene illuminants themselves. It would also be possible to incorporate smoothness constraints, encouraging nearby pixels to share the same mixing proportions for the light sources.
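As one possible starting point for the mixture idea, a two-component one-dimensional Gaussian mixture could be fit to the masked Sc values by EM, with each recovered mean a candidate illuminant ratio. This sketch only clusters the per-pixel estimates; it does not solve for per-pixel mixing proportions or include the spatial smoothness mentioned above, and the function name, initialisation, and iteration count are illustrative assumptions.

```python
import numpy as np

def fit_two_illuminants(S_vals, iters=50):
    """Fit a two-component 1-D Gaussian mixture to the per-pixel
    illuminant estimates S_vals using EM. Returns (means, stds, weights);
    each mean is a candidate illuminant ratio. Illustrative sketch only."""
    x = np.asarray(S_vals, dtype=float)
    mu = np.percentile(x, [25, 75])          # spread-out initialisation
    sigma = np.full(2, x.std() + 1e-6)
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibility of each component for each sample
        d = (x[:, None] - mu) / sigma
        logp = -0.5 * d**2 - np.log(sigma) + np.log(pi)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update parameters from the soft assignments
        nk = r.sum(axis=0) + 1e-12
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6
        pi = nk / nk.sum()
    return mu, sigma, pi
```

The recovered weights could then seed the per-pixel mixing proportions in the iterative scheme described above.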