May 17, 2010

Modern cameras have a low dynamic range; that is, they do a poor job representing a wide range of lighting conditions in a scene. In a scene with bright and dark areas, most cameras will take photographs that are either overexposed in the bright regions, or underexposed in the dark ones.

For my final project, I implemented the algorithm outlined in Debevec and Malik '97, which turns pictures of the same scene taken with different exposures in to a single image which better captures the true dynamic range of the scene.

The first step in the algorithm is to recover the response function, *g*, for each color channel in the image. Every pixel value in the image, Z_{ij}, can be though of as the output of some function, f, which takes as input the amount of light hitting that point. Since the amount of light that hits a point on the sensor is the product of the radiance of that point, E_{i} and the exposure time, Δt_{j}, we get the formula Z_{ij} = f(E_{i}Δt_{j}). This means that f^{-1}(Z_{ij}) = E_{i}Δt_{j}, and thus ln(f^{-1}(Z_{ij})) = ln(E_{i}) + ln(Δt_{j}). For simplicity, let g = ln f^{-1}. This function, g, is called the response curve.

To solve for g, we want to find a function defined for all integers between 0 and 255 (the possible values for a pixel), which minimizes the following function: Σ^{i=1...N}Σ^{j=1...P}[g(Z_{ij}) - ln E_{i} - lnΔt_{j}]^{2}. That is, it minimizes the difference between g(Z_{ij})and the actual response value of the pixel in each photograph.

To further improve the results, we can add two things to our equation: a weighting function and a smoothing term. Since we want more pixels of moderate intensity, we can define a weighting function w(z) = {z for x < 128, and 255-z for z >= 128}. This will have the effect of favoring pixel values not at either extreme. We can also add a smoothing term λΣ^{z=0...255} [w(z)g^{''}(z)]^{2}, which will try to minimize the second derivative of g, to make it smoother.
This gives us the new cost function:

Σ^{i=1...N}Σ^{j=1...P}{w(Z_{ij})[g(Z_{ij}) - ln E_{i} - lnΔt_{j}]}^{2} + λΣ^{z=0...255} [w(z)g^{''}(z)]^{2}

This equation can be minimized using SVD.

Once *g* has been computed, it can be used to calculate the radiance map for the image. Since g(Z_{ij}) = ln E_{i} + ln Δt_{j}, we can compute the radiance of pixel *i* as ln E_{i} = g(Z_{ij}) - ln Δt_{j}.

To get a more robust value for E_{i}, we sum g(Z_{ij}) - ln Δt_{j} for each image, and weight each term using the weighting function *w* we used for computing *g*. This yields the following formula for the radiance:

ln *E _{i}* = (Σ

Evaluating this formula at every pixel for each color channel gives us a radiance map for the whole image.

Once the radiance map has been recovered, it needs to be converted to a form that can be properly displayed on a device with lower dynamic range (like a monitor). Since the radiance values are unbounded (they can be any number >= 0), they need to be compressed to be between 0 and 1. A naive way to do this would be simply to divide by the largest radiance value in the image. The problem with this approach is that there is likely to be one area of the scene which is much brighter than the rest, and dividing by this radiance value will cause most of the scene to be very dark. Instead, we can divide the radiance at each pixel, L, by (1+L). This scales each radiance value to a number between 0 and 1 without over/under exposing a large portion of the scene.

Although global tone mapping is simple and works fairly well, it does have some shortcomings. Since it doesn't take any local information into account, many areas of the scene end up darker than if they had been manually adjusted.

I implemented local tone mapping using the bilateral filtering method outlined in Durand and Dorsey '02. This method works by compressing the range of the intensity of the image. This algorithm computes the log of the intensity of the image, and then scales it to a fixed size. This causes blurring in the image, which is undesirable. To overcome this, the paper recommends using bilateral filtering.

A bilateral filter is an edge preserving filter. It preforms a Gaussian blur, but only on areas which are sufficiently similar, so the borders between regions are preserved. Instead of blurring the whole intensity, the Durand and Dorsey algorithm uses the bilateral filter to compute the "base" image. To get the high frequency information, or "detail", it divides the original intensity by the base. The base is then compressed and recombined with the detail to get the compressed intensity. Finally, the intensity is recombined with the color to form the result image.

The image below shows the difference that only compressing the range of the base image makes, compared to compressing the range of all of the intensity:

I tried my algorithms on both the Memorial Church scene used by Debevec and five scenes that I shot myself. I was pleased with how the results turned out for the Memorial Church. My radiance map and global tone mapping looked very similar to the images in the paper, and the image produced by the local tone mapping showed a definite improvement in the darker areas of the scene (especially the top right of the image).

My scenes did not work quite as well. I can think of two main reasons why this may be the case. First, I only took 3 pictures per scene, while the Memorial Church scene has 16. Since there are fewer pictures, it is harder to capture the entire dynamic range of the scene.

Secondly, the exposure times that the camera claimed it had used didn't seem to be accurate. Using the exposure values stored in the metadata of the pictures caused the scenes to be vastly overexposed, so I had to manually estimate the exposure times for each pictures. The errors that my estimations introduced are most likely to blame for the poor results.

Memorial Church | |
---|---|

Input Images | Radiance Map |

Global Tone Mapping | Local Tone Mapping |

My Room | |
---|---|

Input Images | Radiance Map |

Global Tone Mapping | Local Tone Mapping |

Fishbowl | |
---|---|

Input Images | Radiance Map |

Global Tone Mapping | Local Tone Mapping |

3rd Floor Atrium | |
---|---|

Input Images | Radiance Map |

Global Tone Mapping | Local Tone Mapping |

CIT Lobby | |
---|---|

Input Images | Radiance Map |

Global Tone Mapping | Local Tone Mapping |

Geochem Building | |
---|---|

Input Images | Radiance Map |

Global Tone Mapping | Local Tone Mapping |