Project 5: High Dynamic Range

Due Date: Friday, April 8th, 11:59pm

Brief

Background

Modern cameras are unable to capture the full dynamic range of commonly encountered real-world scenes. In such scenes, even the best possible photograph will be partially under- or over-exposed. Researchers and photographers commonly get around this limitation by combining information from multiple exposures of the same scene. There are few consumer-friendly HDR pipelines, though. Some cameras can be configured to exposure bracket a scene, but cameras aren't smart enough to automatically combine the resulting exposures. Luckily, you have a computational camera which can quickly and intelligently capture and combine multiple exposures. In this assignment, you will use FCam to take near-total control of the exposure settings.

Requirements

There are two pipelines for this assignment. Each group is required to implement both. You should read the paper that inspired each pipeline. Even though we discussed them in class, there are additional relevant details in the papers.

1) In the spirit of Debevec and Malik 1997, you are required to combine multiple exposures into a high dynamic range radiance map and then use a global tone-mapping operator to create a low dynamic range visualization of your HDR radiance map. Luckily, our Nokia N900 and FCam allow you to capture "raw" images in which pixel values are (nearly) linearly proportional to exposure. This makes it unnecessary to compute g, the inverse of the function mapping exposure to pixel value, and makes the computation of the HDR radiance map easier, though not entirely trivial. You will still want to consult equations 5 and 6 in the Debevec paper to know which pixels to trust in each exposure. Once you have the radiance at each pixel, you need to tone map it to an appropriate range for display. Consider using a global tone-mapping operator, such as Reinhard's, or implement a local one for extra credit.

2) In the spirit of Exposure Fusion (Mertens et al.), you will fuse multiple exposures into a single, detailed composite without ever explicitly computing an HDR radiance map. This approach doesn't care about the exposure times. It doesn't even care if the flash fired. It simply tries to composite the high-contrast, well-exposed pieces of the various exposures. The goal is to decide, using some simple heuristics, which pixels in each exposure are trustworthy, and to create a weighted composite of the exposures according to that trustworthiness. It's one of those things that seems to work poorly in theory, but well in practice. The method is, however, sensitive to the choice of input photos (whereas pipeline 1 is not, assuming every region appears well-exposed in at least one input).

Details

Additional Phone Setup

Before doing anything else you will need to add libraries to your phone. To do this, open your phone's xterm and run the following:

    sudo gainroot
    apt-get install libcv4 libcvaux4 libhighgui4

First Pipeline

[Figure: an HDR radiance map (left) and the tone-mapped result (right)]

We want to build an HDR radiance map from several LDR exposures. The observed pixel value Z_ij for pixel i and exposure j is a function of the unknown scene radiance and the known exposure duration: Z_ij = f(E_i Δt_j). Note that E_i is the scene radiance at pixel i, and scene radiance integrated over some time, E_i Δt_j, is the exposure at a given pixel. In general, f might be a somewhat complicated pixel response curve. Luckily, we can capture raw images and start by assuming that f is an identity function and leave it out.

Rearranging this equation and taking the natural log of each side, we get ln(E_i) = ln(Z_ij) − ln(Δt_j). This is a simplified version of Equation 5 in Debevec.

Each exposure only gives us trustworthy information about certain pixels (i.e., the well-exposed pixels in that image). For dark pixels the relative contribution of noise is high, and for bright pixels the sensor may have been saturated. To make our estimates of E_i more accurate, we need to weight the contribution of each pixel according to Equation 6 in Debevec. An example of a weighting function w is a triangle function that peaks at Z = 127.5 and is zero at Z = 0 and Z = 255.
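
Below is a minimal sketch of this weighted merge (illustrative Python/NumPy only; your actual implementation will run in C++ on the phone, and the names images, times, weight, and radiance_map are placeholders). It assumes the raw values have been scaled to [0, 255] so the triangle weight above applies directly.

    import numpy as np

    def weight(z):
        # Triangle weighting function: peaks at Z = 127.5, zero at 0 and 255
        return 127.5 - np.abs(z - 127.5)

    def radiance_map(images, times, eps=1e-6):
        # images: list of linear raw frames as float arrays scaled to [0, 255]
        # times:  matching exposure durations (Δt_j) in seconds
        num = np.zeros(images[0].shape)
        den = np.zeros(images[0].shape)
        for z, t in zip(images, times):
            w = weight(z)
            # Weighted average of ln(E_i) = ln(Z_ij) - ln(Δt_j), per Debevec Eq. 6
            num += w * (np.log(z + eps) - np.log(t))
            den += w
        return np.exp(num / np.maximum(den, eps))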

Getting the radiance map is only half the battle. You want to be able to show off your image clearly. There are a few global tone-mapping operators to play with, such as log(L), sqrt(L), and L / (1 + L). Regardless of which transform you use, you'll want to stretch the intensity values in the resulting image to fill the [0, 255] range for maximum contrast.
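
As a sketch, any of these operators plus the contrast stretch takes only a few lines (again illustrative Python; E is the hypothetical radiance map from the previous sketch, and any of the operators above can be swapped in for op):

    import numpy as np

    def tone_map(E, op=lambda L: L / (1.0 + L)):
        # Apply a global tone-mapping operator...
        L = op(E)
        # ...then stretch the result to fill [0, 255] for display
        L = (L - L.min()) / (L.max() - L.min())
        return (255.0 * L).astype(np.uint8)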

Second Pipeline

In the Exposure Fusion pipeline, there are two primary algorithmic challenges: 1) constructing weight maps which indicate the relative contribution of pixels in each exposure to the final composite and 2) blending the individual exposures in the gradient domain according to these weight maps.

The Exposure Fusion paper suggests building weight maps from three measures of pixel quality in each input image; each measure can be thought of as producing its own weight map.

The first property, contrast, is measured by filtering the intensities of a given image with the Laplacian filter and taking the absolute value of the response (where will this assign the highest weights?). The next pixel quality measure captures the saturation of pixels in an image by computing the standard deviation of the color channels at each pixel. The final measure tries to capture the degree to which an input image is well exposed. This measure can be implemented by taking, at each pixel, the product over color channels of a Gaussian falloff from 0.5 applied to the channel intensity, i.e. exp(-(p_i - 0.5)^2 / (2σ^2)), where σ = 0.2.

The final weights are constructed by taking the product of the weight maps specified by the quality measures. Within the product, the weights from a given quality measure are raised to the power of a constant to adjust their relative importance. In the example above, all three constants were one. The combined weight maps for all exposures should be normalized such that they sum to one at every pixel location.
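
The sketch below computes the three measures and their normalized product (illustrative Python only; imgs, the exponent names, and fusion_weights are placeholders, and SciPy's laplace stands in for whatever Laplacian filter your C++ code uses):

    import numpy as np
    from scipy.ndimage import laplace

    def fusion_weights(imgs, wc=1.0, ws=1.0, we=1.0, sigma=0.2, eps=1e-12):
        # imgs: (N, H, W, 3) float array of N exposures, values in [0, 1]
        gray = imgs.mean(axis=-1)
        # Contrast: absolute value of the Laplacian response on grayscale
        contrast = np.abs(np.stack([laplace(g) for g in gray]))
        # Saturation: standard deviation across the color channels
        saturation = imgs.std(axis=-1)
        # Well-exposedness: per-channel Gaussian falloff from 0.5, multiplied
        exposedness = np.exp(-(imgs - 0.5) ** 2 / (2 * sigma ** 2)).prod(axis=-1)
        # Exponents adjust each measure's importance; then normalize per pixel
        W = contrast ** wc * saturation ** ws * exposedness ** we + eps
        return W / W.sum(axis=0)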

At this point, one could produce a final composite by summing the input exposures multiplied by their weight maps. However, this approach works poorly (see below). What kinds of problems are we seeing, and why? We can create a better composite by fusing the exposures in the gradient domain, as in project 2, except that the suggested machinery here is Laplacian pyramid blending rather than Poisson blending, because we have real-valued weights instead of binary weights. See section 3.1 in the Exposure Fusion paper for more details.
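
The libraries you installed above are OpenCV, whose pyrDown/pyrUp handle the pyramid resampling. A minimal sketch of the multiresolution blend follows, in Python for clarity (imgs and W are the hypothetical arrays from the previous sketch; gauss_pyr, lap_pyr, and fuse are placeholder names): each exposure's Laplacian pyramid is weighted by the Gaussian pyramid of its weight map, the weighted pyramids are summed, and the result is collapsed.

    import cv2
    import numpy as np

    def gauss_pyr(img, levels):
        pyr = [img]
        for _ in range(levels - 1):
            pyr.append(cv2.pyrDown(pyr[-1]))
        return pyr

    def lap_pyr(img, levels):
        gp = gauss_pyr(img, levels)
        # Each level is a Gaussian level minus the upsampled next-coarser
        # level; the coarsest Gaussian level is kept as-is
        lp = [gp[i] - cv2.pyrUp(gp[i + 1], dstsize=gp[i].shape[1::-1])
              for i in range(levels - 1)]
        lp.append(gp[-1])
        return lp

    def fuse(imgs, W, levels=6):
        blend = None
        for img, w in zip(imgs, W):
            lp = lap_pyr(np.float32(img), levels)
            gp = gauss_pyr(np.float32(w), levels)
            # Weight each Laplacian level by the matching weight-map level
            terms = [l * g[..., None] for l, g in zip(lp, gp)]
            blend = terms if blend is None else [b + t for b, t in zip(blend, terms)]
        # Collapse the fused pyramid from coarse to fine
        out = blend[-1]
        for lev in blend[-2::-1]:
            out = cv2.pyrUp(out, dstsize=lev.shape[1::-1]) + lev
        return np.clip(out, 0.0, 1.0)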

Implementation Tips

Extra Credit

For all extra credit, be sure to demonstrate on your web page cases where your extra credit has improved image quality.

Graduate Credit

Groups contain mixtures of graduate and undergraduate students, so there are no distinct graduate requirements.

Write up

Each group will produce a single handin. Make sure you say who is in your group. Feel free to name your group as well. Only one group member needs to run the handin script.

Describe your algorithm and any decisions you made to write your algorithm a particular way. Show and compare the results of your two algorithms. Also discuss any extra credit you did. Feel free to add any other information you feel is relevant.

Because you are building a (hopefully) interactive system, it would be compelling to show videos of actual usage in different scenes.

Handing in

This is very important: you will lose points if you do not follow instructions. Every time after the first that you do not follow instructions, you will lose 5 points. The folder you hand in must contain the following:

Then run: cs129_handin proj5
If it is not in your path, you can run it directly: /course/cs129/bin/cs129_handin proj5

Rubric

Credits

Project partially based on Noah Snavely's Computer Vision course at Cornell University. Handout written by David Dufresne, Travis Webb, and James Hays.