By the end of the lab, students should be able to:
Stencil code: HERE.
Sample RAW image: HERE.
JPEG images are the 'ready to view' processed outputs from a camera. In computational photography, it can be useful to work directly with the raw sensor data from a digital camera instead. This so-called RAW data must generally be processed before it can be displayed. In this lab, we will explore the nature of the raw sensor data and implement our own RAW image reader.
‘RAW’ is a class of computer files which typically contain an uncompressed image: both the sensor pixel values and a large amount of meta-information about the image generated by the camera (the EXIF data). RAW files themselves come in many proprietary file formats (Nikon’s .NEF, Canon’s .CR2, etc.), but there is a common open format, .DNG, which stands for Digital Negative. The latter name indicates how these files are supposed to be thought of by digital photographers: the master originals, repositories of all the captured information of the scene.
Raw data from an image sensor contains light intensity values captured from a scene, but these data are not intrinsically recognizable to the human eye. The data form a single-channel intensity image, possibly with a non-zero minimum value to represent ‘black’, with integer values that contain 10-14 bits of data (for typical digital cameras). There is no intrinsic ‘white’ value; instead, no value in the image will exceed some maximum which represents the saturation point of the physical camera sensor (e.g., a CMOS or CCD sensor).
Raw sensor data typically comes in the form of a Color Filter Array (CFA). This is an m-by-n array of pixels (where m and n are the dimensions of the sensor) where each pixel carries information about a single color channel: red, green, or blue. Since light falling on any given photosite (pixel) in the sensor is recorded as some number of electrons in a capacitor, it can only be saved as a scalar value; a single pixel cannot retain the 3-dimensional nature of observable light. CFAs offer a compromise in which information about each of the three color channels is captured at different locations by means of spectrum-selective filters placed above each pixel.
The most common CFA pattern is the Bayer array, shown in Figure 1. There are twice as many pixels that represent green light in a Bayer array image because the human eye is more sensitive to variation in shades of green, and green is more closely correlated with the perceived light intensity of a scene.
Debayering, also known as demosaicing, is the process to convert a CFA image (m-by-n) to a true RGB color digital image (m-by-n-by-3). While you may only truly know one color value at any pixel location, you can cleverly interpolate the other two color values from nearby neighbors where those colors are known.
To work with raw images in Python, we must first use other pieces of software to convert the camera-manufacturer-specific formats and obtain the image data found inside.
There is a fantastic piece of cross-platform, open-source software, written by Dave Coffin, called dcraw (pronounced dee-see-raw). This is the open-source solution for reading dozens of different types of RAW files and outputting an easily read PPM or TIFF file (which Python's Pillow package can directly read data from). Many open-source image editing suites incorporate this program as their own RAW-reading routine. It can read files from hundreds of camera models and perform many standard processing steps to take a RAW file and generate an attractive output.
To compile dcraw, download the source file 'dcraw.c' from Dave Coffin's webpage and run the following command at a terminal prompt:
gcc -o dcraw -O4 dcraw.c -lm -DNODEPS
Once it's successfully compiled, you should be able to run "dcraw" to see the manual for all the available flags.
Mac: You might have to remove the -O4 flag to compile.
Windows: Build dcraw using your favourite C compiler (MinGW directly, or Visual C---repo here for Visual Studio), or use the pre-built executables from here.
Linux: Should compile fine. Pre-built executables from here.
Alt dcraw.c download: HERE.
Sample RAW image: HERE.
Here is an overview of what you will explore to read and display a RAW image:

1. Preparation: run dcraw to record the black level, saturation value, and white-balance multipliers from the EXIF data.
2. Read the raw CFA sensor data into Python.
3. Linearize and normalize the pixel values.
4. White balance the image.
5. Debayer (demosaic) the CFA into a full RGB image.
6. Convert from the camera's color space to sRGB.
7. Apply brightness and gamma correction.
Through the stencil code, we will introduce how to implement each of these steps in Python.
Before we try to implement our own RAW reader, we need to extract some necessary information from the EXIF data using dcraw. We will call this a "reconnaissance run."
Run the following command (again, you can run "dcraw" to get an idea what the following flags represent):
dcraw -4 -d -v -T <raw_file_name>
You will see the following formatted output:
Scaling with darkness <black>, saturation <white>, and multipliers <r_scale> <g_scale> <b_scale> <g_scale>
where integer numbers fill in the fields above. Record each of them, as we will make use of them shortly.
You will also find a preliminary .tiff image in the directory, with no color and no interpolation. We will overwrite it shortly.
To get the raw sensor data into Python, we use dcraw with the following options to output a 16bpp TIFF file. This will also overwrite the previously produced preliminary image.
dcraw -4 -D -T <raw_file_name>
You can now read this file into Python using

from PIL import Image
import numpy as np

raw_data = Image.open('../sample/sample.tiff')
raw = np.array(raw_data).astype(np.double)

which will yield the raw CFA information of the camera.
The 2-D array we just generated in the TIFF may not be a linear image. It is possible that the camera applied a non-linear transformation to the sensor data for storage purposes. If so, the DNG metadata will contain a table for mapping the values of the CFA array to the full 10-14 bit values. Luckily, dcraw handles linearization for us since we used the '-4' option, so this step is not necessary!
However, even if there is no non-linear compression to invert, the raw image might still have an offset and arbitrary scaling. Therefore, we need to apply an affine transformation from range [black,white] to range [0,1], to normalize the pixel values of the image. Range [black,white] can be found from the values we obtained from the 'Preparation' step.
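As a sketch, this normalization might look like the following in NumPy. The black and white values here are placeholders; substitute the <black> and <white> numbers that dcraw reported for your file during the reconnaissance run.

```python
import numpy as np

# Placeholder values; use the <black> and <white> numbers
# from your own dcraw reconnaissance run.
black = 150
white = 15520

def normalize(raw, black, white):
    """Affinely map raw sensor values from [black, white] to [0, 1],
    clipping any stray values outside the range."""
    out = (raw - black) / (white - black)
    return np.clip(out, 0.0, 1.0)

raw = np.array([[150, 7835], [15520, 4000]], dtype=np.double)
lin = normalize(raw, black, white)  # lin[0, 0] == 0.0, lin[1, 0] == 1.0
```

Clipping guards against the occasional pixel below the black level or above saturation.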
Enter these numbers into the stencil code. If we display our image, it should look something like this:
Any object can look like any color, depending on the light illuminating it. To reveal the color that we would see as humans, what we need is a reference point, something we know should be a certain color (or more accurately, a certain chromaticity). Then, we can rescale the R, G, B values of the pixel until it is that color. As it is usually possible to identify objects that should be white, we will find a pixel we know should be white (or gray), which we know should have RGB values all equal, and then we find the scaling factors necessary to force each channel's value to be equal. As such, this rescaling process is called white balancing. Once we do this for a single pixel, we will assume that the same illuminant is lighting the entire scene, and use these scaling values for all pixels in the image.
Thus the problem reduces to simply finding two scalars which represent the relative scaling of two of the color channels to the third. It is typical that the green channel is assumed to be the channel to which the others are compared, i.e., the green channel scalar is usually 1, while the others are often > 1.
The scalars are recorded in the EXIF data. Remember the scale values we got earlier in the Preparation step? This is them.
Complete the wbscalematrix() function in the stencil code to generate the correct matrix.
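The stencil's wbscalematrix() has its own signature; as a sketch of the idea only, assuming an RGGB Bayer layout and illustrative multipliers (use the <r_scale> <g_scale> <b_scale> values dcraw reported for your file), the per-pixel scale matrix could be built like this:

```python
import numpy as np

def wb_scale_matrix(shape, r_scale, g_scale, b_scale):
    """Build an m-by-n array of per-pixel multipliers matching an
    RGGB Bayer layout: R at (0,0), G at (0,1) and (1,0), B at (1,1)."""
    m, n = shape
    scale = np.zeros((m, n))
    scale[0::2, 0::2] = r_scale  # red sites
    scale[0::2, 1::2] = g_scale  # green sites (red rows)
    scale[1::2, 0::2] = g_scale  # green sites (blue rows)
    scale[1::2, 1::2] = b_scale  # blue sites
    return scale

# Multiply the normalized CFA elementwise by the scale matrix.
cfa = np.ones((4, 4))  # stand-in for your normalized CFA
balanced = cfa * wb_scale_matrix(cfa.shape, 2.1, 1.0, 1.4)
```

Note that the green multiplier is 1 here, matching the convention that the other channels are scaled relative to green.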
There are various demosaic() functions to complete the debayering process. Instead of using them, in this lab we'd like you to write your own debayering algorithm.
The simplest method is nearest-neighbor interpolation, which simply copies an adjacent pixel of the same color channel. Another simple method is linear interpolation, called bilinear interpolation in 2D, whereby the red value of a non-red pixel is computed as the average of the two or four adjacent red pixels, and similarly for blue and green.
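A minimal bilinear debayering sketch, assuming an RGGB pattern, follows. This is not the stencil's exact interface, just the idea: mask out each channel, then average in the missing values with small convolution kernels.

```python
import numpy as np

def conv2_same(img, k):
    """Minimal 'same'-size 2-D filtering with a small symmetric kernel,
    using zero padding at the borders."""
    kh, kw = k.shape
    pad = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(img)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * pad[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def debayer_bilinear(cfa):
    """Bilinear demosaic of an RGGB CFA into an m-by-n-by-3 RGB image."""
    m, n = cfa.shape
    r_mask = np.zeros((m, n)); r_mask[0::2, 0::2] = 1.0
    b_mask = np.zeros((m, n)); b_mask[1::2, 1::2] = 1.0
    g_mask = 1.0 - r_mask - b_mask
    # Green: average of 4 axial neighbors; red/blue: average of 2 or 4.
    k_g = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 4.0
    k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 4.0
    rgb = np.zeros((m, n, 3))
    rgb[..., 0] = conv2_same(cfa * r_mask, k_rb)
    rgb[..., 1] = conv2_same(cfa * g_mask, k_g)
    rgb[..., 2] = conv2_same(cfa * b_mask, k_rb)
    return rgb

# Tiny synthetic RGGB mosaic: red sites 0.8, green 0.5, blue 0.2.
cfa = np.zeros((6, 6))
cfa[0::2, 0::2] = 0.8
cfa[0::2, 1::2] = 0.5
cfa[1::2, 0::2] = 0.5
cfa[1::2, 1::2] = 0.2
rgb = debayer_bilinear(cfa)  # interior pixels recover (0.8, 0.5, 0.2)
```

The zero padding darkens a one-pixel border; a production implementation would pad by reflection instead.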
Complete the debayering() function in the stencil.
Now when we display it, there's color!
The current RGB image is viewable with the standard matplotlib display functions. However, the color looks a bit strange, because its pixel values are not in the RGB color space that the operating system expects.
What we need to do is to transform from the camera’s color space to the sRGB color space by applying a 3x3 matrix transformation to each of the pixels.
This functionality is provided to you in the stencil code. However, if you wish to use your own RAW image from a different camera model, you will need to find the XYZ-to-camera matrix for that model. This matrix (times 10,000) can be found in the dcraw source code, in the adobe_coeff function.
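For the curious, the usual recipe (following Sumner's guide) composes the XYZ-to-camera matrix with the standard sRGB-to-XYZ matrix, normalizes the rows so that white is preserved, and then inverts. A sketch, where the xyz2cam values below are purely illustrative and not from any real camera:

```python
import numpy as np

# sRGB -> XYZ (D65) is a published standard matrix.
SRGB2XYZ = np.array([[0.4124564, 0.3575761, 0.1804375],
                     [0.2126729, 0.7151522, 0.0721750],
                     [0.0193339, 0.1191920, 0.9503041]])

def cam2srgb_matrix(xyz2cam):
    """Compose XYZ->camera with sRGB->XYZ, then row-normalize so that
    white (1, 1, 1) in camera space maps to white in sRGB, and invert."""
    srgb2cam = xyz2cam @ SRGB2XYZ
    srgb2cam /= srgb2cam.sum(axis=1, keepdims=True)
    return np.linalg.inv(srgb2cam)

def apply_matrix(rgb, M):
    """Apply a 3x3 color matrix to every pixel of an m-by-n-by-3 image."""
    return np.einsum('ij,mnj->mni', M, rgb)

# Illustrative xyz2cam values only -- look up your camera's real matrix
# (divided by 10,000) in dcraw's adobe_coeff function.
xyz2cam = np.array([[0.7, 0.2, 0.1],
                    [0.3, 0.9, -0.2],
                    [0.0, 0.1, 0.8]])
M = cam2srgb_matrix(xyz2cam)
white = apply_matrix(np.ones((2, 2, 3)), M)  # white stays white
```

The row normalization is what keeps a white-balanced white pixel white after the transform.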
We now have a 16-bit, RGB image that has been color corrected and exists in the right color space for display. However, it is still a linear image with values relating to what was sensed by the camera, which may not be in a range appropriate for being displayed (too dark). In other words, the image is already ‘correct’ in some sense, but not necessarily ‘pretty.’
How do we make it 'pretty', then? Brightness and gamma correction! Note that this last step is highly subjective.
A simple brightness correction method is to find the mean luminance of the image and then scale it so that the mean luminance is some more reasonable value. In the stencil codes, we (fairly arbitrarily) scale the image so that the mean luminance is 1/4 the maximum. For the photographically inclined, this is equivalent to scaling the image so that there are two stops of bright area detail.
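A sketch of this brightness scaling, using the per-channel mean as a simple luminance proxy (the stencil may use a proper grayscale conversion instead):

```python
import numpy as np

def brighten(rgb, target=0.25):
    """Scale the image so its mean luminance becomes `target` (here a
    quarter of the 1.0 maximum), then clip any blown-out highlights."""
    lum = rgb.mean(axis=2)          # crude luminance: per-pixel channel mean
    scaled = rgb * (target / lum.mean())
    return np.clip(scaled, 0.0, 1.0)

dark = np.full((4, 4, 3), 0.1)      # uniformly dark test image
bright = brighten(dark)             # mean luminance is now 0.25
```

Any reasonable luminance estimate works here; the point is only to pick the overall scale.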
The image is now brighter, but dark areas are still too dark. We will apply a non-linear function to the brightness values to make them more perceptually pleasing---we can do this through exponentiation. This is called gamma correction, which makes use of a power law to affect the brightnesses. The specific exponent chosen---the gamma value \(\gamma\)---is arbitrary; feel free to experiment.
$$O=\alpha I^\gamma$$ where \(I\) is the input image, \(O\) is the output image, and \(\alpha\) is typically 1.
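In NumPy the power law is a one-liner; \(\gamma = 1/2.2\) is a common display-gamma starting point, but the choice is yours:

```python
import numpy as np

def gamma_correct(img, gamma=1 / 2.2, alpha=1.0):
    """Power-law brightness adjustment: O = alpha * I ** gamma.
    gamma < 1 lifts dark regions toward the midtones."""
    return alpha * np.power(img, gamma)

# With gamma = 0.5, a midtone of 0.25 is lifted to 0.5,
# while 0 and 1 stay fixed.
corrected = gamma_correct(np.array([0.0, 0.25, 1.0]), gamma=0.5)
```

Note that the endpoints 0 and 1 are unchanged for any \(\gamma\) when \(\alpha = 1\); only the midtones move.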
Congratulations! You now have a functional RAW image reader!
Add your final code and final image to a PDF, and submit it to Gradescope! There's a template .tex document here; for the labs, there's no need to provide a detailed write-up.
Acknowledgement: Lab written by Zemiao Zhu, with conversion to Python by Meredith Young-Ng. This lab took heavy inspiration from Processing RAW Images in MATLAB, Rob Sumner, Department of Electrical Engineering, UC Santa Cruz, 2014.