- Consider spatial and angular sampling in light fields and its connection to blur in images.
- Practically refocus a light field using angle integration and shear transforms.
- Understand how depth relates to EPI structure, and estimate depth from light fields by optimizing variance over shear transforms.

A (discrete) light field is a sampling of the plenoptic function, as we discussed in class. We can capture a light field with a camera array, a lens array, or for static scenes we could move a camera on a gantry or a *linear stage*.

Today we're going to see how to use images from a 3D ray sampling of a light field captured by a gantry to provide control over the focal plane and focus blur. For this, we will consider slices of the light field called epipolar images (EPIs). We can create an EPI by slicing the images in our linear-stage-captured light field along a scanline. This operation is like stacking all of our 2D images into a rectangular prism and slicing it horizontally:

Relate the EPI image to the horizontal position in the camera array (Figure 4):

- What does a scanline represent in an EPI?
- What are the axis labels? Remember from class we used the two-plane parameterization with \(L(u,v,s,t)\). What intuitive name would you give each one?
- What is the relationship between scene depth and the structured pattern of angled edges in an EPI?

- Load an image set from below into a multidimensional array. For our 3D light field, this will be a 4D array (the fourth dimension holds the color channels). From the data below, start with 'Train' with 100 images; if your computer doesn't have much memory, reduce the number of images that you load.
- Remember to sort your list of filenames so that they remain in order when loaded.
- Write a function which creates an EPI image. Remember: it is a slice of the data.
- Visualize your EPI to confirm that it looks correct.
- Create an animation that sweeps through each EPI in the light field (from top to bottom). *Hint*: in Python OpenCV, repeatedly calling `cv2.imshow("window", data)` will work fine; just remember to call `cv2.waitKey(30)` after each frame.
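The loading and slicing steps above can be sketched as follows. The directory name `train/`, the file pattern, and the helper name `epi` are our own assumptions; any image reader (e.g. `cv2.imread`) will do:

```python
import glob
import numpy as np

# Loading (assumed directory layout; remember to sort so views stay in order):
#   paths = sorted(glob.glob("train/*.png"))
#   lf = np.stack([cv2.imread(p) for p in paths], axis=0)  # (views, H, W, 3)

def epi(lf, row):
    """Horizontal slice of the light field: every view's pixels at one scanline.

    lf  : array of shape (views, height, width, channels)
    row : scanline index
    Returns an EPI of shape (views, width, channels).
    """
    return lf[:, row]

# Tiny synthetic light field just to check shapes: 10 views of a 20x30 RGB image.
demo = np.zeros((10, 20, 30, 3), dtype=np.uint8)
print(epi(demo, 5).shape)  # (10, 30, 3)
```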

**Train**: 2D images x 1D linear stage with 500 views. ZIP with 100 images [start with this]. ZIP with all images [be careful; ~400MB; you might not be able to fit them all in memory]

**Chess**: 2D images x 2D linear stage with 289 views. ZIP with 1D - just 17 images [start with this]. ZIP with all images [be careful; ~400MB; you might not be able to fit them all in memory]

**Lego**: 2D images x 2D linear stage with 289 views. ZIP with 1D - just 17 images [start with this]. ZIP with all images [be careful; ~400MB; you might not be able to fit them all in memory]

From our study of projection, we know that perfect pinhole cameras do not blur the image of the world—everything is in focus. However, as we increase the aperture, objects start to blur. We can introduce a lens to focus some of these rays, but rays from objects that are not at the focal plane converge to a depth-dependent point spread function on the sensor.

Each of the light field datasets above is (almost) entirely in focus: the aperture is small and the focal plane is set to infinity. Let's assume the cameras are pinholes. Now, by shifting the position of the camera, we see rays that come from directions that were not visible with just a single view ('parallax'). This is very similar to what happens when we open the aperture, except that now these rays are not averaged together—we have sampled them individually (or, at least, more individually!).

This individual ray sampling is powerful. Remember how we integrated rays *over time* with multiple short exposures to simulate a long exposure without noise in the night lab? Now we're going to integrate rays *over space* to simulate a larger aperture than we ever had before. Let's average across the vertical dimension of our EPI image—what will we see?

- Average your EPI image along its columns—what happens to the EPI image?
- Reconstruct one full image of the scene by averaging every EPI scanline—which part of the image is in focus?
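A minimal sketch of this synthetic-aperture averaging, assuming the 4D `(view, row, column, channel)` layout from the loading step (the function name `refocus_mean` is ours):

```python
import numpy as np

def refocus_mean(lf):
    """Average across the view axis: every EPI column collapses to one pixel,
    simulating a large synthetic aperture. Points whose EPI lines are vertical
    (here, points at infinity) stay sharp; everything else blurs."""
    return lf.mean(axis=0)

demo = np.random.rand(8, 4, 6, 3)   # 8 views of a 4x6 RGB image
print(refocus_mean(demo).shape)     # (4, 6, 3)
```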

Next, we're going to virtually change the focal plane. We can see that the slope of a line in the EPI relates to how far away the scene point is from the focal plane. In our case, the scenes are focused at infinity, so the slope is directly proportional to the *disparity* (and thus inversely proportional to the depth): a larger slope means a point closer to the camera, because we see a greater disparity from camera view to camera view.
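As a one-line sketch of the relationship between slope and depth (the symbols here are our own: camera translation \(\Delta x\) along the stage, focal length \(f\) in pixels, scene depth \(Z\), image column \(u\)): a point at depth \(Z\) shifts between neighboring views by the disparity

\[
\Delta u = \frac{f \, \Delta x}{Z},
\]

so in the EPI (camera position on one axis, image column on the other) the point traces a line whose slope \(\mathrm{d}u/\mathrm{d}x = f/Z\) grows as the point gets closer to the camera.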

So, what happens if we *change the slope of the line* and then integrate? We know how to do this—skew matrices!

- Learn how to skew an image using `skimage.transform.warp` and `skimage.transform.AffineTransform`. Use any test image to begin, and make sure you can skew correctly.
- Next, skew (or *shear*) each EPI by some amount.
- Your shear matrix should look something like \([1, s, 0; 0, 1, 0; 0, 0, 1]\), where \(s\) is the shear amount.
- Shearing between \([0, -2]\) is a good range. *If you loaded your images in descending order, flip the sign on the shear amount.*
- You can also use the `skimage.transform.AffineTransform(shear=s)` form.

- Then average each sheared EPI in the vertical dimension and reconstruct your image. What happened?
- Test different skew values.
- Make an animation as you cycle through different skew values. What do we see?
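The shear-then-average steps above might look like the following sketch, assuming a grayscale `(view, row, column)` array; the function name and the choice to warp each EPI with an explicit shear matrix are our own:

```python
import numpy as np
from skimage.transform import warp, AffineTransform

def shear_refocus(lf, s):
    """Shear each EPI, then average across views to move the focal plane.

    lf : (views, height, width) grayscale light field (loop over channels
         for color). Shearing resamples view v's scanline shifted by s*v
         pixels, so points on the new virtual focal plane line up vertically
         before we integrate.
    """
    V, H, W = lf.shape
    tform = AffineTransform(matrix=np.array([[1.0, s, 0.0],
                                             [0.0, 1.0, 0.0],
                                             [0.0, 0.0, 1.0]]))
    out = np.zeros((H, W))
    for row in range(H):
        epi = lf[:, row, :]                            # (views, columns)
        sheared = warp(epi, tform, preserve_range=True)
        out[row] = sheared.mean(axis=0)                # integrate over views
    return out

demo = np.random.rand(5, 4, 8)          # 5 views, 4x8 grayscale
print(shear_refocus(demo, 0.0).shape)   # (4, 8)
```

With `s = 0` this reduces to the plain average from before; sweeping `s` through the suggested range moves the in-focus plane through the scene.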

To estimate depth, we can use the knowledge we just gained: a point in focus lies at the focal plane, and the skew amount determines the slope of the line (and hence the depth) of that focal plane. Let's look at how the image content varies across skew values, and *optimize* for a skew value per pixel.

- Find the best skew value per pixel by testing a range of skews and considering the 'cost' at each pixel once we integrate across the angular (view) dimension.
- Estimate the 'defocus' cost from Figure 9 (purple), which is the gradient of the integrated image. Think about why this works.
- Estimate the 'correspondence' cost from Figure 9 (cyan), which is the variance across the EPI. Think about why this works.
- Visualize the two outputs—where does each succeed?
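A sketch of the 'correspondence' search, assuming a grayscale `(view, row, column)` array. The function name and the brute-force loop over candidate shears are our own choices, and the defocus cue is omitted here:

```python
import numpy as np
from skimage.transform import warp, AffineTransform

def depth_from_shear(lf, shears):
    """Per-pixel shear estimate via the 'correspondence' cost.

    lf : (views, height, width) grayscale light field. For each candidate
    shear, shear every EPI, take the variance across the view axis at each
    pixel (views agree -> low variance -> in focus at that shear), and keep
    the shear with the lowest cost. The winning shear encodes depth.
    """
    V, H, W = lf.shape
    best_cost = np.full((H, W), np.inf)
    best_shear = np.zeros((H, W))
    for s in shears:
        tform = AffineTransform(matrix=np.array([[1.0, s, 0.0],
                                                 [0.0, 1.0, 0.0],
                                                 [0.0, 0.0, 1.0]]))
        cost = np.empty((H, W))
        for row in range(H):
            sheared = warp(lf[:, row, :], tform, preserve_range=True)
            cost[row] = sheared.var(axis=0)    # variance across views
        better = cost < best_cost
        best_cost[better] = cost[better]
        best_shear[better] = s
    return best_shear

demo = np.ones((4, 5, 6))                          # constant scene
print(depth_from_shear(demo, [0.0, -1.0]).shape)   # (5, 6)
```

The defocus cost would be computed analogously, but from the gradient magnitude of the integrated (averaged) image at each shear rather than from the variance.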

Light field cameras and camera arrays are powerful computational tools because they let us sample and integrate rays in new ways to simulate optical and lens effects in a principled way. Lytro is the most famous light field camera company (now defunct; bought by Google). Why do you think they went under? Imagine I had to build a light field camera. What is a practical way to build a portable camera? I only have access to the same image sensor technologies as every other camera manufacturer. What do I gain and what do I lose in *which* rays I sample, and how does this affect the final image quality? As a photographer, or as a consumer, what is most useful to me?

Talk to James to know more : )

- Levoy and Hanrahan, Light Field Rendering. SIGGRAPH 1996.
- Isaksen, McMillan, and Gortler. Dynamically reparameterized light fields. SIGGRAPH 2000.
- Tao, Hadap, Malik, and Ramamoorthi. Depth from combining defocus and correspondence using light-field cameras. ICCV 2013.
- Wang, Efros, Ramamoorthi. Occlusion-aware depth estimation using light-field cameras. ICCV 2015.

Please upload your Python code, input/result images, and any notes of interest as a PDF to Gradescope. Please use writeup.tex for your submission.

This lab was developed by the 1290 course staff. Thanks to The (New) Stanford Light Field Archive, and the MERL Light Field Repository.