Video Presentation

[Video: Post-Processing a Rolling Shutter Video Effect, from Matt Nichols on Vimeo]

Overview

As briefly described in the video above, my final project is an attempt to build a "rolling shutter" video effect as a post-process. Rolling shutter is an image-capture effect in which each row of the output is captured at a slightly different time, so the result can appear bent and skewed compared to what is seen by the naked eye. This can be an objectionable artifact when photographing quickly moving objects, but it can also produce interestingly surreal results. A similar effect has been implemented in video hardware, and I first got the idea for this project by watching this video. After some iteration, my program takes an input video and produces a similar effect, adding a few more options for experimentation (direction of offset, magnitude of delay, etc.).
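To make the row-by-row idea concrete, here is a minimal MATLAB sketch of the mapping; the function name, the in-memory frames array, and the nearest-frame lookup are illustrative simplifications, not the code used in the project:

```matlab
% Minimal sketch: row r of output frame t is taken from an earlier input
% frame, with the delay growing as we move down the image. Nearest-frame
% lookup only, no blending. Assumes 'frames' is an H x W x 3 x N array
% holding the whole clip in memory.
function out = rollingShutterFrame(frames, t, rowDelayMs, fps)
    H = size(frames, 1);
    out = frames(:, :, :, t);
    for r = 1:H
        delayFrames = (r - 1) * (rowDelayMs / 1000) * fps;  % delay in frame units
        src = max(1, round(t - delayFrames));               % clamp at clip start
        out(r, :, :) = frames(r, :, :, src);
    end
end
```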

Process

Composing each output frame is a matter of incorporating rows from past input frames, with the time offset increasing along whichever direction the effect is applied. In my first attempt, I built each output frame by iterating backwards through the video, copying in rows from the past as needed. However, this ended up being very slow, since each input frame has to be queried n times for the n output frames it appears in.

I ran into another challenge with framerate. 15 fps is a typical webcam framerate, but even for a slowly walking person the jump in position between consecutive frames is large enough to be noticeable at the small per-row time offsets I wanted. A 5-15 millisecond offset between rows usually looks good, but 15 fps leaves about 67 milliseconds between frames, so some replication of rows or interpolation between frames is needed (read on if this doesn't make sense). Simply copying the same row multiple times to decrease the average offset between pixel rows creates noticeable jagged edges in the output, so I first tried a linear combination of neighboring frames, and finally settled on a Gaussian weighting of the four nearest frames, which gives a reasonably smooth result.
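As an illustration of that blending step (not the project's actual code; the helper name, the value of sigma, and the exact weighting are my assumptions based on the description), a Gaussian blend of the four frames nearest a fractional source time tau might look like this:

```matlab
% Hedged sketch of the Gaussian blend described above: given a fractional
% frame position tau for row r, blend that row from the four nearest frames
% with Gaussian weights. 'sigma' (in frame units) is an illustrative choice.
function row = sampleRow(frames, r, tau, sigma)
    [~, W, ~, N] = size(frames);
    idx = floor(tau) + (-1:2);                    % the four nearest frame indices
    idx = min(max(idx, 1), N);                    % clamp to the clip's range
    w = exp(-((idx - tau).^2) / (2 * sigma^2));
    w = w / sum(w);                               % normalize so the weights sum to 1
    row = zeros(1, W, 3);
    for k = 1:4
        row = row + w(k) * double(frames(r, :, :, idx(k)));
    end
end
```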

The details of my algorithm follow. First, set everything up: read in the input video and the chosen offset parameters. Then, create the frames of the output: for each input frame, blend its rows into the output frames they contribute to, as sketched below. Once this process has completed for each input frame, all output frames will have been incrementally generated, and the effect is complete. Though I have only discussed rows and horizontal segments in this description, I also generalized the process to work in any of the four row/column cascade directions, not just top to bottom. This is demonstrated in the results section of my video.
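A rough reconstruction of how that incremental pass might be organized is below. It is written as a scatter from each input frame, as described above, rather than calling the per-row blend sketched earlier; the file names, the 10 ms offset, sigma, and the decision to keep every frame in memory are all assumptions, and only the top-to-bottom direction is shown:

```matlab
% Rough reconstruction of the incremental pass (top-to-bottom only): each
% input frame is read exactly once, and its Gaussian-weighted rows are
% scattered into every output frame they fall near in time.
vIn = VideoReader('input.mp4');          % placeholder file name
fps = vIn.FrameRate;
H = vIn.Height;  W = vIn.Width;  N = vIn.NumberOfFrames;

rowDelayMs = 10;                         % illustrative per-row offset
sigma = 0.75;                            % Gaussian width, in frame units
delta = (rowDelayMs / 1000) * fps;       % per-row offset in frame units
span  = ceil(H * delta) + 2;             % how far ahead one frame can contribute

out  = zeros(H, W, 3, N);                % weighted sums (everything in memory)
wsum = zeros(H, 1, 1, N);                % accumulated weights, for normalization

for f = 1:N
    frame = double(read(vIn, f)) / 255;  % read each input frame exactly once
    for t = f:min(f + span, N)           % output frames this input frame touches
        tau  = max(t - (0:H-1)' * delta, 1);   % fractional source time per row
        w    = exp(-((tau - f).^2) / (2 * sigma^2));
        keep = abs(tau - f) < 2;         % roughly the four nearest frames
        if ~any(keep), continue; end
        out(keep, :, :, t) = out(keep, :, :, t) + ...
            bsxfun(@times, w(keep), frame(keep, :, :));
        wsum(keep, 1, 1, t) = wsum(keep, 1, 1, t) + w(keep);
    end
end
out = bsxfun(@rdivide, out, max(wsum, eps));   % normalize each row's blend

vOut = VideoWriter('rolled.avi');
vOut.FrameRate = fps;
open(vOut);
for t = 1:N
    writeVideo(vOut, min(max(out(:, :, :, t), 0), 1));
end
close(vOut);
```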

Next Steps

Automatically finding the best offset time: Because a lower offset time usually works better for videos with more motion, this should be as simple as finding some metric for the motion present in the input and choosing the offset time accordingly. I actually tried to do this by writing a motion metric descriptor for videos, which essentially calculates a rolling SSD between frames and normalizes it by the video's framerate and size. In practice, however, it was really difficult to find a good mapping between motion and millisecond offset, so I left it out of my final product. I think the descriptor could also be better implemented using distances between keypoints in each frame instead of an SSD; that would make it more consistent across differently sized moving objects.
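For reference, a metric along those lines might look like the following; this is my reconstruction from the description, and the exact normalization is an assumption:

```matlab
% Hedged sketch of the motion metric described above: mean per-pixel SSD
% between consecutive frames, scaled by the framerate so that clips recorded
% at different framerates are comparable.
function m = motionMetric(frames, fps)
    [H, W, ~, N] = size(frames);
    total = 0;
    for t = 2:N
        d = double(frames(:, :, :, t)) - double(frames(:, :, :, t - 1));
        total = total + sum(d(:).^2);
    end
    m = total / (N - 1) / (H * W) * fps;   % roughly: squared change per pixel per second
end
```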

Automatically determining the best direction: Because an object moving left and right will look better with a top-to-bottom effect, and vice versa for a vertically moving object, it could be cool to automatically decide which way to cascade the time offset. This could also be accomplished with some sort of keypoint analysis, as above.
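As a hedged sketch of how that decision could be automated, here is a stand-in using a crude global shift search rather than true keypoint matching; the names and the shift range are illustrative:

```matlab
% Sketch: estimate the dominant per-frame translation with a small brute-force
% shift search, then cascade down the rows if motion is mostly horizontal and
% across the columns if it is mostly vertical. A stand-in for keypoint analysis.
function dir = pickDirection(frames)
    N = size(frames, 4);
    totalDx = 0;  totalDy = 0;
    for t = 2:N
        a = double(rgb2gray(frames(:, :, :, t - 1)));
        b = double(rgb2gray(frames(:, :, :, t)));
        best = inf;  bestDx = 0;  bestDy = 0;
        for dx = -3:3
            for dy = -3:3
                d = circshift(b, [dy, dx]) - a;   % shift b and compare to a
                e = sum(d(:).^2);
                if e < best
                    best = e;  bestDx = dx;  bestDy = dy;
                end
            end
        end
        totalDx = totalDx + abs(bestDx);
        totalDy = totalDy + abs(bestDy);
    end
    if totalDx >= totalDy
        dir = 'top-to-bottom';     % mostly horizontal motion
    else
        dir = 'left-to-right';     % mostly vertical motion
    end
end
```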

More directions or different offset patterns: Currently the offset can cascade right, left, up and down, though it's theoretically possible to use arbitrary angles, and even arbitrary patterns. Selecting segments for either of these would be much more complicated and probably quite a bit slower (since I wouldn't be able to use MATLAB matrix magic with full rows and columns), but it's potentially doable.
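To illustrate what an arbitrary pattern might look like, here is a purely hypothetical generalization that trades the per-row delay for a per-pixel delay map (nearest-frame lookup only, and clearly much slower than indexing full rows or columns):

```matlab
% Hypothetical generalization: replace the per-row delay with an arbitrary
% per-pixel delay map, at the cost of the full-row/column vectorization.
% Nearest-frame lookup only, for brevity.
function out = patternedFrame(frames, t, delayMapMs, fps)
    [H, W, ~, ~] = size(frames);
    out = frames(:, :, :, t);
    for r = 1:H
        for c = 1:W
            src = max(1, round(t - (delayMapMs(r, c) / 1000) * fps));
            out(r, c, :) = frames(r, c, :, src);
        end
    end
end
```

For example, a delay map built from `[rowIdx, colIdx] = ndgrid(1:H, 1:W); delayMapMs = (rowIdx + colIdx) * 5;` would give a 45-degree diagonal cascade.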

More and better videos: So far I've only made videos with this effect using relatively low-framerate webcams, recording relatively slow movement. However, given a bulk of time and access to better video equipment (see: winter break, my nicer camera), I could come up with much more interesting and higher-quality videos to run this effect on. This, for sure, I plan to do.

December 18th, 2012