When looking to capture a particular visual phenomenon, the medium of choice is usually a photograph or a video recording. Both options are limited temporally, in the sense that the object of interest is captured in action only momentarily. This paper proposes video textures, a method of representing such objects in a way that can be randomly sampled or endlessly looped.

Implementation

The first step in creating a video texture is computing a distance from each frame to every other frame; L2 distance was used as the similarity measure. The idea is that from any given frame, we transition to a frame that closely resembles the current frame's original successor. These distances are mapped to transition probabilities using an exponential function.
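A minimal sketch of this step, assuming grayscale frames stored in a numpy array (the function name and the value of sigma are illustrative, not from the original implementation):

```python
import numpy as np

def transition_probabilities(frames, sigma=10.0):
    """frames: array of shape (N, H, W); returns the pairwise L2
    distance matrix D and a row-stochastic transition matrix P, where
    P[i, j] is the probability of jumping from frame i to frame j."""
    flat = frames.reshape(len(frames), -1).astype(float)
    # Pairwise L2 distances between all frames.
    diff = flat[:, None, :] - flat[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))
    # A jump from frame i to frame j should resemble the original
    # progression, so compare frame i+1 against frame j.
    P = np.exp(-D[1:, :] / sigma)       # shape (N-1, N)
    P /= P.sum(axis=1, keepdims=True)   # normalize each row
    return D, P
```

Smaller sigma values make the sampling stick closely to the original frame order; larger values allow more adventurous jumps.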

While this process preserves smoothness at each transition, it does not retain the physical dynamics embodied in the video. Actual physics computations would prove costly. Instead, we require that for one frame to be classified as similar to another, the frames temporally adjacent to each must also be similar. By taking a weighted sum over a window of past and future frames, we increase the probability that transitions happen in a physically plausible order. This sum is computed by convolving the difference matrix with a diagonal kernel.
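The diagonal convolution can be sketched as follows (the window length and weights here are illustrative placeholders):

```python
import numpy as np

def preserve_dynamics(D, weights=(0.125, 0.375, 0.375, 0.125)):
    """Filter the frame-distance matrix D along its diagonal:
    D'[i, j] = sum_k weights[k] * D[i + k, j + k].
    A low filtered distance now requires the temporal neighbourhoods
    of the two frames to match, not just the frames themselves."""
    m = len(weights)
    N = D.shape[0]
    Dp = np.zeros((N - m + 1, N - m + 1))
    for k, w in enumerate(weights):
        Dp += w * D[k:N - m + 1 + k, k:N - m + 1 + k]
    return Dp
```

Note that the filtered matrix loses a few rows and columns at the borders, since those frames lack a full neighbourhood.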

Another problem video textures face is transitioning to a state with poor future transitions. This usually happens near the end of the frame sequence. By predicting the future costs associated with a given transition, results improve. A form of Q-learning is used to propagate future costs from the end states back to the initial ones.

Finally, weak transitions are pruned by keeping only the local maxima in each row of the transition matrix. Probabilities below a certain threshold are also reduced to zero; the threshold I used was the average of each row's smallest probability.
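A sketch of this pruning step, assuming a row-stochastic matrix P as input (the exact tie-breaking at the row borders is my own choice, not from the original implementation):

```python
import numpy as np

def prune_transitions(P):
    """Keep only row-wise local maxima, then zero out entries below a
    threshold: the average of each row's smallest surviving probability."""
    P = P.copy()
    # Local maxima along each row: at least as large as both neighbours.
    keep = np.zeros_like(P, dtype=bool)
    keep[:, 1:-1] = (P[:, 1:-1] >= P[:, :-2]) & (P[:, 1:-1] >= P[:, 2:])
    keep[:, 0] = P[:, 0] >= P[:, 1]
    keep[:, -1] = P[:, -1] >= P[:, -2]
    P[~keep] = 0.0
    # Threshold: average of the smallest nonzero probability per row.
    row_min = np.where(P > 0, P, np.inf).min(axis=1)
    thresh = row_min[np.isfinite(row_min)].mean()
    P[P < thresh] = 0.0
    # Renormalize rows that still have probability mass.
    s = P.sum(axis=1, keepdims=True)
    np.divide(P, s, out=P, where=s > 0)
    return P
```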

Output

Once the transition probabilities have been calculated and pruned, we can create a random yet smoothly transitioning video of any length. Starting at any point before the last non-zero-probability transition, we probabilistically move from frame to frame, and save the resulting order to render as the video texture.
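The sampling loop is short; a sketch, assuming P is row-stochastic with every row retaining some probability mass, and starting at frame 0 for simplicity:

```python
import numpy as np

def synthesize(P, n_frames, seed=None):
    """Random walk over the pruned transition matrix; returns the
    sequence of frame indices to render into the video texture."""
    rng = np.random.default_rng(seed)
    order = [0]
    for _ in range(n_frames - 1):
        row = P[order[-1]]  # transition distribution for the current frame
        order.append(int(rng.choice(len(row), p=row)))
    return order
```

In practice the saved index list is then used to copy the corresponding frames out to a video file.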

Results

Transition Graphs

L2 Distance

Dynamics Adjustment

Future Cost Propagation

Thresholded Distance Values

Thresholded Probabilities

Video Results

Audio Textures

Audio textures were attempted, but failed to produce any meaningful results. By reading in the waveform values of a given audio track, I hoped to find smooth transitions between values and produce music similar to the input. However, a decent-quality, medium-length audio track has well over 9 million values across two channels, and that sheer volume restricted my ability to test the idea. A short sample of length 10000 was created from the middle 1000 of a track. The similarity metric used was the L2 distance between the derivatives of the values. Not knowing much about audio data, I assumed that sharp changes in value are what cause different sounds, and decided to compare on that basis.
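For a single channel, the derivative-based metric described above amounts to the following sketch (the function name is my own; for scalar samples the L2 distance between derivatives reduces to an absolute difference):

```python
import numpy as np

def derivative_distance_matrix(samples):
    """Pairwise distance between sample-to-sample derivatives of one
    audio channel: D[i, j] = |d[i] - d[j]| where d = diff(samples)."""
    d = np.diff(samples.astype(float))  # first derivative of the waveform
    return np.abs(d[:, None] - d[None, :])
```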