Final Project: Video Texture

Fuyi Huang (fh3)
May 16, 2011

For the final project, students were asked to implement a topic of their own choosing. I decided to implement video textures, since this subject interested me in class. As we all know, texture is a big subject in computer graphics and computer vision: we can use texture to generate an arbitrarily large image, and textures can also be used for recognition. The same idea can be applied to videos, that is, to make video textures. By video texture, we mean a video that never ends: the video repeats itself after a while, but the transitions are seamless.

The source paper I used is a SIGGRAPH 2000 paper titled "Video Textures" by Arno Schodl, Richard Szeliski, David Salesin and Irfan Essa. The paper describes the video-texture algorithm in detail, and it also discusses several extensions and applications of video textures. For this project, I chose to implement sections 2 through 4, because those sections are the main part describing the algorithm. At first, I thought I would implement section 5 as well, but after reading through the paper I found that this section mainly uses other computer-vision techniques to make the video look better after the video-texture algorithm is done. I think it is really an extension and is not essential to the idea of video textures; one could use other methods to improve the appearance of the videos instead. So I decided not to implement that part.

For the implementation of the video texture algorithm, I basically followed the descriptions in the paper. There are two main steps in this algorithm: extracting the video texture and sequencing the video texture. I will describe both steps in detail.

  • Extracting the video texture
  • When we want to make a textured video from an ordinary video, the first thing we need to do is find the video texture of the video, just as we need to extract the texture from an image when we want to produce a large textured image. To do this, we first compute the distance between each pair of grayscaled frames of the video and store this data in a distance matrix D. Naturally, we would like to choose the frame pairs with the shortest distances as our transitions. However, simply taking this distance matrix is not enough; there are two main disadvantages of using the basic distance data. The first problem is that in some videos we need to consider not only the similarity between frames but also the dynamics of motion across frames. For example, consider a pendulum video: we want our video texture to preserve the motion of the swinging pendulum. To do this, instead of the basic distance matrix, we construct a new matrix D' by convolving D along the diagonal with a weighting kernel w, so that D' compares short subsequences rather than single frames and thus takes the dynamics into account. The weighting kernel is defined as w = [1/4, 3/4, 3/4, 1/4]. The second problem is that in some cases there may be dead ends, that is, we may reach a frame from which we cannot transition back to any earlier frame. For example, when something happens in the middle of the original video and breaks the video's pattern, the transitions we find may lead into such undesirable dead ends. In this case, we need to be able to anticipate future states at the current stage. To solve the problem, we replace the matrix D' with a matrix D'', defined as D''(i,j) = D'(i,j)^p + alpha * sum_k(P''(j,k) * D''(j,k)), where P'' is a probability matrix defined as P''(i,j) proportional to exp(-D''(i+1,j) / sigma), with sigma set to the average of the distances in D.
Here p and alpha are constants; I chose p = 1 and alpha = 0.995. The D'' matrix defined this way is hard to compute directly, so in the implementation we instead iterate D''(i,j) = D'(i,j)^p + alpha * min_k(D''(j,k)) until it converges. In this way, we have taken future conditions into account. After all these steps, we have plenty of good transitions that could serve as our video texture, and we now want to select even better ones among them. The method I used was to keep only the local maxima of the transition matrix P'' for each pair of frames i and j, and to set all probabilities below 0.08 in P'' to zero. In this way, we have selected very good transitions to use for our video texture. After that, since the paper only provides the dynamic programming algorithm for scheduling transitions from frame i to frame j with i >= j, we discard all transitions from i to j with i < j. For looping purposes, we compute the average cost of each transition and keep the 20 with the lowest costs. The average cost of a transition from frame i to frame j is its cost spread over the length of the resulting loop, that is, D''(i,j) / (i - j + 1). Note that we always have i >= j.
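To make the above concrete, here is a minimal sketch of the extraction step in NumPy. The project itself was written in MATLAB, so this is an illustration rather than my actual code; the function name, the fixed number of fixed-point iterations, and the handling of border frames are my own choices.

```python
import numpy as np

def video_texture_matrices(frames, p=1, alpha=0.995, iters=100):
    """Compute D, D', D'' and P'' for a stack of grayscale frames
    (shape (n, h, w)). p and alpha follow the values in this report;
    iters caps the fixed-point update of D'' (an arbitrary choice)."""
    n = frames.shape[0]
    flat = frames.reshape(n, -1).astype(float)

    # D(i, j): L2 distance between frame i and frame j.
    D = np.sqrt(((flat[:, None, :] - flat[None, :, :]) ** 2).sum(axis=-1))

    # D'(i, j): convolve D along the diagonal with w = [1/4, 3/4, 3/4, 1/4]
    # so that short subsequences, not single frames, are compared.
    w = np.array([0.25, 0.75, 0.75, 0.25])
    big = 10.0 * (D.max() + 1.0)          # stand-in for "no valid transition"
    Dp = np.full_like(D, big)
    for i in range(1, n - 2):
        for j in range(1, n - 2):
            Dp[i, j] = sum(w[k] * D[i + k - 1, j + k - 1] for k in range(4))

    # D''(i, j) = D'(i, j)^p + alpha * min_k D''(j, k): anticipate future
    # costs so that transitions leading toward dead ends are penalized.
    Dpp = Dp ** p
    for _ in range(iters):
        m = Dpp.min(axis=1)               # m[j] = min_k D''(j, k)
        Dpp = Dp ** p + alpha * m[None, :]

    # P''(i, j) proportional to exp(-D''(i+1, j) / sigma),
    # with sigma the average of the distances in D.
    sigma = D.mean()
    P = np.zeros_like(D)
    P[:-1, :] = np.exp(-Dpp[1:, :] / sigma)
    P /= np.clip(P.sum(axis=1, keepdims=True), 1e-12, None)
    return D, Dp, Dpp, P
```

Local-maximum pruning and the 0.08 threshold described above would then be applied to P before selecting transitions.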

  • Sequencing the video textures
  • After selecting good transitions, we need to schedule them so that they form a reasonable loop. In this step there are two difficulties. The first is that we need to select a set of transitions whose total length matches a predetermined number of frames. To do this, we build an m x n table and use dynamic programming, where m is the number of frames and n is the number of available transitions. Each column of the table corresponds to one transition; that is, every nonempty entry in a column must contain that column's transition. For example, all entries in column 1 contain transition 1 and all entries in column 2 contain transition 2. The entry in row i of column j contains a multiset of transitions with total length i and at least one copy of transition j. To compute an entry T(i,j), first search through the entries above it in the same column for a nonempty entry. If one is found, denote it T(k,j), then search row i-k for nonempty entries; if there is more than one, select the one with the lowest average cost, denote it T(i-k, l), and combine it with T(k,j) to form a candidate value for T(i,j). After collecting all such candidates, the final value of T(i,j) is the candidate with the lowest average cost.

    After building the above table, we can select the desired transitions from the corresponding row. That is, we choose the entry that has the lowest average cost in a row (typically the last row) as our final transitions.
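A small sketch of this selection step follows, under simplifying assumptions of my own: each transition is represented only by its loop length and cost, I minimize total rather than average cost, and I ignore the additional requirement that the chosen loops overlap so they can be merged into one compound loop. All names here are hypothetical.

```python
import math

def select_loops(loops, target_len):
    """loops: list of (length, cost) primitive loops. Pick a multiset
    whose lengths sum to target_len with minimal total cost.
    dp[L][j] = cheapest multiset of total length L containing at
    least one copy of loop j (the table T described above)."""
    n = len(loops)
    dp = [[math.inf] * n for _ in range(target_len + 1)]
    back = [[None] * n for _ in range(target_len + 1)]
    for L in range(1, target_len + 1):
        for j, (length, cost) in enumerate(loops):
            if length > L:
                continue
            if length == L:                      # loop j alone fills row L
                dp[L][j], back[L][j] = cost, None
                continue
            # extend the best shorter multiset by one copy of loop j
            k = min(range(n), key=lambda t: dp[L - length][t])
            if dp[L - length][k] < math.inf:
                dp[L][j] = cost + dp[L - length][k]
                back[L][j] = (L - length, k)
    # read the answer off the last row
    best_j = min(range(n), key=lambda t: dp[target_len][t])
    if dp[target_len][best_j] == math.inf:
        return None                              # target length unreachable
    total, chosen, L, j = dp[target_len][best_j], [], target_len, best_j
    while True:
        chosen.append(j)
        if back[L][j] is None:
            break
        L, j = back[L][j]
    return total, chosen
```

Any multiset containing loop j decomposes as a shorter multiset plus one copy of j, which is why the single recurrence above covers all cases.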

    Now there is one last difficulty. We need to arrange the transitions we have just chosen in order so that our final video can be played infinitely. The paper provided an algorithm for this:

    1. Schedule the transition that starts at the very end of the sequence as the first transition to be taken. Suppose this transition is (i, j), a jump from frame i back to frame j.

    2. After removing the transition chosen in step 1, the remaining transitions may form several disjoint multisets. Sort the multisets in order (i.e., if a < b, then the multiset containing a transition that starts at frame a comes before the multiset containing a transition that starts at frame b), choose the multiset that contains frame j, and repeat step 1 on it.

    3. Schedule all the transitions in the multiset of step 2 until there are no more transitions in the multiset.

    4. Start from the next sorted multiset and repeat step 2.

    5. Continue scheduling the multisets using step 2 until all the multisets are scheduled.

    Finally, we store the scheduled transitions and apply them to build a new video. This video is our final video.
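As a small illustration of how the scheduled transitions turn into frames (a helper of my own, not from the paper): play forward to each jump's source frame, then take the jump; for a validly scheduled compound loop, the last transition returns to the frame where the pass began.

```python
def frames_for_one_pass(transitions):
    """transitions: (src, dst) pairs with dst <= src, already scheduled
    in play order. Returns the frame indices for one pass of the loop;
    the video texture is this pass repeated forever."""
    seq = []
    pos = transitions[-1][1]             # a pass starts where the last jump lands
    for src, dst in transitions:
        seq.extend(range(pos, src + 1))  # play forward up to the jump point
        pos = dst                        # take the transition (jump back)
    return seq
```

For example, a single transition (5, 2) yields the pass 2, 3, 4, 5, which loops seamlessly back to frame 2.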

  • Programming language
  • I used MATLAB for this project. However, I found MATLAB rather inefficient for this task, because the video-texture algorithm involves a lot of data-structure manipulation and relatively little numerical computation. In my implementation, most of the time was spent creating the initial distance matrix D and building the transition table; these steps would be faster with a data structure like an array list.

  • Parameters
  • The paper was not specific about the values of the parameters. I mainly used p = 2 and alpha = 0.995 (one of the flames results below uses p = 1). For the threshold, I used the mean value of the matrix D'', and I kept 20 transitions in the last step.

  • Results
  • For the results, I tried to find the videos used in the original paper, but I could only find the result videos, not the source videos. So I used some videos found online as my sources.

    Below are examples of a distance matrix D and a transition matrix P:

    A distance matrix D

    A transition matrix P

    Below are the source videos and the results (source videos on the left, result videos on the right):

    Source video for flag

    Video texture for flag

    Source video of flames

    Video texture for flames

    Short version of flames

    Short version of flames video texture (p = 2, alpha = 0.995)

    Another short version of flames video texture (p = 1, alpha = 0.995)

    Source video for pendulum

    Video texture for pendulum

    Source video for spinning top

    Video texture for spinning top

    Source video for Tomcat

    Video texture for Tomcat
