Computer Vision Research Projects

    (Warning: this is very out of date. See my publications for more recent work.)


Looking @ People

This work explores the estimation of human motion and the recognition of human gesture, action, and facial expression. The goal of this work is to estimate and understand human motion for multi-media applications such as video database indexing and for developing novel types of user interfaces.

Human Motion Estimation

Motion is intimately tied to our behavior; for example, we move when we communicate (through facial expressions and gestures) and when we interact with each other and with objects in the world. Recovering this motion is necessary if we want computers to understand human action.

Estimating human motion is a challenging problem because human motion typically violates many of the assumptions used to compute image motion. We have used layered mixture models to cope with multiple motions that may be present in an image region. We have developed a theory of appearance change to cope with changes in image appearance that are not well modeled as motion. We have also explored learning models of non-rigid body parts such as mouths. Most recently we have been developing stochastic approaches for recovering 3D, articulated, human motion.

Learning Image Statistics of People for Bayesian Tracking.

Stochastic Tracking of 3D Human Motion. (includes MPEG movies)

Cardboard People: A Parameterized Model of Articulated Image Motion

EigenTracking: Robust Matching and Tracking of Articulated Objects Using a View-Based Representation

Recognizing Human Motion

My approach to human motion estimation uses parameterized models of optical flow. These models provide a concise description of the motion in terms of a small number of parameters. The evolution of these parameters over time can be used for recognition. I have explored a number of recognition strategies to recognize facial expressions, articulated motions, and human speech.

Explaining optical flow events with parameterized spatio-temporal models. (includes MPEG movie)

Recognizing Facial Expressions in Image Sequences using Local Parameterized Models of Image Motion

Parameterized modeling and recognition of activities
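
As a rough illustration of what "a small number of parameters" can mean, here is a minimal sketch that assumes a six-parameter affine flow model for a single image region. The class name `AffineFlow`, its field names, and the toy parameter values are hypothetical and are not taken from the papers listed above. Quantities such as divergence and curl are simple combinations of the parameters, and their values over successive frames form the kind of trajectory that a recognizer could consume (the recognition step itself is not shown).

```python
from dataclasses import dataclass


@dataclass
class AffineFlow:
    """Affine image-motion model for one region (an assumed example model):

        u(x, y) = a0 + a1 * x + a2 * y   (horizontal velocity)
        v(x, y) = a3 + a4 * x + a5 * y   (vertical velocity)
    """
    a0: float
    a1: float
    a2: float
    a3: float
    a4: float
    a5: float

    def velocity(self, x, y):
        """Flow vector predicted by the model at image position (x, y)."""
        u = self.a0 + self.a1 * x + self.a2 * y
        v = self.a3 + self.a4 * x + self.a5 * y
        return u, v

    def divergence(self):
        """Isotropic expansion (positive) or contraction (negative)."""
        return self.a1 + self.a5

    def curl(self):
        """Rotation about the viewing direction."""
        return self.a4 - self.a2

    def deformation(self):
        """Stretching along one axis combined with squashing along the other."""
        return self.a1 - self.a5


# Toy sequence: a region that expands a little faster in every frame.
frames = [AffineFlow(0.0, 0.02 * t, 0.0, 0.0, 0.0, 0.02 * t) for t in range(4)]
divergence_over_time = [f.divergence() for f in frames]
```

In this toy sequence the divergence grows from frame to frame while the curl stays at zero, which is the sort of temporal signature a recognition strategy might look for.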

Recognizing Human Behavior

Going beyond the recognition of human motion, I am interested in modeling and understanding human behavior. For example, in an office setting I would like to be able to reason about humans interacting with each other and the objects around them.

As with facial expressions and articulated motions, I think motion is an important cue for understanding human behavior. Consider this short animated clip (download movie) that is based on the film by Heider and Simmel (1944). The motion of the objects is the primary source of information in this clip. Most people construct very similar stories about what is happening in the movie, which suggests that we have very strong models of motion and action that we use to explain our world.

(thanks to Emre Yilmaz for the clip)

Human-Computer Interaction

Computers don't understand enough about people - how we act, how we feel, how we interact with the world around us. One of the goals of my work is to make computers understand us better by getting them to understand human behavior and, in particular, to understand human motion. There are two main application areas that I am pursuing:

1. Augmenting the standard human-computer interface with a camera that understands human facial motions and facial expressions. With Francois Berard I have been working on a perceptual browser that uses head motion to control window scrolling in a graphical user interface.

The Digital Office Project

2. Augmenting a standard office whiteboard with a camera that can capture the contents of the whiteboard and that understands human gestures.

Condensation-based recognition of gestures and expressions

Multi-Media Applications

New document types that incorporate digital video are rapidly increasing due to the expansion of the internet and the availability of personal computers with powerful video and graphics capabilities. Currently digital video is not a very friendly document type because it is not searchable by structure or content, it is not editable at the structural level, and it is not easily browsable. One of my motivations for studying motion (and human motion in particular) is to make video a more usable document type. This requires that we be able to analyze the structure and content of video.

Analysis of gesture and action in technical talks for video indexing

With performance artist Pamela Z, I have been exploring the relationships between motion and music. We are currently collaborating on a multi-media piece that exploits human motion tracking/understanding.

Art, Science, and the PARC Artist in Residence Program


Optical Flow Estimation

Much of my work has focused on the problem of robustly estimating image motion in video sequences. I have been particularly concerned with the problem of estimating multiple motions that occur due to motion discontinuities, transparency, fragmented occlusion, shadows, specular reflections, etc. To attack these problems I have used layered models of the image motion and have used robust statistics and mixture models to segment image motions into layers.
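
To make the role of robust statistics concrete, here is a minimal sketch under several stated assumptions: a single translational motion per region, noise-free brightness-constancy constraints of the form Ix*u + Iy*v + It = 0, a Lorentzian error function, and iteratively reweighted least squares with a decreasing scale parameter. The function names, the synthetic data, and the scale schedule are all hypothetical; this is an illustration of the general idea, not the algorithm from the papers listed below. The region contains two motions, and the robust fit is meant to recover the dominant one while ordinary least squares blends the two.

```python
import random


def solve_weighted(constraints, weights):
    """Weighted least squares for (u, v) from constraints Ix*u + Iy*v + It = 0."""
    sxx = sxy = syy = bx = by = 0.0
    for (ix, iy, it), w in zip(constraints, weights):
        sxx += w * ix * ix
        sxy += w * ix * iy
        syy += w * iy * iy
        bx -= w * ix * it
        by -= w * iy * it
    det = sxx * syy - sxy * sxy
    u = (syy * bx - sxy * by) / det
    v = (sxx * by - sxy * bx) / det
    return u, v


def robust_flow(constraints, sigmas=(2.0, 1.0, 0.5, 0.25, 0.1), iters=10):
    """Reweighted least squares with Lorentzian weights (assumed error function).

    The scale sigma is lowered in stages so that constraints belonging to
    the non-dominant motion are gradually down-weighted as outliers.
    """
    u, v = solve_weighted(constraints, [1.0] * len(constraints))
    for sigma in sigmas:
        for _ in range(iters):
            weights = []
            for ix, iy, it in constraints:
                r = ix * u + iy * v + it  # brightness-constancy residual
                weights.append(2.0 / (2.0 * sigma * sigma + r * r))
            u, v = solve_weighted(constraints, weights)
    return u, v


def make_constraints(n, motion, rng):
    """Synthetic gradient constraints that are exactly consistent with `motion`."""
    mu, mv = motion
    out = []
    for _ in range(n):
        ix, iy = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        out.append((ix, iy, -(ix * mu + iy * mv)))
    return out


rng = random.Random(0)
# 70% of the region moves with (1.0, 0.5); 30% moves with (-2.0, 1.0).
constraints = make_constraints(70, (1.0, 0.5), rng) + make_constraints(30, (-2.0, 1.0), rng)
ls_u, ls_v = solve_weighted(constraints, [1.0] * len(constraints))
rb_u, rb_v = robust_flow(constraints)
```

Once the dominant motion has been estimated, the constraints that received low weights can be treated as a separate group and fit again, which is one simple way to think about peeling an image region into layers.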

Motion Discontinuities

Probabilistic Detection and Tracking of Motion Discontinuities

Constraints for the early detection of discontinuity from motion.

Learning

Learning parameterized models of image motion

Layered Models

Skin and Bones: Multi-layer, locally affine, optical flow and regularization with transparency

Mixture Models for Optical Flow Computation

Robust Optical Flow Estimation

The Robust Estimation of Multiple Motions: Parametric and Piecewise-Smooth Flow Fields

Robust Dynamic Motion Estimation Over Time

Robust Incremental Optical Flow (Ph.D. Thesis)

Combining Motion and Brightness

Estimating Optical Flow in Segmented Images using Variable-order Parametric Models with Local Deformations

Combining Intensity and Motion for Incremental Segmentation and Tracking Over Long Image Sequences

Incremental Estimation

Recursive Non-Linear Estimation of Discontinuous Flow Fields

Psychophysical Implications of Temporal Persistence in Early Vision: A Computational Account of Representational Momentum.



Appearance Change

When we observe the world, there are many changes in the appearance of objects over time that are not well modeled as image motion. Consider, for example, clouds, fire, rushing water, or trees blowing in the wind. For each of these examples, there is a change in image appearance over time that can be modeled but is not solely due to the motion of pixels from one image location to another. The goal of this work is to develop a general theory of appearance change that can be used to describe not only motion but also these other types of iconic change.

Modeling appearance change in image sequences.


Robust Statistics

While much of my work has focused on image motion, I am also interested in the theoretical relationships between robust statistics and computer vision. One of the goals of this work is to understand the connections between robust statistics and techniques such as anisotropic diffusion or regularization with line processes. Behind these connections is a statistical interpretation of edges as outliers. Additionally, we have looked at robustly learning linear models (e.g., robust PCA).

Dynamic coupled component analysis.

Robust Principal Component Analysis for Computer Vision.

Robust Anisotropic Diffusion

On the Unification of Line Processes, Outlier Rejection, and Robust Statistics for Problems in Early Vision
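
The interpretation of edges as outliers can be illustrated with a minimal sketch, assuming a one-dimensional signal, an explicit diffusion update, and a Tukey biweight-style edge-stopping function. The function names, step size, scale, and test signal are hypothetical choices for illustration and are not taken from the papers above. Neighbor differences larger than `sigma` are treated as outliers and receive zero weight, so smoothing stops at the step edge while noise inside each flat region is averaged away.

```python
import random


def tukey_g(x, sigma):
    """Edge-stopping weight: differences beyond sigma are treated as outliers."""
    if abs(x) > sigma:
        return 0.0
    t = 1.0 - (x / sigma) ** 2
    return 0.5 * t * t


def diffuse(signal, sigma, lam=0.25, iters=100):
    """Explicit 1-D diffusion with a robust edge-stopping function."""
    s = list(signal)
    for _ in range(iters):
        nxt = s[:]
        for i in range(len(s) - 1):
            d = s[i + 1] - s[i]                # difference between neighbors
            flow = lam * tukey_g(d, sigma) * d
            nxt[i] += flow                     # the pair moves toward each other,
            nxt[i + 1] -= flow                 # unless the difference is an outlier
        s = nxt
    return s


rng = random.Random(1)
# A step edge of height 10 at sample 20, corrupted by small uniform noise.
noisy = [(0.0 if i < 20 else 10.0) + rng.uniform(-0.5, 0.5) for i in range(40)]
smooth = diffuse(noisy, sigma=2.0)
```

Because the same amount is added to one sample and subtracted from its neighbor, the mean of each flat region is preserved while its noise shrinks, and the jump between samples 19 and 20 is left intact.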



Mixture Models

Allan Jepson and I introduced mixture models for optical flow estimation (CVPR'93). Layered models, however, are much more general and the goal of this work has been to explore their use in other areas of computer vision including shape from texture, depth estimation, and brightness segmentation.

A mixture-model framework for recovering appearance change in image sequences.

Shape from Multiple, Transparent, and Occluded Textures

Mixture Models for Image Representation