Stochastic Modeling and Tracking of Human Motion

D. Ormoneit
H. Sidenbladh
M. J. Black
T. Hastie

We present a new method for the modeling and tracking of human motion using a sequence of 2D video images. Our analysis is divided into two parts: First, we estimate a statistical model of typical activities from a large set of 3D human motion data. In a second step, we use this probabilistic model as a prior distribution for Bayesian propagation using particle filters.

From a statistical modeling perspective, a 3D human motion can be thought of as a collection of time-series. The human body is represented as a set of articulated cylinders with 25 degrees of freedom, and the evolution of each joint angle is described by one of the time-series. A key difficulty in modeling these data is that each time-series must be decomposed into suitable temporal primitives prior to statistical analysis. For example, in the case of repetitive human motion such as walking, motion sequences decompose naturally into a sequence of nearly identical "motion cycles". In this work, we present a new set of tools that allows for the automatic segmentation of the training data. Specifically, we suggest an iterative procedure that finds the segmentation maximizing the signal-to-noise ratio of the data in an aligned reference domain. This procedure allows us to use the mean and the principal components of the individual cycles in the reference domain as a statistical model. Technical difficulties in this context include missing information in the motion time-series and the necessity of enforcing smooth transitions between different cycles. To deal with these difficulties, we develop a new iterative method for data imputation and functional Principal Component Analysis (PCA) based on periodic regression splines.
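To make the cycle-based modeling step concrete, the following is a minimal numpy sketch of alignment to a common reference domain followed by iterative imputation and PCA. It is not the authors' implementation: the paper uses periodic regression splines, whereas this illustration substitutes simple linear interpolation and SVD-based low-rank imputation, and the array shapes, component count, and iteration count are assumptions for illustration only.

```python
import numpy as np

def align_cycles(cycles, n_ref=100):
    """Resample each detected motion cycle onto a common reference domain.

    cycles : list of (T_i, D) arrays, one per cycle, where D is the number of
             joint angles; NaNs mark missing observations.
    Returns an array of shape (n_cycles, n_ref, D).
    """
    ref = np.linspace(0.0, 1.0, n_ref)
    aligned = []
    for c in cycles:
        t = np.linspace(0.0, 1.0, len(c))
        cols = []
        for d in range(c.shape[1]):
            ok = ~np.isnan(c[:, d])
            if ok.sum() >= 2:
                # Linear interpolation stands in for the periodic spline fit.
                cols.append(np.interp(ref, t[ok], c[ok, d]))
            else:
                cols.append(np.full(n_ref, np.nan))
        aligned.append(np.stack(cols, axis=1))
    return np.stack(aligned)

def functional_pca_with_imputation(aligned, n_components=5, n_iter=20):
    """Iterate between imputing missing entries and re-estimating the PCA model.

    aligned : (n_cycles, n_ref, D) array with NaNs for missing data.
    Returns the mean cycle and the leading principal components.
    """
    n_cycles, n_ref, D = aligned.shape
    X = aligned.reshape(n_cycles, n_ref * D)
    missing = np.isnan(X)
    # Initialize missing entries with the column means.
    X = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mean = X.mean(axis=0)
        U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
        # Low-rank reconstruction from the leading components.
        recon = mean + (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]
        # Re-impute only the missing entries, then repeat.
        X = np.where(missing, recon, X)
    return mean.reshape(n_ref, D), Vt[:n_components].reshape(n_components, n_ref, D)
```

In this sketch the mean cycle and principal components in the reference domain play the role of the learned statistical model; smoothness and periodicity across cycle boundaries, which the spline-based method enforces explicitly, are not handled here.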

The learned temporal model provides a prior probability distribution over human motions which can be used in a Bayesian framework for tracking. For this purpose, we specify a generative model of image appearance and the likelihood of observing image data given the model. The high dimensionality and non-linearity of the articulated human body model and the ambiguities in matching the generative model to the image result in a posterior distribution that cannot be represented in closed form. Hence, the posterior is represented using a discrete set of samples and is propagated over time using particle filtering. The learned temporal prior helps constrain the sampled posterior to regions of the parameter space with a high probability of corresponding to human motions. The resulting algorithm is able to track human subjects in monocular video sequences and recover their 3D motion under changes in their pose and against complex unknown backgrounds.
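The sampled posterior representation can be summarized by one resample-predict-update step of a generic particle filter, sketched below. This is a schematic illustration rather than the tracking system itself: prior_sample and log_likelihood are hypothetical interfaces standing in for the learned temporal prior and the generative image-appearance model, and the state layout is assumed.

```python
import numpy as np

def particle_filter_step(particles, weights, image, prior_sample, log_likelihood, rng):
    """One step of Bayesian propagation for the articulated body model.

    particles      : (N, P) array of sampled body states (pose and joint angles).
    weights        : (N,) normalized importance weights from the previous frame.
    image          : current video frame.
    prior_sample   : callable(state, rng) -> next state drawn from the learned
                     temporal prior (hypothetical interface).
    log_likelihood : callable(state, image) -> log p(image | state) under the
                     generative appearance model (hypothetical interface).
    """
    n = len(particles)
    # 1. Resample in proportion to the previous weights (the posterior at t-1).
    idx = rng.choice(n, size=n, p=weights)
    resampled = particles[idx]
    # 2. Predict: propagate each sample through the learned prior over motions,
    #    which concentrates samples on plausible human movements.
    predicted = np.array([prior_sample(s, rng) for s in resampled])
    # 3. Update: weight each prediction by the image likelihood.
    log_w = np.array([log_likelihood(s, image) for s in predicted])
    log_w -= log_w.max()                 # subtract the max for numerical stability
    new_weights = np.exp(log_w)
    new_weights /= new_weights.sum()
    return predicted, new_weights
```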

Related Publications

Sidenbladh, H., Black, M. J., and Fleet, D. J., Stochastic tracking of 3D human figures using 2D image motion, to appear: European Conference on Computer Vision, Dublin, Ireland, June 2000.

Ormoneit, D., Hastie, T., and Black, M. J., Functional analysis of human motion data, to appear: Proc. 5th World Congress of the Bernoulli Society for Probability and Mathematical Statistics and 63rd Annual Meeting of the Institute of Mathematical Statistics, Guanajuato, Mexico, May 2000.

Sidenbladh, H., De la Torre, F., and Black, M. J., A framework for modeling the appearance of 3D articulated figures, Int. Conf. on Automatic Face and Gesture Recognition, Grenoble, France, April 2000.