M. J. Black
D. J. Fleet
We present a new method for the modeling and tracking of human motion using a sequence of 2D video images. Our analysis is divided into two parts: First, we estimate a statistical model of typical activities from a large set of 3D human motion data. In a second step we use this probabilistic model as a prior distribution for Bayesian propagation using particle filters.
From a statistical modeling perspective, a 3D human motion can be thought of as a collection of time-series. The human body is represented as a set of articulated cylinders with 25 degrees of freedom, and the evolution of a particular joint angle is described by one of the time-series. A key difficulty in modeling these data is that each time-series has to be decomposed into suitable temporal primitives prior to statistical analysis. For example, in the case of repetitive human motion such as walking, motion sequences decompose naturally into a sequence of identical "motion cycles". In this work, we present a new set of tools that allows for the automatic segmentation of the training data. In detail, we suggest an iterative procedure that generates the best segmentation with respect to the signal-to-noise ratio of the data in an aligned reference domain. This procedure allows us to use the mean and the principal components of the individual cycles in the reference domain as a statistical model. Technical difficulties in this context include missing information in the motion time-series and the necessity of enforcing smooth transitions between different cycles. To deal with these difficulties, we develop a new iterative method for data imputation and functional Principal Component Analysis (PCA) based on periodic regression splines.
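The core of this modeling step can be sketched in a few lines: segment a periodic joint-angle series into cycles, resample each cycle onto a common reference domain, and take the mean and principal components of the aligned cycles. The sketch below is a minimal illustration under simplifying assumptions (a known cycle length and no missing data, so the paper's iterative segmentation, imputation, and spline machinery are omitted); the helper names are ours, not the paper's.

```python
import numpy as np

def align_cycles(angles, cycle_len, n_samples=50):
    """Segment a periodic joint-angle series into cycles and resample
    each cycle onto a common reference domain [0, 1]."""
    n_cycles = len(angles) // cycle_len
    ref = np.linspace(0.0, 1.0, n_samples)
    cycles = []
    for i in range(n_cycles):
        cyc = angles[i * cycle_len:(i + 1) * cycle_len]
        t = np.linspace(0.0, 1.0, cycle_len)
        cycles.append(np.interp(ref, t, cyc))
    return np.array(cycles)  # shape: (n_cycles, n_samples)

def cycle_pca(cycles, n_components=3):
    """Mean cycle and leading principal components via SVD."""
    mean = cycles.mean(axis=0)
    centered = cycles - mean
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components], s[:n_components]

# Toy data: a noisy sinusoidal "joint angle" over five walking cycles.
rng = np.random.default_rng(0)
t = np.arange(500)
angles = np.sin(2 * np.pi * t / 100) + 0.05 * rng.standard_normal(500)
cycles = align_cycles(angles, cycle_len=100)
mean, comps, svals = cycle_pca(cycles)
```

With real motion-capture data each joint angle would be processed this way, and the cycle boundaries would themselves be re-estimated iteratively to maximize the signal-to-noise ratio in the reference domain.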
The learned temporal model provides a prior probability distribution over human motions which can be used in a Bayesian framework for tracking. For this purpose, we specify a generative model of image appearance and the likelihood of observing image data given the model. The high dimensionality and non-linearity of the articulated human body model and the ambiguities in matching the generative model to the image result in a posterior distribution that cannot be represented in closed form. Hence, the posterior is represented using a discrete set of samples and is propagated over time using particle filtering. The learned temporal prior helps constrain the sampled posterior to regions of the parameter space with a high probability of corresponding to human motions. The resulting algorithm is able to track human subjects in monocular video sequences and recover their 3D motion under changes in their pose and against complex unknown backgrounds.
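The sample-based propagation described above follows the standard predict-weight-resample structure of a particle filter. The following is a minimal one-dimensional sketch, not the paper's tracker: the `dynamics` and `likelihood` functions here are toy stand-ins for the learned temporal prior and the image-based generative model.

```python
import numpy as np

def particle_filter_step(particles, weights, dynamics, likelihood, rng):
    """One predict-weight-resample step of a particle filter.

    `dynamics` samples from the temporal prior p(x_t | x_{t-1});
    `likelihood` scores each state against the current observation.
    """
    # Resample proportional to the previous posterior weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    particles = particles[idx]
    # Propagate each particle through the temporal prior.
    particles = dynamics(particles, rng)
    # Reweight by the observation likelihood and normalize.
    w = likelihood(particles)
    w = w / w.sum()
    return particles, w

# Toy demo: track a slowly advancing phase variable on [0, 1).
rng = np.random.default_rng(1)
true_phase = 0.31
particles = rng.uniform(0.0, 1.0, size=200)
weights = np.full(200, 1.0 / 200)
dynamics = lambda p, r: (p + 0.01 + 0.02 * r.standard_normal(p.shape)) % 1.0
likelihood = lambda p: np.exp(-((p - true_phase) ** 2) / (2 * 0.05 ** 2))
particles, weights = particle_filter_step(
    particles, weights, dynamics, likelihood, rng)
estimate = np.sum(particles * weights)
```

In the actual tracker the state is the full set of body parameters, the dynamics come from the learned walking model, and the likelihood compares the projected articulated body model to the image.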
The movies shown in the talk do not play in the web version but are included below.
MPEG movies of learned models. The movies show samples from the learned walking model. Samples further from the mean illustrate the variability in the model.
Mean walk. Low noise. Moderate noise. Large noise. Huge noise.
This next set of movies vary the contribution of different principal components in the walking model. These serve to illustrate how the different components contribute to the overall motion.
Variation of first component. Variation of second component. Variation of third component. Variation of fourth component. Variation of fifth component.
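Varying a single component's contribution, as these movies do, amounts to reconstructing a cycle as the mean plus a scaled principal component. A minimal sketch (the mean and components below are synthetic stand-ins for the learned walking model):

```python
import numpy as np

# Hypothetical learned model: mean cycle plus unit-norm principal components.
n_samples = 50
phase = np.linspace(0.0, 2 * np.pi, n_samples)
mean_cycle = np.sin(phase)                        # stand-in "mean walk"
components = np.stack([np.cos(phase), np.sin(2 * phase)])
components /= np.linalg.norm(components, axis=1, keepdims=True)

def synthesize(mean, comps, coeffs):
    """Reconstruct a cycle as mean + sum_k alpha_k * component_k."""
    return mean + np.asarray(coeffs) @ comps

mean_walk = synthesize(mean_cycle, components, [0.0, 0.0])
varied = synthesize(mean_cycle, components, [0.5, 0.0])  # vary first component
```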
Click on images for MPEG movies of tracking results:
Stochastic tracking of 3D human figures using 2D image motion,
Sidenbladh, H., Black, M. J., and Fleet, D.J.,
European Conference on Computer Vision, D. Vernon (Ed.), Springer Verlag, LNCS 1843, Dublin, Ireland, pp. 702-718, June 2000.
Implicit probabilistic models of human motion for synthesis and tracking,
Sidenbladh, H., Black, M. J., and Sigal, L.,
to appear: European Conf. on Computer Vision, ECCV2002.
Learning image statistics for Bayesian tracking,
Sidenbladh, H. and Black, M. J.,
Int. Conf. on Computer Vision, ICCV-2001, Vancouver, BC, Vol. II, pp. 709-716.
Learning and tracking cyclic human motion,
Ormoneit, D., Sidenbladh, H., Black, M. J., Hastie, T.,
Advances in Neural Information Processing Systems 13, Leen, T. K., Dietterich, T. G., and Tresp, V., Eds., The MIT Press, pp. 894-900, 2001.
A framework for modeling the appearance of 3D articulated figures,
Sidenbladh, H., De la Torre, F., Black, M. J.,
Int. Conf. on Automatic Face and Gesture Recognition, Grenoble, France, pp. 368-375, March 2000.