Tech Report CS-06-08

HumanEva: Synchronized Video and Motion Capture Dataset for Evaluation of Articulated Human Motion

Leonid Sigal and Michael J. Black

September 2006


While research on articulated human motion and pose estimation has progressed rapidly in the last few years, there has been no systematic quantitative evaluation of competing methods to establish the current state of the art. Current algorithms make many different choices about how to model the human body, how to exploit image evidence, and how to approach the inference problem. We argue that there is a need for common datasets that allow fair comparison between different methods and their design choices. Until recently, gathering ground-truth data for evaluation of results (especially in 3D) was challenging. In this report we present a novel dataset obtained using a unique setup for capturing synchronized video and ground-truth 3D motion. Data was captured simultaneously using a calibrated marker-based motion capture system and multiple high-speed video capture systems. The video and motion capture streams were synchronized in software using a direct optimization method. The resulting HumanEva-I dataset contains multiple subjects performing a set of predefined actions with a number of repetitions. On the order of 50,000 frames of synchronized motion capture and video were collected at 60 Hz, with an additional 37,000 frames of pure motion capture data. The data is partitioned into training, validation, and testing subsets. A standard set of error metrics is defined that can be used for evaluation of both 2D and 3D pose estimation and tracking algorithms. Support software and an on-line evaluation system for quantifying results using the test data are being made available to the community. This report gives an overview of the dataset and evaluation metrics and provides pointers into the dataset for additional details. It is our hope that HumanEva-I will become a standard dataset for the evaluation of articulated human motion and pose estimation.

(complete text in PDF)