Tech Report CS-07-06

XFlow: Internet-scale Distributed Stream Processing

Olga Papaemmanouil, Ugur Cetintemel and John Jannotti

June 2007

Abstract:

Existing stream processing systems are designed and optimized for clustered deployments, and cannot adequately meet the scalability and adaptivity requirements of Internet-scale monitoring applications. Furthermore, these systems commonly optimize for a specific QoS metric, which may limit their applicability to diverse applications and environments.

This paper presents the high-level design, architecture, and evaluation of XFlow, a generic distributed data collection, processing, and dissemination system that addresses these limitations. XFlow integrates an extensible pub-sub model with data flows for stream processing. The underlying pub-sub model decouples stream sources and clients, as well as the processing operators, leading to a loosely-coupled architecture that can gracefully scale, adapt to churn in system membership and workload, and facilitate sophisticated optimizations.

We first provide an overview of the basic design and architecture of XFlow. We then describe XFlow's extensible optimization model that changes the placement and implementation of operators to meet application-specific performance goals and constraints. Finally, we demonstrate the flexibility and the effectiveness of XFlow using real-world streams and experimental results obtained from up to 150 PlanetLab sites.

(complete text in pdf)