Tech Report CS-04-05

High-Availability Algorithms for Distributed Stream Processing

Jeong-Hyon Hwang, Magdalena Balazinska, Alexander Rasin, Ugur Cetintemel, Michael Stonebraker, and Stan Zdonik

May 2004


Stream-processing systems are designed to support an emerging class of applications that require sophisticated and timely processing of high-volume data streams, often originating in a distributed environment. Work in stream processing has so far focused primarily on stream-oriented languages and resource-constrained, one-pass query processing. High availability, a key goal for virtually all data-processing systems, has received little attention until now.

This paper investigates failure-recovery approaches for distributed stream-processing systems. In these systems, high data rates make high availability expensive to achieve. We observe, however, that unlike traditional data-processing applications that require precise recovery for correctness, many stream-processing applications can tolerate and benefit from weaker notions of recovery. Based on the correctness requirements of our target applications, we characterize various recovery types that differ in the level of recovery guarantees they offer. We describe how to adapt standard recovery techniques to provide these various guarantees. We also introduce a novel technique that leverages the data-flow aspects of streaming systems to achieve high availability at a low cost. Using analysis and simulations, we quantify the cost of offering each type of recovery guarantee and examine the performance of each recovery technique.

(complete text in pdf or gzipped postscript)