Tech Report CS-02-12
Load Shedding in a Data Stream Manager
Nesime Tatbul, Ugur Cetintemel, Mitch Cherniack, Michael Stonebraker, Stan Zdonik
A Data Stream Manager (DSM) accepts push-based input from a set of data sources, processes these inputs with respect to a set of standing queries, and produces outputs based on Quality-of-Service (QoS) specifications. When input rates exceed system capacity, system will become overloaded and latency will deteriorate. Under these conditions, the system will shed load, thus degrading the answer, in order to improve the observed latency of the results.
This paper examines a technique for dynamically inserting and removing Drop operators into the query plan as required by the current load. We examine two types of Drops: the first drops a fraction of the tuples in a random fashion, and the second drops tuples based on the importance of their content. We address the problems of determining when load shedding is needed, where in the query plan to insert Drops, and how much of the load should be shed at that point in the plan. We present an algorithm for accomplishing this and experimental evidence that it can react quickly and can bring the system back into the useful operating range.