Tech Report CS-94-29

Efficient Parallelism vs Reliable Distribution: a Trade-off for Concurrent Computations

Paris C. Kanellakis, Dimitris Michailidis, and Alex A. Shvartsman

June 1994


Concurrent computations should combine efficiency with reliability, where efficiency is usually associated with parallel and reliability with distributed computing. Such a desirable combination is not always possible, because of an intuitive trade-off: efficiency requires removing redundancy from computations whereas reliability requires some redundancy. We survey a spectrum of algorithmic models (from fail-stop, synchronous to asynchronous and from approximate to exact computations) in which reliability is guaranteed with small trade-offs in efficiency. We illustrate a number of cases where optimal trade-offs are achievable. A basic property of all these models, which is of some interest in the study of concurrency, is that ``true'' read/write concurrency is necessary for fault tolerance. In particular, we outline how algorithms can be designed so that, in each execution, the total ``true'' concurrency used can be closely related to the faults that can be tolerated.

(complete text in pdf or gzipped postscript)