Tech Report CS-91-54

Parallel Algorithms with Processor Failures and Delays

Jonathan F. Buss, Paris C. Kanellakis, Prabhakar L. Radge and Alex A. Shvartsman

August 1991


We study efficient deterministic parallel algorithms on two models: restartable fail-stop CRCW PRAMs and strongly asynchronous PRAMs. In the first model, synchronous processors are subject to arbitrary stop failures and restarts determined by an on-line adversary and involving loss of private but not shared memory; the complexity measures are {\em completed work} (where processors are charged for completed fixed-size update cycles) and {\em overhead ratio} (completed work amortized over necessary work and failures). In the second model, the result of the computation is a serialization of the actions of the processors determined by an on-line adversary; the complexity measure is {\em total work} (number of steps taken by all processors). Despite their differences, the two models share key algorithmic techniques.

We present new algorithms for the {\em Write-All} problem (in which $P$ processors write ones into an array of size $N$) for the two models, and a third which runs on the restartable fail-stop model. These algorithms can be used to implement a simulation strategy for any $N$-processor PRAM on a restartable fail-stop $P$-processor CRCW PRAM such that it guarantees a terminating execution of each simulated $N$-processor step, with $O(\log^2 N)$ overhead ratio and $O(\min \{N +P \log^2 N+ M \log N$, $N \cdot P^{0.59}\})$ (sub-quadratic) completed work (where $M$ is the number of failures during this step's simulation). This strategy is work-optimal when the number of simulating processors is $P \leq N/\log^2 N$ and the total number of failures per each simulated $N$ processor step is $O(N/\log N)$. We also show that the {\em Write-All\/} requires $N-P+ \Omega (P \log P)$ completed/total work on these models for $P\leq N$.

(complete text in pdf)