Tech Report CS-91-36

Efficient Parallel Algorithms on Restartable Fail-stop Processors

Paris C. Kanellakis and Alex A. Shvartsman

May 1991

Abstract:

We study efficient deterministic execution of parallel algorithms on restartable fail-stop CRCW PRAMs. We allow the PRAM processors to be subject to {\em arbitrary stop failures} {\bf and} {\em restarts} that are determined by an on-line adversary and result in loss of private memory but do not affect shared memory. For this model, we define and justify the complexity measures of {\em completed work}, where processors are charged for completed fixed-size update cycles, and {\em overhead ratio}, which amortizes the work over necessary work and failures. This framework is a nontrivial extension of the fail-stop no-restart model of [KS~89].

We present a simulation strategy for any $N$-processor PRAM on a restartable fail-stop $P$-processor CRCW PRAM such that: it guarantees a terminating execution of each simulated $N$ processor step, with $O(\log^2 N)$ overhead ratio, and $O(\min \{N +P \log^2 N+ M \log N$, $N \cdot P^{0.6}\})$ (sub-quadratic) completed work, where $M$ is the number of failures during this step's simulation. This strategy is work-optimal when the number of simulating processors is $P \leq N/\log^2 N$ and the total number of failures per each simulated $N$-processor step is $O(N/\log N)$. These results are based on a new algorithm for the $WA$ problem ``$P$ processors write 1's in an array of size $N$'', together with a modification of the main algorithm of [KS~89] and with the techniques in [KPS~90, Shv~89].

We observe that, on $P=N$ restartable fail-stop processors, the $WA$ problem requires $\Omega(N \log N)$ completed work, and this lower bound holds even under the additional assumption that processors can read and locally process the entire shared memory at unit cost. Under this unrealistic assumption we have a matching upper bound. The lower bound also applies to the expected completed work of randomized algorithms that are subject to on-line adversaries. Finally, we describe a simple on-line adversary that causes inefficiency in many randomized algorithms.

(complete text in pdf)