Tech Report CS-88-14

Supercomputing with VLSI

Swaminathan Manohar

October 1988

Abstract:

Supercoprocessors (SCPs), highly parallel VLSI architectures tuned to solving a specific problem class, are shown to provide a means of cost-effective supercomputing. The use of a collection of such SCPs is advanced as a cost-effective supercomputing alternative. A methodology for building SCPs for different computation-intensive problems is described: two pragmatic constraints, namely problem-size independence and a limited-bandwidth constraint, are imposed on special-purpose architectures; a simple but powerful model of computation is used to derive general upper bounds on the speedup obtainable using such architectures. It is shown that bounds established by other authors for matrix multiplication and sorting, using problem-specific approaches, can be derived very simply using this model.

Poisson Engine-I (PE-I), a prototype SCP, is a system for solving the Laplace equation using the finite difference approximation. Previous solutions for this problem either assumed the availability of unlimited number of processors or used complex processors with local memories. In contrast, PE-I uses a novel approach: asynchronous iteration methods are implemented using a fixed-size, synchronous array of simple processing elements. This results in a scalable, high-performance system with a rich set of solution algorithms. The simplicity of PE-I enables its throughput to be determined using analytical means. This is complemented by two software tools, PES and SPEC, which together make possible a comprehensive study and analysis of PE-I. The results obtained from these tools, combined with a detailed presentation of the implementation aspects of PE-I, show that, even with a moderate number of chips, PE-I can deliver throughputs greater than a billion operations per second. Architectural and algorithmic extensions to PE-I are briefly considered: the solution of a wider class of PDEs and the use of more sophisticated algorithms like the multigrid method are some of the issues addressed.

The SCP methodology is applied to the problems of matrix-multiplication and sorting. For sorting, an SCP with superlinear speedup is outlined. For the matrix problem, the architecture and implementation details of SMP are described in detail. SMP, realized with about fifty chips using current technology, is capable of throughputs greater than 150 Mflops, and is also unique in being optimal with respect to the lower bound derived using the SCP model.

Order hardcopy report from techreports@cs.brown.edu