Tech Report CS-91-34

Programmable Systolic Arrays

Richard Paul Hughey

May 1991


Systolic arrays can solve computationally intensive problems many times faster than traditional computers or supercomputers. Because VLSI systems require many months to design, simulate, fabricate, and test, research has turned to programmable systolic arrays. This thesis introduces a general, programmable systolic array architecture: the Systolic Shared Register (SSR) architecture. The Systolic Shared Register architecture preserves the simple communication of single-purpose systolic arrays while providing a fully programmable systolic co-processor.

To test the SSR architecture, I designed, simulated, and had fabricated the Brown Systolic Array (B-SYS). B-SYS is an 8-bit SSR machine: each 16-element register bank in the linear array is shared between two functional units, providing simple and efficient systolic communication. The 85,000-transistor chip worked on first fabrication, and a 10-chip, 470-element prototype array performs sequence comparison over 80 times faster than its Intel 80386 host. A custom-designed board could raise this speedup to over 600 times for a 10-chip co-processor. A 32-chip B-SYS system could process over three billion 8-bit operations every second.
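
As a rough illustration of why sequence comparison suits a linear array, the Python sketch below computes an edit distance one anti-diagonal at a time. Every cell on a diagonal depends only on the two previous diagonals, so in hardware one processing element per character could update all cells of a diagonal in the same step, reading its neighbor's earlier results. This is not B-SYS or NSL code: the function name, the unit edit costs, and the mapping of one element per character of `a` are assumptions made for illustration.

```python
def systolic_edit_distance(a: str, b: str) -> int:
    """Edit distance computed wavefront by wavefront (hypothetical model).

    Cell (i, j) holds the distance between a[:i] and b[:j]. At step t the
    cells with i + j == t are computed; element i handles cell (i, t - i).
    Unit insert/delete/substitute costs are an assumption of this sketch.
    """
    m, n = len(a), len(b)
    prev2: dict[int, int] = {}      # results from step t - 2
    prev1: dict[int, int] = {0: 0}  # results from step t - 1 (only cell (0, 0))

    for t in range(1, m + n + 1):
        cur: dict[int, int] = {}
        # Each iteration of this loop is independent of the others, so a
        # linear array could run all of them in parallel within one step.
        for i in range(max(0, t - n), min(m, t) + 1):
            j = t - i
            if i == 0:
                cur[i] = j          # boundary: insert j characters
            elif j == 0:
                cur[i] = i          # boundary: delete i characters
            else:
                cost = 0 if a[i - 1] == b[j - 1] else 1
                cur[i] = min(
                    prev2[i - 1] + cost,  # cell (i-1, j-1): match or substitute
                    prev1[i - 1] + 1,     # cell (i-1, j): delete from a
                    prev1[i] + 1,         # cell (i, j-1): insert into a
                )
        prev2, prev1 = prev1, cur

    return prev1[m]                 # final wavefront holds only cell (m, n)


if __name__ == "__main__":
    print(systolic_edit_distance("kitten", "sitting"))
```

The sequential loop takes m + n steps of up to min(m, n) + 1 cell updates each; with one element per character, each step would cost a single cell update, which is the source of the parallel speedup in this simplified model.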

Although the SSR paradigm stands by itself, it is instructive to simultaneously consider programming methods and applications. This thesis describes the principle of software fault detection for the automatic generation of fault-tolerant programs, an efficient alternative to rigid hardware methods. Additionally, this thesis introduces the New Systolic Language (NSL), a programming language for systolic co-processors based on data stream computation. With the aid of NSL, several systolic applications are examined in detail, in particular sequence comparison problems from the Human Genome Project. A comparison of B-SYS to existing parallel machines reveals its unequivocal superiority in its targeted area.

(complete text in pdf)