Tech Report CS-93-28

The Design and Analysis of Efficient Lossless Data Compression Systems

Paul Glor Howard

June 1993


Our thesis is that high compression efficiency for text and images can be obtained by using sophisticated statistical compression techniques, and that greatly increased speed can be achieved at only a small cost in compression efficiency. Our emphasis is on elegant design and mathematical as well as empirical analysis.

We analyze arithmetic coding as it is commonly implemented and show rigorously that almost no compression is lost in the implementation. We show that high-efficiency lossless compression of both text and grayscale images can be obtained by using appropriate models in conjunction with arithmetic coding. We introduce a four-component paradigm for lossless image compression and present two methods that give state-of-the-art compression efficiency. In the text-compression area, we give a small improvement on the preferred method in the literature.

We show that we can often obtain significantly improved throughput at the cost of slightly reduced compression. The extra speed comes from simplified coding and modeling. Coding is simplified by using prefix codes when arithmetic coding is not necessary, and by using a new practical version of arithmetic coding, called quasi-arithmetic coding, when the precision of arithmetic coding is needed. We simplify image modeling by using small prediction contexts and making plausible assumptions about the distributions of pixel intensity values. For text modeling we use self-organizing-list heuristics and low-precision statistics.

(complete text in pdf or gzipped postscript)