Tech Report CS-91-03

Analysis of Arithmetic Coding for Data Compression

Paul G. Howard and Jeffrey Scott Vitter

January 1991


Arithmetic coding, in conjunction with a suitable probabilistic model, can provide nearly optimal data compression. In this paper we analyze the effect that the model and the particular implementation of arithmetic coding have on the code length obtained. We show that adaptive models give the same code length as semi-adaptive decrementing models. Periodic scaling is often used in arithmetic coding implementations to reduce time and storage requirements; it also introduces a recency effect which can further affect compression. We introduce the concept of {\em weighted entropy} and use it to characterize in an elegant way the effect that periodic scaling has on the code length. We also give a rigorous proof that the coding effects of rounding scaled weights, using integer arithmetic, and encoding end-of-file are negligible.

(complete text in pdf)