Tech Report CS-92-17

Analysis of Arithmetic Coding for Data Compression

Paul G. Howard and Jeffrey Scott Vitter

April 1992


Arithmetic coding, in conjunction with a suitable probabilistic model, can provide nearly optimal data compression. In this article we analyze the effect that the model and the particular implementation of arithmetic coding have on the code length obtained. Periodlic scaling is often used in arithmetic coding implementations to reduce time and storage requirements; it also introduces a recency effect which can further affect compression. Our main contribution is introducing the concept of {\em weighted entropy} and using it to characterize in an elegant way the effect that periodic scaling has on the code length. We explain why and by how much scaling increases the code length for files with a homogeneous distribution of symbols, and we characterize the reduction in code length due to scaling for files exhibing locality of reference. We also give a rigorous proof that the coding effects of rounding scaled weights, using integer arithmetic, and encoding end-of-file are negligible.

{\em Index terms:} data compression, arithmetic coding, analysis of algorithms, adaptive modeling

(complete text in pdf)