Cache Rules Everything Around Me
Andy Pavlo, PhD Student (Data Management)
Historically, the internal architecture of database management systems (DBMS) has been predicated on the storage and management of data in heavily-encoded disk blocks and the use of an in-memory buffer pool as a cache for these blocks. The expense of managing disk-resident data has fostered a new class of DBMSs that put the entire database in main memory and thus have no buffer pool. These next-generation main memory DBMSs seek to overcome the performance limitations of traditional DBMSs with a different architecture. The fundamental problem with main memory DBMSs, however, is that this improved performance is only achievable when the database is smaller than the amount of physical memory available in the system. If memory is exceeded (or if it might be at some point in the future), then a user must either (1) provision new hardware and migrate their database to a larger cluster, or (2) fall back to a traditional disk-based system, with its inherent performance problems. Both approaches entail significant dislocation, and a better alternative is highly desirable.
To overcome the restriction that all data fit in main memory, in this talk I will present a new DBMS architecture that we call anti-caching. An anti-caching DBMS reverses the traditional hierarchy of disk-based systems: all data initially resides in memory, and when memory is exhausted, the least-recently accessed records are collected and written to disk. Unlike a traditional DBMS architecture, tuples do not reside in both places; each tuple is either in memory or in a disk block, but never in both at the same time. We implemented a prototype of our anti-caching proposal in the H-Store high-performance, main memory DBMS and ran benchmarks across a range of database sizes and read/write mixes. Our results show that when all data fits in main memory, the anti-caching DBMS is 4x faster than a well-tuned disk-based DBMS optionally fronted by a distributed main memory cache. As the size of the database increases, the anti-caching DBMS maintains a significant performance advantage over the disk-based systems. In fact, for larger data sizes, the performance advantage of anti-caching increases, up to 10x for a data size 2x larger than memory. Based on these results, we contend that for any online transaction processing application, regardless of size, our anti-caching architecture is preferable to traditional, disk-oriented systems.
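The core inversion described above (all tuples start in memory; cold tuples are evicted to disk and pulled back in on access, never living in both places) can be illustrated with a toy sketch. This is not H-Store's actual implementation, which evicts cold tuples in batched disk blocks; the class and method names here are hypothetical, and a plain LRU order stands in for the real system's access tracking.

```python
from collections import OrderedDict

class AntiCache:
    """Toy anti-caching store: each tuple lives in memory OR on
    disk, never both. Hypothetical sketch, not the H-Store code."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = OrderedDict()  # key -> value, LRU order (coldest first)
        self.disk = {}               # simulated disk: evicted tuples

    def read(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)  # mark as recently accessed
            return self.memory[key]
        # Miss: "un-evict" the tuple from disk back into memory.
        value = self.disk.pop(key)
        self._insert(key, value)
        return value

    def write(self, key, value):
        self.disk.pop(key, None)  # a tuple must never exist in both places
        self._insert(key, value)

    def _insert(self, key, value):
        self.memory[key] = value
        self.memory.move_to_end(key)
        while len(self.memory) > self.capacity:
            # Evict the least-recently accessed tuple to disk.
            cold_key, cold_value = self.memory.popitem(last=False)
            self.disk[cold_key] = cold_value

# With capacity 2, writing a third tuple evicts the coldest one;
# reading it later moves it back into memory (and evicts another).
store = AntiCache(capacity=2)
store.write("a", 1)
store.write("b", 2)
store.write("c", 3)   # "a" is coldest, so it is anti-cached to disk
```

Reading `"a"` afterwards pulls it back into memory and pushes out the new coldest tuple, `"b"`, so memory and disk always partition the database with no tuple duplicated.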
Andy Pavlo is a Ph.D. candidate at Brown University working on database management systems under the circumspect guidance of Stan Zdonik and Michael Stonebraker. His most recent work focuses on the research and development of the H-Store distributed transaction processing system (since commercialized as VoltDB). Before this, he was a systems programmer for the Condor Project at the University of Wisconsin-Madison with Miron Livny.