Historically, the internal architecture of database management systems (DBMSs) has been predicated on storing data in heavily encoded disk blocks and using an in-memory buffer pool as a cache. The expense of managing disk-resident data has fostered a new class of high-performance DBMSs that store the entire database in main memory. The fundamental problem with these systems, however, is that their improved performance is achievable only when the database is smaller than the amount of physical memory available in the system.
In this talk, I will present a new DBMS architecture, called "anti-caching," that reverses the traditional hierarchy of disk-oriented systems to overcome this limitation. In an anti-caching system, all data initially resides in memory; when memory is exhausted, the least recently accessed records are collected and written to disk. We have implemented a prototype of our anti-caching proposal in the H-Store DBMS and compared it to a well-tuned disk-based DBMS optionally fronted by a distributed main-memory cache. Our experiments show that as the size of the database increases, the anti-caching DBMS maintains a significant performance advantage over the disk-based systems. Based on these results, we contend that our anti-caching architecture is preferable to traditional, disk-oriented systems for any front-end application.
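The eviction idea described above can be illustrated with a minimal Python sketch. It is not the H-Store implementation: the class name `AntiCacheStore`, the `max_in_memory` record budget, and the use of a plain dictionary as a stand-in for disk are all assumptions made for illustration.

```python
from collections import OrderedDict


class AntiCacheStore:
    """Toy key-value store where memory is primary and disk holds cold records."""

    def __init__(self, max_in_memory):
        self.max_in_memory = max_in_memory  # hypothetical memory budget, in records
        self.memory = OrderedDict()         # ordered from coldest to hottest
        self.disk = {}                      # stand-in for an on-disk store (assumption)

    def put(self, key, value):
        # A record lives in exactly one place: memory or disk, never both.
        self.disk.pop(key, None)
        self.memory[key] = value
        self.memory.move_to_end(key)        # mark as most recently accessed
        self._evict_if_needed()

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)    # refresh recency on a hit
            return self.memory[key]
        if key in self.disk:
            # Cold record: bring it back into memory, possibly evicting another.
            value = self.disk.pop(key)
            self.put(key, value)
            return value
        raise KeyError(key)

    def _evict_if_needed(self):
        # When memory is exhausted, move the least recently accessed
        # records out to "disk" until the budget is respected again.
        while len(self.memory) > self.max_in_memory:
            cold_key, cold_value = self.memory.popitem(last=False)
            self.disk[cold_key] = cold_value


store = AntiCacheStore(max_in_memory=3)
for k in "abcd":
    store.put(k, k.upper())
print(sorted(store.disk))   # the coldest record has been evicted
print(store.get("a"))       # reading it pulls it back into memory
```

Unlike a buffer pool, where disk holds the authoritative copy and memory caches part of it, each record here has a single home at any time. The sketch does not model how the prototype groups evicted records or handles transactions that touch evicted data.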
Andy Pavlo is an Assistant Professor in the Computer Science Department at Carnegie Mellon University. His research interests are in database management systems. He also used to raise clams.