"My Database System is the Only Thing I Can Trust"
Tuesday, August 13, 2013 at 2:30 P.M.
Room 368 (CIT 3rd Floor)
An emerging class of distributed database management systems (DBMSs), known as NewSQL, provides the same scalable performance as NoSQL systems while maintaining the consistency guarantees of a traditional DBMS. These NewSQL systems achieve high throughput rates for data-intensive applications by storing their databases in a cluster of main memory partitions. This partitioning enables them to eschew much of the legacy, disk-oriented architecture that slows down traditional systems, such as heavyweight concurrency control algorithms, thereby allowing for the efficient execution of single-node transactions. But many applications cannot be partitioned such that all of their transactions execute in this manner; these multi-node transactions require expensive coordination that inhibits performance. The DBMS therefore needs advice on what transactions will do before they start running in order to overcome these impediments. Without this information, a NewSQL DBMS will scale no better than a traditional DBMS.
But people always give me bad advice: parents, religious figures, advisors, and even David DeWitt. That means the only thing I can trust to give the right information is the DBMS itself. In this talk, I will present our research on integrating machine learning techniques to improve the performance of fast database systems. I will discuss my work on the H-Store parallel, main memory transaction processing system. I will first describe the Houdini framework that uses Markov models to predict transactions' behaviors to allow a DBMS to selectively enable runtime optimizations. I will then present Hermes, a method for the deterministic execution of speculative transactions and queries whenever a DBMS stalls because of distributed transactions. Together, these projects enable H-Store to support transactional workloads that are beyond what single-node systems can handle.
Host: Stan Zdonik