Thesis Defense


"Transactional Streaming: Managing Push-Based Workloads with Shared Mutable State in a Distributed Setting"

John Meehan

Tuesday, December 11, 2018, at 2:00 P.M.

Room 368 (CIT 3rd Floor)

In the past, streaming data management systems (SDMSs) have eschewed transactional management of shared mutable state in favor of low-latency processing of high-velocity data. Streaming workloads involving ACID state management have required custom solutions built from a combination of database systems, which complicates scalability by adding blocking communication and severely impedes performance. This dissertation presents a solution: a transactional streaming system, S-Store. Built on a main-memory OLTP system, S-Store implements a novel transaction model and gives best-in-class performance on hybrid streaming and OLTP workloads while maintaining the correctness guarantees of both worlds. To achieve optimal performance, S-Store distributes state and processing across a cluster of machines. This requires an implementation that solves the unique system design and partitioning challenges that arise from dataflow state management. This dissertation describes heuristics for optimizing two overarching workload categories: those that use data parallelism and those that use pipeline parallelism. Transactional streaming has opened opportunities for new solutions to real-world problems, particularly in streaming data ingestion. Together, these contributions provide a foundation for a new class of streaming research in which ACID state management is an essential feature rather than an afterthought.

Host: Professor Stan Zdonik