Thesis Proposal


"Balancing Competing Bottlenecks for In-Memory Data Analytics Systems"

Andrew Crotty

Friday, March 23, 2018 at 3:00 P.M.

Room 368 (CIT 3rd Floor)

For decades, disk I/O was the single overwhelming bottleneck for a wide variety of data management problems, but recent advancements in hardware have produced much more balanced computing environments where the bottleneck can shift dramatically based on even minor variations in workload characteristics. For most state-of-the-art analytics systems, the two primary competing bottlenecks are CPU throughput and main memory bandwidth.

This thesis seeks to address the competing-bottleneck challenge for analytics systems in three ways. First, we investigate new code generation techniques that produce streamlined code highly specialized for individual workloads, applying targeted low-level optimizations on a case-by-case basis by introspecting user-defined functions. Second, we explore a novel query processing paradigm based on bit vectors, designed to produce generated code that better leverages the features of modern hardware (e.g., SIMD). Finally, we propose to study when and how code generation can best be used in these scenarios to improve performance without introducing unnecessary complexity.

Host: Professor Tim Kraska