Thesis Defense


"Code Generation for In-Memory Data Analytics"

Andrew Crotty

Friday, December 7, 2018 at 2:00 P.M.

Lubrano Conference Room (CIT 4th Floor)

Recent advancements in hardware have caused a shift toward purely in-memory data processing, forcing a complete redesign of the high-overhead abstractions (e.g., Volcano-style iterators) at the core of traditional, disk-based systems. One popular replacement for these outdated query processing models is code generation, which refers to the process of generating query-specific, machine-executable code to evaluate a query. In general, code generation is highly efficient and enables a variety of low-level optimizations, yet it comes with its own set of drawbacks.

This dissertation provides an in-depth exploration of code generation for in-memory data analytics. First, we present two novel code generation strategies that jointly consider properties of the operators, data, and underlying hardware to significantly improve performance over existing code generation approaches. Then, to mitigate the main disadvantages of code generation, we propose an alternative query processing model that achieves comparable performance while avoiding these downsides.

Host: Professor Tim Kraska