Assistant Professor Rodrigo Fonseca of Brown University’s Computer Science Department has just won a National Science Foundation CAREER Award for his work on understanding the performance of distributed systems through causal tracing. He joins multiple previous Brown CS faculty winners, including (most recently) Erik Sudderth, James Hays, Ben Raphael, and Chad Jenkins. CAREER Awards are the most prestigious awards given by the National Science Foundation in support of outstanding junior faculty teacher-scholars who excel at research, education, and integration of the two within the context of an organizational mission.
“This project,” says Fonseca, “was inspired by the fact that society increasingly depends on shared software systems that are large and decentralized, with many components that interact in complex and subtle ways. These include financial and banking systems, Web and cloud services, airline reservations, and 'big data' and scientific computing, to name a few. Despite their success, assessing their performance is very difficult. What causes failures? How do we uncover dependencies among their components or provide performance guarantees?”
He proposes a “Tracing Plane” to tackle these problems: a pervasive infrastructure that collects causal information from the execution of a distributed system to facilitate the efficient deployment of analytics and diagnostic tasks. In addition, by aggregating information about tasks in the system across all components in a coherent way, it enables resource management policies that act locally and in real time but with global knowledge, something that can’t yet be done.
There’s also a core educational aspect, Fonseca explains. “In addition to engaging undergraduate and graduate students, we’re going to offer a Tracing Plane Toolkit that instructors will use to teach distributed systems concepts. It’s going to have a direct effect on the education of future computer scientists by demystifying the complex systems we use every day, which could really help promote interest in CS and STEM fields.”
"This is so exciting to me," he says. "This award enables me to work on advancing how large-scale distributed systems are built, with pervasive visibility as a central aspect from the ground up, increasing our understanding of how they work, and how they fail. Ultimately, we’ll provide tools and methods for reliability and predictability, which will have a large and long-lasting potential impact."