Justin DeBrabant

Research

Broadly, my interests lie in database systems, and even more broadly in large data and how it can be managed. As real-world data sets continue to grow and our society becomes more data-centric, I believe the most interesting research questions of the future will concern new ways to manage that data and put it to meaningful use.

My current research is in new architectures for high-throughput NewSQL systems. In particular, I have worked on an architecture called anti-caching, which allows main-memory databases to handle datasets larger than main memory without losing their performance advantages over disk-based systems. Main-memory database systems such as H-Store are a perfect example of adapting system designs to changing hardware configurations. I believe strongly in this trend: as systems researchers, we should always be re-evaluating current design choices as assumptions about the underlying hardware become invalid. To this end, my current work involves extending anti-caching to take advantage of next-generation non-volatile memory.

I have also done work in big data visualization and parallel data mining algorithms. Below is a sample of past and present research projects that I have been involved with along with representative publications.


Anti-Caching

This project addresses the limitation of main-memory databases to datasets smaller than the available memory in the system. We have proposed a novel extension to the main-memory shared-nothing database architecture to allow larger-than-memory datasets, and show that for skewed workloads, this architecture significantly outperforms both a traditional disk-based architecture and a disk-based architecture fronted by a main-memory distributed cache such as memcached.
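As a rough illustration of the idea, the sketch below treats memory as the primary store and spills the least recently used tuples to a secondary store once a memory budget is exceeded. It is a toy, single-threaded model and not the H-Store implementation: the class name `AntiCacheStore`, the `max_in_memory` budget, the plain dictionary standing in for disk, and the synchronous fetch on a cold access are all assumptions made for the example.

```python
from collections import OrderedDict


class AntiCacheStore:
    """Toy store: memory is primary storage, cold tuples spill to 'disk'."""

    def __init__(self, max_in_memory):
        self.max_in_memory = max_in_memory  # hypothetical budget, in tuples
        self.hot = OrderedDict()  # in-memory tuples, least recently used first
        self.cold = {}            # stand-in for an on-disk store (assumption)
        self.fetches = 0          # how many times a cold tuple was read back

    def put(self, key, value):
        self.cold.pop(key, None)  # a tuple lives in exactly one place
        self.hot[key] = value
        self.hot.move_to_end(key)  # mark as most recently used
        self._evict_if_needed()

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        if key in self.cold:
            # Cold access: bring the tuple back into memory, then make room.
            value = self.cold.pop(key)
            self.fetches += 1
            self.hot[key] = value
            self._evict_if_needed()
            return value
        raise KeyError(key)

    def _evict_if_needed(self):
        # Move the coldest tuples out of memory until we are within budget.
        while len(self.hot) > self.max_in_memory:
            cold_key, cold_value = self.hot.popitem(last=False)
            self.cold[cold_key] = cold_value


store = AntiCacheStore(max_in_memory=2)
for k in "abc":
    store.put(k, k.upper())
print(sorted(store.hot), sorted(store.cold))
```

Under a skewed workload most accesses hit `hot`, so the cost of the secondary store is paid only for the rarely touched tuples; unlike a cache in front of a disk-based system, no tuple is ever stored in both places.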

Currently, we are exploring the adaptation of anti-caching for use with next-generation non-volatile memory. Our goal is to show that anti-caching can take full advantage of a new memory hierarchy that includes non-volatile memory and is therefore the best architecture for high-throughput OLTP workloads on this new hardware.

J. DeBrabant, A. Pavlo, S. Tu, M. Stonebraker, S. Zdonik. Anti-Caching: A New Approach to Database System Architecture. VLDB 2013 (PDF)


Big Data Visualization

The goal of this project is to explore techniques that aid in the visualization of big data. In particular, we aim to make exploratory data analysis more interactive through aggressive prefetching and caching of data. Using predictive models to learn and predict common query patterns, we have constructed a predictive prefetching and caching technique that can significantly reduce query latencies for exploratory workloads. We also explore the use of predictive modeling with a query recommendation framework to help users find interesting but unexplored aspects of datasets.
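To make the prefetching idea concrete, here is a minimal sketch that learns which query tends to follow which and suggests candidates to prefetch. The choice of a first-order Markov model is an assumption for illustration only; the text above does not say which predictive models the project uses, and `MarkovPrefetcher`, `observe`, and `predict` are hypothetical names.

```python
from collections import Counter, defaultdict


class MarkovPrefetcher:
    """Counts query-to-query transitions and predicts likely next queries."""

    def __init__(self):
        self.transitions = defaultdict(Counter)  # query -> Counter of successors
        self.previous = None                     # last query seen in the session

    def observe(self, query):
        # Record that `query` followed the previous query in this session.
        if self.previous is not None:
            self.transitions[self.previous][query] += 1
        self.previous = query

    def predict(self, query, k=1):
        # Return up to k successors of `query`, most frequent first.
        successors = self.transitions.get(query, Counter())
        return [q for q, _ in successors.most_common(k)]


prefetcher = MarkovPrefetcher()
for q in ["zoom_in", "pan_left", "zoom_in", "pan_left", "zoom_in", "zoom_out"]:
    prefetcher.observe(q)
print(prefetcher.predict("zoom_in", k=2))
```

A caller would issue the predicted queries in the background and keep their results in a cache, so that the user's next request is often already answered when it arrives.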

U. Cetintemel, M. Cherniack, J. DeBrabant, Y. Diao, K. Dimitriadou, A. Kalinin, O. Papaemmanouil, S. Zdonik. Query Steering for Interactive Data Exploration. CIDR 2013 (PDF)

J. DeBrabant, U. Cetintemel, S. Zdonik. SeerDB: Predictive Prefetching and Caching in Large Scientific and Analytic Datasets. Master's Thesis.


Parallel Itemset Mining

As transactional datasets continue to grow in size, data mining techniques designed for a single machine will no longer suffice; distributed algorithms are needed. This project explores parallel association rule mining with MapReduce. We propose an approximate distributed algorithm that mines random samples of the original dataset and provides confidence bounds on the results.
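The sampling idea can be sketched on a single machine as follows: draw several random samples of the transactions, mine each sample independently (the step that would be parallelized), and combine the per-sample results. This is a simplified illustration, not the published algorithm: the majority-vote filter, the median as the combined estimate, the brute-force counting of itemsets up to size two, and all function and parameter names are assumptions, and no confidence bounds are computed here.

```python
import random
from collections import Counter
from itertools import combinations
from statistics import median


def frequent_itemsets(transactions, min_freq, max_size=2):
    """Exact frequencies of itemsets up to max_size, by brute-force counting."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_freq}


def approximate_frequent_itemsets(transactions, min_freq, num_samples,
                                  sample_size, seed=0):
    """Mine several random samples and combine the per-sample results."""
    rng = random.Random(seed)
    per_sample = []
    for _ in range(num_samples):
        # Sample transactions uniformly at random, with replacement.
        sample = [rng.choice(transactions) for _ in range(sample_size)]
        per_sample.append(frequent_itemsets(sample, min_freq))
    # Keep itemsets that are frequent in a majority of the samples and
    # report the median of their per-sample frequencies.
    votes = Counter(s for result in per_sample for s in result)
    return {
        s: median(r[s] for r in per_sample if s in r)
        for s, v in votes.items()
        if v > num_samples / 2
    }


data = [["a", "b"]] * 50 + [["a", "b", "c"]] * 50
result = approximate_frequent_itemsets(data, min_freq=0.9, num_samples=5,
                                       sample_size=100)
print(sorted(result))
```

Each sample is small enough to mine on one machine, which is what makes the per-sample step a natural fit for independent MapReduce tasks.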

Since publication, this work has become part of SAMOA, an open-source platform for mining big data streams developed at Yahoo.

M. Riondato, J. DeBrabant, R. Fonseca, E. Upfal. PARMA: A Parallel Randomized Algorithm for Approximate Association Rules Mining in MapReduce. CIKM 2012 (PDF)


Professional Activities

Co-Founder, DataScale Consulting, December 2011 - present (company website)

Adjunct Lecturer, Advanced Distributed Systems, University of Massachusetts Dartmouth, Fall 2013 (course website)

Teaching Assistant, Advanced Topics in Data Management: In Pursuit of Big Data, Brown University, Spring 2013 (course website)

Teaching Assistant, Computer Networks, Brown University, Spring 2012 (course website)

Teaching Assistant, Database Management Systems, Brown University, Fall 2011

Teaching Assistant, Intro to Object-Oriented Programming, Florida State University, Fall 2008 - Spring 2010

Invited Talks

"Anti-Caching with NVM", Intel Big Data ISTC, December 2013

"Beyond Main Memory: Anti-Caching in Main Memory Database Systems", Fidelity Investments, December 2013

"Beyond Main Memory: Anti-Caching in Main Memory Database Systems", 2013 Amazon Ph.D. Symposium, November 2013

"Anti-Caching in Main Memory Database Systems", VoltDB Inc., February 2013

"The Traditional Wisdom Is All Wrong", NEDB, February 2013

"Profile-Driven Prefetching and Caching for Interactive Big Data Visualization", Intel Big Data ISTC, January 2013

"data > memory", Harvard-Brown Systems Colloquium, December 2012


Relevant Coursework

Advanced Topics in Data Management Systems: NewSQL*, Spring 2012, Professor Stan Zdonik, Brown University

Data Management in Data-Intensive Science*, Fall 2011, Professor Ugur Cetintemel, Brown University

Special Topics in Networking and Distributed Systems*, Fall 2011, Professor Rodrigo Fonseca, Brown University

Computer Networks, Spring 2011, Professor Rodrigo Fonseca, Brown University

Human-in-the-Loop Data Management*, Spring 2011, Professor Ugur Cetintemel, Brown University

Theory and Structure of Database Systems*, Fall 2009, Professor Feifei Li, Florida State University

Intro to Operating Systems, Fall 2009, Professor Andy Wang, Florida State University

Intro to Database Systems, Spring 2008, Professor Feifei Li, Florida State University

* denotes graduate-level coursework