Overview

 

Applications that deal with continuous data streams are becoming increasingly important, primarily due to the emergence of sensors and similar small-scale embedded computing devices that continuously produce large volumes of data obtained from their environment. Many stream-based applications, such as environmental monitoring, surveillance, tracking, plant maintenance, and telecommunications data management, must handle huge volumes of continuous data streams arriving in real time, function efficiently under uncertainty in the data, and deliver results in a timely fashion. Existing database management systems are inherently ill-equipped to support such tasks, mainly because they are designed on the implicit assumption that the system is a passive data repository storing a large but finite collection of data elements, which are processed in response to human-initiated queries. Moreover, many of these applications need to ask questions that require comparing and combining stored, historical data with real-time streaming data. The primary goal of the Aurora project is to build a single infrastructure that can efficiently and seamlessly meet the requirements of such demanding applications. To this end, we are critically rethinking many existing data management and processing issues, as well as developing new proactive data processing concepts and techniques.

 

Aurora addresses three broad application types in a single, unique framework:

  1. Real-time monitoring applications continuously monitor the present state of the world and are, thus, interested in the most current data as it arrives from the environment. In these applications, there is little or no need (or time) to store such data.

  2. Archival applications are typically interested in the past. They are primarily concerned with processing large amounts of finite data stored in a time-series repository.

  3. Spanning applications involve both the present and past states of the world, requiring combining and comparing incoming live data and stored historical data. These applications are the most demanding as there is a need to balance real-time requirements with efficient processing of large amounts of disk-resident data.
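
As a rough illustration of the spanning case, the sketch below compares each incoming live reading against a baseline computed from stored historical data. It is a minimal example under stated assumptions, not Aurora code: the function name `flag_anomalies`, the sample data, and the three-standard-deviation threshold are all hypothetical.

```python
from statistics import mean, pstdev


def flag_anomalies(history, live_stream, threshold=3.0):
    """Yield (reading, is_anomalous) for each live reading.

    `history` stands in for stored historical data; `live_stream` is any
    iterable that produces readings as they arrive. Both are assumptions
    made for illustration only.
    """
    baseline = mean(history)   # computed once from the stored data
    spread = pstdev(history)   # population standard deviation of the history
    for reading in live_stream:
        deviation = abs(reading - baseline)
        # With no spread in the history, nothing is flagged.
        yield reading, spread > 0 and deviation > threshold * spread


# Hypothetical stored history and live readings.
stored_history = [20.0, 21.0, 19.0, 20.0, 20.0]
live_readings = iter([20.5, 35.0, 19.5])

results = list(flag_anomalies(stored_history, live_readings))
for reading, is_anomalous in results:
    print(reading, "ANOMALY" if is_anomalous else "ok")
```

Here the historical baseline is computed once up front. A real spanning application would also have to decide how often to refresh it from disk, which is where the tension between real-time requirements and disk-resident data arises.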

To provide efficient and effective data management and processing support for these applications, we are revisiting all aspects of database design and implementation, from query optimization to user interfaces. Our current research focuses on real-time data processing issues, such as QoS- and memory-aware operator scheduling, semantic load shedding for coping with transient spikes in incoming data rates, and novel hybrid data storage organizations that seamlessly and efficiently combine pull- and push-based data processing. Our VLDB'02 paper provides an overview of the basic Aurora architecture and our current research directions.
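
To make the idea of semantic load shedding more concrete, the sketch below drops the least useful tuples, judged by their values, when a burst exceeds what can be processed. This is an illustration only and not the algorithm used in Aurora: the function `shed_load`, the `importance` utility function, and the sample data are hypothetical.

```python
import heapq


def shed_load(tuples, capacity, utility):
    """Keep at most `capacity` tuples, dropping those with the lowest utility.

    Unlike random dropping, the decision depends on tuple content through
    the caller-supplied `utility` function. Arrival order is preserved
    among the tuples that are kept.
    """
    if len(tuples) <= capacity:
        return list(tuples)
    # Indices of the `capacity` most useful tuples.
    keep = heapq.nlargest(
        capacity, range(len(tuples)), key=lambda i: utility(tuples[i])
    )
    return [tuples[i] for i in sorted(keep)]


# Hypothetical burst of temperature readings; readings far from the
# nominal 20.0 are assumed to matter most to the application.
burst = [21.0, 48.5, 20.5, 19.0, 55.0, 22.0]


def importance(reading):
    return abs(reading - 20.0)


kept = shed_load(burst, 3, importance)
print(kept)
```

The utility function is the "semantic" part: it encodes which values the application cares about, so that shedding under overload degrades result quality as little as possible.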

 

We currently have an Aurora prototype that provides basic data-stream processing functionality. The prototype consists of a Java-based GUI, a catalog manager, a storage manager, a real-time scheduler, and several primitive stream processing operators. To learn about the current status of our implementation efforts, see our demonstration proposal.

 

We are also in the process of designing a scalable, distributed Aurora, named Aurora* (in homage to R*). Our primary goal in Aurora* is to achieve high scalability and availability for distributed stream processing applications. To this end, we have started to develop lightweight, decentralized mechanisms and protocols for dynamic continuous introspection and optimization; load sharing and shedding; and failure detection and recovery. An overview of the design of Aurora* and the challenges we have started to tackle can be found in our CIDR'03 paper.

 

The Aurora project is a collaboration between Brandeis University, Brown University, and MIT.


The Aurora project has been superseded by the Borealis project. Borealis is a distributed multi-processor version of Aurora.

No support is available for the Aurora source code. We expect to post an alpha release of Borealis in January 2005, which will be supported. Here is the final snapshot of the Aurora source: aurora.tar.gz

 


Last modified by Bradley Berg, 12/08/2004