Borealis
Distributed Stream Processing Engine

Software

Borealis is a distributed stream processing engine that was developed at Brandeis University, Brown University, and MIT. Borealis builds on our previous efforts in the area of stream processing: Aurora and Medusa.

The Borealis project is no longer an active research project. The last two releases of Borealis are:
The Borealis software runs on Linux x86-based computers. The instructions to install Borealis have details about system compatibility and software installation.

The Borealis Application Programmer's Guide describes how to write application programs using Borealis. The Borealis Developer's Guide and the paper, A Distributed Catalog for the Borealis Stream Processing Engine, describe the distributed operation of the Borealis system software.

The current version of Borealis includes the following modules:
  • A Stream Processing Engine that provides the core low-latency stream processing functionality with a rich set of stream-oriented operators.


  • A Coordinator that deploys a network of cooperating Borealis stream processing engines, distributes query processing across multiple machines, and maintains integrity and correct operation as the network is dynamically mutated.


  • A Load Manager that monitors run-time load and dynamically moves operators across machines to improve performance.


  • A Load Shedder that detects and eliminates CPU overload from multiple machines by selectively dropping tuples in a coordinated fashion.


  • A Fault Tolerance module that runs redundant query replicas to deal with various failure modes and achieve high availability.


  • A Revision Processing mechanism that processes corrections of erroneous input tuples by generating corrections of previously output query results.
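
As an illustration of the revision processing idea only, the following minimal Python sketch shows a tumbling-window sum that keeps its inputs so it can correct a result it has already emitted. It is not taken from the Borealis source: the names RevisableWindowSum, Output, insert, close_window, and revise are hypothetical, and keeping every input tuple in memory is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Output:
    kind: str      # "insert" for a first result, "revision" for a correction
    window: int    # index of the tumbling window the result belongs to
    total: float   # sum of the values in that window


class RevisableWindowSum:
    """Hypothetical tumbling-window sum that can revise emitted results."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.values = {}   # tuple id -> (timestamp, value), kept for replay
        self.emitted = {}  # window index -> last total sent downstream

    def _total(self, window):
        return sum(value for ts, value in self.values.values()
                   if ts // self.window_size == window)

    def insert(self, tuple_id, timestamp, value):
        self.values[tuple_id] = (timestamp, value)

    def close_window(self, window):
        # Emit the first (possibly imperfect) result for this window.
        total = self._total(window)
        self.emitted[window] = total
        return Output("insert", window, total)

    def revise(self, tuple_id, new_value):
        # Apply a correction to an input tuple seen earlier.
        timestamp, _ = self.values[tuple_id]
        self.values[tuple_id] = (timestamp, new_value)
        window = timestamp // self.window_size
        if window not in self.emitted:
            return None  # nothing was output yet, so nothing to correct
        total = self._total(window)
        if total == self.emitted[window]:
            return None  # the correction does not change the result
        self.emitted[window] = total
        return Output("revision", window, total)


op = RevisableWindowSum(window_size=10)
op.insert("a", 1, 100.0)
op.insert("b", 5, 50.0)
first = op.close_window(0)   # Output(kind="insert", window=0, total=150.0)
fix = op.revise("b", 55.0)   # Output(kind="revision", window=0, total=155.0)
```

The sketch recomputes the affected window from stored inputs, which is the simplest choice; a real engine would have to bound how much history it retains.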
The release also includes the following tools:
  • A Graphical Query Editor that simplifies the composition of streaming queries using a boxes-and-arrows interface. Queries may also be written using a textual (XML-based) query definition language.


  • A System Visualizer that dynamically displays the Borealis network topology and various run-time statistics.


  • A Stream Connection Generator to generate code to marshal data between applications and Borealis.

Copyright 2006, Brandeis University, Brown University, and Massachusetts Institute of Technology; all rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.


  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.


  • Neither Brandeis University, Brown University, MIT names, nor the names of individual contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Overview

Over the last several years, a great deal of progress has been made in the area of stream processing engines (SPEs). Several groups have developed working prototypes and many papers have been published on detailed aspects of the technology such as stream-oriented languages, basic resource management, and resource-constrained one-pass query processing. While this work is an important first step, fundamental mismatches remain between the requirements of many streaming applications and the capabilities of first-generation systems.

In the Borealis project, we identify and address the following shortcomings of current stream processing techniques:
  • Distributed, highly-available operation: Distributed operation across a cluster of commodity machines is the most economical way of achieving high scalability and availability, two key design concerns common to many streaming domains. Furthermore, typical streaming workloads exhibit significant bursts and variations over all time-scales, requiring the ability to dynamically distribute load and deal with transient overloads. We are investigating novel distributed architectures and algorithms customized on the basis of the requirements and characteristics of stream processing applications. Examples include load distribution algorithms that are tolerant of load spikes and high availability approaches that enable parallel, low latency recovery. 


  • Dynamic revision of query results: In many real-world streams, corrections or updates to previously processed data are available only after the fact. For instance, many popular data streams, such as the Reuters stock market feed, often include so-called revision records, which allow the feed originator to correct errors in previously reported data. Furthermore, stream sources (such as sensors), as well as their connectivity, can be highly volatile and unpredictable. As a result, data may arrive late and miss its processing window, or may be ignored temporarily due to an overload situation. In all these cases, applications are forced to live with imperfect results, unless the system has a means to revise its processing and results to take into account newly available data or updates.


  • Dynamic query modification: In many stream processing applications, it is desirable to change certain attributes of the query at run time. For example, in the financial services domain, traders typically wish to be alerted to interesting events, where the definition of "interesting" (i.e., the corresponding filter predicate) varies based on current context and results. In network monitoring, the system may want to obtain more precise results on a specific subnetwork if there are signs of a potential Denial-of-Service attack. Finally, in a military stream application that Mitre explained to us, they wish to switch to a "cheaper" query when the system is overloaded. For the first two applications, it is sufficient to simply alter the operator parameters (e.g., window size, filter predicate), whereas the last one calls for altering the operators that compose the running query.


  • Flexible and highly-scalable optimization: Currently, commercial stream processing applications are popular in industrial process control, financial services, and network monitoring. Here we see a server-heavy optimization problem: the key challenge is to process high-volume data streams on a collection of resource-rich "beefy" servers. Over the horizon, we see a very large number of applications of wireless sensor technology (e.g., RFID in retail applications, cell phone services). Here, there is a sensor-heavy optimization problem: the key challenges revolve around extracting and processing sensor data from a network of resource-constrained "tiny" devices. Further over the horizon, we expect sensor networks to become faster and increase in processing power. In this case the optimization problem becomes more balanced, being both sensor-heavy and server-heavy. Thus, there will be a need for a more flexible optimization structure that can deal with a very large number of devices and perform sensor-heavy, server-heavy resource management and optimization.
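
The two kinds of dynamic query modification described in the list above (altering an operator parameter versus altering the operators that compose the query) can be sketched roughly as follows. This is a minimal Python illustration under our own assumptions, not the Borealis API: Filter, SampleEveryNth, Query, set_predicate, and replace_operators are hypothetical names, and tuples are modelled as plain dictionaries.

```python
class Filter:
    """Hypothetical filter operator whose predicate can change at run time."""

    def __init__(self, predicate):
        self.predicate = predicate

    def set_predicate(self, predicate):
        # Parameter change: the operator stays in place in the query.
        self.predicate = predicate

    def process(self, tup):
        return [tup] if self.predicate(tup) else []


class SampleEveryNth:
    """Hypothetical operator that cheapens a query by keeping every n-th tuple."""

    def __init__(self, n):
        self.n = n
        self.seen = 0

    def process(self, tup):
        self.seen += 1
        return [tup] if self.seen % self.n == 0 else []


class Query:
    """A linear chain of operators; each tuple is pushed through in order."""

    def __init__(self, operators):
        self.operators = list(operators)

    def replace_operators(self, operators):
        # Structural change: swap the operators that compose the query.
        self.operators = list(operators)

    def push(self, tup):
        batch = [tup]
        for op in self.operators:
            batch = [out for t in batch for out in op.process(t)]
        return batch


alert = Filter(lambda t: t["price"] > 100)
query = Query([alert])
tick = {"symbol": "X", "price": 90}

query.push(tick)                                   # [] (not "interesting" yet)
alert.set_predicate(lambda t: t["price"] > 80)     # redefine "interesting"
query.push(tick)                                   # [tick]
query.replace_operators([SampleEveryNth(2), alert])  # switch to a cheaper query
```

Changing the predicate leaves the query graph untouched, whereas replacing the operator list corresponds to the overload case in which a cheaper query is substituted for the running one.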

Publications
   Demonstrations:

People
   Brandeis:
   Brown:
   MIT:

Acknowledgements
This project is based upon work supported by the National Science Foundation under Grant Nos. 0325703 and 0086057. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.