A continuous MapReduce framework for stream-oriented applications
The Continuous-MapReduce (C-MR) framework is a parallelized stream processing engine that is being developed at Brown University.
C-MR supports low-latency stream processing using the familiar MapReduce interface with the addition of integrated window semantics. In keeping with the philosophy of MapReduce, C-MR abstracts away the complexities of parallelized data stream management and windowing for complex workflows of MapReduce jobs.
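To make the idea of MapReduce with window semantics concrete, here is a minimal Python sketch. It is not C-MR's API: the names `map_fn`, `reduce_fn`, and `windowed_map_reduce`, the tumbling-window scheme, and the use of a thread pool are all illustrative assumptions. It runs over a finite list rather than an unbounded stream, and shows only the general pattern: map in parallel, restore timestamp order, then reduce per window.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, as_completed


def map_fn(record):
    """Hypothetical user map: (timestamp, text) -> [(timestamp, word, 1), ...]."""
    ts, text = record
    return [(ts, word, 1) for word in text.split()]


def reduce_fn(key, values):
    """Hypothetical user reduce: sum the values for one key in one window."""
    return key, sum(values)


def windowed_map_reduce(stream, window_size, workers=4):
    """Map records in parallel, then reduce per tumbling window.

    Windows are defined over application timestamps: a record with
    timestamp ts belongs to window ts // window_size (an assumption
    made for this sketch).
    """
    mapped = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(map_fn, rec) for rec in stream]
        # as_completed yields in completion order, not stream order,
        # so parallel mapping can scramble the temporal order.
        for fut in as_completed(futures):
            mapped.extend(fut.result())

    # Restore temporal order before windowed aggregation. A sum does not
    # depend on order, but an order-sensitive reduce would.
    mapped.sort(key=lambda item: item[0])

    windows = defaultdict(lambda: defaultdict(list))
    for ts, key, value in mapped:
        windows[ts // window_size][key].append(value)

    return {
        w: dict(reduce_fn(k, vs) for k, vs in groups.items())
        for w, groups in sorted(windows.items())
    }


stream = [(0, "a b"), (1, "a"), (5, "b b"), (6, "a")]
result = windowed_map_reduce(stream, window_size=5)
```

The user writes only `map_fn` and `reduce_fn`; window assignment and reordering happen inside the framework function, which mirrors the abstraction described above.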
To support the continuous application of MapReduce jobs to data streams over parallel hardware platforms, it is necessary to provide the following key features and functionalities:
- Few or no changes to the standard MapReduce programming model
- Support for complex workflows of MapReduce jobs
- Automatic window management to maintain stream order in the midst of parallelized operations
- A workflow-wide, latency-oriented scheduler that can effectively utilize the available parallel resources to deal with time-varying load and spikes common in real-world streams
Our work describes the design and implementation of Continuous-MapReduce. We make the following high-level contributions:
- Transparently augmenting the MapReduce processing model
to support windows for parallel stream processing. This includes automatic
windowed stream management, ensuring the preservation of temporal order after
partitioned stream processing and prior to temporal aggregation, and defining
windows over application timestamps, all without changing the basic
MapReduce programming model and semantics.
- Effectively leveraging heterogeneous parallel computing
nodes. We provide the design and implementation of a generic, parallel stream
processing system that can effectively utilize the available computing
resources, such as CPUs and GPUs, with potentially heterogeneous capabilities.
- Supporting flexible and dynamic workflow scheduling to
deal with variations in load and resource availability. To this end, we
present a progressive scheduler, designed specifically for generic computing
nodes, which incrementally and continuously transitions between scheduling
policies given the status of system resources. For example, under memory stress,
a system that normally uses a latency-minimization scheduling policy switches
to a memory-saving policy to avoid thrashing.
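The progressive transition between scheduling policies can be illustrated with a small Python sketch. Everything here is a hypothetical illustration rather than the C-MR scheduler: the `OperatorQueue` type, the 60% and 90% memory thresholds, the linear blend, and the choice of "drain the largest buffer" as the memory-saving heuristic are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class OperatorQueue:
    """Hypothetical view of one operator's pending input."""
    name: str
    oldest_wait: float   # seconds the oldest buffered item has waited
    buffered_bytes: int  # memory held by this operator's input buffer


def memory_weight(mem_used, low=0.6, high=0.9):
    """Ramp from 0 to 1 as memory utilization rises from low to high.

    The thresholds are assumed values, chosen only for illustration.
    """
    if mem_used <= low:
        return 0.0
    if mem_used >= high:
        return 1.0
    return (mem_used - low) / (high - low)


def pick_next(queues, mem_used):
    """Choose the next operator queue to service.

    With little memory pressure the score favors the longest-waiting
    data (latency minimization); as pressure grows it shifts toward the
    largest buffer (memory saving). The blend changes gradually instead
    of flipping at a single threshold.
    """
    w = memory_weight(mem_used)
    max_wait = max(q.oldest_wait for q in queues) or 1.0
    max_bytes = max(q.buffered_bytes for q in queues) or 1

    def score(q):
        latency_score = q.oldest_wait / max_wait
        memory_score = q.buffered_bytes / max_bytes
        return (1 - w) * latency_score + w * memory_score

    return max(queues, key=score)


queues = [
    OperatorQueue("map", oldest_wait=2.0, buffered_bytes=100),
    OperatorQueue("reduce", oldest_wait=0.5, buffered_bytes=10_000),
]
relaxed_choice = pick_next(queues, mem_used=0.30).name
stressed_choice = pick_next(queues, mem_used=0.95).name
```

In this sketch the same queues yield different choices depending on memory utilization: the longest-waiting queue is served when memory is plentiful, and the largest buffer is drained when memory is nearly exhausted.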
C-MR: Continuously Executing MapReduce Workflows on Multi-core Processors
N. Backman, K. Pattabiraman, R. Fonseca, and U. Cetintemel, In Proceedings of the 3rd International Workshop on MapReduce and its Applications, MapReduce '12, June 2012.
Managing Parallelism for Stream Processing in the Cloud
N. Backman, R. Fonseca, and U. Cetintemel, In Proceedings of the 1st International Workshop on Hot Topics in Cloud Data Processing, HotCDP '12, April 2012.
Continuous MapReduce on Heterogeneous Parallel Hardware for Stream Processing
N. Backman, K. Pattabiraman, and U. Cetintemel, Poster Session, New England Database Summit, January 2011.
C-MR: A Continuous-MapReduce Processing Model for Low-Latency Stream Processing on Multi-Core Architectures
N. Backman, K. Pattabiraman, and U. Cetintemel, Department of Computer Science, Brown University, Technical Report CS-10-01, February 2010.
C-MR: A Continuous-MapReduce Implementation for Stream Processing
N. Backman, K. Pattabiraman, and U. Cetintemel, Poster Session, New England Database Summit, January 2010.
This work has been partially supported by the National Science Foundation under grants No. IIS-0916691 and IIS-0448284.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.