C-MR: A continuous MapReduce framework for stream-oriented applications

Overview

The Continuous-MapReduce (C-MR) framework is a parallelized stream processing engine that is being developed at Brown University. C-MR supports low-latency stream processing using the familiar MapReduce interface with the addition of integrated window semantics. In keeping with the philosophy of MapReduce, C-MR abstracts away the complexities of parallelized data stream management and windowing for complex workflows of MapReduce jobs.
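To make the idea of "MapReduce plus windows" concrete, the following is a minimal sketch in Python, not C-MR's actual interface. The names `windowed_mapreduce`, `window_size`, and `slide` are hypothetical, and the sketch assumes a finite, in-memory list of `(timestamp, value)` records rather than a live stream. The user still writes only a map function and a reduce function; the window handling lives outside them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumption: a stream record is a (timestamp, text) pair.
Record = Tuple[int, str]


def map_fn(record: Record) -> Iterator[Tuple[str, int]]:
    """Standard MapReduce-style map: emit (word, 1) for each word."""
    _, line = record
    for word in line.split():
        yield word, 1


def reduce_fn(key: str, values: List[int]) -> Tuple[str, int]:
    """Standard MapReduce-style reduce: sum the counts for one key."""
    return key, sum(values)


def windowed_mapreduce(
    stream: Iterable[Record],
    window_size: int,
    slide: int,
    map_fn=map_fn,
    reduce_fn=reduce_fn,
) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Apply map_fn and reduce_fn to each window [start, start + window_size).

    Windows are defined over the records' own timestamps and advance by
    `slide`. This is a sequential illustration, not a parallel engine.
    """
    records = sorted(stream, key=lambda r: r[0])
    if not records:
        return
    start, last = records[0][0], records[-1][0]
    while start <= last:
        groups = defaultdict(list)
        for rec in records:
            if start <= rec[0] < start + window_size:
                for key, value in map_fn(rec):
                    groups[key].append(value)
        yield start, dict(reduce_fn(k, vs) for k, vs in groups.items())
        start += slide


stream = [(0, "a b"), (1, "a"), (2, "b b"), (3, "c")]
windows = list(windowed_mapreduce(stream, window_size=2, slide=2))
```

With `window_size=2` and `slide=2` the windows do not overlap; choosing a `slide` smaller than `window_size` would give sliding windows in which a record contributes to several results.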

To support the continuous application of MapReduce jobs to data streams over parallel hardware platforms, it is necessary to provide the following key features and functionalities:

Few or no changes to the standard MapReduce programming model
Support for complex workflows of MapReduce jobs
Automatic window management to maintain stream order in the midst of parallelized operations
A workflow-wide, latency-oriented scheduler that can effectively utilize the available parallel resources to handle the time-varying load and load spikes common in real-world streams
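The requirement to maintain stream order under parallel execution can be illustrated with a small reorder buffer. This is a sketch under stated assumptions, not a description of how C-MR manages windows internally: the function `ordered_parallel_map` and the sequence-number scheme are hypothetical, and Python threads stand in for parallel workers. Records are processed concurrently and may finish in any order; a min-heap holds completed results until every earlier record has been emitted.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, Sequence, Tuple

# Assumption: a stream record is a (timestamp, value) pair.
Record = Tuple[int, int]


def square(record: Record) -> Record:
    """Example per-record operation (stands in for a map function)."""
    ts, value = record
    return ts, value * value


def ordered_parallel_map(
    records: Sequence[Record],
    fn: Callable[[Record], Record],
    workers: int = 4,
) -> List[Record]:
    """Run fn over records in parallel and return results in input order."""
    results: List[Record] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Tag each submitted record with its position in the stream.
        futures = {pool.submit(fn, rec): seq for seq, rec in enumerate(records)}
        pending: list = []  # min-heap of (sequence number, result)
        next_seq = 0
        for fut in as_completed(futures):
            heapq.heappush(pending, (futures[fut], fut.result()))
            # Release results only once all earlier ones have been emitted.
            while pending and pending[0][0] == next_seq:
                results.append(heapq.heappop(pending)[1])
                next_seq += 1
    return results


records = [(t, t) for t in range(10)]
ordered = ordered_parallel_map(records, square)
```

The downstream consumer, such as a windowed aggregation, then sees results in timestamp order regardless of which worker finished first.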

Our work describes the design and implementation of Continuous-MapReduce. We make the following high-level contributions:

Transparently augmenting the MapReduce processing model to support windows for parallel stream processing. This includes automatic windowed stream management, ensuring the preservation of temporal order after partitioned stream processing and prior to temporal aggregation, and defining windows over application timestamps, all without changing the basic MapReduce programming model and semantics.
Effectively leveraging heterogeneous parallel computing nodes. We provide the design and implementation of a generic, parallel stream processing system that can effectively utilize the available computing resources, such as CPUs and GPUs, with potentially heterogeneous capabilities.
Supporting flexible and dynamic workflow scheduling to deal with variations in load and resource availability. To this end, we present a progressive scheduler, designed specifically for generic computing nodes, which incrementally and continuously transitions between scheduling policies given the status of system resources. For example, under memory stress, the system that normally uses a latency-minimization scheduling policy switches to a memory saving policy to avoid thrashing.
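One way to picture a scheduler that transitions incrementally between policies is as a weighted blend of two priority scores, with the weight driven by memory usage. The sketch below is an illustration under assumptions and not the scheduler described in the publications: the `Task` fields, the thresholds `low` and `high`, and the two scoring rules (oldest queued data for latency, largest input queue for memory saving) are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    oldest_ts: float   # timestamp of the oldest queued record
    queued_bytes: int  # memory held by this task's input queue


def memory_weight(mem_used_fraction: float, low: float = 0.6, high: float = 0.9) -> float:
    """Weight of the memory-saving policy: 0.0 at or below `low`,
    1.0 at or above `high`, linear in between (assumed thresholds)."""
    if mem_used_fraction <= low:
        return 0.0
    if mem_used_fraction >= high:
        return 1.0
    return (mem_used_fraction - low) / (high - low)


def pick_task(tasks: List[Task], now: float, mem_used_fraction: float) -> Task:
    """Choose the next task by blending a latency score and a memory score."""
    w = memory_weight(mem_used_fraction)
    max_age = max(now - t.oldest_ts for t in tasks) or 1.0
    max_bytes = max(t.queued_bytes for t in tasks) or 1

    def score(t: Task) -> float:
        latency_score = (now - t.oldest_ts) / max_age  # favors oldest data
        memory_score = t.queued_bytes / max_bytes      # favors largest queue
        return (1.0 - w) * latency_score + w * memory_score

    return max(tasks, key=score)


tasks = [
    Task("map", oldest_ts=0.0, queued_bytes=100),
    Task("reduce", oldest_ts=5.0, queued_bytes=1000),
]
relaxed = pick_task(tasks, now=10.0, mem_used_fraction=0.3)
stressed = pick_task(tasks, now=10.0, mem_used_fraction=0.95)
```

At low memory usage the blend reduces to the latency score and the task holding the oldest data is chosen; under memory stress it reduces to the memory score and the task holding the most queued data is chosen. Between the two thresholds the choice shifts gradually rather than flipping at a single cutoff.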

Publications

C-MR: Continuously Executing MapReduce Workflows on Multi-core Processors
N. Backman, K. Pattabiraman, R. Fonseca, and U. Cetintemel, In Proceedings of the 3rd International Workshop on MapReduce and its Applications, MapReduce '12, June 2012.
Managing Parallelism for Stream Processing in the Cloud
N. Backman, R. Fonseca, and U. Cetintemel, In Proceedings of the 1st International Workshop on Hot Topics in Cloud Data Processing, HotCDP '12, April 2012.
Continuous MapReduce on Heterogeneous Parallel Hardware for Stream Processing
N. Backman, K. Pattabiraman, and U. Cetintemel, Poster Session, New England Database Summit, January 2011.
C-MR: A Continuous-MapReduce Processing Model for Low-Latency Stream Processing on Multi-Core Architectures
N. Backman, K. Pattabiraman, and U. Cetintemel, Department of Computer Science, Brown University, Technical Report CS-10-01, February 2010.
C-MR: A Continuous-MapReduce Implementation for Stream Processing
N. Backman, K. Pattabiraman, and U. Cetintemel, Poster Session, New England Database Summit, January 2010.

People

Students

Faculty

Acknowledgements

This work has been partially supported by the National Science Foundation under grants No. IIS-0916691 and IIS-0448284.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.