Map  Reduce
1 Your Map  Reduce
2 Anagrams
3 Nile 2.0
4 Analysis
5 Template Files
6 Handing In


  • Every individual student must turn in their own separate sweep.

  • Each pair turns in a single final submission.

When working with partners, you should follow the collaboration policy.

You will have to implement your own version of the essence of Google’s MapReduce.

One of the things you will discover is that their operator is horribly mis-named: it isn’t the composition of map and reduce. Nevertheless, we’ll stick to their naming to respect (unconventional) convention.

The MapReduce pattern allows you to divide problems into a series of independent, equivalent tasks that can be parallelized. Specifically, all the map tasks can be run at the same time, as can all of the reduce tasks, because the result of each task does not depend on any of the other tasks. Because your implementation will run on only a single processor, you won’t see any performance gains. However, working through the central idea should help familiarize you with it so you recognize applications that can use it later.

First, you will have to write a MapReduce framework. This framework will take a mapper and a reducer, along with input. It will apply the mapper to each input, collect the results, and pass them to the reducer. Require map-reduce-support.arr to get access to the following data structure:

data Tv-pair:

  | tv(tag :: Any, value :: Any)


This represents a tag-value pair, the fundamental unit of data for all MapReduce programs.

1 Your MapReduce

For your framework, you’ll need to define:

fun<A, B, M, N, O> map-reduce(

  input :: List<Tv-pair<A, B>>,

  mapper :: (Tv-pair<A, B> -> List<Tv-pair<M, N>>),

  reducer :: (Tv-pair<M, List<N>> -> Tv-pair<M, O>))

-> List<Tv-pair<M, O>>

Note that the mapper function produces a List<Tv-pair<M, N>> while the reducer function takes a Tv-pair<M, List<N>>. You will have to figure out how to collect all of the N’s for each M so you can pass them to a reducer task. (Think about how the Design Recipe can help you here!)

Let’s consider this contract in terms of a concrete example. To help you with testing, the support code provides the mapper and reducer for one example: word count. In this example, we make a distinction between word tokens and word types. Word tokens are individual words (for “the words the words”, the word tokens are “the”, “words”, “the”, and “words”). Word types are unique words (for “the words the words”, the word types are “the” and “words”). The support code provides:
  • A word count mapper:

    fun wc-map(

        file :: Tv-pair<String, String>)

      -> List<Tv-pair<String, Number>>

    wc-map consumes a file, which is a Tv-pair with the filename as its tag and the file’s contents as its value. It produces a List<Tv-pair> where the tags are word tokens and the values are the corresponding counts of the tokens. In this case, wc-map produces the count 1 for every word token, because it counts words as independent tokens as it encounters them.

  • A word count reducer:

    fun wc-reduce(

        word-counts :: Tv-pair<String, List<Number>>)

      -> Tv-pair<String, Number>

    It takes a Tv-pair<String, List<Number>> in which the tag represents a word and the value is a list of word counts for that word (in this case, as we know, a list of 1s, though it won’t always be so simple!). It produces a Tv-pair<String, Number> in which the tag is a word, and the value is the number of times that word appeared in all of the input.

It’s the reducer’s job to sum the word token counts into a word type count. wc-map may be run independently on each file, and wc-reduce may be run independently on each word. This is the division of the problem into a series of equivalent tasks which can be run at the same time, and from here we gain the performance advantage over the undivided problem.

Once you and your partner are satisfied with your map-reduce, use it to solve the following problems.

2 Anagrams

Your input consists of a List<Tv-pair<String, String>> in which the tag is the file name, and the value is the string contents of the file (like in the word count example). For example, an input could be tv("words.txt", "star rats tarts"). Your output should be a List<Tv-pair<String, List<String>>> in which each tag represents a unique anagram group, and the value is a list of unique words that appear in the input that are all anagrams of each other. We will define anagrams to be strings made up of the exact same characters (case-sensitive). Your functions should be named anagram-map and anagram-reduce.

3 Nile 2.0

If you thought you were done with Nile, then you must be in de-nile!

recommend(title :: String,

          book-records :: List<Tv-pair<String, String>)

  -> Tv-pair<Number, List<String>>

takes a book title and a list of tagged files (like in Anagrams, except formatted according to the Nile spec) and produces a Tv-pair with the highest number of times the book was paired with one or more other books, and a list of those books.

popular-pairs(book-records :: List<Tv-pair<String, String>>)

  -> Tv-pair<Number, List<String>>

takes a list of tagged files and produces a Tv-pair with the highest number of times any two books were paired with each other, and a list of the corresponding pairs.

Pairs should be formatted according to the original Nile spec.

4 Analysis

  1. How would JoinLists be useful for MapReduce? (One paragraph or less.)

  2. Give an example of another problem that you could use MapReduce to solve beyond Anagrams and Nile. Provide the contracts for your input data, mapper, and reducer.

5 Template Files

For your MapReduce sweep, we’d like you to start thinking about the assignment by writing some interesting tests using the provided wc-reducer and wc-mapper.

Initial Tests (MapReduce)

Implementation (MapReduce)

Final Tests (MapReduce)

6 Handing In

Remember that this assignment requires you to hand in sweeps.

To submit, return to the Captain Teach assignments page:

and click “Next Step” again. Upload a zip file containing both your implementation, named map-reduce.arr, and your tests. At the next step, please upload your analysis.

Note that you will submit twice from Captain Teach—once for JoinLists and once for MapReduce.