Map-Reduce
1 Your Map-Reduce
2 Anagrams
3 Nile 2.0
4 Template Files
5 Handing In
5.1 Initial Test Sweep
5.2 Implementation and Final Tests

Map-Reduce

You will have to implement your own version of the essence of Google’s MapReduce.

One of the things you will discover is that their operator is horribly mis-named: it isn’t the composition of map and reduce. Nevertheless, we’ll stick to their naming to respect (unconventional) convention.

The map-reduce pattern allows you to divide problems into a series of independent, equivalent tasks that can be parallelized. Specifically, all the map tasks can be run at the same time, as can all of the reduce tasks, because the results of each task does not depend on any of the other tasks. Because your implementation will run on only a single processor, you won’t see any performance gains. However, working through the central idea should help familiarize you with it so you recognize applications that can use it later.

First, you will have to write a map-reduce framework. This framework will take a mapper and a reducer, along with input. It will apply the mapper to each input, collect the results, and pass them to the reducer. Require map-reduce-support.arr to get access to the following data structure:

data Tv-pair:

  | tv(tag :: Any, value :: Any)

end

This represents a tag-value pair, the fundamental unit of data for all map-reduce programs.

1 Your Map-Reduce

For your framework, you’ll need to define:

fun<A, B, M, N, O> map-reduce(

    input :: List<Tv-pair<A, B>>,

    mapper :: (Tv-pair<A, B> -> List<Tv-pair<M, N>>),

    reducer :: (Tv-pair<M, List<N>> -> Tv-pair<M, O>))

  -> List<Tv-pair<M, O>>

Note that the mapper function produces a List<Tv-pair<M, N>> while the reducer function takes a Tv-pair<M, List<N>>. You will have to figure out how to collect all of the N’s for each M so you can pass them to a reducer task.

Let’s consider this contract in terms of a concrete example. To help you with testing, the support code provides the mapper and reducer for one example: word count. In this case, wc-map requires a Tv-Pair containing two strings: the name of a file and the contents of the file. This Tv-Pair is mapped to another Tv-Pair containing a word, and a number representing the frequency with which the word occurs in a single file. This is finally reduced to a Tv-Pair which consists of the word and the frequency with which it occurs across all files. Concretely, the support code provides:
  • wc-map(file :: Tv-pair<String, String>) -> List<Tv-pair<String, Number>>

    This is the mapper. It takes a Tv-pair<String, String> representing a file, as described above, and produces a List<Tv-pair<String, Number>> in which the tag represents a word, and the value represents the number of times the word appears.

  • wc-reduce(word-counts :: Tv-pair<String, List<Number>>) -> Tv-pair<String, Number>

    This is the reducer. It takes a Tv-pair<String, List<Number>> in which the tag represents a word and the value is a list of word counts for that word. It produces a Tv-pair<String, Number> in which the tag is a word, and the value is the number of times that word appeared in all of the input files.

Once you’re satisfied with your map-reduce, use it to solve these problems.

2 Anagrams

Your input consists of a List<Tv-pair<String, String>> in which the tag is the file name, and the value is the string contents of the file (like in the word count example). For example, an input could be tv("words.txt", "star rats tarts"). Your output should be a List<Tv-pair<String, List<String>>> in which each tag represents a unique anagram group, and the value is a list of unique words that appear in the input that are all anagrams of each other. We will define anagrams to be strings made up of the exact same characters. Your functions should be named anagram-map and anagram-reduce.

3 Nile 2.0

If you thought you were done with nile, then you must be in de-nile!

recommend(title :: String, book-records :: List<Tv-pair<String, String>)

  -> Tv-pair<Number, List<String>>

takes a book title and a list of tagged files (like in Anagrams, except formatted according to the Nile spec) and produces a Tv-pair with the highest number of times the book was paired with one or more other books, and a list of those books.

popular-pairs(book-records :: List<Tv-pair<String, String>)

  -> Tv-pair<Number, List<String>>

takes a list of tagged files and produces a Tv-pair with the highest number of times any two books were paired with each other, and a list of the corresponding pairs.

Pairs should be formatted according to the original Nile spec.

4 Template Files

For your map-reduce sweep, we’d like you to start thinking about the assingment by writing some interesting tests using the provided wc-reducer and wc-mapper.

Initial Tests

Again, you will be writing your tests and implementation in one file. This file will contain the code for map-reduce, anagrams, and Nile.

Map-Reduce Implementation

5 Handing In

5.1 Initial Test Sweep

As with previous assignments, you will submit an initial test sweep. This is due 11:59 PM, Sunday, November 16th.

If you decide to opt-in for the test review, please complete your reviews no later than 11:59 PM, Monday, November 17th.

https://www.captain-teach.org/brown-cs019/assignments/

5.2 Implementation and Final Tests

Unlike previous assignments, you will not be submitting a separate test file. This does not mean that you will not be graded on your tests! Rather, you should write tests inside your implemetation itself, as you did earlier in the year.

To submit, return to the Captain Teach assignments page:

https://www.captain-teach.org/brown-cs019/assignments/

and click “Next Step” again. This time, upload a file named map-reduce.arr.

Note that you will submit twice from Captain Teach—once for JoinLists and once for MapReduce.