Every individual student must turn in their own separate sweep.
Each pair turns in a single final submission.
You will have to implement your own version of the essence of Google’s MapReduce.
One of the things you will discover is that their operator is horribly mis-named: it isn’t the composition of map and reduce. Nevertheless, we’ll stick to their naming to respect (unconventional) convention.
The MapReduce pattern allows you to divide problems into a series of independent, equivalent tasks that can be parallelized. Specifically, all the map tasks can be run at the same time, as can all of the reduce tasks, because the result of each task does not depend on any of the other tasks. Because your implementation will run on only a single processor, you won’t see any performance gains. However, working through the central idea should help familiarize you with it so you recognize applications that can use it later.
| tv(tag :: Any, value :: Any)
fun<A, B, M, N, O> map-reduce(
input :: List<Tv-pair<A, B>>,
mapper :: (Tv-pair<A, B> -> List<Tv-pair<M, N>>),
reducer :: (Tv-pair<M, List<N>> -> Tv-pair<M, O>))
-> List<Tv-pair<M, O>>
A word count mapper:
file :: Tv-pair<String, String>)
-> List<Tv-pair<String, Number>>
wc-map consumes a file, which is a Tv-pair with the filename as its tag and the file’s contents as its value. It produces a List<Tv-pair> where the tags are word tokens and the values are the corresponding counts of the tokens. In this case, wc-map produces the count 1 for every word token, because it counts words as independent tokens as it encounters them.
A word count reducer:
word-counts :: Tv-pair<String, List<Number>>)
-> Tv-pair<String, Number>
It takes a Tv-pair<String, List<Number>> in which the tag represents a word and the value is a list of word counts for that word (in this case, as we know, a list of 1s, though it won’t always be so simple!). It produces a Tv-pair<String, Number> in which the tag is a word, and the value is the number of times that word appeared in all of the input.
It’s the reducer’s job to sum the word token counts into a word type count. wc-map may be run independently on each file, and wc-reduce may be run independently on each word. This is the division of the problem into a series of equivalent tasks which can be run at the same time, and from here we gain the performance advantage over the undivided problem.
Once you and your partner are satisfied with your map-reduce, use it to solve the following problems.
Your input consists of a List<Tv-pair<String, String>> in which the tag is the file name, and the value is the string contents of the file (like in the word count example). For example, an input could be tv("words.txt", "star rats tarts"). Your output should be a List<Tv-pair<String, List<String>>> in which each tag represents a unique anagram group, and the value is a list of unique words that appear in the input that are all anagrams of each other. We will define anagrams to be strings made up of the exact same characters (case-sensitive). Your functions should be named anagram-map and anagram-reduce.
If you thought you were done with Nile, then you must be in de-nile!
recommend(title :: String,
book-records :: List<Tv-pair<String, String>)
-> Tv-pair<Number, List<String>>
popular-pairs(book-records :: List<Tv-pair<String, String>>)
-> Tv-pair<Number, List<String>>
Pairs should be formatted according to the original Nile spec.
How would JoinLists be useful for MapReduce? (One paragraph or less.)
Give an example of another problem that you could use MapReduce to solve beyond Anagrams and Nile. Provide the contracts for your input data, mapper, and reducer.
For your MapReduce sweep, we’d like you to start thinking about the assignment by writing some interesting tests using the provided wc-reducer and wc-mapper.
Remember that this assignment requires you to hand in sweeps.
To submit, return to the Captain Teach assignments page:
and click “Next Step” again. Upload a zip file containing both your implementation, named map-reduce.arr, and your tests. At the next step, please upload your analysis.
Note that you will submit twice from Captain Teach—