Map-Reduce
You will have to implement your own version of the essence of Google’s MapReduce.
One of the things you will discover is that their operator is horribly mis-named: it isn’t the composition of map and reduce. Nevertheless, we’ll stick to their naming to respect (unconventional) convention.
The map-reduce pattern allows you to divide problems into a series of independent, equivalent tasks that can be parallelized. Specifically, all the map tasks can be run at the same time, as can all of the reduce tasks, because the results of each task does not depend on any of the other tasks. Because your implementation will run on only a single processor, you won’t see any performance gains. However, working through the central idea should help familiarize you with it so you recognize applications that can use it later.
data Tv-pair:
| tv(tag :: Any, value :: Any)
end
1 Your Map-Reduce
fun<A, B, M, N, O> map-reduce(
input :: List<Tv-pair<A, B>>,
mapper :: (Tv-pair<A, B> -> List<Tv-pair<M, N>>),
reducer :: (Tv-pair<M, List<N>> -> Tv-pair<M, O>))
-> List<Tv-pair<M, O>>
wc-map(file :: Tv-pair<String, String>) -> List<Tv-pair<String, Number>>
This is the mapper. It takes a Tv-pair<String, String> representing a file, as described above, and produces a List<Tv-pair<String, Number>> in which the tag represents a word, and the value represents the number of times the word appears.
wc-reduce(word-counts :: Tv-pair<String, List<Number>>) -> Tv-pair<String, Number>
This is the reducer. It takes a Tv-pair<String, List<Number>> in which the tag represents a word and the value is a list of word counts for that word. It produces a Tv-pair<String, Number> in which the tag is a word, and the value is the number of times that word appeared in all of the input files.
Once you’re satisfied with your map-reduce, use it to solve these problems.
2 Anagrams
Your input consists of a List<Tv-pair<String, String>> in which the tag is the file name, and the value is the string contents of the file (like in the word count example). For example, an input could be tv("words.txt", "star rats tarts"). Your output should be a List<Tv-pair<String, List<String>>> in which each tag represents a unique anagram group, and the value is a list of unique words that appear in the input that are all anagrams of each other. We will define anagrams to be strings made up of the exact same characters. Your functions should be named anagram-map and anagram-reduce.
3 Nile 2.0
If you thought you were done with nile, then you must be in de-nile!
recommend(title :: String, book-records :: List<Tv-pair<String, String>)
-> Tv-pair<Number, List<String>>
popular-pairs(book-records :: List<Tv-pair<String, String>)
-> Tv-pair<Number, List<String>>
Pairs should be formatted according to the original Nile spec.
4 Template Files
For your map-reduce sweep, we’d like you to start thinking about the assingment by writing some interesting tests using the provided wc-reducer and wc-mapper.
Again, you will be writing your tests and implementation in one file. This file will contain the code for map-reduce, anagrams, and Nile.
5 Handing In
5.1 Initial Test Sweep
As with previous assignments, you will submit an initial test sweep. This is due 11:59 PM, Sunday, November 16th.
If you decide to opt-in for the test review, please complete your reviews no later than 11:59 PM, Monday, November 17th.
https://www.captain-teach.org/brown-cs019/assignments/
5.2 Implementation and Final Tests
Unlike previous assignments, you will not be submitting a separate test file. This does not mean that you will not be graded on your tests! Rather, you should write tests inside your implemetation itself, as you did earlier in the year.
To submit, return to the Captain Teach assignments page:
https://www.captain-teach.org/brown-cs019/assignments/
and click “Next Step” again. This time, upload a file named map-reduce.arr.
Note that you will submit twice from Captain Teach—