Everybody should do the first two sets of exercises (Data Design and Program Tracing), then one of the two Programming options (Standard or Advanced). Pick whichever programming option matches your level and/or interests.
Collaboration Policy: You may work on this assignment with others. Include a collaboration statement describing how you did the work.
Put your answers to this in a plain text file named email.txt.
Imagine that you were asked to build an email-system. The system has users, each of which has a real name, a username, and a password to the system. Each email message has a sender, one or more recipients, when it was sent, a subject, and its contents (ignore attachments for this assignment). For simplicity, we’ll assume that all senders and recipients are part of the same system (ie, that everyone has a gmail address, for example, rather than some in gmail and some in yahoo).
The system tracks which messages each recipient has read. A user can ask to see just their unread messages, or all of their messages.
The system needs to support people sending new messages, reading messages, and deleting messages. (We will ignore replies for now, treating each "reply" as a new message).
What data structures would you propose for maintaining information on users, email messages, and which messages have been read (by which recipients)? Your answer should specify which kind of data (lists, tables, data blocks, tuples, numbers, etc), including the types of any subcomponents or columns. You can think through this in either Pyret or Python (your choice). We just want to see how you envision organizing the data, with an eye towards supporting the operations that people want to do.
You are not being asked to write the code for the email system. This is only a question about the data structures you would use.
Put your answers to this in a plain text file named tracing.txt.
For each of the following two programs, show the memory contents at each point marked "memory point here" in the program (that point may be reached more than once, depending on how the program executes). Present your contents as a series of mappings from names to values (written in comments), in the form of:
x --> 5 |
y --> [1, 2, 3] |
z --> the same list that y points to |
The first program uses a simple for loop:
def concat_short(word_list : List[str]):
word = "";
for elem in word_list:
# memory point here
if len(elem) < 3:
word = word + elem
return(word)
concat_short(["a", "list", "has", "words"])
The second program revisits family trees in Python:
person = namedtuple("person", ["name", "mother", "father"])
unknown = namedtuple("unknown", [])
def count_named(a_name : str, ft):
# memory point here
if isinstance(ft, unknown):
return 0
else:
if a_name == ft.name:
return 1 \
+ count_named(a_name, ft.mother) \
+ count_named(a_name, ft.father)
else:
return count_named(a_name, ft.mother) \
+ count_named(a_name, ft.father)
sample_tree = person("Mia",
person("Mia",
unknown(),
person("John", unknown(), unknown())),
person("Abe",
person("Mia", unknown(), unknown()),
unknown()))
# added Thurs 7/27 after initial omission
count_named("Mia", sample_tree)
An environmental agency gathers and tracks data on air-quality. The agency collects readings of key air-quality factors from various locations, combines the data, and uses it to issue reports about which locations have concerning air-quality data. In the USA, air-quality is tracked through 5 measurements: this assignment will work with two of them: ozone levels, and sulfur dioxide (so2) levels.
Here are data types and tuples for air-quality readings. Use these as part of solving the problems that follow.
Date = NamedTuple("Date", [('month', int), ('day', int), ('year', int)]) |
Reading = NamedTuple("Reading", [('type', str), ('date', Date), |
('level', float), ('location', str)]) |
DateType = TypeVar('DateType', int, int, int) |
ReadingType = TypeVar('ReadingType', str, DateType, float, str) |
The type float is for non-integer numbers (like 3.14).
The agency stores readings about each of ozone and so2 in separate lists, each of which is sorted by date (most recent first). Create two variables named ozone_data and so2_data, each of which starts off empty.
Write a procedure (function that returns no answer) called register_data that takes in a list of ReadingType. The procedure adds the ozone readings from the list to ozone_data, and adds the so2 readings from the list to so2_data, making sure both of those lists are sorted before the procedure finishes. The procedure should only add readings that have valid dates (meaning dates for which the numbers make sense – Date(250, 45, 2014) does not, for example).
See the Tips and Hints section below for information on how to sort lists in Python.
Sometimes, the agency wants to look at the set of readings that occurred after a given date (such as the last week worth of readings). Write a function readings_after that takes a cutoff date and a list of readings sorted by date (from most to least recent) and returns a list of readings from the list that occur on or after the cutoff date.
Recall that the readings lists are sorted, so you should be able to do this without filtering the entire list. Instead, use a loop that exploits the sorted order of the input list.
If you need to access a specific element of a list, you use the following notation, where index is the position into the list that you want (counting from 0):
L = ["a", "b", "c"]
L[1] # evaluates to "b"
The agency wants a procedure that reports air-quality alerts arising from a given list (such as a list of readings from a given location in the past week). Write a procedure print_alerts that takes a list of readings and prints out a message for each reading whose level is too high. Ozone levels are too high once they are above .085 and so2 levels are too high once they are above .14 (this is a simplified version of the air-quality calculation).
A sample alert would look like
Alert for Houston
where Houston is the location in a reading whose levels were too high. I am not asking you to print the date because that gets into more advanced uses of print (but you are welcome to look up print and format strings if you want to know how to do this). As in Pyret, you can use + to concatenate strings. As long as your message includes the location, the message details are up to you.
If you don’t feel comfortable writing loops or assigning variables, do the standard problems instead. This problem set is more a test of your ability to write functions over complex data structures.
An IT company wants to be able to write programs that explore directories of files. Think of the directories (folders) on your own computer: you have folders, each of which may contain files and other folders. This is again a form of tree-shaped data, similar to the family trees that referenced people’s children.
The core tuples are as follows. subdirs is the list of folders within another folder. (I didn’t make types for these because I haven’t had time to figure out how to make recursive types in Python).
File = NamedTuple("File", [('name', str), ('size', int), ('contents', str)]) |
Folder = NamedTuple("Folder", [('name', str), ('subdirs', list), ('files', list)]) |
In writing your code, you might want to be able to check whether a value is from a particular named tuple. If you need this, you can use isinstance (which returns a boolean) as follows:
isinstance(3, int) |
isinstance(Time(12, 22), Time) |
Warning: Think about recursion here!. You can make this problem harder than necessary by working with global variables. Imagine how you would have done this in Pyret (when you didn’t have assignment), then replace each function that would have iterated over a list with a simple for loop. These problems are very natural with recursion; you should NOT need any global variables.
Write a function total_size that takes a (root) Folder and returns the total size of all files that are nested within that folder (or its subfolders, or their subfolders, etc).
Write a function has_contents that takes a string and a folder and returns a list of names of files nested within the folder whose contents contain (not necessarily exactly) the given string. Just return filenames; you don’t need to return full paths to the files.
Write a function add_file that takes a folder name, a file, and a root folder and puts the (new) file into the named folder (you may assume that only one folder with the given name is nested within the root folder).
Write a function find_path that takes a filename and a Folder and returns a path that gets to a folder with that name. If there are multiple files with the same name, return the path to any one of them. A path is a list of directory names followed by a file name such that opening the directories in order (from the first to the end) would get you to the file.
Sorting Lists
Python has a built-in operator for sorting lists. If you write
L = [2, 3, 1] |
L.sort() |
print(L) |
Python displays [1, 2, 3] – in other words, Python modifies the list to put the arguments in sorted order. When you have something more complicated that numbers, you have to tell Python which parts of the data to use for sorting, and in what order to consider the parts. Here is a function that sorts a list of dates into earliest to latest order (year most important, then month, then day):
def sort_by_date(date_list : List[Date]) -> None: |
date_list.sort(key = lambda d: (d.year, d.month, d.day)) |
To get the list sorted from latest to earliest order, we add an argument to sort telling it to use reverse order:
def sort_by_date(date_list) -> None: |
date_list.sort(key = lambda d: (d.year, d.month, d.day), |
reverse = True) |
You should be able to modify these procedures as needed to sort a list of readings by date, as required in this problem.
We’ll be looking for appropriate choice of loops or operators and careful attention to test cases in grading this assignment. Use the lighttest.py file from class for writing tests.
Submit a zip file with your files from parts 1 and 2, and a zip of your project folder from the programming questions.