CSCI0050 Python Programming and Data

CSCI0050 Homework #8: Python Programming with Variables and Data

Due: Monday July 29, 6pm

Handin as usual through the Google form

Everybody should do the first two sets of exercises (Data Design and Program Tracing), then one of the two Programming options (Standard or Advanced). Pick whichever programming option matches your level and/or interests. You don’t get extra points for trying the advanced exercise. At the same time, don’t shy away from the advanced problems for fear of hurting your grade – there’s some grading leeway on the advanced problems, as we want you to work on what’s best for your learning. If you are interested in the challenge and think you are up to it, give it a try.

Collaboration Policy: You may work on this assignment with others. Include a collaboration statement describing how you did the work.

Data Design Exercise (No Programming)

Put your answers to this in a plain text file named email.txt.

Imagine that you were asked to build an email-system. The system has users, each of which has a real name, a username, and a password to the system. Each email message has a sender, one or more recipients, when it was sent, a subject, and its contents (ignore attachments for this assignment). For simplicity, we’ll assume that all senders and recipients are part of the same system (ie, that everyone has a gmail address, for example, rather than some in gmail and some in yahoo).

The system tracks which messages each recipient has read. A user can ask to see just their unread messages, or all of their messages.

The system needs to support people sending new messages, reading messages, and deleting messages. (We will ignore replies for now, treating each "reply" as a new message).

What data structures would you propose for maintaining information on users, email messages, and which messages have been read (by which recipients)? Your answer should specify which kind of data (lists, tables, data blocks, tuples, numbers, etc), including the types of any subcomponents or columns. You can think through this in either Pyret or Python (your choice). We just want to see how you envision organizing the data, with an eye towards supporting the operations that people want to do.

You are not being asked to write the code for the email system. This is only a question about the data structures you would use.

Tracing Program Execution (No Programming)

Put your answers to this in a plain text file named tracing.txt.

For the following program, show the memory and environment (known names) contents every time execution reaches the a line marked "memory point here" (the same point may be reached more than once). Present your contents as a series of mappings from names to values (written in comments), in the form of:

ENVIRONMENT/KNOWN NAMES

x --> 5

y --> (loc 1001)

z --> (loc 1001)

MEMORY

(loc 1001) --> [1, 2, 3]

The program uses a simple for loop:

def concat_short(word_list : list):

word = "";

for elem in word_list:

if len(elem) <= 3:

# memory point #2

word = word + elem

return(word)

concat_short(["a", "list", "has", "words"])

# memory point 1

Python Programming: Standard

Put your work on these problems in a file called readings.py

An environmental agency gathers and tracks data on air-quality. The agency collects readings of key air-quality factors from various locations, combines the data, and uses it to issue reports about which locations have concerning air-quality data. In the USA, air-quality is tracked through 5 measurements: this assignment will work with two of them: ozone levels, and sulfur dioxide (so2) levels.

Here is a dataclass for air-quality readings. Use this as part of solving the problems that follow.

from dataclasses import dataclass

from datetime import date

@dataclass

class Reading:

type: str # what kind of reading (ozone, so2, etc)

when: date # what date was it taken

level: float # the numeric reading

location: str # where was the reading taken

The agency stores readings about each of ozone and so2 in separate lists, each of which is sorted by date (most recent first). Create two variables named ozone_data and so2_data, each of which starts off empty.
Write a procedure (the technical name for a function that doesn’t return anything) called register_data that takes in a list of Reading. The procedure adds the ozone readings from the list to ozone_data, and adds the so2 readings from the list to so2_data, making sure both of those lists are sorted (most to least recent) before the procedure finishes.
See the Tips and Hints section below for information on how to sort lists in Python.
Sometimes, the agency wants to look at the set of readings that occurred after a given date (such as the last week worth of readings). Write a function readings_after that takes a cutoff date and a list of readings sorted by date (from most to least recent) and returns a list of readings from the list that occur on or after the cutoff date.
Recall that the readings lists are sorted, so you should be able to do this without filtering the entire list. Instead, use a loop that exploits the sorted order of the input list, and returns as soon as it finishes.
Python note: You can compare two dates in Python using the usual <, >, <=, >= operations as on numbers. For example,
  >>> date(2011, 1, 2) > date(2011, 2, 3)
  False
Python note: If you need to access a specific element of a list, you use the following notation, where index is the position into the list that you want (counting from 0):
  L = ["a", "b", "c"]
  L[1]     # evaluates to "b"
The agency wants a procedure that reports air-quality alerts arising from a given list (such as a list of readings from a given location in the past week). Write a procedure print_alerts that takes a list of readings and prints out (not returns, just prints) a message for each reading whose level is too high. Ozone levels are too high once they are above .085 and so2 levels are too high once they are above .14 (this is a simplified version of the air-quality calculation).
A sample alert would look like
Alert for Houston on 2016-5-10
where Houston is the location in a reading for which some level was too high on 5/10/16. Don’t worry about duplicates – just iterate through the list and print alerts when appropriate.
Python note: You can build messages to print using a combination of the built-in function str, which converts its argument to a string, and + to concatenate strings. Here’s an example:
print("Today is " + str(date(2018, 7, 26)))

The environmental agency actually gets its data from weather sensors, which transmit their information over the internet. When data is sent over the internet, it comes "flat", not structured into dictionaries and other data structures. Therefore, real data usually has to get converted from some flat form (which could be a csv file) to the structured form we need for the rest of a computation.

The following list shows a flat(ter) form of readings data for Houston:

houston_raw = [[2016, 1, 6, "ozone", 1.0],

[2016, 1, 7, "so2", 0.07],

[2016, 1, 8, "so2", 0.15],

[2016, 1, 2, "ozone", 0.08]]

Write a function convert_data that takes raw data as shown here and the location name, and converts it into a list of readings. For example, running

convert_data(houston_raw, "Houston")

should produce the same list as having defined the Houston readings as follows:

houston_readings = [Reading("ozone", date(2016, 1, 6), 1.0, "Houston"),

Reading("so2", date(2016, 1, 7), 0.07, "Houston"),

Reading("so2", date(2016, 1, 8), 0.15, "Houston"),

Reading("ozone", date(2016, 1, 2), 0.08, "Houston")]

Again, see tips and hints for info on how to extract list elements at specific positions within the list.

Python Programming: Optional Challenge

If you choose do to these problems, put your work in a file called folders.py

If you don’t feel comfortable writing loops or assigning variables, do the standard problems instead. This problem set is more a test of your ability to write functions over complex data structures. These problems are NOT required.

An IT company wants to be able to write programs that explore directories of files. Think of the directories (folders) on your own computer: you have folders, each of which may contain files and other folders. This is again a form of tree-shaped data.

More concretely:

A file has a name, size (integer), and contents (string)
A folder has a name, list of files, and list of (sub) folders

For example, my laptop has a folder named "CS50", which has a subfolder named "Python", which has files "todolist.py" and "testlight.py". The contents of the files are irrelevant (any string will do).

You will create data for files and folders, then write some programs to process them.

Warning: Think about recursion here!. That pattern provides a natural way to organize your code for these problems.

Use dictionaries to create data structures for files and folders, and make a couple of example directories for use in testing.
Write a function total_size that takes a (root) folder and returns the total size of all files that are nested within that folder (or its subfolders, or their subfolders, etc).
Write a function has_contents that takes a string and a folder and returns a list of names of files nested within the folder whose contents contain (not necessarily exactly) the given string. Just return filenames; you don’t need to return full paths to the files.
Write a function add_file that takes a folder name, a file, and a root folder and puts the (new) file into the named folder (you may assume that only one folder with the given name is nested within the root folder).
Write a function find_path that takes a filename and a Folder and returns a path that gets to a folder with that name. If there are multiple files with the same name, return the path to any one of them. A path is a list of directory names followed by a file name such that opening the directories in order (from the first to the end) would get you to the file.

Tips and Hints

Sorting Lists

Python has a built-in operator for sorting lists. If you write

L = [2, 3, 1]

L.sort()

print(L)

Python displays [1, 2, 3] – in other words, Python modifies the list to put the arguments in sorted order. When you have something more complicated that numbers, you have to tell Python which parts of the data to use for sorting, and in what order to consider the parts. Here is a function that sorts a list of dates into earliest to latest order (year most important, then month, then day):

def sort_by_date(date_list : list) -> None:

date_list.sort(key = lambda d: (d.year, d.month, d.day))

To get the list sorted from latest to earliest order, we add an argument to sort telling it to use reverse order:

def sort_by_date(date_list) -> None:

date_list.sort(key = lambda d: (d.year, d.month, d.day),

reverse = True)

You should be able to modify these procedures as needed to sort a list of readings by date, as required in this problem.

Grading Expectations

For the data design exercise, we want to see whether the data organization you choose captures all the relevant information, and does so in suitable data structures. There isn’t a single correct answer to that problem. We are looking for you to have a reasonable answer, as judged by the two criteria in the previous sentence.

For the programming problems, we’ll be looking at your functionality (does your code pass our test cases), design (did you pick appropriate constructs and break your work down properly into separate functions), and testing (do you have tests for all functions, and cover more than just the trivial cases). Review your feedback on previous assignments to see what we have been looking for in grading.

Use the testlight.py file from class for writing tests.

What to Turn In

Handin as usual through the Google form