HW 1: Python Refresher

Due: September 15, 2020 at 9:00 PM ET.

The goal of this assignment is to refresh your memory of how to write Python programs, to get practice organizing data for particular computations, and to get practice working with CSV files.

The Assignment

Congratulations! You’ve been selected as a contestant on this year’s edition of The Amazing Race!!! You can’t wait to travel the world, but the producers (your TAs and Professor) need your help to plan the competition.

Before producing the Amazing Race, the course staff were actually extremely successful musicians, writing hits like “I’m Testing and I Know It,” “All I Want for Christmas is Loops,” and “Pumped Up Lists.” Since they’re THAT successful, they’ve played many world tours. They made a CSV (comma-separated values) file containing all the cities they’ve toured in, and how many concerts they’ve played there.

The producers have asked you to analyze data from the CSV file to help them decide where to send you in the Amazing Race.

In the concerts.csv file, every entry is a course staff member’s name, then a city they’ve been to, then the number of concerts they played when they visited.

Here’s an example of what that file could look like (the real file is much longer):

Jessica,Oslo,4
Jessica,Cali,1
Brantley,Dakar,1
Doug,Cali,2

Notice that any musician could have visited multiple cities, and any given city could be visited by multiple musicians. In addition, any musician can visit a city multiple times.

Your goal is to write functions to answer the following questions (and in parentheses, answers for the example table given above):

  • What is the most played city, by the total number of times concerts have been played there (by our course staff as a whole)? (Oslo)
  • Which musician had the widest-ranging tours (i.e., has visited the most cities)? (Jessica)
  • How many total concerts did each musician play, on average? (2.6666…)
  • Which cities have been visited by only one musician? (Oslo, Dakar)

Note that the answers will be different on the real CSV file!
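
As a sanity check, two of the example answers can be reproduced by hand. The sketch below hard-codes the four example rows as a list of tuples and recomputes the most played city and the average. The name rows and this layout are assumptions made only for illustration; they are not a recommended structure for your solution.

```python
# The four example rows, hard-coded for illustration (not read from a file).
rows = [
    ("Jessica", "Oslo", 4),
    ("Jessica", "Cali", 1),
    ("Brantley", "Dakar", 1),
    ("Doug", "Cali", 2),
]

# Total concerts per city: Oslo 4, Cali 1 + 2 = 3, Dakar 1.
city_totals = {}
for _, city, concerts in rows:
    city_totals[city] = city_totals.get(city, 0) + concerts

# Total concerts per musician: Jessica 5, Brantley 1, Doug 2.
musician_totals = {}
for musician, _, concerts in rows:
    musician_totals[musician] = musician_totals.get(musician, 0) + concerts

# Average per musician: (5 + 1 + 2) / 3 musicians.
average = sum(musician_totals.values()) / len(musician_totals)

print(max(city_totals, key=city_totals.get))  # Oslo
print(average)                                # 2.6666666666666665
```

Note that the average divides by the number of distinct musicians (3), not by the number of rows (4).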

You’ll also need to write a function to take the CSV data and load it into a structure which is helpful for answering these questions.

Details

The concerts.csv file can be found here.

You’ll write your implementation code in a file named concerts.py. Here’s some code you can copy in to get started:

import csv

def load_data(filename: str):
    """load a CSV file into a useful format for computation"""
    data = {}
    # the with block closes the file once all rows have been read
    with open(filename, encoding="utf8", newline="") as file:
        reader = csv.reader(file)
        for row in reader:
            # TODO: do something useful with each row
            # for instance, add or modify an entry in a dictionary
            # row is a list of data--for instance,
            #   on the first row in the example data, it's:
            #     ["Jessica", "Oslo", "4"]
            pass
    return data

# TODO: implement this function
def most_played(data) -> str:
    """return the city with the highest total
    number of concerts played by our course staff"""
    pass

# TODO: implement this function
def widest_ranging(data) -> str:
    """return the name of the musician who visited the most different cities"""
    pass

# TODO: implement this function
def average_concerts(data) -> float:
    """return the average total number of concerts played per musician"""
    pass

# TODO: implement this function
def only_once(data) -> list:
    """return all of the cities visited by exactly one musician"""
    pass

# This code allows the program to be run as a script.
# You shouldn't need to modify it!
if __name__ == '__main__':
    import sys
    # the first argument to the script is the filename
    filename = sys.argv[1]
    # the second argument to the script is the name of the function
    function_name = sys.argv[2]
    data = load_data(filename)
    if function_name == "most-played":
        print(most_played(data))
    elif function_name == "widest-ranging":
        print(widest_ranging(data))
    elif function_name == "average-concerts":
        print(average_concerts(data))
    elif function_name == "only-once":
        print(only_once(data))
    else:
        print("Unknown function name")

The first thing you should do is to decide how you want to structure your data. There are multiple approaches that will work fine, and some that are less good – think about how you will want to access the data in order to write each function! Then, complete the load_data function, which takes in the name of a CSV file and should return your structured data. Something that might be useful: you can convert a str to an int with the int function; for instance, int("4") == 4.
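
To see why the conversion matters, note that csv.reader yields every field as a string, including the concert count. A minimal sketch, using io.StringIO to stand in for an open file (the sample text and variable names are made up for illustration):

```python
import csv
import io

# Stand-in for an open CSV file; the contents are illustrative only.
sample = io.StringIO("Jessica,Oslo,4\nDoug,Cali,2\n")

total = 0
for row in csv.reader(sample):
    musician, city, concerts = row  # each row is a list of three strings
    # concerts is the string "4", not the number 4, so convert before adding
    total += int(concerts)

print(total)  # 6
```

Without the int call, the addition would raise a TypeError, because Python will not add a str to an int.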

For your other functions, you don’t need to worry about CSV files: each one takes in your structured data. You should complete each function as specified:

  • most_played should return the name of the city where our staff played the most concerts overall.
  • widest_ranging should return the name of the musician who has visited the most cities (regardless of the number of visits they have made to each).
  • average_concerts should return the average number of concerts played per musician (so, the total number of concerts divided by the number of musicians).
  • only_once should return a list of the cities that were only visited by one musician (regardless of the number of times that musician visited, or the number of concerts they played there).
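
The “regardless of the number of visits” wording matters because the same musician and city pair may appear on several rows. One way to count distinct values is a Python set, which keeps each item only once. The rows below are hypothetical and the variable names are illustrative; none of this is part of the starter code.

```python
# Hypothetical rows: Jessica appears in Oslo on two separate rows.
rows = [
    ("Jessica", "Oslo", 4),
    ("Jessica", "Oslo", 2),
    ("Jessica", "Cali", 1),
    ("Doug", "Cali", 2),
]

# A set ignores duplicates, so Oslo is counted once for Jessica.
jessica_cities = {city for musician, city, _ in rows if musician == "Jessica"}
print(len(jessica_cities))  # 2

# Distinct musicians who visited Oslo: still just one, despite two rows.
oslo_musicians = {musician for musician, city, _ in rows if city == "Oslo"}
print(len(oslo_musicians))  # 1
```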

You may find that once you start writing your analysis functions you want to change the way your data are structured – this is expected! Keep in mind, though, that every analysis function needs to use the same structure.

The bottom of the provided code (starting with if __name__ == '__main__') lets you run your code as a script from the terminal. It calls your load_data function to read in CSV data, then calls one of your other functions and prints the result. So in order to get the average concerts played in the concerts.csv file, you can run the following command in the terminal:

python3 concerts.py concerts.csv average-concerts

If you’re not sure what this means, don’t worry – it will be covered in the first lab!
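
As a preview, for the command above Python places the words after python3 into the list sys.argv. The sketch below builds that list by hand rather than reading the real sys.argv, so it can be run anywhere; the variable name argv is just for illustration.

```python
# What sys.argv would contain for:
#   python3 concerts.py concerts.csv average-concerts
argv = ["concerts.py", "concerts.csv", "average-concerts"]

script_name = argv[0]    # the script itself
filename = argv[1]       # first argument: the CSV file to load
function_name = argv[2]  # second argument: which analysis to run

print(filename)       # concerts.csv
print(function_name)  # average-concerts
```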

Testing

You should write tests for your functions in concerts_test.py. You should include tests for all of your analysis functions; you don’t need to write tests for load_data.

With Pytest, tests are written as Python functions in a testing file (in this case, concerts_test.py). For example, to test your only_once function you’d write something like:

from concerts import only_once

def test_only_once():
    assert only_once(...) == ["San Salvador", "Christchurch"]
    assert only_once(...) == []

Remember: each analysis function takes as its argument whatever data structure you’ve decided to use to represent the concert data. If you pass in a CSV string or something similar, they will not work!
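
One thing to keep in mind when asserting on lists: Python compares lists element by element, so two lists holding the same cities in a different order are not equal. This assignment does not say what order only_once returns its cities in, so if your implementation makes no promise about order, comparing sorted copies is one option. In the sketch below, result is a hand-written stand-in for a return value, not output from a real implementation.

```python
def test_order_insensitive_comparison():
    # Stand-in for what only_once(...) might return; written by hand here.
    result = ["Christchurch", "San Salvador"]
    expected = ["San Salvador", "Christchurch"]

    # Same cities, different order: plain list equality fails.
    assert result != expected
    # Sorting both sides makes the comparison independent of order.
    assert sorted(result) == sorted(expected)
```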

Note: your load_data function should work with any CSV file containing musicians, cities, and concert numbers, not just the concerts.csv file.
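
A quick way to check this is to try your code on a second, small file. The following is a minimal sketch, with made-up musician and city names, that writes a temporary CSV using the standard tempfile module and reads it back with csv.reader. In your own experiment you would pass path to load_data instead of reading the rows directly.

```python
import csv
import os
import tempfile

# Write a tiny CSV to a temporary file (contents are made up).
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", newline="", encoding="utf8", delete=False
) as tmp:
    writer = csv.writer(tmp)
    writer.writerow(["Ada", "Lima", 3])
    writer.writerow(["Ada", "Quito", 1])
    path = tmp.name

# Read it back with csv.reader, as the starter code does.
with open(path, encoding="utf8", newline="") as file:
    rows = list(csv.reader(file))

os.remove(path)  # clean up the temporary file
print(rows)  # [['Ada', 'Lima', '3'], ['Ada', 'Quito', '1']]
```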

See this Guide for help with getting Pytest working with PyCharm.

Code style

Please follow these Python testing and clarity guidelines.

Readme

You should include a README.txt with answers to the following questions:

  • How did you structure your data? How did you decide on this structure?
  • Did you end up needing to change the structure once you started writing your analysis functions?
  • Would any of the functions have been easier to write if you had chosen a different structure?
  • Did you discuss this assignment with any other students? Please list their cs logins.
  • How many late days are you using on this assignment?

The README template can be found here.

Handin

Hand in your work on Gradescope. Look at this guide for directions.

You may submit as many times as you want. Only your latest submission will be graded. This means that if you submit after the deadline, you will be using a late day – so do NOT submit after the deadline unless you plan on using late days.

Please don’t put your name anywhere in any of the handin files – we grade assignments anonymously!

Don’t forget to follow the design and clarity guide!

After completing the homework, you will submit:

  • README.txt
  • concerts.py
  • concerts_test.py