Comprehensions, breaking down problems

Testing functions that modify memory

We spent a little while talking about how to test functions that modify memory (for another look at this material, see the end of these 111 notes).

Let’s say we have this (strange, useless) function:

def add_2_to_list(l: list):
  l.append(2)

How can we test it? We can’t look at what it returns, since it doesn’t return anything (i.e., it returns None). So to test what it does, we’ll need to create a list, call it on that list, and then assert something about the list. Something like this:

def test_add_2():
  l = []
  add_2_to_list(l)
  assert l == [2]

In lecture, we also talked about multiple tests on the same data, and what’s going on in memory when we run this test. See lecture capture for details.
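As a rough illustration of the memory point (a sketch, not necessarily the example used in lecture): add_2_to_list changes the very list object that the caller's variable refers to, so two checks that share one list will see each other's changes. The names shared, first_result, second_result, and fresh below are made up for this example.

```python
def add_2_to_list(l: list):
    l.append(2)

# Hypothetical list shared by more than one check.
shared = []

add_2_to_list(shared)
first_result = list(shared)   # copy taken after one call: [2]

add_2_to_list(shared)
second_result = list(shared)  # the same list was modified again: [2, 2]

# Creating a new list inside each test avoids this interference.
fresh = []
add_2_to_list(fresh)          # fresh is [2], regardless of what happened to shared
```

This is one reason the test above creates its list inside the test function instead of reusing a list defined elsewhere.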

Comprehensions

Let’s say we wanted to build a set of all of the words in a text that start with a capital letter, with each word converted to uppercase. We can call our function cast_of_chars. Let’s go ahead and write some tests for this first:

def cast_of_chars(txt: str) -> set:
  pass

# in wordcount_test.py:

def test_cast_of_chars():
  assert cast_of_chars("") == set()
  assert cast_of_chars("Jessica") == {"JESSICA"}
  assert cast_of_chars("'hello,' said Jessica") == {"JESSICA"}
  assert cast_of_chars("hello Jessica hello Brantley hello Jessica") == {"JESSICA", "BRANTLEY"}

The function body might look something like this:

def cast_of_chars(txt: str) -> set:
    words = txt.split()
    s = set()
    for word in words:
        if word[0].isupper():
            s.add(word.upper())
    return s

Cool! But: there’s a much more concise way to write it.

Comprehensions let us write for-loops that build sets, hashtables, or lists in one line. It looks like this:

def cast_of_chars(txt: str) -> set:
    words = txt.split()
    s = {word.upper() for word in words if word[0].isupper()}
    return s

Neat, right?

The most basic comprehension looks like this:

[x for x in l]

This loops over l and creates a list (because we’ve used square brackets) of every element. We could do something else to x:

[x + 1 for x in l]

And we can add a conditional:

[x + 1 for x in l if x > 4]

We can also build hashtables with comprehensions:

{x: x + 1 for x in l if x > 4}
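To make these concrete, here is a small sketch that runs each form on one example list. The data in l and the variable names (copied, plus_one, filtered, table, result) are chosen purely for illustration.

```python
l = [3, 5, 8]  # example data, chosen for illustration

copied = [x for x in l]                 # [3, 5, 8]
plus_one = [x + 1 for x in l]           # [4, 6, 9]
filtered = [x + 1 for x in l if x > 4]  # [6, 9]
table = {x: x + 1 for x in l if x > 4}  # {5: 6, 8: 9}

# The filtered comprehension builds the same list as this loop:
result = []
for x in l:
    if x > 4:
        result.append(x + 1)
```

Note that the condition (x > 4) is checked against the original element, and only then is x + 1 computed for the elements that pass.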

You don’t have to use comprehensions in your own code just yet, but you can if you want. I’ll be using them in lecture, so I wanted to introduce them now.

Breaking down problems

Let’s say we have a problem we want to solve by writing a program. How should we start?

I would recommend starting by breaking down the problem into smaller pieces. Then we can solve each subproblem, and combine these solutions together into a solution to the whole problem. Some subproblems can be solved by writing helper functions, and we’ve seen a number of examples of this. Other times, subproblems correspond to calls to built-in functions, or particular variables we keep track of.

Rainfall problem

Let’s say we are tracking daily rainfall in a particular location, and we want to compute the average rainfall over the period for which we have useful sensor readings. Our rainfall sensor is a bit unreliable, and reports data in a weird format (both of these problems are things you’re likely to encounter when dealing with real-world data!). In particular, our sensor data is a list of numbers like:

sensor_data = [1, 6, -2, 4, -999, 4, 5]

The -999 represents the end of the period we’re interested in. The other negative numbers represent sensor error: we can’t really have a negative amount of rainfall. So we want to take the average of the non-negative numbers in the input list before the -999. How would we solve this problem? What are the subproblems?

Finding the list segment before the -999
Filtering out the negative values
Computing the average of the remaining (non-negative) rainfall values
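To show how the three subproblems might fit together, here is one possible sketch. It is not necessarily the solution developed in lecture. The function name average_rainfall is made up, and two behaviors are assumptions the problem statement doesn't settle: if there is no -999, the whole list is used, and if there are no valid readings, the function returns 0.0.

```python
def average_rainfall(sensor_data: list) -> float:
    # Subproblem 1: keep only the readings before the first -999.
    if -999 in sensor_data:
        segment = sensor_data[:sensor_data.index(-999)]
    else:
        segment = sensor_data  # assumption: no -999 means use every reading

    # Subproblem 2: filter out the negative (erroneous) readings.
    valid = [x for x in segment if x >= 0]

    # Subproblem 3: average what is left.
    if len(valid) == 0:
        return 0.0  # assumption: no valid readings gives an average of 0.0
    return sum(valid) / len(valid)
```

For the sensor_data list above, the segment before the -999 is [1, 6, -2, 4], the valid readings are [1, 6, 4], and the average is 11 / 3.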

Next time, we’ll talk about how to actually solve this problem.