Python: dataclasses

HW 8 preview

Review of last time

Last time, we ended with this function:

def zs_in_list(lst: list) -> int:
    count = 0
    for s in lst:
        for c in s:
            if c == 'z':
                count = count + 1
    return count

Let’s step through this call to the function:

> zs_in_list(["pizza", "dog", "adze"])

What would happen if we indented the return statement once? Twice? Thrice?
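
One way to explore this question is to write out all three variants and run them on the same list. The sketch below does that; the function names `return_indented_once`, `return_indented_twice`, and `return_indented_thrice` are made up for illustration and are not part of the lecture code.

```python
def return_indented_once(lst: list) -> int:
    count = 0
    for s in lst:
        for c in s:
            if c == 'z':
                count = count + 1
        return count  # runs at the end of the first pass through the outer loop


def return_indented_twice(lst: list) -> int:
    count = 0
    for s in lst:
        for c in s:
            if c == 'z':
                count = count + 1
            return count  # runs after looking at the first character only


def return_indented_thrice(lst: list) -> int:
    count = 0
    for s in lst:
        for c in s:
            if c == 'z':
                count = count + 1
                return count  # runs as soon as the first 'z' is counted


words = ["pizza", "dog", "adze"]
print(return_indented_once(words))    # 2 (only "pizza" is examined)
print(return_indented_twice(words))   # 0 (only the 'p' in "pizza" is examined)
print(return_indented_thrice(words))  # 1 (stops at the first 'z')
```

In each variant the return statement can be skipped entirely (for example, the third variant on a list with no 'z' at all), in which case the function falls off the end and returns None.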

Dataclasses

We spent a while talking about Pyret’s datatypes. Datatypes give us a way of storing related data together. How can we do the same thing in Python? We’ll use something called dataclasses.

from dataclasses import dataclass
from datetime import date

# Pyret:
# data TodoItem:
#   | todo(deadline :: Date, tags :: List<String>, description :: String, done :: Boolean)
# end

@dataclass
class TodoItem:
    deadline: date
    tags: list
    description: str
    done: bool

Don’t worry too much about the class keyword. We’ll talk much more about what it means in 112 and 18.

Here we’re defining a dataclass called TodoItem with four components: a deadline, a list of tags, a description, and a flag recording whether the item is done. Unlike in Pyret, there’s no distinction between the name of the dataclass and the name of its constructor; we can build a TodoItem like so:

> TodoItem(date(2019, 11, 8), ["class"], "Prepare for CSCI 0111", False)

This means that we can’t have a dataclass with multiple constructors in the same way we could in Pyret. Python has other idioms for data with multiple shapes, which we’ll see in future CS classes.

Let’s build some TODO items:

class_item = TodoItem(date(2019, 11, 8), ["school", "class"], "Prepare for CSCI 0111", False)
avocado_item = TodoItem(date(2019, 11, 13), ["home", "consumption"], "Eat avocado", False)
birthday_item = TodoItem(date(2019, 11, 20), ["home", "friends"], "Buy present for friend", False)

todo_list = [class_item, avocado_item, birthday_item]

We can look at the members of our TODO list:

> todo_list[0]
> todo_list[0].description
> todo_list[2].deadline
> todo_list[3]
> todo_list[2].abc
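
The last two expressions above are errors: the list has only three items, so index 3 is out of range, and TodoItem has no component named abc. As a rough sketch of what Python does in each case, the code below rebuilds the same list and catches the two errors so the script keeps running (the try/except is only there for demonstration).

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TodoItem:
    deadline: date
    tags: list
    description: str
    done: bool


todo_list = [
    TodoItem(date(2019, 11, 8), ["school", "class"], "Prepare for CSCI 0111", False),
    TodoItem(date(2019, 11, 13), ["home", "consumption"], "Eat avocado", False),
    TodoItem(date(2019, 11, 20), ["home", "friends"], "Buy present for friend", False),
]

print(todo_list[0].description)  # Prepare for CSCI 0111
print(todo_list[2].deadline)     # 2019-11-20

# Valid indices are 0, 1, and 2, so index 3 raises an IndexError.
try:
    todo_list[3]
except IndexError as err:
    print("IndexError:", err)

# TodoItem has no component called abc, so this raises an AttributeError.
try:
    todo_list[2].abc
except AttributeError as err:
    print("AttributeError:", err)
```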

We can write a function to see if a TODO item is past due:

def past_due(item: TodoItem, today: date) -> bool:
    """return True if the item's deadline has passed and it isn't done"""
    return item.deadline < today and not item.done

We can test this function:

# in test_todo.py
from todo import *
import pytest

def test_past_due():
    # the deadline (Nov. 8) is before "today" (Nov. 9), so the item is past due
    assert past_due(TodoItem(date(2019, 11, 8), ["class"], "Prepare for CSCI 0111", False),
                    date(2019, 11, 9))

So we can access the components of a dataclass with dot-notation, just like we did in Pyret.

Functions on Todo lists

def find_items_by_description(todo_list: list, descr: str) -> list:
  """return all items whose description contains descr"""
  return list(filter(lambda item: descr in item.description, todo_list))

def find_items_by_tag(todo_list: list, tag: str) -> list:
  """return all items tagged with tag"""
  return list(filter(lambda item: tag in item.tags, todo_list))

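Notice that in is doing a different thing in each of these functions: in the first it checks whether descr is a substring of the description, and in the second it checks whether tag is an element of the tags list.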
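
A small sketch of the two behaviors, using made-up values for `description` and `tags`:

```python
description = "Prepare for CSCI 0111"
tags = ["school", "class"]

# On a string, in asks: does this text appear anywhere inside the string?
print("CSCI" in description)  # True
print("cla" in "class")       # True

# On a list, in asks: is this value equal to one of the elements?
print("class" in tags)        # True
print("cla" in tags)          # False, no element is exactly "cla"
```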

We can modify our TODO list:

def remove_finished(todo: list):
    """remove completed items from the TODO list"""
    completed_items = list(filter(lambda item: item.done, todo))
    for item in completed_items:
        # remove from the list we were given, not from the global todo_list
        todo.remove(item)

We can test this function:

def test_remove_finished():
    lst = [TodoItem(date(2019, 11, 8), [], "a", False), TodoItem(date(2019, 11, 20), ["a"], "b", True)]
    remove_finished(lst)
    assert lst == [TodoItem(date(2019, 11, 8), [], "a", False)]
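
The assertion in this test relies on == comparing two TodoItems component by component, which is something @dataclass sets up for us by default. A minimal sketch, with variable names chosen just for illustration:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TodoItem:
    deadline: date
    tags: list
    description: str
    done: bool


a = TodoItem(date(2019, 11, 8), [], "a", False)
b = TodoItem(date(2019, 11, 8), [], "a", False)
c = TodoItem(date(2019, 11, 8), [], "a", True)

print(a == b)  # True: every component is equal
print(a is b)  # False: they are still two separate objects
print(a == c)  # False: the done components differ
```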

We’ve defined a todo_list variable in todo.py. Why not use that variable in our test?
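
One possible answer, sketched below under the assumption that remove_finished removes items from the list it is given: because the function changes the list in place, any later check that uses the same shared variable sees the changed list. The names `shared_list` and `has_two_items` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TodoItem:
    deadline: date
    tags: list
    description: str
    done: bool


def remove_finished(todo: list):
    """remove completed items from the TODO list"""
    completed_items = list(filter(lambda item: item.done, todo))
    for item in completed_items:
        todo.remove(item)


# A list shared by everything in this file, like todo_list in todo.py.
shared_list = [
    TodoItem(date(2019, 11, 8), [], "a", False),
    TodoItem(date(2019, 11, 20), ["a"], "b", True),
]


def has_two_items() -> bool:
    return len(shared_list) == 2


print(has_two_items())  # True
remove_finished(shared_list)
print(has_two_items())  # False: the earlier call changed the shared list
```

Building a fresh list inside each test keeps the tests from affecting one another.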