Creating a class

Developing incrementally

When I’m programming, I often bite off more than I can chew: I write a whole function, or even several functions, and only then do I run anything to see if it works. Usually it doesn’t! Then I have to go back and figure out where exactly I made a mistake. I’d have had a better time if I tested things as I coded, making sure individual expressions worked as expected.

So, do as I say and not as I do: test frequently! These tests could be actual written tests, but could also just be little experiments in the Python console (which you could later turn into tests).

Objects for tables

In CSCI 0111, we spent a few weeks of class working with table data. Tables were sort of like spreadsheets, with named columns and an arbitrary number of rows. For example, we had tables like:

State  Capital     Population
VT     Montpelier  600,000
RI     Providence  1,000,000

There were functions on tables to add columns, compute sums and averages of column values, etc.

Python doesn’t have such a data structure. Let’s say we wanted to implement some of the CSCI 0111 exercises in Python, and wanted to create table objects. How would we implement them?

First, we’ll make a class:

class Table:
  pass

Notice that we haven’t put @dataclass at the top of our class definition. As it turns out, dataclass is just a way of making Python classes behave a bit more like Pyret datatypes. We won’t use it for the rest of the course, but you’ll learn enough about Python’s objects to understand what it was doing.

The first thing we’ll often implement when writing a new class is the class’s constructor. This is the method that gets called to initialize a new instance of the class in question. It’s written like this:

class Table:
  def __init__(self):
    pass

That’s right: __init__ (those are two underscores on each side). There are things I really like about Python; this naming scheme is not one of them.
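As a quick console-style experiment (a sketch, not part of the class we are building), we can check that Python really does call __init__ whenever we create an instance. The print call is only there to make the call visible:

```python
class Table:
  def __init__(self):
    # Python runs this method automatically each time Table() is called.
    print("initializing a new Table")

t = Table()                  # triggers __init__
print(isinstance(t, Table))  # True
```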

Now that we are filling out __init__, we have a decision to make. How should we actually store our table data?

There are a couple of choices here. We could store a list of column names and then store rows as dictionaries. We could store each column as a list. There are other options!
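To make the two choices concrete, here is a rough sketch of the same small table written out both ways. The variable names and the second row are made up for illustration:

```python
# Option 1: a list of column names plus one dictionary per row.
colnames = ['name', 'cookies']
rows = [
  {'name': 'Doug', 'cookies': 3},
  {'name': 'Ana', 'cookies': 5},  # hypothetical second row
]

# Option 2: one list per column, keyed by column name.
columns = {
  'name': ['Doug', 'Ana'],
  'cookies': [3, 5],
}

# Both layouts hold the same information, so both give the same total.
print(sum(r['cookies'] for r in rows))  # 8
print(sum(columns['cookies']))          # 8
```

The row layout makes it easy to grab one whole row; the column layout makes it easy to work with one whole column.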

Let’s store the column names as a list and the rows as dictionaries. We probably want to take the column names as an argument when the table is built. We can write that like this:

class Table:
  def __init__(self, colnames: list):
    self.colnames = colnames
    self.rows = []

Notice that we haven’t actually defined the colnames and rows fields–we’re just adding them to the object in the __init__ method.
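Here is a small experiment, in the spirit of testing as we go, that shows the fields appearing on the object once __init__ has run. The built-in vars function returns the object's fields as a dictionary:

```python
class Table:
  def __init__(self, colnames: list):
    self.colnames = colnames
    self.rows = []

t = Table(['name', 'cookies'])
print(t.colnames)  # ['name', 'cookies']
print(t.rows)      # []
print(vars(t))     # {'colnames': ['name', 'cookies'], 'rows': []}
```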

Now we probably want to implement some methods! For now we’re just going to do a couple: a method to add a row and a method to take the sum of a particular column.

class Table:
  def __init__(self, colnames: list):
    self.colnames = colnames
    self.rows = []

  def add_row(self, row: dict):
    if any([colname not in row for colname in self.colnames]):
      raise Exception("Column missing")
    self.rows.append(row)

  def sum(self, colname: str):
    if colname not in self.colnames:
      raise Exception("Bad column")
    return sum([r[colname] for r in self.rows])

Neat, right? What do we think the line with any is doing? It builds a list with one boolean per column name, and evaluates to True if at least one column name is missing from the row.
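We can try that expression on its own, outside the class, to see what it produces. The two sample rows below are made up for this experiment:

```python
colnames = ['name', 'cookies']
complete_row = {'name': 'Doug', 'cookies': 3}
incomplete_row = {'name': 'Doug'}  # no 'cookies' key

# One boolean per column name: True means "this column is missing".
print([c not in complete_row for c in colnames])    # [False, False]
print([c not in incomplete_row for c in colnames])  # [False, True]

# any(...) is True when at least one element of the list is True.
print(any([c not in complete_row for c in colnames]))    # False
print(any([c not in incomplete_row for c in colnames]))  # True
```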

We can write a function that uses this table implementation:

def table_function():
  table = Table(['name', 'cookies'])
  table.add_row({'name': 'Doug', 'cookies': 3})

  table2 = Table(['state', 'capital', 'population'])
  table2.add_row({'state': 'Vermont', 'capital': 'Montpelier', 'population': 600000})
  table2.add_row({'state': 'Rhode Island', 'capital': 'Providence', 'population': 1100000})

  print(table.sum('cookies'))
  print(table2.sum('population'))

In CSCI 0111, we learned about the program dictionary and the memory. What do they look like before Python executes the print statements?

Dictionary name value
  table loc 1
  table2 loc 5
Memory location value
  loc 1 Table(colnames=loc 2, rows=loc 3)
  loc 2 ["name", "cookies"]
  loc 3 [loc 4]
  loc 4 {'name': 'Doug', 'cookies': 3}
  loc 5 Table(colnames=loc 6, rows=loc 7)
  loc 6 ["state", "capital", "population"]
  loc 7 [loc 8, loc 9]
  loc 8 {"state": "Vermont", "capital": "Montpelier", "population": 600000}
  loc 9 {"state": "Rhode Island", "capital": "Providence", "population": 1100000}

What does the program dictionary look like inside the first call to sum? We add names for the function’s arguments, self and colname:

Dictionary name value
  table loc 1
  table2 loc 5
  self loc 1
  colname "cookies"

How about inside the second call to sum?

Dictionary name value
  table loc 1
  table2 loc 5
  self loc 5
  colname "population"
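One way to check this picture is to have a method hand back self and compare it with the variable we called the method on. The me method below is a hypothetical helper added only for this experiment; it is not part of our Table design:

```python
class Table:
  def __init__(self, colnames: list):
    self.colnames = colnames
    self.rows = []

  def me(self):
    # Hypothetical helper: return whatever object self refers to.
    return self

table = Table(['name', 'cookies'])
table2 = Table(['state', 'capital', 'population'])

# "is" asks whether two names refer to the same location in memory.
print(table.me() is table)    # True
print(table2.me() is table2)  # True
print(table.me() is table2)   # False
```

Inside each call, self is just another name for the object the method was called on.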

Changing the way data are represented

We could have implemented tables differently. We often access data by column rather than by row (for instance, in the sum method). How would we alter our Table class to store the data by column instead? We could do something like this:

class Table:
  def __init__(self, colnames: list):
    self.columns = {colname: [] for colname in colnames}

  def add_row(self, row: dict):
    if any([colname not in row for colname in self.columns]):
      raise Exception("Column missing")
    # Loop over the table's own columns so extra keys in the row are ignored
    for colname in self.columns:
      self.columns[colname].append(row[colname])

  def sum(self, colname: str):
    if colname not in self.columns:
      raise Exception("Bad column")
    return sum(self.columns[colname])

Notice that we’ve totally changed the way our data are represented, but if we re-run our program, it will behave identically–as long as we didn’t ever access colnames or rows from outside the class!
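To see this for ourselves, here is a sketch that puts both representations side by side and runs the same code against each. The names RowTable, ColumnTable, and total_population are made up for this comparison; in the notes both versions are simply called Table. This sketch's add_row loops over the table's own columns, which is one reasonable choice:

```python
class RowTable:
  # Row-based representation: column names plus a list of row dictionaries.
  def __init__(self, colnames: list):
    self.colnames = colnames
    self.rows = []

  def add_row(self, row: dict):
    if any([colname not in row for colname in self.colnames]):
      raise Exception("Column missing")
    self.rows.append(row)

  def sum(self, colname: str):
    if colname not in self.colnames:
      raise Exception("Bad column")
    return sum([r[colname] for r in self.rows])

class ColumnTable:
  # Column-based representation: one list of values per column name.
  def __init__(self, colnames: list):
    self.columns = {colname: [] for colname in colnames}

  def add_row(self, row: dict):
    if any([colname not in row for colname in self.columns]):
      raise Exception("Column missing")
    for colname in self.columns:
      self.columns[colname].append(row[colname])

  def sum(self, colname: str):
    if colname not in self.columns:
      raise Exception("Bad column")
    return sum(self.columns[colname])

def total_population(table_class):
  # Uses only the constructor, add_row, and sum, never the fields directly.
  t = table_class(['state', 'capital', 'population'])
  t.add_row({'state': 'Vermont', 'capital': 'Montpelier', 'population': 600000})
  t.add_row({'state': 'Rhode Island', 'capital': 'Providence', 'population': 1100000})
  return t.sum('population')

print(total_population(RowTable))     # 1700000
print(total_population(ColumnTable))  # 1700000
```

Because total_population only goes through the methods, it cannot tell which representation it was given.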