Creating a class
Developing incrementally
When I’m programming, I often bite off more than I can chew: I write a whole function, or even several functions, and only then do I run anything to see if it works. Usually it doesn’t! Then I have to go back and figure out where exactly I made a mistake. I’d have had a better time if I tested things as I coded, making sure individual expressions worked as expected.
So, do as I say and not as I do: test frequently! These tests could be actual written tests, but could also just be little experiments in the Python console (which you could later turn into tests).
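As a small illustration of this habit, here is a sketch of how a console experiment might turn into a written test. The function `average` and the test `test_average` are hypothetical names made up for this example; they are not part of the table code below.

```python
def average(nums: list) -> float:
    """Return the mean of a non-empty list of numbers."""
    return sum(nums) / len(nums)

# In the console, we might first try a single expression:
#   >>> sum([1, 2, 3]) / len([1, 2, 3])
#   2.0
# Once it behaves as expected, the same check can become a test.
def test_average():
    assert average([1, 2, 3]) == 2.0
    assert average([10]) == 10.0

test_average()  # raises AssertionError if either check fails
```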
Objects for tables
In CSCI 0111, we spent a few weeks of class working with table data. Tables were sort of like spreadsheets, with named columns and an arbitrary number of rows. For example, we had tables like:
State | Capital | Population |
---|---|---|
VT | Montpelier | 600,000 |
RI | Providence | 1,000,000 |
… | … | … |
There were functions on tables to add columns, do sums and averages on rows, etc.
Python doesn’t have such a data structure. Let’s say we wanted to implement some of the CSCI 0111 exercises in Python, and wanted to create table objects. How would we implement them?
First, we’ll make a class:
    class Table:
        pass
Notice that we haven’t put @dataclass at the top of our class definition. As it turns out, dataclass is just a way of making Python classes behave a bit more like Pyret datatypes. We won’t use it for the rest of the course, but you’ll learn enough about Python’s objects to understand what it was doing.
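To make that comparison concrete, here is a minimal sketch of a dataclass next to a hand-written class with the same fields. The names `PointData` and `PointPlain` are invented for this illustration. It shows only two of the things `@dataclass` generates (the constructor and an equality method), not everything it does.

```python
from dataclasses import dataclass

@dataclass
class PointData:
    # @dataclass generates __init__ (and __eq__, __repr__) from these fields.
    x: int
    y: int

class PointPlain:
    # Without @dataclass, we write the constructor ourselves.
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

a = PointData(1, 2)
b = PointPlain(1, 2)
print(a.x, b.x)                # both objects store their fields the same way
print(a == PointData(1, 2))    # True: the generated __eq__ compares fields
print(b == PointPlain(1, 2))   # False: plain classes compare by identity by default
```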
The first thing we’ll often implement when writing a new class is the class’s constructor. This is the method that gets called to initialize a new instance of the class in question. It’s written like this:
    class Table:
        def __init__(self):
            pass
That’s right: __init__ (those are two underscores on each side). There are things I really like about Python; this naming scheme is not one of them.
Now that we are filling out __init__, we have a decision to make. How should we actually store our table data?
There are a couple of choices here. We could store a list of column names and then store rows as dictionaries. We could store each column as a list. There are other options!
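Here is a rough sketch of what those two layouts could look like as plain Python data, before we wrap either one in a class. The variable names and the two sample rows are just for illustration.

```python
# Option 1: a list of column names plus one dictionary per row.
colnames = ['state', 'capital']
rows = [
    {'state': 'VT', 'capital': 'Montpelier'},
    {'state': 'RI', 'capital': 'Providence'},
]

# Option 2: one list per column, keyed by column name.
columns = {
    'state': ['VT', 'RI'],
    'capital': ['Montpelier', 'Providence'],
}

# The same cell is reached differently in each layout.
print(rows[1]['capital'])      # pick the row first, then the column
print(columns['capital'][1])   # pick the column first, then the row
```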
Let’s store the column names as a list and the rows as dictionaries. We probably want to take the column names as an argument when the table is built. We can write that like this:
    class Table:
        def __init__(self, colnames: list):
            self.colnames = colnames
            self.rows = []
Notice that we haven’t actually defined the colnames and rows fields; we’re just adding them to the object in the __init__ method.
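One way to see this for yourself is to inspect an instance with the built-in `vars` function, which returns the object's attribute dictionary. This is a small experiment you could run in the console; the variable `t` is just a name chosen for the example.

```python
class Table:
    def __init__(self, colnames: list):
        # These assignments are what create the fields on the new object.
        self.colnames = colnames
        self.rows = []

t = Table(['name', 'cookies'])
print(vars(t))                  # {'colnames': ['name', 'cookies'], 'rows': []}
print(hasattr(Table, 'rows'))   # False: the class itself never declared the field
```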
Now we probably want to implement some methods! For now we’re just going to do a couple: a method to add a row and a method to take the sum of a particular column.
    class Table:
        def __init__(self, colnames: list):
            self.colnames = colnames
            self.rows = []

        def add_row(self, row: dict):
            if any([colname not in row for colname in self.colnames]):
                raise Exception("Column missing")
            self.rows.append(row)

        def sum(self, colname: str):
            if colname not in self.colnames:
                raise Exception("Bad column")
            return sum([r[colname] for r in self.rows])
Neat, right? What do we think that any line is doing? It evaluates to True if there’s any column name that isn’t in the row.
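If that expression is hard to read, it can help to pull it apart in the console. Here is a small experiment along those lines; `complete_row` and `partial_row` are made-up sample rows.

```python
colnames = ['name', 'cookies']
complete_row = {'name': 'Doug', 'cookies': 3}
partial_row = {'name': 'Doug'}

# The list comprehension produces one boolean per column name:
# True means "this column is missing from the row".
missing_flags = [colname not in partial_row for colname in colnames]
print(missing_flags)        # [False, True]

# any() is True as soon as at least one element is True.
print(any(missing_flags))   # True
print(any([colname not in complete_row for colname in colnames]))  # False
```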
We can write a function that uses this table implementation:
    def table_function():
        table = Table(['name', 'cookies'])
        table.add_row({'name': 'Doug', 'cookies': 3})
        table2 = Table(['state', 'capital', 'population'])
        table2.add_row({'state': 'Vermont', 'capital': 'Montpelier', 'population': 600000})
        table2.add_row({'state': 'Rhode Island', 'capital': 'Providence', 'population': 1100000})
        print(table.sum('cookies'))
        print(table2.sum('population'))
In CSCI 0111, we learned about the program dictionary and the memory. What do they look like before Python executes the print statements?
Dictionary

name | value |
---|---|
table | loc 1 |
table2 | loc 5 |
Memory

location | value |
---|---|
loc 1 | Table(colnames=loc 2, rows=loc 3) |
loc 2 | ["name", "cookies"] |
loc 3 | [loc 4] |
loc 4 | {'name': 'Doug', 'cookies': 3} |
loc 5 | Table(colnames=loc 6, rows=loc 7) |
loc 6 | ["state", "capital", "population"] |
loc 7 | [loc 8, loc 9] |
loc 8 | {"state": "Vermont", "capital": "Montpelier", "population": 600000} |
loc 9 | {"state": "Rhode Island", "capital": "Providence", "population": 1100000} |
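Python doesn't show us location numbers directly, but we can check that a table's rows list holds a reference to a row's location rather than a copy of it. The sketch below uses a simplified `Table` (no column checking) and a hypothetical variable `doug` so that we keep a name for the row dictionary.

```python
class Table:
    def __init__(self, colnames: list):
        self.colnames = colnames
        self.rows = []

    def add_row(self, row: dict):
        # Simplified for this example: no check for missing columns.
        self.rows.append(row)

doug = {'name': 'Doug', 'cookies': 3}
table = Table(['name', 'cookies'])
table.add_row(doug)

# `is` asks whether two names refer to the same location in memory.
print(table.rows[0] is doug)       # True

# Changing the dictionary through one name is visible through the other.
doug['cookies'] = 4
print(table.rows[0]['cookies'])    # 4
```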
What does the program dictionary look like inside the first call to sum? We add names for the function’s arguments, self and colname:
Dictionary

name | value |
---|---|
table | loc 1 |
table2 | loc 5 |
self | loc 1 |
colname | "cookies" |
How about inside the second call to sum?
Dictionary

name | value |
---|---|
table | loc 1 |
table2 | loc 5 |
self | loc 5 |
colname | "population" |
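One way to convince yourself that self really is the object the method was called on: a method call like `table.sum('cookies')` can also be written by going through the class and passing the object explicitly as the first argument. The sketch below uses a trimmed-down `Table` without the error checks.

```python
class Table:
    def __init__(self, colnames: list):
        self.colnames = colnames
        self.rows = []

    def add_row(self, row: dict):
        self.rows.append(row)

    def sum(self, colname: str):
        return sum([r[colname] for r in self.rows])

table = Table(['name', 'cookies'])
table.add_row({'name': 'Doug', 'cookies': 3})

# The usual method call: Python passes `table` as `self` for us.
print(table.sum('cookies'))          # 3

# The same call with `self` supplied by hand.
print(Table.sum(table, 'cookies'))   # 3
```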
Changing the way data are represented
We could have implemented tables differently. We often access data by column rather than by row (for instance, in the sum function). How would we alter our Table class to make this change? We could do something like this:
    class Table:
        def __init__(self, colnames: list):
            self.columns = {colname: [] for colname in colnames}

        def add_row(self, row: dict):
            if any([colname not in row for colname in self.columns]):
                raise Exception("Column missing")
            # Loop over the table's columns (not the row's keys) so that
            # extra keys in the row are ignored instead of raising KeyError.
            for colname in self.columns:
                self.columns[colname].append(row[colname])

        def sum(self, colname: str):
            if colname not in self.columns:
                raise Exception("Bad column")
            return sum(self.columns[colname])
Notice that we’ve totally changed the way our data are represented, but if we re-run our program, it will behave identically, as long as we didn’t ever access colnames or rows from outside the class!
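To check this claim, we can put the two representations side by side and run the same code against each. In this sketch the classes are renamed `RowTable` and `ColumnTable` so both can exist at once, the error checks are left out for brevity, and `total_cookies` and its sample rows are hypothetical.

```python
class RowTable:
    """Rows stored as a list of dictionaries."""
    def __init__(self, colnames: list):
        self.colnames = colnames
        self.rows = []

    def add_row(self, row: dict):
        self.rows.append(row)

    def sum(self, colname: str):
        return sum([r[colname] for r in self.rows])


class ColumnTable:
    """Columns stored as a dictionary of lists."""
    def __init__(self, colnames: list):
        self.columns = {colname: [] for colname in colnames}

    def add_row(self, row: dict):
        for colname in self.columns:
            self.columns[colname].append(row[colname])

    def sum(self, colname: str):
        return sum(self.columns[colname])


def total_cookies(table_class):
    # Uses only the constructor, add_row, and sum: never the stored fields.
    table = table_class(['name', 'cookies'])
    table.add_row({'name': 'Doug', 'cookies': 3})
    table.add_row({'name': 'Ana', 'cookies': 5})
    return table.sum('cookies')

print(total_cookies(RowTable))      # 8
print(total_cookies(ColumnTable))   # 8
```

Because `total_cookies` only goes through methods, it can't tell which representation it was given.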