Introduction to objects

Software engineering

In the first lecture of the course, we talked about two concepts the course will cover: algorithms and software engineering. For the first two weeks, we have focused on algorithms. We learned about insertion sort, an algorithm to sort a list. We learned some new programming techniques that can be used to implement algorithms, and got some practice going from high-level descriptions of algorithms to programs.

We’re now going to switch gears for a bit and focus on software engineering. Let’s look at the “shopping discount” problem we ended with last time. Here are two implementations of a checkout function to compute the total price of a user’s cart:

from dataclasses import dataclass

@dataclass
class CartItem:
    name: str
    price: int

def checkout1(cart: list):
  shoes = [item for item in cart if item.name == "shoes"]
  shoes_cost = sum([item.price for item in shoes])
  shoes_discount = 0
  if shoes_cost > 100:
    shoes_discount = shoes_cost * 0.20

  hats = [item for item in cart if item.name == "hats"]
  hats_discount = 0
  if len(hats) >= 2:
    hats_discount = 10

  # CartItem stores its price in the "price" field
  return sum(map(lambda item: item.price, cart)) - shoes_discount - hats_discount

def checkout2(cart: list):
  shoe_total = 0
  hats_seen = 0
  total = 0
  for item in cart:
    if item.name == "shoes":
      shoe_total += item.price
    if item.name == "hats":
      hats_seen += 1
    total += item.price
  if shoe_total > 100:
    total -= shoe_total * 0.20
  if hats_seen >= 2:
    total -= 10
  return total

Which one is clearer? Which one do you think is more efficient?

To me at least, the first version is a bit clearer. On the other hand, the second version is probably more efficient: it makes a single pass over the cart, while the first version walks the list several times. A good software engineering practice would be to start by writing the first version. If it turns out that efficiency is important (for instance, if users frequently check out with very large carts), we might replace it with the second version.

Let’s say we did just that: we decided to replace our checkout implementation with the more efficient version. What else in the program would have to change?

Nothing! Nothing else would have to change. The second version behaves identically to the first version: on any possible cart, they will return the same answer. This is part of why we separate our code into functions. Even if checkout is called in 10 different places in our code, we don’t have to modify any of them!
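One way to gain confidence in that claim is to run both versions on a handful of carts and compare the answers. The sketch below is only illustrative: it uses condensed copies of the two functions, and the sample carts (including the "socks" item) are made up for this example. Agreeing on a few carts is evidence, not proof, that the functions agree on every cart.

```python
from dataclasses import dataclass

@dataclass
class CartItem:
    name: str
    price: int

def checkout1(cart: list):
    # Clear version: work out each discount separately
    shoes_cost = sum(item.price for item in cart if item.name == "shoes")
    shoes_discount = shoes_cost * 0.20 if shoes_cost > 100 else 0
    hats = [item for item in cart if item.name == "hats"]
    hats_discount = 10 if len(hats) >= 2 else 0
    return sum(item.price for item in cart) - shoes_discount - hats_discount

def checkout2(cart: list):
    # Single-pass version: gather everything in one loop
    shoe_total = 0
    hats_seen = 0
    total = 0
    for item in cart:
        if item.name == "shoes":
            shoe_total += item.price
        if item.name == "hats":
            hats_seen += 1
        total += item.price
    if shoe_total > 100:
        total -= shoe_total * 0.20
    if hats_seen >= 2:
        total -= 10
    return total

# Hypothetical sample carts, chosen to hit each discount rule
sample_carts = [
    [],
    [CartItem("shoes", 100)],
    [CartItem("shoes", 120), CartItem("hats", 15), CartItem("hats", 15)],
    [CartItem("hats", 20), CartItem("hats", 20), CartItem("socks", 5)],
]

# Any code that calls checkout could use either function here
for cart in sample_carts:
    assert checkout1(cart) == checkout2(cart)
```

Because the callers only depend on what checkout returns, swapping one body for the other leaves this loop untouched.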

As we build up to larger programs, we will find this property of functions very useful. Once we have built and tested a function, we don’t need to worry about that function’s body any more, only what it does.

Objects

Objects are one way of extending this powerful idea, being able to modify a part of a program in isolation from the rest, from functions to data. We’ll learn about objects over the next several class sessions, and will use them in the rest of the course.

Imagine you’re a DJ at a radio station. The station is very democratic: you only play songs that your listeners call in and request. In addition, every thousandth listener who calls in gets a prize! You want to keep track of the queue of songs you want to play, as well as enough information to give out prizes.

We could implement this with a custom data type like this:

@dataclass
class DJData:
  num_callers: int
  queue: list

We can implement a function to update our data and to figure out what we’re going to say to a listener:

def request(data: DJData, caller: str, song: str) -> str:
  data.queue.append(song)
  data.num_callers += 1
  if data.num_callers % 1000 == 0:
    return "Congrats, " + caller + "! You get a prize!"
  else:
    return "Cool, " + caller

So here we’ve got a datatype and a function that reads and modifies that datatype’s contents. We can see how it works:

> djdata = DJData(0, [])
> request(djdata, "Doug", "Bulls on Parade")
"Cool, Doug"

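To see the prize branch without making a thousand calls, here is a small sketch that starts the caller count at 998. The starting count, the caller names, and the song titles are placeholders invented for this example.

```python
from dataclasses import dataclass

@dataclass
class DJData:
    num_callers: int
    queue: list

def request(data: DJData, caller: str, song: str) -> str:
    data.queue.append(song)
    data.num_callers += 1
    if data.num_callers % 1000 == 0:
        return "Congrats, " + caller + "! You get a prize!"
    else:
        return "Cool, " + caller

# Assumption for illustration: 998 listeners have already called in
djdata = DJData(998, [])
first = request(djdata, "Ana", "Song A")   # caller number 999
second = request(djdata, "Ben", "Song B")  # caller number 1000

print(first)         # Cool, Ana
print(second)        # Congrats, Ben! You get a prize!
print(djdata.queue)  # ['Song A', 'Song B']
```

Note that request both returns a message and changes djdata: the queue and the caller count are different after each call.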
We could have written this slightly differently:

@dataclass
class DJData:
  num_callers: int
  queue: list

  def request(self, caller: str, song: str) -> str:
    self.queue.append(song)
    self.num_callers += 1
    if self.num_callers % 1000 == 0:
      return "Congrats, " + caller + "! You get a prize!"
    else:
      return "Cool, " + caller

We’ve put the request function inside the definition of DJData. We’ve also modified it a bit: instead of an argument named data, the first argument is now named “self”, and we’ve left off its type annotation. This function is now a method on the DJData class.

We’ll call it slightly differently, too:

> djdata = DJData(0, [])
> djdata.request("Doug", "Guerrilla Radio")
"Cool, Doug"

We call methods by writing the name of an object (djdata, in this case), then a dot, then the method name, then the method arguments, excluding self. Since we’re not passing self in, how does Python know which object to call the method on? It uses the object before the dot: Python passes djdata in as self automatically.
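A short sketch can make the role of self concrete. In Python, calling a method through an object is equivalent to looking the function up on the class and passing the object explicitly as the first argument. The second form is shown here only for illustration; in practice we would write the first.

```python
from dataclasses import dataclass

@dataclass
class DJData:
    num_callers: int
    queue: list

    def request(self, caller: str, song: str) -> str:
        self.queue.append(song)
        self.num_callers += 1
        if self.num_callers % 1000 == 0:
            return "Congrats, " + caller + "! You get a prize!"
        else:
            return "Cool, " + caller

djdata = DJData(0, [])

# Usual form: the object before the dot becomes self
reply_a = djdata.request("Doug", "Guerrilla Radio")

# Equivalent form: pass the object as self explicitly
reply_b = DJData.request(djdata, "Doug", "Bulls on Parade")

print(reply_a)             # Cool, Doug
print(reply_b)             # Cool, Doug
print(djdata.num_callers)  # 2
```

Both calls update the same djdata object, which is why the caller count ends at 2.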

We’ll keep learning about classes, objects and methods. I want to emphasize, though, that you’ve seen this before. We’ve called methods on lists, for instance l.sort(). What we’re seeing now is how to add methods to our custom objects!
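As a quick illustration that list methods follow the same pattern, the sketch below sorts two lists: once with the familiar dot syntax, and once by calling the method through the list class with the list passed in explicitly. The variable names are made up for this example.

```python
nums = [3, 1, 2]
nums.sort()       # nums is the object the method is called on

other = [3, 1, 2]
list.sort(other)  # same method, with the list passed explicitly

print(nums)   # [1, 2, 3]
print(other)  # [1, 2, 3]
```

Just like our request method, sort changes the object it is called on rather than returning a new one.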