More program performance

Big-O notation

Last time, we talked about analyzing the performance of programs. When deciding whether a function would run in constant, linear, or quadratic time, we ignored constant factors and looked only at the fastest-growing term. We can define this somewhat more formally using big-O notation.

Let’s look again at the final function we considered last time:

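def distinct(l: list) -> list:
    """Return the elements of l without duplicates, keeping first occurrences in order."""
    d = []
    for x in l:
        # `x not in d` scans d element by element, so it is linear in len(d)
        if x not in d:
            d.append(x)
    return d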

We decided that this function’s running time is quadratic in the length of its input (in the worst case, when all elements are distinct). For a list of length \(n\), we calculated the number of operations as \(\frac{n(n-1)}{2} + n + 2\).
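To see where the \(\frac{n(n-1)}{2}\) term comes from, here is a minimal sketch of an instrumented copy of `distinct` (the name `distinct_with_count` is hypothetical, not part of the course code). It spells out the scan that `x not in d` performs and counts element comparisons. On a list of \(n\) distinct elements, the element at index \(i\) is compared against the \(i\) elements already in `d`, for a total of \(0 + 1 + \dots + (n-1) = \frac{n(n-1)}{2}\) comparisons.

```python
def distinct_with_count(l: list) -> tuple[list, int]:
    """Hypothetical instrumented distinct: returns (result, number of comparisons)."""
    d = []
    comparisons = 0
    for x in l:
        # Mimic `x not in d`: scan d left to right until x is found or d runs out
        found = False
        for y in d:
            comparisons += 1
            if y == x:
                found = True
                break
        if not found:
            d.append(x)
    return d, comparisons
```

For example, `distinct_with_count([3, 1, 4])` returns `([3, 1, 4], 3)`: 0 comparisons for `3`, 1 for `1`, and 2 for `4`.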

A computer scientist would say that on a list of \(n\) elements, this function runs in \(O(n^2)\) time. The formal definition looks like this:

If we have (mathematical, not Python!) functions \(f(x)\) and \(g(x)\), then \(f(x)\) is in \(O(g(x))\) if and only if there are constants \(x_0\) and \(C\) such that for all \(x > x_0\), \(f(x) < C \cdot g(x)\).

For the operation count above, with \(g(n) = n^2\), our constants could be, say, \(x_0 = 4\) and \(C = 4\).
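Why do these constants work? For every \(n \ge 1\) we have \(\frac{n(n-1)}{2} \le \frac{n^2}{2}\), \(n \le n^2\), and \(2 \le 2n^2\), so the operation count is at most \(\frac{7}{2}n^2 < 4n^2\). The sketch below checks the same inequality numerically over a finite range; the helper names `f`, `g`, and `satisfies_bound` are hypothetical, and a finite check like this is evidence, not a proof.

```python
def f(n: int) -> float:
    """Worst-case operation count for distinct on a list of length n."""
    return n * (n - 1) / 2 + n + 2


def g(n: int) -> float:
    """The comparison function for O(n^2)."""
    return n ** 2


def satisfies_bound(x0: int, c: float, upto: int) -> bool:
    """Check f(n) < c * g(n) for every integer n with x0 < n <= upto."""
    return all(f(n) < c * g(n) for n in range(x0 + 1, upto + 1))


print(satisfies_bound(4, 4, 10_000))  # True
print(satisfies_bound(2, 1, 10_000))  # True: a smaller C also works from n = 3 on
print(satisfies_bound(1, 1, 10_000))  # False: f(2) = 5 is not less than 4
```

The constants are not unique: shrinking \(C\) generally means you need a larger \(x_0\).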

In this class, we won’t expect you to rigorously prove that a function’s running time is in a particular big-O class. We will, though, use (for instance) \(O(n)\) as a shorthand for “linear.”

As a bit of practice, here are our rainfall solutions:

# version 1

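def average_rainfall(sensor_input: list) -> float:
    number_of_readings = 0
    total_rainfall = 0
    for reading in sensor_input:
        if reading == -999:
            # -999 marks the end of the period; ignore everything after it
            break
        elif reading >= 0:
            # Negative readings (other than -999) are bad readings; skip them
            number_of_readings += 1
            total_rainfall += reading
    # Assumes at least one good reading; otherwise this divides by zero
    return total_rainfall / number_of_readings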

# version 2

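def list_before(l: list, item) -> list:
    """Return the elements of l that come before the first occurrence of item."""
    result = []
    for element in l:
        if element == item:
            return result
        result.append(element)
    return result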

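def average_rainfall(sensor_input: list) -> float:
    readings_in_period = list_before(sensor_input, -999)
    # Keep only the non-negative (good) readings
    good_readings = [reading for reading in readings_in_period if reading >= 0]
    # Assumes at least one good reading; otherwise this divides by zero
    return sum(good_readings) / len(good_readings)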

What are the worst-case running times of each solution?

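See the lecture capture for details. In short, version 1 makes a single pass over the list, while version 2 makes a constant number of passes (one in list_before, one in the list comprehension, and one in sum). So for inputs of size \(n\), both solutions run in \(O(n)\) time.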
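One way to make the linear bound concrete is to count how many list elements each solution examines. The sketch below uses hypothetical instrumented copies of the two solutions (the names `visits_version_1` and `visits_version_2` are ours) and assumes the worst case is a list with no -999 sentinel and no negative readings, so every element is examined. Version 1 visits each element once (\(n\) visits); version 2 visits each element up to three times (at most \(3n\) visits). Both counts are a constant times \(n\), which is exactly the difference big-O ignores.

```python
def visits_version_1(sensor_input: list) -> int:
    """Count the list elements version 1 examines."""
    visits = 0
    for reading in sensor_input:
        visits += 1
        if reading == -999:
            break
    return visits


def visits_version_2(sensor_input: list) -> int:
    """Count the list elements version 2 examines, pass by pass."""
    visits = 0
    # Pass 1: list_before copies readings up to the -999 sentinel
    readings_in_period = []
    for element in sensor_input:
        visits += 1
        if element == -999:
            break
        readings_in_period.append(element)
    # Pass 2: the list comprehension filters out negative readings
    good_readings = []
    for reading in readings_in_period:
        visits += 1
        if reading >= 0:
            good_readings.append(reading)
    # Pass 3: sum() adds each good reading once; len() on a list is constant time
    visits += len(good_readings)
    return visits


worst_case = [1.0] * 1000  # no bad readings and no -999 sentinel
print(visits_version_1(worst_case))  # 1000
print(visits_version_2(worst_case))  # 3000
```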