More program performance
Big-O notation
Last time, we talked about analyzing the performance of programs. When deciding whether a function would run in constant, linear, or quadratic time, we ignored the constants and just looked at the biggest term. We can define this somewhat more formally using big-O notation.
Let’s look again at the final function we considered last time:
def distinct(l: list) -> list:
    d = []
    for x in l:
        if x not in d:
            d.append(x)
    return d
We decided that this function’s running time is quadratic in its input (in the worst case). For a list of length \(n\), we calculated the number of operations as \(((n * (n - 1)) / 2) + n + 2\).
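To make that count concrete, here is a minimal sketch that instruments the function to tally its operations. The name `distinct_counting` and the counting scheme are illustrative assumptions, not part of the original code; the membership test `x not in d` is unrolled into an explicit loop so that each comparison can be counted.

```python
def distinct_counting(l: list) -> tuple:
    """Hypothetical instrumented distinct: returns (result, operation count)."""
    ops = 1  # creating the empty list d
    d = []
    for x in l:
        found = False
        # Unrolled `x not in d`: one comparison per element of d examined
        for y in d:
            ops += 1
            if x == y:
                found = True
                break
        if not found:
            d.append(x)
            ops += 1  # the append
    ops += 1  # the return
    return d, ops


# Worst case: every element is different, so each membership test scans all of d
n = 10
result, ops = distinct_counting(list(range(n)))
print(ops)                         # 57
print((n * (n - 1)) // 2 + n + 2)  # 57
```

On an input with many repeats the count is lower, since the scan stops early and fewer elements are appended; the formula describes the worst case only.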
A computer scientist would say that on a list of \(n\) elements, this function runs in \(O(n^2)\) time. The formal definition looks like this:
If we have (mathematical, not Python!) functions \(f(x)\) and \(g(x)\), then \(f(x)\) is in \(O(g(x))\) if and only if there are constants \(x_0\) and \(C\) such that for all \(x > x_0\), \(f(x) < C*g(x)\).
For the function above, our constants could be, say, \(x_0=4\) and \(C=4\).
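As a quick sanity check, the sketch below tests those constants numerically. The helper names `f` and `g` are assumptions chosen to match the definition, with `f` being the operation count from above and `g` being \(n^2\). Checking a finite range is only a spot check, not a proof.

```python
def f(n: int) -> float:
    """Operation count derived for distinct on a list of length n."""
    return (n * (n - 1)) / 2 + n + 2


def g(n: int) -> int:
    """The bounding function n^2."""
    return n ** 2


x0, C = 4, 4
# Check f(n) < C * g(n) for every n with x0 < n <= 10000
holds = all(f(n) < C * g(n) for n in range(x0 + 1, 10_001))
print(holds)  # True
```

Many other pairs of constants would work just as well; the definition only asks that some pair exists.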
In this class, we won’t expect you to rigorously prove that a function’s running time is in a particular big-O class. We will, though, use (for instance) \(O(n)\) as a shorthand for “linear.”
As a bit of practice, here are our rainfall solutions:
# version 1
def average_rainfall(sensor_input: list) -> float:
    number_of_readings = 0
    total_rainfall = 0
    for reading in sensor_input:
        if reading == -999:
            # sentinel: stop and average the readings seen so far
            return total_rainfall / number_of_readings
        elif reading >= 0:
            number_of_readings += 1
            total_rainfall += reading
    # no sentinel found: average over the whole list, as version 2 does
    return total_rainfall / number_of_readings

# version 2
def list_before(l: list, item) -> list:
    result = []
    for element in l:
        if element == item:
            return result
        result.append(element)
    return result

def average_rainfall(sensor_input: list) -> float:
    readings_in_period = list_before(sensor_input, -999)
    good_readings = [reading for reading in readings_in_period if reading >= 0]
    return sum(good_readings) / len(good_readings)
What are the worst-case running times of each solution?
See the lecture capture for details; for inputs of size \(n\), both solutions run in \(O(n)\) time.
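One informal way to see the linear bound for version 2 is to count how many elements each of its passes visits. The sketch below is only an illustration under that counting assumption (the name `count_version_2_steps` is hypothetical): it mirrors the scan up to the sentinel, the filtering comprehension, and the `sum`.

```python
def count_version_2_steps(sensor_input: list) -> int:
    """Hypothetical helper: elements visited by version 2's three passes."""
    steps = 0
    # Pass 1: list_before scans until it finds -999 (or reaches the end)
    readings_in_period = []
    for element in sensor_input:
        steps += 1
        if element == -999:
            break
        readings_in_period.append(element)
    # Pass 2: the comprehension visits every reading in the period
    steps += len(readings_in_period)
    good_readings = [r for r in readings_in_period if r >= 0]
    # Pass 3: sum visits every good reading
    steps += len(good_readings)
    return steps


# Worst case: no sentinel and no negative readings, so each pass covers all n elements
for n in [10, 100, 1000]:
    print(n, count_version_2_steps([1] * n))  # prints 10 30, then 100 300, then 1000 3000
```

Three passes over at most \(n\) elements gives roughly \(3n\) steps, and since big-O ignores the constant factor, that is \(O(n)\). Version 1 makes a single pass, so it is \(O(n)\) as well.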