Class summary:   Arrays and Break
1 The internal structure of lists
1.1 Two Memory Layouts for Ordered Items
2 Back to the Runners
2.1 Printing instead of returning runners
3 The Role of Variables

Class summary: Arrays and Break

Copyright (c) 2017 Kathi Fisler

1 The internal structure of lists

Last lecture, we ended with a question about how fast one can access a specific element of a list. Specifically, if you have a list called finishers of Runners (our example from last time) and you write:

  finishers[9]

How long does it take to locate the Runner in 10th place (remember, indices start at 0)?

It depends on how the list is laid out in memory.

1.1 Two Memory Layouts for Ordered Items

When we say "list", we usually mean simply: a collection of items with order. How might a collection of ordered items be arranged in memory? Here are two examples, using a list of course names:

  courses = ["CS111", "ENGN90", "VISA100"]

In the first version, the elements are laid out in consecutive memory locations (this is rougly how we’ve shown lists up to now):

Prog Dictionary           Memory

--------------------------------------------------------------------

courses --> loc 1001      loc 1001 --> [loc 1002, loc1003, loc 1004]

                          loc 1002 --> "CS111"

                          loc 1003 --> "ENGN90"

                          loc 1004 --> "VISA100"

In the second version, each element is captured as a datatype containing the element and the next list location. When we were in Pyret, this datatype was called link.

Prog Dictionary           Memory

--------------------------------------------------------------------

courses --> loc 1001      loc 1001 --> link("CS111", loc 1002)

                          loc 1002 --> link("ENGN90", loc 1003)

                          loc 1003 --> link("VISA100", loc 1004)

                          loc 1004 --> empty

What are the tradeoffs between the two versions? In the first, we can access items by index in constant time, as we could for hashtables, but changing the contents (adding or deleting) requires moving things around in memory. In the second, the size of the collection can grow or shrink arbitrarily, but it takes time proportional to the index to look up a specific value. Each organization has its place in some programs.

In data structures terms, the first organization is called an array. The second is called a linked list. Pyret implements linked lists, with arrays being a separate data type (with a different notation from lists). Python implements lists as arrays. When you approach a new programming language, you need to look up whether its lists are linked lists or arrays if you care about the run-time performance of the underlying operations.

2 Back to the Runners

Going back to our Runners discussion from last class, we can simply use Python lists (arrays) rather than a hashtable, and be able to access the names of Runners who finished in particular positions. But let’s instead ask a different question.

How would we report the top finishers in each age category? In particular, we want to write a function such as the following:

  def top_5_range(runners: list, lo: int, high: int) -> list:

      """get list of top 5 finishers with ages in

         the range given by lo to high, inclusive

      """

Think about how you would write this code.

Here’s our solution:

  def top_5_range(runners: list, lo: int, high: int) -> list:

      """get list of top 5 finishers with ages in

         the range given by lo to high, inclusive

      """

  

      # count of runners seen who are in age range

      in_range: int = 0

      # the list of finishers

      result: list = []

  

      for r in runners:

          if lo <= r.age and r.age <= high:

              in_range += 1

              result.append(r)

          if in_range == 5:

              return result

      print("Fewer than five in category")

      return result

Here, rather than return only when we get to the end of the list, we want to return once we have five runners in the list. So we set up an additional variable (in_range) to help us track progress of the computation. Once we have gotten to 5 runners, we return the list. If we never get to 5 runners, we print a warning to the user then return the results that we do have.

Couldn’t we have just looked at the length of the list, rather than maintain the in_range variable? Yes, we could have, though this version sets up a contrast to our next example.

2.1 Printing instead of returning runners

What if instead we just wanted to print out the top 5 finishers, rather than gather a list? While in general it is usually better to separate computing and displaying data, in practice we do sometimes merge them, or do other operations (like write some data to file) which won’t return anything. How do we modify the code to print the names rather than build up a list of the runners?

The challenge here is how to stop the computation. When we are building up a list, we stop a computation using return. But if our code isn’t returning, or otherwise needs to stop a loop before it reaches the end of the data, what do we do?

We use a command called break, which says to terminate the loop and continue the rest of the computation. Here, the break is in place of the inner return statement:

  def print_top_5_range(runners: list, lo: int, high: int):

      """print top 5 finishers with ages in

         the range given by lo to high, inclusive

      """

  

      # count of runners seen who are in age range

      in_range: int = 0

  

      for r in runners:

          if lo <= r.age and r.age <= high:

              in_range += 1

              print(r.name)

          if in_range == 5:

              break

      print("End of results")

  

If Python reaches the break statement, it terminates the for loop and goes to the next statement, which is the print at the end of the function.

3 The Role of Variables

This example also gives us a chance to review when to use variables in programming. If we look at these examples, we have two (or three) variables:

These variables illustrate two different roles that variables and updates to them play in programs:

The variables we have created to hold hashtables, lists like finishers, the collections of bank accounts, and the next_id for a new account are in the first category: we have multiple operations (or multiple calls to the same operation) that need to update and/or lookup in a common piece of data. We use a variable to name that data so all functions can access it over time.

Variables like in_range and result in our top_5_range program(s) are of the second kind. They aren’t being maintained over time or across function calls, but they are tracking progress of a computation so we can either tell when we are done or build up a local answer to return after a computation finishes.

If you see the need for either of these patterns – shared data or the need to track information to finish a computation – you’ll create a variable to manage it.

This leaves a question to ponder – why did we not need variables in Pyret? Can we write programs like top_5_range in Pyret? What does the tracking of progress there?