Class summary: Working with Hashtables
1 What are hashtables and when do we use them?
2 Hashtables in code
2.1 Creating from Scratch
2.2 Looking up Values
2.3 Updating Values
2.4 Adding New Keys
2.5 Removing Keys
2.6 Checking whether a key is in the table
2.7 Iterating over key-value pairs
2.8 Hashtables with more complicated values
2.9 Wrapup
3 Converting code from lists to hashtables
4 The global annotation


Copyright (c) 2017 Kathi Fisler

Last lecture, we hinted that we could reduce the time to find a value in a collection from linear (searching through a list) to constant, by leveraging knowledge about where data live in memory. The data structure that does this is called a hashtable. We gave the intuition for how this works, but skimmed over the details of how to use hashtables in code and how they work under the hood.

Today, we study hashtables at the level of code. Next lecture, we’ll look at how they work under the hood.

Hashtables are called dictionaries in Python. To avoid terminology confusion with our use of the term dictionary (for mapping names to memory locations and values), I’ll continue to refer to them as hashtables in notes.

1 What are hashtables and when do we use them?

A hashtable is a data structure designed for looking up values as indexed by some notion of a key. Imagine that you had a two-column table: the first column holds keys, and the second column holds the value associated with each key. For example, here’s a table that has Brown classrooms as keys, and their seating capacities as values.

Key        Value
--------------------
CIT368     63
BERT130    200
FRDMN101   48

This is the essential form of data that we can turn into a hashtable: keys and values, with one value per key. In many languages, all of the values need to be of the same type (which can be anything). Python is more forgiving, and would allow different types to be stored under different keys. Python does not force the keys to share a type either, but it does require every key to be hashable (strings and numbers are, lists are not), and in practice the keys of one table usually have a common type. That requirement is part of how hashtables work under the hood.

2 Hashtables in code

2.1 Creating from Scratch

Let’s turn our two-column table from above into a proper hashtable. Here’s the general notation for creating a hashtable by hand in Python:

  {key: value,
   key: value,
   key: value,
   ...
  }

The curly brackets say "hashtable". After that, we are writing down pairs of keys and their corresponding values.

So our table above would be written as follows:

  # a hashtable mapping classrooms to numbers of seats

  room_capacity = {"CIT368": 63,
                   "BERT130": 200,
                   "FRDMN101": 48
                   }

The type for hashtables in Python is dict (short for dictionary).
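As a quick check, here is a minimal sketch that builds the table and asks Python about it. It uses only the built-in type and len functions. The name empty_table is ours, introduced just for illustration.

```python
# a hashtable mapping classrooms to numbers of seats
room_capacity = {"CIT368": 63,
                 "BERT130": 200,
                 "FRDMN101": 48}

print(type(room_capacity))   # <class 'dict'>
print(len(room_capacity))    # 3 (the number of key-value pairs)

# an empty hashtable is written with empty curly brackets
# (empty_table is a hypothetical name for this example)
empty_table = {}
print(len(empty_table))      # 0
```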

2.2 Looking up Values

The following notation looks up the value for key in hashtable:

  hashtable[key]

So, to get the number of seats in CIT368, we’d write:

  room_capacity["CIT368"]

If the key you request isn’t in the hashtable, you’ll get an error (Python raises a KeyError).
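The sketch below shows a successful lookup, a failed one, and one way to avoid the error. The get method is part of Python's dict type, though it is not something these notes rely on elsewhere. The variable names seats, missing, and fallback are ours.

```python
room_capacity = {"CIT368": 63, "BERT130": 200, "FRDMN101": 48}

# a successful lookup returns the value stored under the key
seats = room_capacity["CIT368"]
print(seats)  # 63

# looking up a key that is not in the table raises KeyError
try:
    room_capacity["Sayles"]
except KeyError as err:
    print("no such room:", err)

# get returns a default instead of raising an error:
# None if no default is given, otherwise the second argument
missing = room_capacity.get("Sayles")
fallback = room_capacity.get("Sayles", 0)
```

Whether you want the error or a default depends on the program: an error is usually better when a missing key means something has gone wrong.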

2.3 Updating Values

What if we want to update the value associated with a key? We use an assignment statement as follows:

  hashtable[key] = new_value

So, to change the number of seats in CIT368 to 70, we’d write:

  room_capacity["CIT368"] = 70

2.4 Adding New Keys

To add a new key (and its value) to a hashtable, we use the same notation as for updating a value, just with an unused key:

  room_capacity["CIT506"] = 25
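Since updating and adding use the same notation, the only difference is whether the key is already present. One way to see this is to watch the size of the table, as in this small sketch (size_after_update and size_after_add are names we made up for the example):

```python
room_capacity = {"CIT368": 63, "BERT130": 200, "FRDMN101": 48}

# the key exists, so its value is replaced and the table stays the same size
room_capacity["CIT368"] = 70
size_after_update = len(room_capacity)
print(size_after_update)  # 3

# the key is new, so a new key-value pair is added and the table grows
room_capacity["CIT506"] = 25
size_after_add = len(room_capacity)
print(size_after_add)     # 4
```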

2.5 Removing Keys

To remove a key (and its value) from the hashtable, we write:

  del hashtable[key]

So, we can take "CIT506" out of the hashtable using

  del room_capacity["CIT506"]

If the key isn’t in the hashtable, you’ll get an error (again a KeyError).
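Here is a short sketch of removal, including what happens on a second attempt. The pop method shown at the end is part of Python's dict type but goes slightly beyond what we covered in class. The name removed is ours.

```python
room_capacity = {"CIT368": 63, "BERT130": 200,
                 "FRDMN101": 48, "CIT506": 25}

del room_capacity["CIT506"]
print("CIT506" in room_capacity)  # False

# deleting a key that is no longer there raises KeyError
try:
    del room_capacity["CIT506"]
except KeyError:
    print("CIT506 was already removed")

# pop removes a key and returns its value; with a second argument,
# it returns that default instead of raising an error for a missing key
removed = room_capacity.pop("CIT506", None)
print(removed)  # None
```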

2.6 Checking whether a key is in the table

To ask whether a particular key is in the hashtable we reuse the in notation that we have seen elsewhere in Python:

  "Sayles" in room_capacity

As elsewhere, this returns a Boolean.
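A common use of in is to guard a lookup so that it cannot raise an error. The sketch below does that in a small helper. The function name seats_in is hypothetical, chosen for this example. Note that in checks the keys of a hashtable, not its values.

```python
room_capacity = {"CIT368": 63, "BERT130": 200, "FRDMN101": 48}

def seats_in(room):
    """Return the capacity of room, or None if room is not a key.
    (seats_in is a hypothetical helper for this example.)"""
    if room in room_capacity:
        return room_capacity[room]
    return None

print(seats_in("CIT368"))   # 63
print(seats_in("Sayles"))   # None

# in looks at keys only: 63 is a value in the table, not a key
print(63 in room_capacity)  # False
```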

2.7 Iterating over key-value pairs

What if we wanted to find all of the rooms with at least 50 seats? How do we search all the pairs, building up a list of keys whose values meet a criterion?

We use a for-loop:

  my_rooms = []

  # the room variable takes on each key in the hashtable
  for room in room_capacity:
      # >= because we want rooms with at least 50 seats
      if room_capacity[room] >= 50:
          my_rooms.append(room)

Note that this loop takes linear time in the number of keys. Hashtables reduce looking up a value by its key to constant time, but some computations still require looking at all the "rows" of the table (as it were).
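Python also lets a for-loop visit each key together with its value, using the items method of dict. This is an alternative to looking up room_capacity[room] inside the loop, not something the class code depends on. The names big_rooms and seats are ours.

```python
room_capacity = {"CIT368": 63, "BERT130": 200, "FRDMN101": 48}

big_rooms = []

# items() produces each key together with its value
for room, seats in room_capacity.items():
    if seats >= 50:   # at least 50 seats
        big_rooms.append(room)

print(big_rooms)  # ['CIT368', 'BERT130']
```

This version is still linear: it visits every pair once.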

2.8 Hashtables with more complicated values

If we’re going to frequently look for rooms with certain capacities, we might want to organize our data differently, so that a number range maps to a list of room names. Here’s our original table written in that fashion:

  # hashtable mapping seat ranges to classrooms

  capacities = {"45-65": ["CIT368", "FRDMN101"],
                "65-100": [],
                "150-200": ["BERT130"]}

Remember that a hashtable can only have one value for each key. So when we have multiple values for the same key, we have to put them in some data structure that we in turn store in the hashtable. Here, we put a list of room names under each range key.
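Because each value is an ordinary list, we can look it up and then use list operations on it. In this sketch, "ROOM_X" is a made-up room name used only to show the idea, not a real classroom.

```python
# hashtable mapping seat ranges to classrooms
capacities = {"45-65": ["CIT368", "FRDMN101"],
              "65-100": [],
              "150-200": ["BERT130"]}

# look up the list stored under a key, then add to that list
# ("ROOM_X" is a hypothetical room name)
capacities["65-100"].append("ROOM_X")

print(capacities["65-100"])      # ['ROOM_X']
print(len(capacities["45-65"]))  # 2
print(len(capacities))           # 3 (still three keys)
```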

Wouldn’t it be nice to leave the ranges as numbers instead of strings? For example, don’t we want a table like:

Low  High  Rooms
45   65    "CIT368", "FRDMN101"
65   100
150  200   "BERT130"

This doesn’t have the form of a hashtable that we showed at the start of the lecture. There can be only one key column. Our version creates a key from the first two columns by making a string out of them. We could have also made a dataclass for this. We’ll see more about using dataclasses as keys in the next lecture.
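Another way to keep the numbers as numbers, without waiting for dataclasses, is to use a tuple of (low, high) as the key. To be clear, this is a substitute technique, not the dataclass approach promised for next lecture. The function rooms_with_capacity is a hypothetical helper, and it has to scan the keys, so it is linear in the number of ranges.

```python
# hashtable mapping (low, high) seat ranges to classrooms;
# using tuples as keys is our assumption, not the version from class
capacities = {(45, 65): ["CIT368", "FRDMN101"],
              (65, 100): [],
              (150, 200): ["BERT130"]}

def rooms_with_capacity(seats):
    """Return the rooms stored under the first range containing seats.
    (hypothetical helper; returns [] if no range contains seats)"""
    for (low, high), rooms in capacities.items():
        if low <= seats <= high:
            return rooms
    return []

print(capacities[(150, 200)])    # ['BERT130']
print(rooms_with_capacity(50))   # ['CIT368', 'FRDMN101']
print(rooms_with_capacity(120))  # []
```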

2.9 Wrapup

This is pretty much what you need to know to program with hashtables. There’s one last detail around using classes as keys, which we will cover next time. And of course, we still need to show how all of this works under the hood.

3 Converting code from lists to hashtables

We practiced these operations by trying to convert a version of our banking code that maintained a list of accounts to one that maintained a hashtable of accounts. This code file shows both the original list version and the hashtable version.
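The code file itself is not reproduced in these notes, so the following is only a sketch of the shape of that conversion. Every name in it (Account, accounts_list, accounts_table, find_in_list, find_in_table) and all of the sample data are assumptions made for illustration, not the actual class code.

```python
from dataclasses import dataclass

@dataclass
class Account:          # hypothetical account class
    id: int
    balance: int

# list version: finding an account means searching the list (linear)
accounts_list = [Account(5, 225), Account(3, 200), Account(2, 300)]

def find_in_list(acct_id):
    for acct in accounts_list:
        if acct.id == acct_id:
            return acct
    return None

# hashtable version: use the account id as the key
accounts_table = {}
for acct in accounts_list:
    accounts_table[acct.id] = acct

def find_in_table(acct_id):
    # a single lookup by key (constant time) instead of a loop
    if acct_id in accounts_table:
        return accounts_table[acct_id]
    return None

print(find_in_list(3).balance)   # 200
print(find_in_table(3).balance)  # 200
```

The main change is that the search loop disappears: the id that the loop used to compare against becomes the key.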

4 The global annotation

We spent a bit of time understanding the global annotation and how it works. To keep the notes cleaner, we’ll provide those notes in a separate file.
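As a brief reminder of the idea (the full explanation is in the separate notes), here is a minimal sketch. The names accounts, next_id, and add_account are hypothetical. The point it illustrates is standard Python behavior: a function needs global only for variables it assigns to, not for a hashtable whose contents it changes.

```python
accounts = {}   # hypothetical hashtable from account ids to balances
next_id = 1     # hypothetical counter for the next unused id

def add_account(balance):
    # global is needed because we assign a new value to next_id
    global next_id
    # no global needed for accounts: we change the contents of the
    # hashtable, but the variable still refers to the same hashtable
    accounts[next_id] = balance
    next_id = next_id + 1

add_account(100)
add_account(50)
print(accounts)  # {1: 100, 2: 50}
print(next_id)   # 3
```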