CS1950Y Lecture 18: Model Checking
Date


Locking

We want to set up a protocol for protecting shared resources. The idea is similar to people raising their hands before they talk, but for computers: we want the multiple processes contending for a resource to “raise their hands” in some sense.

For now, we’ll assume we have two processes and both are constantly trying to get at this resource.

Here’s the algorithm in a simplified form:

while (true) {
    // This is like raising your hand
    flags(self) = true;

    // Wait until no one else is interested
    wait until flags(other) = false;

    // The "critical section" is the code that needs to have exclusive access to the shared resource
    enter critical section and exit it

    // Signal that you're done
    flags(self) = false;
}
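To make the interleaving concrete, here is a minimal Python sketch (not from the lecture) that runs the pseudocode above with generators standing in for processes. Each `yield` marks a point where a scheduler could switch to the other process. The names `process` and `run` are hypothetical, and the schedule is picked by hand rather than by a real OS, so this is a deterministic illustration, not a real concurrent program.

```python
def process(me, flags, log):
    """One process running the hand-raising protocol as a generator.
    Each yield is a point where the scheduler may switch processes."""
    other = 2 if me == 1 else 1
    while True:
        flags[me] = True            # raise your hand
        yield "raised"
        while flags[other]:         # wait until no one else is interested
            yield "waiting"
        log.append(me)              # critical section: record who got in
        yield "in cs"
        flags[me] = False           # signal that you're done
        yield "lowered"


def run(schedule):
    """Advance process schedule[i] by one step, in order.
    Returns the (process, event) pairs and the critical-section log."""
    flags = {1: False, 2: False}
    log = []
    procs = {p: process(p, flags, log) for p in (1, 2)}
    events = [(p, next(procs[p])) for p in schedule]
    return events, log


# Process 2 raises its hand first, then process 1; both then spin.
events, log = run([2, 1, 2, 1, 2, 1])
print(events[-2:])        # [(2, 'waiting'), (1, 'waiting')]
print(log)                # []

# If process 1 runs alone for a few steps, it gets in without trouble.
print(run([1, 1, 1])[1])  # [1]
```

With the schedule [2, 1, ...] both flags are up before either process checks, so both stay in their waiting loops and the log of critical-section entries stays empty no matter how many more steps we run.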

What if the other process stays in the critical section forever? This protocol breaks down if there’s a bad actor who never gives up the resource.

This protocol does have a flaw, which we’ll use TLA+ to find. It ends up being a liveness issue, so it’d be difficult to find with Alloy.

We’re going to have two variables, pcs and flags. pcs is short for “program counters,” and keeping a program counter per process is a common technique in tools like this for tracking where each process is in the program. There’ll be 3 states for the program counters: waiting, checking, and cs. Moving from waiting to checking, the process sets its flag to true. In the checking state, the process waits until the other process’s flag is false. Then, cs is when the process is in the critical section. To transition back to waiting, it sets its flag to false.

What’s our initial state? Both processes should start in the waiting state with their flags set to false:

Init == /\ pcs = [x \in {1,2} |-> "waiting"]
        /\ flags = [x \in {1,2} |-> 0]

pcs is a function from process ID (1 or 2) to state, so here we set it to map both 1 and 2 to "waiting". As a syntactic note, we have to name x here because, in the general case, we might actually want to use x when defining the value of the function. If you’ve taken CS22 or other classes using LaTeX, you might notice some similar syntax, like \in for set membership. A lot of TLA+ syntax is inspired by LaTeX, and both were originally developed by the same person, Leslie Lamport.

Just like pcs, flags is a function whose domain is the set of processor IDs {1, 2}. The codomain will be true or false, because those are the values the flag can have. We’ll use 0 and 1 for false and true, but we could also use TRUE and FALSE if we wanted.
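As a rough analog, assuming Python dicts stand in for TLA+ functions, the function constructor [x \in S |-> e] behaves much like a dict comprehension, and the bound name x plays the same role in both:

```python
PROCS = {1, 2}

# [x \in {1,2} |-> "waiting"]: every process id maps to "waiting"
pcs = {x: "waiting" for x in PROCS}

# [x \in {1,2} |-> 0]: every flag starts lowered (0 stands for false)
flags = {x: 0 for x in PROCS}

# The bound name matters when the body uses it: [x \in {1,2} |-> x * 10]
tens = {x: x * 10 for x in PROCS}
```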

TLA+ does support relations, so we could represent pcs and flags as sets of tuples. However, then we have to do extra work to make sure these relations keep functional properties, and functions also have some convenient syntax.
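To see what that extra work looks like, here is a hypothetical Python helper (our own, not a TLA+ operator) that checks whether a relation stored as a set of (key, value) pairs is a total function on a given domain. With a function representation this property holds by construction, so there is nothing to check.

```python
def is_function(relation, domain):
    """True iff the relation (a set of (key, value) pairs) gives every
    element of domain exactly one value and has no keys outside domain."""
    keys = [k for (k, _) in relation]
    # Pairs in a set are distinct, so a repeated key means two different values.
    return set(keys) == set(domain) and len(keys) == len(set(keys))


good = {(1, "waiting"), (2, "waiting")}
two_values = {(1, "waiting"), (1, "cs"), (2, "waiting")}
missing = {(1, "waiting")}

print(is_function(good, {1, 2}))        # True
print(is_function(two_values, {1, 2}))  # False: process 1 has two states
print(is_function(missing, {1, 2}))     # False: process 2 has no state
```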

Now, we’re going to define a helper for figuring out what the other process is, given some process:

OtherProcess(p) == IF p = 1 THEN 2 ELSE 1

Now, we can define some actions. What are some things our processes can do?

If a process is in the waiting state, then it can become interested. This means that it sets its flag to 1 and transitions to the checking state, so we have to update both of our variables:

BecomeInterested(p) == /\ pcs[p] = "waiting"
                       /\ pcs' = [pcs EXCEPT ![p] = "checking"]
                       /\ flags' = [flags EXCEPT ![p] = 1]

This EXCEPT form looks a bit weird at first, but it’s very useful. Essentially, it builds a new function that’s the same as pcs except that the mapping for p is changed to checking.
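A minimal Python analog of EXCEPT, assuming dicts stand in for TLA+ functions (`except_` and `become_interested` are hypothetical names). The key point is that EXCEPT builds a new function rather than mutating the old one, which is why pcs and pcs' can both exist at once. The action returns None when its guard is false, mirroring an action that is not enabled.

```python
def except_(f, key, value):
    """Analog of [f EXCEPT ![key] = value]: a new dict equal to f
    except at key. The original f is left untouched."""
    g = dict(f)
    g[key] = value
    return g


def become_interested(pcs, flags, p):
    """Analog of BecomeInterested(p): returns (pcs', flags'), or None
    if the guard pcs[p] = "waiting" is false."""
    if pcs[p] != "waiting":
        return None
    return except_(pcs, p, "checking"), except_(flags, p, 1)


pcs = {1: "waiting", 2: "waiting"}
flags = {1: 0, 2: 0}
pcs2, flags2 = become_interested(pcs, flags, 1)
print(pcs2, flags2)                        # {1: 'checking', 2: 'waiting'} {1: 1, 2: 0}
print(pcs)                                 # {1: 'waiting', 2: 'waiting'}
print(become_interested(pcs2, flags2, 1))  # None: process 1 is no longer waiting
```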

A process can also check the flags and notice that the other process’s flag is down (0), meaning the other process isn’t interested in the resource. Then, it can enter the critical section.

CheckFlagAndEnter(p) == /\ pcs[p] = "checking"
                        /\ flags[OtherProcess(p)] = 0
                        /\ pcs' = [pcs EXCEPT ![p] = "cs"]
                        /\ UNCHANGED flags

One thing to keep in mind is that TLA+ requires complete frame conditions. The model checker needs to know what the next state looks like, so if an action doesn’t say what the primed value of some variable should be, it reports an error. We can use UNCHANGED for this: UNCHANGED flags is equivalent to flags' = flags.

Now, a process in the critical section can leave the critical section.

FinishUp(p) == /\ pcs[p] = "cs"
               /\ flags' = [flags EXCEPT ![p] = 0]
               /\ pcs' = [pcs EXCEPT ![p] = "waiting"]
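The three actions can be sketched in Python the same way, assuming dicts for pcs and flags and hypothetical helper names. Each function returns the complete next state (both variables, even when one is unchanged), echoing TLA+’s requirement that every action determine all primed variables, or None when its guard is false.

```python
def except_(f, key, value):
    # [f EXCEPT ![key] = value] as a fresh dict
    g = dict(f)
    g[key] = value
    return g


def other(p):
    # OtherProcess(p) == IF p = 1 THEN 2 ELSE 1
    return 2 if p == 1 else 1


def become_interested(pcs, flags, p):
    if pcs[p] != "waiting":
        return None
    return except_(pcs, p, "checking"), except_(flags, p, 1)


def check_flag_and_enter(pcs, flags, p):
    if pcs[p] != "checking" or flags[other(p)] != 0:
        return None
    return except_(pcs, p, "cs"), flags   # UNCHANGED flags


def finish_up(pcs, flags, p):
    if pcs[p] != "cs":
        return None
    return except_(pcs, p, "waiting"), except_(flags, p, 0)


init = ({1: "waiting", 2: "waiting"}, {1: 0, 2: 0})

# Process 1 goes around the loop once while process 2 stays put.
s1 = become_interested(*init, 1)
s2 = check_flag_and_enter(*s1, 1)
s3 = finish_up(*s2, 1)
print(s2[0])        # {1: 'cs', 2: 'waiting'}
print(s3 == init)   # True
```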

Now that we have these 3 actions, how can we define Next? The whole system moves forward if any process takes some action. In Alloy, we would express this by saying there is some p : Process such that it takes an action. We do the same thing here:

Next == \E p \in {1,2} : BecomeInterested(p)
                         \/ CheckFlagAndEnter(p)
                         \/ FinishUp(p)

Like \in, \E is derived from the LaTeX notation for “there exists”. This says that there is some p in {1, 2} such that one of our transition actions holds for it.

In this model, only one process can take a step at each turn. Each action’s frame conditions say that nothing changes except the entries for that particular process p, so they implicitly prevent multiple processes from acting in the same Next transition.
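Here is a hypothetical Python version of Next, of the kind an explicit-state model checker might use: given a state, it lists every successor produced when some single process takes one enabled action. States are (pcs, flags) pairs of tuples so they can be hashed; `successors` and `update` are illustrative names, not TLC internals.

```python
PROCS = (1, 2)


def other(p):
    # OtherProcess(p) == IF p = 1 THEN 2 ELSE 1
    return 2 if p == 1 else 1


def update(t, p, v):
    # [t EXCEPT ![p] = v] for a tuple indexed by process id (1-based)
    return tuple(v if i == p else x for i, x in enumerate(t, start=1))


def successors(state):
    """Analog of Next: all states reachable when SOME process p takes
    exactly one of its enabled actions."""
    pcs, flags = state
    result = []
    for p in PROCS:                                   # \E p \in {1,2}
        pc = pcs[p - 1]
        if pc == "waiting":                           # BecomeInterested(p)
            result.append((update(pcs, p, "checking"), update(flags, p, 1)))
        elif pc == "checking" and flags[other(p) - 1] == 0:
            # CheckFlagAndEnter(p), with UNCHANGED flags
            result.append((update(pcs, p, "cs"), flags))
        elif pc == "cs":                              # FinishUp(p)
            result.append((update(pcs, p, "waiting"), update(flags, p, 0)))
    return result


init = (("waiting", "waiting"), (0, 0))
for s in successors(init):
    print(s)
# (('checking', 'waiting'), (1, 0))
# (('waiting', 'checking'), (0, 1))
```

Each branch updates only index p, so no successor moves both processes at once, which is exactly the interleaving behavior described above.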

Now, we have to define a Spec formula. We’ll get into what it means next week, but it’s kind of like a trace fact in Alloy.

vars == <<pcs, flags>>

Spec == /\ Init
        /\ [][Next]_vars
        /\ WF_vars(Next)

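Defining vars is a common element of good TLA+ style. We could write things like WF_<<pcs, flags>>(Next) directly, but a variables tuple saves us from repeating ourselves. We add weak fairness (WF) so that the system can’t stutter forever while Next stays enabled. Strictly speaking, WF_vars(Next) doesn’t guarantee that each individual process gets to act; that would take fairness on each process’s actions, such as \A p \in {1,2} : WF_vars(BecomeInterested(p) \/ CheckFlagAndEnter(p) \/ FinishUp(p)).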

What are some properties we can check? One important safety property is mutual exclusion - there should only ever be one process in the critical section at any given time.

MutualExclusion == [] (\A p \in {1,2} : pcs[p] = "cs" => pcs[OtherProcess(p)] /= "cs")

The [] means that we want this to hold globally (this will also be covered more next week). We say that for every process (\A p \in {1,2}), if that process is in the critical section (pcs[p] = "cs"), then the other process is not (/=) in the critical section.
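Checking an invariant such as MutualExclusion comes down to evaluating the predicate in every reachable state. The minimal Python sketch that follows does this with a breadth-first search over the state graph; it illustrates the idea under our own hypothetical state encoding and is not how TLC is actually implemented.

```python
from collections import deque

PROCS = (1, 2)


def other(p):
    return 2 if p == 1 else 1


def update(t, p, v):
    # [t EXCEPT ![p] = v] for a tuple indexed by process id (1-based)
    return tuple(v if i == p else x for i, x in enumerate(t, start=1))


def successors(state):
    # Next: some process p takes one enabled action
    pcs, flags = state
    result = []
    for p in PROCS:
        pc = pcs[p - 1]
        if pc == "waiting":
            result.append((update(pcs, p, "checking"), update(flags, p, 1)))
        elif pc == "checking" and flags[other(p) - 1] == 0:
            result.append((update(pcs, p, "cs"), flags))
        elif pc == "cs":
            result.append((update(pcs, p, "waiting"), update(flags, p, 0)))
    return result


def reachable(init):
    """Breadth-first search: the set of all states reachable from init."""
    seen = {init}
    queue = deque([init])
    while queue:
        state = queue.popleft()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def mutual_exclusion(state):
    # \A p \in {1,2} : pcs[p] = "cs" => pcs[OtherProcess(p)] /= "cs"
    pcs, _ = state
    return all(pcs[p - 1] != "cs" or pcs[other(p) - 1] != "cs" for p in PROCS)


states = reachable((("waiting", "waiting"), (0, 0)))
bad = [s for s in states if not mutual_exclusion(s)]
print(len(states), bad)   # 8 []
```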

We also have a non-starvation, or progress, property. This states that every process that wants to enter the critical section eventually will.

NonStarvation == \A p \in {1,2} : []<>(pcs[p] = "cs")

The []<> is “always eventually”, so this property holds if, for every process, at every point in time, there is some later point where the process is in the critical section. We’ll cover this sort of temporal logic next week. It’s evaluated over all possible traces: [] says that something holds in every state, and <> says that something holds in some future state.
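Counterexamples to properties like this are typically reported as lassos: a finite prefix followed by a cycle of states that repeats forever. A deadlocked behavior is the special case where the cycle is a single state repeated by stuttering. On a lasso, []<>P holds exactly when P is true somewhere in the cycle, since every suffix passes through the whole cycle again. A minimal Python sketch, with the hypothetical helpers `always_eventually` and `in_cs`:

```python
def always_eventually(prefix, cycle, pred):
    """[]<>pred on the infinite behavior prefix, cycle, cycle, cycle, ...
    Only the repeating cycle matters: pred must hold somewhere in it.
    The prefix is accepted just to describe the full behavior."""
    if not cycle:
        raise ValueError("the repeating cycle must be non-empty")
    return any(pred(state) for state in cycle)


def in_cs(p):
    # Predicate pcs[p] = "cs" on states of the form (pcs, flags)
    return lambda state: state[0][p - 1] == "cs"


init = (("waiting", "waiting"), (0, 0))

# Process 1 loops through the protocol forever; process 2 never moves.
p1_loop = [init,
           (("checking", "waiting"), (1, 0)),
           (("cs", "waiting"), (1, 0))]
print(always_eventually([], p1_loop, in_cs(1)))   # True
print(always_eventually([], p1_loop, in_cs(2)))   # False

# Deadlock: after two steps the behavior stutters in one state forever.
stuck = (("checking", "checking"), (1, 1))
prefix = [init, (("checking", "waiting"), (1, 0))]
print(always_eventually(prefix, [stuck], in_cs(1)))  # False
```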

There are some implicit bounds on this spec. All of our functions have finite domains and map into finite sets of values, so there are only finitely many states, and therefore only finitely many transitions between them. Having a finite space to explore is what makes this problem decidable.
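A quick count makes the bound concrete, assuming pcs ranges over the three program-counter values and flags over {0, 1} for each of the two processes. This is only the type-correct space; the states actually reachable from Init are a subset of it.

```python
from itertools import product

PCS_VALUES = ("waiting", "checking", "cs")
FLAG_VALUES = (0, 1)

# Every type-correct state: a pc for each of the 2 processes and a flag for each.
all_states = [
    ((pc1, pc2), (f1, f2))
    for pc1, pc2 in product(PCS_VALUES, repeat=2)
    for f1, f2 in product(FLAG_VALUES, repeat=2)
]
print(len(all_states))   # 36, i.e. 3^2 * 2^2
```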

To check for a deadlock, we use the TLC model checker in the TLA+ Toolbox. When we create a new model that checks Spec and run it, TLC reports a deadlock with a 3-state trace. In the first state, both flags are 0 and both processes are waiting. In the second, process 2 has moved to checking and set its flag to 1; which process goes first is nondeterministic, and in this trace process 2 happened to go first. Next, process 1 also moves to checking and sets its flag. Now both processes are in checking with their flags raised, and there’s no way to progress: each will wait forever for the other to lower its flag.
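To illustrate what the model checker is doing, here is a minimal Python sketch (hypothetical, not TLC’s algorithm or output format) that searches breadth-first from Init for a state with no successors and rebuilds the path to it from parent pointers. Because it always tries process 1 first, it finds the mirror image of the trace described above, with process 1 moving first.

```python
from collections import deque

PROCS = (1, 2)
INIT = (("waiting", "waiting"), (0, 0))


def other(p):
    return 2 if p == 1 else 1


def update(t, p, v):
    # [t EXCEPT ![p] = v] for a tuple indexed by process id (1-based)
    return tuple(v if i == p else x for i, x in enumerate(t, start=1))


def successors(state):
    # Next: some process p takes one enabled action
    pcs, flags = state
    result = []
    for p in PROCS:
        pc = pcs[p - 1]
        if pc == "waiting":
            result.append((update(pcs, p, "checking"), update(flags, p, 1)))
        elif pc == "checking" and flags[other(p) - 1] == 0:
            result.append((update(pcs, p, "cs"), flags))
        elif pc == "cs":
            result.append((update(pcs, p, "waiting"), update(flags, p, 0)))
    return result


def find_deadlock(init):
    """Return a shortest trace from init to a state with no successors,
    or None if every reachable state can take a step."""
    parent = {init: None}
    queue = deque([init])
    while queue:
        state = queue.popleft()
        nexts = successors(state)
        if not nexts:                      # deadlock: nothing is enabled
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for nxt in nexts:
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


for state in find_deadlock(INIT):
    print(state)
# (('waiting', 'waiting'), (0, 0))
# (('checking', 'waiting'), (1, 0))
# (('checking', 'checking'), (1, 1))
```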

On Monday, we’ll get into how to fix this!