CS1950Y Lecture 22: SMT
April 4, 2018


This class, we'll continue our discussion of so-called Satisfiability Modulo Theories solvers, or SMT solvers. In essence, these tools are like SAT solvers with a little extra interpretive power on top; they have theories of interpretation for symbols like < and ==.

So what does such a theory of interpretation look like? Imagine we have a binary relation == in Alloy, and we want == to behave like true equality. What constraints must we write to achieve this behavior?

For starters, we'll want our new equality to be reflexive: it should relate everything to itself. We can encode this (in somewhat sketchy notation) as

all x | x == x.

We also need our equality to be symmetric. We can encode this as

all x,y | x == y implies y == x.

Finally, equality needs to be transitive. We can encode this as

all x,y,z | x == y and y == z implies x == z.
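To make the three axioms concrete, here is a minimal Python sketch that checks them against a relation given as a set of pairs. The function names and the two sample relations (identity and lopsided) are our own illustrative choices, not part of Alloy or of any solver.

```python
def is_reflexive(universe, rel):
    # all x | x == x
    return all((x, x) in rel for x in universe)

def is_symmetric(rel):
    # all x,y | x == y implies y == x
    return all((y, x) in rel for (x, y) in rel)

def is_transitive(rel):
    # all x,y,z | x == y and y == z implies x == z
    return all((x, z) in rel
               for (x, y) in rel
               for (y2, z) in rel
               if y == y2)

universe = {0, 1, 2}
identity = {(x, x) for x in universe}   # true equality on the universe
lopsided = identity | {(0, 1)}          # relates 0 to 1 but not 1 to 0

print(is_reflexive(universe, identity),
      is_symmetric(identity),
      is_transitive(identity))
print(is_symmetric(lopsided))
```

The identity relation passes all three checks, while the lopsided relation fails symmetry.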

Are the three constraints above (just reflexivity, symmetry and transitivity) sufficient to capture the notion of equality? It turns out that if there are other relations or properties in our universe, they alone aren't enough. Consider a simple universe containing the nodes 0, 1, and 2. Let there be a relation edge which contains only one tuple, (0, 1). This situation is outlined pictorially below:

[Figure: nodes 0, 1, and 2, with a single edge from 0 to 1.]

We can construct a possible value of the == relation which conforms to our three axioms in this universe, but which seems counter to the idea of equality as we understand it:

[Figure: the same nodes and edge, with == relating each node to itself and also relating 1 and 2 to each other.]

The relation ==, as drawn above, is reflexive, symmetric and transitive, but it states that 1 == 2 and 2 == 1. This is a problem because 1 and 2 are distinct: 1 has an incoming edge that 2 does not have! To fix this problem, we need to modify our definition of equality to handle the edge relation.

To do so, we add:

all x1,x2,y1,y2 | x1 == x2 and y1 == y2 implies (edge(x1, y1) iff edge(x2, y2))

That is, if x1 and x2 are equal, and y1 and y2 are equal, we require that there be an edge from x1 to y1 if and only if there is an edge from x2 to y2. This new constraint prevents the incorrect interpretation above.
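As a sanity check, we can test this constraint by brute force on the three-node universe. The sketch below is plain Python rather than Alloy, and the names bad_eq, true_eq and respects_edge are made up for illustration. bad_eq is the problematic interpretation in which 1 and 2 are related.

```python
from itertools import product

universe = [0, 1, 2]
edge = {(0, 1)}

# Reflexive, symmetric and transitive, but relates the distinct nodes 1 and 2.
bad_eq = {(0, 0), (1, 1), (2, 2), (1, 2), (2, 1)}
# True equality: each node is related only to itself.
true_eq = {(x, x) for x in universe}

def respects_edge(eq, edge, universe):
    # all x1,x2,y1,y2 | x1 == x2 and y1 == y2
    #     implies (edge(x1, y1) iff edge(x2, y2))
    for x1, x2, y1, y2 in product(universe, repeat=4):
        if (x1, x2) in eq and (y1, y2) in eq:
            if ((x1, y1) in edge) != ((x2, y2) in edge):
                return False
    return True

print(respects_edge(bad_eq, edge, universe))   # False
print(respects_edge(true_eq, edge, universe))  # True
```

The violation for bad_eq comes from x1 = x2 = 0, y1 = 1, y2 = 2: there is an edge from 0 to 1 but none from 0 to 2, even though bad_eq claims 1 == 2.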

Ultimately, we'll need to add new formulas like the one above into our theory for every new relation or sig introduced into the universe.

Now, let's get back to SMT solvers. We can divide these solvers into two broad categories: eager and lazy.

An eager solver immediately converts a specification into a Boolean formula in CNF and feeds this formula to the underlying SAT solver. The Boolean formula that it generates contains all the information about how the solver interprets the symbols in the specification.

Given the specification x < 7, an eager solver would proceed like this. First, it would represent x in binary, say as an unsigned 8-bit integer. It would assign a Boolean variable to each bit of x, so for all n in [0, 7], vn is the value of the nth bit of x, taking the 0th bit to be the most significant one. If any of v0 through v4 is true, we know that x is at least 8 and therefore greater than 7, so our Boolean formula will include the negations of v0 through v4, each as its own clause. We also know that v5, v6 and v7 cannot be simultaneously true, because that would mean that x was exactly equal to 7. So we also need a clause of the form NOT v5 OR NOT v6 OR NOT v7. Our eager solver would pass this set of clauses down to its SAT solver, then convert the output back to the number x. Notably, since the formula passed to the SAT solver already encodes what x < 7 really means, the truth assignment it returns is already an answer (barring the conversion back from binary).
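The encoding of x < 7 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: x is an unsigned 8-bit integer, the clause representation is our own, and exhaustive enumeration of all 256 bit patterns stands in for a real SAT solver.

```python
from itertools import product

WIDTH = 8  # assumed width; bit 0 is the most significant bit

def clauses_for_x_less_than_7():
    # A clause is a list of (bit_index, polarity) literals;
    # polarity False means the bit appears negated.
    clauses = [[(n, False)] for n in range(5)]             # bits 0..4 must be 0
    clauses.append([(5, False), (6, False), (7, False)])   # bits 5..7 not all 1
    return clauses

def satisfies(bits, clauses):
    # CNF semantics: every clause needs at least one true literal.
    return all(any(bits[n] == pol for n, pol in clause) for clause in clauses)

def bits_to_int(bits):
    # Convert a most-significant-first tuple of booleans back to a number.
    value = 0
    for b in bits:
        value = value * 2 + int(b)
    return value

clauses = clauses_for_x_less_than_7()
# Stand-in for the SAT solver: try every assignment to the 8 bit variables.
models = sorted(bits_to_int(bits)
                for bits in product([False, True], repeat=WIDTH)
                if satisfies(bits, clauses))
print(models)  # [0, 1, 2, 3, 4, 5, 6]
```

Every satisfying assignment decodes directly to a value of x below 7, which is the sense in which the SAT answer is already the final answer.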

Rather than immediately converting high-level input to boolean logic and passing that down to a SAT solver, a lazy SMT solver uses its own tools for handling and interpreting the symbols in the specification outside of the SAT solver, both before invoking SAT, and when converting the SAT output back to a solution. Let's work through an example. Consider the formula below:

(y ≤ 5 OR x > y) AND (y ≥ 6 OR x > 5 OR x > 10)

Rather than decompose each of the inequalities in the formula above into sets of clauses like our eager solver did, our lazy solver just assigns a variable to each of the inequalities, yielding:

(v0 OR v1) AND (v2 OR v3 OR v4)

Using its own inequality-solving capabilities, our lazy solver recognizes that y ≤ 5, or v0, is actually the negation of y ≥ 6, or v2 (assuming, as in the eager example, that the variables are integers, so exactly one of the two holds). This means it can reduce the number of variables in the formula above by 1, turning it into:

(v0 OR v1) AND (NOT v0 OR v3 OR v4)

Then, our lazy solver passes this formula into its SAT solver. However, since we didn't actually encode the meaning of the inequalities into the SAT version of the problem, the SAT solver might spit out a truth assignment which represents a set of inequalities that is impossible to satisfy. For example, our SAT solver here could generate a solution where v3 is false but v4 is true. This would mean that x was greater than 10 but not greater than 5, and there exists no such x! In a case like this, our lazy solver rejects the truth assignment and asks the SAT solver for another one, repeating until it finds an assignment that is consistent and allows it to assign values to x and y. Unlike the eager solver, the lazy solver still has non-trivial work to do after receiving an answer from the SAT solver.
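The reject-and-retry loop can be sketched as follows. This is a toy illustration, not how a production SMT solver is built: enumeration of truth assignments stands in for the SAT solver, a brute-force search over integers in an assumed range of -20 to 20 stands in for the theory solver, and all function names are hypothetical.

```python
from itertools import product

# Each Boolean variable abbreviates one inequality over integers x and y.
ATOMS = {
    "v0": lambda x, y: y <= 5,
    "v1": lambda x, y: x > y,
    "v3": lambda x, y: x > 5,
    "v4": lambda x, y: x > 10,
}

def skeleton(a):
    # (v0 OR v1) AND (NOT v0 OR v3 OR v4)
    return (a["v0"] or a["v1"]) and ((not a["v0"]) or a["v3"] or a["v4"])

def boolean_models():
    # Stand-in for the SAT solver: yield satisfying assignments one at a time.
    names = sorted(ATOMS)
    for values in product([False, True], repeat=len(names)):
        a = dict(zip(names, values))
        if skeleton(a):
            yield a

def theory_witness(a, lo=-20, hi=20):
    # Stand-in for the theory solver: look for integers x, y (in an assumed
    # bounded range) that make every inequality match its assigned truth value.
    for x, y in product(range(lo, hi + 1), repeat=2):
        if all(ATOMS[name](x, y) == value for name, value in a.items()):
            return (x, y)
    return None

def lazy_solve():
    rejected = []
    for a in boolean_models():
        witness = theory_witness(a)
        if witness is not None:
            return a, witness, rejected
        rejected.append(a)  # consistent for SAT, impossible for the inequalities
    return None, None, rejected

assignment, witness, rejected = lazy_solve()
print(assignment, witness, len(rejected))
```

In this sketch the assignment with v3 false and v4 true is among those rejected before a consistent one is found, mirroring the example in the text.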