Guest Lecture: Kathi Fisler

Today we're talking about how we manipulate and reason about formulas. We have propositional formulas (and, or, not, implies; no quantifiers).

Exercise: How would you set up data structures and algorithms to represent and manipulate propositional formulas? Let's keep an actual formula in mind: (a ∧ ~c ∧ d) ∨ (~a ∧ b). Suggestions:

Make a parse tree with nodes for logical operators and literals.
You could also just make a string of the formula (this is a representation)
Make a truth table containing all possible truth values
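
To make the first suggestion concrete, here is a minimal sketch of a parse tree for the example formula. The class names (Var, Not, And, Or) and the evaluate helper are assumptions made for illustration; they were not given in the lecture.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Not:
    arg: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Or:
    left: "Formula"
    right: "Formula"


Formula = Union[Var, Not, And, Or]


def evaluate(f: Formula, env: dict) -> bool:
    """Walk the tree and compute the truth value under the assignment env."""
    if isinstance(f, Var):
        return env[f.name]
    if isinstance(f, Not):
        return not evaluate(f.arg, env)
    if isinstance(f, And):
        return evaluate(f.left, env) and evaluate(f.right, env)
    if isinstance(f, Or):
        return evaluate(f.left, env) or evaluate(f.right, env)
    raise TypeError(f"not a formula: {f!r}")


# (a ∧ ~c ∧ d) ∨ (~a ∧ b), with the three-way "and" nested as binary nodes
a, b, c, d = (Var(n) for n in "abcd")
EXAMPLE = Or(And(a, And(Not(c), d)), And(Not(a), b))
```

The tree makes syntactic manipulation easy (rewriting, substitution), but answering a question about models still means evaluating it under one assignment at a time.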

What are the tradeoffs of all of these representations? An important question that no one asked: what are we trying to do with the formulas? We need to answer that before we can compare representations.

What operations do we want on formulas?

Is it satisfiable?
Get a model of the formula
Get all models of the formula
Do two formulas have the same set of models?
Is a particular assignment of values a model for the formula?
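
As a rough sketch, each of these operations can be answered by brute force over the truth table. The function names below are made up for illustration, and a formula is assumed to be a plain Python function of the four variables a, b, c, d.

```python
from itertools import product

VARIABLES = ("a", "b", "c", "d")


def example(a, b, c, d):
    # (a ∧ ~c ∧ d) ∨ (~a ∧ b)
    return (a and not c and d) or (not a and b)


def truth_table(f):
    """Map every assignment (a tuple of booleans) to the formula's value."""
    rows = product((False, True), repeat=len(VARIABLES))
    return {bits: bool(f(*bits)) for bits in rows}


def is_model(f, bits):
    return bool(f(*bits))


def all_models(f):
    return [bits for bits, value in truth_table(f).items() if value]


def get_model(f):
    """Return one model, or None if the formula is unsatisfiable."""
    return next(iter(all_models(f)), None)


def is_satisfiable(f):
    return any(truth_table(f).values())


def same_models(f, g):
    return truth_table(f) == truth_table(g)
```

Every operation here is a simple lookup or scan, which is why the truth table looks attractive. The cost is that the table always has 2^n rows for n variables.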

Note that where Kathi used the term 'model', we might say 'instance' when talking about Alloy.

This seems to bias us towards the truth tables. The string is not very useful. What about the parse tree? It provides us with the ability to manipulate the formulas. The parse tree is a syntactic representation. The truth table is a semantic representation.

Under the hood, Alloy uses a semantic representation. What are your opinions of the truth table?

Not very space efficient. The size is exponential in the number of propositions (2^n rows for n propositions).

We want a space-efficient semantic representation. What are some alternatives? We'll use something called a binary decision diagram (BDD) on our original formula from above. Let's look at what happens if a is false, and what happens if it is true.

If a is false, then b must be true to satisfy the formula. If a is true, then we can ignore the right part of the formula and look at c. If c is true, then our formula is false. If c is false, then we have to check d. If d is true then the formula is true, otherwise false.

Here is the diagram for this decision tree:

Notice that this data structure takes a lot less space than the truth table, but encodes just as much information. A binary decision diagram captures the set of models of a formula. The paths through the tree that end at the 1 node are the models.
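
A small sketch of the hand-built diagram, following the case analysis on a, b, c and d described above. The encoding is an assumption for illustration: each internal node is a tuple (variable, child-if-false, child-if-true), and the terminals 0 and 1 are written as False and True.

```python
from itertools import product

# Internal nodes: (variable, low child, high child); terminals: False / True.
B = ("b", False, True)   # reached when a is false: the result is just b
D = ("d", False, True)   # reached when a is true and c is false
C = ("c", D, False)      # reached when a is true: c true means the formula is false
ROOT = ("a", B, C)


def evaluate(node, env):
    """Follow one path from the root to a terminal."""
    while not isinstance(node, bool):
        var, low, high = node
        node = high if env[var] else low
    return node


def paths_to_one(node, partial=()):
    """Yield each path that ends at the 1 terminal, as a partial assignment."""
    if node is True:
        yield dict(partial)
    elif node is not False:
        var, low, high = node
        yield from paths_to_one(low, partial + ((var, False),))
        yield from paths_to_one(high, partial + ((var, True),))
```

Each path is a partial assignment: variables that do not appear on it can take either value. The path a=false, b=true therefore stands for four models, and a=true, c=false, d=true stands for two.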

Question: Doesn't this capture the non-models as well in the paths that end at 0?
Answer: Yes it does.

Now, we made this diagram by hand, but there is an algorithm that can take a parse tree of a formula and produce a binary decision diagram. Each step that combines two BDDs takes time polynomial in their sizes, although in the worst case the resulting BDD can still be exponentially large in the number of variables.

We could also have drawn the truth table as a full decision tree, which again has exponential size. The big observation is that such a tree contains duplicate subtrees, and that some subtrees produce the same truth value regardless of the path taken through them. BDDs do a couple of things better:

Share isomorphic (identical) subtrees.
If both children of a node are the same subtree, eliminate the node.
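
The two rules above can be sketched with a "unique table" that hands out one id per distinct node. This is a minimal sketch, not the efficient construction used by real BDD packages: it expands the formula on every variable, so building is exponential even when the result is small. The class and method names are assumptions for illustration.

```python
class BDD:
    """Reduced ordered BDD; node ids 0 and 1 are the terminals."""

    def __init__(self, order):
        self.order = list(order)
        self.unique = {}   # (var, low, high) -> node id
        self.nodes = {}    # node id -> (var, low, high)

    def mk(self, var, low, high):
        if low == high:                 # rule 2: both children identical, skip the node
            return low
        key = (var, low, high)
        if key not in self.unique:      # rule 1: reuse an existing identical node
            node_id = len(self.unique) + 2
            self.unique[key] = node_id
            self.nodes[node_id] = key
        return self.unique[key]

    def build(self, f, env=None, i=0):
        """Expand f (a Python function with keyword arguments) variable by variable."""
        env = {} if env is None else env
        if i == len(self.order):
            return 1 if f(**env) else 0
        var = self.order[i]
        low = self.build(f, {**env, var: False}, i + 1)
        high = self.build(f, {**env, var: True}, i + 1)
        return self.mk(var, low, high)

    def evaluate(self, root, env):
        node = root
        while node not in (0, 1):
            var, low, high = self.nodes[node]
            node = high if env[var] else low
        return bool(node)

    def size(self, root):
        """Count the internal nodes reachable from root."""
        seen, stack = set(), [root]
        while stack:
            node = stack.pop()
            if node in (0, 1) or node in seen:
                continue
            seen.add(node)
            _, low, high = self.nodes[node]
            stack += [low, high]
        return len(seen)


def example(a, b, c, d):
    return (a and not c and d) or (not a and b)


bdd = BDD("abcd")
root = bdd.build(example)
```

With the order a, b, c, d this produces four internal nodes, one per variable, which matches the diagram built by hand from the case analysis.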

We started with a in the first BDD we drew out, but why did we start with a? What happens if we take the same formula, but use a different variable order (e.g. d, c, b, a)?

So the choice of variable order matters to the size of the BDD. Does the order matter for anything else? If we'd like to check whether two formulas have the same set of models, then it does. If we build reduced BDDs for two formulas using the same variable order, then the formulas have the same set of models if and only if their BDDs are equal. Unfortunately, finding an optimal variable order for a BDD is NP-hard, so no algorithm is known that does it efficiently in all cases.
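
Both points can be checked on the running example with a small sketch. The builder below is a hypothetical helper written for these notes (brute-force expansion, not an efficient algorithm); it returns the root id together with the table of internal nodes.

```python
def build_robdd(f, order):
    """Return (root, table) for f under the given variable order.

    table maps (var, low, high) -> node id; ids 0 and 1 are the terminals.
    """
    table = {}

    def mk(var, low, high):
        if low == high:                       # redundant test: drop the node
            return low
        return table.setdefault((var, low, high), len(table) + 2)

    def expand(i, env):
        if i == len(order):
            return 1 if f(**env) else 0
        var = order[i]
        low = expand(i + 1, {**env, var: False})
        high = expand(i + 1, {**env, var: True})
        return mk(var, low, high)

    return expand(0, {}), table


def f(a, b, c, d):
    return (a and not c and d) or (not a and b)


def g(a, b, c, d):
    # Same models as f, written differently
    return (not a and b) or (d and a and not c)


def h(a, b, c, d):
    # A genuinely different formula
    return (a and d) or (not a and b)


_, table_abcd = build_robdd(f, "abcd")
_, table_dcba = build_robdd(f, "dcba")
print(len(table_abcd), len(table_dcba))

# With one fixed order, equal formulas give identical diagrams.
print(build_robdd(f, "abcd") == build_robdd(g, "abcd"))
print(build_robdd(f, "abcd") == build_robdd(h, "abcd"))
```

For this formula the order a, b, c, d gives 4 internal nodes and d, c, b, a gives 6. The two comparisons print True and False: f and g differ syntactically but produce the same diagram, while h does not.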

Checking equivalence is a huge problem for circuit designers. They have struggled, and still struggle, to tell whether two circuit designs implement the same formula. BDDs give them a way to compare realistically sized formulas.

It is also important to note that BDDs are not a catch-all solution, which is why we use multiple semantic representations. Syntactic representations still have their place in certain contexts.

Let's look at one more problem where BDDs have a large impact on computer science. Say we have a graph and want to run depth-first search (DFS) on it.

Take this graph:

Label each node with two bits, x and y. A set of nodes in the graph is then just a propositional formula over x and y (e.g. 00 ∨ 01). An edge relates the current node (x, y) to the next node (x', y'). For example, the edge 00 → 01 can be represented as: ~x ∧ ~y ∧ ~x' ∧ y'

Takeaway: formulas are really powerful ways of representing other kinds of systems. If we have good data structures to represent these formulas, then solving the problems we have becomes computationally feasible.
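
Here is a minimal sketch of that encoding. Only the 00 → 01 edge comes from the example; the other two edges are hypothetical, added so there is something to search. Plain Python sets stand in for the BDDs a real tool would use, and the search expands a frontier of nodes until nothing new appears, which computes reachability rather than a literal depth-first traversal.

```python
from itertools import product

EDGES = [
    ((0, 0), (0, 1)),   # the 00 -> 01 edge from the example
    ((0, 1), (1, 1)),   # hypothetical edge
    ((1, 0), (0, 0)),   # hypothetical edge
]


def literal(name, bit):
    return name if bit else "~" + name


def edge_as_text(src, dst):
    """Write one edge as a conjunction over x, y (source) and x', y' (target)."""
    (x, y), (x2, y2) = src, dst
    parts = [literal("x", x), literal("y", y), literal("x'", x2), literal("y'", y2)]
    return " ∧ ".join(parts)


def transition(x, y, x2, y2):
    """The whole edge relation: the disjunction of all the edge formulas."""
    return any((x, y) == src and (x2, y2) == dst for src, dst in EDGES)


def image(states):
    """All nodes one step away from some node in states."""
    return {
        (x2, y2)
        for (x, y) in states
        for x2, y2 in product((0, 1), repeat=2)
        if transition(x, y, x2, y2)
    }


def reachable(start):
    seen = {start}
    frontier = {start}
    while frontier:
        frontier = image(frontier) - seen
        seen |= frontier
    return seen


print(edge_as_text((0, 0), (0, 1)))
print(sorted(reachable((0, 0))))
```

The point of the encoding is that image works on whole sets of nodes at once. When those sets are stored as BDDs, each step is a formula operation instead of a walk over individual nodes.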

This is a great start to learning about some of what's going on under the hood in Alloy.