SAT Solver

This lab will take you through implementing a basic DPLL SAT-solver in
the language of your choice (though we provide basic input/output code snippets in Python).

As a refresher--the motivation for a SAT solver is to determine if, for a given boolean formula, there exists an assignment of true and false to the variables such that the entire formula evaluates to true. Algorithms that solve the boolean satisfiability problem are employed by Alloy under-the-hood to determine the satisfiability of higher-level constraint models. In general, efficient SAT-solving is an extremely important and widely-used problem in computer science.

Some definitions from the class:

A literal is a positive or negative occurrence of a Boolean variable, like x1 (a positive literal) or !x2 (a negative literal).
A clause is a set of literals, joined together by "or" (disjunction of literals), like (x1 or !x3 or x4). We may call each literal here a disjunct.

DPLL expects as input a formula in conjunctive normal form (CNF)[^1], i.e., a set of clauses that are joined together with "and" (conjunction of clauses). This means that to satisfy the input clause set, every clause must be satisfied.
A unit clause is a clause with only one literal. Likewise, we may call each clause here a conjunct.
An empty clause is a clause with no literals. That is, it is a disjunction of empty set. This is the same as falsehood or a contradiction, denoted by □.
A literal is pure if, whenever it appears in the clause set, it always has the same sign.
A set of unit clauses is consistent if no two have different signs for the same variable.

[^1]: There is a way to convert an arbitrary formula to CNF. That's why dealing with CNF is enough.

Internal Representation

When writing a solver, it's convenient to refer to variables as positive integers. That is, you can call the first variable 1, the tenth variable 10, and so on. We'll use this convention here.

For this handout, +/-x denotes a literal (a variable with either positive or negative sign). -/+x is therefore the same variable with its sign flipped.

To begin this lab, first decide which language you'd like to use. Then, think about how to construct useful internal representations for literals (which are composed of an integer and a sign), clauses (a set of literals where order is unimportant), and formulas (an unordered a set of clauses).

Take a moment to discuss your internal representation with a TA or another student, then write the necessary code (class declaration, data definition, etc.) Construct a few example formulas to ensure that your representation works as expected.

Input Specification

Your program should take in a formula from standard input, where a formula is a list of clauses separated by newlines and each clause is a list of literals separated by spaces. Literals should be represented as non-zero numbers where a positive number denotes that the variable must be true and a negative number denotes it must be false. There is an implicit "or" between each literal and an implicit "and" between clauses.

For example:

-1 2
2 3

represents an input formula with two clauses. The first says that either variable 1 is false or variable 2 is true; the second says that either variable 2 or variable 3 is true.

You may assume that the input is well-formed and that your program will not be given any literal that wouldn't fit in a 32-bit signed integer.

Below, we've provided the scaffolding for one possible implementation of reading input in Python. Feel free to use it directly or implement your own.

#!/usr/bin/python3
import sys

def read_input():
	clause_strs = sys.stdin.read().split('\n')
	for s in clause_strs:
		literals = s.split()
		for lit in literals:
			sign = (lit[0] != '-')
			variable = lit.strip('-')
			<add to internal representation>
	return <internal representation>

Note: if you run your program, you have to manually press ctrl+d after entering data to signal the end of the input. There's no need to worry about this when running via the testing script (see below) because the script automatically signals the end of input.

Unit Propagation

The DPLL algorithm works by choosing an assignment of true or false for a variable, simplifying the formula based on that choice, then recursively checking the satisfiability of the simplified formula.

First, we'll implement unit propagation, which is one aspect of this formula simplification. If a clause is a unit clause (one with only one variable), then the only way for the formula to be satisfiable is for that variable to be assigned consistently with its sign in that clause. Thus, we know that variable must take on that assignment, which tells us that all other clauses with that literal are now satisfied. In addition, we know that clauses with the literal's negation will not be satisfied by that component, so we can remove the negation from all clauses.

The pseudocode for propagate-units is included below. First, write out a few test cases with your internal representation, and discuss them with a TA or another student. Then, implement and test this function.

propagate-units(F):
	for each unit clause {+/-x} in F
		remove all non-unit clauses containing +/-x
		remove all instances of -/+x in every clause // flipped sign!

Pure Literal Elimination

The second aspect of formula simplification is eliminating pure literals (ones that occur with only one polarity throughout the formula). If a literal is pure, then each clause it appears in can be satisfied by assigning the variable consistent to the sign it appears with. Thus, without changing the satisfiability of the formula, we can add the pure literal as a unit clause and remove all other clauses is occurs in from the formula.

The pseudocode for pure-elim is included below. Again, write out some test cases and discuss them with a TA or another student. Then, implement and test this function.

pure-elim(F):
	for each variable x
		if +/-x is pure in F
			remove all clauses containing +/-x
			add a unit clause {+/-x}

Recursive Backtracking

After the DPLL employs these two simplification steps, it must pick some variable to branch on. The satisfiability problem is then split into two sub-problems: whether the formula is satisfiable with the chosen variable assigned as either true or false. Essentially, the algorithm "guesses" a variable to be true, recursively determines if that subproblem is satisfiable; if it is not, the algorithm then "guesses" the variable to be false and tries again.

You're free to choose any reasonable method of picking variables to branch on. You might randomly select a variable, pick the variable that occurs the most in the formula, or even just take the alphabetically-next variable. By "reasonable" we mean a method that leads to correct results.

Note that if you do not include "involves all VARS" in the check below, DPLL will return a partial instance that potentially omits values for some variables that were removed in simplification steps. Since we're changing the formula recursively, we need to remember the original set of variables (VARS).

solve(VARS, F):
	propagate-units(F)
	pure-elim(F)
	if F contains the empty clause, return the empty clause // call this "unsat" in output
	if F is a consistent set of unit clauses that involves all VARS, return F
	x := pick-a-variable(F) // do anything reasonable here
	if solve(VARS, F + {x}) isn't the empty clause, return solve(F + {x}) // works to have +x
	else return solve(VARS, F + {-x}) // check -x

OUTPUT specification:

If a formula is unsatisfiable, your program should output the string "unsat", otherwise it should output a solution in the form

list of true literals
list of false literals

where each literal is separated by a space and there is a newline between lists.

For example:

2 3
-1

gives an instance for the previous example input. Variables 2 and 3 are true, and variable 1 is false.

Note that an empty set of clauses is equivalent to "true". Since this input involves no variables, your program should produce an empty set of unit clauses (not "unsat!"). An empty clause is equivalent to "false" and should result in "unsat".

Unlike in assignments, you may print to standard error (to make debugging with the oracle easier).

Below, we're provided the scaffolding for one possible implementation of printing output in python. Feel free to use it directly or implement your own.

def format_output(F):
	true_l = []
	false_l = []
	literals = <dictionary of variable integer to sign boolean>

	for var, sign in literals.items():
		if sign:
			true_l.append(var)
		else:
			false_l.append('-'+var)
	print(' '.join(true_l))
	print(' '.join(false_l))

Please include a new line at the end of your output (note that print automatically does this, so if you follow the template, everything should be okay).

Checkoff

When you're ready to test your solver, you can run cs1950y_sat <path-to-your-sat-executable>. This script will provide feedback as to whether your program is correct or not.

If you need to, you can refer to the instructions in the first Oracle assignment (topsort) for how to make your file executable. You can run this check script as often as you like. The full run should take less than 15 seconds (but if your program is incorrect, the grader will take only a few seconds). When you pass all the tests, you can call a TA to check you off for the lab.