Handling errors

Right now, here’s how we’re defining correctness for our compiler:

For all programs p, if the interpreter produces a value when run on p, the compiler produces machine code that produces that same value.

But the interpreter doesn’t produce a value for every program! On (add1 false), for instance, the interpreter throws an exception.

For these programs, we’re currently making no claims about our compiler’s behavior. Maybe it will return an error of some kind: on (add1 false), for instance, we get an error from the runtime because it doesn’t know how to print the resulting value. On totally invalid programs like (hello hello), our compiler will raise the same error as our interpreter, since we don’t know how to compile programs like that.

But on some of these programs, our compiler will actually produce a value (or really, produce a machine-code program that produces a value). (add1 (sub1 false)), for instance, produces false in the compiler even though the interpreter doesn’t recognize it as a valid program.
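To see why (add1 (sub1 false)) comes back as false, it helps to model the unchecked arithmetic directly. The sketch below assumes a tagged encoding that this section does not spell out (numbers shifted left by two bits, false encoded as 0b00011111); the names encode_num, unchecked_add1, and unchecked_sub1 are hypothetical and exist only for illustration.

```ocaml
(* Assumed encoding, not stated in this section: numbers are shifted left by
   two bits (tag 0b00), and false is the bit pattern 0b00011111. *)
let num_shift = 2
let false_bits = 0b00011111

(* Encode an OCaml int as a tagged number. *)
let encode_num (n : int) : int = n lsl num_shift

(* Unchecked add1 and sub1: plain integer arithmetic on the tagged word, with
   no look at the tag bits. *)
let unchecked_add1 (v : int) : int = v + encode_num 1
let unchecked_sub1 (v : int) : int = v - encode_num 1

let () =
  let result = unchecked_add1 (unchecked_sub1 false_bits) in
  (* The two adjustments cancel, so the bits for false come back unchanged. *)
  assert (result = false_bits);
  (* A single add1 leaves a different bit pattern behind. *)
  assert (unchecked_add1 false_bits <> false_bits)
```

Under these assumptions, the subtraction and the addition cancel exactly, so the runtime is handed a perfectly printable false even though no step of the computation was valid.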

Today, we’ll fix this issue, modifying our compiler to handle these errors.

Modifying the runtime

First, we’ll add an error-handling function to the runtime. We’ll call this function from our compiled programs when an error occurs.

Listing 1: runtime.c

void error() {
  printf("ERROR");
  exit(1);
}

As usual, we’ll need to recompile the runtime:

gcc -c runtime.c -o runtime.o

Modifying the compiler

First, we’ll need to modify our compiler’s output so that we can call our new error function:

Listing 2: compile.ml

let compile (program : s_exp) : string =
  [Global "entry"; Extern "error"; Label "entry"]
  @ compile_exp Symtab.empty (-8) program
  @ [Ret]
  |> List.map string_of_directive
  |> String.concat "\n"

That Extern "error" directive is sort of the inverse of Global: it tells the assembler that our program will be linked against an object file (here, our runtime) that includes a definition for the error label.
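As a rough illustration of what these three directives might turn into, here is a minimal sketch of a printer for them. It assumes NASM-style syntax and defines its own cut-down directive type; the real string_of_directive in the compiler may differ (for instance, in how it names labels on different platforms).

```ocaml
(* A cut-down directive type, just enough for the program prologue. *)
type directive = Global of string | Extern of string | Label of string | Ret

(* Assumed NASM-style rendering of each directive. *)
let string_of_directive (d : directive) : string =
  match d with
  | Global l -> "global " ^ l
  | Extern l -> "extern " ^ l
  | Label l -> l ^ ":"
  | Ret -> "\tret"

(* The first three lines of every compiled program under these assumptions. *)
let prologue : string =
  [Global "entry"; Extern "error"; Label "entry"]
  |> List.map string_of_directive
  |> String.concat "\n"

let () = print_endline prologue
```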

We’ll jump to this label whenever we want to signal an error at runtime. For instance, add1 should raise an error if its argument isn’t a number:

Listing 3: compile.ml

let rec compile_exp (tab : int symtab) (stack_index : int) (exp : s_exp) :
    directive list =
  match exp with
  (* some cases elided ... *)
  | Lst [Sym "add1"; arg] ->
      compile_exp tab stack_index arg
      (* check the tag bits of the result (in rax) using a copy in r8 *)
      @ [ Mov (Reg R8, Reg Rax)
        ; And (Reg R8, Imm num_mask)
        ; Cmp (Reg R8, Imm num_tag)
        ; Jnz "error" ]
      @ [Add (Reg Rax, operand_of_num 1)]

We raise an error by jumping to our error function: the Cmp sets the processor’s flags, and Jnz takes the jump only when the masked tag bits differ from num_tag. In general, calling C functions will be more complex than this, since we want to preserve our heap pointer and the values on our stack, but since the error function stops execution we don’t need to worry about any of that.

We can extract these directives into a helper function:

Listing 4: compile.ml

let ensure_num (op : operand) : directive list =
  [ Mov (Reg R8, op)
  ; And (Reg R8, Imm num_mask)
  ; Cmp (Reg R8, Imm num_tag)
  ; Jnz "error" ]

(ensure_num overwrites r8, so we should only call it when r8 doesn’t hold a value we still need!)
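The check that ensure_num emits can be modeled in plain OCaml, which makes it easier to see what the generated instructions decide at runtime. This is a sketch under assumed constants (a two-bit number tag of 0b00, false encoded as 0b00011111); is_num and checked_add1 are hypothetical names, and the Error result stands in for the jump to error.

```ocaml
(* Assumed constants: the low two bits of a word hold the number tag. *)
let num_mask = 0b11
let num_tag = 0b00

(* Mirrors Mov/And/Cmp: mask off everything but the tag bits and compare the
   result against the number tag. The value itself is left untouched, which
   is why the generated code works on a copy in r8. *)
let is_num (v : int) : bool = v land num_mask = num_tag

(* add1 with the check in front of the addition. *)
let checked_add1 (v : int) : (int, string) result =
  if is_num v then Ok (v + (1 lsl 2)) else Error "ERROR"

let () =
  (* 5 is encoded as 5 lsl 2; adding one gives the encoding of 6. *)
  assert (checked_add1 (5 lsl 2) = Ok (6 lsl 2));
  (* The assumed bit pattern for false fails the tag check. *)
  assert (checked_add1 0b00011111 = Error "ERROR")
```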

We can use this to add error handling to functions that should take numbers:

Listing 5: compile.ml

let rec compile_exp (tab : int symtab) (stack_index : int) (exp : s_exp) :
    directive list =
  match exp with
  (* some cases elided ... *)
  | Lst [Sym "add1"; arg] ->
      compile_exp tab stack_index arg
      @ ensure_num (Reg Rax)
      @ [Add (Reg Rax, operand_of_num 1)]
  | Lst [Sym "+"; e1; e2] ->
      compile_exp tab stack_index e1
      @ ensure_num (Reg Rax)
      @ [Mov (stack_address stack_index, Reg Rax)]
      @ compile_exp tab (stack_index - 8) e2
      @ ensure_num (Reg Rax)
      @ [Mov (Reg R8, stack_address stack_index)]
      @ [Add (Reg Rax, Reg R8)]

and so on. We can write a similar function for pairs:

Listing 6: compile.ml

let ensure_pair (op : operand) : directive list =
  [ Mov (Reg R8, op)
  ; And (Reg R8, Imm heap_mask)
  ; Cmp (Reg R8, Imm pair_tag)
  ; Jnz "error" ]
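Since ensure_num and ensure_pair differ only in their mask and tag, one option is to factor out the shared shape. The sketch below is not part of the compiler as presented: ensure_type is a hypothetical helper, the directive types are minimal stand-ins, and the tag values are assumptions chosen for the example.

```ocaml
(* Minimal stand-ins for the compiler's assembly types. *)
type register = Rax | R8
type operand = Reg of register | Imm of int

type directive =
  | Mov of operand * operand
  | And of operand * operand
  | Cmp of operand * operand
  | Jnz of string

(* Hypothetical helper: jump to error unless the bits selected by mask are
   equal to tag. Like ensure_num, it overwrites r8. *)
let ensure_type (mask : int) (tag : int) (op : operand) : directive list =
  [ Mov (Reg R8, op)
  ; And (Reg R8, Imm mask)
  ; Cmp (Reg R8, Imm tag)
  ; Jnz "error" ]

(* Assumed tag values; the real ones are defined elsewhere in compile.ml. *)
let num_mask = 0b11
let num_tag = 0b00
let heap_mask = 0b111
let pair_tag = 0b010

(* Both checks become partial applications of the shared helper. *)
let ensure_num = ensure_type num_mask num_tag
let ensure_pair = ensure_type heap_mask pair_tag
```

Whether this is worth doing depends on how many tagged types the language ends up with; for two checks, the duplicated version is arguably just as readable.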

Compiler correctness revisited

We can now make a stronger statement about compiler correctness:

For all programs p, if the interpreter produces a value when run on p, the compiler produces machine code that produces that same value. If the interpreter produces an error, the compiler will either produce an error or produce a program that produces an error.

We can add support for erroring programs to our tester:

Listing 7: interp.ml

let interp_err (program : string) : string =
  try interp program with BadExpression _ -> "ERROR"

Listing 8: compile.ml

let compile_and_run_err (program : string) : string =
  try compile_and_run program with BadExpression _ -> "ERROR"

let difftest (examples : string list) =
  let results =
    List.map (fun ex -> (compile_and_run_err ex, Interp.interp_err ex)) examples
  in
  List.for_all (fun (r1, r2) -> r1 = r2) results
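When difftest returns false, it can be useful to know which examples disagreed. The following is a sketch of one way to report that, not part of the tester above: difftest_failures is a hypothetical function that takes the two runners as parameters so it stands alone, and fake_compiled and fake_interp are toy stand-ins that imitate the unchecked compiler from the start of these notes.

```ocaml
(* Returns each example on which the two runners disagree, paired with both
   outputs. In the real tester the runners would be compile_and_run_err and
   Interp.interp_err. *)
let difftest_failures (run_compiled : string -> string)
    (run_interp : string -> string) (examples : string list) :
    (string * string * string) list =
  examples
  |> List.map (fun ex -> (ex, run_compiled ex, run_interp ex))
  |> List.filter (fun (_, r1, r2) -> r1 <> r2)

(* Toy stand-in for a compiler without tag checks. *)
let fake_compiled (p : string) : string =
  match p with
  | "(add1 false)" -> "ERROR"
  | "(add1 (sub1 false))" -> "false"
  | _ -> "1"

(* Toy stand-in for the interpreter, which rejects both programs. *)
let fake_interp (p : string) : string =
  match p with
  | "(add1 false)" | "(add1 (sub1 false))" -> "ERROR"
  | _ -> "1"

let failures =
  difftest_failures fake_compiled fake_interp
    ["1"; "(add1 false)"; "(add1 (sub1 false))"]

let () =
  List.iter
    (fun (ex, r1, r2) ->
      Printf.printf "%s: compiler gave %s, interpreter gave %s\n" ex r1 r2)
    failures
```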

We have one lingering problem: there are some programs that produce an error in our compiler but not in our interpreter. An example is (if true 1 (hello hello)). Since the interpreter never evaluates (hello hello), it happily produces the value 1. The compiler, however, will throw an error at compile time. We could fix this by adding a check to the interpreter to ensure that the programs it’s trying to interpret are well-formed (i.e., don’t contain expressions like (hello hello)) even if they aren’t type-correct.
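A well-formedness check of this kind could look something like the sketch below. It is only an illustration: well_formed is a hypothetical function, the s_exp type is a local copy (the Num constructor is an assumption), and the set of forms it accepts is a small assumed subset of the language.

```ocaml
(* Local copy of an s-expression type so the sketch stands alone. *)
type s_exp = Num of int | Sym of string | Lst of s_exp list

(* Checks only the shape of a program, never the types of its values, and
   walks into every branch whether or not it would be evaluated. *)
let rec well_formed (exp : s_exp) : bool =
  match exp with
  | Num _ | Sym "true" | Sym "false" -> true
  | Lst [Sym ("add1" | "sub1"); arg] -> well_formed arg
  | Lst [Sym "+"; e1; e2] -> well_formed e1 && well_formed e2
  | Lst [Sym "if"; test; thn; els] ->
      well_formed test && well_formed thn && well_formed els
  | _ -> false

let () =
  (* (if true 1 (hello hello)) is rejected even though its else branch
     would never run. *)
  assert (
    not
      (well_formed
         (Lst [Sym "if"; Sym "true"; Num 1; Lst [Sym "hello"; Sym "hello"]])));
  (* (add1 false) has a valid shape; its error is a runtime type error. *)
  assert (well_formed (Lst [Sym "add1"; Sym "false"]))
```

The interpreter would run such a check before evaluating anything and raise its usual exception when the check fails, so that both implementations reject the same set of malformed programs.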