More booleans, conditionals
Boolean support in the runtime
While we’re editing the runtime, let’s also add support for booleans.
```c
#include <stdio.h>
#include <inttypes.h>

/* 0b literals are a GCC extension (standard since C23) */
#define num_shift 2
#define num_mask 0b11
#define num_tag 0b00

#define bool_shift 7
#define bool_mask 0b1111111
#define bool_tag 0b0011111

extern uint64_t entry();

void print_value(uint64_t value) {
  if ((value & num_mask) == num_tag) {
    /* numbers: low two bits are 00; gcc's >> on int64_t keeps the sign */
    int64_t ivalue = (int64_t)value;
    printf("%" PRIi64, ivalue >> num_shift);
  } else if ((value & bool_mask) == bool_tag) {
    /* booleans: low seven bits are 0011111; bit 7 holds the value */
    if (value >> bool_shift) {
      printf("true");
    } else {
      printf("false");
    }
  } else {
    printf("BAD VALUE %" PRIu64, value);
  }
}

int main(int argc, char **argv) {
  print_value(entry());
  return 0;
}
```
We’ll need to recompile the runtime:
```shell
$ gcc -c runtime.c -o runtime.o
```
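To see how the runtime’s tag checks divide up the space of values, here is a small OCaml mirror of print_value. It is a hypothetical helper for experimentation (string_of_value is not part of the course code), and it uses OCaml’s 63-bit int in place of uint64_t, which is fine for the small values shown here.

```ocaml
(* Same constants as runtime.c *)
let num_shift = 2
let num_mask = 0b11
let num_tag = 0b00
let bool_shift = 7
let bool_mask = 0b1111111
let bool_tag = 0b0011111

(* Hypothetical mirror of print_value: decode a runtime value to a string *)
let string_of_value (v : int) : string =
  if v land num_mask = num_tag then
    (* asr is an arithmetic shift, matching >> on int64_t in the runtime *)
    string_of_int (v asr num_shift)
  else if v land bool_mask = bool_tag then
    (* lsr is a logical shift, matching >> on uint64_t *)
    if v lsr bool_shift <> 0 then "true" else "false"
  else "BAD VALUE " ^ string_of_int v

let () =
  List.iter
    (fun v -> print_endline (string_of_value v))
    [20; -4; 159; 31; 1]
```

With these inputs the program prints 5, -1, true, false, and BAD VALUE 1, one per line.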
Boolean support in the compiler
We can now add support for true and false pretty easily:
```ocaml
let bool_shift = 7
let bool_mask = 0b1111111
let bool_tag = 0b0011111

let rec compile_exp (exp : s_exp) : directive list =
  match exp with
  (* some cases elided ... *)
  | Sym "true" ->
      [Mov (Reg Rax, Imm ((1 lsl bool_shift) lor bool_tag))]
  | Sym "false" ->
      [Mov (Reg Rax, Imm ((0 lsl bool_shift) lor bool_tag))]
```
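As a quick check on the arithmetic in those two cases, the sketch below recomputes both immediates. It confirms that neither one can be mistaken for a number, because their low two bits are 0b11 rather than num_tag. The names true_repr, false_repr, is_num, and is_bool are made up for this example.

```ocaml
let num_mask = 0b11
let num_tag = 0b00
let bool_shift = 7
let bool_mask = 0b1111111
let bool_tag = 0b0011111

(* The immediates the compiler emits for true and false *)
let true_repr = (1 lsl bool_shift) lor bool_tag   (* 128 lor 31 = 159 *)
let false_repr = (0 lsl bool_shift) lor bool_tag  (* 0 lor 31 = 31 *)

let is_num v = v land num_mask = num_tag
let is_bool v = v land bool_mask = bool_tag

let () =
  Printf.printf "true = %d, false = %d\n" true_repr false_repr;
  (* Booleans never look like numbers: their low two bits are 0b11 *)
  assert (not (is_num true_repr) && not (is_num false_repr));
  assert (is_bool true_repr && is_bool false_repr);
  (* The two representations differ only in bit 7 *)
  assert (true_repr lxor false_repr = 1 lsl bool_shift)
```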
Handling our other operations will be a little trickier. Let’s start with not. As a reminder, not should evaluate to true (i.e., should put the runtime representation of true into rax!) when its argument is false; otherwise, it should evaluate to false.
It seems like we need a way to compare the runtime representations of values. For this, we’ll use the x86-64 instruction cmp. cmp X, Y compares X to Y: it subtracts Y from X, discards the difference, and sets processor flags based on the result. There are a bunch of flags, and we’ll talk about more of them later in the class; for now, we just need to know that cmp sets the flag ZF (the “zero flag”) to 1 if its arguments are equal and to 0 otherwise.
Flags aren’t like registers: we don’t access them directly in assembly code.[1] Instead, they modify the behavior of subsequent instructions. We’ll see more examples of this below, when we talk about conditionals. For now, we’re going to use another instruction, setz, in order to access ZF. setz takes a register[2] and sets the lowest byte of that register to 1 (i.e., 0b00000001) if ZF is set and to 0 if ZF is not set. The rest of the register is left unchanged.
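Because setz writes only the lowest byte, whatever was in the upper bits of the register is still there afterward. The sketch below models this with plain OCaml ints. It is a simplification: a register is an int, ZF is a bool, and the setz function is hypothetical. The model shows why code that uses setz usually zeroes the register first.

```ocaml
(* Model of setz on a register value: only the lowest byte changes.
   Simplification: registers are OCaml ints, ZF is a bool. *)
let setz (reg : int) (zf : bool) : int =
  let upper = reg land lnot 0xff in   (* bits 8 and up are preserved *)
  upper lor (if zf then 1 else 0)

let () =
  (* Leftover bits in the register survive setz... *)
  Printf.printf "%d\n" (setz 0x1234 true);  (* 0x1201 = 4609 *)
  (* ...so we zero the register before using setz *)
  Printf.printf "%d\n" (setz 0 true)        (* 1 *)
```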
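Here’s how we’re going to implement the not operator, in pseudo-assembly:

```
; not (the argument is already in rax)
cmp rax, 0b00011111   ; compare rax to the runtime representation of false
mov rax, 0            ; zero out rax; mov does not change the flags
setz rax              ; (really setz al) lowest byte = 1 if rax held false
shl rax, 7            ; move that bit into position (bool_shift)
or rax, 0b0011111     ; tag the result as a boolean (bool_tag)
```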
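To check that those five instructions really compute not, we can replay them step by step on integers. This is a hand-written model, not something the compiler does. cmp is reduced to computing ZF, and run_not is a name made up for this example.

```ocaml
let bool_shift = 7
let bool_tag = 0b0011111
let true_repr = (1 lsl bool_shift) lor bool_tag
let false_repr = (0 lsl bool_shift) lor bool_tag

(* Replay the pseudo-assembly for not on a runtime value held in rax *)
let run_not (rax : int) : int =
  let zf = (rax - false_repr = 0) in          (* cmp rax, 0b00011111 *)
  let rax = 0 in                              (* mov rax, 0 (flags untouched) *)
  let rax = rax lor (if zf then 1 else 0) in  (* setz: low byte := ZF *)
  let rax = rax lsl bool_shift in             (* shl rax, 7 *)
  rax lor bool_tag                            (* or rax, 0b0011111 *)

let () =
  assert (run_not false_repr = true_repr);
  assert (run_not true_repr = false_repr);
  (* Every non-false value is truthy, so not 5 (encoded as 20) is false *)
  assert (run_not 20 = false_repr);
  print_endline "not behaves as expected"
```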
So now we can implement not:

```ocaml
let bool_shift = 7
let bool_mask = 0b1111111
let bool_tag = 0b0011111

let rec compile_exp (exp : s_exp) : directive list =
  match exp with
  (* some cases elided ... *)
  | Sym "true" ->
      [Mov (Reg Rax, Imm ((1 lsl bool_shift) lor bool_tag))]
  | Sym "false" ->
      [Mov (Reg Rax, Imm ((0 lsl bool_shift) lor bool_tag))]
  | Lst [Sym "not"; arg] ->
      compile_exp arg
      @ [ Cmp (Reg Rax, Imm ((0 lsl bool_shift) lor bool_tag)) (* compare rax to false *)
        ; Mov (Reg Rax, Imm 0) (* zero out rax *)
        ; Setz (Reg Rax) (* 1 if ZF is set (meaning rax contained false), 0 otherwise *)
        ; Shl (Reg Rax, Imm bool_shift) (* rax << bool_shift *)
        ; Or (Reg Rax, Imm bool_tag) (* tag rax as a boolean: rax = rax | bool_tag *)
        ]
```
There’s some duplicate logic here. We’re going to make a helper function called operand_of_bool, which makes an instruction operand from a boolean using shift and or:

```ocaml
let operand_of_bool (b : bool) : operand =
  Imm (((if b then 1 else 0) lsl bool_shift) lor bool_tag)
```
We can do the same thing for numbers:
```ocaml
let operand_of_num (x : int) : operand =
  Imm ((x lsl num_shift) lor num_tag)
```
(We include lor num_tag here to be symmetric with operand_of_bool, but everything would work if we left it off. Why?)
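Here is how the two helpers behave on a few inputs. The assembly library’s operand type isn’t shown in this section, so the sketch defines a hypothetical one-constructor stand-in; the real type also has constructors such as Reg. Negative numbers encode as expected too: (-1) lsl 2 is -4.

```ocaml
(* Hypothetical stand-in for the assembly library's operand type *)
type operand = Imm of int

let num_shift = 2
let num_tag = 0b00
let bool_shift = 7
let bool_tag = 0b0011111

let operand_of_bool (b : bool) : operand =
  Imm (((if b then 1 else 0) lsl bool_shift) lor bool_tag)

let operand_of_num (x : int) : operand =
  Imm ((x lsl num_shift) lor num_tag)

let () =
  let show (Imm n) = string_of_int n in
  List.iter
    (fun x -> Printf.printf "num %d -> %s\n" x (show (operand_of_num x)))
    [0; 5; -1];
  List.iter
    (fun b -> Printf.printf "bool %b -> %s\n" b (show (operand_of_bool b)))
    [true; false]
```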
Finally, we’ll factor out the code that converts ZF to a boolean so we can reuse it:

```ocaml
let zf_to_bool : directive list =
  [ Mov (Reg Rax, Imm 0) (* zero out rax *)
  ; Setz (Reg Rax) (* 1 if ZF is set, 0 otherwise *)
  ; Shl (Reg Rax, Imm bool_shift) (* rax << bool_shift *)
  ; Or (Reg Rax, Imm bool_tag) (* tag rax as a boolean: rax = rax | bool_tag *)
  ]
```
zf_to_bool is a list, not a function. How is that possible? Won’t it depend on the value we’re trying to convert? It does not! This is a list of instructions that set rax to the runtime representation of true if ZF is set, and to the runtime representation of false otherwise. The value being tested only matters through ZF, and ZF has already been set by whatever cmp runs just before these instructions.
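One way to convince yourself that a fixed list is enough is to run it. The sketch below uses a hypothetical subset of the directive type (the real assembly library has more constructors) and a toy machine with only rax and ZF. It runs the same zf_to_bool list after a comparison against false, as not does.

```ocaml
(* Hypothetical, minimal stand-ins for the assembly library's types *)
type reg = Rax
type operand = Reg of reg | Imm of int
type directive =
  | Mov of operand * operand
  | Cmp of operand * operand
  | Setz of operand
  | Shl of operand * operand
  | Or of operand * operand

let bool_shift = 7
let bool_tag = 0b0011111

let zf_to_bool : directive list =
  [ Mov (Reg Rax, Imm 0)
  ; Setz (Reg Rax)
  ; Shl (Reg Rax, Imm bool_shift)
  ; Or (Reg Rax, Imm bool_tag) ]

(* Toy machine state: one register and one flag *)
type state = { rax : int; zf : bool }

let value st = function Reg Rax -> st.rax | Imm n -> n

let step st = function
  | Mov (Reg Rax, src) -> { st with rax = value st src }
  | Cmp (a, b) -> { st with zf = (value st a - value st b = 0) }
  | Setz (Reg Rax) ->
      { st with rax = (st.rax land lnot 0xff) lor (if st.zf then 1 else 0) }
  | Shl (Reg Rax, n) -> { st with rax = st.rax lsl value st n }
  | Or (Reg Rax, src) -> { st with rax = st.rax lor value st src }
  | _ -> failwith "unsupported in this toy model"

(* Run a program starting with the given value in rax; return final rax *)
let run (rax : int) (prog : directive list) : int =
  (List.fold_left step { rax; zf = false } prog).rax

(* not: compare against the representation of false, then zf_to_bool *)
let not_check = Cmp (Reg Rax, Imm ((0 lsl bool_shift) lor bool_tag)) :: zf_to_bool

let () =
  Printf.printf "%d %d\n" (run 31 not_check) (run 159 not_check)  (* 159 31 *)
```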
Now we can implement zero? easily:

```ocaml
let rec compile_exp (exp : s_exp) : directive list =
  match exp with
  (* some cases elided ... *)
  | Sym "true" -> [Mov (Reg Rax, operand_of_bool true)]
  | Sym "false" -> [Mov (Reg Rax, operand_of_bool false)]
  | Lst [Sym "not"; arg] ->
      compile_exp arg @ [Cmp (Reg Rax, operand_of_bool false)] @ zf_to_bool
  | Lst [Sym "zero?"; arg] ->
      compile_exp arg @ [Cmp (Reg Rax, operand_of_num 0)] @ zf_to_bool
```
Lastly, we can implement num?. We can detect whether a value is a number by looking at its lowest two bits and checking whether they are both zero. We can do that like this:

```ocaml
let rec compile_exp (exp : s_exp) : directive list =
  match exp with
  (* some cases elided ... *)
  | Lst [Sym "num?"; arg] ->
      compile_exp arg
      @ [ And (Reg Rax, Imm num_mask) (* keep only the two tag bits *)
        ; Cmp (Reg Rax, Imm num_tag) (* ZF = 1 iff they equal num_tag *)
        ]
      @ zf_to_bool
```
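For testing compiled programs, it helps to write down what zero? and num? should produce for each encoded input. The sketch below is a hypothetical reference model over encoded integers; zero_p, num_p, encode_num, and encode_bool are made-up names.

```ocaml
let num_shift = 2
let num_mask = 0b11
let num_tag = 0b00
let bool_shift = 7
let bool_tag = 0b0011111

let encode_num x = (x lsl num_shift) lor num_tag
let encode_bool b = ((if b then 1 else 0) lsl bool_shift) lor bool_tag

(* What compiled zero? computes: cmp rax, <encoded 0>; then zf_to_bool *)
let zero_p v = encode_bool (v = encode_num 0)

(* What compiled num? computes: and rax, num_mask; cmp rax, num_tag; zf_to_bool *)
let num_p v = encode_bool (v land num_mask = num_tag)

let () =
  assert (zero_p (encode_num 0) = encode_bool true);
  assert (zero_p (encode_num 3) = encode_bool false);
  (* false is encoded as 31, not 0, so (zero? false) is false *)
  assert (zero_p (encode_bool false) = encode_bool false);
  assert (num_p (encode_num (-7)) = encode_bool true);
  assert (num_p (encode_bool true) = encode_bool false);
  print_endline "zero? and num? models agree with the encoding"
```

One detail worth noticing in the num? code is that And overwrites rax. That is harmless, because zf_to_bool overwrites rax again and only ZF carries the answer.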
Conditionals
Now that we’ve implemented booleans, we can implement if. Our if form looks like this:

```
(if <test> <then> <else>)
```

An if expression evaluates to the then expression if test evaluates to a “truthy” value, and to the else expression otherwise. Remember that in our language, all values other than false are truthy!
What makes these conditional expressions different from operations we’ve seen before is that we’ll need to evaluate different expressions depending on the value of another expression. This is easy in the interpreter: we’ll just use OCaml’s if expression! In the compiler, we’ll rely on a feature of x86-64 that we haven’t seen yet.
Conditionals in the interpreter
This part is pretty simple! We’ll just add a case to interp_exp:

```ocaml
let rec interp_exp (exp : s_exp) : value =
  match exp with
  (* some cases elided... *)
  | Lst [Sym "if"; test_exp; then_exp; else_exp] ->
      if interp_exp test_exp = Boolean false
      then interp_exp else_exp
      else interp_exp then_exp
```
And that’s it! The one thing to note here is that we only evaluate one of the two branches: either then_exp or else_exp, never both.
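To see that only one branch runs, the sketch below adds a hypothetical (tick e) form that counts how often it is evaluated. The s_exp and value types are minimal stand-ins for the course’s definitions, and only a few cases are supported.

```ocaml
(* Hypothetical stand-ins for the course's s_exp and value types *)
type s_exp = Num of int | Sym of string | Lst of s_exp list
type value = Number of int | Boolean of bool

(* Counts evaluations of the hypothetical (tick e) form *)
let ticks = ref 0

let rec interp_exp (exp : s_exp) : value =
  match exp with
  | Num n -> Number n
  | Sym "true" -> Boolean true
  | Sym "false" -> Boolean false
  | Lst [Sym "tick"; e] -> incr ticks; interp_exp e
  | Lst [Sym "if"; test_exp; then_exp; else_exp] ->
      if interp_exp test_exp = Boolean false
      then interp_exp else_exp
      else interp_exp then_exp
  | _ -> failwith "unsupported in this sketch"

let () =
  let e = Lst [Sym "if"; Num 0; Lst [Sym "tick"; Num 1]; Lst [Sym "tick"; Num 2]] in
  assert (interp_exp e = Number 1);  (* 0 is truthy, so the then branch runs *)
  assert (!ticks = 1);               (* ...and the else branch never does *)
  print_endline "only one branch was evaluated"
```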
Conditionals in the compiler
x86-64 doesn’t have “if” built in, but it does have a standard way of implementing conditionals: with conditional jumps.
So far, all of the assembly code we’ve seen is straight-line code: we start executing instructions at the entry label and keep going until we get to ret. We can write straight-line code in higher-level languages too (and this code is generally pretty easy to compile to assembly). Higher-level languages also have various constructs to execute code conditionally or more than once: things like conditionals, loops, and functions.
x86-64 machine code, like most machine codes, really only has one way of writing non-straight-line code: jumps. A jump instruction lets us start executing from a label elsewhere in our program. It’s essentially what the runtime does to start executing from our entry label (strictly speaking, the runtime uses call, which is a jump that also saves a return address for ret to use).
A conditional jump lets us jump to another label depending on the flags we talked about above. We’ll be using jz <label>, which jumps to the given label if and only if the ZF flag is set. So in order to compile (if test then else), we’ll want something like:

```
  ; code for the test expression
  cmp rax, 0b00011111   ; compare to boolean false
  jz else               ; test was false: jump to the else branch
  ; code for the then expression
  jmp continue          ; skip over the else branch
else:
  ; code for the else expression
continue:
```
The “then” code is skipped when the test expression is false, because of the jz instruction. The “else” code is skipped whenever we evaluate the “then” code, because of the jmp instruction. Cool, right?
Our OCaml implementation follows that pseudocode:

```ocaml
let rec compile_exp (exp : s_exp) : directive list =
  match exp with
  (* some cases elided ... *)
  | Lst [Sym "if"; test_exp; then_exp; else_exp] ->
      compile_exp test_exp
      @ [Cmp (Reg Rax, operand_of_bool false); Jz "else"]
      @ compile_exp then_exp
      @ [Jmp "continue"]
      @ [Label "else"]
      @ compile_exp else_exp
      @ [Label "continue"]
```
There’s one big problem here. What if we have more than one if expression? Something like this:

```
(if (num? 4) (if (num? false) 1 2) 3)
```
Right now, the assembler is going to throw an error if we try to compile this program, something like:

```
program.s:17: error: label `_else' inconsistently redefined
program.s:13: note: label `_else' originally defined here
program.s:19: error: label `_continue' inconsistently redefined
program.s:15: note: label `_continue' originally defined here
```
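The compiler could also catch this mistake itself, before the assembler ever sees the program. The helper below is hypothetical (it is not part of the course code). It works over a cut-down directive type in which Other stands for any non-label instruction.

```ocaml
(* Hypothetical cut-down directive type *)
type directive = Label of string | Jz of string | Jmp of string | Other

(* Labels defined more than once, in the order their duplicates appear *)
let duplicate_labels (prog : directive list) : string list =
  let seen = Hashtbl.create 16 in
  List.fold_left
    (fun dups d ->
      match d with
      | Label l when Hashtbl.mem seen l ->
          if List.mem l dups then dups else dups @ [l]
      | Label l -> Hashtbl.add seen l (); dups
      | _ -> dups)
    [] prog

let () =
  (* The nested if above, compiled with fixed label names *)
  let prog =
    [ Other; Jz "else";                     (* outer test *)
      Other; Jz "else";                     (* inner test *)
      Other; Jmp "continue"; Label "else";  (* inner then, inner else *)
      Other; Label "continue";
      Jmp "continue"; Label "else";         (* outer else *)
      Other; Label "continue" ]
  in
  List.iter print_endline (duplicate_labels prog)  (* prints else, then continue *)
```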
We’re using our label names more than once! That’s not going to work. We’ll need to make sure that each if expression has its own labels for else and continue. We’ll use a function called gensym[3] to generate unique label names. We can call gensym like this:

```
$ Util.gensym "else";;
"else__0"
$ Util.gensym "else";;
"else__1"
$ Util.gensym "continue";;
"continue__2"
```
This function is very different from, say, our compile_exp or interp_exp functions: it returns a different output every time we call it! (Indeed, its whole purpose is to return a different output every time we call it.) It’s defined like this:

```ocaml
let gensym : string -> string =
  let counter = ref 0 in
  fun s ->
    let symbol = Printf.sprintf "%s__%d" s !counter in
    counter := !counter + 1 ;
    symbol
```
The counter variable is what makes this function work. counter is a reference to an integer; it works sort of like a variable in a typical imperative language like Java or Python. We can update its value with counter := <new value> and read its value with !counter. Because let counter = ref 0 in comes before fun s ->, the reference is created once, when gensym is defined, and every call shares it. This little function is a good example of idiomatic usage of references in OCaml: use references as little as possible, and hide them in functions that do a specific thing. We can’t update counter from outside this function.
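Where let counter = ref 0 in appears matters. The sketch below contrasts gensym with a hypothetical bad_gensym whose reference is allocated inside the function body. bad_gensym starts a fresh counter at 0 on every call, so it hands out the same name again, which would bring back the duplicate-label error.

```ocaml
(* Counter allocated once, when gensym is defined *)
let gensym : string -> string =
  let counter = ref 0 in
  fun s ->
    let symbol = Printf.sprintf "%s__%d" s !counter in
    counter := !counter + 1;
    symbol

(* Hypothetical buggy variant: a new counter is allocated on every call *)
let bad_gensym (s : string) : string =
  let counter = ref 0 in
  let symbol = Printf.sprintf "%s__%d" s !counter in
  counter := !counter + 1;
  symbol

let () =
  (* let-bindings fix the call order; OCaml argument order is unspecified *)
  let a = gensym "else" in
  let b = gensym "else" in
  Printf.printf "%s %s\n" a b;  (* else__0 else__1 *)
  let c = bad_gensym "else" in
  let d = bad_gensym "else" in
  Printf.printf "%s %s\n" c d   (* else__0 else__0: the names collide *)
```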
Using our gensym function, we can complete our if compiler:

```ocaml
let rec compile_exp (exp : s_exp) : directive list =
  match exp with
  (* some cases elided ... *)
  | Lst [Sym "if"; test_exp; then_exp; else_exp] ->
      let else_label = Util.gensym "else" in
      let continue_label = Util.gensym "continue" in
      compile_exp test_exp
      @ [Cmp (Reg Rax, operand_of_bool false); Jz else_label]
      @ compile_exp then_exp
      @ [Jmp continue_label]
      @ [Label else_label]
      @ compile_exp else_exp
      @ [Label continue_label]
```
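Putting the pieces together, here is a self-contained sketch that compiles the nested example from above and lists the labels it emits. The s_exp, operand, and directive types are hypothetical stand-ins for the course’s definitions, and Util.gensym is replaced by a local copy of gensym. Only the cases needed for this example are supported. Compiling Num through operand_of_num is an assumption of this sketch.

```ocaml
(* Hypothetical stand-ins for the course's s_exp and assembly types *)
type s_exp = Num of int | Sym of string | Lst of s_exp list
type reg = Rax
type operand = Reg of reg | Imm of int
type directive =
  | Mov of operand * operand
  | Cmp of operand * operand
  | And of operand * operand
  | Setz of operand
  | Shl of operand * operand
  | Or of operand * operand
  | Jz of string
  | Jmp of string
  | Label of string

let num_shift = 2
let num_mask = 0b11
let num_tag = 0b00
let bool_shift = 7
let bool_tag = 0b0011111

(* Local copy of gensym standing in for Util.gensym *)
let gensym : string -> string =
  let counter = ref 0 in
  fun s ->
    let symbol = Printf.sprintf "%s__%d" s !counter in
    counter := !counter + 1;
    symbol

let operand_of_bool b = Imm (((if b then 1 else 0) lsl bool_shift) lor bool_tag)
let operand_of_num x = Imm ((x lsl num_shift) lor num_tag)

let zf_to_bool =
  [ Mov (Reg Rax, Imm 0); Setz (Reg Rax);
    Shl (Reg Rax, Imm bool_shift); Or (Reg Rax, Imm bool_tag) ]

let rec compile_exp (exp : s_exp) : directive list =
  match exp with
  | Num n -> [Mov (Reg Rax, operand_of_num n)]
  | Sym "true" -> [Mov (Reg Rax, operand_of_bool true)]
  | Sym "false" -> [Mov (Reg Rax, operand_of_bool false)]
  | Lst [Sym "num?"; arg] ->
      compile_exp arg
      @ [And (Reg Rax, Imm num_mask); Cmp (Reg Rax, Imm num_tag)]
      @ zf_to_bool
  | Lst [Sym "if"; test_exp; then_exp; else_exp] ->
      let else_label = gensym "else" in
      let continue_label = gensym "continue" in
      compile_exp test_exp
      @ [Cmp (Reg Rax, operand_of_bool false); Jz else_label]
      @ compile_exp then_exp
      @ [Jmp continue_label; Label else_label]
      @ compile_exp else_exp
      @ [Label continue_label]
  | _ -> failwith "unsupported in this sketch"

(* (if (num? 4) (if (num? false) 1 2) 3) *)
let example =
  Lst [ Sym "if"; Lst [Sym "num?"; Num 4];
        Lst [Sym "if"; Lst [Sym "num?"; Sym "false"]; Num 1; Num 2];
        Num 3 ]

let labels prog = List.filter_map (function Label l -> Some l | _ -> None) prog

(* Four distinct labels: one else/continue pair per if expression *)
let () = List.iter print_endline (labels (compile_exp example))
```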
Looking ahead
Today we introduced two new concepts in x86-64 machine code: flags and jumps. Next time we’ll implement binary operations, for which we’ll need one more concept: memory. After that, though, we really won’t be further complicating our model of how the processor executes. We’ll need a few more instructions here and there, but there won’t be any more big ideas at the assembly level. This will be a blessing and a curse: the way the processor executes is relatively simple and easy to understand, which means that compiling high-level language constructs like functions is pretty challenging! It’s going to be fun.
Footnotes:
[1] Actually, all of the flags are packed together in the same special RFLAGS register.
[2] It actually just takes the lowest byte of a register, and byte registers are written differently in assembly: for instance, the lowest byte of rax is written al. Our assembly library takes care of this, so we won’t talk about it too much in class.
[3] Short for “generated symbol”; the name comes from early LISP implementations.