6 Type Inference
In this assignment, you will implement the type inference algorithm we studied in lecture on the (untyped) Paret language.
Chapter 15.3.2 of PLAI 2/e.
6.3 The Language
The Paret language for this assignment is nearly the same as the Typed Paret language from Type Checker
, except that the type annotations on lam
expressions have been removed: putting them back in is, after all, the job of type inference.
We have also reintroduced and and or expressions back into the language as syntactic sugar. We also have changed let from a base expression into syntactic sugar. Thus, you will need to implement a desugarer that converts and, or, and let expressions into functionally equivalent expressions.
The grammar of Untyped Paret is as follows:
<expr> ::= <num>
| (+ <expr> <expr>)
| (++ <expr> <expr>)
| (num= <expr> <expr>)
| (str= <expr> <expr>)
| (if <expr> <expr> <expr>)
| (<expr> <expr>)
| (first <expr>)
| (rest <expr>)
| (is-empty <expr>)
| (link <expr> <expr>)
| (lam <id> <expr>)
| (let (<id> <expr>) <expr>)
| (and <expr> <expr>)
| (or <expr> <expr>)
Refer to the support.rkt file for the definition of LExpr.
As usual, we have provided a function parse, which consumes an expression in Paret’s concrete syntax and returns the abstract syntax representation of that expression. However, the parser in this assignment returns an LExpr, which is exactly the same as an Expr, except that each LExpr has an additional label field and e-lam also has a param-label field. Labels are used to stand for program points in the type inference algorithm.
(define-type-alias Label Symbol)
Given the updated Paret language and the updated LExpr abstract syntax, you will implement two functions: desugar and type-of.
desugar :: LExpr -> LExpr
which consumes an abstract syntax tree (i.e. an LExpr, as returned by parse), replaces all instances of sugar-and, sugar-or, and sugar-let with desugared equivalents, and returns the result.
type-of :: LExpr -> Term
that consumes a Paret program in abstract syntax form. If the program is well-typed, type-of returns the type of that program (represented as a Term as defined below); otherwise, it raises an exception.
type-of should assume that it’s given an LExpr from desugar. Thus, type-of should assume that there are no sugar-ands, sugar-ors, or sugar-lets in the input Expr.
Once you have implemented desugar
, you should use the provided type-infer
function (analogous to type-check
from Type Checker
) to write test cases for your type checker in your testing file.
As discussed in lecture, Terms describe the type of a subexpression of a program and are used in the constraint generation process. The language of Terms is as follows:
(t-var [label : Label])
(t-con [head : Symbol]
[args : (Listof Term)]))
t-var denotes a term variable (also referred to as a type variable). A type variable (t-var ’g123) can be read as “the type of the expression (or program position) labeled ’g123”.
t-con denotes a term constructor, which builds up a more complicated term from a list of subterms. For example, (t-con ’-> (list (t-var ’g123) (t-var ’g123))) is a representation of the more familiar ’a -> ’a. Primitives are also represented this way. For example, (t-con ’Num empty) is a representation of the type Num.
For convenience, we provide helper functions to simulate the Type
language of Type Checker
in the language of Term
. For example, (t-num)
produces (t-con ’Num empty)
, and t-list
are all provided, and you can reference the support code for their definitions. However, these are helper functions
and not type variants
, so you will not be able to match against them in a type-case
6.5 Features to Implement
6.5.1 Desugaring with Labels
When producing desugared expressions, you may need to create fresh Symbols. For example, you may need to generate new Labels for non-sugar LExprs. To do this, you should use the built-in gensym function, which generates a unique, unused Symbol.
You may also use gensym if you need to create Terms with unknown types.
6.5.2 Label Environment
Your type inference algorithm should use a label environment (LEnv) to keep track of label identifiers that are in scope.
(define-type-alias LEnv (Hashof Symbol Label))
For convenience, we provide a get-label function that consumes a LExpr and retrieves its Label.
6.5.3 The Inference Algorithm
In the starter code, we provide templates for two optional helper functions: constraint-gen and unify.
constraint-gen :: LExpr -> (Setof Constraint)
unify :: (Setof Constraint) -> Substitution
The two phases of the inference algorithm discussed in lecture (and discussed below) map to these two helper functions. We suggest using these two function templates to guide your overall type-of implementation, but they are not required—we will not be testing these functions directly, and you are free to ignore them or change them as you wish.
18.104.22.168 Phase 1: Constraint Generation
Recall that Constraints relate program subexpressions by describing how they work together in a successful execution. The language of Constraints is:
(eq [l : Term]
[r : Term]))
Constraints equate two Terms, often a type variable to the type of some other expression. For example, a Constraint that would be generated for the LExpr (e-num ’g123 5) is:
(eq (t-var 'g123) (t-num))
22.214.171.124 Phase 2: Unification
The process used to solve constraints is known as unification. In unification, a unifier algorithm is given a set of equations (e.g. Constraints), and each Constraint maps a type variable (t-var) to a Term. Given a consistent set of Constraints, we try to unify them in such a way that every t-var gets assigned the correct type.
To do this, our unifier should generate a mapping from variables (Labels) to Terms that do not contain any variables. We define the Substitution alias to represent this mapping:
(define-type-alias Substitution (Hashof Label Term))
Unification should always terminate. Thus, your unifier should perform an “occurs” check that causes it to terminate given a circular Constraint. Specifically, given the Constraint (eq replace with), you should raise (ti-err-fails-occurs-check replace with) if replace is a strict subterm of with. (Make sure you can construct a test case for this!)
Make sure you understand why replace = with should pass the “occurs” check.
You may find the high-level description of unification here
helpful as well.
The full set of exceptions is as follows:
(ti-err-fails-occurs-check [replace : Term] [with : Term])
(ti-err-constructor-mismatch [t1 : Term] [t2 : Term])
(ti-err-unbound-id [name : Symbol]))
Given the Constraint (eq replace with), you should raise (ti-err-fails-occurs-check replace with) if replace is a strict subterm of with.
Given the Constraint (eq t1 t2), you should raise (ti-err-constructor-mismatch t1 t2) if t1 and t2 are t-cons with different head symbols.
Given an expression that references an unbound identifier, you should raise ti-err-unbound-id with the name of the identifier.
The priority of TypeInferenceErrors is unspecified—when multiple TypeInferenceErrors could be raised in a program, your program just needs to raise one of the valid exceptions. The reason for this is that the Constraints in a Paret program are really a set of Constraints rather than a list of Constraints. Thus, the order in which your unifier views each Constraint can change the order in which errors are raised. (Be careful when you’re writing test cases against wheats and chaffs!)
from Type Checker
can produce types with variables. For example, the term (lam x x)
might produce (t-fun (t-var ’g123) (t-var ’g123))
. Since Label
s are random, writing tests for types with variables can be difficult.
To help you write deterministic tests, we provide a function
normalize :: Term -> Term
that consumes a Term and renames all t-var labels to be lowercase letters in alphabetical order relative to how they appear in the Term. For example, normalizing the type inferred for (lam x (lam y empty)) should produce
(t-fun (t-var 'a) (t-fun (t-var 'b) (t-list 'c)))
which is equivalent to
(t-con '-> (list (t-var 'a) (t-fun (t-var 'b) (t-list 'c))))
You should use normalize when writing test cases in your testing file, or your test cases may cause the wheats to unexpectedly fail. You are not permitted to use normalize in your actual implementation.
Finally, in addition to the testing forms documented in the Testing Guidelines
, we provide the following testing forms:
(test-error-fails-occurs-check? name expr)
(test-error-constructor-mismatch? name expr)
(test-error-unbound-id? name expr)
These test that the given expr raises the correct error. (Example usage can be found in the testing stencil.)
6.7 Starter Code
We’ve provided starter code for your implementation at type-inference.rkt
and support code at support.rkt
. You are not allowed to change the signature of desugar
, and type-infer
, but you are welcome to add any helper functions that you need for your implementation.
We’ve also provided a stencil for your type-check
test cases at type-inference-tests.rkt
and testing support code at test-support.rkt
. You should check that you can run your type-inference-tests.rkt
file successfully in DrRacket before submitting—if you can’t, it means that a definition is missing or you’re trying to test a function that you shouldn’t be testing.
6.7.1 Set Library
As mentioned previously, it may be useful to think of the Constraints in a Paret program as a set of Constraints rather than a list of Constraints. Plait does not have a built-in set library, so we provide a (Setof ’a) abstraction in the support.rkt file:
Type for an immutable set containing elements of type ’a.
Creates an empty set.
Convenience macro around list->set; returns a (Setof ’a) containing all of the elements listed in .....
list->set :: (Listof ’a) -> (Setof ’a)
Returns a (Setof ’a) containing all of the elements in the input (Listof ’a).
set->list :: (Setof ’a) -> (Listof ’a)
Returns a (Listof ’a) containing all of the elements in the input (Setof ’a).
We also provide the following (Setof ’a) utility functions:
set-empty? :: (Setof ’a) -> Boolean
set-count :: (Setof ’a) -> Number
set-member? :: (Setof ’a), ’a -> Boolean
set-add :: (Setof ’a) ’a -> (Setof ’a)
set-remove :: (Setof ’a) ’a -> (Setof ’a)
set-union :: (Setof ’a), (Setof ’a) -> (Setof ’a)
set-pick :: (Setof ’a) -> (Pick ’a)
Picks an arbitrary element out of the set, and returns a Pick data structure.
(define-type (Pick 'a)
(pick-some [element : 'a] [rest : (Setof 'a)]))
If the set is empty, a pick-none is returned. Otherwise, a pick-some is returned, and the rest of the set (without the picked value) is stored in the rest field of the pick-some. The element returned in a pick-some is not guaranteed to be the same between applications of set-pick on the same (Setof ’a).
You are not required to use the provided (Setof ’a) interface, but you may find it helpful—especially since the templates for our suggested constraint-gen and unify functions utilize the (Setof ’a) abstraction.
6.8 What To Hand In
You will submit two files for this assignment:
type-inference.rkt, which should be uploaded to the “Code” drop on Gradescope.
type-inference-tests.rkt, which should be uploaded to the “Tests” drop on Gradescope.
You can update your submissions as many times as you want before the deadline.