6 Type Inference
6.1 Introduction
In this assignment, you will implement the type inference algorithm we studied in lecture on the (untyped) Paret language.
6.2 Reading
Chapter 15.3.2 of PLAI 2/e.
6.3 The Language
The Paret language for this assignment is nearly the same as the Typed Paret language from
Type Checker, except that the type annotations on
lam and
empty expressions have been removed: putting them back in is, after all, the job of type inference.
We have also reintroduced and and or expressions back into the language as syntactic sugar. We also have changed let from a base expression into syntactic sugar. Thus, you will need to implement a desugarer that converts and, or, and let expressions into functionally equivalent expressions.
6.3.1 Grammar
The grammar of Untyped Paret is as follows:
<expr> ::= <num> 
 <string> 
 true 
 false 
 (+ <expr> <expr>) 
 (++ <expr> <expr>) 
 (num= <expr> <expr>) 
 (str= <expr> <expr>) 
 (if <expr> <expr> <expr>) 
 <id> 
 (<expr> <expr>) 
 (first <expr>) 
 (rest <expr>) 
 (isempty <expr>) 

 (link <expr> <expr>) 
 empty 
 (lam <id> <expr>) 

 (let (<id> <expr>) <expr>) 
 (and <expr> <expr>) 
 (or <expr> <expr>) 
6.4 Assignment
Refer to the support.rkt file for the definition of LExpr.
As usual, we have provided a function parse, which consumes an expression in Paret’s concrete syntax and returns the abstract syntax representation of that expression. However, the parser in this assignment returns an LExpr, which is exactly the same as an Expr, except that each LExpr has an additional label field and elam also has a paramlabel field. Labels are used to stand for program points in the type inference algorithm.
(definetypealias Label Symbol) 
Given the updated Paret language and the updated LExpr abstract syntax, you will implement two functions: desugar and typeof.
desugar :: LExpr > LExpr
which consumes an abstract syntax tree (i.e. an LExpr, as returned by parse), replaces all instances of sugarand, sugaror, and sugarlet with desugared equivalents, and returns the result.
typeof :: LExpr > Term
that consumes a Paret program in abstract syntax form. If the program is welltyped, typeof returns the type of that program (represented as a Term as defined below); otherwise, it raises an exception.
typeof should assume that it’s given an LExpr from desugar. Thus, typeof should assume that there are no sugarands, sugarors, or sugarlets in the input Expr.
Once you have implemented
desugar and
typeof, you should use the provided
typeinfer function (analogous to
typecheck from
Type Checker) to write test cases for your type checker in your testing file.
6.4.1 Terms
As discussed in lecture, Terms describe the type of a subexpression of a program and are used in the constraint generation process. The language of Terms is as follows:
(definetype Term 
(tvar [label : Label]) 
(tcon [head : Symbol] 
[args : (Listof Term)])) 
tvar denotes a term variable (also referred to as a type variable). A type variable (tvar ’g123) can be read as “the type of the expression (or program position) labeled ’g123”.
tcon denotes a term constructor, which builds up a more complicated term from a list of subterms. For example, (tcon ’> (list (tvar ’g123) (tvar ’g123))) is a representation of the more familiar ’a > ’a. Primitives are also represented this way. For example, (tcon ’Num empty) is a representation of the type Num.
For convenience, we provide helper functions to simulate the
Type language of
Type Checker in the language of
Term. For example,
(tnum) produces
(tcon ’Num empty);
tnum,
tbool,
tstr,
tfun, and
tlist are all provided, and you can reference the support code for their definitions. However, these are helper
functions and not
type variants, so you will not be able to match against them in a
typecase expression.
6.5 Features to Implement
6.5.1 Desugaring with Labels
When producing desugared expressions, you may need to create fresh Symbols. For example, you may need to generate new Labels for nonsugar LExprs. To do this, you should use the builtin gensym function, which generates a unique, unused Symbol.
You may also use gensym if you need to create Terms with unknown types.
6.5.2 Label Environment
Your type inference algorithm should use a label environment (LEnv) to keep track of label identifiers that are in scope.
(definetypealias LEnv (Hashof Symbol Label)) 
For convenience, we provide a getlabel function that consumes a LExpr and retrieves its Label.
6.5.3 The Inference Algorithm
In the starter code, we provide templates for two optional helper functions: constraintgen and unify.
constraintgen :: LExpr > (Setof Constraint) 
unify :: (Setof Constraint) > Substitution 
The two phases of the inference algorithm discussed in lecture (and discussed below) map to these two helper functions. We suggest using these two function templates to guide your overall typeof implementation, but they are not required—we will not be testing these functions directly, and you are free to ignore them or change them as you wish.
6.5.3.1 Phase 1: Constraint Generation
Recall that Constraints relate program subexpressions by describing how they work together in a successful execution. The language of Constraints is:
(definetype Constraint 
(eq [l : Term] 
[r : Term])) 
Constraints equate two Terms, often a type variable to the type of some other expression. For example, a Constraint that would be generated for the LExpr (enum ’g123 5) is:
(eq (tvar 'g123) (tnum)) 
6.5.3.2 Phase 2: Unification
The process used to solve constraints is known as unification. In unification, a unifier algorithm is given a set of equations (e.g. Constraints), and each Constraint maps a type variable (tvar) to a Term. Given a consistent set of Constraints, we try to unify them in such a way that every tvar gets assigned the correct type.
To do this, our unifier should generate a mapping from variables (Labels) to Terms that do not contain any variables. We define the Substitution alias to represent this mapping:
(definetypealias Substitution (Hashof Label Term)) 
Unification should always terminate. Thus, your unifier should perform an “occurs” check that causes it to terminate given a circular Constraint. Specifically, given the Constraint (eq replace with), you should raise (tierrfailsoccurscheck replace with) if replace is a strict subterm of with. (Make sure you can construct a test case for this!)
Make sure you understand why replace = with should pass the “occurs” check.
You may find the highlevel description of unification
here helpful as well.
6.5.4 Exceptions
The full set of exceptions is as follows:
(definetype TypeInferenceError 
(tierrfailsoccurscheck [replace : Term] [with : Term]) 
(tierrconstructormismatch [t1 : Term] [t2 : Term]) 
(tierrunboundid [name : Symbol])) 
Given the Constraint (eq replace with), you should raise (tierrfailsoccurscheck replace with) if replace is a strict subterm of with.
Given the Constraint (eq t1 t2), you should raise (tierrconstructormismatch t1 t2) if t1 and t2 are tcons with different head symbols.
Given an expression that references an unbound identifier, you should raise tierrunboundid with the name of the identifier.
The priority of TypeInferenceErrors is unspecified—when multiple TypeInferenceErrors could be raised in a program, your program just needs to raise one of the valid exceptions. The reason for this is that the Constraints in a Paret program are really a set of Constraints rather than a list of Constraints. Thus, the order in which your unifier views each Constraint can change the order in which errors are raised. (Be careful when you’re writing test cases against wheats and chaffs!)
6.6 Testing
Unlike
typecheck from
Type Checker,
typeinfer can produce types with variables. For example, the term
(lam x x) might produce
(tfun (tvar ’g123) (tvar ’g123)). Since
Labels are random, writing tests for types with variables can be difficult.
To help you write deterministic tests, we provide a function
normalize :: Term > Term 
that consumes a Term and renames all tvar labels to be lowercase letters in alphabetical order relative to how they appear in the Term. For example, normalizing the type inferred for (lam x (lam y empty)) should produce
(tfun (tvar 'a) (tfun (tvar 'b) (tlist 'c))) 
which is equivalent to
(tcon '> (list (tvar 'a) (tfun (tvar 'b) (tlist 'c)))) 
You should use normalize when writing test cases in your testing file, or your test cases may cause the wheats to unexpectedly fail. You are not permitted to use normalize in your actual implementation.
Finally, in addition to the testing forms documented in the
Testing Guidelines, we provide the following testing forms:
(testerrorfailsoccurscheck? name expr)
(testerrorconstructormismatch? name expr)
(testerrorunboundid? name expr)
These test that the given expr raises the correct error. (Example usage can be found in the testing stencil.)
6.7 Starter Code
We’ve provided starter code for your implementation at
typeinference.rkt and support code at
support.rkt. You are not allowed to change the signature of
desugar,
typeof, and
typeinfer, but you are welcome to add any helper functions that you need for your implementation.
We’ve also provided a stencil for your
typecheck test cases at
typeinferencetests.rkt and testing support code at
testsupport.rkt. You should check that you can run your
typeinferencetests.rkt file successfully in DrRacket before submitting—if you can’t, it means that a definition is missing or you’re trying to test a function that you shouldn’t be testing.
6.7.1 Set Library
As mentioned previously, it may be useful to think of the Constraints in a Paret program as a set of Constraints rather than a list of Constraints. Plait does not have a builtin set library, so we provide a (Setof ’a) abstraction in the support.rkt file:
(Setof ’a)
Type for an immutable set containing elements of type ’a.
emptyset
Creates an empty set.
(set ....)
Convenience macro around list>set; returns a (Setof ’a) containing all of the elements listed in .....
list>set :: (Listof ’a) > (Setof ’a)
Returns a (Setof ’a) containing all of the elements in the input (Listof ’a).
set>list :: (Setof ’a) > (Listof ’a)
Returns a (Listof ’a) containing all of the elements in the input (Setof ’a).
We also provide the following (Setof ’a) utility functions:
setempty? :: (Setof ’a) > Boolean
setcount :: (Setof ’a) > Number
setmember? :: (Setof ’a), ’a > Boolean
setadd :: (Setof ’a) ’a > (Setof ’a)
setremove :: (Setof ’a) ’a > (Setof ’a)
setunion :: (Setof ’a), (Setof ’a) > (Setof ’a)
setpick :: (Setof ’a) > (Pick ’a)
Picks an arbitrary element out of the set, and returns a Pick data structure.
(definetype (Pick 'a) 
(picknone) 
(picksome [element : 'a] [rest : (Setof 'a)])) 
If the set is empty, a picknone is returned. Otherwise, a picksome is returned, and the rest of the set (without the picked value) is stored in the rest field of the picksome. The element returned in a picksome is not guaranteed to be the same between applications of setpick on the same (Setof ’a).
You are not required to use the provided (Setof ’a) interface, but you may find it helpful—especially since the templates for our suggested constraintgen and unify functions utilize the (Setof ’a) abstraction.
6.8 What To Hand In
You will submit two files for this assignment:
typeinference.rkt, which should be uploaded to the “Code” drop on Gradescope.
typeinferencetests.rkt, which should be uploaded to the “Tests” drop on Gradescope.
You can update your submissions as many times as you want before the deadline.