9 Type Inference
9.1 Introduction
In this assignment, you will implement the type inference algorithm we studied in lecture on the (untyped) Paret language.
Before starting, you may find it helpful to read Chapter 30 of PAPL. Please note that the corresponding chapter in PLAI contains some errors.
9.2 The Language
The Paret language for this assignment is nearly the same as the Typed Paret language from
Type Checker, except that the type annotations on
lam and
empty expressions have been removed: putting them back in is, after all, the job of type inference.
We have also reintroduced and and or expressions back into the language as syntactic sugar. We also have changed let from a base expression into syntactic sugar. Thus, you will need to implement a desugarer that converts and, or, and let expressions into functionally equivalent expressions.
9.2.1 Grammar
The grammar of Untyped Paret is as follows:
<expr> ::= <num> |
| <string> |
| true |
| false |
| (+ <expr> <expr>) |
| (++ <expr> <expr>) |
| (num= <expr> <expr>) |
| (str= <expr> <expr>) |
| (if <expr> <expr> <expr>) |
| <id> |
| (<expr> <expr>) |
| (first <expr>) |
| (rest <expr>) |
| (is-empty <expr>) |
|
| (link <expr> <expr (list)>) |
| (empty) |
| (lam (<id>) <expr>) |
|
| (let (<id> <expr>) <expr>) |
| (and <expr> <expr>) |
| (or <expr> <expr>) |
9.3 Assignment
Refer to the support.rkt file for the definition of LExpr.
As usual, we have provided a function parse, which consumes an expression in Paret’s concrete syntax and returns the abstract syntax representation of that expression. However, the parser in this assignment returns an LExpr, which is exactly the same as an Expr, except that each LExpr has an additional label field and e-lam also has a param-label field. Labels are used to stand for program points in the type inference algorithm.
(define-type-alias Label Symbol) |
Given the updated Paret language and the updated LExpr abstract syntax, you will implement two functions: desugar and type-of.
desugar :: LExpr -> LExpr
which consumes an abstract syntax tree (i.e. an LExpr, as returned by parse), replaces all instances of sugar-and, sugar-or, and sugar-let with desugared equivalents, and returns the result.
The desugaring of sugar-let may not be obvious, so here’s a hint: it involves e-app and e-lam.
type-of :: LExpr -> Term
that consumes a Paret program in abstract syntax form. If the program is well-typed, type-of returns the type of that program (represented as a Term as defined below); otherwise, it raises an exception.
type-of should assume that it’s given an LExpr from desugar. Thus, type-of should assume that there are no sugar-ands, sugar-ors, or sugar-lets in the input Expr.
Once you have implemented
desugar and
type-of, you should use the provided
type-infer function (analogous to
type-check from
Type Checker) to write test cases for your type checker in your testing file.
9.3.1 Terms
As discussed in lecture, Terms describe the type of a subexpression of a program and are used in the constraint generation process. The language of Terms is as follows:
#lang racket |
(define-type Term |
(t-var [label : Label]) |
(t-con [head : Symbol] |
[args : (Listof Term)])) |
t-var denotes a term variable (also referred to as a type variable). A type variable (t-var ’g123) can be read as “the type of the expression (or program position) labeled ’g123”.
t-con denotes a term constructor, which builds up a more complicated term from a list of subterms. For example, (t-con ’-> (list (t-var ’g123) (t-var ’g123))) is a representation of the more familiar ’a -> ’a. Primitives are also represented this way. For example, (t-con ’Num empty) is a representation of the type Num.
For convenience, we provide helper functions to simulate the
Type language of
Type Checker in the language of
Term. For example,
(t-num) produces
(t-con ’Num empty);
t-num,
t-bool,
t-str,
t-fun, and
t-list are all provided, and you can reference the support code for their definitions. However, these are helper
functions and not
type variants, so you will not be able to match against them in a
type-case expression.
9.4 Features to Implement
9.4.1 Desugaring with Labels
When producing desugared expressions, you may need to create fresh Symbols. For example, you may need to generate new Labels for non-sugar LExprs. To do this, you should use the built-in gensym function, which generates a unique, unused Symbol.
You may also use gensym if you need to create Terms with unknown types.
9.4.2 Label Environment
Your type inference algorithm should use a label environment (LEnv) to keep track of label identifiers that are in scope.
#lang racket |
(define-type-alias LEnv (Hashof Symbol Label)) |
For convenience, we provide a get-label function that consumes a LExpr and retrieves its Label. We also provide a map-table-values function that you can use to map over all values in your environment.
map-table-values :: ('a -> 'b), (Hashof 'k 'a) -> (Hashof 'k 'b) |
9.4.3 The Inference Algorithm
In the starter code, we provide templates for two optional helper functions: constraint-gen and unify.
constraint-gen :: LExpr -> (Setof Constraint) |
unify :: (Setof Constraint) -> Substitution |
The two phases of the inference algorithm discussed in lecture (and discussed below) map to these two helper functions. We suggest using these two function templates to guide your overall type-of implementation, but they are not required—we will not be testing these functions directly, and you are free to ignore them or change them as you wish.
9.4.3.1 Phase 1: Constraint Generation
Recall that Constraints relate program subexpressions by describing how they work together in a successful execution. The language of Constraints is:
#lang racket |
(define-type Constraint |
(eq [l : Term] |
[r : Term])) |
Constraints equate two Terms, often a type variable to the type of some other expression. For example, a Constraint that would be generated for the LExpr (e-num ’g123 5) is:
#lang racket |
(eq (t-var 'g123) (t-num)) |
9.4.3.2 Phase 2: Unification
The process used to solve constraints is known as unification. In unification, a unifier algorithm is given a set of equations (e.g. Constraints), and each Constraint maps a type variable (t-var) to a Term. Given a consistent set of Constraints, we try to unify them in such a way that every t-var gets assigned the correct type.
To do this, our unifier should generate a mapping from variables (Labels) to Terms. We define the Substitution alias to represent this mapping:
#lang racket |
(define-type-alias Substitution (Hashof Label Term)) |
Unification should always terminate. Thus, your unifier should perform an “occurs” check that causes it to terminate given a circular Constraint. Specifically, given the Constraint (eq replace with), you should raise (ti-err-fails-occurs-check replace with) if replace is a strict subterm of with. (Make sure you can construct a test case for this!)
Make sure you understand why replace = with should pass the “occurs” check.
You may find the high-level description of unification
here helpful as well.
9.4.4 Exceptions
The full set of exceptions is as follows:
#lang racket |
(define-type TypeInferenceError |
(ti-err-fails-occurs-check [replace : Term] [with : Term]) |
(ti-err-constructor-mismatch [t1 : Term] [t2 : Term]) |
(ti-err-unbound-id [name : Symbol])) |
Given the Constraint (eq replace with), you should raise (ti-err-fails-occurs-check replace with) if replace is a strict subterm of with.
Given the Constraint (eq t1 t2), you should raise (ti-err-constructor-mismatch t1 t2) if t1 and t2 are t-cons with different head symbols.
Given an expression that references an unbound identifier, you should raise ti-err-unbound-id with the name of the identifier.
The priority of TypeInferenceErrors is unspecified—when multiple TypeInferenceErrors could be raised in a program, your program just needs to raise one of the valid exceptions. The reason for this is that the Constraints in a Paret program are really a set of Constraints rather than a list of Constraints. Thus, the order in which your unifier views each Constraint can change the order in which errors are raised. (Be careful when you’re writing test cases against wheats and chaffs!)
9.5 Testing
Unlike
type-check from
Type Checker,
type-infer can produce types with variables. For example, the term
(lam x x) might produce
(t-fun (t-var ’g123) (t-var ’g123)). Since
Labels are random, writing tests for types with variables can be difficult.
To help you write deterministic tests, we provide a function
normalize :: Term -> Term |
that consumes a Term and renames all t-var labels to be lowercase letters in alphabetical order relative to how they appear in the Term. For example, normalizing the type inferred for (lam x (lam y empty)) should produce
#lang racket |
(t-fun (t-var 'a) (t-fun (t-var 'b) (t-list 'c))) |
which is equivalent to
#lang racket |
(t-con '-> (list (t-var 'a) (t-fun (t-var 'b) (t-list 'c)))) |
You should use normalize when writing test cases in your testing file, or your test cases may cause the wheats to unexpectedly fail. You are not permitted to use normalize in your actual implementation.
Finally, in addition to the testing forms documented in the
Testing Guidelines, we provide the following testing forms:
(test-error-fails-occurs-check? name expr)
(test-error-constructor-mismatch? name expr)
(test-error-unbound-id? name expr)
These test that the given expr raises the correct error. (Example usage can be found in the testing stencil.)
9.6 Starter Code
We’ve provided starter code for your implementation at
type-inference.rkt and support code at
support.rkt. You are not allowed to change the signature of
desugar,
type-of, and
type-infer, but you are welcome to add any helper functions that you need for your implementation.
We’ve also provided a stencil for your
type-check test cases at
type-inference-tests.rkt and testing support code at
test-support.rkt. You should check that you can run your
type-inference-tests.rkt file successfully in DrRacket before submitting—if you can’t, it means that a definition is missing or you’re trying to test a function that you shouldn’t be testing.
9.6.1 Set Library
As mentioned previously, it may be useful to think of the Constraints in a Paret program as a set of Constraints rather than a list of Constraints. Plait does not have a built-in set library, so we provide a (Setof ’a) abstraction in the support.rkt file:
(Setof ’a)
Type for an immutable set containing elements of type ’a.
empty-set
Creates an empty set.
(set ....)
Convenience macro around list->set; returns a (Setof ’a) containing all of the elements listed in .....
list->set :: (Listof ’a) -> (Setof ’a)
Returns a (Setof ’a) containing all of the elements in the input (Listof ’a).
set->list :: (Setof ’a) -> (Listof ’a)
Returns a (Listof ’a) containing all of the elements in the input (Setof ’a).
We also provide the following (Setof ’a) utility functions:
set-empty? :: (Setof ’a) -> Boolean
set-count :: (Setof ’a) -> Number
set-member? :: (Setof ’a), ’a -> Boolean
set-add :: (Setof ’a), ’a -> (Setof ’a)
set-remove :: (Setof ’a), ’a -> (Setof ’a)
set-union :: (Listof (Setof ’a)) -> (Setof ’a)
set-pick :: (Setof ’a) -> (Pick ’a)
Picks an arbitrary element out of the set, and returns a Pick data structure.
(define-type (Pick 'a) |
(pick-none) |
(pick-some [element : 'a] [rest : (Setof 'a)])) |
If the set is empty, a pick-none is returned. Otherwise, a pick-some is returned, and the rest of the set (without the picked value) is stored in the rest field of the pick-some. The element returned in a pick-some is not guaranteed to be the same between applications of set-pick on the same (Setof ’a).
map-set :: (’a -> ’b), (Setof ’a) -> (Setof ’b)
You are not required to use the provided (Setof ’a) interface, but you may find it helpful—especially since the templates for our suggested constraint-gen and unify functions utilize the (Setof ’a) abstraction.
9.7 What To Hand In
You will submit two files for this assignment:
type-inference.rkt, which should be uploaded to the “Code” drop on Gradescope.
type-inference-tests.rkt, which should be uploaded to the “Tests” drop on Gradescope.
You can update your submissions as many times as you want before the deadline.