On this page:
6.1 Introduction
6.2 Reading
6.3 The Language
6.3.1 Grammar
6.4 Assignment
6.4.1 Terms
6.5 Features to Implement
6.5.1 Desugaring with Labels
6.5.2 Label Environment
6.5.3 The Inference Algorithm
6.5.3.1 Phase 1:   Constraint Generation
6.5.3.2 Phase 2:   Unification
6.5.4 Exceptions
6.6 Testing
6.7 Starter Code
6.7.1 Set Library
6.8 What To Hand In
6 Type Inference

    6.1 Introduction

    6.2 Reading

    6.3 The Language

      6.3.1 Grammar

    6.4 Assignment

      6.4.1 Terms

    6.5 Features to Implement

      6.5.1 Desugaring with Labels

      6.5.2 Label Environment

      6.5.3 The Inference Algorithm

        6.5.3.1 Phase 1: Constraint Generation

        6.5.3.2 Phase 2: Unification

      6.5.4 Exceptions

    6.6 Testing

    6.7 Starter Code

      6.7.1 Set Library

    6.8 What To Hand In

6.1 Introduction

In this assignment, you will implement the type inference algorithm we studied in lecture on the (untyped) Paret language.

6.2 Reading

Chapter 15.3.2 of PLAI 2/e.

6.3 The Language

The Paret language for this assignment is nearly the same as the Typed Paret language from Type Checker, except that the type annotations on lam and empty expressions have been removed: putting them back in is, after all, the job of type inference.
We have also reintroduced and and or expressions back into the language as syntactic sugar. We also have changed let from a base expression into syntactic sugar. Thus, you will need to implement a desugarer that converts and, or, and let expressions into functionally equivalent expressions.

6.3.1 Grammar

The grammar of Untyped Paret is as follows:

<expr> ::= <num>

         | <string>

         | true

         | false

         | (+ <expr> <expr>)

         | (++ <expr> <expr>)

         | (num= <expr> <expr>)

         | (str= <expr> <expr>)

         | (if <expr> <expr> <expr>)

         | <id>

         | (<expr> <expr>)

         | (first <expr>)

         | (rest <expr>)

         | (is-empty <expr>)

 

         | (link <expr> <expr>)

         | empty

         | (lam <id> <expr>)

 

         | (let (<id> <expr>) <expr>)

         | (and <expr> <expr>)

         | (or <expr> <expr>)

6.4 Assignment

Refer to the support.rkt file for the definition of LExpr.

As usual, we have provided a function parse, which consumes an expression in Paret’s concrete syntax and returns the abstract syntax representation of that expression. However, the parser in this assignment returns an LExpr, which is exactly the same as an Expr, except that each LExpr has an additional label field and e-lam also has a param-label field. Labels are used to stand for program points in the type inference algorithm.

(define-type-alias Label Symbol)

Given the updated Paret language and the updated LExpr abstract syntax, you will implement two functions: desugar and type-of.
  • desugar :: LExpr -> LExpr
    which consumes an abstract syntax tree (i.e. an LExpr, as returned by parse), replaces all instances of sugar-and, sugar-or, and sugar-let with desugared equivalents, and returns the result.

  • type-of :: LExpr -> Term
    that consumes a Paret program in abstract syntax form. If the program is well-typed, type-of returns the type of that program (represented as a Term as defined below); otherwise, it raises an exception.
    type-of should assume that it’s given an LExpr from desugar. Thus, type-of should assume that there are no sugar-ands, sugar-ors, or sugar-lets in the input Expr.

Once you have implemented desugar and type-of, you should use the provided type-infer function (analogous to type-check from Type Checker) to write test cases for your type checker in your testing file.

6.4.1 Terms

As discussed in lecture, Terms describe the type of a subexpression of a program and are used in the constraint generation process. The language of Terms is as follows:

(define-type Term

  (t-var [label : Label])

  (t-con [head : Symbol]

         [args : (Listof Term)]))

  • t-var denotes a term variable (also referred to as a type variable). A type variable (t-var g123) can be read as “the type of the expression (or program position) labeled g123”.

  • t-con denotes a term constructor, which builds up a more complicated term from a list of subterms. For example, (t-con -> (list (t-var g123) (t-var g123))) is a representation of the more familiar a -> a. Primitives are also represented this way. For example, (t-con Num empty) is a representation of the type Num.

For convenience, we provide helper functions to simulate the Type language of Type Checker in the language of Term. For example, (t-num) produces (t-con Num empty); t-num, t-bool, t-str, t-fun, and t-list are all provided, and you can reference the support code for their definitions. However, these are helper functions and not type variants, so you will not be able to match against them in a type-case expression.

6.5 Features to Implement
6.5.1 Desugaring with Labels

When producing desugared expressions, you may need to create fresh Symbols. For example, you may need to generate new Labels for non-sugar LExprs. To do this, you should use the built-in gensym function, which generates a unique, unused Symbol.

You may also use gensym if you need to create Terms with unknown types.

6.5.2 Label Environment

Your type inference algorithm should use a label environment (LEnv) to keep track of label identifiers that are in scope.

(define-type-alias LEnv (Hashof Symbol Label))

For convenience, we provide a get-label function that consumes a LExpr and retrieves its Label.

6.5.3 The Inference Algorithm

In the starter code, we provide templates for two optional helper functions: constraint-gen and unify.

constraint-gen :: LExpr -> (Setof Constraint)

unify :: (Setof Constraint) -> Substitution

The two phases of the inference algorithm discussed in lecture (and discussed below) map to these two helper functions. We suggest using these two function templates to guide your overall type-of implementation, but they are not required—we will not be testing these functions directly, and you are free to ignore them or change them as you wish.

6.5.3.1 Phase 1: Constraint Generation

Recall that Constraints relate program subexpressions by describing how they work together in a successful execution. The language of Constraints is:

(define-type Constraint

  (eq [l : Term]

      [r : Term]))

Constraints equate two Terms, often a type variable to the type of some other expression. For example, a Constraint that would be generated for the LExpr (e-num g123 5) is:

(eq (t-var 'g123) (t-num))

6.5.3.2 Phase 2: Unification

The process used to solve constraints is known as unification. In unification, a unifier algorithm is given a set of equations (e.g. Constraints), and each Constraint maps a type variable (t-var) to a Term. Given a consistent set of Constraints, we try to unify them in such a way that every t-var gets assigned the correct type.
To do this, our unifier should generate a mapping from variables (Labels) to Terms that do not contain any variables. We define the Substitution alias to represent this mapping:

(define-type-alias Substitution (Hashof Label Term))

Unification should always terminate. Thus, your unifier should perform an “occurs” check that causes it to terminate given a circular Constraint. Specifically, given the Constraint (eq replace with), you should raise (ti-err-fails-occurs-check replace with) if replace is a strict subterm of with. (Make sure you can construct a test case for this!)

Make sure you understand why replace = with should pass the “occurs” check.

You may find the high-level description of unification here helpful as well.

6.5.4 Exceptions

The full set of exceptions is as follows:

(define-type TypeInferenceError

  (ti-err-fails-occurs-check [replace : Term] [with : Term])

  (ti-err-constructor-mismatch [t1 : Term] [t2 : Term])

  (ti-err-unbound-id [name : Symbol]))

  • Given the Constraint (eq replace with), you should raise (ti-err-fails-occurs-check replace with) if replace is a strict subterm of with.

  • Given the Constraint (eq t1 t2), you should raise (ti-err-constructor-mismatch t1 t2) if t1 and t2 are t-cons with different head symbols.

  • Given an expression that references an unbound identifier, you should raise ti-err-unbound-id with the name of the identifier.

The priority of TypeInferenceErrors is unspecified—when multiple TypeInferenceErrors could be raised in a program, your program just needs to raise one of the valid exceptions. The reason for this is that the Constraints in a Paret program are really a set of Constraints rather than a list of Constraints. Thus, the order in which your unifier views each Constraint can change the order in which errors are raised. (Be careful when you’re writing test cases against wheats and chaffs!)

6.6 Testing

Unlike type-check from Type Checker, type-infer can produce types with variables. For example, the term (lam x x) might produce (t-fun (t-var g123) (t-var g123)). Since Labels are random, writing tests for types with variables can be difficult.
To help you write deterministic tests, we provide a function

normalize :: Term -> Term

that consumes a Term and renames all t-var labels to be lowercase letters in alphabetical order relative to how they appear in the Term. For example, normalizing the type inferred for (lam x (lam y empty)) should produce

(t-fun (t-var 'a) (t-fun (t-var 'b) (t-list 'c)))

which is equivalent to

(t-con '-> (list (t-var 'a) (t-fun (t-var 'b) (t-list 'c))))

You should use normalize when writing test cases in your testing file, or your test cases may cause the wheats to unexpectedly fail. You are not permitted to use normalize in your actual implementation.
Finally, in addition to the testing forms documented in the Testing Guidelines, we provide the following testing forms:
  • (test-error-fails-occurs-check? name expr)

  • (test-error-constructor-mismatch? name expr)

  • (test-error-unbound-id? name expr)

These test that the given expr raises the correct error. (Example usage can be found in the testing stencil.)

6.7 Starter Code

We’ve provided starter code for your implementation at type-inference.rkt and support code at support.rkt. You are not allowed to change the signature of desugar, type-of, and type-infer, but you are welcome to add any helper functions that you need for your implementation.
We’ve also provided a stencil for your type-check test cases at type-inference-tests.rkt and testing support code at test-support.rkt. You should check that you can run your type-inference-tests.rkt file successfully in DrRacket before submitting—if you can’t, it means that a definition is missing or you’re trying to test a function that you shouldn’t be testing.

6.7.1 Set Library

As mentioned previously, it may be useful to think of the Constraints in a Paret program as a set of Constraints rather than a list of Constraints. Plait does not have a built-in set library, so we provide a (Setof a) abstraction in the support.rkt file:
  • (Setof a)
    Type for an immutable set containing elements of type a.

  • empty-set
    Creates an empty set.

  • (set ....)
    Convenience macro around list->set; returns a (Setof a) containing all of the elements listed in .....

  • list->set :: (Listof a) -> (Setof a)
    Returns a (Setof a) containing all of the elements in the input (Listof a).

  • set->list :: (Setof a) -> (Listof a)
    Returns a (Listof a) containing all of the elements in the input (Setof a).

We also provide the following (Setof a) utility functions:
  • set-empty? :: (Setof a) -> Boolean

  • set-count :: (Setof a) -> Number

  • set-member? :: (Setof a), a -> Boolean

  • set-add :: (Setof a) a -> (Setof a)

  • set-remove :: (Setof a) a -> (Setof a)

  • set-union :: (Setof a), (Setof a) -> (Setof a)

  • set-pick :: (Setof a) -> (Pick a)
    Picks an arbitrary element out of the set, and returns a Pick data structure.

    (define-type (Pick 'a)

      (pick-none)

      (pick-some [element : 'a] [rest : (Setof 'a)]))

    If the set is empty, a pick-none is returned. Otherwise, a pick-some is returned, and the rest of the set (without the picked value) is stored in the rest field of the pick-some. The element returned in a pick-some is not guaranteed to be the same between applications of set-pick on the same (Setof a).

You are not required to use the provided (Setof a) interface, but you may find it helpful—especially since the templates for our suggested constraint-gen and unify functions utilize the (Setof a) abstraction.

6.8 What To Hand In

You will submit two files for this assignment:
  • type-inference.rkt, which should be uploaded to the “Code” drop on Gradescope.

  • type-inference-tests.rkt, which should be uploaded to the “Tests” drop on Gradescope.

You can update your submissions as many times as you want before the deadline.