Algorithm W is the original algorithm for infering types in the Damas-Hindley-Milner type
system. It supports polymorphic types, such as forall[a] a -> a
, let generalization, and
infers principal (most general) types. Although it is formally described using explicit
substitutions, it permits an efficient implemenation using updatable references, achieving
close to linear time complexity (in terms of the size of the expression being type-infered).
For a general description of Algorithm W, see the Wikipedia article. This implementation uses several optimizations over the naive implementation. Instead of explicit substitutions when unifying types, it uses updateable references. It also tags unbound type variables with levels or ranks to optimize generalizations of let bindings, a technique first described by Didier Rémy [1]. A very eloquent description of the ranked type variables algorithm and associated optimizations was written by Oleg Kiselyov.
The basic terms of Lambda Calculus and the structure of the types are defined in expr.ml
.
Lexer is implemented in lexer.mll
using ocamllex
, and a simple parser in file parser.mly
using ocamlyacc
. The main type inference is implemented in the file infer.ml
.
The function infer
takes an environment, a level used for let-generalization, and an
expression, and infers types for each term type. Variables are looked up in the environment
and instantiated. The type of functions is inferred by adding the function parameters to
the type environment using fresh type variables, and inferring the type of the function body.
The type of let expression is inferred by first inferring the type of the let-bound value,
generalizing the type, and the inferring the type of the let body in the extended type
environment. Finally, the type of a call expression is inferred by first matching the
type of the expression being called using the match_fun_ty
function, and then inferring
the types of the arguments and unifying them with the types of function parameters.
The function unify
takes two types and tries to unify them, i.e. determine if they can
be equal. Type constants unify with identical type constants, and arrow types and other
structured types are unified by unifying each of their components. After first performing
an "occurs check", unbound type variables can be unified with any type by replacing their
reference with a link pointing to the other type.
The function occurs_check_adjust_levels
makes sure that the type variable being unified
doesn't occur within the type it is being unified with. This prevents the algorithm from
inferring recursive types, which could cause naively-implemented type checking to diverge.
While traversing the type tree, this function also takes care of updating the levels of the
type variables appearing within the type, thus ensuring the type will be correctly generalized.
Function generalize
takes a level and a type and turns all type variables within the type
that have level higher than the input level into generalized (polymorphic) type variables.
Function instantiate
duplicates the input type, transforming any polymorphic variables
into normal unbound type variables.
Although this implementation is reasonably efficient, state-of-the-art implementations of
HM type inference employ several more advanced techniques which were avoided in this
implementation for the sake of clarity. As outlined in Oleg's article, OCaml's type checker
marks every type with a type level, which is the maximum type level of the type variables
occuring within it, to avoid traversing non-polymorphic types during instantiation. It also
delays occurs checks and level adjustments. It can deal with recursive types, which arise
often while type checking objects, by marking types during unification. Oleg describes a
further optimization that could be performed by condensing the sequences
of type links as we follow them, but our generalize
function takes care of that problem.
[1] Didier Rémy. Extending ML type system with a sorted equational theory. 1992