This repository contains an attempt to replace GHC’s code generator (i.e. everything that comes between Core and LLVM IR) with a very naive and straight-forward implementation that is so simple, we can prove it correct.
The correctness proof eventually ought to bridge the gap between CoreSpec (the Haskell Core formalization) and VeLLVM (the verified LLVM project) in the DeepSpec project.
All Haskell code in ghc-prim
, integer-gmp
and base
compiles. What does
not work:
- In out-of-memory (or out-of-stack?) situations, the program segfaults.
- Not all primitive operations have been implemented; the program will abort with a descriptive message when an unimplemented PrimOp is called. Patches to file <./GenPrimOpStubs.hs> are welcome.
- FFI calls that involve floating points numbers or the `ThreadId# are not implemented, and will abort the program with a descriptive message when used. In general, though, FFI works.
- No garbage collection (it may be that this will not ever change).
The resulting programs will be much slower than with any other Haskell compiler out there!
- Ensure that
ghc
is GHC version 8.0, and thatclang-4.0
is in your path. - Run
cabal install groom llvm-hs-pure wl-pprint-text
. - Clone the repository
- Run
./boot.sh some-directory
. - You can now use
some-directory/bin/veggies
like you would useghc
.
You can use it with cabal
by passing -w some-directory/bin/veggies
to
cabal configure
.
Every object is a pointer (%hs*
). Even unboxed values (Int#
) are hidden
behind a pointer. This makes stuff very uniform.
Every heap object is of the form {%hs* (%hs*)*, …}
, where the first field is
the enter function. It always returns something evaluated. Unlifted types,
who should not be entered, also have an enter function, but that one aborts the
program. This is for consistency and error checking.
The two types of unevaluated closures are:
- A thunk is represented as
{%hs* (%hs*)* enter, [1 x %hs*] captured_variables}
. Even if the thunk does not capture a single variable, the allocated memory must allow for one (to override it with an indirection). Theenter
is called with the pointer to the thunk as an argument. It will evaluate the expression, override the thunk with an indirection to the result, and return a pointer to the result. - An indirection is represented as
{%hs* (%hs*)* enter, %hs* indirectee}
. Theenter
is called with the pointer to the indirection as an argument, and will simply call the code of that.
The enter function for value closures is always the same, and simply returns the pointer to the object.
- A data constructor is represented as
{%hs* (%hs*)* enter, i64 tag, [0 x %hs*] args}
, where the tag is positive and indicates which data constructor we have. - An
Int#
value is represented as{%hs* (%hs*)* enter, i64 val}
. Similarly for other primitive values (Char#
,Addr#
). - A function closure is represented as
{%hs* (%hs*)* enter, %hs* (%hs*, [0 x %hs*])* fun, i64 arity, [0 x %hs*] captured_vars}
, where thefun
is a function pointer. The function expects the closure as the first argument (to access the captured variables), and a pointer to preciselyarity
further arguments.
Every function call should go through rts_call
which compares the number of
arguments with the arity of and either calls the function, possibly passing
left-over arguments to the result, or creates a PAP (partial application)
closure.
Contact Joachim Breitner <[email protected]> if you have questions.