Initial commit
nikomatsakis committed Jul 14, 2015
1 parent 2604c80 commit e52ff7d
Showing 2 changed files with 923 additions and 6 deletions.
108 changes: 102 additions & 6 deletions 0000-template.md
@@ -1,21 +1,117 @@
- Feature Name: (fill me in with a unique ident, my_awesome_feature)
- Feature Name: N/A
- Start Date: (fill me in with today's date, YYYY-MM-DD)
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary

One para explanation of the feature.
Introduce a "mid-level IR" (MIR) into the compiler. The MIR desugars
most of Rust's surface representation, leaving a simpler form that is
well-suited to type-checking and translation.

# Motivation

Why are we doing this? What use cases does it support? What is the expected outcome?
The current compiler uses a single AST from the initial parse all the
way to the final generation of LLVM. While this has some advantages,
there are also a number of distinct downsides.

1. The complexity of the compiler is increased because all passes must
be written against the full Rust language, rather than being able
to consider a reduced subset. The MIR proposed here is *radically*
simpler than the surface Rust syntax -- for example, it contains no
"match" statements, and converts both `ref` bindings and `&`
expressions into a single form.

a. There are numerous examples of "desugaring" in Rust. In
principle, desugaring one language feature into another should
make the compiler *simpler*, but in our current implementation,
it tends to make things more complex, because every phase must
simulate the desugaring anew. The most prominent example is
closure expressions (`|| ...`), which desugar to a fresh struct
instance, but other examples abound: `for` loops, `if let` and
`while let`, `box` expressions, overloaded operators (which
desugar to method calls), method calls (which desugar to UFCS
notation). There are also a number of features (such as `box`
patterns) which are almost infeasible to implement today but
which should be nearly trivial given a MIR representation.

2. Reasoning about fine-grained control-flow in an AST is rather
difficult. The right tool for this job is a control-flow graph
(CFG). We currently construct a CFG that lives "on top" of the AST,
which allows the borrow checking code to be flow sensitive, but it
is awkward to work with. Worse, because this CFG is not used by
trans, it is not necessarily the case that the control-flow as seen
by the analyses corresponds to the code that will be generated.
The MIR is based on a CFG, resolving this situation.

3. The reliability of safety analyses is reduced because the gap
between what is being analyzed (the AST) and what is being executed
(LLVM bitcode) is very wide. The MIR is very low-level and hence the
translation to LLVM should be straightforward.

4. The reliability of safety proofs, when we have some, would be
reduced because the formal language we are modeling is so far from
the full compiler AST. The MIR is simple enough that it should be
possible to (eventually) make safety proofs based on the MIR
itself.

5. Rust-specific optimizations, and optimizing trans output, are very
challenging. There are numerous cases where it would be nice to be
able to do optimizations *before* translating to LLVM
bitcode. Currently, we are forced to do these optimizations as part
of lowering to bitcode, which can get quite complex. Having an intermediate
form improves the situation because:

a. In some cases, we can do the optimizations in the MIR itself before translation.

b. In other cases, we can do analyses on the MIR to easily determine when the optimization
would be safe.

c. Finally, because the MIR is so much closer to LLVM bitcode, the complexity of trans
is greatly reduced, and so it is easier to manage a more optimized translation.

6. Migrating away from LLVM is nearly impossible. It would be nice to
provide a choice of backends. Currently though this is infeasible,
since so much of the semantics of Rust itself are embedded in the
`trans` step which converts to LLVM IR. Under the MIR design, those
semantics are instead described in the translation from AST to MIR,
and the LLVM step itself simply applies optimizations.
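
To make the desugaring in point 1 concrete, here is a minimal sketch of one
function written twice: once in surface syntax, and once roughly as it looks
after the `for` loop, the `+` operator, and the `next` method call have been
desugared by hand. The function names are hypothetical, and the desugared
form approximates the language-level expansion; it is not the MIR itself.

```rust
use std::ops::Add;

/// Surface syntax: a `for` loop and an overloaded operator.
fn sum_surface(xs: &[i32]) -> i32 {
    let mut total = 0;
    for x in xs {
        total = total + *x;
    }
    total
}

/// Roughly the same function after desugaring by hand:
/// - `for` becomes `loop` + `match` over `Iterator::next`,
/// - `a + b` becomes the method call `Add::add(a, b)`,
/// - method calls are written in UFCS notation.
fn sum_desugared(xs: &[i32]) -> i32 {
    let mut total = 0;
    match IntoIterator::into_iter(xs) {
        mut iter => loop {
            match Iterator::next(&mut iter) {
                Some(x) => {
                    total = Add::add(total, *x);
                }
                None => break,
            }
        },
    }
    total
}
```

Both functions compute the same result. A pass written against the second
form only needs to understand `loop`, `match`, and plain function calls.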

# Detailed design

This is the bulk of the RFC. Explain the design in enough detail for somebody familiar
with the language to understand, and for somebody familiar with the compiler to implement.
This should get into specifics and corner-cases, and include examples of how the feature is used.
### Prototype

The MIR design being described here [has been prototyped][proto-crate]
and can be viewed in the `nikomatsakis` repository on GitHub. In
particular, [the `repr` module][repr] defines the MIR representation,
and [the `build` module][build] contains the code to create a MIR
representation from an AST-like form.

For increased flexibility, as well as to make the code simpler, the
prototype is not coded directly against the compiler's AST, but rather
against an idealized representation defined by [the `HIR` trait][hir].
The `HIR` trait contains a number of opaque associated types for the
various aspects of the compiler. For example,
[the type `H::Expr`][hirexpr] represents an expression. In order to
find out what kind of expression it is, [the `mirror` method][mirror] is called,
which converts an `H::Expr` into an [`Expr<H>` mirror][expr]. This
mirror then contains [embedded `ExprRef<H>` nodes][exprref] to refer
to further subexpressions; these may either be mirrors themselves, or
else they may be additional `H::Expr` nodes. This allows the tree that
is exported to differ in small ways from the actual tree within the
compiler; the primary intention is to use this to model "adjustments"
like autoderef.
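
The mirroring idea can be sketched as follows. This is a simplified
illustration, not the prototype's actual definitions: the trait is spelled
`Hir` here, and the `ExprRef` variant names, the three `Expr` variants, the
`ToyCompiler` type, and the `render` and `demo` functions are all assumptions
made for the example.

```rust
/// Hypothetical, simplified stand-in for the `HIR` trait.
trait Hir: Sized {
    /// Opaque handle to an expression inside the compiler.
    type Expr;
    /// Expand an opaque expression into a one-level-deep mirror.
    fn mirror(&self, expr: Self::Expr) -> Expr<Self>;
}

/// A subexpression: either still opaque, or already a mirror.
enum ExprRef<H: Hir> {
    Opaque(H::Expr),
    Mirror(Box<Expr<H>>),
}

/// One level of an exported expression tree (tiny illustrative subset).
enum Expr<H: Hir> {
    Literal(i64),
    Deref(ExprRef<H>),
    Add(ExprRef<H>, ExprRef<H>),
}

/// Toy "compiler": nodes are indices; autoderef lives in a side table.
enum Node {
    Lit(i64),
    Add(usize, usize),
}

struct ToyCompiler {
    nodes: Vec<Node>,
    autoderef: Vec<bool>,
}

impl Hir for ToyCompiler {
    type Expr = usize;

    fn mirror(&self, id: usize) -> Expr<Self> {
        let plain = match self.nodes[id] {
            Node::Lit(n) => Expr::Literal(n),
            Node::Add(l, r) => Expr::Add(ExprRef::Opaque(l), ExprRef::Opaque(r)),
        };
        // The adjustment is not a node in the internal tree, but the
        // exported tree shows it as an explicit `Deref`.
        if self.autoderef[id] {
            Expr::Deref(ExprRef::Mirror(Box::new(plain)))
        } else {
            plain
        }
    }
}

/// A consumer that sees only mirrors and never inspects `H::Expr` directly.
fn render<H: Hir>(hir: &H, r: ExprRef<H>) -> String {
    let expr = match r {
        ExprRef::Opaque(e) => hir.mirror(e),
        ExprRef::Mirror(m) => *m,
    };
    match expr {
        Expr::Literal(n) => n.to_string(),
        Expr::Deref(inner) => format!("*{}", render(hir, inner)),
        Expr::Add(l, r) => format!("({} + {})", render(hir, l), render(hir, r)),
    }
}

/// Builds `1 + 2` where the left operand carries an autoderef adjustment.
fn demo() -> String {
    let toy = ToyCompiler {
        nodes: vec![Node::Lit(1), Node::Lit(2), Node::Add(0, 1)],
        autoderef: vec![true, false, false],
    };
    render(&toy, ExprRef::Opaque(2))
}
```

In this sketch `demo()` returns `(*1 + 2)`: the internal tree has no
dereference node, yet the consumer sees one, which is the kind of small
difference between the exported tree and the internal tree described above.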

Note that the HIR mirroring system is an experiment and not really
part of the MIR itself. It does however present an interesting option
for (eventually) stabilizing access to the compiler's internals.

[proto-crate]: https://github.com/nikomatsakis/rust/tree/mir/src/librustc_mir
[repr]: https://github.com/nikomatsakis/rust/blob/mir/src/librustc_mir/repr.rs
[build]: https://github.com/nikomatsakis/rust/tree/mir/src/librustc_mir/build
[hir]: https://github.com/nikomatsakis/rust/blob/mir/src/librustc_mir/hir.rs
[hirexpr]: https://github.com/nikomatsakis/rust/blob/mir/src/librustc_mir/hir.rs#L28
[mirror]: https://github.com/nikomatsakis/rust/blob/mir/src/librustc_mir/hir.rs#L32-L35
[expr]: https://github.com/nikomatsakis/rust/blob/mir/src/librustc_mir/hir.rs#L111-L161
[exprref]: https://github.com/nikomatsakis/rust/blob/mir/src/librustc_mir/hir.rs#L163-L167

# Drawbacks
