Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor HIR to support structured control flow, signless integral types #291

Open
bitwalker opened this issue Aug 30, 2024 · 0 comments
Open
Assignees
Labels
blocker This issue is one of our top priorities
Milestone

Comments

@bitwalker
Copy link
Contributor

Improvements to the IR would solve a handful of outstanding issues, and also pave the way for much more optimal codegen than we can currently produce.

Signless Integral Types

Signless integral types, with explicit signed operations, is a more natural translation of Wasm semantics, that would allow us to omit a lot of unnecessary assertions that are emitted due to the semantics of our IR requiring stronger type safety than Wasm provides. This is a major priority for generation of efficient programs.

Structured Control Flow

Our current HIR representation is a hybrid CFG and dataflow graph in SSA form, with only unstructured control flow primitives. Both WebAssembly and Miden Assembly only have structured control flow, and so in the process of translating through our IR, we lower structured control flow to unstructured, then have to lift it back again. This is ripe for introducing bugs, and indeed that has been the case. The treeify pass we use to simplify CFGs for lowering to MASM has been a regular source of issues, and there is at least one major outstanding bug that I aim to fix with this refactoring.

To facilitate structured control flow in the IR, my plan is to refactor the IR into something a bit more RVSDG (Regionalized Value-State Dependency Graph) shaped, in the spirit of MLIR. The primary difference is that operations in the IR can have regions, which have one or more blocks, which in turn contain operations. Regions can have operands, which correspond to the block arguments of the entry block of that region; and they can have results. An example of a region is a function body - the function parameters are the "operands" of the region, it has one or more blocks, in which instructions representing primitive operations are sequenced, and are terminated with control flow operations.

A more pertinent example for our purposes, is if/then/else conditional branches, where each branch consists of a region (which have no operands), consisting of a single block. You can nest these arbitrarily deep. This has the effect of making control flow hierarchical and structured, ideal for lowering to Miden Assembly. The other structured operation we'd implement is a do/while loop construct, consisting of two regions, before and after. The before region has operands that represent loop-carried state, and is responsible for returning a condition that signals whether to transfer control out of the loop, or continue it. The after region acts as the loop body, and must yield operands to the before region on each iteration. Depending on how you represent the two regions, you can represent both do/while and while style loops.

This representational change will allow us to significantly simplify parts of the compilation pipeline, and emit more efficient code than we do today.

Additional Extensions

A few other items I'd like to set up infra for as part of this:

  • Make the set of IR operations open, rather than closed, and use traits/interfaces to abstract over families of operations (e.g. binary operators)
  • Make canonicalization a core part of defining an operation. Canonicalization would then be run as part of the pipeline, and allow all subsequent passes to rely on canonical forms for optimization/analysis.
  • Implement folding as part of the operation interface, this will allow us to fold constants into operations for tighter codegen, and makes further analysis and optimization simpler.
  • Some basic conversion/legalization infrastructure. This is to ensure that when we go to lower to Miden Assembly, or lower from some initial phase of the pipeline to a later one, that we guarantee that all ops after that point are converted/legal forms. Right now we just panic if we hit an illegal form that we expected to have been converted earlier in the pipeline, which is not ideal.
  • Some basic pattern matching infrastructure. This would allow us to express transformations as rewrite rules over patterns. The primary reason for this is to be able to implement a set of peephole optimizations to emit more efficient code, but in a way that is open-ended and doesn't require us to implement individual passes for each transformation (or try to cram them all into one pass).

Roadmap

The initial refactoring for the two main goals I plan to implement during the Beta 1 milestone, as it will be necessary to address some outstanding bugs/blockers in the current pipeline. The nice-to-have extras will be implemented as-needed.

This has to happen in concert with packaging support, and implementation of annotations/types in the assembler, so I don't have a precise timeline for when this will be done, but I consider it my main priority for the next couple weeks. I've spiked the implementation of this in my bitwalker/ir-refactoring branch, and will continue iterating on it there until it is in a state where a PR can be opened.

@bitwalker bitwalker added the blocker This issue is one of our top priorities label Aug 30, 2024
@bitwalker bitwalker added this to the Beta 1 milestone Aug 30, 2024
@bitwalker bitwalker self-assigned this Aug 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocker This issue is one of our top priorities
Projects
None yet
Development

No branches or pull requests

1 participant