Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement rodata initialization #260

Closed
wants to merge 20 commits into from
Closed

Conversation

bitwalker
Copy link
Contributor

@bitwalker bitwalker commented Jul 26, 2024

NOTE: This PR is a work in progress, and blends multiple issues together as I work through implementation and testing of both rodata init, as well as some related issues in our test suite.

A rough summary of what this contains:

  • Refactor the compiler driver to support multiple inputs
  • Refactor the compiler pipeline to support a mix of MASM and HIR inputs
  • Refactor the compiler pipeline to always output a midenc_codegen_masm::Program
  • Move responsibility for assembly of a miden_core::Program into midenc_codegen_masm::Program::assemble
  • Implement support for generating two different types of extra setup code for executable programs:
    • Data segment initialization on program start, before invoking the entrypoint
    • Test harness, which allows initializing memory of the program from the advice stack on startup. This is primarily intended to allow initializing shadow stack memory such that a Rust-compiled function can be called as if the caller is itself a Rust function, which is useful for testing, and necessary for testing non-immediate inputs.
  • Implement a richer Miden VM test executor, which allows initializing advice inputs in addition to the operand stack. It also captures much more information about the program under test, and can dump a variety of useful information when errors occur.
  • Make use of the fact that we parse DWARF debug info when parsing Wasm modules, to allow reifying actual source spans during translation to HIR. This depends on having DWARF debug info in the Wasm module, as well as the source paths contained in the Wasm, actually mapping to a file on disk, so that we can read it into memory in order to compute byte offsets (WIP)
  • Fix the blake3 and get_inputs tests in the integration test suite (WIP, trying to get debug info working so that it is easier to determine where an assertion is being raised when the test fails; would also like to test that the test harness itself is set up properly).

This PR will remain a draft until the blake3 and get_inputs tests are working, at which point I will try to rework the commits in this PR into more granular changes. At this point, there are debugging statements and other cruft present throughout the code, and the test suite may be broken due to some of the changes not being handled yet.

@bitwalker bitwalker added bug Something isn't working codegen blocker This issue is one of our top priorities labels Jul 26, 2024
@bitwalker bitwalker requested a review from greenhat July 26, 2024 04:15
@bitwalker bitwalker self-assigned this Jul 26, 2024
This commit introduces a new `CompilerTestBuilder` type, which is used
to configure compiler tests before they are instantiated. This lets us
build up the configuration that will be used to generate compiler inputs
in multiple stages, and only instantiate it once we wish to run the
test. Prior to this, we mixed these stages together, which made it
difficult to configure the input generation step in more than one place,
which is a problem in more complex tests, or edge case tests which do
not neatly fit the mold of one of the default configurations.

In addition to `CompilerTestBuilder`, there are also two configuration
types introduced, `CargoTest` and `RustcTest`, which configure tests
based on `cargo` or `rustc` outputs (as inputs to `midenc`). These are
used in conjunction with `CompilerTestInputType` to instantiate a
`CompilerTestBuilder` for one of the three input types currently used
for compiler testing.
Copy link
Contributor

@greenhat greenhat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, I think intrinsics::mem::store_dw silently doing nothing might cause troubles. See

export.store_dw # [waddr, index, offset, value]
# TODO: implement
# cleanup the operand stack
dropw
end

There are a few of them in both blake3 and get_inputs tests and I'm not so sure anymore that we're not hitting them on the happy path.
I suggest to crash(via assert) there until I have it implemented.

@bitwalker
Copy link
Contributor Author

@greenhat Yeah I've been suspicious that we're hitting it and not aware of it, and unfortunately the way I stubbed out that intrinsic doesn't help! Good call on making it assert, I'll give it a unique error code too, so we'll know for sure if we hit it.

I've got the last debug info PR for the VM in review now, so I'm going to get back to this today and try to get it wrapped up now that I can actually see what's going on.

* Diagnostics infra is now implemented in miden-core
* Unified source code abstraction and source spans
* Compiler still retains the `DiagnosticsHandler`, with some minor
  improvements, but `CodeMap` is replaced with `SourceManager`
* Use unified spans to emit precise debug information from the compiler
* Use precise information in assembler to produce debug info useful for
  high-level languages
* Numerous cleanups along the way to try and tidy up the way we
  represent and use diagnostics throughout the compiler
…ents

Previously, the Br, CondBr, and Switch branch instructions all used
slightly different representations for their successor information, for
no real purpose. This commit unifies these, as well as provides a more
uniform API (e.g. `analyze_branch`'s `BranchInfo` is also unified in the
same manner).

The Switch instruction type is also enhanced with the ability to pass
block arguments to successor blocks in each arm, as well as the default
case. These statements will be lowered using a binary search approach on
the discriminant value, so as to lower to the least number of
conditional branches possible. Branch weights would be exceedingly
useful here, but we don't have any meaningful way to obtain that
information from Rust currently.

To better support the new Switch API, the `switch` builder has been
updated to use a specialized builder that guarantees that the `Switch`
is well-formed.

The Wasm frontend has had its lowering reworked to make use of the new
builder, and take advantage of the support for successor arguments.
Copy link
Contributor

@greenhat greenhat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job! Just a couple of minor notes.

frontend-wasm/src/module/module_env.rs Show resolved Hide resolved
stdlib::mem::PIPE_DOUBLE_WORDS_TO_MEMORY => return TransformStrategy::ReturnViaPointer,
_ => (),
},
"std::crypto::hashes::blake3" => match function_id {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"std::crypto::hashes::blake3" string literal is also used in stdlib::crypto::hashes module. It'd be better to define in one place.

@greenhat
Copy link
Contributor

greenhat commented Aug 6, 2024

I'm taking a closer look into the failing tests now. From the get-go, I see that some expected tests are failing, so UPDATE_EXPECT=1 should take care of those issues, and we will see what's failing next.

@greenhat
Copy link
Contributor

greenhat commented Aug 6, 2024

After fixing the expected code tests I'm seeing a bunch of failures:

  • instructions::*_i64 due to stdlib is missing;
  • all Wasm CM tests due to outdated cargo-component (fixed in my chore: add test coverage reporting job to CI #262 which I'll split update the handling of the new rustc warning for dead code ASAP);
  • intrinsics::* due to MASM artifact being a library and not an executable;

I'm preparing a PR to fix the "not an executable" error.

@greenhat
Copy link
Contributor

greenhat commented Aug 6, 2024

#271 is ready to merge

@bitwalker bitwalker closed this Aug 6, 2024
@bitwalker bitwalker mentioned this pull request Aug 6, 2024
@bitwalker bitwalker deleted the bitwalker/rodata-init branch September 6, 2024 20:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocker This issue is one of our top priorities bug Something isn't working codegen
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants