Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge with upstream #226

Merged
merged 49 commits into from
Sep 14, 2024
Merged

Merge with upstream #226

merged 49 commits into from
Sep 14, 2024

Conversation

dhil
Copy link
Member

@dhil dhil commented Sep 13, 2024

No description provided.

alexcrichton and others added 30 commits August 30, 2024 15:27
)

Realized we don't actually talk about this in the documentation but it's
an important point, so make sure to document the interaction with
blocking host calls here.
* Upgrade to regalloc-0.9.4

* Update filetests

* Run `cargo vet`
* Reapply "Upgrade regalloc2 to 0.9.4 (bytecodealliance#9191)" (bytecodealliance#9193)

This reverts commit 7081b8f.

* Upgrade to regalloc-0.10.0
Additionally address some new warnings that are cropping up throughout
the codebase.
* Refactor backends to use the same `CallInfo`

This commit refactors the various backends of cranelift, except for
s390x, to use a shared definition of `CallInfo`. They were all already
quite similar and the main change here is to push platform-specific
pieces into the instructions outside of `CallInfo`. This is intended to
make additions to `CallInfo` easier and require less refactoring in the
future. Additionally this enables passing a `CallInfo` structure around
instead of passing around all of its components which helps reduce the
amount of arguments to various functions.

* s390x: Use the same `CallInfo` as other backends

This commit refactors s390x the same way as the previous commit to use
the shared `CallInfo` that all other backends are using. This required
more refactoring on the s390x side of things to notably extract a
dedicated pseudo-instruction for `ElfTlsGetOffset` rather than bundling
it within the `Call` instruction.

* Review comments and test fixes

* Fold `ExternalName` into `CallInfo`

As predicted instruction sizes got larger when outlining this on some
platforms so apply the same fix across all platforms by changing to
`CallInfo<T>` where the `T` will change depending on whether it's an
indirect or direct call.

* Update test expectations
…ance#9198)

The default token for this repository is now read-only but this workflow
needs writable permissions. Local testing seems to show that this should
work so let's try it out here.
This is coupled with today's release of Rust 1.81.0 and the policy of
updating our MSRV when that comes out.
…ytecodealliance#9196)

* Use the CallHook::CallingHost and ReturningFromHost with components

We never implemented calling the CallingHost and ReturningFromHost hooks
for component host calls.

co-authored-by: Nick Fitzgerald <[email protected]>

* Add tests

* Run hooks under `catch_unwind_and_longjmp`

* Cleanup the imports

* Remove todo

* Ensure that returning hooks are run

* Appease clippy

* Suggestions from code review

* Reuse infrastructure from the core-wasm call-hook test

* Remove redundant test

* Add a realloc test

* Switch the realloc test to handle returning a string

* Rework the realloc test to check that we're tracking host reallocs

* Use the call hook in the realloc host call

* Unnecessary pub mod

* Add a post-return test

* Remove unnecessary assertions

* Format

* Remove incorrect hook calls for realloc uses

---------

Co-authored-by: Nick Fitzgerald <[email protected]>
These docs have fallen a bit out of date so this is an attempt to bring
them back up to speed with the current organization of tests in this
repository.
* vtune: check that VTune profiling works

This adds a smoke test for checking that the VTune functionality still
works.
- if `vtune` is not on the system path, skip the test
- if it is, run Wasmtime with VTune profiling enabled
- check that it captured the CPU time and didn't fail.

This test doesn't rigorously check all the ins and outs of the VTune
functionality but is better than before.

* ci: install VTune

In order to actually run the VTune CLI test, we need `vtune` on the
system. The `install-vtune-action` I created does this ([script]).

[script]: https://github.com/abrown/install-vtune-action/blob/main/install.sh

* ci: further restrict workflow runs

Running the VTune installation and CLI test on any `ubuntu` runner meant
that even test runs for `riscv64` (emulated on QEMU) would attempt this.
Instead, restrict this test only to `linux-x64` runs.

prtest:full

* review: add `cfg(target_arch)`
…ce#9201)

This commit adds a special case to the lowering of 128-bit
multiplication on the aarch64 backend along the same lines as was done
in bytecodealliance#9136 for the x64 backend. Notably zero and sign-extended values
which are multiplied to produce a 128-bit result can skip some of the
arithmetic of the fully general 128-bit lowering.
…sent (bytecodealliance#9208)

* [wasi-http] Return `none` from `future-trailers.get` when trailers absent

Previously, we would return `error-code::connection-terminated` in this case,
whereas the `wasi-http` spec says we should simply return `none` (i.e. it's not
an error if there are no trailers present).

This also adds code to `api_proxy_streaming.rs` to check for trailers after
reading the body content.

Signed-off-by: Joel Dice <[email protected]>

* minor code cleanup

Signed-off-by: Joel Dice <[email protected]>

---------

Signed-off-by: Joel Dice <[email protected]>
bytecodealliance#9199)

* Remove `iadd_cin` and `isub_bin`, split `isub_borrow` and `iadd_carry`

This commit refactors the opcodes the Cranelift supports for
add-with-carry and subtract-with-borrow. None of these opcodes are
currently in use by the wasm frontend nor supported by any backend. In
that sense it's unlikely they have many users and the hope is that
refactoring won't cause much impact.

The `iadd_cin` and `isub_bin` opcodes are the equivalent of `*_borrow`
and `*_carry` except that they don't return the carry flag, they only
return the result of the operation. While theoretically useful I've
elected to remove them here in favor of only the borrow-returning
operations. They can be added back in in the future though if necessary.

I've split the preexisting operations `isub_borrow` and `iadd_carry`
additionally into signed/unsigned portions:

* `isub_borrow` => `usub_borrow` and `ssub_borrow`
* `iadd_carry` => `uadd_carry` and `sadd_carry`

This reflects how the condition needs to differ on the carry flag
computation for signed/unsigned inputs. I've additionally fixed the
interpreter's implementation of `IsubBorrow` when switching to the
signed/unsigned opcodes.

Finally the documentation for these instructions now explicitly say that
the incoming carry/borrow is zero-or-nonzero even though it's typed as
`i8`. Additionally the tests have been refactored to make use of
multi-return which may not have existed when they were first written.

* Rename instructions

* Fix more renames

* Update instruction descriptions
…alliance#9209)

* Warn against `clippy::cast_possible_truncation` in Wasmtime

This commit explicitly enables the `clippy::cast_possible_truncation`
lint in Clippy for just the `wasmtime::runtime` module. This does not
enable it for the entire workspace since it's a very noisy lint and in
general has a low signal value. For the domain that `wasmtime::runtime`
is working in, however, this is a much more useful lint. We in general
want to be very careful about casting between `usize`, `u32`, and `u64`
and the purpose of this module-targeted lint is to help with just that.
I was inspired to do this after reading over bytecodealliance#9206 where especially when
refactoring code and changing types I think it would be useful to have
locations flagged as "truncation may happen here" which previously
weren't truncating.

The failure mode for this lint is that panics might be introduced where
truncation is explicitly intended. Most of the time though this isn't
actually desired so the more practical consequence of this lint is to
probably slow down wasmtime ever so slightly and bloat it ever so
slightly by having a few more checks in a few places. This is likely
best addressed in a more comprehensive manner, however, rather than
specifically for just this one case. This problem isn't unique to just
casts, but to many other forms of `.unwrap()` for example.

* Fix some casts in tests
Notably `*mut [T]` now sports a `len` method to avoid the need for an
`unsafe` block to read the length field.
* Update wit-bindgen crates

Prunes some older versions of wasm-tools from the tree

* Update vets
…s across await points (bytecodealliance#9217)

* Use `tracing::Instrument` in generated bindings to avoid holding spans
across await points

Previously, if we had a WIT file with a function

async-fn: func();

then the generated code in `add_to_linker_get_host` would look like
this (if tracing is enabled and the function is async)

inst.func_wrap_async(
    "my-func",
    move |mut caller: wasmtime::StoreContextMut<'_, T>, (): ()| {
        wasmtime::component::__internal::Box::new(async move {
            let span = tracing::span!(
                tracing::Level::TRACE,
                "wit-bindgen import",
                module = "test",
                function = "my-func",
            );
            let _enter = span.enter();
            tracing::event!(tracing::Level::TRACE, "call");
            let host = &mut host_getter(caller.data_mut());
            let r = Host::my_func(host).await;
            tracing::event!(
                tracing::Level::TRACE,
                result = tracing::field::debug(&r),
                "return"
            );
            Ok(r)
        })
    },
)?;

This keeps the tracing span active, even when the resulting future is
suspended. The end result is that other unrelated tokio tasks running on
the same thread in the meantime will be shown as executing within the
`wit-bindgen import` span.

This commit changes the codegen to instead generate

inst.func_wrap_async(
    "async-fn",
    move |mut caller: wasmtime::StoreContextMut<'_, T>, (): ()| {
        use tracing::Instrument;

        let span = tracing::span!(
            tracing::Level::TRACE,
            "wit-bindgen import",
            module = "test",
            function = "async-fn",
        );
        wasmtime::component::__internal::Box::new(
            async move {
                tracing::event!(tracing::Level::TRACE, "call");
                let host = &mut host_getter(caller.data_mut());
                let r = Host::async_fn(host).await;
                tracing::event!(
                    tracing::Level::TRACE,
                    result = tracing::field::debug(&r),
                    "return"
                );
                Ok(r)
            }
            .instrument(span),
        )
    },
)?;

Here, `tracing::Instrument` takes care of entering the span when the
future is polled and exiting it when it is suspended.

Fixes bytecodealliance#9210

* Bless expanded macro outputs
* Update wasm-tools to 217

Brings a few fixes and such from recent PRs and additionally some new
`*.wast` directives. These aren't supported yet and will come in a
future commit.

* Remove temporary `[patch]`

* Add vets
…e#9223)

* Implement new `*.wast` directives in `wasmtime-wast`

This is a follow-up from bytecodealliance#9219 to add support for a few more directives
in `*.wast` tests to notably compile a module and possibly instantiate
it multiple times.

This copies the upstream spec test using these new directives and edits
it to remove the unsupported exception-handling bits.

* Fix tests to test the right thing

* Fix a test to use `assert_unlinkable`

* Add a component/instance test for components too
See the comment in the source file, we can't have FORTIFY_SOURCE enabled for longjmp code, so undefine it.

See the source for longjmp_chk in glibc, this assertion is fired: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/____longjmp_chk.c;hb=HEAD#l54

Here is a similar commit in glibc tests: https://sourceware.org/git/?p=glibc.git;a=commit;h=46432be2f1d4de962b51ca6b9f80fc37744be9f7
…dealliance#9226)

An OSS-Fuzz bug came in last night about `stack_switch` not being
supported on s390x, understandably so. This commit disables the
instruction entirely in fuzzing because it requires precise runtime
support which the fuzzer doesn't support at this time.
…iance#9206)

This commit implements the table64 extention in both Wasmtime and
Cranelift.

Most of the work was changing a bunch of u32 values to u64/usize.
The decisions were made in align with the PR bytecodealliance#3153 which
implemented the memory64 propsal itself.

One significant change was the introduction of `IndexType`
and `Limits` which streamline and unify the handling of limits
for both memories and tables.

The spec and fuzzing tests related to table64 are re-enabled which
provides a good coverage of the feature.
* Bump regalloc2 to 0.10.2

* Audit rustc-hash-2.0.0, and regalloc2-0.10.2

* Switch to rustc-hash-2.0.0
* winch aarch64: fix multi values results

* winch aarch64: test multi values results

* add some comments

* cranelift aarch64 abi: is_winch_return variable

* cranelift aarch64 abi: factorize winch stack reversal
…nce#9215)

* Add some tests that will get optimized next

These tests currently reflect the output of Cranelift today and will
showcase the diff in the next commit of the optimizations implemented.

* Optimize 128-bit multiplication some more in backends

This commits adds support for more patterns of improved 128-bit
multiplication to the various backends of Cranelift. Notably:

* Using `isplit` to discard the lower bits of a multiplication maps
  directly to instructions on aarch64 and riscv64.
* Multiplying sign-extended 64-bit halves maps directly to supported
  instructions on s390x/riscv64/aarch64 (x86_64 already has these
  implemented).

This relies on adding a new method to test whether a `Value` is dead and
unused during lowering. While generally not useful this is applicable
for multi-result instructions such as `isplit` where one result may end
up not being used. This also is required because the egraph
optimizations can't currently handle multi-result instructions like
`isplit` so this can't be optimized to `umulhi` or `smulhi` in CLIF.
* Rename:
LastWrite -> WriteState,
Done -> Ready,
Waiting -> Writing

* Wrap the TCP streams inside a mutex.
And rename `abort_wait` to `cancel`

* Move shutdown into the stream types

* Fix Linux-specific peculiarity where `recv` continues to work even after `shutdown(sock, SHUT_RD)` has been called.

* Refactor poll_cancel

* Fix data loss issue.
Calling `shutdown` _after_ data has been accepted by `write`, but _before_ it has been fully flushed, caused the in-flight data to be lost.

* Simplify poll_cancel

* Use tokio::sync::Mutex instead of std::sync::Mutex so I can revert to regular async code
Last night's OSS-Fuzz build failed because the compiler used there
hasn't been updated in quite some time. It's a nightly of 1.78 from the
beginning of this year which means the bump to 1.79 recently for this
workspace meant that Wasmtime stopped building there.

This commit reverts ac607ed and
1ad3da8 and then updates our github
action to say that we're 3 versions behind the latest Rust right now
with some commentary as to why.
alexcrichton and others added 19 commits September 12, 2024 18:52
…9232)

* Add a `Signed` trait to match the `Unsigned` trait

I've wanted to use this every now and then but haven't gotten around to
adding this so figured I'd go ahead and add it.

* Update cranelift/entity/src/signed.rs

Co-authored-by: Nick Fitzgerald <[email protected]>

---------

Co-authored-by: Nick Fitzgerald <[email protected]>
…ytecodealliance#9235)

* Pulley: Add 64-bit variants of compare-and-branch super instructions

* Fix Pulley fuzz build

* Fix Pulley's disas tests

* Fix feature `cfg` in pulley test

* Update pulley's clif filetests
This was beginning to reference some changes that were ultimately thrown away,
so I've mostly just reverted the incomplete bits but also slightly reworded the
old comment.

No functional changes.
…9240)

* Wasmtime: Allow compiling Wasm modules with Pulley

This does not yet add support for running Wasm modules compiled with
Pulley, but it does allow compiling them. This is a first step towards
integrating Pulley into Wasmtime itself.

This is also enough to get basic `tests/disas` tests working with Pulley.

* fix clippy

* Use named struct instead of tuple
…e#9242)

* CI: Split no-std and clippy out from monolith checks

Fixes bytecodealliance#9236

* Make CI status job depend on new CI jobs
* Make host trap handlers optional

This commit introduces a new configuration option
`Config::host_trap_handlers` which enables disabling the reliance on
host trap handlers (e.g. signal handlers on Unix) at runtime. This
is intended to help increase portability in cases where signal handlers
are otherwise difficult. This is achieved by plumbing the translation
environment to more locations which now conditionally lowers to calls to
a function to raise a trap instead of a trap instruction.

The caveats of this support are:

* This is not yet implemented for Winch
* This requires disabling spectre mitigations
* This does not yet support shared memories since it forces dynamic
  memories to be used.

These points are all possible to address but it might be best to see how
this feature evolves over time. I'll also note that this is not an
optimized implementation and likely has a fair bit of overhead. For
example the solution in bytecodealliance#6926 of a more optimized stub to jump-and-trap
is not implemented yet.

Closes bytecodealliance#6926

* Fix build

* Fix clippy warnings

* Review comments

* Ignore new test on miri
All of the `cargo check`-style jobs are always run on both PRs and merge
so there's no need to gate on the `determine` step. That's only needed
when the output of `determine` conditionally decides to run a job or
not, but these are always unconditionally run. This will help a tiny bit
with CI time where the `determine` job doesn't need to run, but
otherwise cuts down a bit on the amount of configuration.
)

* Use inline GC barriers for accessing Wasm globals

Previously, we would call out to the runtime via libcalls to get and set Wasm
globals of types that are managed by the GC. This change introduces inline
barriers for these operations, which should improve performance, and removes
unnecessary libcall implementations.

Co-Authored-By: Trevor Elliott <[email protected]>

* Update other disas tests for new vmctx offsets

* really update disas tests for new vmctx offsets

* Remove unused functions when GC is disabled

* remove unused import

---------

Co-authored-by: Trevor Elliott <[email protected]>
Used to be used for testing, now we use the disas test infrastructure instead,
but this file was still hanging around.

Co-authored-by: Trevor Elliott <[email protected]>
…iron` (bytecodealliance#9246)

* Split GC type layout computation into a shared trait in `wasmtime_environ`

This stuff was previously living in the `GcRuntime` trait, but it turns out that the
compiler will also need to know the layout of GC types, who'd have thunk it.

* Fix disabled GC builds
This commit adds backend support for the `mulx` instruction in the BMI2
instruction set which supports arbitrary destination registers. This
instruction also doesn't read/clobber flags so it can help pipeline
instructions in niche situations.
@dhil dhil merged commit 6c9e9f8 into wasmfx:main Sep 14, 2024
54 checks passed
@dhil dhil deleted the wasmfx-merge branch September 14, 2024 16:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.