Skip to content
This repository has been archived by the owner on Aug 31, 2023. It is now read-only.

Compile wasmtime with effect-handlers/wasm-tools:func-ref-2 #3

Merged
merged 37 commits into from
Aug 5, 2022

Conversation

dhil
Copy link

@dhil dhil commented Aug 2, 2022

This patch makes wasmtime compile with branch func-ref-2 in our wasm-tools fork. There are many todos left to be completed.

The test suite compiles and passes.

Instead of a regular `Vec`.

These vectors are usually very small, for example here is the histogram of sizes
when running Sightglass's `pulldown-cmark` benchmark:

```
;; Number of samples = 10332
;; Min = 0
;; Max = 11
;;
;; Mean = 2.496128532713901
;; Standard deviation = 2.2859559855427243
;; Variance = 5.225594767838607
;;
;; Each ∎ is a count of 62
;;
 0 ..  1 [ 3134 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 1 ..  2 [ 2032 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 2 ..  3 [  159 ]: ∎∎
 3 ..  4 [  838 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎
 4 ..  5 [  970 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 5 ..  6 [ 2566 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 6 ..  7 [  303 ]: ∎∎∎∎
 7 ..  8 [  272 ]: ∎∎∎∎
 8 ..  9 [   40 ]:
 9 .. 10 [   18 ]:
```

By using a `SmallVec` with capacity of 6 we avoid the vast majority of heap
allocations and get some nice benchmark wins of up to ~1.11x faster compilation.

<h3>Sightglass Benchmark Results</h3>

```
compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 340361395.90 ± 63384608.15 (confidence = 99%)

  main.so is 0.88x to 0.92x faster than smallvec.so!
  smallvec.so is 1.09x to 1.13x faster than main.so!

  [3101467423 3425524333.41 4060621653] main.so
  [2820915877 3085162937.51 3375167352] smallvec.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 988446098.59 ± 184075718.89 (confidence = 99%)

  main.so is 0.88x to 0.92x faster than smallvec.so!
  smallvec.so is 1.09x to 1.13x faster than main.so!

  [9006994951 9948091070.66 11792481990] main.so
  [8192243090 8959644972.07 9801848982] smallvec.so

compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm

  Δ = 7854567.87 ± 2215491.16 (confidence = 99%)

  main.so is 0.89x to 0.94x faster than smallvec.so!
  smallvec.so is 1.07x to 1.12x faster than main.so!

  [80354527 93864666.76 119789198] main.so
  [77554917 86010098.89 94726994] smallvec.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm

  Δ = 22810509.85 ± 6434024.63 (confidence = 99%)

  main.so is 0.89x to 0.94x faster than smallvec.so!
  smallvec.so is 1.07x to 1.12x faster than main.so!

  [233358190 272593088.57 347880715] main.so
  [225227821 249782578.72 275097380] smallvec.so

compilation :: nanoseconds :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 10849521.41 ± 4324757.85 (confidence = 99%)

  main.so is 0.90x to 0.96x faster than smallvec.so!
  smallvec.so is 1.04x to 1.10x faster than main.so!

  [133875427 156859544.47 222455440] main.so
  [126073854 146010023.06 181611647] smallvec.so

compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 31508176.97 ± 12559561.91 (confidence = 99%)

  main.so is 0.90x to 0.96x faster than smallvec.so!
  smallvec.so is 1.04x to 1.10x faster than main.so!

  [388788638 455536988.31 646034523] main.so
  [366132033 424028811.34 527419755] smallvec.so
```
@dhil dhil requested a review from CosineP August 2, 2022 22:56
These are always length 1 for Wasm benchmarks.

<h3>Sightglass Benchmark Results</h3>

```
compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 328624015.86 ± 40274677.93 (confidence = 99%)

  main.so is 0.88x to 0.91x faster than slots-smallvec.so!
  slots-smallvec.so is 1.10x to 1.13x faster than main.so!

  [3070752447 3203778792.55 3446269274] main.so
  [2503544039 2875154776.69 3197966713] slots-smallvec.so

compilation :: nanoseconds :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 9685705.06 ± 3221286.87 (confidence = 99%)

  main.so is 0.91x to 0.96x faster than slots-smallvec.so!
  slots-smallvec.so is 1.05x to 1.09x faster than main.so!

  [129356493 145594942.79 165038803] main.so
  [118555011 135909237.73 188780619] slots-smallvec.so

compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [79080493 86757564.46 112649639] main.so
  [78083384 85934125.69 94992743] slots-smallvec.so
```
dhil added a commit to dhil/wasm-tools that referenced this pull request Aug 3, 2022
This patch implements (most parts of) the frontend for the function
references proposal (currently we have not implemented initialised
locals). Moreover, the crates shrink, smith, and mutate have yet to
made compatible with the new features --- once the changeset has been
approved we can adjust those crates.

Boardly, the implementation of function references follows the
proposed spec closely, and as such, this patch introduces some
important changes to both the type structure and the validation
algorithm.

The structure of `ValType` is no longer a flat, instead, it contains a
new constructor `Ref(RefType)` which is parameterised by `RefType`,
which is a new struct type. This type is itself parameterised by a
`HeapType`, which is a new enumeration type. In addition, `ValType` is
extended with the constructor `Bot` to represent the special value
bottom type. It is special in the sense that it can only appear during
validation as there is not surface syntax for it (this is in
accordance with the spec). Concretely, the `ValType`, `RefType`, and
`HeapType` are now defined as follows.

```rust
pub enum ValType {
    /// The value type is i32.
    I32,
    /// The value type is i64.
    I64,
    /// The value type is f32.
    F32,
    /// The value type is f64.
    F64,
    /// The value type is v128.
    V128,
    /// The value type is a reference. Which type of reference is decided by
    /// RefType. This is a change in syntax from the function references proposal,
    /// which now provides FuncRef and ExternRef as sugar for the generic ref
    /// construct.
    Ref(RefType),
    /// Special bottom type.
    Bot,
}

pub struct RefType {
    /// Whether it's nullable
    pub nullable: bool,
    /// The relevant heap type
    pub heap_type: HeapType,
}

pub enum HeapType {
    /// It seems by example that u32s are directly used for arbitrary indexes,
    /// but maybe a higher-level structure like TypeRef is relevant here?
    Index(u32),
    /// From reference types
    Func,
    /// From reference types
    Extern,
    /// Special bottom heap type
    Bot,
}
```

The validation algorithm has been adapted to follow that of the
pseudocode algorithm in the appendix of the spec. We have made a few
administrative changes such as plumbing an instance of
`WasmModuleResources` around, because whenever a reference type has
heap type `Index(i)` we need to look up the defined type at index
`i`. The interface of `WasmModuleResources` has been extended with the
following new methods.

```rust
fn type_index_of_function(&self, func_idx: u32) -> Option<u32>;
fn matches(&self, t1: ValType, t2: ValType) -> bool;
fn check_value_type(
    &self,
    t: ValType,
    features: &WasmFeatures,
    offset: usize
) -> Result<(), BinaryReaderError>;
```

These methods are used during validation:

 * `type_index_of_function` is necessary to obtain the result type of `RefFunc`
 * `matches` implements the subtyping relation on types; it is used in
   both operand validation and in element validation. It may need to
   peek into context to retrieve function types.
 * `check_value_type` is again used all over and needs to be shared. It now
    needs access to the context because a value type with an invalid heap index
    (depending on the number of types in the context) is not valid

The validation procedure has obviously been extended with the new
instructions; the only previous instruction whose validation needed to
be updated is `Select`.

The validation helper methods has changed too.

```rust
fn pop_operand(
    &mut self,
    expected: Option<ValType>,
    resources: &impl WasmModuleResources,
) -> OperatorValidatorResult<ValType>

fn pop_ref(
    &mut self,
    resources: &impl WasmModuleResources,
) -> OperatorValidatorResult<RefType>
```

Comments and feedback are most welcome --- we have tried to follow the
structure of the code base to the best of our abilities, however, we
are not completely sure we have put the shared logic
(e.g. `type_index_of_function`, `matches`, and `check_value_type`) in
the right places. Also, help with getting the crates skrink, smith,
and mutate up to date would be much appreciated.

Additionally, we have another patch that makes wasmtime compile and
pass the testsuite (without function references) with our modified
wasm-tools (c.f. effect-handlers/wasmtime#3).

Co-authored-by: Daniel Hillerström <[email protected]>
Co-authored-by: cosine <[email protected]>
alexcrichton and others added 8 commits August 3, 2022 16:08
On oss-fuzz a test case has been found that executes 30k iterations of a
wasm trap which with a 60s timeout leaves 2ms for each invocation which
under fuzzing instrumentation is a bit of a stretch with a ~20x
slowdown. This commit places a limit on the number of inputs to the
fuzzer at 200 to keep it reasonably sized.
* Support CLI parameters for string encoding
* Fix `--skip-validate`
* Fix printing binary to stdout
This is a collection of some minor renamings, refactorings, sharing of
code, etc. This was all discovered during my addition of string support
to adapter functions and I figured it'd be best to frontload this and
land it ahead of the full patch since it's getting complex.
This commit aims to improve the readability of supporting the memory64
proposal in the `fact` adapter trampoline compiler. Previously there
were a few sprinkled blocks that used `if` to generate different
instructions inline, but as I've worked on support for strings this has
become pretty unwieldy as strings do far more memory manipulation than
other type conversions. A pattern that's easier to read is to have
small instruction helpers that take the pointer width as an argument and
internally dispatch to the correct instruction. This keeps the main
translation code branch-free and a bit easier to follow. Additionally
for more complicated branching logic it allows for deduplicating the
main translation path by having lots of little branches instead of one
large branch with everything duplicated on both halves.
x86 does not have dedicated instructions for scalar FMA, lower
to a libcall which seems to be what llvm does.
Give the user the option to sign and to authenticate function
return addresses with the operations introduced by the Pointer
Authentication extension to the Arm instruction set architecture.

Copyright (c) 2021, Arm Limited.
…dealliance#4585)

This adds support for StructArgument on s390x.  The ABI for this
platform requires that the address of the buffer holding the copy
of the struct argument is passed from caller to callee as hidden
pointer, using a register or overflow stack slot.

To implement this, I've added an optional "pointer" filed to
ABIArg::StructArg, and code to handle the pointer both in common
abi_impl code and the s390x back-end.

One notable change necessary to make this work involved the
"copy_to_arg_order" mechanism.  Currently, for struct args
we only need to copy the data (and that need to happen before
setting up any other args), while for non-struct args we only
need to set up the appropriate registers or stack slots.
This order is ensured by sorting the arguments appropriately
into a "copy_to_arg_order" list.

However, for struct args with explicit pointers we need to *both*
copy the data (again, before everything else), *and* set up a
register or stack slot.  Since we now need to touch the argument
twice, we cannot solve the ordering problem by a simple sort.
Instead, the abi_impl common code now provided *two* callbacks,
emit_copy_regs_to_buffer and emit_copy_regs_to_arg, and expects
the back end to first call copy..to_buffer for all args, and
then call copy.._to_arg for all args.  This required updates
to all back ends.

In the s390x back end, in addition to the new ABI code, I'm now
adding code to actually copy the struct data, using the MVC
instruction (for small buffers) or a memcpy libcall (for larger
buffers).  This also requires a bit of new infrastructure:
- MVC is the first memory-to-memory instruction we use, which
  needed a bit of memory argument tweaking
- We also need to set up the infrastructure to emit libcalls.

(This implements the first half of issue bytecodealliance#4565.)
Copy link
Collaborator

@CosineP CosineP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, and looks like the tests pass too? Cool. Alright looks like we have our work cut out for us :)

crates/cranelift/src/func_environ.rs Outdated Show resolved Hide resolved
crates/cranelift/src/lib.rs Show resolved Hide resolved
crates/environ/src/module.rs Outdated Show resolved Hide resolved
crates/runtime/src/table.rs Show resolved Hide resolved
crates/types/src/lib.rs Outdated Show resolved Hide resolved
let externref_returns_count = returns
.iter()
.filter(|r| **r == WasmType::ExternRef)
.filter(|r| match **r { WasmType::Ref(WasmRefType { heap_type: WasmHeapType::Extern, .. }) => true, _ => false })
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speaking of which do non-nullable extern refs exist?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crates/wasmtime/src/externals.rs Outdated Show resolved Hide resolved
@@ -321,7 +321,7 @@ unsafe impl WasmTy for Option<ExternRef> {

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does ExternRef come from here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crates/wasmtime/src/types.rs Outdated Show resolved Hide resolved
crates/wasmtime/src/types.rs Show resolved Hide resolved
jameysharp and others added 15 commits August 4, 2022 00:06
Includes a modest improvement in memory usage and performance by
removing analysis that was only used during fuzzing.
…liance#4601)

* Wasmtime: Add a pointer to `VMRuntimeLimits` in component contexts

* Save exit Wasm FP and PC in component-to-host trampolines

Fixes bytecodealliance#4535

* Add comment about why we deref the trampoline's FP

* Update some tests to use new `vmruntime_limits_*` methods
* Port `Shuffle` to ISLE (AArch64)

Ported the existing implementation of `Shuffle` for AArch64 to ISLE.

Copyright (c) 2022 Arm Limited

* Cleanup by shadowing `rn`, `rn2`, and `_`

Copyright (c) 2022 Arm Limited
)

* Add `try_use_var` method to `cranelift-frontend`.
- Unlike `use_var`, this method does not panic if the variable has not been defined
before use

* Add `try_declare_var` and `try_def_var`.
- Also implement Error for error enums.

* Use `write!` macro.

* Add `write!` use I missed.
This addresses bytecodealliance#4307.

For the static API we generate 100 arbitrary test cases at build time, each of
which includes 0-5 parameter types, a result type, and a WAT fragment containing
an imported function and an exported function.  The exported function calls the
imported function, which is implemented by the host.  At runtime, the fuzz test
selects a test case at random and feeds it zero or more sets of arbitrary
parameters and results, checking that values which flow host-to-guest and
guest-to-host make the transition unchanged.

The fuzz test for the dynamic API follows a similar pattern, the only difference
being that test cases are generated at runtime.

Signed-off-by: Joel Dice <[email protected]>
This adds full i128 support to the s390x target, including new filetests
and enabling the existing i128 runtest on s390x.

The ABI requires that i128 is passed and returned via implicit pointer,
but the front end still generates direct i128 types in call.  This means
we have to implement ABI support to implicitly convert i128 types to
pointers when passing arguments.

To do so, we add a new variant ABIArg::ImplicitArg.  This acts like
StructArg, except that the value type is the actual target type,
not a pointer type.  The required conversions have to be inserted
in the prologue and at function call sites.

Note that when dereferencing the implicit pointer in the prologue,
we may require a temp register: the pointer may be passed on the
stack so it needs to be loaded first, but the value register may
be in the wrong class for pointer values.  In this case, we use
the "stack limit" register, which should be available at this
point in the prologue.

For return values, we use a mechanism similar to the one used for
supporting multiple return values in the Wasmtime ABI.  The only
difference is that the hidden pointer to the return buffer must
be the *first*, not last, argument in this case.

(This implements the second half of issue bytecodealliance#4565.)
)

* Add a dataflow-based representation of components

This commit updates the inlining phase of compiling a component to
creating a dataflow-based representation of a component instead of
creating a final `Component` with a linear list of initializers. This
dataflow graph is then linearized in a final step to create the actual
final `Component`.

The motivation for this commit stems primarily from my work implementing
strings in fused adapters. In doing this my plan is to defer most
low-level transcoding to the host itself rather than implementing that
in the core wasm adapter modules. This means that small
cranelift-generated trampolines will be used for adapter modules to call
which then call "transcoding libcalls". The cranelift-generated
trampolines will get raw pointers into linear memory and pass those to
the libcall which core wasm doesn't have access to when passing
arguments to an import.

Implementing this with the previous representation of a `Component` was
becoming too tricky to bear. The initialization of a transcoder needed
to happen at just the right time: before the adapter module which needed
it was instantiated but after the linear memories referenced had been
extracted into the `VMComponentContext`. The difficulty here is further
compounded by the current adapter module injection pass already being
quite complicated. Adapter modules are already renumbering the index
space of runtime instances and shuffling items around in the
`GlobalInitializer` list. Perhaps the worst part of this was that
memories could already be referenced by host function imports or exports
to the host, and if adapters referenced the same memory it shouldn't be
referenced twice in the component. This meant that `ExtractMemory`
initializers ideally needed to be shuffled around in the initializer
list to happen as early as possible instead of wherever they happened to
show up during translation.

Overall I did my best to implement the transcoders but everything always
came up short. I have decided to throw my hands up in the air and try a
completely different approach to this, namely the dataflow-based
representation in this commit. This makes it much easier to edit the
component after initial translation for injection of adapters, injection
of transcoders, adding dependencies on possibly-already-existing items,
etc. The adapter module partitioning pass in this commit was greatly
simplified to something which I believe is functionally equivalent but
is probably an order of magnitude easier to understand.

The biggest downside of this representation I believe is having a
duplicate representation of a component. The `component::info` was
largely duplicated into the `component::dfg` module in this commit.
Personally though I think this is a more appropriate tradeoff than
before because it's very easy to reason about "convert representation A
to B" code whereas it was very difficult to reason about shuffling
around `GlobalInitializer` items in optimal fashions. This may also have
a cost at compile-time in terms of shuffling data around, but my hope is
that we have lots of other low-hanging fruit to optimize if it ever
comes to that which allows keeping this easier-to-understand
representation.

Finally, to reiterate, the final representation of components is not
changed by this PR. To the runtime internals everything is still the
same.

* Fix compile of factc
…lliance#4604)

* components: ignore export aliases to types in translation.

Currently, translation is ignoring type exports from components during
translation by skipping over them before adding them to the exports map.

If a component instantiates an inner component and aliases a type export of
that instance, it will cause wasmtime to panic with a failure to find the
export in the exports map.

The fix is to add a representation for exported types to the map that is simply
ignored when encountered. This also makes it easier to track places where we
would have to support type exports in translation in the future.

* Keep type information for type exports.

This commit keeps the type information for type exports so that types can be
properly aliased from an instance export and thereby adjusting the type index
space accordingly.

* Add a simple test case for type exports for the component model.
Since we do not have an instruction for this, this is a simple
open-coded implementation.

Needed by the cg_clif frontend.
When an adapter module depends on a particular core wasm instance this
means that it actually depends on not only that instance but all prior
core wasm instances as well. This is because core wasm instances must be
instantiated in the specified order within a component and that cannot
change depending on the dataflow between adapters. This commit fixes a
possible panic from linearizing the component dfg where an adapter
module tried to depend on an instance that hadn't been instantiated yet
because the ordering dependency between core wasm instances hadn't been
modeled.
@dhil dhil merged commit f79de60 into effect-handlers:func-ref Aug 5, 2022
dhil pushed a commit that referenced this pull request Apr 10, 2023
* Integrate experimental HTTP into wasmtime.

* Reset Cargo.lock

* Switch to bail!, plumb options partially.

* Implement timeouts.

* Remove generated files & wasm, add Makefile

* Remove generated code textfile

* Update crates/wasi-http/Cargo.toml

Co-authored-by: Eduardo de Moura Rodrigues <[email protected]>

* Update crates/wasi-http/Cargo.toml

Co-authored-by: Eduardo de Moura Rodrigues <[email protected]>

* Extract streams from request/response.

* Fix read for len < buffer length.

* Formatting.

* types impl: swap todos for traps

* streams_impl: idioms, and swap todos for traps

* component impl: idioms, swap all unwraps for traps, swap all todos for traps

* http impl: idiom

* Remove an unnecessary mut.

* Remove an unsupported function.

* Switch to the tokio runtime for the HTTP request.

* Add a rust example.

* Update to latest wit definition

* Remove example code.

* wip: start writing a http test...

* finish writing the outbound request example

havent executed it yet

* better debug output

* wasi-http: some stubs required for rust rewrite of the example

* add wasi_http tests to test-programs

* CI: run the http tests

* Fix some warnings.

* bump new deps to latest releases (#3)

* Add tests for wasi-http to test-programs (#2)

* wip: start writing a http test...

* finish writing the outbound request example

havent executed it yet

* better debug output

* wasi-http: some stubs required for rust rewrite of the example

* add wasi_http tests to test-programs

* CI: run the http tests

* bump new deps to latest releases

h2 0.3.16
http 0.2.9
mio 0.8.6
openssl 0.10.48
openssl-sys 0.9.83
tokio 1.26.0

---------

Co-authored-by: Brendan Burns <[email protected]>

* Update crates/test-programs/tests/http_tests/runtime/wasi_http_tests.rs

* Update crates/test-programs/tests/http_tests/runtime/wasi_http_tests.rs

* Update crates/test-programs/tests/http_tests/runtime/wasi_http_tests.rs

* wasi-http: fix cargo.toml file and publish script to work together (#4)

unfortunately, the publish script doesn't use a proper toml parser (in
order to not have any dependencies), so the whitespace has to be the
trivial expected case.

then, add wasi-http to the list of crates to publish.

* Update crates/test-programs/build.rs

* Switch to rustls

* Cleanups.

* Merge switch to rustls.

* Formatting

* Remove libssl install

* Fix tests.

* Rename wasi-http -> wasmtime-wasi-http

* prtest:full

Conditionalize TLS on riscv64gc.

* prtest:full

Fix formatting, also disable tls on s390x

* prtest:full

Add a path parameter to wit-bindgen, remove symlink.

* prtest:full

Fix tests for places where SSL isn't supported.

* Update crates/wasi-http/Cargo.toml

---------

Co-authored-by: Eduardo de Moura Rodrigues <[email protected]>
Co-authored-by: Pat Hickey <[email protected]>
Co-authored-by: Pat Hickey <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.