
Is it possible to avoid syscalls during Function::call? #1922

Closed
ul opened this issue Dec 12, 2020 · 5 comments
Labels
📦 lib-vm About wasmer-vm ❓ question I've a question!

@ul

ul commented Dec 12, 2020

Summary

This question is in the context of using WASM for real-time audio synthesis. I invoke `Function::call` within an audio thread 48,000 times per second. In the profiler, `sigprocmask` and `sigaltstack` take a noticeable share of CPU time (see the attached screenshot). The overhead itself is more or less acceptable, since it stays constant as the executed WASM code grows. What bothers me for my use case is the unpredictable latency the syscalls add: the trace shows high variation, with some calls exceeding my per-frame budget. So my question is: is it possible to implement `Function::call` without syscalls?
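One way to quantify the variation described above is to time each call individually and compare the distribution against the per-frame budget, which at 48,000 calls per second is 1/48,000 s, about 20.8 µs. The following is a minimal, std-only sketch under that assumption; the closure is a placeholder for the real Wasm call, and `frame_budget`, `time_calls`, and `quantile` are hypothetical helper names, not part of Wasmer.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Per-frame budget at `sample_rate` frames per second (integer nanoseconds).
fn frame_budget(sample_rate: u32) -> Duration {
    Duration::from_nanos(1_000_000_000 / u64::from(sample_rate))
}

/// Times `n` invocations of `call` individually and returns the durations, sorted.
fn time_calls<F: FnMut()>(n: usize, mut call: F) -> Vec<Duration> {
    let mut samples = Vec::with_capacity(n);
    for _ in 0..n {
        let start = Instant::now();
        call();
        samples.push(start.elapsed());
    }
    samples.sort();
    samples
}

/// Simple rounded-index quantile `q` (0.0..=1.0) of a non-empty sorted slice.
fn quantile(sorted: &[Duration], q: f64) -> Duration {
    let idx = ((sorted.len() - 1) as f64 * q).round() as usize;
    sorted[idx]
}

fn main() {
    let budget = frame_budget(48_000);
    let mut acc = 0u64;
    // Placeholder workload: replace with the per-frame Wasm call being profiled.
    let samples = time_calls(48_000, || acc = black_box(acc.wrapping_add(1)));
    let over_budget = samples.iter().filter(|&&d| d > budget).count();
    println!("budget: {:?}", budget);
    println!(
        "p50: {:?}  p99: {:?}  max: {:?}",
        quantile(&samples, 0.5),
        quantile(&samples, 0.99),
        samples[samples.len() - 1]
    );
    println!("calls over budget: {}", over_budget);
}
```

Looking at the tail (p99 and max) rather than the mean is what exposes the jitter. Note that `Instant::now()` has its own cost, so for very short calls the absolute numbers include some timer overhead.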

Additional details

[Profiler screenshot, 2020-12-12 15:42:15]

@ul ul added the ❓ question I've a question! label Dec 12, 2020
@syrusakbary
Member

Because functions can fail, we need to wrap each call so that any trap is caught properly.
However, this "catch" mechanism runs on every function call, and it can indeed be optimized.

A few months ago we considered adding an unsafe `call_unchecked` function so that users can catch errors themselves (rather than having Wasmer do it for them).
In your use case, that means you would choose how and where to catch errors, and you could save the 918 ms spent in the traphandlers code.

How would this be implemented?

We would need a function similar to:

```rust
use std::mem;

// Similar to `wasmer_call_trampoline`, but without `catch_traps`.
// A trap raised by `callee` is NOT caught here: it propagates to the nearest
// enclosing `catch_traps`, so the caller must guarantee one is active.
pub unsafe fn wasmer_call_trampoline_unchecked(
    vmctx: VMFunctionEnvironment,
    trampoline: VMTrampoline,
    callee: *const VMFunctionBody,
    values_vec: *mut u8,
) {
    mem::transmute::<_, extern "C" fn(VMFunctionEnvironment, *const VMFunctionBody, *mut u8)>(
        trampoline,
    )(vmctx, callee, values_vec)
}
```
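The `values_vec` argument in the function above is a buffer through which arguments go in and results come back out, and the trampoline is what translates between that buffer and the callee's native signature. The following is a minimal, self-contained sketch of that calling convention, with a native function standing in for compiled Wasm code. The slot size (one `u128` per value), the `Trampoline` type, and the names `trampoline_ii_i` and `call_via_trampoline` are assumptions for illustration, not Wasmer's actual types.

```rust
use std::ffi::c_void;
use std::mem;
use std::ptr;

/// Hypothetical trampoline type: (environment, callee, pointer to value slots).
type Trampoline = unsafe extern "C" fn(*mut c_void, *const c_void, *mut u128);

/// Native stand-in for a compiled Wasm function with signature (i32, i32) -> i32.
extern "C" fn add(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

/// Trampoline for (i32, i32) -> i32: reads both arguments from the slot buffer,
/// calls `callee`, and writes the result back into slot 0.
unsafe extern "C" fn trampoline_ii_i(_env: *mut c_void, callee: *const c_void, values: *mut u128) {
    unsafe {
        let f = mem::transmute::<*const c_void, extern "C" fn(i32, i32) -> i32>(callee);
        let a = *values as i32;
        let b = *values.add(1) as i32;
        *values = f(a, b) as u32 as u128;
    }
}

/// Marshals `args` into 16-byte slots, invokes `callee` through `trampoline`,
/// and reads `n_results` values back from the start of the buffer.
fn call_via_trampoline(
    trampoline: Trampoline,
    callee: *const c_void,
    args: &[i32],
    n_results: usize,
) -> Vec<i32> {
    let mut values = vec![0u128; args.len().max(n_results)];
    for (slot, &arg) in values.iter_mut().zip(args) {
        *slot = arg as u32 as u128;
    }
    // SAFETY: `trampoline` must match `callee`'s real signature and the slot count.
    unsafe { trampoline(ptr::null_mut(), callee, values.as_mut_ptr()) };
    values[..n_results].iter().map(|&v| v as i32).collect()
}

fn main() {
    let result = call_via_trampoline(trampoline_ii_i, add as *const c_void, &[2, 40], 1);
    println!("{:?}", result); // [42]
}
```

Because one trampoline exists per signature, the caller only needs a single generic entry point; the cost is an extra indirect call and a trip through memory for every argument and result.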

And then the user would need to do something similar to:

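```rust
// Proposed usage: a single `catch_traps` guards all 48,000 unchecked calls.
// If any call traps, the remaining iterations are skipped and `result` is `Err(trap)`.
let result = unsafe {
    catch_traps(vmctx, || {
        for _ in 0..48000 {
            func.call_unchecked(...);
        }
    })
};
```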

This way, `catch_traps` is called only once rather than 48,000 times, and the code remains safe to execute (any trap is still returned as an error).
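The difference between one catch boundary per call and one per batch can be seen in a runnable analogy. This sketch swaps Wasmer's `catch_traps` (which relies on setjmp/longjmp rather than Rust unwinding) for Rust's `std::panic::catch_unwind`, and uses a panic as a stand-in for a trap; `step`, `per_call`, and `batched` are hypothetical names.

```rust
use std::panic;

/// A fallible "call": panics (the stand-in for a trap) when `i` equals `trap_at`.
fn step(i: u32, trap_at: Option<u32>, acc: &mut u64) {
    if Some(i) == trap_at {
        panic!("trap at iteration {i}");
    }
    *acc += u64::from(i);
}

/// One catch boundary per call: the boundary is set up `n` times,
/// and a failing call is counted while the loop keeps going.
fn per_call(n: u32, trap_at: Option<u32>) -> (u64, usize) {
    let mut acc = 0;
    let mut failures = 0;
    for i in 0..n {
        if panic::catch_unwind(panic::AssertUnwindSafe(|| step(i, trap_at, &mut acc))).is_err() {
            failures += 1;
        }
    }
    (acc, failures)
}

/// One catch boundary for the whole batch: the first failure abandons the loop.
fn batched(n: u32, trap_at: Option<u32>) -> Result<u64, ()> {
    let mut acc = 0;
    panic::catch_unwind(panic::AssertUnwindSafe(|| {
        for i in 0..n {
            step(i, trap_at, &mut acc);
        }
    }))
    .map(|_| acc)
    .map_err(|_| ())
}

fn main() {
    // Silence the default panic message; the demo panics on purpose.
    panic::set_hook(Box::new(|_| {}));
    println!("{:?}", per_call(10, Some(3))); // (42, 1)
    println!("{:?}", batched(10, Some(3))); // Err(())
    println!("{:?}", batched(10, None)); // Ok(45)
}
```

The trade-off mirrors the proposal: the batched version pays for the boundary once, but a single failure abandons the rest of the batch, so the caller has to decide how to resume.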

@ul
Author

ul commented Dec 12, 2020

Thank you for the explanation @syrusakbary! It would indeed be nice to have such a flexible API for traps, to allow for optimisations in cases like mine.

@syrusakbary
Member

Hi @ul,

I just created #2102. It should address your main question as it makes Function calls an order of magnitude faster.

bors bot added a commit that referenced this issue Feb 9, 2021
2102: Use platform setjmp/longjmp to optimize function calls r=syrusakbary a=syrusakbary


# Description

Use platform setjmp/longjmp when possible to optimize function calls.
This PR fixes #1922.

This improves timings from:
```
basic static func llvm
                        time:   [131.99 ns 132.31 ns 132.66 ns]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

basic static func with many args llvm
                        time:   [140.14 ns 140.55 ns 140.94 ns]

basic static func cranelift
                        time:   [133.51 ns 133.81 ns 134.09 ns]
Found 13 outliers among 100 measurements (13.00%)
  7 (7.00%) high mild
  6 (6.00%) high severe

basic static func with many args cranelift
                        time:   [144.17 ns 145.04 ns 146.01 ns]
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

basic dynfunc llvm
                        time:   [228.77 ns 229.59 ns 230.35 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

basic dynfunc with many args llvm
                        time:   [277.33 ns 279.09 ns 280.91 ns]

basic dynfunc cranelift
                        time:   [229.38 ns 230.38 ns 231.43 ns]
Found 15 outliers among 100 measurements (15.00%)
  14 (14.00%) high mild
  1 (1.00%) high severe

basic dynfunc with many args cranelift
                        time:   [278.24 ns 280.11 ns 281.96 ns]
```

To:

```
basic static func llvm
                        time:   [19.791 ns 19.817 ns 19.845 ns]
                        change: [-85.086% -85.045% -85.006%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  3 (3.00%) high mild
  10 (10.00%) high severe

basic static func with many args llvm
                        time:   [29.684 ns 29.716 ns 29.756 ns]
                        change: [-78.858% -78.802% -78.743%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low severe
  2 (2.00%) high mild
  12 (12.00%) high severe

basic static func cranelift
                        time:   [22.266 ns 22.289 ns 22.316 ns]
                        change: [-83.476% -83.279% -82.980%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  8 (8.00%) high mild
  7 (7.00%) high severe

basic static func with many args cranelift
                        time:   [30.699 ns 30.726 ns 30.757 ns]
                        change: [-78.786% -78.682% -78.586%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  5 (5.00%) high mild
  8 (8.00%) high severe

basic dynfunc llvm
                        time:   [120.06 ns 121.13 ns 122.21 ns]
                        change: [-47.266% -46.814% -46.367%] (p = 0.00 < 0.05)
                        Performance has improved.

basic dynfunc with many args llvm
                        time:   [172.60 ns 176.38 ns 181.35 ns]
                        change: [-32.788% -27.622% -21.063%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe

basic dynfunc cranelift
                        time:   [120.39 ns 121.71 ns 123.13 ns]
                        change: [-46.541% -45.905% -45.238%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

basic dynfunc with many args cranelift
                        time:   [162.72 ns 163.36 ns 164.01 ns]
                        change: [-41.999% -41.705% -41.419%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
```

So, in the best case (basic static func, LLVM), a call goes from 132.31 ns to 19.817 ns.
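The before/after point estimates can be turned into a speedup factor and, for the 48,000 calls per second from the original question, into CPU time per second spent on call overhead. The following is a small sketch of that arithmetic; the values are copied from the middle (point-estimate) column of the Criterion output above, and the helper names are made up for illustration.

```rust
/// (benchmark, before ns, after ns): point estimates from the Criterion output.
const BENCHES: [(&str, f64, f64); 4] = [
    ("basic static func llvm", 132.31, 19.817),
    ("basic static func cranelift", 133.81, 22.289),
    ("basic dynfunc llvm", 229.59, 121.13),
    ("basic dynfunc cranelift", 230.38, 121.71),
];

/// How many times faster the new call is.
fn speedup(before: f64, after: f64) -> f64 {
    before / after
}

/// Relative change in percent (negative means faster).
fn change_pct(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

/// Microseconds of call overhead per second at `calls_per_sec` calls.
fn overhead_us_per_sec(ns_per_call: f64, calls_per_sec: f64) -> f64 {
    ns_per_call * calls_per_sec / 1_000.0
}

fn main() {
    for (name, before, after) in BENCHES {
        println!(
            "{}: {:.2}x faster ({:+.1}%), overhead at 48 kHz: {:.0} -> {:.0} us/s",
            name,
            speedup(before, after),
            change_pct(before, after),
            overhead_us_per_sec(before, 48_000.0),
            overhead_us_per_sec(after, 48_000.0),
        );
    }
}
```

For the LLVM static-function case this gives roughly a 6.7x speedup (about -85.0%, close to Criterion's reported -85.045%), and the per-second call overhead at 48 kHz falls from about 6.35 ms to about 0.95 ms.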


# Review

- [ ] Add a short description of the change to the CHANGELOG.md file


Co-authored-by: Syrus Akbary <[email protected]>
@ul
Author

ul commented Feb 9, 2021

Thank you, I appreciate it!

@syrusakbary
Member

Duplicate of #2562

@syrusakbary syrusakbary marked this as a duplicate of #2562 Oct 20, 2021
4 participants
@syrusakbary @ul @Hywan and others