Is it possible to avoid syscalls during Function::call? #1922
Comments
As functions can fail, we need to wrap them so that if they fail we catch the traps properly. A few months ago we were thinking of creating an […]

How would this be implemented? We would need a function similar to:

    // Similar to wasmer_call_trampoline, but without catch_traps
    pub unsafe fn wasmer_call_trampoline_unchecked(
        vmctx: VMFunctionEnvironment,
        trampoline: VMTrampoline,
        callee: *const VMFunctionBody,
        values_vec: *mut u8,
    ) -> Result<(), Trap> {
        // Reinterpret the trampoline as a plain C function pointer and call it directly.
        mem::transmute::<_, extern "C" fn(VMFunctionEnvironment, *const VMFunctionBody, *mut u8)>(
            trampoline,
        )(vmctx, callee, values_vec);
        // Traps are not caught here; the caller must already be inside catch_traps.
        Ok(())
    }

And then the user would need to do something similar to:

    catch_traps(vmctx, || {
        for _ in 0..48000 {
            unsafe {
                func.call_unchecked(...);
            }
        }
    });

This way, catch_traps will be called only once rather than 48,000 times, and our code will still be safe to execute (returning any error when it happens).
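To make the batching idea concrete, here is a minimal, self-contained sketch. It does not use Wasmer at all: `guarded` is a hypothetical stand-in for `catch_traps` built on `std::panic::catch_unwind` (the real function deals with signals and Wasm traps, which are not modelled here), and `call_unchecked` is a made-up placeholder for a call into guest code. The only point is to show how many times the guard gets installed in each pattern.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical stand-in for `catch_traps`: runs `body` under one guard,
/// counts the installation, and turns a panic into an `Err`.
fn guarded<T>(installs: &mut usize, body: impl FnOnce() -> T) -> Result<T, String> {
    *installs += 1;
    panic::catch_unwind(AssertUnwindSafe(body)).map_err(|_| "trapped".to_string())
}

/// Hypothetical stand-in for an unchecked call into guest code.
fn call_unchecked(x: u64) -> u64 {
    x.wrapping_mul(3).wrapping_add(1)
}

/// One guard per call: the pattern the issue is complaining about.
/// Returns the accumulated result and the number of guard installations.
fn per_call(n: u64) -> Result<(u64, usize), String> {
    let mut installs = 0;
    let mut acc = 0u64;
    for i in 0..n {
        acc = acc.wrapping_add(guarded(&mut installs, || call_unchecked(i))?);
    }
    Ok((acc, installs))
}

/// One guard around the whole loop: the pattern proposed in the comment above.
fn batched(n: u64) -> Result<(u64, usize), String> {
    let mut installs = 0;
    let acc = guarded(&mut installs, || {
        let mut acc = 0u64;
        for i in 0..n {
            acc = acc.wrapping_add(call_unchecked(i));
        }
        acc
    })?;
    Ok((acc, installs))
}

fn main() {
    let (a, per_call_installs) = per_call(48_000).unwrap();
    let (b, batched_installs) = batched(48_000).unwrap();
    assert_eq!(a, b); // same work, same result
    // prints "per-call guards: 48000, batched guards: 1"
    println!("per-call guards: {per_call_installs}, batched guards: {batched_installs}");
}
```

One consequence of the batched form is worth keeping in mind: a failure anywhere inside the loop aborts the whole batch, so the caller learns that something trapped but not how many calls completed before it.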
Thank you for the explanation @syrusakbary! Indeed, it would be nice to have such a flexible API for traps to allow for some optimisations in cases like mine.
2102: Use platform setjmp/longjmp to optimize function calls r=syrusakbary a=syrusakbary

# Description

Use platform setjmp/longjmp when possible to optimize function calls. This PR fixes #1922.

This improves timings from:

```
Benchmarking basic static func llvm: Collecting 100 samples in estimated 5.0004
basic static func llvm time: [131.99 ns 132.31 ns 132.66 ns]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
Benchmarking basic static func with many args llvm: Collecting 100 samples in es
basic static func with many args llvm time: [140.14 ns 140.55 ns 140.94 ns]
Benchmarking basic static func cranelift: Collecting 100 samples in estimated 5.
basic static func cranelift time: [133.51 ns 133.81 ns 134.09 ns]
Found 13 outliers among 100 measurements (13.00%)
  7 (7.00%) high mild
  6 (6.00%) high severe
Benchmarking basic static func with many args cranelift: Collecting 100 samples
basic static func with many args cranelift time: [144.17 ns 145.04 ns 146.01 ns]
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe
Benchmarking basic dynfunc llvm: Collecting 100 samples in estimated 5.0012 s (2
basic dynfunc llvm time: [228.77 ns 229.59 ns 230.35 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
Benchmarking basic dynfunc with many args llvm: Collecting 100 samples in estima
basic dynfunc with many args llvm time: [277.33 ns 279.09 ns 280.91 ns]
Benchmarking basic dynfunc cranelift: Collecting 100 samples in estimated 5.0008
basic dynfunc cranelift time: [229.38 ns 230.38 ns 231.43 ns]
Found 15 outliers among 100 measurements (15.00%)
  14 (14.00%) high mild
  1 (1.00%) high severe
Benchmarking basic dynfunc with many args cranelift: Collecting 100 samples in e
basic dynfunc with many args cranelift time: [278.24 ns 280.11 ns 281.96 ns]
```

To:

```
Benchmarking basic static func llvm: Collecting 100 samples in estimated 5.0001
basic static func llvm time: [19.791 ns 19.817 ns 19.845 ns]
change: [-85.086% -85.045% -85.006%] (p = 0.00 < 0.05)
Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  3 (3.00%) high mild
  10 (10.00%) high severe
Benchmarking basic static func with many args llvm: Collecting 100 samples in es
basic static func with many args llvm time: [29.684 ns 29.716 ns 29.756 ns]
change: [-78.858% -78.802% -78.743%] (p = 0.00 < 0.05)
Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low severe
  2 (2.00%) high mild
  12 (12.00%) high severe
Benchmarking basic static func cranelift: Collecting 100 samples in estimated 5.
basic static func cranelift time: [22.266 ns 22.289 ns 22.316 ns]
change: [-83.476% -83.279% -82.980%] (p = 0.00 < 0.05)
Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  8 (8.00%) high mild
  7 (7.00%) high severe
Benchmarking basic static func with many args cranelift: Collecting 100 samples
basic static func with many args cranelift time: [30.699 ns 30.726 ns 30.757 ns]
change: [-78.786% -78.682% -78.586%] (p = 0.00 < 0.05)
Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  5 (5.00%) high mild
  8 (8.00%) high severe
Benchmarking basic dynfunc llvm: Collecting 100 samples in estimated 5.0005 s (4
basic dynfunc llvm time: [120.06 ns 121.13 ns 122.21 ns]
change: [-47.266% -46.814% -46.367%] (p = 0.00 < 0.05)
Performance has improved.
Benchmarking basic dynfunc with many args llvm: Collecting 100 samples in estima
basic dynfunc with many args llvm time: [172.60 ns 176.38 ns 181.35 ns]
change: [-32.788% -27.622% -21.063%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe
Benchmarking basic dynfunc cranelift: Collecting 100 samples in estimated 5.0004
basic dynfunc cranelift time: [120.39 ns 121.71 ns 123.13 ns]
change: [-46.541% -45.905% -45.238%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Benchmarking basic dynfunc with many args cranelift: Collecting 100 samples in e
basic dynfunc with many args cranelift time: [162.72 ns 163.36 ns 164.01 ns]
change: [-41.999% -41.705% -41.419%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
```

So, best case scenario from 131.99ns to 19.817ns.

# Review

- [ ] Add a short description of the change to the CHANGELOG.md file

Co-authored-by: Syrus Akbary
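As a rough sanity check on these numbers, the sketch below takes the middle estimates for "basic static func llvm" from the output above and computes the naive relative change, plus the cumulative call overhead for the 48,000 calls mentioned earlier in the thread. The helper names are made up for illustration. The naive ratio lands close to, but not exactly on, the reported `-85.045%`, presumably because Criterion derives its change estimate from the sample distributions rather than from the two printed point estimates.

```rust
/// Relative change from `before_ns` to `after_ns`, in percent (negative means faster).
fn relative_change_pct(before_ns: f64, after_ns: f64) -> f64 {
    (after_ns - before_ns) / before_ns * 100.0
}

/// Total time spent in `calls` invocations costing `per_call_ns` each, in milliseconds.
fn total_ms(per_call_ns: f64, calls: u32) -> f64 {
    per_call_ns * f64::from(calls) / 1.0e6
}

fn main() {
    // Middle estimates for "basic static func llvm", before and after the PR.
    let (before, after) = (132.31, 19.817);

    // prints "change: -85.02%"
    println!("change: {:.2}%", relative_change_pct(before, after));
    // prints "48,000 calls before: 6.351 ms"
    println!("48,000 calls before: {:.3} ms", total_ms(before, 48_000));
    // prints "48,000 calls after:  0.951 ms"
    println!("48,000 calls after:  {:.3} ms", total_ms(after, 48_000));
}
```

These totals cover only the call overhead measured by the benchmark, not the work done inside the called function, and they say nothing about the latency variation that motivated the issue.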
Thank you, I appreciate it!
Duplicate of #2562
Summary
This question is in the context of using WASM for realtime audio synthesis. I invoke `Function::call` within an audio thread 48000 times per second. In the profiler, `sigprocmask` and `sigaltstack` get a noticeable share of CPU time (see attached screenshot). While the overhead itself is more or less acceptable, as it stays constant as the executed WASM code grows, the unpredictable latency incurred by syscalls is a problem for my use case: I get high variation in the trace, with some calls taking longer than my per-frame budget. So my question is as follows: is it possible to implement `Function::call` without the need for syscalls?
Additional details