Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Native stack switching #225

Merged
merged 5 commits into from
Sep 16, 2024
Merged

Conversation

frank-emrich
Copy link

@frank-emrich frank-emrich commented Sep 12, 2024

This PR moves stack switching in the optimized implementation from libcalls to generated code using the CLIF stack_switch instruction.

Due to earlier preparation work, the changes are fairly localized:

  • The layout of FiberStacks remains similar: Near the top of stack address, we store metadata about the stack's parent (if running) or itself (when suspended). This area has grown from 2 to 4 words, storing RSP, RIP and RBP now. Previously, we only stored RSP there, and RIP and RBP were stored on the actual stack itself. See unix.rs for a detailed description of the layout.
  • Code emitted in optimized.rs hasn't changed much, it's mostly just replacing libcalls to tc_resume and tc_suspend with usages of stack_switch. The instruction is given a "context pointer" into the 4-word region near the top of stack mentioned above.
  • Most changes happen in the fibre module:
    • wasmtime_fibre_switch, the function previously doing all switching under the hood, is gone now. There is however a new, related function wasmtime_fibre_switch_to_parent that is exclusively used to return back to the parent when a continuation finishes.
    • wasmtime_fibre_init, previously used to initialize FiberStacks, is gone too. Conceptually, we still need to do very similar initialization work as before, but I realized that there wasn't a need for this function to be implemented in assembly anymore. The initialization logic is therefore moved to Rust code in FiberStack::initialize.
    • wasmtime_fibre_start, the "launchpad" and bottom frame in every FiberStack, is conceptually still doing the same and has seen only minor changes, related to where certain data is obtained from now. I've temporarily removed all the .cfi_* directives from the function, meaning that its custom DWARF info is gone for now. That has no actual impact on our work: Unlike the original wasmtime-fiber, our stacks are designed so that external tools can obtain cross-continuation backtraces just by doing frame pointer walking. I'm nevertheless planning to re-add the DWARF info at a later point.
  • The libcalls tc_resume and tc_suspend are no longer needed and hence removed.

@frank-emrich
Copy link
Author

Benchmark results, showing runtime with this PR divided by runtime on main for each benchmark:

c10m_asyncify: 1.000054783032901
c10m_wasmfx: 0.6700970347850967
sieve_asyncify: 1.0099458738253342
sieve_wasmfx: 0.3834041087319698
skynet_asyncify: 1.0122322896432343
skynet_wasmfx: 0.5817693043300282
state_asyncify: 0.9942814050683813
state_wasmfx: 0.22364321985889885
micro/suspend_resume: 0.16735253601580183
micro/2resumes_same_function: 0.21016042838851387

Copy link
Member

@dhil dhil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

crates/cranelift/src/wasmfx/optimized.rs Outdated Show resolved Hide resolved
Comment on lines -296 to -373
/// TODO
#[inline(always)]
pub fn resume(
instance: &mut Instance,
contref: *mut VMContRef,
) -> Result<ControlEffect, TrapReason> {
let cont = unsafe {
contref.as_ref().ok_or_else(|| {
TrapReason::user_without_backtrace(anyhow::anyhow!(
"Attempt to dereference null VMContRef!"
))
})?
};
assert!(cont.state == State::Allocated || cont.state == State::Invoked);

if ENABLE_DEBUG_PRINTING {
let chain = instance.typed_continuations_stack_chain();
// SAFETY: We maintain as an invariant that the stack chain field in the
// VMContext is non-null and contains a chain of zero or more
// StackChain::Continuation values followed by StackChain::Main.
match unsafe { (**chain).0.get_mut() } {
StackChain::Continuation(running_contref) => {
debug_assert_eq!(contref, *running_contref);
debug_println!(
"Resuming contref @ {:p}, previously running contref is {:p}",
contref,
running_contref
)
}
_ => {
// Before calling this function as a libcall, we must have set
// the parent of the to-be-resumed continuation to the
// previously running one. Hence, we must see a
// `StackChain::Continuation` variant.
return Err(TrapReason::user_without_backtrace(anyhow::anyhow!(
"Invalid StackChain value in VMContext"
)));
}
}
}

Ok(cont.stack.resume())
}

/// TODO
#[inline(always)]
pub fn suspend(instance: &mut Instance, tag_addr: *mut u8) -> Result<(), TrapReason> {
let chain_ptr = instance.typed_continuations_stack_chain();

// TODO(dhil): This should be handled in generated code.
// SAFETY: We maintain as an invariant that the stack chain field in the
// VMContext is non-null and contains a chain of zero or more
// StackChain::Continuation values followed by StackChain::Main.
let chain = unsafe { (**chain_ptr).0.get_mut() };
let running = match chain {
StackChain::Absent => Err(TrapReason::user_without_backtrace(anyhow::anyhow!(
"Internal error: StackChain not initialised"
))),
StackChain::MainStack { .. } => Err(TrapReason::user_without_backtrace(
anyhow::anyhow!("Calling suspend outside of a continuation"),
)),
StackChain::Continuation(running) => {
// SAFETY: See above.
Ok(unsafe { &**running })
}
}?;

let stack = &running.stack;
debug_println!(
"Suspending while running {:p}, parent is {:?}",
running,
running.parent_chain
);

let payload = ControlEffect::suspend(tag_addr as *const u8);
Ok(stack.suspend(payload))
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether there may be value in keeping this implementation around, perhaps as a new baseline implementation...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting to replace the current baseline implementation with the "optimized" implementation prior to this PR? I definitely wouldn't want to maintain a third active implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I am not suggesting to replace anything... I think there would be value in keeping this implementation around. It is fairly simple still... It would be good to compare against at least.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would require keeping the version of the fibre/* files prior to this PR around (effectively continuing to support a third possible internal layout of stacks), plus eventually increasing divergence within optimized.rs. I consider the maintenance cost of that too great.
If we want to evaluate the performance of the implementation prior to this PR, we should use the approach I've implemented where we checkout older revisions directly, accepting that it will not support things like switch in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is such a burden. We should discuss this on Thursday, I think.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, let's discuss on Thursday. I will merge this PR in the meantime, since making the previous optimized implementation available as a third implementation would be a PR of its own.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, agreed.

@frank-emrich frank-emrich merged commit 191694c into wasmfx:main Sep 16, 2024
40 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants