-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Native stack switching #225
Native stack switching #225
Conversation
Benchmark results, showing runtime with this PR divided by runtime on main for each benchmark:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
/// TODO | ||
#[inline(always)] | ||
pub fn resume( | ||
instance: &mut Instance, | ||
contref: *mut VMContRef, | ||
) -> Result<ControlEffect, TrapReason> { | ||
let cont = unsafe { | ||
contref.as_ref().ok_or_else(|| { | ||
TrapReason::user_without_backtrace(anyhow::anyhow!( | ||
"Attempt to dereference null VMContRef!" | ||
)) | ||
})? | ||
}; | ||
assert!(cont.state == State::Allocated || cont.state == State::Invoked); | ||
|
||
if ENABLE_DEBUG_PRINTING { | ||
let chain = instance.typed_continuations_stack_chain(); | ||
// SAFETY: We maintain as an invariant that the stack chain field in the | ||
// VMContext is non-null and contains a chain of zero or more | ||
// StackChain::Continuation values followed by StackChain::Main. | ||
match unsafe { (**chain).0.get_mut() } { | ||
StackChain::Continuation(running_contref) => { | ||
debug_assert_eq!(contref, *running_contref); | ||
debug_println!( | ||
"Resuming contref @ {:p}, previously running contref is {:p}", | ||
contref, | ||
running_contref | ||
) | ||
} | ||
_ => { | ||
// Before calling this function as a libcall, we must have set | ||
// the parent of the to-be-resumed continuation to the | ||
// previously running one. Hence, we must see a | ||
// `StackChain::Continuation` variant. | ||
return Err(TrapReason::user_without_backtrace(anyhow::anyhow!( | ||
"Invalid StackChain value in VMContext" | ||
))); | ||
} | ||
} | ||
} | ||
|
||
Ok(cont.stack.resume()) | ||
} | ||
|
||
/// TODO | ||
#[inline(always)] | ||
pub fn suspend(instance: &mut Instance, tag_addr: *mut u8) -> Result<(), TrapReason> { | ||
let chain_ptr = instance.typed_continuations_stack_chain(); | ||
|
||
// TODO(dhil): This should be handled in generated code. | ||
// SAFETY: We maintain as an invariant that the stack chain field in the | ||
// VMContext is non-null and contains a chain of zero or more | ||
// StackChain::Continuation values followed by StackChain::Main. | ||
let chain = unsafe { (**chain_ptr).0.get_mut() }; | ||
let running = match chain { | ||
StackChain::Absent => Err(TrapReason::user_without_backtrace(anyhow::anyhow!( | ||
"Internal error: StackChain not initialised" | ||
))), | ||
StackChain::MainStack { .. } => Err(TrapReason::user_without_backtrace( | ||
anyhow::anyhow!("Calling suspend outside of a continuation"), | ||
)), | ||
StackChain::Continuation(running) => { | ||
// SAFETY: See above. | ||
Ok(unsafe { &**running }) | ||
} | ||
}?; | ||
|
||
let stack = &running.stack; | ||
debug_println!( | ||
"Suspending while running {:p}, parent is {:?}", | ||
running, | ||
running.parent_chain | ||
); | ||
|
||
let payload = ControlEffect::suspend(tag_addr as *const u8); | ||
Ok(stack.suspend(payload)) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether there may be value in keeping this implementation around, perhaps as a new baseline implementation...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting to replace the current baseline implementation with the "optimized" implementation prior to this PR? I definitely wouldn't want to maintain a third active implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I am not suggesting to replace anything... I think there would be value in keeping this implementation around. It is fairly simple still... It would be good to compare against at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would require keeping the version of the fibre/*
files prior to this PR around (effectively continuing to support a third possible internal layout of stacks), plus eventually increasing divergence within optimized.rs
. I consider the maintenance cost of that too great.
If we want to evaluate the performance of the implementation prior to this PR, we should use the approach I've implemented where we checkout older revisions directly, accepting that it will not support things like switch
in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is such a burden. We should discuss this on Thursday, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, let's discuss on Thursday. I will merge this PR in the meantime, since making the previous optimized implementation available as a third implementation would be a PR of its own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, agreed.
This PR moves stack switching in the optimized implementation from libcalls to generated code using the CLIF
stack_switch
instruction.Due to earlier preparation work, the changes are fairly localized:
FiberStacks
remains similar: Near the top of stack address, we store metadata about the stack's parent (if running) or itself (when suspended). This area has grown from 2 to 4 words, storing RSP, RIP and RBP now. Previously, we only stored RSP there, and RIP and RBP were stored on the actual stack itself. Seeunix.rs
for a detailed description of the layout.optimized.rs
hasn't changed much, it's mostly just replacing libcalls totc_resume
andtc_suspend
with usages ofstack_switch
. The instruction is given a "context pointer" into the 4-word region near the top of stack mentioned above.fibre
module:wasmtime_fibre_switch
, the function previously doing all switching under the hood, is gone now. There is however a new, related functionwasmtime_fibre_switch_to_parent
that is exclusively used to return back to the parent when a continuation finishes.wasmtime_fibre_init
, previously used to initializeFiberStacks
, is gone too. Conceptually, we still need to do very similar initialization work as before, but I realized that there wasn't a need for this function to be implemented in assembly anymore. The initialization logic is therefore moved to Rust code inFiberStack::initialize
.wasmtime_fibre_start
, the "launchpad" and bottom frame in everyFiberStack,
is conceptually still doing the same and has seen only minor changes, related to where certain data is obtained from now. I've temporarily removed all the.cfi_*
directives from the function, meaning that its custom DWARF info is gone for now. That has no actual impact on our work: Unlike the originalwasmtime-fiber
, our stacks are designed so that external tools can obtain cross-continuation backtraces just by doing frame pointer walking. I'm nevertheless planning to re-add the DWARF info at a later point.tc_resume
andtc_suspend
are no longer needed and hence removed.