Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cranelift: add stack_switch CLIF instruction #9078

Merged
merged 12 commits into from
Aug 21, 2024

Conversation

frank-emrich
Copy link
Contributor

@frank-emrich frank-emrich commented Aug 5, 2024

This PR adds a new CLIF instruction for switching stacks. While the primary motivation is to support the Wasm stack switching proposal currently under development, the CLIF instruction here is lower level and thus intended to be useful for general-purpose stackful context switching (such as implementing coroutines, fibers, etc independently from the Wasm stack switching proposal).
This PR only adds support for the instruction on x64 Linux, but I'm planning to add support for more platforms over time. The design of the instruction should be sufficiently abstract to support all the other platforms.

While work is currently under way to implement Wasm stack switching in Wasmtime here and indeed uses the CLIF instruction introduced by this PR successfully, it seems worthwhile just upstreaming the CLIF instruction by itself. The proposal is not fully finalized yet, and this CLIF instruction seems useful on its own and independent from the remainder of the Wasm proposal.

Concretely, the CLIF instruction looks as follows:

out_payload = stack_switch(store_context_ptr, load_context_ptr, in_payload)

This causes the following to happen:

  1. The current execution context is saved: The current frame pointer, stack pointer and PC after the stack_switch instruction are stored at store_context_ptr. All other registers are marked as clobbered and thus spilled by regalloc as needed.
  2. We load a new (SP, FP, PC) triple from load_context_ptr, indicating the stack/context to switch to. We assume that we are either switching to a stack that was either previously switched away from by another stack_switch, or it's a newly initialized stack.
  3. The value in_payload is passed over to the other stack. In other words, if the instruction above switches from some stack A to another stack B, then the return value of the stack_switch instruction previously executed on B will be in_payload.
  4. Execution continues on stack B.
  5. If we ever switch back to stack A, the value out_payload above (i.e., the return value of the stack_switch executed when leaving stack A) is the payload argument passed to the corresponding switch.

A few additional notes:

  • store_context_ptr and load_context_ptr can be seen as pointers to what is conceptually a three-element struct, containing SP, FP, PC.
  • The pointers store_context_ptr and load_context_ptr are allowed to be equal. In particular, in steps 1 and 2 above, we ensure to actually load all required data from load_context_ptr before storing to store_context_ptr.
  • As mentioned above, there are two cases in step 2: We either switch to code where a matching stack_switch was executed, or to a new stack
    • If we switch back to (right behind) a previous stack_switch instruction, then regalloc has spilled all subsequently needed SSA values for us, no need to manually restore any context besides SP, FP, PC.
    • If we are executing on a new stack, we assume that execution starts inside a stack-switch aware trampoline
  • Payloads are currently hard-coded to be a single, word-sized value. That may seem arbitrary, but it's simply what's needed for our current implementation of Wasm stack switching on top of this CLIF instruction. It could later be extended to more general cases, if necessary.
  • The stack switching implemented here is "one-shot": If you execute a stack_switch to switch from a stack A to a stack B, we can only switch back to A once (unless we subsequently execute another stack_switch on A again).
    This is different from setjmp/longjmp, where we may store the context once using setjmp and then return to it multiple times using longjmp without needing to call setjmp again.

@frank-emrich
Copy link
Contributor Author

I'm marking this as "ready for review" now, but I have a few questions about things that will probably require additional fixes before this is actually ready:

  1. Currently, the instruction is lowered by very simple rules in codegen/src/isa/x64/lower.isle. However, the emitted code is only valid for x64 Linux, not x64 Windows.
    This means that I should probably do something similar to function calls, where my instruction is lowered using an external gen_stack_switch function. What's the cleanest way to simply refuse lowering on x64 Windows for the time being?
  2. The layout of the data read and written by the instruction is described in the new file codegen/src/isa/x64/inst/stack_switch.rs, given that the information in that file is platform-specific. However, I'm not sure if that's the right place for that file. I imagine that the content of that file will either be split up in the future, or move somewhere else to collect the layout information for all platforms.
  3. When emitting code for the new instruction on x64, we need two temporary registers. Currently, this is done without regalloc (see my comment in emit.rs): If I try to add any early def to my StackSwitch MInst to get temporary registers, regalloc2 fails with TooManyLiveRegs. I suspect this is just a result of the rather unique set of constraints on StackSwitch. I'm curious if there's an existing issue I could mention, a more elegant workaround than what I'm currently doing, or if I should open a new issue regarding this in the regalloc2 repo.
  4. In instructions.rs, I'm giving stack_switch all kinds of scary side effects (namely, .other_side_effects(), .can_load(), .can_store()), given that it continues execution elsewhere, where arbitrary things may happen.
    I'm wondering what a more accurate description of the side effects would be. The .call() side effect seems related, but I was unsure if using that would have unintended consequences.
  5. I've added a filetest for the new instruction but it turned out quite monstrous: It uses stack_switch in two functions where we create enough SSA values to exceed the number of available registers before the switch, and then use all of them after the stack switch. I've manually verified the generated assembly, but I'm wondering if people prefer having a shorter test case that makes it easier to verify the assembly for the stack switching itself.
  6. My StackSwitch MInst takes its inputs as Gprs. These values all end up in registers during code emission anyway, but I was wondering if it would be more idiomatic to make the MInst use GprMem or RegMem, then deal with moving things into registers during emission.

@frank-emrich frank-emrich marked this pull request as ready for review August 5, 2024 15:27
@frank-emrich frank-emrich requested a review from a team as a code owner August 5, 2024 15:27
@frank-emrich frank-emrich requested review from abrown and removed request for a team August 5, 2024 15:27
@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:x64 Issues related to x64 codegen cranelift:meta Everything related to the meta-language. labels Aug 5, 2024
@fitzgen fitzgen requested review from fitzgen and removed request for abrown August 6, 2024 15:54
@fitzgen
Copy link
Member

fitzgen commented Aug 7, 2024

Super excited for this! Haven't taken a look at the actual code yet, but here are some answers to the questions in your comment above.

However, the emitted code is only valid for x64 Linux, not x64 Windows. ... What's the cleanest way to simply refuse lowering on x64 Windows for the time being?

Will the lowering work for all posix right now or only Linux? Will macos work, for example? We should be precise about the correctness condition here.

Cranelift doesn't generally care what OS it is targeting beyond calling conventions and a few other ABI details here and there, and AFAIK we've never needed OS-specific lowering rules before, so this is sort of untrodded ground.

A TargetIsa does have a triple method, which returns a target_lexicon::Triple, which in turn has an operating_system member. You could try plumbing that stuff through to an external partial constructor for use in ISLE if-lets.

The layout of the data read and written by the instruction is described in the new file codegen/src/isa/x64/inst/stack_switch.rs, given that the information in that file is platform-specific. However, I'm not sure if that's the right place for that file. I imagine that the content of that file will either be split up in the future, or move somewhere else to collect the layout information for all platforms.

I think describing the layout of that data would be best done in the documentation for the new instruction itself.

I haven't looked at the actual code in this PR, but I'd expect that the only platform-specific bits would be pointer size, and otherwise we are always saving SP, optionally FP when frame pointers are enabled, and PC. Is that assumption incorrect? Are we, or will we be, saving more/fewer values on different platforms?

I also would expect that we wouldn't actually need to explicitly define all the details of this data and its layout. I'd imagine we would only say that it has a given size and alignment, and is otherwise only valid to be manipulated via the stack_switch instruction. That is, it should be opaque to the host.

But maybe something like stack walking means that the runtime needs to be able to peek inside this data, and we can't keep it opaque to the host?

When emitting code for the new instruction on x64, we need two temporary registers. Currently, this is done without regalloc (see my comment in emit.rs): If I try to add any early def to my StackSwitch MInst to get temporary registers, regalloc2 fails with TooManyLiveRegs. I suspect this is just a result of the rather unique set of constraints on StackSwitch. I'm curious if there's an existing issue I could mention, a more elegant workaround than what I'm currently doing, or if I should open a new issue regarding this in the regalloc2 repo.

I think we've had similar-ish issues opened in the past for regalloc2. None the less, I'd recommend opening an issue there and we can dedupe it if necessary. That will also get eyes on the problem from folks who are more familiar with our register allocator than I am.

In instructions.rs, I'm giving stack_switch all kinds of scary side effects (namely, .other_side_effects(), .can_load(), .can_store()), given that it continues execution elsewhere, where arbitrary things may happen.
I'm wondering what a more accurate description of the side effects would be. The .call() side effect seems related, but I was unsure if using that would have unintended consequences.

I think we also want to have the call() side effect because, for these instructions, we will want to do call-related things like

  • clear all memory aliasing info, assuming that external code we "call" could alias anything
  • treat the instruction as a GC safepoint, spill GC refs to the stack, and emit a stack map for this location
  • etc...

I've added a filetest for the new instruction but it turned out quite monstrous: It uses stack_switch in two functions where we create enough SSA values to exceed the number of available registers before the switch, and then use all of them after the stack switch. I've manually verified the generated assembly, but I'm wondering if people prefer having a shorter test case that makes it easier to verify the assembly for the stack switching itself.

I don't think having a big filetest is a problem in itself per se, but I think we should also have a very basic filetest that is effectively just the stack_switch instruction and its disassembly. Basically, a function that takes all the stack_switch operands as arguments and returns the out_payload, and is just a single basic block with a single stack_switch instruction. This gives us a smoke test for the basic lowering and its disassembly.

My StackSwitch MInst takes its inputs as Gprs. These values all end up in registers during code emission anyway, but I was wondering if it would be more idiomatic to make the MInst use GprMem or RegMem, then deal with moving things into registers during emission.

Things like GprMem are useful for x64 instructions like add where one of the operands can naturally be either in memory or a register. When we are lowering an add with a memory operand, we don't force the memory into a register during emission, we emit an instruction that operates directly on the memory. It wouldn't make sense to take a GprMem operand for add if the x64 instruction(s) we emit can't operate directly on memory.

So for the x64 MInst.StackSwitch instruction, my questions would be:

  1. Can the instruction sequence it emits operate directly on memory, or must they have the operand in a register?
  2. Is the emitted instruction sequence accessing the value once or multiple times?

If the answer to these questions are "can operate directly on memory" and "accessing the value once" then I'd say a GprMem makes sense. Otherwise, I think a Gpr is the way to go.

FWIW, it is also fine to start with just Gpr operands and then investigate whether it makes sense to upgrade to a GprMem in follow up PRs.

Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! I think that, with the guard rails where we only lower this instruction on target OSes for which it will work, and the various nitpicks below addressed, this will be good to merge.

Comment on lines 920 to 954
"stack_switch",
r#"
Suspends execution of the current stack and resumes execution of another
one.
The target stack to switch to is identified by the data stored at
``load_context_ptr``. Before switching, this instruction stores
analogous information about the
current (i.e., original) stack at ``store_context_ptr``, to
enabled switching back to the original stack at a later point.
The size and layout of the information stored at ``load_context_ptr``
and ``store_context_ptr`` is platform-dependent.
The instruction is experimental and only supported on x64 Linux at the
moment.
When switching from a stack A to a stack B, one of the following cases
must apply:
1. Stack B was previously suspended using a ``stack_switch`` instruction.
2. Stack B is a newly initialized stack. The necessary initialization is
platform-dependent and will generally involve running some kind of
trampoline to start execution of a function on the new stack.
In both cases, the ``in_payload`` argument of the ``stack_switch``
instruction executed on A is passed to stack B. In the first case above,
it will be the result value of the earlier ``stack_switch`` instruction
executed on stack B. In the second case, the value will be accessible to
the trampoline in a platform-dependent register.
The pointers ``load_context_ptr`` and ``store_context_ptr`` are allowed
to be equal; the instruction ensures that all data is loaded from the
former before writing to the latter.
"#,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhere in here we should mention the one-shottedness of this instruction, and how resuming a context twice can result in UB due to spilled values being overwritten by the first resumption, etc...

It might even be worth naming this instruction one_shot_stack_switch.

I think we should also clarify that, while this instruction performs loads and stores, those memory operations are always assumed to be aligned, non-trapping, and otherwise valid. This instruction performs no validation itself. It is as if these memory operations had MemFlags::trusted() attached to them. Therefore, it is the user's responsibility to ensure that these assumptions are upheld.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've extended the documentation of this instruction now. Still not specifying the actual layout itself, but mentioned the one-shottedness and the fact that the instruction does not check the pointers or data.

Comment on lines +1747 to +1749
// Note that we do not emit anything for preserving and restoring
// ordinary registers here: That's taken care of by regalloc for us,
// since we marked this instruction as clobbering all registers.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fantastic hack

cranelift/codegen/src/isa/x64/inst/stack_switch.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/isa/x64/pcc.rs Outdated Show resolved Hide resolved
@frank-emrich
Copy link
Contributor Author

Thanks for your answers and looking at the code! I'll look into the things you suggested.

In the meantime, some more thoughts regarding the layout of the contexts: On most platforms, the context can indeed look like this:

pub struct ControlContext {
    pub stack_pointer: *mut u8,
    pub frame_pointer: *mut u8,
    pub instruction_pointer: *mut u8,
}

However, there are two reasons for why the layout may be different somewhere:

1. Additional info needed on Windows

On Windows, I think we would have to update data inside the "Thread Information Block" (the stuff we briefly talked about with Ryan Hunt at the Pittsburgh CG meeting). I haven't looked too deeply into it, but it looks like it contains pointers to the beginning and end of the currently active stack, which we would need to update when switching. That means that I suspect we would have to add these to the ControlContext on Windows.

In any case, it looks like we would need Windows-specific lowering rules for x64, just to emit the logic for updating the Thread Information Block.

2. Frame pointer walking

There's another issue that I've briefly mentioned in a comment in the new file stack_switch.rs, but haven't provided much detail on (mostly because it requires dumping a bunch of extra info on you, sorry!):

I made sure that the ControlContext above is laid out in a way so that it can appear inside a frame pointer chain. This is so that for stack switching approaches where you do have parent-child relationships between stacks, these are reflected in the stack trace you get just from frame pointer walking, by creating a single continuous chain crossing stacks. In other words, tools like lldb show you the frames of the active stack, but also the frames in its parent.
(There's been a recent development in the Wasm stack switching subgroup making it very likely that at least for Wasm stack switching, we will end up with a proposal where there are parent-child relationships between all the active stacks.)

Concretely, in our implementation of Wasm stack switching, this is achieved like this:
Imagine we have allocated memory from 0x1000 to 0xB000 which we want to use for a new stack S. We store a ControlContext at the very top of that allocation at 0xAFE8, with the actual stack space starting below it. When starting execution in the new stack, we will then have RSP = 0xAFE0 in the trampoline that kicks off execution inside stack S (e.g., the trampoline calls the Wasm function that we actually want to run inside that stack).

The main idea is now that we make sure that the ControlContext at 0xAFE8 always contains the information about the parent of stack S, meaning that whenever we switch from some other stack X to S, we write information about stack X (which then becomes the parent of S) into the ControlContext at 0xAFE8.

The last missing ingredient for creating the frame pointer chain is that when we are running the trampoline that kicks of execution inside stack S, we set RBP to 0xAFF0, which is the address of the frame pointer field inside the control context.
This means that while the trampoline is the bottom-most stack frame inside S, when passing the trampoline while walking the frame pointer chain we actually get to where X stack_switch-ed to S: At 0xAFF0 we stored the frame pointer of X, and the layout of ControlContext puts the PC saved for X right next to it, exactly where lldb and friends would expect it.

Long story short: It's quite neat to make sure that ControlContext has just the right layout to enable this trick. It allows users of the stack_switch instruction to get frame pointer walking across stacks entirely for free "simply" by carefully laying out where the contexts are actually stored and setting RBP correctly. If they don't care or implement a stack switching approach where there are no parents, nothing is lost. But to make this possible, the layout of the ControlContext suddenly depends on not just the ISA, but whatever your platform dictates about stack layout (in particular, where to find the return address relative to the frame pointer).

Luckily, the layout above (i.e., frame pointer right next to PC) works on all platforms supported by Cranelift, except s390x: On the latter, there's an offset of 14 words between where the FP and PC are stored.
So on that platform my plan was to make the ControlContext look like this:

pub struct S390XControlContext {
    pub frame_pointer: *mut u8,
    pub stack_pointer: *mut u8,
    padding: [*mut u8; 12],
    pub instruction_pointer: *mut u8,
}

(Or just give up on frame pointer walking there)

So at least it seems that within the same ISA, all OSes sufficiently agree on the stack layout so that the frame pointer walking causes no extra hassle.

Documenting the layout

I also would expect that we wouldn't actually need to explicitly define all the details of this data and its layout. I'd imagine we would only say that it has a given size and alignment, and is otherwise only valid to be manipulated via the stack_switch instruction. That is, it should be opaque to the host.

But maybe something like stack walking means that the runtime needs to be able to peek inside this data, and we can't keep it opaque to the host?

Yes, the layout of the ControlContext can be kept mostly opaque, except for stack creation. As long as you only create and consume these contexts with stack_switch, you really don't need to know anything about them besides their size and (boring) alignment requirements.

The only situation where you do need to know about the layout is when initializing a new stack: You need to create a corresponding ControlContext that when stack_switch-ed to executes the right trampoline on the new stack. That's why my idea was basically to leave details about the layout of the context out of the documentation of the instruction in instruction.rs, but at least have some platform-specific documentation elsewhere.

Alternatively, we could add a stack_init instruction, or instructions that act like getters and setters on the context.

@frank-emrich
Copy link
Contributor Author

I've implemented the restriction to only lower stack_switch on Linux now: Strictly speaking, it's each individual OS rather than the ISA that may require additional things needing to be done when switching stacks (see my remarks on Windows in my huge previous comment). I do believe that my current implementation would actually work on x64 macOS, but haven't tested it. So I would just stay on the safe side and make the support OS-dependent (instead of, say, allowing anything Unix-like), with the only allowed one being Linux for now.

I've implemented this using a partial constructor on_linux. The controversial bit may be that I'm now copying the Triple into every machinst::Callee. Is that acceptable?

@fitzgen
Copy link
Member

fitzgen commented Aug 9, 2024

only allowed one being Linux for now.

SGTM

The controversial bit may be that I'm now copying the Triple into every machinst::Callee. Is that acceptable?

It doesn't seem ideal. Do we not already have access to a &dyn TargetIsa in the LowerContext?

@fitzgen
Copy link
Member

fitzgen commented Aug 9, 2024

Do we not already have access to a &dyn TargetIsa in the LowerContext?

The IsleContext contains a B: LowerBackend which for x64 is a X64Backend which contains a Triple already, and all the other LowerBackend implementations also contain a Triple. I think we could add a fn triple(&self) -> &Triple trait method to LowerBackend and then reuse that in the IsleContext's implementation of the partial constructor.


(rule (lower (stack_switch store_context_ptr load_context_ptr in_payload0))
;; For the time being, stack switching is only supported on x64 Linux
(if (on_linux))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no other place in Cranelift that looks at the target OS when determining how to lower clif ir. Even TLS handling chooses the right variant for the target OS using a codegen flag rather than by looking at the OS in the triple. Using the right calling conventions is done by the producer of the clif ir.

Copy link
Member

@cfallin cfallin Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree here: philosophically, this is if not quite literally calling-convention related, certainly like an OS-interface detail that should be "baked into" the CLIF (rather than implicit in lowering) as CLIF is ordinarily explicit about such details. Can we check the triple in the Wasm translation (i.e., the wasmtime FuncEnvironment hook called by cranelift-wasm, or wherever else this instruction is generated) and fail if on the wrong platform?

This is also a little more future-looking in the sense that there may be Wasmtime details at some future point related to stack-switching (e.g., what if we add our own stack-protection mitigations or have some custom kind of stack-growth scheme or ...) -- we wouldn't want to hardcode that into Cranelift lowering in the same way OS details are here.

Copy link
Contributor Author

@frank-emrich frank-emrich Aug 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to recap the context sprinkled through this PR: The reason why we need to do something platform/OS-specific here eventually is that Windows requires us to update parts of the Thread Information Block when switching stacks. Unfortunately, what exactly needs to be done is undocumented, so I've not done it, yet. The "plain" stack switching I implemented in this PR was supposed to work only on Linux, but from what I can tell should work on x64 macOS, too.

@bjorn3:

Following what you said, I guess one approach would be to make the lowering of stack_switch more similar to that of tls_value. I'd imagine this would look as follows:

  • Create a StackSwitchMode enum with Default and UpdateTib variants.
  • Encode a value of that type in the backend's shared_settings::Flags and also add it as a field to the StackSwitch MInst.
  • When emitting code for StackSwitch, check the value of the StackSwitchMode flag and act accordingly.

I'm also happy to have more than one MInst for stack switching, and lower the stack_switch CLIF instruction to one of them, based on the value of StackSwitchMode. That would mirror tls_value more closely, but I'm inclined to avoid that: The core stack switching code is always the same, we just sometimes need to emit some extra code on top of that.

Finally, what I described above would be the medium-term solution once I've implemented Windows support. For the time being I would just add a None variant to StackSwitchMode, use that on Windows, and panic if we see it when emitting code for StackSwitch.

@cfallin:

Can we check the triple in the Wasm translation (i.e., the wasmtime FuncEnvironment hook called by cranelift-wasm, or wherever else this instruction is generated) and fail if on the wrong platform?

Did you mean for this to be just a solution for the current issue of me wanting to fail if not on Linux? Or did you mean to also use this approach (i.e., resolving differences at the Wasm -> CLIF stage) in the future when we want to generate slightly different code on different OS-es?

In the latter case, are you suggesting to have multiple stack switching CLIF instructions, for the case with and without the TIB update, then choose between them during the Wasm translation? Or are you suggesting to have a single CLIF instruction for the stack switching itself, but then emit some additional CLIF on Windows (say, some additional loads and stores) around the stack_switch during the Wasm translation?

I fully agree that for things like stack growing there's a good chance that in the future, we'd need some more customization of the generated code (e.g., customize stack probing, function preludes, ...).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean for this to be just a solution for the current issue of me wanting to fail if not on Linux? Or did you mean to also use this approach (i.e., resolving differences at the Wasm -> CLIF stage) in the future when we want to generate slightly different code on different OS-es?

Both, I think -- basically, any sort of difference of behavior on that level should be reified in the CLIF, rather than implicit; that's how we've handled things like TLS and other platform dependencies.

In the latter case, are you suggesting to have multiple stack switching CLIF instructions, for the case with and without the TIB update, then choose between them during the Wasm translation? Or are you suggesting to have a single CLIF instruction for the stack switching itself, but then emit some additional CLIF on Windows (say, some additional loads and stores) around the stack_switch during the Wasm translation?

One or the other, depending on what is actually needed? I don't have enough context to actually specify the full design; that's something we can discuss further; only that we should make it explicit somehow. If one platform requires a superset of the work that another platform does, factoring out the common bit as one instruction and adding more logic at the CLIF level seems reasonable. On the other hand it's totally reasonable to have stack_switch_windows and stack_switch_sysv instructions IMHO; again see how we did TLS, with separate instructions for ELF and Mach-O cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand it's totally reasonable to have stack_switch_windows and stack_switch_sysv instructions IMHO; again see how we did TLS, with separate instructions for ELF and Mach-O cases.

Ah, I think that's where my confusion came from: For TLS, there is a single CLIF instruction, which is then lowered one of several per-platform MInsts (but based on a flag in the backend, not some ad hoc OS check like I did). I'm happy to do it like that, which would be similar to what I described in my response to @bjorn3.

I'd prefer that over moving the TIB update out of the stack_switch (CLIF) instruction itself.

@github-actions github-actions bot added the cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. label Aug 9, 2024
@frank-emrich frank-emrich requested a review from a team as a code owner August 14, 2024 12:42
@frank-emrich frank-emrich requested review from alexcrichton and removed request for a team August 14, 2024 12:42
@frank-emrich
Copy link
Contributor Author

I've reworked the platform dependence logic based now, inspired by what happens for TLS:
There's a new enum StackSwitchModel with values None, Basic, and UpdateWindowsTib. A value of that is saved in the backend Flags.
This value is then used when lowering a stack_switch CLIF instruction: If None, we cannot lower them. If Basic, we lower to stack_switch_basic. In the future, it will be the case that UpdateWindowsTib causes lowering to a new dedicated MInst (like StackSwitchUpdateWindowsTib), but currently this will behave like None (i.e., we fail to lower).

@frank-emrich
Copy link
Contributor Author

@alexcrichton For some reason the automated reviewer assignment was triggered again, maybe because this PR is now touching a non-Cranelift file.

@github-actions github-actions bot added the wasmtime:api Related to the API of the `wasmtime` crate itself label Aug 14, 2024
@alexcrichton
Copy link
Member

Do others have more thoughts on this? This all looks reaosnable enough to me to land, but I'd want to be sure to run by others too.

Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one final nitpick below

Comment on lines 877 to 878
(decl stack_switch_model (StackSwitchModel) Type)
(extern extractor infallible stack_switch_model stack_switch_model)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than making this an extractor from a type, can we make it a partial constructor?

Then the use would look like

(lower (stack_switch store_context_ptr load_context_ptr in_payload)
       (if-let (stack_switch_model) (StackSwitchModel.Basic))
       ...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, but you meant (if-let (StackSwitchModel.Basic) (stack_switch_model)), right?

Is there a particular reason for making it a partial constructor? Following what's happening for TLS I gave my StackSwitchModel enum a None variant to indicate that no model was set, but that means that the stack_switch_model constructor can be total, or am I missing something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be wrong, but I had thought that you can only use partial constructors in if-lets. Might be worth removing the None variant from StackSwitchModel, if so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried it, and turning stack_switch_model into a total constructor seems to just work.
I don't understand the magic mechanism for configuring what goes into the backend Flags in detail, but from what I can tell it does not allow you to store something logically equivalent to Option<StackSwitchModel> in there, so keeping the None variant in StackSwitchModel itself seems like the way to go.

@fitzgen
Copy link
Member

fitzgen commented Aug 21, 2024

I think this is in a good enough place that we can land it and then continue with any further improvements in follow ups. Thanks @frank-emrich!

@fitzgen fitzgen added this pull request to the merge queue Aug 21, 2024
@frank-emrich
Copy link
Contributor Author

Oops, it looks like I responded to your comment and added a new commit in parallel to you approving things. But I think it should be uncontroversial, I kept None and made stack_switch_model total.

@fitzgen
Copy link
Member

fitzgen commented Aug 21, 2024

Yeah, LGTM

Merged via the queue into bytecodealliance:main with commit dbc11c3 Aug 21, 2024
49 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:x64 Issues related to x64 codegen cranelift:meta Everything related to the meta-language. cranelift Issues related to the Cranelift code generator wasmtime:api Related to the API of the `wasmtime` crate itself
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants