-
Notifications
You must be signed in to change notification settings - Fork 694
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: Stack Walking #1340
Comments
This feels like a distraction to work around speccing a host provided GC. I can see why this would be useful for some runtimes that want to port to WASM but I don't think this is worth investigating at this time. Maybe this makes sense in a non-web context but for hosts that already have their own GC this will cause enormous problems. In particular, for a host that doesn't have a precise GC already this is an extraordinary new overhead. Then they're all the problems that come with a platform that has two GCs interoperating, which is hard to get right even if both GCs were built with the other in mind. Then, even after we solve all of that, we'd still have to convince our security team that allowing malicious code to stack walk in a process that has your bank account number in memory is a good idea. Thus, I think this committee would be much better served by focusing on a GC proposal. I understand that existing runtimes have very specific requirements but, frankly, there's no way I can see this shipping in the foreseeable future... |
For implementing Erlang green threads, we would really like the stack switching primitives. Can we get those without the parts that @kmiller68 finds objectionable? |
Ah, this is definitely in need of clarifying. This proposal is intended to be orthogonal to the GC proposal. In fact, until the very last feature—first-class stacks—the proposal makes no use of references whatsoever. The garbage collection we are talking about here is solely garbage in linear memory, i.e. the garbage that the host and the GC proposal currently has no intention of helping collect. There is also no expectation that the host's garbage collector would interact with this proposal at all. The proposal primitives themselves make no mention of garbage collection—garbage collection of linear memory just happens to be a known application of these primitives. @kmiller68, does that clarification address your concern? |
Any of these stack marks require some way to enumerate all of them, correct? For runtimes with a conservative GC that's not something we track and would be a significant change. Right now, JSC just scans the C stack for anything that could be a pointer into the heap without worrying about if that frame is C++/JS/WASM, or even where frame boundaries are. None of our WASM pipelines have to know anything about where a value ended up being stored. So this proposal introduces an enormous VM overhead that didn't exist before. This is the biggest concern I have. A very large percentage of DOM APIs require a JS object as a parameter. So, a developer needs to handle references in both directions, effectively meaning there are two GCs. The only other realistic option is to paint on a canvas, which is probably even worse from the end users perspective (e.g. accessibility, performance, etc). We have history of this with Flash. If we're going to expose API in WASM that enables a GC, I strongly want that to be the browser's GC first because those are integrated into the host already. The web platform as a whole already has a story for how this will work cleanly. I worry that if we go the other way and make this the first option it dramatically raises the bar for the true GC proposal as it now must significantly better than this for developers to adopt it. |
Enumeration of executable stack marks is done through stack walking and through code ranges, a generalization of the process used to find the handler for a given thrown exception. The host's GC does not need any awareness of them, for the same reason it (presumably) ignores exception-handling tables. It occurs to me that you might be thinking "executable" means something like a closure, and therefore something that might have references that need to be accounted for during garbage collection, but it's simply a compile-time-constant code pointer. It's hard for me to describe the implementation in a little text box, but I'm happy to chat with you about it. I think your concerns have already been addressed and are one reason I chose this particular design strategy over other variants. As for linear-memory garbage collection, it is just one of many applications of stack primitives. It is also one of the easiest to implement using stack primitives. So we cannot make the other applications available without also making it possible for a module to use these primitives to implement its own linear-memory garbage collector. It's even the application that can be most easily implemented without these primitives. I hear your concern about competition with the GC proposal. I also agree that the GC proposal should be the encouraged target for GC languages because it facilitates better cross-module memory management. But I am less concerned about the competition for a few reasons. One is precisely because it works better across modules, which on its own will prompt people to prefer the GC proposal. Another is that the GC proposal should be easier to target for many GC languages. And one more is that host-managed GC can be specialized to the machine at hand, like 32-bit vs. 64-bit, and does not have to operate behind wasm's safety, like scanning the stack directly rather than having to look for and execute stack marks via code ranges. There is also an opportunity for the proposals to assist each other. In particular, the GC proposal(s) do not currently have a great plan for "complex" initialization patterns, such as constructors in Java and C#, especially across abstraction boundaries. I've been sketching out how one of the concepts from this proposal, namely stack-bound types (an example of which is stack-allocated closures), can be used to enable such complex initializations safely and across abstraction boundaries. And in the other direction, features like first-class stacks need host-managed garbage collection. In the end, although both these proposals facilitate garbage collection, I believe they serve fairly complementary domains, and developers will simply target the means that most naturally fits their domain rather than jump through substantial hoops to use the other. Does that reasoning jibe with you? |
It seems to me that technologies that give developers options tend to win, whereas trying to force developers to only follow one "right" way to do it never ends up well, especially if it's not the way they'd prefer. While there is overlap, these are not competing proposals, there is significant amount of things you might want to bring to Wasm that are outside the realm of native GC that this technique could help with. Also, I don't understand the "we should focus on the GC proposal".. we've been staring at the GC proposal mostly unmodified for 2+ years now, it is not like mere discussion of other potential Wasm features would take away time from its progress. |
As a general principle, I strongly believe we should have competing proposals. If the new proposal wins spec-developer mindshare, it's a sign that the old proposal was lacking. If someone magically comes up with a new, better Interface Types proposal that wins across all axes, I want that to ship. I'm not attached to our current proposal, I'm attached to solving the problem. If the new proposal solves the same problem and wins out on time-to-market, that's good news. In general, proposals should compete on their own merits. If stack walking doesn't interop as nicely with host GCs, that's a point of non-overlap that is left unsolved. But if developers think that's good enough and don't find the lack of interop to be a problem, then it's good enough and we've made the world better. If we then never get around to implementing the GC proposal because there isn't enough incentive because we solved enough of peoples' problems, then we have solved their problems. |
Besides the issue of competition, it is important to make sure that this proposal wouldn't significantly complicate engine implementations. @kmiller68, should we video chat about this and afterwards write up what we think would clarify our intentions/perspectives here? I'm worried that text boxes aren't the best medium for getting on the same page about implementation details and concerns. |
@kmiller68, Filip, and I had a through discussion about this proposal with a number of important points considered. Here is a summary of the primary takeaways:
All of these takeaways resonate with me. Note that in this discussion we focused specifically on stack marks and walking. We also focused on just understanding the proposal and relevant concerns, rather than what to do about them. |
As for (2), I believe there is potentially a significant cost for a language to move most/all of its values from the Wasm stack to a shadow stack, as you're missing out on many LLVM optimisations (e.g. Mem2Reg) and there being a big speed (and code size) difference between having all your values in locals/wasm stack vs accessing each of them over loads/stores with access to This cost is especially problematic for languages where the bulk of the code has no need for a shadow stack, but it would be required just to support a single feature (like the ones enabled by stack walking mentioned above). C++ currently only uses it where needed, which is infrequent in C++. For a language that would need access to all pointers this way would be a lot higher. A shadow stack is a nice escape hatch to allow a language implementation to do "anything" that Wasm currently doesn't support, but it is a high price to pay, and will put an even bigger distance performance wise between it and "efficient" languages like C/C++/Rust. Stack Walking could fix that. |
One could go even further than this; and ask the question 'why does WASM even have local variables?' Everything could be done by manipulating memory -- shades of FORTRAN. |
Why do the variables need to live in the shadow stack? For example, couldn't the LLVM raising to WASM just flush variables to that frame's shadow frame at escape sites. IIRC, from JSC's days compiling to LLVM, it's capable of tracking the location of any variable for you as long as you are careful enough. |
@kmiller68 I don't see how that is of any help if you need access to potentially all (pointer) variables for use in stack-walking. They'd all have to be on the shadow stack. |
@aardappel Isn't that exactly what the WASM VM would have to do but on the native stack? To clarify, my proposal was not talking using the shadow frame as the source of truth for the function (I think this is called a Henderson Frame but it's been a while). I'm only saying that whenever a store happens to a variable you also store a copy to the shadow frame. This is functionally what the WASM VM is going to do anyway. For what it's worth, I also think this is no more complicated than tracing which variables are needed for the stack walking API. In fact, I would probably use the same abstraction to do both in my compiler. The only optimization I can think of that stack marking can do, which shadow frames cannot, would be to figure out where callee save registers got saved in the callee. But doing that would potentially mean spilling every callee save on any host call (native or JS). It could also make walking the stack slower since you'll need to maintain callee save information. So it's not 100% clear if tracking callee save registers would actually be profitable, although it could be. |
I suggest that we're not in a position to reasonably predict the performance tradeoffs of the two alternatives, given how drastically different they are and how prior implementations have not been under wasm's type-safety constraint. Further, performance was only one of the concerns raise, and I personally am more interested in the other concerns. Now, here is the question I have been pondering since making this issue: do we split this proposal into parts or keep it whole? On the one hand, adding stack marks and stack walking could be done "relatively" quickly, so peeling that off for quicker release is appealing. On the other hand, the feature that stack marks and stack walking is most likely to affect is first-class stacks, so keeping them together so that we can ensure they all get along is appealing. Plus, some of the vocal excitement has been specifically about first-class stacks. (All that said, even if we keep it whole, I would want to make sure stack walking would not depend on first-class stacks so that engines can choose to support the former lighter-weight construct and not the latter heavier-weight construct.) And, as what may seem like a digression, there's another question I've been pondering for months: what dynamic language would be the best test of the Garbage Collection proposal? My sense is that the answer to that question is the Lisp/Scheme/Racket family. Its complex array of values and its heavy use of dynamic tests and casts seems ideally fit for evaluating the GC proposal with respect to its weakest aspects: the inability to control low-level representation and the ever-imposing safety constraint. The problem is: the Lisp/Scheme/Racket family needs continuations! For that same reason, the Lisp/Scheme/Racket family would be ideal for testing this proposal. Plus, it's the continuation-marks feature of these languages—and the related implementation, control, and language/module-interop research—that inspired much of the design of this proposal. But again, the problem is: the Lisp/Scheme/Racket family needs garbage collection! So my ideal test language for both proposals needs both proposals. Furthermore, both proposals stand to benefit from each other. For example, first-class stacks need host-managed garbage collection, and linear-memory garbage collection needs (or at least benefits from) stack walking. So, in my ideal world, these two proposals would progress roughly alongside each other, i.e. be no more than a phase apart. Maybe that ideal is impractical—y'all would know better than I and I'm happy to go with your collective advice—but I wanted to at least put it out there. My sense is that it would at least address a number of concerns that have been raised about both proposals. (Side note: none of this addresses the security concern. For that, it occurs to me that the host could implement the proposal with a shadow stack of stack marks, something the host could do probably much more efficiently than the wasm module because it's not limited by wasm's abstractions and type safety.) |
If shadow stacks are such a good idea, why don't all compilers use them when targeting machines like arm and x64?. (Rhetorical question) |
Some feedback from the Go side here on this proposal. I've been working on an alternative Go compiler (TinyGo) which also targets WebAssembly, hence my interest. There are three things that are currently not well supported in WebAssembly that make it hard to target this instruction set. All three are somewhat related to this proposal:
|
This is also how we would prefer to implement Erlang processes for Lumen as it more closely matches the native stack switching we can do when targeting x86_64. Currently we do WASM processes by keeping the instruction pointer stack in a Rust Vec for resumption while doing nested calls when we think we won’t blow the WASM stack. |
@kmiller68 I mentioned a few reasons why I think using GC pointers on the shadow stack could be significant slower (and bloated) than on the Wasm stack, but I agree with @RossTate that we don't know until we try. @RossTate I'd personally be very much in favor of treating stack walking separate from first class stacks, as indeed it sounds like much simpler machinery that has its own use cases. The existing GC proposal (@rossberg's one) seems to highly favor statically typed languages, so I don't see how the Lisp family of languages would be a good test case. I'd expect one out of Java/C#/Go/Kotlin/OCaml/Scala would work best? |
My understanding is that dynamically typed languages are hoping to target wasm just as much as statically typed languages. If they're not served well by a design (which I'm not claiming here is the case for the existing GC proposal), or if they could offer useful novel perspective on how to improve a design, then it would be good to know that and hear that perspective earlier rather than later. The GC proposal currently states that it should support dynamically typed languages, I'm guessing for this reason. And my sense is that the Lisp family would offer the most novel perspective amongst the many dynamically typed languages, especially since we intrinsically already have the dynamic OO perspective from JavaScript even if we don't evaluate the GC proposal on JS. |
Somewhat offtopic, but the last time I looked at the GC proposal it lacked a very important feature for Go, which is interior pointers. Even if they were implemented as fat pointers, that would still be a massive improvement over the current proposal (both in performance and in implementing support for it in TinyGo). (But please note that I am only working on TinyGo, it may very well be that the main Go toolchain doesn't want to use the WebAssembly GC). |
Oh, that's useful feedback. Thanks! Also, you reminded me of another aspect of Go that's relevant to this discussion. Go uses pointer-escape analysis to determine if something should be allocated on the stack or on the heap. Now suppose the GC proposal's type for heap references, I'll use I should also note that both interior pointers and the use for stack-bound types are not exclusive to Go. C# also has interior pointers, and C# has a pass-by-reference |
It doesn't yet include specific features to optimise dynamic languages, but it also shouldn't get in the way of dynamic languages. I would definitely want it to be evaluated on some not-too-crazy dynamic language like Scheme, for example.
Yes, interior references are a described as a post-MVP feature. They would probably be among the first features to follow. |
This is not specified by the language, it's purely a heuristic to make programs faster. The spec doesn't even mandate a GC. It is perfectly valid (but slow) to allocate all values on the heap. In fact, I suspect that this will be required when using the GC proposal, as a callee cannot accept both heap and (shadow-)stack allocated values. (Function local variables can of course be kept in wasm locals or on the operand stack). In summary I think the added complexity of |
3 years passed since this proposal was posted and 6 years since initial GC proposal. And we still don't have any viable option except for shadow stack. I think there are several disadvantages in Wasm GC:
And the only advantage of GC here is interoperability, that's true. However this issue was never obstacle for developers to interoperate with native code on native platforms. Yes, it requires more efforts, but performance matters. And yes, there are plenty of use-cases of drawing everything on a single canvas. So I don't see any reason to ignore stack walking proposal just because of GC spec. I think WebAssemly could provide both features. |
I agree that roll-your-own GC is going to remain an important use case. With regard to built-in support for stack walking, however, nobody has put in the effort of proposing a sufficiently concrete design, as far as I can tell. And it is highly non-obvious what a solution would look like that
I'd be very curious to see a concrete proposal. |
@rossberg what are the problems here? I don't see any. For example, one could:
As for stack switching, it's not necessary should be compatible with stack walking at the very first version. Many proposals aren't perfect (e.g. exception handling), which does not prevent from implementing them in the browsers. |
@rossberg GC is also incompatible with threads. Technically, it is, but there are only synchronization primitives on flat memory, not on objects. So if I'm working on JVM for WebAssembly and I want to use this GC spec, I end up with single-threaded version. For me it's ok for now, my JVM is single-threaded. But in my case I also would be satisfied with single-threaded stack walking. I don't see any problem to extend single-threaded stack walker to support multiple threads in the future. |
There is in fact a proposal for this: https://github.com/WebAssembly/root-scanning/blob/main/proposals/root-scanning/Overview.md (and #1459). It was approved for Phase 1, albeit contentiously. However, afterwards some key shortcomings were identified in discussions due to its coarseness and dynamic nature, and significant questions were raised about how it would interop with stack switching. I have since developed a design that addresses the shortcomings, but it is currently impossible to address the questions about interoperating with stack-switching when there is so much uncertainty about the design of stack-switching. For what it is worth, I have provided a design for stack-switching that would interop very cleanly with my design for root scanning (see WebAssembly/stack-switching#40, where a "resumption type" can easily be extended to include information for root scanning), but I cannot speak to whether this design will be adopted. |
As an author of JVM over WebAssembly I would never benefit from stack switching, I simply don't need it. However, I would benefit from stack walking. Can I get stack walking incompatible with stack switching (e.g. WebAssembly module verification does not pass when these features are used simultaneosly)? |
@konsoletyper, "compatible" doesn't necessarily mean that the combination is available from the beginning, but that there is a sufficiently clear path towards enabling it. That's e.g. the case for concurrent GC, which no doubt Wasm will include in the future. For stack walking, that path is less obvious. That doesn't imply that the problem is unsolvable, but it will need a champion to work it all out, demonstrate the claims, and push through a proposal. (FWIW, never say "never", Java 19 is already incubating structured concurrency and lightweight threads based on delimited control, as part of the longer-term Project Loom.) |
Based on feedback from the discussion at the CG meeting, @fgmccabe and I would like to propose adding stack-walking primitives to WebAssembly based on the primer in WebAssembly/exception-handling#105. For those new to the topic, the following is a high-level description of the proposal.
Many high-level languages keep the stack as an abstract entity that is heavily used for implementing the language but which the programmer rarely directly manipulates. This proposal is for a small suite of primitives that enable language implementers to provide specific functionalities that depend on access to the stack but without compromising the security guarantees provided by WebAssembly. Examples of enabled functionalities include better support for language-provided garbage collection, better support for debugging, better support for exception handling, support for stack-allocated closures, support for simple yield and resume computations, and, in contemplated future extensions, support for stack switching and coroutining.
The core of the stack-walking functionality is centered around two features. The first feature is executable stack marks, which associate stack frames with executable code. This executable code can, for example, indicate what line of source code the stack frame corresponds to, can inform which 32-bit integers on the stack frame correspond to addresses that should not be collected as garbage, and can display the source-code-relevant contents of the current stack frame and even update the contents of the stack frame. The second feature is a framework for stack walking, which enables WebAssembly code to walk up the stack to find and execute these stack marks.
The stack-walking functionality imagined does not permit arbitrary inspection of stack frames nor of arbitrary locations on the stack. Rather, the regions of the stack that are intended to be inspected are declared as part of the stack-mark primitives. This is so that stack walking can be done in a safe manner within WebAssembly, keeping in line with WebAssembly’s existing security considerations, and furthermore can be designed independently of features such as memory-managed references, maintaining compatibility with low-resource engines.
We propose establishing a new activity to clarify and specify these primitives and to construct a formal proposal for adoption by the WebAssembly community.
The text was updated successfully, but these errors were encountered: