
Sketch: Linear memory GC with IT #78

Closed
aardappel opened this issue Feb 21, 2020 · 20 comments

@aardappel

aardappel commented Feb 21, 2020

This sketches an alternative way in which GC could in theory work in Wasm. It is probably crazy, but bear with me; it may have some merit.

NOTE: this is NOT meant to be a "proposal"; it is merely a sketch of an idea, with most details left open, that puts emphasis on different advantages compared to the existing proposal(s) and is intended for discussion. If some of these aspects are deemed important, then maybe they can inform future GC features.

I will focus on these aspects of what would make for a good GC provided out of the box by Wasm:

  1. Allow modules to have access to an industrial strength GC at zero code size cost.
  2. Allow different languages (and the host) to exchange data easily.
  3. Keep cost of GC as local and scalable as possible.
  4. Allow existing language runtimes/codegen to interact with the Wasm GC.

This sketch keeps (1), arrives at (2) through different means, and improves upon (3) and (4) compared to the existing GC proposals.

The basics of the idea are simple: A linear memory program/runtime assigns certain 64KB pages to the Wasm GC. The Wasm GC manages this memory as it sees fit. The linear program can address this memory, but only has minimal guarantees on what layout to expect. Inter module communication uses Interface Types, not anyref. A special kind of i32 provides exact roots.

Now, the details. This trivially retains (1), so let's tackle (2) & (3) first:

Inter-module data exchange.

The existing GC proposal(s) assume it is desirable that different modules can share references to GC objects freely, without incurring the cost of copying, and with the benefit of being able to collect inter-module cycles trivially. There are however issues with this approach:

  • The existing GC proposals do not define language "objects"; they define low-level GC building blocks (structs and arrays) that can be used to represent them. Different languages may have wildly different representations, especially when it comes to e.g. vtables and other implementation techniques. This means that two languages typically cannot access objects they receive from each other directly; they can merely refer to them. For, say, Java and Haskell to exchange an object, they'll likely have to go through some form of serialization anyway. Even in the context of browsers, current JS objects (and existing JS APIs) have completely different semantics and representations from GC "structs" (yes, I am aware of the JS typed objects proposal, which will help only in limited cases). We already intend to access host APIs with IT.
  • Different modules holding on to each other's memory is an anti-pattern, not a feature, the way I see it. "Leaking" memory in a GC'd program is very easy, and being at the mercy of every other module you interact with not to cause leaks in your module seems like a very brittle scenario. We cannot rely on programmers to write code that does not leak, so shielding modules from each other's leaks at the module level allows for more predictable and stable systems.

So instead, we'll use Interface Types. The potential for Interface Types (with Shared-Nothing-Linking) to revolutionize how we build software out of parts, and how we can work with different languages is huge. If most interfacing with GC language modules is going to go over IT, we might as well go all the way, and get some of the benefits this would bring:

  • GC being a per-module thing may have great performance benefits. Most GC algorithms do not scale linearly with the size of the data being managed; having tens (or hundreds) of modules, each with a small, independent GC space, allows them to be collected more quickly (with less "lag") than collecting everything at once, allows collection to trivially happen on independent threads, can even allow different GC algorithms/configurations per module that suit the language, etc.
    • It may be able to use simpler, faster collector algorithms: collecting all objects across modules and multiple threads requires incremental and/or generational designs that come at additional cost in terms of barriers that need to be checked and synchronized, something that a collector for a smaller space (with control of threading) possibly can do without.
    • It may be more suitable for Wasm runtimes that are memory constrained (embedded and mobile).
    • It allows the cost of memory use to be assessed and controlled on a per-module basis, rather than the "tragedy of the commons" that could result in a many-module, single-GC-space scenario.
  • Each module has a lot of freedom in designing (and changing) its data layout, since there are no constraints to match what other languages are doing for exchange reasons, and no need to stay compatible over time.
  • Emphasis on copying rather than sharing in API design will in general result in simpler, more general APIs. Some of the cost of copying will be offset by the fact that once the data lands in another module, it is fully under that module's control. It will make it easier to swap out modules with new algorithms, languages, and language implementations.
  • Cross-module references (identity of objects) should be discouraged as much as possible, but where needed they are better off becoming a feature of IT, which would also make them more general (more useful in exchanges between all languages, including non-GC ones). There will always be reasons why you'd want to establish some form of identity across modules, but this is better left to a high-level feature (which is very much opt-in) than to a low-level feature like anyref that affects every possible object and can't be opted out of.

Now for (4):

Runtime interaction.

The vast majority of current language implementations come with some form of "runtime", usually written in C or C++. Besides the GC algorithm (which we're trying to save them from having to ship), this can include other support functions that implement the runtime semantics of the language, implementations of the built-in functionality, and general standard library implementations. Typically all that code will access (and create) GC objects directly. Under the existing GC proposals all this code would need an almost full rewrite from C/C++ into a language that is Wasm-GC aware (i.e. one that can emit Wasm-GC object access/creation opcodes directly, to be efficient). There may be ways for this to be retrofitted onto C/C++ (using intrinsics?), but how this would flow all the way through the LLVM pipeline is an open question. Even if we had this functionality, it would still require a significant rewrite of a language runtime.

Will language implementers go thru such a rewrite? Or will they throw in the towel, and port their existing GC to linear memory, not using a standard Wasm GC at all?

There is also the general question of how well a standard, generic Wasm GC will cater to all languages. It is well known that many languages have intricate GC designs whose functionality intertwines with the language's promised semantics. Arguably, letting the runtime interact with a Wasm GC directly in linear memory would expand the cases where such a GC can be used instead of porting an existing GC. And given that the GC is per-module, it can possibly be parametrized to fit the language, rather than collecting across languages generically, which cannot take anything about a particular language into account.

How would this GC/runtime interaction go? First, the linear memory program needs to assign pages to the GC, which it could do either manually, or we could also imagine a more managed system where the upper part of memory is always for the GC (such that the GC could grow it independently of the linear program, though possibly making memory.grow more expensive if the GC memory needs to be moved). I am sure there are many possible ways to do this, for now assume there exist linear memory addressable GC pages.

The GC would need to know the roots outside of its GC space, which I imagine would come in two forms: references on the Wasm stack, and optionally a table, managed by the linear program, of additional roots. These roots can be used by the linear program as naked pointers (i32), so they wouldn't be anyref. To help the GC be exact (non-conservative), we'd introduce a new (sub)type of i32 that, when present on the Wasm stack or in a table, would be a root (let's say i32r). This type would degrade to i32 when stored in (unmanaged) linear memory, and would then not guarantee that the pointee is retained (it is an "untraced ref" that can only be used if the linear program guarantees there is at least one active i32r on the stack/tables or i32 in the GC pages).

Existing load/store ops can be used to access these objects, using either an i32 from linear memory or an i32r from the stack. We'd define what a GC struct and array must look like in the GC pages of linear memory, which, given their low-level nature, doesn't sound like too much of a restriction on GC implementations (any additional data a GC wishes to store can live at negative offsets).
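
To make that a bit more concrete, here is a rough C-level sketch of how a linear-memory runtime might view such an object. All names, the header layout, and the type-index encoding are invented for illustration; the only thing assumed from the text above is "defined fields at non-negative offsets, GC-private data at negative offsets":

```c
/* Illustrative only: a runtime's-eye view of a GC object in the GC pages.
 * The header contents and sizes are invented; the GC owns this layout. */
#include <stdint.h>

typedef struct {
    uint32_t type_index; /* which struct/array type this object is (hypothetical) */
    uint32_t gc_bits;    /* mark bits, array length, etc. -- GC-private data      */
} gc_header_t;           /* imagined to sit at a negative offset from the object  */

/* An "untraced ref": a plain i32 address into the GC pages. Only valid while
 * some i32r on the stack/tables (or an i32 inside the GC pages) keeps the
 * object alive, as described above. */
typedef uint32_t untraced_ref;

/* Field access compiles down to ordinary i32.load/i32.store on linear memory. */
static inline uint32_t read_u32_field(uint8_t *mem, untraced_ref obj, uint32_t off) {
    return *(uint32_t *)(mem + obj + off);
}

static inline gc_header_t *object_header(uint8_t *mem, untraced_ref obj) {
    return (gc_header_t *)(mem + obj - sizeof(gc_header_t));
}
```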

The GC makes NO promises as to the state of memory outside of well defined parts of structs and arrays, much like it is already undefined behavior to even access memory outside of a malloc returned block in C/C++.

i32r values can possibly be rewritten by the GC to point elsewhere, invalidating any "untraced ref" copies in linear memory. For this to function there needs to be a minimal level of cooperation between the linear memory program and the GC as to when these changes may happen. How, TBD.

There is of course the ability for the linear program to access GC memory, and even corrupt it. This does not seem like a problem from a language point of view: almost all programming languages "trust" their runtime code already, and all have the potential for bugs in said code to disrupt the otherwise safe execution of the language. Given that this is now isolated to a single module, the impact of this is smaller than if there was a bug in a Wasm runtime, affecting all languages within.

It does have the downside of requiring a slightly more conservative Wasm GC implementation, i.e. i32s loaded from linear memory will need to be bounds-checked, whereas the existing GC proposals can directly treat anyref as a pointer. I think this is a small cost compared to the potential for optimisation this new system otherwise brings (see above). It generally goes along with the existing Wasm philosophy of intra-module trusted, inter-module untrusted which IT builds on, and which existing GC proposals complicate.

A final problem may be non-determinism (but only for "buggy" programs): access to GC memory outside of designated objects could result in different data under different Wasm implementations. We'd have to specify that this is undefined behavior?

@kripken
Member

kripken commented Feb 22, 2020

I think this is very interesting! I like how it leaves cross-language stuff to Interface Types, and how it might help port VMs.

To make sure I fully understand: is the idea that the compiled VM, when it allocates GC objects, allocates them in these special GC pages? Then when the wasm VM does a GC, it traces from the roots through objects in those pages, using a layout that is agreed upon (I assume that's what you mean by "We'd define what a GC struct and an array must look like in the GC pages of linear memory"). Then it would somehow inform the compiled VM of which objects can be freed, but not touch memory itself?

If that's generally correct, it sounds good to me!

Some questions:

  • Do we actually need special GC pages? Given the i32r values which are the roots, the wasm VM can do GC that potentially touches any part of linear memory. What's the benefit of marking some pages as GC? (maybe detecting a bad/corrupt pointer early, but is that worth it?)
  • I think each GC object in linear memory needs an identifier of what its layout is, so the wasm GC can trace it? (And for an array, a length.) Or is there a way to avoid that?
  • Could this be extended to support moving GC somehow? Without that, fragmentation in linear memory is worrying.
  • Could this be extended to support threads?

@RossTate
Contributor

This is definitely an interesting idea, but after playing with it for a while it needs some fixing up.

First of all, it's not enough to just know the roots. You need to be able to walk the gc-allocated objects and determine what is transitively reachable from the roots. But to do that you need to be able to distinguish references from integers. So the first change is that you need all of this gc memory to be i31rs so that the gc can reserve a bit to distinguish pointers and data. But even that's not enough, since you're relying on adding integer offsets to references. So the engine needs to know, given an i31r, what range of addresses it points to. The simplest way to do this is to allocate arrays of i31rs, rather than preset page sizes, and have an i31r always reference the head of the array. Now you have something that's precisely garbage collectable. Furthermore, it no longer needs to be the case that a reference is an integer, so its semantics can be deterministic.

Of course, there's a big downside, which is that i32/f32 fields would have to be split across two i31r fields, and similarly i64/f64 would have to be split across three i31r fields, though one could consolidate the overflow bits of these fields into one i31r field via some bit arithmetic. A smaller downside is that it's no longer possible to support efficient CAS on i32 fields, which a number of concurrent data structures rely on (if ever we get to that point).

Given that, an alternative design is to, at each allocation, specify which indices in the array are references and which are i32s. In addition to storing the size of the array in some header info, the GC could store a bitset tracking this invariant. Then whenever we access/mutate an index in a mixedref, we specify whether it is supposed to be an i32 or a mixedref index, and the GC dynamically checks this expectation with the bitset in the header.
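
For concreteness, here is roughly what I mean, sketched in C with invented names and layout; the engine would own this header and perform the check on each mixedref access:

```c
/* Sketch of a per-allocation header: length plus a bitset recording which
 * 32-bit slots hold references. Names/layout are illustrative only. */
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t length;       /* number of 32-bit slots in the allocation */
    uint32_t ref_bits[];   /* bit i set => slot i holds a reference    */
} mixedref_header;

static bool slot_is_ref(const mixedref_header *h, uint32_t i) {
    return (h->ref_bits[i / 32] >> (i % 32)) & 1u;
}

/* The dynamic check: does the kind the access claims (reference vs. i32)
 * match what was recorded at allocation time? */
static bool access_ok(const mixedref_header *h, uint32_t i, bool expect_ref) {
    return i < h->length && slot_is_ref(h, i) == expect_ref;
}
```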

In both these designs, we're still using integers to index into things like the funcref table or the "external reference" table, so it will still be impossible to detect cycles across these different kinds of references. Similarly, we'll have to add an i31r/mixedref table for anchoring references that linear memory has a handle on (via an index into this i31r/mixedref table).

How do those patched designs sound to you @aardappel?

@skuzmich

Inter-module data exchange.

Couldn't all of this (local GC scope, API stability, leak prevention) be replicated in the current proposal by just not sharing references across modules? I feel like you could have all the same benefits, but without field-access bounds checks, and with the ability to opt out of this restricted mode and share objects when needed.

Runtime interaction.

On the flip side, there are a lot of language implementations that don't compile to machine code ahead of time. They use the built-in mechanisms of virtual machines like JS or the JVM to allocate memory for them. They would have to start maintaining a C/C++ runtime if you require them to deal with linear memory.

@lars-t-hansen
Contributor

I've been thinking about this problem too and I'm glad to see I'm not alone :-) Speaking narrowly about the GC implementation bits, some attempts to summarize and comment:

  • roots are designated by the new i32r type (for locals, eval stack slots, and globals), plus an explicit programmatic root table (add/remove root), probably containing addresses in linear memory that are thereby declared to hold i32r
  • in-object i32r slots are designated by object descriptors which are attached to the object at allocation time by the GC in response to some description given to it by the program, and the GC can attach those descriptors however it likes ("at negative offsets" from the object ptr was mentioned; BiBoP is clearly a possibility too; for small+closed type universes pointer tags would be nice)
  • moving collection is explicitly supported, but:
  • the story for safe points is pretty vague
  • the story for interior pointers is missing, but one big motivator for this type of GC work is specifically to support interior pointers, plus interior pointers will appear as a matter of course by doing pointer arithmetic on i32r for various optimization purposes both in the wasm code (LLVM hoists) and in the engine (the JIT hoists)
  • the story for tag bits is missing, yet one big motivator for this type of GC is to allow pointer tagging for languages that favor that
  • the story for barriers for incremental/concurrent/parallel/generational collection is missing ("can possibly do without"). We were discussing 4GB heaps just the other week, and bigger heaps than that are on their way, so dismissing non-stop-the-world collection is not a great idea. In addition, the appeal of bringing a production-quality GC to wasm is its support for just these techniques.

(The use of the wording "weak ref" is also unfortunate; "untraced ref" might be better.)

Given that we store i32r as i32 there will need to be some kind of downcast functionality for i32 -> i32r (whether a cast instruction or something embedded in a load instruction; it amounts to the same thing). Thus it is possible for some references to live as non-references in locals / on the eval stack for stretches of the program; this ties into the need for a robust safepoint mechanism.

@RossTate
Contributor

Will language implementers go thru such a rewrite? Or will they throw in the towel, and port their existing GC to linear memory, not using a standard Wasm GC at all?

To bolster @skuzmich's point above, it is unrealistic to expect this code to work in wasm. Much of the complexity of run-time systems has to do with features that wasm either doesn't support or takes over, such as JITing or dynamic loading of code. Even your lower-level proposal here doesn't support the functionality I've used for my toy research languages, so I'd have to make a separate wasm implementation for them anyway rather than reuse our current LLVM-based implementation.

GC being a per-module thing may have great performance benefits.

From what I understand, y'all want wasm modules to be able to have references to JS/DOM values. Once this is possible then you now need a multi-module GC. Similarly, if there is any way to import/export function pointers that are closures, then again you need multi-module GC.

Each module can have a lot of freedom in designing (and changing) their data layout, since there are no constraints that we'd try to match something other languages are doing for exchange reasons, or having to stay compatible over time.

This seems to be more of an argument against anyref than multi-module GC.

Emphasis on copying rather than sharing in API design will in general result in simpler, more general APIs.

Yes, but we should not throw out sharing of mutable data structures or large immutable data structures altogether. For example, with GC it would be easy for WASI to provide an mmap function that returns a GC array of i32s, which msync then flushes any mutations of to file.

Note that none of these comments are about the proposed design sketch; I'm just trying to clarify the motivation here.

@aardappel
Author

@kripken Yes, that is mostly correct, except that the "freeing" is something that the Wasm GC can do itself, since it is entirely in control of what gets stored where in the GC pages. Thus the linear program doesn't need to be informed.

Your Qs:

  • For maximum freedom in how GCs are implemented, "pages" seemed like a useful restriction that could allow some optimisations in bounds checking, but it is not strictly required; in theory it could be any block of memory handed out by the linear program. Or, like I mention above, it could be under the control of the Wasm GC more directly by requiring it to be the top of linear memory.
  • Yes, mapping an actual i32r to its layout in a GC page will need some mapping to its type. That could be an index at a negative offset or similar. There may be cases where that can be avoided, such as when the GC implementation decides to store objects of a similar type adjacent in memory, or when using information in the type system such as "field x of type T always points to a U".
  • I was hoping it would support moving GC, yes. If a GC is only triggered by an explicit call to do so, then the linear program would know when any i32rs might change value, and thus when any temporary i32s derived from them would be invalidated (see the sketch after this list).
  • I haven't given threads a lot of thought, I admit. My assumption is that it will be easier than for the existing proposals, given that this only needs to coordinate for a single memory, but there may be ways in which it is worse due to its more "typeless" nature.
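
On the moving-GC point above, a rough sketch of the discipline the linear-memory runtime would have to follow, assuming a hypothetical explicit collection trigger (all names invented): untraced i32 copies of a root are only valid between collections.

```c
/* Illustrative only: gc_collect and root_to_addr are hypothetical. */
#include <stdint.h>

extern void     gc_collect(void);            /* explicit trigger; objects may move  */
extern uint32_t root_to_addr(uint32_t root); /* i32r root -> current linear address */

void example(uint32_t obj_root) {            /* obj_root is conceptually an i32r    */
    uint32_t addr = root_to_addr(obj_root);  /* untraced copy; safe to use for now  */
    /* ... ordinary loads/stores through addr ... */
    gc_collect();                            /* the object may have moved here      */
    addr = root_to_addr(obj_root);           /* so the address must be re-derived   */
    (void)addr;
}
```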

@aardappel
Author

@RossTate I guess I wasn't clear in that I was assuming we'd have typed definitions of structs and arrays similar to the current proposals, such that once the GC knows it is looking at a particular type of struct in the GC pages, it also knows how to distinguish between refs and scalars inside of it (see also my reply to @kripken above).

What you are talking about would be a design where we don't have to rely on such definitions, but where the GC works generically on blocks of N refs (and M scalars, possibly always in that order), which could be even more flexible, especially for more dynamic languages that may decide how to lay out an object at runtime. This may well be superior, given that we can't use static type information entirely statically anyway. The minimum header information would be storing N and M, and it wouldn't have to distinguish between structs and arrays?

I generally like this direction, yes, but I'll need to think about the consequences some more.

@aardappel
Author

@skuzmich Yup, if from the above discussion we conclude that "local GC has many benefits over global GC", and we enable that optionally, by default, or even always in the existing proposal, that would be a great outcome.

Re: not compiling to machine code ahead of time: Wasm requires that currently (we have no known viable JIT path yet). Every Wasm program will always have some linear memory, and I don't think "allocate and manage an object for me" is much different between the existing GC proposal and what I am sketching here.

@aardappel
Author

@lars-t-hansen:

  • Pages per type would be a good implementation technique, yes.
  • Agree that it is hard to say how safe existing C/C++ runtime code would be in the presence of a moving GC, even with explicit GC triggers. One advantage of what I am suggesting here is that each module could potentially request a customized GC, so it could request a non-moving implementation. This would of course complicate Wasm implementations if they want to provide both (assuming moving is faster).
  • Good point about interior pointers being unavoidable. That may work with "typed pages" for structs, given that it is easy to compute the start of a struct, though for arrays that wouldn't be so easy.
  • My guess is pointer tagging should still be possible. We currently assume something is a pointer either because it has the i32r type or because it resides in GC pages at a particular location. There's no reason why bits in that i32 couldn't change the meaning of what we do with this "pointer" (a tiny tagging sketch follows this list).
  • True, doing concurrent GC is potentially harder when you're sharing the data directly with the linear program. I'm not sure if barriers are even possible.
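
For example, a tagging scheme along these lines (the encoding is entirely made up; the point is only that the low bit of an i32 "pointer" is free for the language to interpret):

```c
/* Illustrative low-bit tagging on i32 "pointers": low bit 1 = small integer,
 * low bit 0 = address into the GC pages. Encoding is invented. */
#include <stdbool.h>
#include <stdint.h>

static inline bool     is_small_int(uint32_t v) { return (v & 1u) != 0; }
static inline uint32_t tag_int(int32_t i)       { return ((uint32_t)i << 1) | 1u; }
static inline int32_t  untag_int(uint32_t v)    { return (int32_t)v >> 1; } /* arithmetic shift, as i32.shr_s gives */
static inline uint32_t untag_ptr(uint32_t v)    { return v; }               /* low bit is already 0 */
```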

(I'll edit the original post to use "untraced ref")

@aardappel
Author

@RossTate

To bolster @skuzmich's point above, it is unrealistic to expect this code to work in wasm. Much of the complexities of run-time systems have to do with features that wasm either doesn't support or takes over, such as JITing or dynamic loading of code. Even your lower-level proposal here doesn't support the functionality I've used for my toy research languages, so I'd have to make a separate wasm implementation for them anyways rather than our current implementation using LLVM.

Essentially you are saying languages will need a rewrite of their runtime anyway to target Wasm? And what would you write that new runtime in? (since it needs some combination of minimum overhead while still being GC instruction aware).

If you're right, then indeed the argument for direct linear program <-> GC interop goes away. It is my impression that it will not be so easy to replace existing runtimes, however.

From what I understand, y'all want wasm modules to be able to have references to JS/DOM values. Once this is possible then you now need a multi-module GC. Similarly, if there is any way to import/export function pointers that are closures, then again you need multi-module GC.

It is all about defaults. If even the smallest objects in each module, in each language, and in the host can all arbitrarily refer to each other, then yes, you need a global GC for cycles not to become a huge pain. I feel this is the wrong granularity of sharing, though, and thus the wrong default. If the default is (semantic) copying, and the vast majority of cross-module exchanges happen this way, then the opportunity for cycles we're not aware of is vastly reduced (if cross-module refs are something you need to explicitly opt in to). We still need such refs, especially for big, expensive objects like textures etc., but these would be much more explicitly managed. Giving even the smallest struct this power lifts the pains of sharing and leaks to the level of a many-module soup, where it is hard to know who is responsible for the problems.

Yes, but we should not throw out sharing of mutable data structures or large immutable data structures altogether. For example, with GC it would be easy for WASI to provide an mmap function that returns a GC array of i32s, which msync then flushes any mutations of to file.

Agreed (see above).

@aardappel
Author

From the feedback I've gotten so far, this idea really contains 2 parts, which could be considered separately:

  1. The idea that each module and/or memory having its own separate GC space (with exchange of data across them going almost entirely through Interface Types) has advantages over the assumed global GC of current proposals.
  2. If you have (1), it now becomes feasible to expose GC memory as part of linear memory, allowing runtime linear memory code to interact more naturally with the GC.

It seems so far that (1) may have more merit, and that (2) has some tricky issues to solve, and potentially some bigger downsides as a consequence.

It may be worth discussing these separately, but for now maybe it is enough to be aware that (1) does not need to imply (2).

@skuzmich

@aardappel there are different types of WebAssembly ports with vastly different needs. From what I see, there are two ends of the spectrum of approaches that can be pursued by language implementers when targeting Wasm:

  1. Port the whole native language platform. This would allow existing libraries and applications to run as is.
    Most language implementations would pay some overhead because they were not designed for the Web requirements of small code size, soundness validation on the client side, and JS interop.
    Implementations would not want to rewrite their runtime, including their own implementation of GC. Shipping the actual GC code may not be the major problem compared to the size of the rest of the runtime. They would use the same low-level tool-chain, like LLVM, as they do for native code.

  2. Adapt to the Web platform by sacrificing some of the original platform's features. Only very portable or conditionally compiled code can be shared between the original platform and Wasm.
    The runtime library would be small or nonexistent. GC would be used everywhere instead of linear memory (maybe except for some static data).
    Instead of LLVM, they would use a code generator with an appropriate level of abstraction (Binaryen?).

In the case of Kotlin, we have a partial implementation of (1), but we are moving towards (2). In the new prototype, the compiler backend is rewritten to target Wasm specifically. A small runtime library is implemented in Kotlin itself with some Wasm-specific intrinsics.

@RossTate
Contributor

And what would you write that new runtime in? (since it needs some combination of minimum overhead while still being GC instruction aware).

This is something I have been wondering about. See below for one thought on this.

If you're right, then indeed the argument for direct linear program <-> GC interop goes away. It is my impression that it will not be so easy to replace existing runtimes, however.

I wouldn't say it goes away; my sense is that it is moved. There's no reason linear memory and GC cannot interoperate in either of the GC proposals. Wasm can have both an i32 and a ref $scheme on the stack, and one instruction can use the former to index into linear memory, followed by another instruction that uses the latter to look up some field. A program might need some table of gcref $schemes to store some references in a way that is indexable by i32s, and even that could just be a global gcref variable pointing to an array of ref $schemes, but that's about all that has to be added to wasm itself to enable interop. The real issue is generation and tooling.

What would be great is for someone to make a C/C++ library of macros or primitive types/operations that get translated down to new LLVM types/operations and then to GC types and operations. Then I could write C/C++ code that actually operates on GC references. If I'm careful to only use variables on the stack, I could even make the compiled code not need linear memory.
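
Purely as a sketch of the flavor I have in mind, a tiny header of such primitives; none of these names exist in clang/LLVM today, and the bodies would come from the toolchain lowering them to GC instructions:

```c
/* gcref.h -- hypothetical; every name here is invented.
 * The toolchain would lower these declarations to wasm GC instructions
 * (struct.new / struct.get / struct.set), so reference-typed values only
 * ever live in locals and arguments, never in linear memory. */
typedef struct wasm_gcref wasm_gcref;                    /* opaque reference type */

wasm_gcref *gc_struct_new(unsigned type_index);          /* ~ struct.new          */
unsigned    gc_get_i32(wasm_gcref *obj, unsigned field); /* ~ struct.get          */
void        gc_set_i32(wasm_gcref *obj, unsigned field,  /* ~ struct.set          */
                       unsigned value);

/* Runtime code could then manipulate GC objects without touching linear memory: */
static inline unsigned example(void) {
    wasm_gcref *pair = gc_struct_new(0 /* hypothetical type index */);
    gc_set_i32(pair, 0, 42);
    return gc_get_i32(pair, 0);
}
```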

So my sense is that your point (2) is really a tooling problem and not a gc-wasm problem.

@tlively
Member

tlively commented Feb 25, 2020

What would be great is for someone to make a C/C++ library of macros or primitive types/operations that get translated down to new LLVM types/operations and then to GC types and operations. Then I could write C/C++ code that actually operates on GC references. If I'm careful to only use variables on the stack, I could even make the compiled code not need linear memory.

This is something we've been thinking about. At the very least we will want to make it possible to write .s assembly files that use GC types and instructions in functions that can be passed to clang and linked with normal C/C++ code. If we can solve the problem of modeling GC types in LLVM IR and other layers of LLVM, I would also like to expose as much of the proposal as possible as source-level intrinsics and primitive types. But I would rather not make changes to LLVM IR itself, for example by adding new types, so this is going to be an interesting problem.

Whether interop requires hand-written assembly or calling intrinsic functions, there will still be significant porting effort required.

@aardappel
Author

@RossTate

So my sense is that your point (2) is really a tooling problem and not a gc-wasm problem.

If you're just loading the value of a field, and that code has to be hand-crafted anyway since there is no way to do it in unmodified C/C++, then whether it is done with an existing load instruction into linear memory or a new instruction reaching outside linear memory doesn't make a lot of difference, indeed.

But you can do a lot more when things are linear-memory addressable, including passing on (parts of) that memory to existing C/C++ code. New field-loading instructions would have to create/allocate copies in linear memory to allow that, and while copying at the granularity of module boundaries is probably mostly fine, a lot of runtime code is extremely performance sensitive. Besides efficiency, copying also loses identity/mutability, which may matter in some cases.

To me the biggest difference is thus not in tooling, but in the runtime. It may also make tooling easier: for example, in the existing proposal the tools will have to guarantee that under no circumstances can an anyref land in linear memory, which will be challenging given LLVM's design. This can be somewhat relaxed if the GC is in linear memory, though there are challenges there too.

@RossTate
Contributor

I put together a primer on low-level stack and control-flow primitives. One of the first examples is how to support marking of GC roots, in case y'all are interested. WebAssembly/exception-handling#105

@aardappel
Author

To be clear, what @RossTate is proposing in WebAssembly/exception-handling#105 would allow more efficient linear memory GC with a GC written in Wasm itself, whereas the current issue allows more efficient linear memory GC with a GC provided by the engine. Both ideas share reasons why having GC objects in linear memory might be desirable (access by the language runtime, etc).

The diversity of current GC implementations and language implementations is enormous, so I personally think both these options are worth pursuing, and possibly providing in parallel. Offering more options to language implementers is what will make the Wasm eco-system thrive.

@aardappel
Author

One thought on the forgeability of GC roots: the current issue proposes that roots on the stack/in locals would be exact, but since GC linear memory is accessible by regular stores, a GC would need to do bounds/type checks to perform an actual GC. This favors the linear memory runtime having unfettered access, and thus simplicity of implementation.

A different tradeoff is possible, however: make the GC linear memory pages readable by the Wasm code, but writable only by special new instructions. This would allow the GC to operate at full speed without checks. The Wasm code can perform stores using arguments directly from the stack/locals (which are exact), or there may be a special cast instruction i32 -> i32r that does bounds/type checks, so the cost of this is isolated to where it is needed (when the Wasm program temporarily needs to store roots in linear memory, which may be unavoidable since some of this depends on compiler code generation, e.g. for reference or struct arguments).
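
What the hypothetical i32 -> i32r cast would do engine-side, sketched in C (how GC page bounds are tracked and how object starts are recognized is entirely up to the implementation; these names are invented):

```c
/* Illustrative engine-side check for a checked i32 -> i32r cast: the cost
 * is paid once at the cast, not on every access to the GC pages. */
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t gc_pages_start;                   /* first byte of the GC-owned region */
    uint32_t gc_pages_end;                     /* one past the last GC-owned byte   */
    bool   (*is_object_start)(uint32_t addr);  /* engine-internal object map        */
} gc_space;

/* Returns true if addr may be blessed into an exact root (i32r);
 * the real instruction would trap instead of returning false. */
static bool try_cast_to_root(const gc_space *gc, uint32_t addr) {
    if (addr < gc->gc_pages_start || addr >= gc->gc_pages_end)
        return false;                          /* not inside the GC pages               */
    return gc->is_object_start(addr);          /* must point at the start of an object  */
}
```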

@aardappel
Author

aardappel commented Jun 1, 2020

In the interest of tomorrow's GC meeting, for those who do not want to read all of the above, here's a shorter summary of the salient points this issue is making.

First, as is apparent from the discussion, there are 2 parts to this idea, of which the first can be evaluated independently:

Idea A: Having many modules written in many different languages communicate over interface types may well be superior to holding on to each other's references.

  • This allows separate, isolated GCs per module, which may have performance benefits (GC parameter choices per module, being able to distribute collecting modules over time rather than all at once, reducing GC stalls, clearly identifying memory use per module etc)
  • GC "structs" are not a data type, they are the "assembly language" upon which each language will create their own objects: these language objects will not be compatible with eachother anyway, and will require some form of common serialization to be communicated. This means the usefulness of having a single GC space is just anyref, not direct data exchange. IT is what provides that data exchange.
  • Holding on to references can produce memory leaks, and the idea that a bug in one module can cause unbounded memory usage in an unrelated one (one it was never even tested with) is very undesirable. Since IT defaults to copying, individual modules can be more isolated and robust.

Idea B: IF we're going to have per module GC, there is the opportunity to make GC objects linear memory addressable.

  • Many language runtimes are big piles of C/C++ code, and with the current GC proposal we're asking language implementors to rewrite/replace that code, since any access to GC objects now needs to go through special GC instructions that C/C++ doesn't emit. For many languages, just porting their existing GC to linear memory would be a lot easier, though it brings along costs in runtime size and shadow-stack usage. The ideas in this issue could bridge these problems effectively.

If you want to discuss the above points, however, please make sure to have read the entire issue before doing so :)

@tlively
Member

tlively commented Jun 16, 2022

Since we're now at phase 2 with the current MVP, I'll go ahead and close this.

@tlively tlively closed this as completed Jun 16, 2022