Feature suggestion - event from the JS Garbage Collector #238

Closed
lskyum opened this issue Jun 29, 2015 · 33 comments

Comments

@lskyum

lskyum commented Jun 29, 2015

Suppose one would like to nicely wrap C++ objects in JavaScript objects, so they could be exposed to JavaScript developers. I suppose many uses for wasm will be exactly that - at least in the beginning, until Java/C# is available.

With Emscripten it is currently necessary to manually delete such objects, but it would be nice if one could just catch an event when the JS object is garbage collected.

I suppose this is really a feature request for a "rooting API" that allows weak references and can provide an event when an object is garbage collected.

@sunfishcode sunfishcode added this to the Future Features milestone Jun 29, 2015
@paleozogt

This has been my biggest problem with using emscripten (which is not really its fault). We definitely need some kind of "finalizer".

@kripken
Member

kripken commented Jun 29, 2015

There is a proposal for a next-event-loop finalizer type event in future JS. It should solve this problem.

@trevnorris

@kripken Has any type of spec for that been created, or is it just in the discussion phase?

@kripken
Member

kripken commented Jun 29, 2015

Not sure, I heard about it on Twitter from @BrendanEich.

@kg
Contributor

kg commented Jun 29, 2015

There are a couple very old strawman APIs for JS weak references but I don't think there's anything current. It's definitely my understanding that they are going in eventually, because someone figured out how to implement them without compromising JS's security priorities. (I've been campaigning for them for a couple years.)

@BrendanEich

I will put weak references on the July TC39 meeting agenda. The old proposal is here

http://wiki.ecmascript.org/doku.php?id=strawman:weak_refs

What's most needed is someone on TC39 to champion this proposal for inclusion in ES2016 or (more likely at this point) ES2017.

/be

@abustin

abustin commented Jun 29, 2015

So what is the current strawman for binding C and JS? Interested to hear people's thoughts on this.

Could WASM leverage existing JS engine primitives? Each engine has its own C API for creating JS function and object bindings with event callbacks (such as destroy). Could there be a common/generic JS engine abstraction lib for WASM to bind into the JS environment? Or would that be over-engineering the problem?

@trevnorris

@abustin In V8 you can use Persistent<> handles and set the weak callback. I do it all the time to clean up attached C++ class instances after the object is no longer in use.

As far as cross-VM compatibility, been having discussions in node's API WG about abstracting away from V8. The biggest blocker is lack of information around object and GC life cycle. Enough so that it's prevented progress.

@lskyum
Author

lskyum commented Jun 29, 2015

I think we need both weak references and finalizer events to solve this issue. If a wasm module should return an existing JS object it would need to reference it. The reference should be weak to allow GC.

I am not sure the proposed WeakMap will work here because the weak part is the key and not the value.
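
A minimal sketch of that limitation, assuming the wasm side identifies its objects by a numeric pointer (the names `wrapper`, `metadataByWrapper` and `wrapperByPointer` are made up for illustration). A WeakMap can hang data off a JS wrapper, but it cannot go from a pointer back to a weakly held wrapper, because keys must be objects and only keys are held weakly:

```javascript
// Hypothetical JS wrapper around a native object living at address 0x1000.
const wrapper = { ptr: 0x1000 };

// Works: the wrapper object is the weakly held key, the value is held strongly.
const metadataByWrapper = new WeakMap();
metadataByWrapper.set(wrapper, { ptr: wrapper.ptr });

// Does not work: a pointer is a plain number, and WeakMap keys must be objects.
const wrapperByPointer = new WeakMap();
let error = null;
try {
  wrapperByPointer.set(wrapper.ptr, wrapper);
} catch (e) {
  error = e; // a TypeError is thrown for the numeric key
}
```

So a pointer-to-wrapper lookup that still lets the wrapper be collected needs a weak reference to the value, which WeakMap does not provide.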

@trevnorris

@BrendanEich Was expecting an API that would notify when the object was no longer in use. With the API in the strawman would have to create a Set of WeakRef handles and iterate over them at intervals so any additional cleanup could happen. Feels like implementing an inefficient GC in JS.
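
The polling pattern described above might look roughly like the following. This is only a sketch: it uses the `WeakRef` class that was later standardized in ES2021 (not the strawman API linked in this thread), and `tracked`, `track`, `sweep` and `freeNative` are hypothetical names.

```javascript
// Each entry pairs a weak reference to a JS wrapper with the native pointer it owns.
const tracked = new Set();

function track(wrapper, ptr) {
  tracked.add({ ref: new WeakRef(wrapper), ptr });
}

// Walk every entry and free the native side of wrappers that have been collected.
// Returns the list of pointers freed during this pass.
function sweep(freeNative) {
  const freed = [];
  for (const entry of tracked) {
    if (entry.ref.deref() === undefined) {
      freeNative(entry.ptr);
      freed.push(entry.ptr);
      tracked.delete(entry); // deleting during Set iteration is safe in JS
    }
  }
  return freed;
}
```

Calling `sweep` from a timer costs a full pass over every live entry each time, whether or not anything was collected, which is the inefficiency being pointed out.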

@BrendanEich

WeakMap is not relevant. Tattoo this in your inner eyelids!

@trevnorris, please read the ecmascript.org wiki page about GC non-determinism. We are not going to expose a callback from the garbage collector's sweep phase, entailing post-mortem finalization and the possibility of resurrection as in Java. That is a non-starter.

The event loop helps here. At a later turn, some notifications based on definite finalization fire, without disclosing engine-specific GC schedules.
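
For readers arriving later: a mechanism of this shape was eventually standardized in ES2021 as `FinalizationRegistry`. The sketch below uses that later API rather than the strawman discussed here, and `NativeHandle`, `freeNative` and `freedPointers` are hypothetical stand-ins for a wasm wrapper and its `free` export. The callback never runs synchronously with the collection, may be delayed or never run at all, and receives only the held value, so the dead object cannot be resurrected.

```javascript
// Stand-in for a wasm export such as `free`; records what was released.
const freedPointers = [];
function freeNative(ptr) {
  freedPointers.push(ptr);
}

// The callback gets the held value (the pointer), never the collected object.
const registry = new FinalizationRegistry((ptr) => freeNative(ptr));

class NativeHandle {
  constructor(ptr) {
    this.ptr = ptr;
    // Third argument is the unregister token used by dispose().
    registry.register(this, ptr, this);
  }

  // Deterministic release; also cancels the pending finalization callback.
  dispose() {
    if (this.ptr !== 0) {
      registry.unregister(this);
      freeNative(this.ptr);
      this.ptr = 0;
    }
  }
}
```

Because the timing is unspecified, an explicit `dispose()` remains the reliable path and the registry acts as a safety net for wrappers that were simply dropped.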

/be

@trevnorris

@BrendanEich The only feature we depend on in node.js is to be notified when an object is no longer referenced in JS. It is used to clean up I/O handles in certain cases. Unfortunately this has been a blocker for abstracting away the native implementation so node could be supported by multiple VMs.

V8 does (probably unofficially) support being able to resurrect objects about to be GC'd, though node.js core doesn't use it, and I have never seen it used in the wild.

EDIT: Correction. The 'vm' module does make use of resurrecting objects.

@abustin

abustin commented Jun 30, 2015

abstracting away from V8

@trevnorris which engines have prevented this abstraction? It sounds like nodejs has a dependence on V8's set of APIs and events that are missing in other engines?

@BrendanEich

@abustin: that's the wrong way of looking at it. Even V8 alone does not want to commit to its current GC implementation via a GC-schedule-leaky abstraction.

Hence TC39's strawman proposal that uses next-event-loop-turn scheduling of finalization-based notifications. That's the only way to roll. Trying to blame other engines won't cut it. V8 smeared across the future is multiple engines.

/be

@titzer

titzer commented Jun 30, 2015

User-controlled mappings between wasm memory locations and JS objects are
going to encounter many of the same problems as any object system that we
add to wasm in the future, so we should be careful now to have room for
that use case.

There are basically two ways to view the relationship; either wasm has
roots for the GC'd world, or the GC'd world has roots for memory locations
managed in the wasm world. The first is analogous to a handle system where
the low-level program cannot be trusted with direct pointers and therefore
must have an indirection through a table that is trusted. It's the second
use case that seems to require a weak reference system. For that second
case I would advocate something that approximates reference queues, which
tie the observation of when an object becomes no longer strongly reachable
to the order in which the object was added to the queue.
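
The first view (wasm holding roots into the GC'd world through a trusted table) could be sketched like this. It is an illustration under assumptions, not an API from any engine; `HandleTable` and its methods are hypothetical names. The wasm side only ever sees integer handles, and the table on the JS side is what keeps the objects alive.

```javascript
class HandleTable {
  constructor() {
    this.slots = [];    // slot i holds the object for handle i, or undefined
    this.freeList = []; // released handle indices available for reuse
  }

  // Root `obj` and return an integer handle that can be passed into wasm.
  add(obj) {
    const handle =
      this.freeList.length > 0 ? this.freeList.pop() : this.slots.length;
    this.slots[handle] = obj;
    return handle;
  }

  // Resolve a handle coming back from wasm to the rooted object.
  get(handle) {
    return this.slots[handle];
  }

  // Drop the root; the object becomes collectable if nothing else references it.
  release(handle) {
    this.slots[handle] = undefined;
    this.freeList.push(handle);
  }
}
```

The table is a strong root, so it covers only this first direction. The second direction, where a JS object's death should free wasm memory, still needs some form of weak reference or queue.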

@lskyum
Author

lskyum commented Jun 30, 2015

Thinking more about the case with wrapping C++ (or other) objects as JavaScript objects.

The next-event-loop finalizer and weak references surely solve a lot, but they don't address the fact that the JavaScript engine doesn't know whether a wasm module is running low on memory because of uncollected JS objects.

Do you think that is a real problem, and if so, is it even solvable?

@kg
Contributor

kg commented Jun 30, 2015

I don't think a wasm module will ever be affected by uncollected JS objects when it comes to memory. Wasm has its own memory (and address space) of a fixed size, and it's pre-reserved. You can grow (and shrink, I think?) the heap, but that's done explicitly, and a runtime would be able to do any necessary collections at that point.

@titzer

titzer commented Jun 30, 2015

I think the OP meant the scenario where the JS objects essentially wrap a
pointer to a C++ object in the wasm memory and the wasm module wants to
free that memory (internally speaking, of course) when the JS object dies.

@lskyum
Author

lskyum commented Jun 30, 2015

@kg I agree that when wasm is "the main program" it is not an issue, but as titzer states, it's the case where wasm objects are exposed as JavaScript objects. I think that will be a major use case for wasm, besides stuff like games and complex applications.

@trevnorris

it's the case where wasm objects are exposed as JavaScript objects.

This is an expected use case for node.js. For example a transform for a TCP data stream. It will be necessary for the transform to be able to keep a list of all pointers that have been passed and be able to free those pointers if it happens to go out of scope. Generally the stream should be terminated, but can't guarantee that.

Question about next-event-loop-turn finalization. If the GC notification isn't signaled until the next turn of the event loop, wouldn't it be fairly trivial to run my system out of memory in a tight loop? For example (borrowing from io.js' pre v3.0 implementation)

for (var i = 0; i < 1e7; i++)
  new Buffer(0xfffff);

If the GC wasn't allowed to halt the process and run the native weak callback to free referenced external memory, the system would quickly run out of memory.

@titzer

titzer commented Jun 30, 2015

That's a good point. "Filling up" the wasm memory might need to put
pressure on the JS garbage collector to clean up the JS heap, even though
it's not full.

@BrendanEich

Right -- memory pressure accounting must be "all-in" or you'll get false OOMs.

/be

@lskyum
Author

lskyum commented Jun 30, 2015

I think trevnorris pointed out that next-event-loop finalizers will face a problem even without wasm involved. The engine would potentially need to keep a huge number of objects around for later finalization.

@BrendanEich

@lskyum -- the GC would run sooner but the notifications would be batched till next event loop. If notification entails further memory freeing, then indeed false OOMs could arise, compared to the case where the GC could finalize everything. That's the price of proposed JS weak refs and not leaking the non-deterministic GC schedule.

The better way would not be to use the proposed weak ref event-turn notification in the first place. In doing asm.js and roadmapping it, we talked a year or two ago about the one true JS GC being extended to see through asm allocations. Same idea goes for wasm as @titzer suggests.

So please don't let my status update on a potential future JS weak ref extension lead us astray. If it's useful, great. For the full one-GC solution (1GC, catchy!) we need better.

/be

@kg
Contributor

kg commented Jun 30, 2015

If we care about these scenarios working, then it sounds like weakrefs even with next-turn notification might not be sufficient. The solution might have to be something that requires VM support, which is unfortunate.

One alternative that comes to mind is something around address space reservation: If we expose basic address space allocation/reservation as a VM primitive (this is something that probably will become necessary for dynamic linking anyhow), one reservation strategy would be to be able to reserve a page (or other allocation unit) for 'weak' data that expires when the corresponding GC objects are collected. There would be no notification and no state to poll to determine whether the page allocation had been released, but it would be possible to get the page again when trying to allocate (which would enable a native malloc to claim the page if it needs it). This would address a subset of weak JS -> native heap scenarios.

This would still need some additional support to be a complete solution, though. You'd need a way to flag any GC roots contained within the 'weak' pages, or you'd need to combine the weak pages with traditional weakrefs and finalization notifications. The finalization notification would allow you to clean up any internal data structures - like a list of event listeners - relatively quickly without the requirement for you to observe a GC as soon as it happens. Perhaps you could use that to implement releasing any rooted GC values held by the object, but there's probably a race or something involved in that.

So essentially the VM would expose an 'allocate weak page(s)' primitive, where you pass it a size and get back a JS object that controls the lifetime of the page(s). When that object is collected the page allocation is silently reaped by the VM at its discretion and future page allocations (weak or strong) can utilize those page(s). The JS object representing the allocation could be pointed to by a JS WeakRef in order to get finalization notifications, or it could itself be a specialized variant of WeakRef (implements the JS WeakRef protocol but has additional semantics).

@lskyum
Author

lskyum commented Jul 1, 2015

@BrendanEich Thank you for clearing some things up. I don't fully understand why a finalizer can't run immediately, so I can't say if the following idea will help or not:

Suppose wasm could register a JS object for finalization through a Rooting API. The event occurs immediately to a wasm function, but in a special isolation where no communication with JS is allowed. This allows wasm to do some cleanup, but not mess around with JS.

@trevnorris

Only being able to run a rooted object's cleanup callback on the next event loop turn can make for a trivial DDoS attack, unless the implementer knows they have to manually interrupt a stream of events so the event loop can flip over, then resume when it comes back around. The problem with this is that it depends on the event loop's implementation details.

This is a known issue in node.js that companies have reported experiencing. One case happens as follows:

  • Event loop polls on a TCP socket
  • Data packet is received and sent to JS land
  • This data stream has a transform that takes as little as 1-2ms
  • The transformed data is then written to another socket
  • Kernel is able to flush data to socket immediately
  • Event loop polls again for any events
  • Data packet is ...

We see this with servers that handle a high volume of requests made up of smaller packets. Generally the kernel can keep up with all the packets that need to be written, and an application can be caught looping on the poll event for quite a while. So if the JS object isn't allowed to clean up the attached packets as they are being processed, you can quite easily kill the app.

The fix we suggest is to force polling to stop processing so the event loop can flip around and clean up resources. Thing is, we're able to tell them to run a callback that specifically occurs after the cleanup phase of the event loop, but before it starts polling again. This is important for them to be able to prep anything necessary before polling starts. But I doubt enforcing the order of event loop events is within scope.

@MikeHolman
Member

I've discussed this with someone from our GC team and it seems like finalizers should be doable for us. We actually could support reviving objects as well, as long as that behavior is well defined and is distinct from the finalize API (which I believe is not currently the case in V8).

Apparently we already did most of the required work when shimming to get Chakra to work with Node; @jianchun might have more context on this stuff.

@titzer

titzer commented Oct 23, 2015

WebAssembly doesn't have a concept of an "event" yet, so it's not clear how this feature would play out at the engine level. Integration with JS is an embedding concern that we've addressed somewhat in https://github.com/WebAssembly/design/blob/master/Web.md. JS engines don't currently expose GC events to JS, so the fate of this feature request is tied to JS engines offering significant (and controversial) new features that expose GC.

@titzer titzer closed this as completed Oct 23, 2015
@rafis

rafis commented Aug 27, 2016

WebAssembly doesn't have a concept of an "event" yet

There is no need for an "event" concept. It should be applied only on objects created with new:

function MyClass() {
     console.log('Constructor for MyClass is called');
}
MyClass.prototype.__gc = function() {
     console.log('Destructor for MyClass is called');
};
var obj = new MyClass;
obj = null;
global.gc('full'); // perform full garbage collection (blocking)

This is based on a concept used in Lua. But it works differently in the original Lua and in LuaJIT, it has bugs and is not recommended, and it also supports weak references, which I hate. So my conclusion is that it sucks in Lua; please make it not suck in JavaScript.

@nguyenbs

nguyenbs commented Sep 25, 2017

I'm currently joining a project that uses FalconJX to cross-compile AS3 to JavaScript. The output JavaScript will use a WASM FlashAPI that we are writing in C++.

We are facing a big problem: WASM memory leaks because there is no way to design a mechanism that frees the WASM FlashAPI objects automatically when they are no longer used on the JS side.

This issue is directly related to my problem. Does anyone have a clue?

@ceztko

ceztko commented Apr 27, 2023

V8 does (probably unofficially) support being able to resurrect objects about to be GC'd, Though node.js core doesn't use it, and have never seen it used in the wild

Off-topic: @trevnorris can you please show a concrete example of such a technique being used with V8? I tried to simulate a full finalizer that can access the object being GC'd, to no avail.

@ceztko

ceztko commented Apr 27, 2023

Answering myself: resurrecting finalizers were deprecated and have already been removed.
