Discussion/Tracking SnapshotCreator support #13877

bmeck · 2017-06-22T17:36:14Z

refack · 2017-06-22T17:38:17Z

I'm working on first one.

mscdex · 2017-06-22T17:46:51Z

Does this still really matter now with Ignition (and TurboFan)?

bmeck · 2017-06-22T17:48:40Z

@mscdex less so, but allows for rudimentary equivalence of features of pkg and does have impact since you don't need to run any require on startup.

bmeck · 2017-06-22T18:07:04Z

Some numbers from local investigation: https://twitter.com/bradleymeck/status/875758792805404673

On a Mac 64bit:

2MB~ is minimal snapshot size (also why setting max old space < 3MB doesn't work)
10e6 globals serialized into a snapshot w/ different hidden classes (global[i++]={[i]:i}) is 88MB, no significant change in speed when calling Isolate::New w/ this vs 2MB.
Isolate::New time seems to float around 10ms on local machine (consistent w/ v8 builtin default snapshot)

refack · 2017-06-22T18:23:16Z

If the v8::SnapshotCreator was able to use the existing Isolate, we could use this for pure JS post-mortem dumps, right?

bmeck · 2017-06-22T18:24:16Z

@refack yes, but the snapshot blob is platform and build specific unlike core dumps

refack · 2017-06-22T18:29:26Z

But we'll end up with a live system that could be dynamically interrogated. Sounds like an amazing feature, and if V8 stabilize the blob format it seems possible they could implement portability.
/cc @nodejs/post-mortem @nodejs/v8

bmeck · 2017-06-22T18:38:12Z

Not sure what you mean by "live", all v8::Handles need to be closed in order to serialize a blob, can't have JS on the stack.

refack · 2017-06-22T18:44:32Z

Not sure what you mean by "live", all v8::Handles need to be closed in order to serialize a blob, can't have JS on the stack.

Ok not live, suspended animation, but AFAICT you could reignite the event loop and use the preloaded code for example to rerun a JS function to see what it's doing.
For example if a production server enters an err state, before you restart it you "suspend" it, take the blob to your dev machine and fiddle with the state in JS land. Mind blown!

sebmarkbage · 2017-06-23T00:21:04Z

Related: Proposal for snapshots as a Realm API for wider support in the ecosystem and getting libraries to consider the requirements.

I built a mode into Prepack to compile snapshots of basic Node CLIs. One limitation that I hit was that things like the type of a socket becomes hard coded. You might imagine that taking a snapshot of a CLI that prints to the console and then running it with node mysnapshot > file.txt or something would work but now that stdout is a different type and a lot of the JS streams around stdio has be rewired. Now Prepack can automatically encode multiple branches over abstract values but you'd probably want to manually handle such cases.

bmeck · 2017-06-23T01:12:20Z

@sebmarkbage I think the custom serializer deserializer functions in v8 could be enough to handle this. Even with that though, libs could check tty info that gets invalidated when main runs after being deserialized, like you said.

hashseed · 2017-06-23T01:15:35Z

Some comments:

The isolate provided by the snapshot creator is special in that it prepares for serialization. This basically means that code objects will include meta data to mark external references so that the serializer can find them.
Not everything on the heap can be serialized. Most importantly, optimized code and typed arrays.
Snapshot for Realm API is different than the start up snapshot in so far that the start up snapshot bundles isolate and context snapshot, while for the Realm API the snapshot will only contain the context, decoupled from the isolate. This could prove to be more complicated since there need to be a way to untangle a context from the isolate it belongs to, into a form that can be transplanted into any arbitrary isolate.
Snapshot creator can be used to also include compiled code into the snapshot. For this, check out the implementation of V8::WarmUpSnapshotDataBlob. If it confuses, I can elaborate further.
We can't capture a snapshot that includes the stack. This in combination with some of the above points makes snapshot as mechanism for crash dump unsuitable, imo. Also unsuitable for suspending an isolate, as it contains more state than what's on the heap. It really is just to put V8 into a predefined start up state.
V8 team is unlikely to stabilize the snapshot format for the same reasons the bytecode format will not be stable.

(edited by @TimothyGu to correct a typo)

sebmarkbage · 2017-06-23T01:38:40Z

@bmeck My point was more that the JS libs internal to Node's runtime already does that. Ideally they'd be updated to avoid that problem.

bnoordhuis · 2017-06-23T07:03:30Z

Not everything on the heap can be serialized. Most importantly, optimized code and typed arrays.

I wondered about that. Means it's probably not useful for node because typed arrays (Buffers, which are instances of Uint8Array) are used pervasively throughout core.

hashseed · 2017-06-23T11:53:37Z

Adding support for serializing typed arrays should be fairly easy, as long as the deserializer can allocate the backing stores outside of V8's heap, I think. I'm not familiar with how Buffer is implemented. Would it work out-of-the-box if Uint8Arrays can be serialized?

TimothyGu · 2017-06-23T12:25:04Z

@hashseed Yes it would work, but can be a bit unwieldy if no special case is designated. Node.js Buffers can be pooled, and multiple Buffers can share one ArrayBuffer, but with different offsets and lengths. If Buffer serialization were to be implemented, only the visible view should be stored rather than the entire ArrayBuffer. This is what we are doing with the Serializer/Deserializer APIs (see https://github.com/nodejs/node/blob/master/lib/v8.js#L148-L165). Otherwise everything is the same.

/cc @addaleax who implemented the Serializer/Deserializer binding.

hashseed · 2017-06-23T12:32:07Z

@TimothyGu that would require V8's serializer to recognize and introduce special case for Buffers, which doesn't sound like a good idea to me. Why is it necessary to serialize views into separate ArrayBuffers? I would suggest to deserialize Buffers into the exact same state as they've been serialized.

bmeck · 2017-06-23T13:27:39Z

I am with @hashseed here, I think the entire pool would want to be serialized since a snapshot shouldn't change the data (which is leaking the pool).

TimothyGu · 2017-06-23T13:34:38Z

@bmeck @hashseed The default pool size (Buffer.poolSize) is 8192 bytes, so the underlying ArrayBuffer is 8192 bytes until it is used in full. In case I need to serialize 4096 consecutively allocated Buffers, each with two bytes, they would actually be stored on the same ArrayBuffer. To serialize Buffers into the same state as they've been initialized, I would have to store 4096 * 8192 = 33554432 bytes instead of just 4096 * 2 = 8192 bytes.

I'm not sure how startup snapshots are different from V8's Serializer, and the calculation above only applies to Serializer. If different ArrayBufferViews in a startup snapshot are allowed to share a single ArrayBuffer, then the issue would of course be solved. If not, I do see the argument for keeping the Buffers in the same state (and TBH I'm fine with that), but it just seems to be a bit wasteful.

bmeck · 2017-06-23T13:38:20Z

I think there is a disconnect here. I am not sure why you need to change the pool for all the buffers, that seems like a shallow copy which would also be bad.

hashseed · 2017-06-23T17:23:26Z

The start up snapshot is supposed to put the context in a predefined state while bypassing any initialization scripts. It should reach the same state as if the initialization script ran.

Assuming that serializing ArrayBuffer is implemented, there is not going to be a duplication you described above. The serializer can recognize objects that it already visited before and deserialize to the same object graph. After deserialization we would still require just 8192 bytes. I think you are confusing the ValueSerializer with the snapshot serializer. The former is fairly new, while the latter has existed for years, and been continuously improved. We are talking about the latter.

TimothyGu · 2017-06-24T08:19:36Z

@hashseed Ah that was partially what I was asking – if they operate differently. Seems like they do, so thanks for the explanation :)

mrkmarron · 2017-06-28T23:26:46Z

We have been working on snapshot/restore in ChakraCore for a while in the context of time-travel debugging. Our current implementation supports full serialization/deserialization of most ES5/6 constructs to a stable and (mostly) engine agnostic format. Snapshot performance is a big concern for us and the current implementation is able to extract/restore the full JS application state on the order of 10's of milliseconds.

We don't yet support linking this up with the native data in Node as, mentioned above, (1) we need to track down and associate all the native reference locations for snapshot/restore and (2) there are challenges in restoring some native state such as file handles or sockets. The first issue is mainly an engineering challenge, an intern of mine experimented with this for migrating live JavaScript/HTML applications (section 5.2). The perf. numbers are on the slow side since it was a proof-of-concept effort. The second issue requires some thought in the API for inflation -- e.g. symbolic values for startup code, user provided hooks, report error on next access and let recovery logic take over, etc. -- and I think having some specific scenarios here to guide design would be very useful.

Using snapshots for application startup performance and migration are things I have been wanting to do more work on, and we have most of the needed JavaScript side parts already implemented, so I am definitely happy to support work on this.

bmeck · 2017-06-29T14:14:58Z

@mrkmarron we can discuss need for ChakraCore here as well. Do you think an intptr_t[] is enough for ChakraCore to setup pointers to when doing serialization? In pseudocode~

intptr_t externals = { READ_FILE }; // provided by node

JSFunction readFileFn = { // made by runtime env / C++
   intptr_t internal_fields = { READ_FILE }
};

snapshot_create(...) {
   // walk ...
       for (field, i of obj.internal_fields) {
           serialized.internal_fields[i] = externals.indexOf(field);
       }
   // ... walk
}
snapshot_load(...) {
   // walk ...
       for (serialized_external_i, i of obj.internal_fields) {
           deserialized.internal_fields[i] = externals[serialized_external_i];
       }
   // ... walk
}

mrkmarron · 2017-06-30T05:59:25Z

Hi @bmeck, if I understand your design correctly I think it should work for recording basic JS->Native references in the snapshot. My paraphrasing of your design is that it uses the index in a fixed by the host array of intptr_t as the stable way to give an identity to a native reference in the snapshot phase so it can be looked up in the inflate phase. This should definitely be do-able with the current ChakraCore snapshot/inflate code.

My only thought here is that this works well when the snapshot is assumed to be at a very stable point (e.g., right after Node loads but before event loop starts) but it can be a bit brittle and difficult to debug if we start to want to use this in more complex scenarios. We had a similar issue with primitive objects from the core runtime (like the undefined object or builtin functions) where we cannot create them and need to give them "well known identities" in the snapshot. We ended up using simple string names, which is slightly lower performance but much easier to debug and more flexible, and I have been satisfied with the result. So, I feel like having a typedef-able setup for the identifier tokens, at least to allow for debugging, might be good.

Also, what are you thinking in of the the Native->JS references? We currently just use the pointer values cast to integer values (#defined as TTD_PTR_ID) in the snapshot and then have a map TTD_PTR_ID -> Js::Object when inflating. We could either look to pass that map back explicitly or provide and API the host can use like Js::Object JsRTMapSnapshotObjectId(TTD_PTR_ID).

I'll plan to put some cycles into the JS->Native implementation next week.

bmeck · 2017-06-30T12:13:04Z

@mrkmarron I think the map could be fine as long as it doesn't explicitly point to a memory address once serialized.

For v8, the general idea is to us Private symbols that are not visible to JS and attach them to the global scope:

Context global;
Function require = makeJSRequireFunction();
global->Set(Private::ForAPI("require"), require); // ~= Symbol.for but not visible to JS
// ... take snapshot
// ... revive snapshot
Context global = snapshot.GetContext(0);
Function require = global->Get(Private::ForAPI("require"));

This roughly equates to us manually making the map ourselves so that seems fine. Not having to do all the Get/Sets and being able to get an array would be nice.

refack added lib / src Issues and PRs related to general changes in the lib or src directory. process Issues and PRs related to the process subsystem. labels Jun 22, 2017

mscdex added performance Issues and PRs related to the performance of Node.js. discuss Issues opened for discussions and feedbacks. labels Jun 22, 2017

snyk-bot mentioned this issue Feb 12, 2022

[Snyk] Fix for 9 vulnerabilities Iceymann18777/node#10

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Discussion/Tracking SnapshotCreator support #13877

Discussion/Tracking SnapshotCreator support #13877

bmeck commented Jun 22, 2017 •

edited by meixg

Loading

refack commented Jun 22, 2017

mscdex commented Jun 22, 2017

bmeck commented Jun 22, 2017

bmeck commented Jun 22, 2017

refack commented Jun 22, 2017

bmeck commented Jun 22, 2017

refack commented Jun 22, 2017

bmeck commented Jun 22, 2017

refack commented Jun 22, 2017

sebmarkbage commented Jun 23, 2017

bmeck commented Jun 23, 2017

hashseed commented Jun 23, 2017 •

edited by TimothyGu

Loading

sebmarkbage commented Jun 23, 2017

bnoordhuis commented Jun 23, 2017

hashseed commented Jun 23, 2017

TimothyGu commented Jun 23, 2017

hashseed commented Jun 23, 2017

bmeck commented Jun 23, 2017

TimothyGu commented Jun 23, 2017

bmeck commented Jun 23, 2017

hashseed commented Jun 23, 2017

TimothyGu commented Jun 24, 2017

mrkmarron commented Jun 28, 2017

bmeck commented Jun 29, 2017

mrkmarron commented Jun 30, 2017

bmeck commented Jun 30, 2017

Discussion/Tracking SnapshotCreator support #13877

Discussion/Tracking SnapshotCreator support #13877

Comments

bmeck commented Jun 22, 2017 • edited by meixg Loading

refack commented Jun 22, 2017

mscdex commented Jun 22, 2017

bmeck commented Jun 22, 2017

bmeck commented Jun 22, 2017

refack commented Jun 22, 2017

bmeck commented Jun 22, 2017

refack commented Jun 22, 2017

bmeck commented Jun 22, 2017

refack commented Jun 22, 2017

sebmarkbage commented Jun 23, 2017

bmeck commented Jun 23, 2017

hashseed commented Jun 23, 2017 • edited by TimothyGu Loading

sebmarkbage commented Jun 23, 2017

bnoordhuis commented Jun 23, 2017

hashseed commented Jun 23, 2017

TimothyGu commented Jun 23, 2017

hashseed commented Jun 23, 2017

bmeck commented Jun 23, 2017

TimothyGu commented Jun 23, 2017

bmeck commented Jun 23, 2017

hashseed commented Jun 23, 2017

TimothyGu commented Jun 24, 2017

mrkmarron commented Jun 28, 2017

bmeck commented Jun 29, 2017

mrkmarron commented Jun 30, 2017

bmeck commented Jun 30, 2017

bmeck commented Jun 22, 2017 •

edited by meixg

Loading

hashseed commented Jun 23, 2017 •

edited by TimothyGu

Loading