Commit 31589b5

joyeecheung authored and danielleadams committed
bootstrap: implement --snapshot-blob and --build-snapshot
This patch introduces `--build-snapshot` and `--snapshot-blob` options for
creating and using user land snapshots.

For the initial iteration, user land CJS modules and ESM are not yet
supported in the snapshot, so only one single file can be snapshotted
(users can bundle their applications into a single script with their
bundler of choice to build a snapshot though).

A subset of builtins should already work, and support for more builtins
is being added. This PR includes tests checking that the TypeScript
compiler and the marked markdown renderer (and the builtins they use)
can be snapshotted and deserialized.

To generate a snapshot using `snapshot.js` as entry point and write the
snapshot blob to `snapshot.blob`:

```
$ echo "globalThis.foo = 'I am from the snapshot'" > snapshot.js
$ node --snapshot-blob snapshot.blob --build-snapshot snapshot.js
```

To restore application state from `snapshot.blob`, with `index.js` as
the entry point script for the deserialized application:

```
$ echo "console.log(globalThis.foo)" > index.js
$ node --snapshot-blob snapshot.blob index.js
I am from the snapshot
```

Users can also use the `v8.startupSnapshot` API to specify an entry
point at snapshot building time, thus avoiding the need of an additional
entry script at deserialization time:

```
$ echo "require('v8').startupSnapshot.setDeserializeMainFunction(() => console.log('I am from the snapshot'))" > snapshot.js
$ node --snapshot-blob snapshot.blob --build-snapshot snapshot.js
$ node --snapshot-blob snapshot.blob
I am from the snapshot
```

Note that this patch only adds functionality to the `node` executable
for building run-time user-land snapshots; the generated snapshot is
stored in a separate file on disk. Building a single binary with both
Node.js and an embedded snapshot has already been possible with the
`--node-snapshot-main` option to the `configure` script if the user
compiles Node.js from source. It would be a different task to enable the
`node` executable to produce a single binary that contains both Node.js
and an embedded snapshot without building Node.js from source, which
should be layered on top of the SEA (Single Executable Apps) initiative.

Known limitations/bugs that are being fixed in the upstream:

- V8 hits a DCHECK when deserializing certain mutated globals, e.g.
  `Error.stackTraceLimit` (it should work fine in the release build,
  however): https://chromium-review.googlesource.com/c/v8/v8/+/3319481
- Layout of V8's read-only heap can be inconsistent after
  deserialization, resulting in memory corruption:
  https://bugs.chromium.org/p/v8/issues/detail?id=12921

PR-URL: #38905
Refs: #35711
Reviewed-By: Chengzhong Wu <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
1 parent 5cb5c65 commit 31589b5

17 files changed, +1408 -52 lines changed

doc/api/cli.md

+76
@@ -100,6 +100,62 @@ If this flag is passed, the behavior can still be set to not abort through
 [`process.setUncaughtExceptionCaptureCallback()`][] (and through usage of the
 `node:domain` module that uses it).
 
+### `--build-snapshot`
+
+<!-- YAML
+added: REPLACEME
+-->
+
+> Stability: 1 - Experimental
+
+Generates a snapshot blob when the process exits and writes it to
+disk, which can be loaded later with `--snapshot-blob`.
+
+When building the snapshot, if `--snapshot-blob` is not specified,
+the generated blob will be written, by default, to `snapshot.blob`
+in the current working directory. Otherwise it will be written to
+the path specified by `--snapshot-blob`.
+
+```console
+$ echo "globalThis.foo = 'I am from the snapshot'" > snapshot.js
+
+# Run snapshot.js to initialize the application and snapshot the
+# state of it into snapshot.blob.
+$ node --snapshot-blob snapshot.blob --build-snapshot snapshot.js
+
+$ echo "console.log(globalThis.foo)" > index.js
+
+# Load the generated snapshot and start the application from index.js.
+$ node --snapshot-blob snapshot.blob index.js
+I am from the snapshot
+```
+
+The [`v8.startupSnapshot` API][] can be used to specify an entry point at
+snapshot building time, thus avoiding the need of an additional entry
+script at deserialization time:
+
+```console
+$ echo "require('v8').startupSnapshot.setDeserializeMainFunction(() => console.log('I am from the snapshot'))" > snapshot.js
+$ node --snapshot-blob snapshot.blob --build-snapshot snapshot.js
+$ node --snapshot-blob snapshot.blob
+I am from the snapshot
+```
+
+For more information, check out the [`v8.startupSnapshot` API][] documentation.
+
+Currently the support for run-time snapshot is experimental in that:
+
+1. User-land modules are not yet supported in the snapshot, so only
+   one single file can be snapshotted. Users can bundle their applications
+   into a single script with their bundler of choice before building
+   a snapshot, however.
+2. Only a subset of the built-in modules work in the snapshot, though the
+   Node.js core test suite checks that a few fairly complex applications
+   can be snapshotted. Support for more modules is being added. If any
+   crashes or buggy behaviors occur when building a snapshot, please file
+   a report in the [Node.js issue tracker][] and link to it in the
+   [tracking issue for user-land snapshots][].
+
 ### `--completion-bash`
 
 <!-- YAML
@@ -1094,6 +1150,22 @@ minimum allocation from the secure heap. The minimum value is `2`.
 The maximum value is the lesser of `--secure-heap` or `2147483647`.
 The value given must be a power of two.
 
+### `--snapshot-blob=path`
+
+<!-- YAML
+added: REPLACEME
+-->
+
+> Stability: 1 - Experimental
+
+When used with `--build-snapshot`, `--snapshot-blob` specifies the path
+where the generated snapshot blob will be written to. If not specified,
+the generated blob will be written, by default, to `snapshot.blob`
+in the current working directory.
+
+When used without `--build-snapshot`, `--snapshot-blob` specifies the
+path to the blob that will be used to restore the application state.
+
 ### `--test`
 
 <!-- YAML
@@ -1705,6 +1777,7 @@ Node.js options that are allowed are:
 * `--require`, `-r`
 * `--secure-heap-min`
 * `--secure-heap`
+* `--snapshot-blob`
 * `--test-only`
 * `--throw-deprecation`
 * `--title`
@@ -2077,6 +2150,7 @@ done
 [ECMAScript module loader]: esm.md#loaders
 [Fetch API]: https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API
 [Modules loaders]: packages.md#modules-loaders
+[Node.js issue tracker]: https://github.com/nodejs/node/issues
 [OSSL_PROVIDER-legacy]: https://www.openssl.org/docs/man3.0/man7/OSSL_PROVIDER-legacy.html
 [REPL]: repl.md
 [ScriptCoverage]: https://chromedevtools.github.io/devtools-protocol/tot/Profiler#type-ScriptCoverage
@@ -2106,6 +2180,7 @@ done
 [`tls.DEFAULT_MAX_VERSION`]: tls.md#tlsdefault_max_version
 [`tls.DEFAULT_MIN_VERSION`]: tls.md#tlsdefault_min_version
 [`unhandledRejection`]: process.md#event-unhandledrejection
+[`v8.startupSnapshot` API]: v8.md#startup-snapshot-api
 [`worker_threads.threadId`]: worker_threads.md#workerthreadid
 [conditional exports]: packages.md#conditional-exports
 [context-aware]: addons.md#context-aware-addons
@@ -2121,4 +2196,5 @@ done
 [security warning]: #warning-binding-inspector-to-a-public-ipport-combination-is-insecure
 [semi-space]: https://www.memorymanagement.org/glossary/s.html#semi.space
 [timezone IDs]: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
+[tracking issue for user-land snapshots]: https://github.com/nodejs/node/issues/44014
 [ways that `TZ` is handled in other environments]: https://www.gnu.org/software/libc/manual/html_node/TZ-Variable.html

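The documentation above gives `--snapshot-blob` two meanings depending on whether `--build-snapshot` is present. A minimal sketch of that rule follows, assuming nothing beyond the documented behavior. The names `BlobRole`, `BlobUse`, and `InterpretSnapshotBlob` are hypothetical and do not exist in Node.js.

```cpp
#include <cassert>
#include <string>

// Hypothetical types for illustration only; not part of the patch.
enum class BlobRole { kWriteTarget, kReadSource, kUnused };

struct BlobUse {
  BlobRole role;
  std::string path;
};

// Interprets the value of --snapshot-blob, given whether --build-snapshot
// was passed. An empty string stands for "option not specified".
BlobUse InterpretSnapshotBlob(bool build_snapshot,
                              const std::string& snapshot_blob) {
  if (build_snapshot) {
    // Building: the option names the output file. When it is absent, the
    // blob goes to snapshot.blob in the current working directory.
    return {BlobRole::kWriteTarget,
            snapshot_blob.empty() ? std::string("snapshot.blob")
                                  : snapshot_blob};
  }
  if (!snapshot_blob.empty()) {
    // Loading: the option names the blob used to restore application state.
    return {BlobRole::kReadSource, snapshot_blob};
  }
  // Neither flag: no user-land blob is involved.
  return {BlobRole::kUnused, ""};
}
```

For example, `InterpretSnapshotBlob(true, "")` yields a write target of `snapshot.blob`, matching the default described in the `--build-snapshot` section.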
src/env.cc

+21-15
@@ -248,17 +248,6 @@ std::ostream& operator<<(std::ostream& output,
   return output;
 }
 
-std::ostream& operator<<(std::ostream& output,
-                         const std::vector<PropInfo>& vec) {
-  output << "{\n";
-  for (const auto& info : vec) {
-    output << " { \"" << info.name << "\", " << std::to_string(info.id) << ", "
-           << std::to_string(info.index) << " },\n";
-  }
-  output << "}";
-  return output;
-}
-
 std::ostream& operator<<(std::ostream& output,
                          const IsolateDataSerializeInfo& i) {
   output << "{\n"
@@ -298,7 +287,7 @@ IsolateDataSerializeInfo IsolateData::Serialize(SnapshotCreator* creator) {
   for (size_t i = 0; i < AsyncWrap::PROVIDERS_LENGTH; i++)
     info.primitive_values.push_back(creator->AddData(async_wrap_provider(i)));
 
-  size_t id = 0;
+  uint32_t id = 0;
 #define V(PropertyName, TypeName) \
   do { \
     Local<TypeName> field = PropertyName(); \
@@ -352,7 +341,7 @@ void IsolateData::DeserializeProperties(const IsolateDataSerializeInfo* info) {
 
   const std::vector<PropInfo>& values = info->template_values;
   i = 0;  // index to the array
-  size_t id = 0;
+  uint32_t id = 0;
 #define V(PropertyName, TypeName) \
   do { \
     if (values.size() > i && id == values[i].id) { \
@@ -1625,6 +1614,7 @@ std::ostream& operator<<(std::ostream& output,
 AsyncHooks::SerializeInfo AsyncHooks::Serialize(Local<Context> context,
                                                 SnapshotCreator* creator) {
   SerializeInfo info;
+  // TODO(joyeecheung): some of these probably don't need to be serialized.
   info.async_ids_stack = async_ids_stack_.Serialize(context, creator);
   info.fields = fields_.Serialize(context, creator);
   info.async_id_fields = async_id_fields_.Serialize(context, creator);
@@ -1819,7 +1809,7 @@ EnvSerializeInfo Environment::Serialize(SnapshotCreator* creator) {
   info.should_abort_on_uncaught_toggle =
       should_abort_on_uncaught_toggle_.Serialize(ctx, creator);
 
-  size_t id = 0;
+  uint32_t id = 0;
 #define V(PropertyName, TypeName) \
   do { \
     Local<TypeName> field = PropertyName(); \
@@ -1836,6 +1826,22 @@ EnvSerializeInfo Environment::Serialize(SnapshotCreator* creator) {
   return info;
 }
 
+std::ostream& operator<<(std::ostream& output,
+                         const std::vector<PropInfo>& vec) {
+  output << "{\n";
+  for (const auto& info : vec) {
+    output << " " << info << ",\n";
+  }
+  output << "}";
+  return output;
+}
+
+std::ostream& operator<<(std::ostream& output, const PropInfo& info) {
+  output << "{ \"" << info.name << "\", " << std::to_string(info.id) << ", "
+         << std::to_string(info.index) << " }";
+  return output;
+}
+
 std::ostream& operator<<(std::ostream& output,
                          const std::vector<std::string>& vec) {
   output << "{\n";
@@ -1917,7 +1923,7 @@ void Environment::DeserializeProperties(const EnvSerializeInfo* info) {
 
   const std::vector<PropInfo>& values = info->persistent_values;
   size_t i = 0;  // index to the array
-  size_t id = 0;
+  uint32_t id = 0;
 #define V(PropertyName, TypeName) \
   do { \
     if (values.size() > i && id == values[i].id) { \

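Two patterns in the `src/env.cc` changes can be tried in isolation: the per-element `operator<<` that the vector printer now delegates to, and the `values.size() > i && id == values[i].id` test that the deserialization macros use to skip empty slots. The sketch below is a standalone approximation, not the Node.js code. `PropInfo` is re-declared locally with `size_t` standing in for `SnapshotIndex`, and `MatchSlots` and `kEmptySlot` are hypothetical names that replace the `V(...)` macro expansion with a plain loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Local stand-in for the struct in src/env.h.
struct PropInfo {
  std::string name;  // name for debugging
  uint32_t id;       // position in the full property list
  size_t index;      // index in the snapshot
};

// Prints one entry; the vector printer below reuses it.
std::ostream& operator<<(std::ostream& output, const PropInfo& info) {
  output << "{ \"" << info.name << "\", " << std::to_string(info.id) << ", "
         << std::to_string(info.index) << " }";
  return output;
}

std::ostream& operator<<(std::ostream& output,
                         const std::vector<PropInfo>& vec) {
  output << "{\n";
  for (const auto& info : vec) {
    output << "  " << info << ",\n";
  }
  output << "}";
  return output;
}

// Hypothetical sentinel marking a slot that had nothing serialized.
constexpr size_t kEmptySlot = static_cast<size_t>(-1);

// Walks every slot id in order. `values` holds only the slots that were
// serialized, sorted by id, so `i` advances only when the ids line up.
std::vector<size_t> MatchSlots(uint32_t slot_count,
                               const std::vector<PropInfo>& values) {
  std::vector<size_t> slots(slot_count, kEmptySlot);
  size_t i = 0;  // index into values
  for (uint32_t id = 0; id < slot_count; id++) {
    if (values.size() > i && id == values[i].id) {
      slots[id] = values[i].index;
      i++;
    }
  }
  assert(i == values.size());  // every serialized entry found its slot
  return slots;
}
```

With entries for ids 0 and 2 out of four slots, `MatchSlots` fills slots 0 and 2 and leaves 1 and 3 as `kEmptySlot`, which is the "in case there are any empty entries" situation mentioned in the `PropInfo` comment.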
src/env.h

+8-3
@@ -580,7 +580,7 @@ typedef size_t SnapshotIndex;
 
 struct PropInfo {
   std::string name;     // name for debugging
-  size_t id;            // In the list - in case there are any empty entries
+  uint32_t id;          // In the list - in case there are any empty entries
   SnapshotIndex index;  // In the snapshot
 };
 
@@ -987,8 +987,9 @@ struct EnvSerializeInfo {
 struct SnapshotData {
   enum class DataOwnership { kOwned, kNotOwned };
 
-  static const size_t kNodeBaseContextIndex = 0;
-  static const size_t kNodeMainContextIndex = kNodeBaseContextIndex + 1;
+  static const uint32_t kMagic = 0x143da19;
+  static const SnapshotIndex kNodeBaseContextIndex = 0;
+  static const SnapshotIndex kNodeMainContextIndex = kNodeBaseContextIndex + 1;
 
   DataOwnership data_ownership = DataOwnership::kOwned;
 
@@ -1000,12 +1001,16 @@ struct SnapshotData {
   // TODO(joyeecheung): there should be a vector of env_info once we snapshot
   // the worker environments.
   EnvSerializeInfo env_info;
+
   // A vector of built-in ids and v8::ScriptCompiler::CachedData, this can be
   // shared across Node.js instances because they are supposed to share the
   // read only space. We use native_module::CodeCacheInfo because
   // v8::ScriptCompiler::CachedData is not copyable.
   std::vector<native_module::CodeCacheInfo> code_cache;
 
+  void ToBlob(FILE* out) const;
+  static void FromBlob(SnapshotData* out, FILE* in);
+
   ~SnapshotData();
 
   SnapshotData(const SnapshotData&) = delete;

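The bodies of `ToBlob()` and `FromBlob()` are not part of the hunks shown here, so how `kMagic` is actually used is not visible. The name suggests the common technique of writing a fixed marker at the start of a file and rejecting input that lacks it. The sketch below illustrates that technique only. The layout (magic, then length, then payload) and the names `WriteBlob` and `ReadBlob` are assumptions made for illustration and are not the Node.js blob format. It also shows why fixed-width fields such as `uint32_t` are convenient in a file format: their size does not vary across platforms the way `size_t` does.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Value taken from SnapshotData::kMagic in the patch.
constexpr uint32_t kMagic = 0x143da19;

// Hypothetical layout: [magic:4 bytes][length:4 bytes][payload:length bytes],
// all in host byte order.
bool WriteBlob(FILE* out, const std::string& payload) {
  const uint32_t length = static_cast<uint32_t>(payload.size());
  if (fwrite(&kMagic, sizeof(kMagic), 1, out) != 1) return false;
  if (fwrite(&length, sizeof(length), 1, out) != 1) return false;
  return length == 0 || fwrite(payload.data(), 1, length, out) == length;
}

// Returns false when the stream is too short or does not start with kMagic.
bool ReadBlob(FILE* in, std::string* payload) {
  uint32_t magic = 0;
  uint32_t length = 0;
  if (fread(&magic, sizeof(magic), 1, in) != 1) return false;
  if (magic != kMagic) return false;  // not a blob written by WriteBlob
  if (fread(&length, sizeof(length), 1, in) != 1) return false;
  payload->resize(length);
  return length == 0 || fread(&(*payload)[0], 1, length, in) == length;
}
```

A reader built this way fails fast on an arbitrary file instead of interpreting unrelated bytes as snapshot data.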
src/node.cc

+111-22
@@ -1176,38 +1176,127 @@ void TearDownOncePerProcess() {
   per_process::v8_platform.Dispose();
 }
 
+int GenerateAndWriteSnapshotData(const SnapshotData** snapshot_data_ptr,
+                                 InitializationResult* result) {
+  // nullptr indicates there's no snapshot data.
+  DCHECK_NULL(*snapshot_data_ptr);
+
+  // node:embedded_snapshot_main indicates that we are using the
+  // embedded snapshot and we are not supposed to clean it up.
+  if (result->args[1] == "node:embedded_snapshot_main") {
+    *snapshot_data_ptr = SnapshotBuilder::GetEmbeddedSnapshotData();
+    if (*snapshot_data_ptr == nullptr) {
+      // The Node.js binary is built without embedded snapshot
+      fprintf(stderr,
+              "node:embedded_snapshot_main was specified as snapshot "
+              "entry point but Node.js was built without embedded "
+              "snapshot.\n");
+      result->exit_code = 1;
+      return result->exit_code;
+    }
+  } else {
+    // Otherwise, load and run the specified main script.
+    std::unique_ptr<SnapshotData> generated_data =
+        std::make_unique<SnapshotData>();
+    result->exit_code = node::SnapshotBuilder::Generate(
+        generated_data.get(), result->args, result->exec_args);
+    if (result->exit_code == 0) {
+      *snapshot_data_ptr = generated_data.release();
+    } else {
+      return result->exit_code;
+    }
+  }
+
+  // Get the path to write the snapshot blob to.
+  std::string snapshot_blob_path;
+  if (!per_process::cli_options->snapshot_blob.empty()) {
+    snapshot_blob_path = per_process::cli_options->snapshot_blob;
+  } else {
+    // Defaults to snapshot.blob in the current working directory.
+    snapshot_blob_path = std::string("snapshot.blob");
+  }
+
+  FILE* fp = fopen(snapshot_blob_path.c_str(), "wb");
+  if (fp != nullptr) {
+    (*snapshot_data_ptr)->ToBlob(fp);
+    fclose(fp);
+  } else {
+    fprintf(stderr,
+            "Cannot open %s for writing a snapshot.\n",
+            snapshot_blob_path.c_str());
+    result->exit_code = 1;
+  }
+  return result->exit_code;
+}
+
+int LoadSnapshotDataAndRun(const SnapshotData** snapshot_data_ptr,
+                           InitializationResult* result) {
+  // nullptr indicates there's no snapshot data.
+  DCHECK_NULL(*snapshot_data_ptr);
+  // --snapshot-blob indicates that we are reading a customized snapshot.
+  if (!per_process::cli_options->snapshot_blob.empty()) {
+    std::string filename = per_process::cli_options->snapshot_blob;
+    FILE* fp = fopen(filename.c_str(), "rb");
+    if (fp == nullptr) {
+      fprintf(stderr, "Cannot open %s", filename.c_str());
+      result->exit_code = 1;
+      return result->exit_code;
+    }
+    std::unique_ptr<SnapshotData> read_data = std::make_unique<SnapshotData>();
+    SnapshotData::FromBlob(read_data.get(), fp);
+    *snapshot_data_ptr = read_data.release();
+    fclose(fp);
+  } else if (per_process::cli_options->node_snapshot) {
+    // If --snapshot-blob is not specified, we are reading the embedded
+    // snapshot, but we will skip it if --no-node-snapshot is specified.
+    *snapshot_data_ptr = SnapshotBuilder::GetEmbeddedSnapshotData();
+  }
+
+  if ((*snapshot_data_ptr) != nullptr) {
+    NativeModuleLoader::RefreshCodeCache((*snapshot_data_ptr)->code_cache);
+  }
+  NodeMainInstance main_instance(*snapshot_data_ptr,
+                                 uv_default_loop(),
+                                 per_process::v8_platform.Platform(),
+                                 result->args,
+                                 result->exec_args);
+  result->exit_code = main_instance.Run();
+  return result->exit_code;
+}
+
 int Start(int argc, char** argv) {
   InitializationResult result = InitializeOncePerProcess(argc, argv);
   if (result.early_return) {
     return result.exit_code;
   }
 
-  if (per_process::cli_options->build_snapshot) {
-    fprintf(stderr,
-            "--build-snapshot is not yet supported in the node binary\n");
-    return 1;
-  }
+  DCHECK_EQ(result.exit_code, 0);
+  const SnapshotData* snapshot_data = nullptr;
 
-  {
-    bool use_node_snapshot = per_process::cli_options->node_snapshot;
-    const SnapshotData* snapshot_data =
-        use_node_snapshot ? SnapshotBuilder::GetEmbeddedSnapshotData()
-                          : nullptr;
-    uv_loop_configure(uv_default_loop(), UV_METRICS_IDLE_TIME);
-
-    if (snapshot_data != nullptr) {
-      NativeModuleLoader::RefreshCodeCache(snapshot_data->code_cache);
+  auto cleanup_process = OnScopeLeave([&]() {
+    TearDownOncePerProcess();
+
+    if (snapshot_data != nullptr &&
+        snapshot_data->data_ownership == SnapshotData::DataOwnership::kOwned) {
+      delete snapshot_data;
+    }
+  });
+
+  uv_loop_configure(uv_default_loop(), UV_METRICS_IDLE_TIME);
+
+  // --build-snapshot indicates that we are in snapshot building mode.
+  if (per_process::cli_options->build_snapshot) {
+    if (result.args.size() < 2) {
+      fprintf(stderr,
+              "--build-snapshot must be used with an entry point script.\n"
+              "Usage: node --build-snapshot /path/to/entry.js\n");
+      return 9;
     }
-    NodeMainInstance main_instance(snapshot_data,
-                                   uv_default_loop(),
-                                   per_process::v8_platform.Platform(),
-                                   result.args,
-                                   result.exec_args);
-    result.exit_code = main_instance.Run();
+    return GenerateAndWriteSnapshotData(&snapshot_data, &result);
   }
 
-  TearDownOncePerProcess();
-  return result.exit_code;
+  // Without --build-snapshot, we are in snapshot loading mode.
+  return LoadSnapshotDataAndRun(&snapshot_data, &result);
 }
 
 int Stop(Environment* env) {

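The rewritten `Start()` has three return paths (usage error, build mode, load mode) and relies on `OnScopeLeave` so that `TearDownOncePerProcess()` and the snapshot cleanup run on all of them. The sketch below imitates that control flow with a small scope guard. `ScopeGuard` and `RunStart` are hypothetical stand-ins, not Node.js internals, and a log of strings replaces the real work so the ordering can be observed.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for OnScopeLeave: runs the callback when the guard
// object goes out of scope, whichever return statement is taken.
class ScopeGuard {
 public:
  explicit ScopeGuard(std::function<void()> fn) : fn_(std::move(fn)) {}
  ~ScopeGuard() { fn_(); }
  ScopeGuard(const ScopeGuard&) = delete;
  ScopeGuard& operator=(const ScopeGuard&) = delete;

 private:
  std::function<void()> fn_;
};

// Mirrors the branch structure of Start(). `args` plays the role of
// result.args, where args[0] is the executable and args[1] the entry script.
int RunStart(bool build_snapshot,
             const std::vector<std::string>& args,
             std::vector<std::string>* log) {
  ScopeGuard cleanup([log]() { log->push_back("teardown"); });

  if (build_snapshot) {
    if (args.size() < 2) {
      log->push_back("usage error");
      return 9;  // same exit code the patch uses for a missing entry script
    }
    log->push_back("build snapshot from " + args[1]);
    return 0;
  }

  log->push_back("load snapshot and run");
  return 0;
}
```

Calling `RunStart(true, {"node"}, &log)` returns 9 and still appends `"teardown"` after `"usage error"`, which is the property the guard provides without repeating cleanup before every `return`.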