-
Notifications
You must be signed in to change notification settings - Fork 29.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
module: implement NODE_COMPILE_CACHE for automatic on-disk code caching #52535
Conversation
Original commit message: [compiler] reset script details in functions deserialized from code cache During the serialization of the code cache, V8 would wipe out the host-defined options, so after a script id deserialized from the code cache, the host-defined options need to be reset on the script using what's provided by the embedder when doing the deserializing compilation, otherwise the HostImportModuleDynamically callbacks can't get the data it needs to implement dynamic import(). Change-Id: I33cc6a5e43b6469d3527242e083f7ae6d8ed0c6a Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/5401780 Reviewed-by: Leszek Swirski <[email protected]> Commit-Queue: Joyee Cheung <[email protected]> Cr-Commit-Position: refs/heads/main@{#93323} Refs: v8/v8@cd10ad7
This patch implements automatic on-disk code caching that can be enabled via an environment variable NODE_COMPILE_CACHE. When set, whenever Node.js compiles a CommonJS or a ECMAScript Module, it will use on-disk [V8 code cache][] persisted in the specified directory to speed up the compilation. This may slow down the first load of a module graph, but subsequent loads of the same module graph may get a significant speedup if the contents of the modules do not change. Locally, this speeds up loading of test/fixtures/snapshot/typescript.js from ~130ms to ~80ms. To clean up the generated code cache, simply remove the directory. It will be recreated the next time the same directory is used for `NODE_COMPILE_CACHE`. Compilation cache generated by one version of Node.js may not be used by a different version of Node.js. Cache generated by different versions of Node.js will be stored separately if the same directory is used to persist the cache, so they can co-exist. Caveat: currently when using this with V8 JavaScript code coverage, the coverage being collected by V8 may be less precise in functions that are deserialized from the code cache. It's recommended to turn this off when running tests to generate precise coverage. Implementation details: There is one cache file per module on disk. The directory layout is: - Compile cache directory (from NODE_COMPILE_CACHE) - 8b23c8fe: CRC32 hash of CachedDataVersionTag + NODE_VERESION - 2ea3424d: - 10860e5a: CRC32 hash of filename + module type - 431e9adc: ... - ... Inside the cache file, there is a header followed by the actual cache content: ``` [uint32_t] code size [uint32_t] code hash [uint32_t] cache size [uint32_t] cache hash ... compile cache content ... ``` When reading the cache file, we'll also check if the code size and code hash match the code that the module loader is loading and whether the cache size and cache hash match the file content read. If they don't match, or if V8 rejects the cache passed, we'll ignore the mismatch cache, and regenerate the cache after compilation succeeds and rewrite it to disk.
Review requested:
|
of Node.js will be stored separately if the same directory is used | ||
to persist the cache, so they can co-exist. | ||
|
||
Caveat: currently when using this with [V8 JavaScript code coverage][], the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When enabling NODE_COMPILE_CACHE and running tools/test.py parallel
locally, three tests would fail:
Failed tests:
out/Release/node /Users/joyee/projects/node/test/parallel/test-runner-coverage.js
out/Release/node /Users/joyee/projects/node/test/parallel/test-runner-output.mjs
out/Release/node /Users/joyee/projects/node/test/parallel/test-v8-coverage.js
The failed tests are all related to less precise coverage which is currently a known caveat of code cache + v8 native coverage collection (e.g. see #51672). So I just documented that a bit in the docs that users should not use the two together for now. I could spare some time later to fix it in V8, but I don't think it's a blocker for this PR because this feature is currently opt-in. User-land code caching can run into this problem too, and it's technically a V8 bug.
Some up-to-date results with f7806b3: $ hyperfine -w 10 "NODE_COMPILE_CACHE='' ./node ./.yarn/releases/yarn-4.1.1.cjs --version" "NODE_COMPILE_CACHE='/tmp/node-cache' ./node ./.yarn/releases/yarn-4.1.1.cjs --version"
Benchmark 1: NODE_COMPILE_CACHE='' ./node ./.yarn/releases/yarn-4.1.1.cjs --version
Time (mean ± σ): 165.5 ms ± 1.9 ms [User: 156.5 ms, System: 25.4 ms]
Range (min … max): 162.2 ms … 169.3 ms 18 runs
Benchmark 2: NODE_COMPILE_CACHE='/tmp/node-cache' ./node ./.yarn/releases/yarn-4.1.1.cjs --version
Time (mean ± σ): 116.0 ms ± 1.3 ms [User: 109.2 ms, System: 23.5 ms]
Range (min … max): 113.8 ms … 119.7 ms 25 runs
Summary
NODE_COMPILE_CACHE='/tmp/node-cache' ./node ./.yarn/releases/yarn-4.1.1.cjs --version ran
1.43 ± 0.02 times faster than NODE_COMPILE_CACHE='' ./node ./.yarn/releases/yarn-4.1.1.cjs --version |
This is so rad! Having had to manage and hook into systems for this having it built-in is very nice! ❤️ it. |
|
||
> Stability: 1.1 - Active Development | ||
|
||
When set, whenever Node.js compiles a CommonJS or a ECMAScript Module, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This description is very technical - I would use more layman terms for the first paragraph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have specific suggestions? I don't want to make it sound too magical to avoid false advertisement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'll do a follow up PR after this lands to not hold it but ideally something that focuses more on what it does and not the technical implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM as experimental/active development. Have a few questions:
- how does this interact with SEA (would it make sense or be possible to only eventually bundle the code cache and not the code?)
- What happens on overflows or tampering with the cache file? Is there a security concern here regarding passing different code through the cache file (since the hash is just a crc32)?
- How does this feature interact with loaders/require hooks that modify the file contents and in particular passing them in different order? (reading the code it should work but want to make sure)
src/compile_cache.cc
Outdated
// - <cache_file_1>: a hash of filename + module type | ||
// - <cache_file_2> | ||
// - <cache_file_3> | ||
bool CompileCacheHandler::InitializeDirectory(const std::string& dir) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Not blocking for this PR - we should have better user facing errors if there are permission issues creating/reading/writing the cache)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not error at all, the idea is that if caching cannot be done, this is just a no-op. There can be some logging to help users figure out why the cache isn't hit to ensure better performance, but the cache not being hit only leads to "not speeding up module loading", and it's not an error (just like when you write code that V8 cannot optimize or have to deopt with, there is no error either).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's literally caching I get that but when it won't work for users and they'll want to figure out why they'll ask us. It would be useful to at least trace it as as. warning or add DEBUG* like logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are already traces. The handful of tests here are testing against the traces .
Currently SEA only supports bundling a single script and it's not compiled using the normal CJS loader, but there's already a dedicated option in the SEA config to enable code cache for that one file (
The cache will be silently ignored if the hash doesn't match. Under the Node.js threat model, code loaded from the file system is already trusted. If you can gain access to the file system, you might as well just replace the source file on disk that'll be executed and do whatever you want in that replaced file, instead of wasting all your time figuring out how to mutate the code cache that meets all the requirement below:
TBH it seems a bit silly for an attacker to waste all their effort on this when they can already just replace the source code on disk if they have access to the file system. Also, this is already what people have been doing in the wild via e.g. https://www.npmjs.com/package/v8-compile-cache with 12M downloads/week, so if there are any security concerns, well it's already out there for many years, and it's unlikely that the ecosystem will just stop relying on a capability like this.
If they are using the "official" hooks, e.g. |
Would it be feasible to add a way to enable this for the current process only, perhaps as a follow-up? For context Corepack uses |
Yeah I think we can discuss other ways to enable this (or a possible solution to enable it by default and pointing it to somewhere that's less likely to fill up the disk e.g. the tmpdir) in a follow-up. For the initial PR an opt-in environment variable should be enough. |
9a7e1ed
to
b45062b
Compare
The
notable-change
Please suggest a text for the release notes if you'd like to include a more detailed summary, then proceed to update the PR description with the text or a link to the notable change suggested text comment. Otherwise, the commit will be placed in the Other Notable Changes section. Joyee: See the PR description |
Original commit message: [compiler] reset script details in functions deserialized from code cache During the serialization of the code cache, V8 would wipe out the host-defined options, so after a script id deserialized from the code cache, the host-defined options need to be reset on the script using what's provided by the embedder when doing the deserializing compilation, otherwise the HostImportModuleDynamically callbacks can't get the data it needs to implement dynamic import(). Change-Id: I33cc6a5e43b6469d3527242e083f7ae6d8ed0c6a Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/5401780 Reviewed-by: Leszek Swirski <[email protected]> Commit-Queue: Joyee Cheung <[email protected]> Cr-Commit-Position: refs/heads/main@{#93323} Refs: v8/v8@cd10ad7 PR-URL: nodejs#52535 Refs: nodejs#47472 Reviewed-By: Benjamin Gruenbaum <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Mohammed Keyvanzadeh <[email protected]>
Good discussion and a cool performance improvement @joyeecheung, congrats! Expressing some of my security concerns thoughts:
It would be helpful for security audit trail to record in some way (warning, trace, etc) that a cache entry failed matching. Other aspects worth considering, even if only through mentions in the documentation so that these risks are well communicated:
|
It can be traced using NODE_DEBUG_NATIVE=COMPILE_CACHE, see the tests
Only if they inherit the environment variables (child process
We can just skip it #52577 - the cache should be quiet by default, failures would just lead to no-ops.
If they already have access to the file system they might as well just modify the original module to run an infinite loop?
This is up to the user, it's currently opt-in via the environment variable NODE_COMPILE_CACHE, and the names of subdirectories change depending on the Node.js version and the filename and the module type. If an attacker can already get the filename of the module, they might as well just...modify the module? |
Thanks Joyee, appreciate all the input. |
Original commit message: [compiler] reset script details in functions deserialized from code cache During the serialization of the code cache, V8 would wipe out the host-defined options, so after a script id deserialized from the code cache, the host-defined options need to be reset on the script using what's provided by the embedder when doing the deserializing compilation, otherwise the HostImportModuleDynamically callbacks can't get the data it needs to implement dynamic import(). Change-Id: I33cc6a5e43b6469d3527242e083f7ae6d8ed0c6a Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/5401780 Reviewed-by: Leszek Swirski <[email protected]> Commit-Queue: Joyee Cheung <[email protected]> Cr-Commit-Position: refs/heads/main@{#93323} Refs: v8/v8@cd10ad7 PR-URL: #52535 Refs: #47472 Reviewed-By: Benjamin Gruenbaum <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Mohammed Keyvanzadeh <[email protected]> PR-URL: #52293 Reviewed-By: Moshe Atlow <[email protected]> Reviewed-By: Rafael Gonzaga <[email protected]> Reviewed-By: Richard Lau <[email protected]>
Original commit message: [compiler] reset script details in functions deserialized from code cache During the serialization of the code cache, V8 would wipe out the host-defined options, so after a script id deserialized from the code cache, the host-defined options need to be reset on the script using what's provided by the embedder when doing the deserializing compilation, otherwise the HostImportModuleDynamically callbacks can't get the data it needs to implement dynamic import(). Change-Id: I33cc6a5e43b6469d3527242e083f7ae6d8ed0c6a Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/5401780 Reviewed-by: Leszek Swirski <[email protected]> Commit-Queue: Joyee Cheung <[email protected]> Cr-Commit-Position: refs/heads/main@{#93323} Refs: v8/v8@cd10ad7 PR-URL: nodejs#52535 Refs: nodejs#47472 Reviewed-By: Benjamin Gruenbaum <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Mohammed Keyvanzadeh <[email protected]>
Original commit message: [compiler] reset script details in functions deserialized from code cache During the serialization of the code cache, V8 would wipe out the host-defined options, so after a script id deserialized from the code cache, the host-defined options need to be reset on the script using what's provided by the embedder when doing the deserializing compilation, otherwise the HostImportModuleDynamically callbacks can't get the data it needs to implement dynamic import(). Change-Id: I33cc6a5e43b6469d3527242e083f7ae6d8ed0c6a Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/5401780 Reviewed-by: Leszek Swirski <[email protected]> Commit-Queue: Joyee Cheung <[email protected]> Cr-Commit-Position: refs/heads/main@{#93323} Refs: v8/v8@cd10ad7 PR-URL: #52535 Refs: #47472 Reviewed-By: Benjamin Gruenbaum <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Mohammed Keyvanzadeh <[email protected]> PR-URL: #52293 Reviewed-By: Moshe Atlow <[email protected]> Reviewed-By: Rafael Gonzaga <[email protected]> Reviewed-By: Richard Lau <[email protected]>
Original commit message: [compiler] reset script details in functions deserialized from code cache During the serialization of the code cache, V8 would wipe out the host-defined options, so after a script id deserialized from the code cache, the host-defined options need to be reset on the script using what's provided by the embedder when doing the deserializing compilation, otherwise the HostImportModuleDynamically callbacks can't get the data it needs to implement dynamic import(). Change-Id: I33cc6a5e43b6469d3527242e083f7ae6d8ed0c6a Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/5401780 Reviewed-by: Leszek Swirski <[email protected]> Commit-Queue: Joyee Cheung <[email protected]> Cr-Commit-Position: refs/heads/main@{#93323} Refs: v8/v8@cd10ad7 PR-URL: nodejs#52535 Refs: nodejs#47472 Reviewed-By: Benjamin Gruenbaum <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Mohammed Keyvanzadeh <[email protected]>
Original commit message: [compiler] reset script details in functions deserialized from code cache During the serialization of the code cache, V8 would wipe out the host-defined options, so after a script id deserialized from the code cache, the host-defined options need to be reset on the script using what's provided by the embedder when doing the deserializing compilation, otherwise the HostImportModuleDynamically callbacks can't get the data it needs to implement dynamic import(). Change-Id: I33cc6a5e43b6469d3527242e083f7ae6d8ed0c6a Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/5401780 Reviewed-by: Leszek Swirski <[email protected]> Commit-Queue: Joyee Cheung <[email protected]> Cr-Commit-Position: refs/heads/main@{#93323} Refs: v8/v8@cd10ad7 PR-URL: #52535 Refs: #47472 Reviewed-By: Benjamin Gruenbaum <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Mohammed Keyvanzadeh <[email protected]> PR-URL: #52465 Reviewed-By: Matteo Collina <[email protected]> Reviewed-By: Rafael Gonzaga <[email protected]> Reviewed-By: Michael Dawson <[email protected]>
Original commit message: [compiler] reset script details in functions deserialized from code cache During the serialization of the code cache, V8 would wipe out the host-defined options, so after a script id deserialized from the code cache, the host-defined options need to be reset on the script using what's provided by the embedder when doing the deserializing compilation, otherwise the HostImportModuleDynamically callbacks can't get the data it needs to implement dynamic import(). Change-Id: I33cc6a5e43b6469d3527242e083f7ae6d8ed0c6a Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/5401780 Reviewed-by: Leszek Swirski <[email protected]> Commit-Queue: Joyee Cheung <[email protected]> Cr-Commit-Position: refs/heads/main@{#93323} Refs: v8/v8@cd10ad7 PR-URL: #52535 Refs: #47472 Reviewed-By: Benjamin Gruenbaum <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Mohammed Keyvanzadeh <[email protected]> PR-URL: #52465 Reviewed-By: Matteo Collina <[email protected]> Reviewed-By: Rafael Gonzaga <[email protected]> Reviewed-By: Michael Dawson <[email protected]>
This patch implements automatic on-disk code caching that can be enabled via an environment variable NODE_COMPILE_CACHE. When set, whenever Node.js compiles a CommonJS or a ECMAScript Module, it will use on-disk [V8 code cache][] persisted in the specified directory to speed up the compilation. This may slow down the first load of a module graph, but subsequent loads of the same module graph may get a significant speedup if the contents of the modules do not change. Locally, this speeds up loading of test/fixtures/snapshot/typescript.js from ~130ms to ~80ms. To clean up the generated code cache, simply remove the directory. It will be recreated the next time the same directory is used for `NODE_COMPILE_CACHE`. Compilation cache generated by one version of Node.js may not be used by a different version of Node.js. Cache generated by different versions of Node.js will be stored separately if the same directory is used to persist the cache, so they can co-exist. Caveat: currently when using this with V8 JavaScript code coverage, the coverage being collected by V8 may be less precise in functions that are deserialized from the code cache. It's recommended to turn this off when running tests to generate precise coverage. Implementation details: There is one cache file per module on disk. The directory layout is: - Compile cache directory (from NODE_COMPILE_CACHE) - 8b23c8fe: CRC32 hash of CachedDataVersionTag + NODE_VERESION - 2ea3424d: - 10860e5a: CRC32 hash of filename + module type - 431e9adc: ... - ... Inside the cache file, there is a header followed by the actual cache content: ``` [uint32_t] code size [uint32_t] code hash [uint32_t] cache size [uint32_t] cache hash ... compile cache content ... ``` When reading the cache file, we'll also check if the code size and code hash match the code that the module loader is loading and whether the cache size and cache hash match the file content read. If they don't match, or if V8 rejects the cache passed, we'll ignore the mismatch cache, and regenerate the cache after compilation succeeds and rewrite it to disk. PR-URL: #52535 Refs: #47472 Reviewed-By: Benjamin Gruenbaum <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Mohammed Keyvanzadeh <[email protected]>
Notable changes: buffer: * improve `base64` and `base64url` performance (Yagiz Nizipli) #52428 dns: * (SEMVER-MINOR) add order option and support ipv6first (Paolo Insogna) #52492 events,doc: * mark CustomEvent as stable (Daeyeon Jeong) #52618 lib, url: * (SEMVER-MINOR) add a `windows` option to path parsing (Aviv Keller) #52509 module: * (SEMVER-MINOR) implement NODE_COMPILE_CACHE for automatic on-disk code caching (Joyee Cheung) #52535 net: * (SEMVER-MINOR) add CLI option for autoSelectFamilyAttemptTimeout (Paolo Insogna) #52474 src: * (SEMVER-MINOR) add `string_view` overload to snapshot FromBlob (Anna Henningsen) #52595 src,permission: * throw async errors on async APIs (Rafael Gonzaga) #52730 test_runner: * (SEMVER-MINOR) add --test-skip-pattern cli option (Aviv Keller) #52529 url: * (SEMVER-MINOR) implement parse method for safer URL parsing (Ali Hassan) #52280 PR-URL: TODO
Notable changes: buffer: * improve `base64` and `base64url` performance (Yagiz Nizipli) #52428 dns: * (SEMVER-MINOR) add order option and support ipv6first (Paolo Insogna) #52492 events,doc: * mark CustomEvent as stable (Daeyeon Jeong) #52618 lib, url: * (SEMVER-MINOR) add a `windows` option to path parsing (Aviv Keller) #52509 module: * (SEMVER-MINOR) implement NODE_COMPILE_CACHE for automatic on-disk code caching (Joyee Cheung) #52535 net: * (SEMVER-MINOR) add CLI option for autoSelectFamilyAttemptTimeout (Paolo Insogna) #52474 src: * (SEMVER-MINOR) add `string_view` overload to snapshot FromBlob (Anna Henningsen) #52595 src,permission: * throw async errors on async APIs (Rafael Gonzaga) #52730 test_runner: * (SEMVER-MINOR) add --test-skip-pattern cli option (Aviv Keller) #52529 url: * (SEMVER-MINOR) implement parse method for safer URL parsing (Ali Hassan) #52280 PR-URL: #52768
Notable changes: buffer: * improve `base64` and `base64url` performance (Yagiz Nizipli) #52428 dns: * (SEMVER-MINOR) add order option and support ipv6first (Paolo Insogna) #52492 events,doc: * mark CustomEvent as stable (Daeyeon Jeong) #52618 lib, url: * (SEMVER-MINOR) add a `windows` option to path parsing (Aviv Keller) #52509 module: * (SEMVER-MINOR) implement NODE_COMPILE_CACHE for automatic on-disk code caching (Joyee Cheung) #52535 net: * (SEMVER-MINOR) add CLI option for autoSelectFamilyAttemptTimeout (Paolo Insogna) #52474 src: * (SEMVER-MINOR) add `string_view` overload to snapshot FromBlob (Anna Henningsen) #52595 src,permission: * throw async errors on async APIs (Rafael Gonzaga) #52730 test_runner: * (SEMVER-MINOR) add --test-skip-pattern cli option (Aviv Keller) #52529 url: * (SEMVER-MINOR) implement parse method for safer URL parsing (Ali Hassan) #52280 PR-URL: #52768
Notable changes: buffer: * improve `base64` and `base64url` performance (Yagiz Nizipli) #52428 dns: * (SEMVER-MINOR) add order option and support ipv6first (Paolo Insogna) #52492 events,doc: * mark CustomEvent as stable (Daeyeon Jeong) #52618 lib, url: * (SEMVER-MINOR) add a `windows` option to path parsing (Aviv Keller) #52509 module: * (SEMVER-MINOR) implement NODE_COMPILE_CACHE for automatic on-disk code caching (Joyee Cheung) #52535 net: * (SEMVER-MINOR) add CLI option for autoSelectFamilyAttemptTimeout (Paolo Insogna) #52474 src: * (SEMVER-MINOR) add `string_view` overload to snapshot FromBlob (Anna Henningsen) #52595 src,permission: * throw async errors on async APIs (Rafael Gonzaga) #52730 test_runner: * (SEMVER-MINOR) add --test-skip-pattern cli option (Aviv Keller) #52529 url: * (SEMVER-MINOR) implement parse method for safer URL parsing (Ali Hassan) #52280 PR-URL: #52768
Notable changes: buffer: * improve `base64` and `base64url` performance (Yagiz Nizipli) #52428 dns: * (SEMVER-MINOR) add order option and support ipv6first (Paolo Insogna) #52492 events,doc: * mark CustomEvent as stable (Daeyeon Jeong) #52618 lib, url: * (SEMVER-MINOR) add a `windows` option to path parsing (Aviv Keller) #52509 module: * (SEMVER-MINOR) implement NODE_COMPILE_CACHE for automatic on-disk code caching (Joyee Cheung) #52535 net: * (SEMVER-MINOR) add CLI option for autoSelectFamilyAttemptTimeout (Paolo Insogna) #52474 src: * (SEMVER-MINOR) add `string_view` overload to snapshot FromBlob (Anna Henningsen) #52595 src,permission: * throw async errors on async APIs (Rafael Gonzaga) #52730 test_runner: * (SEMVER-MINOR) add --test-skip-pattern cli option (Aviv Keller) #52529 url: * (SEMVER-MINOR) implement parse method for safer URL parsing (Ali Hassan) #52280 PR-URL: #52768
Notable changes: buffer: * improve `base64` and `base64url` performance (Yagiz Nizipli) #52428 dns: * (SEMVER-MINOR) add order option and support ipv6first (Paolo Insogna) #52492 events,doc: * mark CustomEvent as stable (Daeyeon Jeong) #52618 lib, url: * (SEMVER-MINOR) add a `windows` option to path parsing (Aviv Keller) #52509 module: * (SEMVER-MINOR) implement NODE_COMPILE_CACHE for automatic on-disk code caching (Joyee Cheung) #52535 net: * (SEMVER-MINOR) add CLI option for autoSelectFamilyAttemptTimeout (Paolo Insogna) #52474 src: * (SEMVER-MINOR) add `string_view` overload to snapshot FromBlob (Anna Henningsen) #52595 src,permission: * throw async errors on async APIs (Rafael Gonzaga) #52730 test_runner: * (SEMVER-MINOR) add --test-skip-pattern cli option (Aviv Keller) #52529 url: * (SEMVER-MINOR) implement parse method for safer URL parsing (Ali Hassan) #52280 PR-URL: #52768
Notable changes: buffer: * improve `base64` and `base64url` performance (Yagiz Nizipli) nodejs#52428 dns: * (SEMVER-MINOR) add order option and support ipv6first (Paolo Insogna) nodejs#52492 events,doc: * mark CustomEvent as stable (Daeyeon Jeong) nodejs#52618 lib, url: * (SEMVER-MINOR) add a `windows` option to path parsing (Aviv Keller) nodejs#52509 module: * (SEMVER-MINOR) implement NODE_COMPILE_CACHE for automatic on-disk code caching (Joyee Cheung) nodejs#52535 net: * (SEMVER-MINOR) add CLI option for autoSelectFamilyAttemptTimeout (Paolo Insogna) nodejs#52474 src: * (SEMVER-MINOR) add `string_view` overload to snapshot FromBlob (Anna Henningsen) nodejs#52595 src,permission: * throw async errors on async APIs (Rafael Gonzaga) nodejs#52730 test_runner: * (SEMVER-MINOR) add --test-skip-pattern cli option (Aviv Keller) nodejs#52529 url: * (SEMVER-MINOR) implement parse method for safer URL parsing (Ali Hassan) nodejs#52280 PR-URL: nodejs#52768
Notable changes: buffer: * improve `base64` and `base64url` performance (Yagiz Nizipli) nodejs#52428 dns: * (SEMVER-MINOR) add order option and support ipv6first (Paolo Insogna) nodejs#52492 events,doc: * mark CustomEvent as stable (Daeyeon Jeong) nodejs#52618 lib, url: * (SEMVER-MINOR) add a `windows` option to path parsing (Aviv Keller) nodejs#52509 module: * (SEMVER-MINOR) implement NODE_COMPILE_CACHE for automatic on-disk code caching (Joyee Cheung) nodejs#52535 net: * (SEMVER-MINOR) add CLI option for autoSelectFamilyAttemptTimeout (Paolo Insogna) nodejs#52474 src: * (SEMVER-MINOR) add `string_view` overload to snapshot FromBlob (Anna Henningsen) nodejs#52595 src,permission: * throw async errors on async APIs (Rafael Gonzaga) nodejs#52730 test_runner: * (SEMVER-MINOR) add --test-skip-pattern cli option (Aviv Keller) nodejs#52529 url: * (SEMVER-MINOR) implement parse method for safer URL parsing (Ali Hassan) nodejs#52280 PR-URL: nodejs#52768
Original commit message: [compiler] reset script details in functions deserialized from code cache During the serialization of the code cache, V8 would wipe out the host-defined options, so after a script id deserialized from the code cache, the host-defined options need to be reset on the script using what's provided by the embedder when doing the deserializing compilation, otherwise the HostImportModuleDynamically callbacks can't get the data it needs to implement dynamic import(). Change-Id: I33cc6a5e43b6469d3527242e083f7ae6d8ed0c6a Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/5401780 Reviewed-by: Leszek Swirski <[email protected]> Commit-Queue: Joyee Cheung <[email protected]> Cr-Commit-Position: refs/heads/main@{#93323} Refs: v8/v8@cd10ad7 PR-URL: nodejs#52535 Refs: nodejs#47472 Reviewed-By: Benjamin Gruenbaum <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Mohammed Keyvanzadeh <[email protected]> PR-URL: nodejs#52465 Reviewed-By: Matteo Collina <[email protected]> Reviewed-By: Rafael Gonzaga <[email protected]> Reviewed-By: Michael Dawson <[email protected]>
Notable changes: buffer: * improve `base64` and `base64url` performance (Yagiz Nizipli) nodejs#52428 dns: * (SEMVER-MINOR) add order option and support ipv6first (Paolo Insogna) nodejs#52492 events,doc: * mark CustomEvent as stable (Daeyeon Jeong) nodejs#52618 lib, url: * (SEMVER-MINOR) add a `windows` option to path parsing (Aviv Keller) nodejs#52509 module: * (SEMVER-MINOR) implement NODE_COMPILE_CACHE for automatic on-disk code caching (Joyee Cheung) nodejs#52535 net: * (SEMVER-MINOR) add CLI option for autoSelectFamilyAttemptTimeout (Paolo Insogna) nodejs#52474 src: * (SEMVER-MINOR) add `string_view` overload to snapshot FromBlob (Anna Henningsen) nodejs#52595 src,permission: * throw async errors on async APIs (Rafael Gonzaga) nodejs#52730 test_runner: * (SEMVER-MINOR) add --test-skip-pattern cli option (Aviv Keller) nodejs#52529 url: * (SEMVER-MINOR) implement parse method for safer URL parsing (Ali Hassan) nodejs#52280 PR-URL: nodejs#52768
|
This is the initial PR for implementing the idea proposed in #47472. It takes inspiration from ccache, the v8-compile-cache npm package (but faster, supports ESM and replaces CJS loader moneky-patching in user land) and v8 code caching in Blink. The first commit is backported from https://chromium-review.googlesource.com/c/v8/v8/+/5401780 which has already landed in the upstream (this is required for import() to work in deserialized code). For more motivation and background of this PR, check out #47472
Local numbers:
There are more numbers from folks trying on an earlier iteration of this work (although this branch now hashes the code and the cache as an extra check, but crc32 should be fast enough that it's not too big of an overhead). For example from #47472 (comment)
tsc --version
can go from ~90ms to ~40ms,yarn --version
can go from ~190ms to ~135ms, from #47472 (comment)npm run echo
with ancc
-bundled npm may go from ~150ms to ~110mssummary
This patch implements automatic on-disk code caching that can be enabled
via an environment variable NODE_COMPILE_CACHE.
When set, whenever Node.js compiles a CommonJS or a ECMAScript Module,
it will use on-disk V8 code cache persisted in the specified directory
to speed up the compilation. This may slow down the first load of a
module graph, but subsequent loads of the same module graph may get
a significant speedup if the contents of the modules do not change.
Locally, this speeds up loading of test/fixtures/snapshot/typescript.js
from ~130ms to ~80ms.
To clean up the generated code cache, simply remove the directory.
It will be recreated the next time the same directory is used for
NODE_COMPILE_CACHE
.Compilation cache generated by one version of Node.js may not be used
by a different version of Node.js. Cache generated by different versions
of Node.js will be stored separately if the same directory is used
to persist the cache, so they can co-exist.
Caveat: currently when using this with V8 JavaScript code coverage, the
coverage being collected by V8 may be less precise in functions that are
deserialized from the code cache. It's recommended to turn this off when
running tests to generate precise coverage.
Implementation details:
There is one cache file per module on disk. The directory layout
is:
Inside the cache file, there is a header followed by the actual
cache content:
When reading the cache file, we'll also check if the code size
and code hash match the code that the module loader is loading
and whether the cache size and cache hash match the file content
read. If they don't match, or if V8 rejects the cache passed,
we'll ignore the mismatch cache, and regenerate the cache after
compilation succeeds and rewrite it to disk.
There are still a bunch of TODOs for this feature, some of them are documented as TODO comments. In addition we can look into idle-time code cache serialization/writing which is also done by Blink to reduce the impact on first load. But this should be good enough as an initial iteration.
Note: why CRC32? Because we already have it in the dependencies (in
zlib.h
), it's fast and can avoid enough collisions for this use case. We do have some existing hashing functions available via openssl but those won't be available on no-crypto builds. We can revisit if CRC32 turns out to be inadequate for this feature. We can also revisit the directory structure or use an actual db, but for now the file I/O cost doesn't really show up when I profile it locally. The remaining bottleneck is the repeated UTF8 encoding, which I plan to get rid of in a follow-up (because it needs to dance with monkey-patching).Refs: #47472