-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reproducible builds on Linux #3043
Comments
@joyeecheung do you have any thoughts/suggestions on the snapshot front? @sxa I assume that it should be possible to update |
That would be my assumption yes. The previous issues and PRs suggest this was resolved, so we'd just need to understand what happened and potentially reimplement a fix for it. |
Pinging @warpfork for this thread. He's currently heads-down on building Warpforge and has a passion for reproducible builds and I was telling him that there's interest in making progress on this front for Node.js so there might be some potential for collaboration and the production of some tooling to both generate reproducible builds and also document & provide mechanisms for people to do it for themselves without all the pain that usually comes from doing it manually. So @sxa, meet the most excellent and clever @warpfork; maybe you two should chat. |
Thanks @rvagg - I've worked on reproducible build environments for another project, and the Node.js build is close other than those two issues above, so it should just require a little bit of knowledge about the Node processes involved in producing those two things to be able to resolve it, then we can possibly put some more robust things in place for making our builds reproducible by default, which I think should be the aim here. @warpfork DO you have much experience on non-Linux reproducible build work (I've so far only looked at Linux for Node.js) |
One source of indeterminism is here: Instead of a set a dict can be used here (with |
Is there a todo on this issue or can it be closed? |
Definitely still work to do and I still plan to look at it. |
Just tried four consecutive builds with the latest code and the This was tested with With snapshots enabled the builds are still non-reproducible. due to |
Looks like the break was between these two commits, so nodejs/node@1faf6f459f likely made it non-reproducible again.
FYI @joyeecheung (PR) @bnoordhuis (PR) I don't think I have enough knowledge of this code to be able to propose a safe solution here so looking for advice. |
@joyeecheung Would you be able to assist with the reproducibility of the snapshots now? I'm not sure who else might have good knowledge of the snapshot support in Node so anyone else might have to start from scratch. |
Thanks for the ping, not sure how I missed the earlier one. I diff'ed the generated snapshot locally and found ~20 lines of differences in the snapshot.cc generated (with |
With this patch the snapshot.cc difference is down to 13 lines. Still need to figure out the differences in the blob though. see diffdiff --git a/src/cleanup_queue-inl.h b/src/cleanup_queue-inl.h
index 5d9a56e6b0..d1fbd8241d 100644
--- a/src/cleanup_queue-inl.h
+++ b/src/cleanup_queue-inl.h
@@ -39,7 +39,9 @@ void CleanupQueue::Remove(Callback cb, void* arg) {
template <typename T>
void CleanupQueue::ForEachBaseObject(T&& iterator) const {
- for (const auto& hook : cleanup_hooks_) {
+ std::vector<CleanupHookCallback> callbacks = GetOrdered();
+
+ for (const auto& hook : callbacks) {
BaseObject* obj = GetBaseObject(hook);
if (obj != nullptr) iterator(obj);
}
diff --git a/src/cleanup_queue.cc b/src/cleanup_queue.cc
index 6290b6796c..c0fcda2fac 100644
--- a/src/cleanup_queue.cc
+++ b/src/cleanup_queue.cc
@@ -5,7 +5,7 @@
namespace node {
-void CleanupQueue::Drain() {
+std::vector<CleanupQueue::CleanupHookCallback> CleanupQueue::GetOrdered() const {
// Copy into a vector, since we can't sort an unordered_set in-place.
std::vector<CleanupHookCallback> callbacks(cleanup_hooks_.begin(),
cleanup_hooks_.end());
@@ -20,6 +20,12 @@ void CleanupQueue::Drain() {
return a.insertion_order_counter_ > b.insertion_order_counter_;
});
+ return callbacks;
+}
+
+void CleanupQueue::Drain() {
+ std::vector<CleanupHookCallback> callbacks = GetOrdered();
+
for (const CleanupHookCallback& cb : callbacks) {
if (cleanup_hooks_.count(cb) == 0) {
// This hook was removed from the `cleanup_hooks_` set during another
diff --git a/src/cleanup_queue.h b/src/cleanup_queue.h
index 64e04e1856..2ca333aca8 100644
--- a/src/cleanup_queue.h
+++ b/src/cleanup_queue.h
@@ -6,6 +6,7 @@
#include <cstddef>
#include <cstdint>
#include <unordered_set>
+#include <vector>
#include "memory_tracker.h"
@@ -66,6 +67,7 @@ class CleanupQueue : public MemoryRetainer {
uint64_t insertion_order_counter_;
};
+ std::vector<CleanupHookCallback> GetOrdered() const;
inline BaseObject* GetBaseObject(const CleanupHookCallback& callback) const;
// Use an unordered_set, so that we have efficient insertion and removal.
diff --git a/tools/snapshot/node_mksnapshot.cc b/tools/snapshot/node_mksnapshot.cc
index ecc295acdb..2ba6878a28 100644
--- a/tools/snapshot/node_mksnapshot.cc
+++ b/tools/snapshot/node_mksnapshot.cc
@@ -52,6 +52,7 @@ int main(int argc, char* argv[]) {
#endif // _WIN32
v8::V8::SetFlagsFromString("--random_seed=42");
+ v8::V8::SetFlagsFromString("--predictable");
v8::V8::SetFlagsFromString("--harmony-import-assertions");
return BuildSnapshot(argc, argv);
} |
Sounds like good progress - thanks @joyeecheung! |
Checking the snapshots again, another source of indeterminism comes from performance data (milestones, time origin etc.), the easiest way to fix it is probably discarding it before snapshot generation. But I'll need to check if/how they should be synchronized. |
With nodejs/node#48702 and nodejs/node#48708 and --predictable the differences are down to 8 places:
EDIT: we can just return some non-empty data for both BaseObject slots to fix 1. 2 probably need a V8 patch for us to customize how the slots should be serialized. Trying to work out a prototype. |
Previously we just rely on the unordered_set order to iterate over the BaseObjects, which is not deterministic. The iteration is only used in printing, verification, and snapshot generation. In the first two cases the performance overhead of sorting does not matter because they are only used for debugging. In the last case the determinism is more important than the trivial overhead of sorting. So this patch makes the iteration deterministic by sorting the set first, as what is already being done when we drain the queue. PR-URL: #48702 Refs: nodejs/build#3043 Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Luigi Pinca <[email protected]>
Previously we just rely on the unordered_set order to iterate over the BaseObjects, which is not deterministic. The iteration is only used in printing, verification, and snapshot generation. In the first two cases the performance overhead of sorting does not matter because they are only used for debugging. In the last case the determinism is more important than the trivial overhead of sorting. So this patch makes the iteration deterministic by sorting the set first, as what is already being done when we drain the queue. PR-URL: #48702 Refs: nodejs/build#3043 Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Luigi Pinca <[email protected]>
Previously we cache the time origin for the milestones in the user land, and refresh it at pre-execution. As result the time origin gets serialized into the snapshot and is therefore not deterministic. Now we store it in the milestone array as an internal value and reset the milestones at serialization time instead of deserialization time. This improves the determinism of the snapshot. Drive-by: remove the unused MarkMilestone() binding. PR-URL: #48708 Refs: nodejs/build#3043 Reviewed-By: Yagiz Nizipli <[email protected]>
Previously we cache the time origin for the milestones in the user land, and refresh it at pre-execution. As result the time origin gets serialized into the snapshot and is therefore not deterministic. Now we store it in the milestone array as an internal value and reset the milestones at serialization time instead of deserialization time. This improves the determinism of the snapshot. Drive-by: remove the unused MarkMilestone() binding. PR-URL: nodejs#48708 Refs: nodejs/build#3043 Reviewed-By: Yagiz Nizipli <[email protected]>
Previously we cache the time origin for the milestones in the user land, and refresh it at pre-execution. As result the time origin gets serialized into the snapshot and is therefore not deterministic. Now we store it in the milestone array as an internal value and reset the milestones at serialization time instead of deserialization time. This improves the determinism of the snapshot. Drive-by: remove the unused MarkMilestone() binding. PR-URL: nodejs#48708 Refs: nodejs/build#3043 Reviewed-By: Yagiz Nizipli <[email protected]>
Previously we cache the time origin for the milestones in the user land, and refresh it at pre-execution. As result the time origin gets serialized into the snapshot and is therefore not deterministic. Now we store it in the milestone array as an internal value and reset the milestones at serialization time instead of deserialization time. This improves the determinism of the snapshot. Drive-by: remove the unused MarkMilestone() binding. PR-URL: nodejs#48708 Refs: nodejs/build#3043 Reviewed-By: Yagiz Nizipli <[email protected]>
I've put https://ci.nodejs.org/job/reproducibility-test/ in place to test how reproducible the builds are on Linux/aarch64 - it will run weekly:
|
Thanks, I think when nodejs/node#48851 lands, we could also add another flag to display the |
Yeah that's probably more useful for the future :-) |
I landed the regression fix for my V8 patch, looking into backporting the V8 patches and rebasing nodejs/node#50983 - with the patches the snapshot part should be reproducible, though it seems there are some bits beyond snapshots that are not reproducible. (It seems #3043 (comment) matches my findings in #3043 (comment), I was using Ubuntu (don't remember the version now)) |
To improve determinism of snapshot generation, add --predictable to the V8 flags used to initialize a process launched to generate snapshot. Also add a kGeneratePredictableSnapshot flag to ProcessInitializationFlags for this and moves the configuration of these flags into node::InitializeOncePerProcess() so that it can be shared by embedders. PR-URL: nodejs#48749 Refs: nodejs/build#3043 Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Chengzhong Wu <[email protected]>
In nodejs/node#50983 the snapshots are made reproducible with a test that diffs the snapshots passing in the CI. Would need some reviews to push it forward though. |
For pointer values in the context data, we need to return non-empty data in the serializer so that V8 does not serialize them verbatim, making the snapshot unreproducible. PR-URL: #50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
To make the snapshots reproducible, this patch updates the size of a few types and adds some static assertions to ensure that there are no padding in the memcpy-ed structs. PR-URL: #50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
- Print offsets in blob serializer - Add a special node:generate_default_snapshot ID to generate the built-in snapshot. - Improve logging - Add a test to check the reproducibilty of the snapshot PR-URL: #50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
With nodejs/node#50983 landed on Ubuntu I am able to get identical builds with |
This is an awesome result @joyeecheung! Thank you! I've just done a test on Linux/aarch64 and confirmed that two consecutive builds on Ubuntu 22.04 come out identical on the same machine (I was running without |
From a quick glance, the job needs to be using a newer compiler than gcc 9.4.0. |
Yep - sorted and the build is now good. I've also verified that it's reproducible whether or not you use @joyeecheung Would it be possible to put these fixes back to earlier Node versions, or will the V8 levels in those not be easy to convert? It'll be a good story for the next major release regardless though! |
For v22.x the changes should land cleanly or don't need too much work to backport. For 20.x, the necessary V8 API changes would be ABI breaking although I think it can still be backportable by either backporing the V8 API changes in a non-ABI breaking way (adding new signatures instead of appending parameters), or nulling out the pointers on our end before snapshot serialization (might be easier that way anyway?). For 18.x that might be harder since it's already quite different. |
For pointer values in the context data, we need to return non-empty data in the serializer so that V8 does not serialize them verbatim, making the snapshot unreproducible. PR-URL: #50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
To make the snapshots reproducible, this patch updates the size of a few types and adds some static assertions to ensure that there are no padding in the memcpy-ed structs. PR-URL: #50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
- Print offsets in blob serializer - Add a special node:generate_default_snapshot ID to generate the built-in snapshot. - Improve logging - Add a test to check the reproducibilty of the snapshot PR-URL: #50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
For pointer values in the context data, we need to return non-empty data in the serializer so that V8 does not serialize them verbatim, making the snapshot unreproducible. PR-URL: nodejs#50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
To make the snapshots reproducible, this patch updates the size of a few types and adds some static assertions to ensure that there are no padding in the memcpy-ed structs. PR-URL: nodejs#50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
- Print offsets in blob serializer - Add a special node:generate_default_snapshot ID to generate the built-in snapshot. - Improve logging - Add a test to check the reproducibilty of the snapshot PR-URL: nodejs#50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
For pointer values in the context data, we need to return non-empty data in the serializer so that V8 does not serialize them verbatim, making the snapshot unreproducible. PR-URL: nodejs#50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
To make the snapshots reproducible, this patch updates the size of a few types and adds some static assertions to ensure that there are no padding in the memcpy-ed structs. PR-URL: nodejs#50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
- Print offsets in blob serializer - Add a special node:generate_default_snapshot ID to generate the built-in snapshot. - Improve logging - Add a test to check the reproducibilty of the snapshot PR-URL: nodejs#50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
Are the reproducibility tests only comparing doing a build twice on identical hardware? Node 22.4.0 is not reproducible in our testing on stagex, when building on two different ryzen machines with different specs. See my comment here: #3043 (comment) I tried |
Hmm, I am pretty sure it is related to the CPU features hash V8 adds to the code cache, which is encoded as a bit field including the following properties: https://chromium.googlesource.com/v8/v8/+/7c7c6b475c6edcd7ef863e1f668be3df55c56307/src/codegen/cpu-features.h#15 - so it's not strictly required that the hardware must be identical, but the differences need to be out of the probed set. We also add the cached version tag (which includes this bitfield) to the custom snapshot blob, but for the built-in snapshot at least we can remove it since we don't check it for built-in snapshot anyway. But the ones in the code cache would be harder to get rid of as V8 has some concerns about skipping the CPU feature test (see the discussions in https://chromium-review.googlesource.com/c/v8/v8/+/4905290). I wonder if we can convince the upstream to change the bits set in the code cache to be a combination of "CPU features required by this code cache" instead of "CPU features of the platform that compiles this code cache" (the two aren't necessarily the same and at least for now, the former is basically "none" because V8 doesn't include any optimized code in the code cache, though they want to reserve the ability to make it CPU-specific. But at least making it more specific about the actual requirement instead of doing a blind equality check would help our use case). |
Oh wait you are already using |
This only served as a preemptive check, but serializing this in the snapshot would make it unreproducible on different hardware. In the current cached data version tag, the V8 version can already be checked as part of the existing Node.js version check. The V8 flags aren't necessarily important for snapshot/code cache mismatches (only a small subset are), and the CPU features currently don't matter, so doing an exact match is stricter than necessary. Removing the check to help making the snapshot more reproducible on different hardware. PR-URL: #54122 Refs: nodejs/build#3043 Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Chengzhong Wu <[email protected]>
This only served as a preemptive check, but serializing this in the snapshot would make it unreproducible on different hardware. In the current cached data version tag, the V8 version can already be checked as part of the existing Node.js version check. The V8 flags aren't necessarily important for snapshot/code cache mismatches (only a small subset are), and the CPU features currently don't matter, so doing an exact match is stricter than necessary. Removing the check to help making the snapshot more reproducible on different hardware. PR-URL: #54122 Refs: nodejs/build#3043 Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Chengzhong Wu <[email protected]>
Confirmed I am finally able to build nodejs 22.7.0 reproducibly across multiple different systems/cpus: https://codeberg.org/stagex/stagex/pulls/95/files --without-node-snapshot is no longer an option in this release so it builds with it enabled, however the issue (as suspected earlier) was nodejs currently does not build some deps (icu, openssl, brotli, libev, nghttp2, c-ares) deterministically. I had to package all those myself separately and get those deterministic, then have node build against the system versions instead of in-tree versions, but it works! |
Yes. And it's done on aarch64 (chosen due to the fact we have machines with very high number of cores there which means it's feasible to run the test without ccache without them taking up too much machine time!)
That's useful to know - thank you. Although it's interesting that it's only showing up when building on different machines. Do you know if it's definitely a problem for all of those that you mentioned? |
This only served as a preemptive check, but serializing this in the snapshot would make it unreproducible on different hardware. In the current cached data version tag, the V8 version can already be checked as part of the existing Node.js version check. The V8 flags aren't necessarily important for snapshot/code cache mismatches (only a small subset are), and the CPU features currently don't matter, so doing an exact match is stricter than necessary. Removing the check to help making the snapshot more reproducible on different hardware. PR-URL: #54122 Refs: nodejs/build#3043 Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Chengzhong Wu <[email protected]>
In the dim and distant past before global lockdowns there was some previous work to make the build process binary reproducible.
I had a play recently with the process (on Linux/aarch64 because they are the quickest machines I have access to) and I hit two issues:
debug-support.cc
is nondeterministic - the definitions are not always in the same order within the file. It is generated by https://github.com/nodejs/node/blob/main/deps/v8/tools/gen-postmortem-metadata.py - If I copy in a consistent one the builds can be reproducible.configure --without-node-snapshot
), which suggests that the changes made in node_code_cache.cc and node_snapshot.cc generation is unreproducible node#29108 are no longer valid for the current release.Tagging @ChALkeR who put the original PR in (albeit three years ago!)
The text was updated successfully, but these errors were encountered: