Reproducible builds on Linux #3043

Open
sxa opened this issue Oct 1, 2022 · 48 comments

@sxa
Member

sxa commented Oct 1, 2022

In the dim and distant past before global lockdowns there was some previous work to make the build process binary reproducible.

I had a play recently with the process (on Linux/aarch64 because they are the quickest machines I have access to) and I hit two issues:

  1. The production of debug-support.cc is nondeterministic - the definitions are not always in the same order within the file. It is generated by https://github.com/nodejs/node/blob/main/deps/v8/tools/gen-postmortem-metadata.py - If I copy in a consistent one the builds can be reproducible.
  2. I had to disable the snapshot functionality (configure --without-node-snapshot), which suggests that the changes made for "node_code_cache.cc and node_snapshot.cc generation is unreproducible" (nodejs/node#29108) are no longer valid for the current release.

Tagging @ChALkeR who put the original PR in (albeit three years ago!)

@mhdawson
Member

@joyeecheung do you have any thoughts/suggestions on the snapshot front?

@sxa I assume it should be possible to update https://github.com/nodejs/node/blob/main/deps/v8/tools/gen-postmortem-metadata.py so that it generates data in a deterministic manner (maybe by sorting at some point), and that it's really just a matter of doing that, rather than there being some fundamental reason why it might not be possible?

@sxa
Member Author

sxa commented Oct 25, 2022

That would be my assumption yes. The previous issues and PRs suggest this was resolved, so we'd just need to understand what happened and potentially reimplement a fix for it.

@rvagg
Member

rvagg commented Oct 26, 2022

Pinging @warpfork for this thread. He's currently heads-down on building Warpforge and has a passion for reproducible builds and I was telling him that there's interest in making progress on this front for Node.js so there might be some potential for collaboration and the production of some tooling to both generate reproducible builds and also document & provide mechanisms for people to do it for themselves without all the pain that usually comes from doing it manually.

So @sxa, meet the most excellent and clever @warpfork; maybe you two should chat.

@sxa
Member Author

sxa commented Oct 26, 2022

Thanks @rvagg - I've worked on reproducible build environments for another project, and the Node.js build is close other than those two issues above, so it should just require a little bit of knowledge about the Node processes involved in producing those two things to be able to resolve it, then we can possibly put some more robust things in place for making our builds reproducible by default, which I think should be the aim here.

@warpfork Do you have much experience with non-Linux reproducible build work? (I've so far only looked at Linux for Node.js.)

@StefanBruens

> 1. The production of `debug-support.cc` is nondeterministic - the definitions are not always in the same order within the file. It is generated by https://github.com/nodejs/node/blob/main/deps/v8/tools/gen-postmortem-metadata.py - If I copy in a consistent one the builds can be reproducible.

One source of indeterminism is here:

https://github.com/nodejs/node/blob/ab064d12b79d14a3d02ba420138cc9d24169a951/deps/v8/tools/gen-postmortem-metadata.py#L717

Instead of a set, a dict can be used here (with None values), since dict preserves insertion order as of Python 3.7. Then use for line in dict.fromkeys(lines): to output the lines in order.
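
A minimal sketch of that suggestion, assuming made-up sample data (the `lines` list and the `unique_in_order` helper are hypothetical and not taken from gen-postmortem-metadata.py). Iterating a set of strings can change order between interpreter runs because string hashing is randomized, whereas `dict.fromkeys` drops duplicates while keeping first-seen order:

```python
# Hypothetical sample input; the real script collects generated C++ definitions.
lines = [
    "int v8dbg_type_Foo = 1;",
    "int v8dbg_type_Bar = 2;",
    "int v8dbg_type_Foo = 1;",  # duplicate entry
    "int v8dbg_type_Baz = 3;",
]


def unique_in_order(items):
    """Drop duplicates while keeping first-seen order.

    dict preserves insertion order (guaranteed since Python 3.7), so the
    keys come back in the order they were first inserted.
    """
    return list(dict.fromkeys(items))


deduped = unique_in_order(lines)
for line in deduped:
    print(line)
```

The output order then depends only on the order in which the lines were collected, not on hash values.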

@cclauss
Contributor

cclauss commented Apr 11, 2023

Is there a todo on this issue or can it be closed?

@sxa
Member Author

sxa commented Apr 12, 2023

Definitely still work to do and I still plan to look at it.

@sxa
Member Author

sxa commented Apr 12, 2023

Just tried four consecutive builds with the latest code and the debug-support.cc came out the same each time. There have been four V8 version bumps since I last tested (10.1.124.6 -> 11.3.244.4), so either I'm lucky today or something in there has made things better, but potentially the fix in the earlier comment will no longer be required.

This was tested with ./configure --without-node-snapshot and setting SOURCE_DATE_EPOCH=0 in the environment.

With snapshots enabled the builds are still non-reproducible due to node_snapshot.cc. (Note that node_code_cache.cc, referred to in nodejs/node#29108, is no longer a problem on the Linux system being used.)
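
For context on the SOURCE_DATE_EPOCH setting used in that test: it is a convention where build tools read a Unix timestamp from the environment instead of using the current time. The sketch below shows how a build step might honor it; the `build_timestamp` helper is hypothetical and is not how the Node.js build itself is implemented.

```python
import os
import time


def build_timestamp(environ=os.environ):
    """Return the Unix timestamp a build step should embed in its output.

    If SOURCE_DATE_EPOCH is set, use it so repeated builds embed the same
    value; otherwise fall back to the (non-reproducible) current time.
    """
    value = environ.get("SOURCE_DATE_EPOCH")
    if value is not None:
        return int(value)
    return int(time.time())


# Simulate the environment used in the test above.
stamp = build_timestamp({"SOURCE_DATE_EPOCH": "0"})
print(time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(stamp)))  # 1970-01-01 00:00:00
```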

@sxa
Member Author

sxa commented Apr 12, 2023

Looks like the break was between these two commits, so nodejs/node@1faf6f459f likely made it non-reproducible again.

* 1faf6f459f 2019-04-21 | src: snapshot Environment upon instantiation (HEAD, refs/bisect/bad) [Joyee Cheung]
* f04538761f 2020-04-30 | tools: enable Node.js command line flags in node_mksnapshot (refs/bisect/good-f04538761f5bb3c334d3c8d16d093ac0916ff3bc) [Joyee Cheung]

FYI @joyeecheung (PR) @bnoordhuis (PR)

I don't think I have enough knowledge of this code to be able to propose a safe solution here so looking for advice.

@sxa
Member Author

sxa commented Jun 23, 2023

@joyeecheung Would you be able to assist with the reproducibility of the snapshots now? I'm not sure who else might have good knowledge of the snapshot support in Node so anyone else might have to start from scratch.

@joyeecheung
Member

Thanks for the ping, not sure how I missed the earlier one. I diff'ed the generated snapshot locally and found ~20 lines of differences in the snapshot.cc generated (with --predictable, which we should also set for mksnapshot), most notably the binding data are initialized in different order, not sure why but they are supposed to be initialized in the same order. I'll look into why that happens.

@joyeecheung
Member

joyeecheung commented Jun 23, 2023

With this patch the snapshot.cc difference is down to 13 lines. Still need to figure out the differences in the blob though.

see diff
diff --git a/src/cleanup_queue-inl.h b/src/cleanup_queue-inl.h
index 5d9a56e6b0..d1fbd8241d 100644
--- a/src/cleanup_queue-inl.h
+++ b/src/cleanup_queue-inl.h
@@ -39,7 +39,9 @@ void CleanupQueue::Remove(Callback cb, void* arg) {
 
 template <typename T>
 void CleanupQueue::ForEachBaseObject(T&& iterator) const {
-  for (const auto& hook : cleanup_hooks_) {
+  std::vector<CleanupHookCallback> callbacks = GetOrdered();
+
+  for (const auto& hook : callbacks) {
     BaseObject* obj = GetBaseObject(hook);
     if (obj != nullptr) iterator(obj);
   }
diff --git a/src/cleanup_queue.cc b/src/cleanup_queue.cc
index 6290b6796c..c0fcda2fac 100644
--- a/src/cleanup_queue.cc
+++ b/src/cleanup_queue.cc
@@ -5,7 +5,7 @@
 
 namespace node {
 
-void CleanupQueue::Drain() {
+std::vector<CleanupQueue::CleanupHookCallback> CleanupQueue::GetOrdered() const {
   // Copy into a vector, since we can't sort an unordered_set in-place.
   std::vector<CleanupHookCallback> callbacks(cleanup_hooks_.begin(),
                                              cleanup_hooks_.end());
@@ -20,6 +20,12 @@ void CleanupQueue::Drain() {
               return a.insertion_order_counter_ > b.insertion_order_counter_;
             });
 
+  return callbacks;
+}
+
+void CleanupQueue::Drain() {
+  std::vector<CleanupHookCallback> callbacks = GetOrdered();
+
   for (const CleanupHookCallback& cb : callbacks) {
     if (cleanup_hooks_.count(cb) == 0) {
       // This hook was removed from the `cleanup_hooks_` set during another
diff --git a/src/cleanup_queue.h b/src/cleanup_queue.h
index 64e04e1856..2ca333aca8 100644
--- a/src/cleanup_queue.h
+++ b/src/cleanup_queue.h
@@ -6,6 +6,7 @@
 #include <cstddef>
 #include <cstdint>
 #include <unordered_set>
+#include <vector>
 
 #include "memory_tracker.h"
 
@@ -66,6 +67,7 @@ class CleanupQueue : public MemoryRetainer {
     uint64_t insertion_order_counter_;
   };
 
+  std::vector<CleanupHookCallback> GetOrdered() const;
   inline BaseObject* GetBaseObject(const CleanupHookCallback& callback) const;
 
   // Use an unordered_set, so that we have efficient insertion and removal.
diff --git a/tools/snapshot/node_mksnapshot.cc b/tools/snapshot/node_mksnapshot.cc
index ecc295acdb..2ba6878a28 100644
--- a/tools/snapshot/node_mksnapshot.cc
+++ b/tools/snapshot/node_mksnapshot.cc
@@ -52,6 +52,7 @@ int main(int argc, char* argv[]) {
 #endif  // _WIN32
 
   v8::V8::SetFlagsFromString("--random_seed=42");
+  v8::V8::SetFlagsFromString("--predictable");
   v8::V8::SetFlagsFromString("--harmony-import-assertions");
   return BuildSnapshot(argc, argv);
 }

@sxa
Member Author

sxa commented Jun 30, 2023

Sounds like good progress - thanks @joyeecheung!

@joyeecheung
Member

Checking the snapshots again, another source of indeterminism comes from performance data (milestones, time origin etc.), the easiest way to fix it is probably discarding it before snapshot generation. But I'll need to check if/how they should be synchronized.

@joyeecheung
Member

joyeecheung commented Jul 9, 2023

With nodejs/node#48702 and nodejs/node#48708 and --predictable the differences are down to 8 places:

  1. 7 of which coming from binding data's embedder data slot
  2. 1 of which coming from the context embedder data slot

We probably need some V8 patches to make it deterministic (my guess is, v8 doesn't actually need to copy the exact values of those slots into the snapshot, they are only there as place holders, so v8 should probably just copy the same amount of 0s for those - still working locally to see if this is correct)

EDIT: we can just return some non-empty data for both BaseObject slots to fix (1). (2) probably needs a V8 patch so that we can customize how the slots are serialized. Trying to work out a prototype.

nodejs-github-bot pushed a commit to nodejs/node that referenced this issue Jul 12, 2023
Previously we just rely on the unordered_set order to iterate over
the BaseObjects, which is not deterministic.

The iteration is only used in printing, verification, and snapshot
generation. In the first two cases the performance overhead of
sorting does not matter because they are only used for debugging.
In the last case the determinism is more important than the trivial
overhead of sorting. So this patch makes the iteration deterministic
by sorting the set first, as what is already being done when we
drain the queue.

PR-URL: #48702
Refs: nodejs/build#3043
Reviewed-By: Tobias Nießen <[email protected]>
Reviewed-By: Yagiz Nizipli <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
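
The idea in this commit message can be sketched in Python, as an analogy to the C++ change rather than a copy of it. The `CleanupHook` class and the hook names are hypothetical. Hooks live in an unordered set, and any code whose output depends on iteration order walks a copy sorted by insertion counter (newest first, matching the comparator in the patch above):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CleanupHook:
    name: str             # stands in for the callback/argument pair
    insertion_order: int  # monotonically increasing counter


def ordered(hooks):
    """Return the hooks newest-first, independent of the set's internal order."""
    return sorted(hooks, key=lambda hook: hook.insertion_order, reverse=True)


hooks = {CleanupHook("fs", 0), CleanupHook("http", 1), CleanupHook("timers", 2)}
names = [hook.name for hook in ordered(hooks)]
print(names)  # ['timers', 'http', 'fs']
```

The set still gives cheap insertion and removal; the sort is only paid by the callers that need a stable order.
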
nodejs-github-bot pushed a commit to nodejs/node that referenced this issue Jul 20, 2023
Previously we cache the time origin for the milestones in the user
land, and refresh it at pre-execution. As result the time origin
gets serialized into the snapshot and is therefore not deterministic.
Now we store it in the milestone array as an internal value and
reset the milestones at serialization time instead of
deserialization time. This improves the determinism of the snapshot.

Drive-by: remove the unused MarkMilestone() binding.
PR-URL: #48708
Refs: nodejs/build#3043
Reviewed-By: Yagiz Nizipli <[email protected]>
@sxa
Member Author

sxa commented Aug 11, 2023

I've put https://ci.nodejs.org/job/reproducibility-test/ in place to test how reproducible the builds are on Linux/aarch64 - it will run weekly:

  1. It will compare two builds with --without-node-snapshot and fail the job if they are not identical
  2. It will compare two builds including snapshots and give a yellow warning status if only one object file is different (we expect node_snapshot.o to differ) or a red failure status if there are more differences.
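
The comparison described above could look roughly like the following sketch. It is not the actual Jenkins job: the helper names, the file names, and the green/yellow/red mapping are assumptions drawn only from the description. It hashes every file in two build trees and classifies the result by how many files differ:

```python
import hashlib
import tempfile
from pathlib import Path


def digest_tree(root):
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def differing_files(build_a, build_b):
    """List files that are missing from one tree or differ between the two."""
    a, b = digest_tree(build_a), digest_tree(build_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


def job_status(diffs):
    """green: identical; yellow: exactly one file differs; red: more than one."""
    if not diffs:
        return "green"
    return "yellow" if len(diffs) == 1 else "red"


# Demo with two fake build trees in which only the snapshot object differs.
with tempfile.TemporaryDirectory() as tmp:
    for build in ("a", "b"):
        out = Path(tmp, build)
        out.mkdir()
        (out / "node.o").write_bytes(b"identical object code")
        (out / "node_snapshot.o").write_bytes(build.encode())
    diffs = differing_files(Path(tmp, "a"), Path(tmp, "b"))

print(diffs, job_status(diffs))  # ['node_snapshot.o'] yellow
```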

@joyeecheung
Member

Thanks, I think when nodejs/node#48851 lands, we could also add another flag to display the node_snapshot.cc in a more human readable way, then the CI can just diff the two out/Release/gen/node_snapshot.cc for a nicer output.

@sxa
Member Author

sxa commented Aug 11, 2023

> Thanks, I think when nodejs/node#48851 lands, we could also add another flag to display the node_snapshot.cc in a more human readable way, then the CI can just diff the two out/Release/gen/node_snapshot.cc for a nicer output.

Yeah that's probably more useful for the future :-)

@joyeecheung
Member

joyeecheung commented Mar 20, 2024

I landed the regression fix for my V8 patch, looking into backporting the V8 patches and rebasing nodejs/node#50983 - with the patches the snapshot part should be reproducible, though it seems there are some bits beyond snapshots that are not reproducible. (It seems #3043 (comment) matches my findings in #3043 (comment), I was using Ubuntu (don't remember the version now))

debadree25 pushed a commit to debadree25/node that referenced this issue Apr 15, 2024
To improve determinism of snapshot generation, add --predictable
to the V8 flags used to initialize a process launched to generate
snapshot. Also add a kGeneratePredictableSnapshot flag
to ProcessInitializationFlags for this and moves the configuration
of these flags into node::InitializeOncePerProcess() so that
it can be shared by embedders.

PR-URL: nodejs#48749
Refs: nodejs/build#3043
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Yagiz Nizipli <[email protected]>
Reviewed-By: Chengzhong Wu <[email protected]>
@joyeecheung
Member

In nodejs/node#50983 the snapshots are made reproducible with a test that diffs the snapshots passing in the CI. Would need some reviews to push it forward though.

nodejs-github-bot pushed a commit to nodejs/node that referenced this issue Jun 14, 2024
For pointer values in the context data, we need to return
non-empty data in the serializer so that V8 does not
serialize them verbatim, making the snapshot unreproducible.

PR-URL: #50983
Refs: nodejs/build#3043
Reviewed-By: Daniel Lemire <[email protected]>
Reviewed-By: James M Snell <[email protected]>
nodejs-github-bot pushed a commit to nodejs/node that referenced this issue Jun 14, 2024
To make the snapshots reproducible, this patch updates the size
of a few types and adds some static assertions to ensure that
there are no padding in the memcpy-ed structs.

PR-URL: #50983
Refs: nodejs/build#3043
Reviewed-By: Daniel Lemire <[email protected]>
Reviewed-By: James M Snell <[email protected]>
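
To illustrate why padding matters for a struct that is copied byte-for-byte: padding bytes carry no defined value in C++, so they can differ between runs. The patch guards against this with C++ static assertions. The sketch below is only a Python `ctypes` analogy with hypothetical struct layouts, comparing a struct's total size to the sum of its field sizes:

```python
import ctypes


class Padded(ctypes.Structure):
    # A 1-byte field followed by a 4-byte field: the compiler-style layout
    # typically inserts 3 padding bytes to align `value`.
    _fields_ = [("flag", ctypes.c_uint8), ("value", ctypes.c_uint32)]


class Widened(ctypes.Structure):
    # Widening `flag` to 4 bytes leaves no room for padding.
    _fields_ = [("flag", ctypes.c_uint32), ("value", ctypes.c_uint32)]


def padding_bytes(struct_type):
    """Bytes in the struct that belong to no field (alignment padding)."""
    field_bytes = sum(ctypes.sizeof(ftype) for _name, ftype in struct_type._fields_)
    return ctypes.sizeof(struct_type) - field_bytes


# Rough equivalent of static_assert(sizeof(T) == sum of field sizes).
assert padding_bytes(Widened) == 0
print(padding_bytes(Padded), padding_bytes(Widened))
```
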
nodejs-github-bot pushed a commit to nodejs/node that referenced this issue Jun 14, 2024
- Print offsets in blob serializer
- Add a special node:generate_default_snapshot ID to generate
  the built-in snapshot.
- Improve logging
- Add a test to check the reproducibility of the snapshot

PR-URL: #50983
Refs: nodejs/build#3043
Reviewed-By: Daniel Lemire <[email protected]>
Reviewed-By: James M Snell <[email protected]>
@joyeecheung
Member

With nodejs/node#50983 landed on Ubuntu I am able to get identical builds with SOURCE_DATE_EPOCH=0 and ./configure --ninja

@sxa
Member Author

sxa commented Jun 17, 2024

> With nodejs/node#50983 landed on Ubuntu I am able to get identical builds with SOURCE_DATE_EPOCH=0 and ./configure --ninja

This is an awesome result @joyeecheung! Thank you! I've just done a test on Linux/aarch64 and confirmed that two consecutive builds on Ubuntu 22.04 come out identical on the same machine (I was running without --ninja and without ccache). I'll run a few extra tests but this feels like a great achievement and hopefully we can do these verifications continuously to catch any regressions in the future. It looks like my reproducibility test job isn't running properly just now but I'll look at getting that working again.

@richardlau
Member

> It looks like my reproducibility test job isn't running properly just now but I'll look at getting that working again.

From a quick glance, the job needs to be using a newer compiler than gcc 9.4.0.

@sxa
Member Author

sxa commented Jun 17, 2024

> > It looks like my reproducibility test job isn't running properly just now but I'll look at getting that working again.
>
> From a quick glance, the job needs to be using a newer compiler than gcc 9.4.0.

Yep - sorted and the build is now good.

I've also verified that it's reproducible whether or not you use ccache so that's good too (I'm aware of some issues seen at other projects with ccache enabled, so I needed to check it)

@joyeecheung Would it be possible to put these fixes back to earlier Node versions, or will the V8 levels in those not be easy to convert? It'll be a good story for the next major release regardless though!

@joyeecheung
Member

joyeecheung commented Jun 17, 2024

For v22.x the changes should land cleanly or not need too much work to backport. For 20.x, the necessary V8 API changes would be ABI breaking, although I think they could still be backported either by backporting the V8 API changes in a non-ABI-breaking way (adding new signatures instead of appending parameters), or by nulling out the pointers on our end before snapshot serialization (might be easier that way anyway?). For 18.x that might be harder since it's already quite different.

@lrvick

lrvick commented Jul 18, 2024

Are the reproducibility tests only comparing doing a build twice on identical hardware?

Node 22.4.0 is not reproducible in our testing on stagex, when building on two different ryzen machines with different specs.

See my comment here: #3043 (comment)

I tried CFLAGS="-march=x86-64 -mtune=generic -O2" to try to avoid any cpu-specific optimizations but it seems this was not enough.

@joyeecheung
Member

Hmm, I am pretty sure it is related to the CPU features hash V8 adds to the code cache, which is encoded as a bit field including the following properties: https://chromium.googlesource.com/v8/v8/+/7c7c6b475c6edcd7ef863e1f668be3df55c56307/src/codegen/cpu-features.h#15 - so it's not strictly required that the hardware must be identical, but the differences need to be out of the probed set.

We also add the cached version tag (which includes this bitfield) to the custom snapshot blob, but for the built-in snapshot at least we can remove it since we don't check it for built-in snapshot anyway. But the ones in the code cache would be harder to get rid of as V8 has some concerns about skipping the CPU feature test (see the discussions in https://chromium-review.googlesource.com/c/v8/v8/+/4905290). I wonder if we can convince the upstream to change the bits set in the code cache to be a combination of "CPU features required by this code cache" instead of "CPU features of the platform that compiles this code cache" (the two aren't necessarily the same and at least for now, the former is basically "none" because V8 doesn't include any optimized code in the code cache, though they want to reserve the ability to make it CPU-specific. But at least making it more specific about the actual requirement instead of doing a blind equality check would help our use case).
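
The difference between the two schemes discussed above can be sketched as follows. The flag names and helper functions are hypothetical and do not reflect V8's real feature list or API. An exact-match tag records the builder's CPU features, so it varies between build machines, while a "required features" tag only needs to be a subset of what the running host supports:

```python
from enum import IntFlag


class CpuFeature(IntFlag):
    # Hypothetical flags for illustration only.
    SSE4_1 = 1
    AVX = 2
    AVX2 = 4


def accepts_exact(cache_tag, host):
    """Blind equality check: the cache is rejected unless the feature sets match exactly."""
    return cache_tag == host


def accepts_required(required, host):
    """Subset check: the cache is accepted if the host has every required feature."""
    return (required & host) == required


builder = CpuFeature.SSE4_1 | CpuFeature.AVX | CpuFeature.AVX2
runner = CpuFeature.SSE4_1 | CpuFeature.AVX
required = CpuFeature(0)  # "none": no CPU-specific code in the cache

print(accepts_exact(builder, runner))      # False
print(accepts_required(required, runner))  # True
```

Under the second scheme the tag would be the same regardless of which machine produced the cache, as long as the cached code itself has no CPU-specific requirements.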

@joyeecheung
Member

Oh wait, you are already using --without-node-snapshot, so I am not quite sure what else is varying, though according to #3043 (comment), other than libnode it comes from libnghttp2, libicuucx, libicui18n and libicutools.

nodejs-github-bot pushed a commit to nodejs/node that referenced this issue Aug 23, 2024
This only served as a preemptive check, but serializing this in
the snapshot would make it unreproducible on different hardware.
In the current cached data version tag, the V8 version can already
be checked as part of the existing Node.js version check. The V8
flags aren't necessarily important for snapshot/code cache mismatches
(only a small subset are), and the CPU features currently don't
matter, so doing an exact match is stricter than necessary.
Removing the check to help making the snapshot more reproducible on
different hardware.

PR-URL: #54122
Refs: nodejs/build#3043
Reviewed-By: Yagiz Nizipli <[email protected]>
Reviewed-By: Chengzhong Wu <[email protected]>
@lrvick

lrvick commented Aug 26, 2024

Confirmed I am finally able to build nodejs 22.7.0 reproducibly across multiple different systems/cpus:

https://codeberg.org/stagex/stagex/pulls/95/files

--without-node-snapshot is no longer an option in this release, so it builds with the snapshot enabled. However, the issue (as suspected earlier) was that nodejs currently does not build some deps (icu, openssl, brotli, libev, nghttp2, c-ares) deterministically.

I had to package all those myself separately and get those deterministic, then have node build against the system versions instead of in-tree versions, but it works!

@sxa
Member Author

sxa commented Aug 27, 2024

> Are the reproducibility tests only comparing doing a build twice on identical hardware?

Yes. And it's done on aarch64 (chosen due to the fact we have machines with very high number of cores there which means it's feasible to run the test without ccache without them taking up too much machine time!)

> nodejs currently does not build some deps (icu, openssl, brotli, libev, nghttp2, c-ares) deterministically.

That's useful to know - thank you. Although it's interesting that it's only showing up when building on different machines. Do you know if it's definitely a problem for all of those that you mentioned?
