Reproducable builds #589

Closed
rvagg opened this issue Jan 6, 2017 · 9 comments

Comments

@rvagg
Member

rvagg commented Jan 6, 2017

This is a task for someone to pick up who has the time and interest in this area. It's likely not going to be a small job even to get to a proposal of what we need to do.

  1. Research what is possible and required for us to produce reproducible builds
  2. Draft a proposal for how we need to change our build architecture and published artifacts to share reproducible build data with the public so anyone can compile binaries that are byte-for-byte identical to what we produce.

Some starting points: https://reproducible-builds.org/ and the work Debian has been doing to make their published artifacts reproducible: https://wiki.debian.org/ReproducibleBuilds

@gibfahn
Member

gibfahn commented Feb 1, 2017

This node-gyp issue seems related: nodejs/node-gyp#1100

@Trott
Member

Trott commented May 23, 2018

Still something we'd like to do, I imagine? Should we put this on the agenda for a meeting or something to try to get some traction? Maybe this is an opportunity for someone to be a mentor to an eager individual via the mentorship program that's just getting started?

@lrvick

lrvick commented Mar 22, 2019

My org is very interested in this. I have done reproducible build work for other projects, and even a list of known issues would go a long way to help me know what path to take.

@refack
Contributor

refack commented Mar 22, 2019

Hello @lrvick, would you be willing to help us and break down the items required for us to achieve this? That would be a great help, and very much appreciated.

@ChALkeR
Member

ChALkeR commented Aug 13, 2019

I took a look at this.

Introducing baseline reproducible builds would be meaningful not only for establishing trust, but also for other scenarios, e.g.:

  1. Once builds in a certain environment are reproducible, we could detect when and how switching/updating the environment affects build result.
  2. It could be useful for testing -- tooling changes often shouldn't affect the build result, and if they do, something went wrong.

I checked what happens with two subsequent builds now on mac in the same environment, and things are not that bad:

  1. File modification timestamps go into *.pyc files and affect the build result. That could be negated by targeting reproducibility for builds from source tarballs for now (which preserve timestamps), and postponing git checkouts.
  2. The environment should be controlled -- compiler versions etc. affect the build result. This is easiest to control on Linux, but it is not a direct blocker here.
  3. Build path affects generated makefiles. Could be negated by (manually) fixing a specific build path, e.g. /tmp/nodejs-build for Linux.
  4. Two generated *.cc files differ each time: node_code_cache.cc and node_snapshot.cc. Those are not reproducible (and are not directly human-readable); we should target them first, I believe. The latter one could be disabled by a config option.
  5. All generated binary files are different, while all object files match except for the two built from the files above. Something is happening at the linker stage. I have not yet confirmed this on Linux.
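
To illustrate item 1, here is a minimal sketch of how a source file's modification time ends up in a *.pyc file. It assumes Python 3.7 or later, which may not match the interpreter the build actually used; `pyc_header` and `demo` are hypothetical helper names. The sketch compiles the same source twice with different mtimes and compares the 16-byte *.pyc header in timestamp mode and in hash-based mode.

```python
import os
import py_compile
import tempfile


def pyc_header(source_path, mtime, mode):
    """Force the source mtime, compile it, and return the 16-byte .pyc header."""
    os.utime(source_path, (mtime, mtime))
    with tempfile.TemporaryDirectory() as out_dir:
        cfile = os.path.join(out_dir, "out.pyc")
        py_compile.compile(source_path, cfile=cfile, doraise=True,
                           invalidation_mode=mode)
        with open(cfile, "rb") as f:
            return f.read(16)


def demo():
    """Return (timestamp_headers_equal, hash_headers_equal) for two mtimes."""
    with tempfile.TemporaryDirectory() as src_dir:
        src = os.path.join(src_dir, "example.py")
        with open(src, "w") as f:
            f.write("x = 1\n")

        timestamp = py_compile.PycInvalidationMode.TIMESTAMP
        hashed = py_compile.PycInvalidationMode.CHECKED_HASH

        # Timestamp mode stores the source mtime in the header.
        ts_equal = (pyc_header(src, 1_000_000_000, timestamp)
                    == pyc_header(src, 1_500_000_000, timestamp))
        # Hash mode stores a hash of the source bytes instead.
        hash_equal = (pyc_header(src, 1_000_000_000, hashed)
                      == pyc_header(src, 1_500_000_000, hashed))
        return ts_equal, hash_equal


if __name__ == "__main__":
    print(demo())
```

In timestamp mode the headers differ although the source is unchanged, which matches the observation above. Hash-based *.pyc files avoid that particular difference, though whether the build tooling could use them is a separate question.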

Imo, we could target the first step baseline for reproducible builds for now as: fixed environment, Linux-only (easiest to control environment on), from a fixed source tgz.

That baseline should not be very hard, I presume, and even that could be valuable on its own, and could result in producing reproducible builds for Linux releases once the environment is fixed.

That should probably be targeted at nodejs/node, not nodejs/build, though.

The full list of different files for two consecutive builds on the same mac setup, same dir, same tgz source:

Files node.r1/node and node.r2/node differ
Files node.r1/out/Release/bytecode_builtins_list_generator and node.r2/out/Release/bytecode_builtins_list_generator differ
Files node.r1/out/Release/cctest and node.r2/out/Release/cctest differ
Files node.r1/out/Release/gen-regexp-special-case and node.r2/out/Release/gen-regexp-special-case differ
Files node.r1/out/Release/genccode and node.r2/out/Release/genccode differ
Files node.r1/out/Release/genrb and node.r2/out/Release/genrb differ
Files node.r1/out/Release/iculslocs and node.r2/out/Release/iculslocs differ
Files node.r1/out/Release/icupkg and node.r2/out/Release/icupkg differ
Files node.r1/out/Release/mkcodecache and node.r2/out/Release/mkcodecache differ
Files node.r1/out/Release/mksnapshot and node.r2/out/Release/mksnapshot differ
Files node.r1/out/Release/node and node.r2/out/Release/node differ
Files node.r1/out/Release/node_mksnapshot and node.r2/out/Release/node_mksnapshot differ
Files node.r1/out/Release/obj/gen/node_code_cache.cc and node.r2/out/Release/obj/gen/node_code_cache.cc differ
Files node.r1/out/Release/obj/gen/node_snapshot.cc and node.r2/out/Release/obj/gen/node_snapshot.cc differ
Files node.r1/out/Release/obj.target/node/gen/node_code_cache.o and node.r2/out/Release/obj.target/node/gen/node_code_cache.o differ
Files node.r1/out/Release/obj.target/node/gen/node_snapshot.o and node.r2/out/Release/obj.target/node/gen/node_snapshot.o differ
Files node.r1/out/Release/openssl-cli and node.r2/out/Release/openssl-cli differ
Files node.r1/out/Release/torque and node.r2/out/Release/torque differ
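
A listing like the one above can be produced in several ways. The following is a rough sketch of one of them, not necessarily the method used here: it hashes every regular file in two build trees and reports relative paths whose contents differ or that exist in only one tree. The function names are hypothetical.

```python
import hashlib
import os


def file_digest(path):
    """SHA-256 of a file, read in chunks so large binaries are handled."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root):
    """Map each regular file's path (relative to root) to its digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full):  # skips broken symlinks
                digests[os.path.relpath(full, root)] = file_digest(full)
    return digests


def differing_files(root_a, root_b):
    """Sorted relative paths that differ or exist in only one of the trees."""
    a, b = tree_digests(root_a), tree_digests(root_b)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

With the directory names from the listing, a call would look like `differing_files("node.r1", "node.r2")`. An empty result means the two trees are byte-for-byte identical.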

tldr: there seem to be two issues in the nodejs/node build process that block reproducible builds for now; we should fix those first, and fixing them should be valuable by itself.

/cc @wladmis

@ChALkeR
Member

ChALkeR commented Aug 13, 2019

Just tested on Linux (with same restrictions), results:

Files node-v12.8.0-1/node and node-v12.8.0-2/node differ
Files node-v12.8.0-1/out/Release/node and node-v12.8.0-2/out/Release/node differ
Files node-v12.8.0-1/out/Release/obj/gen/node_code_cache.cc and node-v12.8.0-2/out/Release/obj/gen/node_code_cache.cc differ
Files node-v12.8.0-1/out/Release/obj/gen/node_snapshot.cc and node-v12.8.0-2/out/Release/obj/gen/node_snapshot.cc differ
Files node-v12.8.0-1/out/Release/obj.target/node/gen/node_code_cache.o and node-v12.8.0-2/out/Release/obj.target/node/gen/node_code_cache.o differ
Files node-v12.8.0-1/out/Release/obj.target/node/gen/node_snapshot.o and node-v12.8.0-2/out/Release/obj.target/node/gen/node_snapshot.o differ

Looks like the linker issue could be macOS-specific and does not cause problems on my Linux setup, but we have an issue with unreproducible node_code_cache.cc and node_snapshot.cc generation. That is a blocker for reproducible builds afaik, and it should be fixed first.

@ChALkeR
Member

ChALkeR commented Aug 20, 2019

nodejs/node#29108 is now fixed (thanks, @bnoordhuis), and two consecutive builds from the same archive in the same dir on Linux now produce identical results (given the same environment).

The next step should be fixing the environment and the build path.

  • Removing the "unpack from archive" restriction produces differences only in *.pyc files (due to source file modification timestamps being present there), which are artifacts of the build process and are not distributed. That restriction could be safely ignored, I suppose.
  • Removing the "build from the same path" restriction produces differences in generated binaries (which is a no-go). That is something that could perhaps be fixed on Node.js core side, but I am not yet sure if that is a blocker here.
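
One way to start investigating the build path difference is to check which build outputs contain the build directory as a literal string. The sketch below is only a diagnostic under that assumption: `files_embedding_path` is a hypothetical helper, and it only finds plain, uncompressed occurrences of the path, so an empty result would not prove that the path has no effect on the output.

```python
import os


def files_embedding_path(root, build_path):
    """Return relative paths of files under root that contain build_path as raw bytes."""
    needle = os.fsencode(build_path)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.isfile(full):
                continue
            # Reads the whole file into memory; fine for a quick check,
            # but a chunked scan would be kinder to very large binaries.
            with open(full, "rb") as f:
                if needle in f.read():
                    hits.append(os.path.relpath(full, root))
    return sorted(hits)
```

For example, after building in /tmp/nodejs-build (the example path mentioned earlier in this thread), `files_embedding_path("/tmp/nodejs-build/out/Release", "/tmp/nodejs-build")` would list the outputs that carry the path verbatim.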

@sam-github
Contributor

@ChALkeR is this ongoing? Complete? Closable as stale?

@github-actions

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.
