Pin git repos to specific revisions #499
git commits are still the latest.
@strugee:
Fixes upstreams potentially being able to break our builds, and also makes builds more deterministic, which helps with reproducible builds. Fixes linuxboot#498
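A minimal sketch of what pinning means in practice, in Python for illustration only (Heads' build is Makefile-driven, and the pin table, module name, URL, and commit below are hypothetical placeholders). It clones a module, detaches HEAD at a full 40-character commit id, and checks that `git rev-parse HEAD` agrees before anything is built. Branch names and tags are rejected because they can move upstream, which is exactly the breakage this PR is about.

```python
import re
import subprocess

# Hypothetical pin table: module name -> (repository URL, full commit id).
# These values are placeholders, not entries from the Heads tree.
PINS = {
    "example-module": (
        "https://example.com/example-module.git",
        "0123456789abcdef0123456789abcdef01234567",
    ),
}

_FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def is_full_sha(rev: str) -> bool:
    """True only for a full 40-character lowercase hex SHA-1 commit id."""
    return _FULL_SHA1.fullmatch(rev) is not None


def checkout_pinned(name: str, dest: str) -> None:
    """Clone a module, detach HEAD at its pinned commit, then verify it."""
    url, commit = PINS[name]
    if not is_full_sha(commit):
        # Branch names and tags can move upstream, so they are not real pins.
        raise ValueError(f"{name}: pin must be a full commit id, got {commit!r}")
    subprocess.run(["git", "clone", url, dest], check=True)
    subprocess.run(["git", "-C", dest, "checkout", "--detach", commit], check=True)
    head = subprocess.run(
        ["git", "-C", dest, "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    if head != commit:
        raise RuntimeError(f"{name}: expected {commit}, got {head}")
```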
Force-pushed from 397dc6f to 7cbe743.
@tlaurion Weird, I could have sworn this worked when I filed the PR... In any case, it should be fixed now: I just force-pushed an amended commit and I'm running a test build on my laptop right now. Although, now that I look at this PR again, I wonder if it's better to use submodules?
Test build successfully cloned the
@strugee: I'm wondering if pinning is the right direction for submodules in general. For example, Nitrokey/nitrokey-hotp-verification#5 will be merged soon, and it will require Heads maintainers to validate that the commit ids are the latest. A better approach, to me, would be to validate that modules are built on top of git pulls, so that each git-dependent module is at its latest version. I would prefer this to the currently proposed approach, so that multiple CI builds can confirm the reproducibility of a Heads commit id for a specific produced ROM. For example, coreboot will probably depend on a fixed commit id once measured boot is merged for all boards Heads supports, until a new coreboot release includes those changes. What are your thoughts about it?
@strugee Nitrokey HOTP updated. I still think this is not the optimal strategy.
Hey, I'm sorry for the delayed response! It's midterms season for me at college so I've been kind of swamped. In any case:
I'm having trouble parsing this sentence; can you rephrase? The basic problem I see with what's in tree now is that there isn't a 1:1 correspondence between a Heads commit id and the resulting build artifact. Clearly, that is problematic for reproducible builds, because "built from xyz commit" is not enough to start confirming that a given binary was legitimately built from the original source code. There are other ways to work around that (e.g. we could embed the commit ids of the submodules in the final result, so you could extract them and use that information to reconstruct the source tree), but reproducible builds aren't the only problem. Having such a relation between commit id and binary has lots of other advantages I can think of, too. E.g.:
Note too that a lot of these problems aren't problems immediately, but manifest over time as upstreams add new commits. I wonder if it would help if I added some automation to this PR? I can write a script that automatically checks that the vendored commit ids aren't out of date. We could even set up something where that script is run, the build is smoke-tested (since AFAICT there isn't anything beyond smoke testing in-tree at this juncture?), and the results are committed automatically. It's possible I'm completely missing something, though, so if I am, let me know.
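The out-of-date check described above could start from something like this Python sketch. It assumes a hypothetical pin table mapping module names to commit ids (the function names are also hypothetical) and uses `git ls-remote <url> HEAD`, which prints `<sha><TAB><ref>` lines, to learn each upstream's current HEAD without cloning. A module is reported when its pin differs from upstream or when upstream's HEAD could not be determined.

```python
from __future__ import annotations

import subprocess


def parse_ls_remote(output: str) -> dict[str, str]:
    """Map ref name -> commit id from `git ls-remote` output (<sha><TAB><ref> per line)."""
    refs: dict[str, str] = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, ref = line.split("\t", 1)
        refs[ref.strip()] = sha.strip()
    return refs


def remote_head(url: str) -> str:
    """Ask the remote which commit its HEAD currently points at (needs network)."""
    out = subprocess.run(
        ["git", "ls-remote", url, "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_ls_remote(out)["HEAD"]


def stale_pins(pins: dict[str, str], upstream: dict[str, str]) -> list[str]:
    """Sorted names of modules whose pinned commit differs from upstream HEAD.

    A module missing from `upstream` is reported too, since it could not be checked.
    """
    return sorted(name for name, commit in pins.items() if upstream.get(name) != commit)
```

In CI, `upstream` would be filled by calling `remote_head()` for each module URL; a non-empty result means a pin bump is available, which would then go through the build and smoke test before being committed.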
That all being said, if we do end up going with a pinning approach, I'm wondering if we should just nuke a lot of this custom code and use git submodules directly instead.
@strugee: I'll try to be clearer. Right now, if a clone is done for a specific module, it isn't verified to be the latest, nor is it force-pulled again on a subsequent build; it just lies there forever. I think the problem lies there more than anywhere else: it lets builds differ from one machine to another, depending on which commit id was pulled into the modules being built. Making sure that pulls are done for each git-dependent module on each make would, IMHO, resolve the reproducibility issues for a specific Heads commit id. Am I clearer? I agree that some tracing of those commits could be added (maybe in ./build/$BOARD/hashes.txt?), but right now a make real.clean clears up any differences observed between what should be reproducible builds and what is actually built.
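One way to add the tracing suggested above is to record each module's checked-out commit next to the build output, one `<commit>  <module>` line per module, similar in layout to sha256sum output. This Python sketch is hypothetical: the module directories and the output path are placeholders, and the lines could just as well be appended to hashes.txt.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def format_commit_manifest(commits: dict[str, str]) -> str:
    """One "<commit>  <module>" line per module, sorted by name for stable output."""
    return "".join(f"{commit}  {name}\n" for name, commit in sorted(commits.items()))


def record_module_commits(module_dirs: dict[str, str], out_path: str) -> None:
    """Write the HEAD commit of each module's git checkout to out_path."""
    commits = {}
    for name, directory in module_dirs.items():
        commits[name] = subprocess.run(
            ["git", "-C", directory, "rev-parse", "HEAD"],
            check=True, capture_output=True, text=True,
        ).stdout.strip()
    Path(out_path).write_text(format_commit_manifest(commits))
```

Sorting keeps the file byte-identical across machines for the same set of commits, so two builders can compare manifests with a plain diff.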
@strugee: To create the .canary if it doesn't exist, we clone the repo once and then apply the patches, also only once. I do not know how to implement this, but a better way would be: if the destination directory exists, check whether it's on the latest commit and, if not, pull the changes. Thoughts?
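The check proposed above (clone if the directory is missing; otherwise compare the local HEAD with upstream and pull if they differ) might look roughly like this Python sketch. The function names are hypothetical; it relies only on `git rev-parse`, `git ls-remote`, `git clone`, and `git pull --ff-only`.

```python
import os
import subprocess
from typing import Optional


def needed_action(dest_exists: bool, local_head: Optional[str], remote_head: str) -> str:
    """Decide what to do with a module checkout: "clone", "pull", or "up-to-date"."""
    if not dest_exists:
        return "clone"
    if local_head == remote_head:
        return "up-to-date"
    return "pull"


def _git(*args: str) -> str:
    """Run a git command and return its stripped stdout, raising on failure."""
    return subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    ).stdout.strip()


def sync_module(url: str, dest: str) -> str:
    """Clone or fast-forward a module so it matches upstream HEAD; return the action taken."""
    exists = os.path.isdir(os.path.join(dest, ".git"))
    local = _git("-C", dest, "rev-parse", "HEAD") if exists else None
    # `git ls-remote <url> HEAD` prints "<sha><TAB>HEAD"; keep only the sha.
    remote = _git("ls-remote", url, "HEAD").split("\t", 1)[0]
    action = needed_action(exists, local, remote)
    if action == "clone":
        _git("clone", url, dest)
    elif action == "pull":
        _git("-C", dest, "pull", "--ff-only")
    return action
```

Since the patches are applied once right after the initial clone, pulling into an already-patched tree may fail or conflict; a variant could re-clone and re-patch whenever the action is `pull`. This approach tracks upstream rather than pinning it, so the same Heads commit can still yield different builds at different times.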
@strugee You're absolutely right that "there should be a 1:1 correspondence between a Heads commit id and the resulting build artifact". Anything that breaks that is a bug that needs to be fixed. I'm not a fan of git submodules, especially for things that have massive dependencies like the Linux kernel. This wouldn't be too bad if you could clone a specific hash, but shallow checkouts can't get anything other than the latest commit: https://twitter.com/qrs/status/956648929394905089
For things that are hosted on GitHub or kernel.org, we can use the technique in #618 to download a specific version as a tar file, then both verify its sha256 and pin to a specific tag. This seems like a lower-overhead way to handle it. Essentially we bypass git entirely, unless you really want to work with a current head (which should be doable for devs, but not allowed in committed code).
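A rough Python sketch of the tarball approach, not the implementation from #618: fetch a tagged source archive, hash it with SHA-256, and refuse to use it unless the hash matches a value committed in the tree. The function names are hypothetical; the URL helper builds GitHub's `/archive/<tag>.tar.gz` download form, and the expected hash would have to be recorded once from a trusted download.

```python
import hashlib
import os
import urllib.request


def github_archive_url(owner: str, repo: str, tag: str) -> str:
    """GitHub's per-tag source tarball URL (the /archive/<ref>.tar.gz form)."""
    return f"https://github.com/{owner}/{repo}/archive/{tag}.tar.gz"


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large tarballs stay out of memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected: str) -> None:
    """Delete the file and raise ValueError if its hash does not match the pin."""
    actual = sha256_file(path)
    if actual != expected.lower():
        os.remove(path)
        raise ValueError(f"{path}: sha256 {actual} != expected {expected}")


def fetch_pinned_tarball(url: str, dest: str, expected_sha256: str) -> None:
    """Download url to dest and keep it only if the hash matches (needs network)."""
    urllib.request.urlretrieve(url, dest)
    verify_sha256(dest, expected_sha256)
```

Checking the hash, not just the tag name, means a tag that is re-pointed upstream produces a hard failure instead of a silently different build.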
Wonderful, that's great to hear. My apologies for being unresponsive!