Reproducible builds #16044
Comments
The index could be reproducible if the artifacts were associated using the upcoming OCI subject/referrers API rather than injecting it directly in the index.
The push from groups like Scorecard is to embed the digest directly in the Dockerfile and maintain the pin with a tool like renovate or dependabot. This gives reproducibility with controlled updates.
I'd love to see work done on a user tool that can take any built image, or git commit, and verify the output. We can assist this work by annotating the images with details like the git commit used to build them and the |
Yes, but I guess adoption of the spec v1.1 in DOI is likely to take years, as users' site-local mirrors are mostly not ready for v1.1 yet.
Yes, but I wonder whether maintainers would want to see tons of bump-up PRs every day. |
OCI 1.0 conformant mirrors already support it by use of a fallback tag. It's similar to how sigstore implements their attestations today.
There's a constant bump-up already. One is invisible and hurts reproducibility, while the other is documented with a Git commit. |
The first PR: |
I don't have the bandwidth to respond to all of this right now, so I'll focus on the bit that concerns me the most: the service provided at At a high level, I think the best place to start here (especially as it would be the least disruptive) is making useful/interesting layers reproducible. For example, docker-library/golang@46f40bd is an old PoC that @yosifkit worked on which provides that for the Go-providing layer of the Go images that I've been planning to revisit in the near future. |
This proposal does NOT enable Could you take a look at the PR again ?
In the case of |
BTW Would it be acceptable to add Ubuntu variants to DOI? |
@tianon Could you take a look? 🙏 |
I'm only an interested bystander but I've long been looking for reproducible images, so that, using them as base images, I can create reproducible images for my projects, too. @AkihiroSuda Would your proposal also include publishing those images on the daily, tagged by date (i.e. corresponding to the snapshot package sources), like @tianon currently does for Debian on a ~monthly basis? This would be incredibly useful downstream as one could then simply bump the date tag of the base image on the daily to upgrade dependencies & protect against vulnerabilities while still preserving reproducibility. |
This is an orthogonal topic. |
FWIW, I've gone through the not-yet-fully-smooth process of making the kas build containers for Yocto and Isar reproducible (https://github.com/siemens/kas/commits/next). Our containers are based on Debian, and it would be great to see our dependencies reproducible as well. BTW, I had some fun understanding differences caused by cached layers whose timestamps are apparently not rewritten. Is that a known issue of BuildKit? For now I'm refraining from caching layers persistently on GH, and (unfortunately) from using the cache at all when validating locally. |
Could you open an issue in https://github.com/moby/buildkit/issues ? |
Done: moby/buildkit#4748 |
I just came across http://snapshot-cloudflare.debian.org . Given that @AkihiroSuda's repro-sources-list.sh uses it, I take it that mirror can be used more safely for the purposes of image (re-)builds? |
Yes, I do not think that causes much (if any) less load on the upstream infrastructure given that I do not believe it caches the heavy queries, but cannot be certain as it does not seem to be officially documented or even discussed anywhere public. There have been recent efforts within the project to reinvigorate support for the snapshot service (see https://lists.debian.org/debian-snapshot/2024/03/msg00000.html for the most recent posted notes) but they are going to take time. |
This seems still slow. My current suggestion is to just build the upstream DOI images without pinning, and let downstream reproducers hook the Dockerfile to use |
Here is the first batch of the PRs:
Is there anything left I have to do to get these PRs merged? I saw a comment about |
I'm sorry, this is still on my TODO list, but it is admittedly not a very high priority at the current time. |
@tianon I appreciate your hard work, and I know you have been very busy, but could you allocate one minute to help understanding your comment about I understand that reviewing and merging PRs may take a longer time, but I want to make sure that we are on the same direction. |
Is there anything I can do to keep this actionable? 🙏 |
Updated the PRs to take SOURCE_DATE_EPOCH="$(find /usr/src/bash -type f -exec stat -c '%Y' {} + | sort -nr | head -n1)"
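For readers skimming the thread: the one-liner above takes the newest modification time among the source files as the epoch. A minimal sketch of the same idea as a reusable function, assuming GNU find and stat (the function name `newest_mtime` is hypothetical and not part of the PRs):

```shell
# newest_mtime DIR: print the newest modification time (seconds since the
# Unix epoch) among the regular files under DIR.
# Assumes GNU stat, which supports -c '%Y'.
newest_mtime() {
    find "$1" -type f -exec stat -c '%Y' {} + | sort -nr | head -n1
}

# Usage (the path is only an illustration):
#   SOURCE_DATE_EPOCH="$(newest_mtime /usr/src/bash)"
#   export SOURCE_DATE_EPOCH
```

Deriving the value from the sources, rather than from the wall clock, is what lets two builds at different times agree on the same epoch.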
Let me know if I'm still missing something to get these PRs merged 🙏 |
Also submitted a merge request to add a guide to https://reproducible-builds.org/ : |
Unfortunately all debian images since |
Looks like we should just use https://snapshot.debian.org/ now |
The ML thread suggests to use Anyways, all images since |
dpkgs can be also cached to an OCI registry or whatever (without altering Dockerfile), if this PR can be merged: Still I don't know what is the blocker to get this one (and other PRs) merged, though. |
As far as I have seen, the folks maintaining snapshot do not officially maintain (nor recommend for long-term usage) any hostname/URL other than the canonical snapshot.debian.org, which as of a few weeks ago (see https://lists.debian.org/debian-snapshot/2024/07/msg00000.html for example) includes the new snapshot-mlm-01.debian.org server in the official rotation:
$ dig snapshot-mlm-01.debian.org +short
185.213.153.170
$ dig snapshot.debian.org +short
185.17.185.185
185.213.153.170
As I noted in #16044 (comment) and subsequently confirmed via IRC with the folks currently maintaining the snapshot service, I don't know what that snapshot-cloudflare URL is/was, but my best guess is a partially implemented PoC that was never completed (and definitely never officially advertised/recommended anywhere I'm aware of or can find, even searching old mailing list archives). |
Right, only the canonical Anyway, this only matters for third-parties who want to verify reproducibility of upstream images ( with a dockerfile hook proposed in moby/buildkit#4669 ). The PRs for the upstream DOI builds do not need to use
|
Opened a PR to set |
What's the current blocker for the PRs (linked in the OP)? |
Looks like snapshot.debian.org recently(?) introduced relatively strict rate limiting. Or at least our pipelines suddenly started failing this morning and our organization is fairly small (a few dozen Docker builds per day and most builds are cached, anyway). |
Yes, this server is quite slow and unstable. |
Well... If your products should use a stable baseline as well, you also have to use it there. I'm currently working together with the people behind snapshot.d.o to improve the situation. For details, see: |
Dockerfile PRs:
No PR is needed for the following repos:
meta-script PR:
Enabling reproducible builds for Docker Official Images will improve traceability of the image origins and will help assessment of supply chain security of the images.
I also talked about this at DockerCon 2023: https://medium.com/nttlabs/dockercon-2023-reproducible-builds-with-buildkit-for-software-supply-chain-security-0e5aedd1aaa7
Scope of reproduction
The digests of the image manifest blobs (and config blobs and layer blobs) should be reproducible.
https://github.com/reproducible-containers/diffoci can be used for testing reproducibility.
SLSA provenance manifest blobs (enabled by default in recent buildx) are not reproducible by design, so the image index digest will not be reproducible.
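To make the scope concrete: a blob digest is just the SHA-256 of the blob's bytes, so two builds reproduce each other only if every blob is byte-for-byte identical. A minimal sketch, assuming the blobs have already been exported to local files (the function names `blob_digest` and `same_digest` are hypothetical; diffoci, mentioned above, is the tool to use for real image comparison):

```shell
# blob_digest FILE: print the OCI-style digest ("sha256:<hex>") of a blob.
blob_digest() {
    printf 'sha256:%s\n' "$(sha256sum "$1" | cut -d' ' -f1)"
}

# same_digest A B: succeed only if both blobs have the same SHA-256 digest,
# i.e. the two builds produced identical bytes for this blob.
same_digest() {
    [ "$(blob_digest "$1")" = "$(blob_digest "$2")" ]
}
```

A single differing byte in a layer (for example a timestamp) changes the layer digest, which changes the config and manifest digests in turn.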
Reproducing base images
Base images have to be pinned by the sha256 digest for reproduction.
A digest can be embedded in a FROM instruction of a Dockerfile.
However, image maintainers might not want to update Dockerfiles frequently just to ensure that the latest base image is picked up during the upstream build.
In that case, we can leave FROM instructions unpinned and let reproducers use the CONVERT action of source policies to dynamically replace the image identifier:
https://github.com/moby/buildkit/blob/v0.13.0-beta1/docs/build-repro.md
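A minimal sketch of what a reproducer could do by hand today, assuming the source policy JSON layout described in the linked document as I read it. The function name `gen_convert_policy` is hypothetical, and the `EXPERIMENTAL_BUILDKIT_SOURCE_POLICY` variable in the usage comment should be checked against the buildx documentation before relying on it:

```shell
# gen_convert_policy IMAGE DIGEST: print a BuildKit source policy (JSON) with
# a single CONVERT rule that pins IMAGE to DIGEST.
#   IMAGE  is a fully qualified reference, e.g. docker.io/library/debian:bookworm
#   DIGEST is the manifest digest, e.g. sha256:<hex>
gen_convert_policy() {
    cat <<EOF
{
  "rules": [
    {
      "action": "CONVERT",
      "selector": {
        "identifier": "docker-image://$1"
      },
      "updates": {
        "identifier": "docker-image://$1@$2"
      }
    }
  ]
}
EOF
}

# Usage (assumption: buildx reads the policy file named by this variable):
#   gen_convert_policy docker.io/library/debian:bookworm "sha256:<digest>" > policy.json
#   EXPERIMENTAL_BUILDKIT_SOURCE_POLICY=policy.json docker buildx build .
```

The Dockerfile itself stays untouched; only the reproducer's build pins the base image.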
The digest to be used for the CONVERT action is recorded in the SLSA provenance.
We need to update the buildx CLI to automatically generate CONVERT actions from SLSA provenances.
(See the Buildx CLI section below)
Reproducing package versions
Debian and Ubuntu
Debian and Ubuntu keep old packages on http://snapshot.debian.org and http://snapshot.ubuntu.com.
I wrote a script to rewrite /etc/apt/sources.list to use those snapshot servers:
https://github.com/reproducible-containers/repro-sources-list.sh/blob/master/repro-sources-list.sh
The snapshot timestamp can be supplied via $SOURCE_DATE_EPOCH.
These snapshot servers are quite slow and sometimes flaky (especially the Debian one), so they probably shouldn't be used for the upstream builds.
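A simplified sketch of the rewriting idea for Debian, not the actual repro-sources-list.sh: it turns an epoch into a snapshot timestamp and prints matching sources.list lines. It assumes GNU date; the function names, the suite argument, and the choice of the "main" component are illustrative assumptions:

```shell
# snapshot_stamp EPOCH: format seconds-since-epoch as the UTC timestamp used
# in snapshot archive URLs (YYYYMMDDTHHMMSSZ). Assumes GNU date (-d @EPOCH).
snapshot_stamp() {
    date -u -d "@$1" +%Y%m%dT%H%M%SZ
}

# debian_snapshot_sources EPOCH SUITE: print sources.list lines that point apt
# at the snapshot.debian.org archive as of EPOCH. check-valid-until=no is
# needed because the Release files of old snapshots have expired.
debian_snapshot_sources() {
    stamp="$(snapshot_stamp "$1")"
    printf 'deb [check-valid-until=no] http://snapshot.debian.org/archive/debian/%s %s main\n' "$stamp" "$2"
    printf 'deb [check-valid-until=no] http://snapshot.debian.org/archive/debian-security/%s %s-security main\n' "$stamp" "$2"
}

# Usage (illustrative):
#   debian_snapshot_sources "$SOURCE_DATE_EPOCH" bookworm > /etc/apt/sources.list
```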
See moby/buildkit#4669 for how to rewrite /etc/apt/sources.list in downstream builds.
Alpine
Reproducing apk packages is still challenging, as Alpine does not have snapshot servers.
The long-term plan is to capture apk packages on building and attach them to the image as artifacts:
Reproducing file timestamps
BuildKit v0.13 supports rewriting the timestamps of the files inside image layers to use $SOURCE_DATE_EPOCH.
https://github.com/moby/buildkit/blob/v0.13.0-beta1/docs/build-repro.md#source_date_epoch
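To illustrate why file timestamps matter for layer digests, here is a local analogy using a plain tar archive rather than BuildKit itself. It is a minimal sketch assuming GNU tar; the function name `repro_tar` is hypothetical:

```shell
# repro_tar DIR OUT: archive DIR into OUT with entries sorted by name,
# ownership forced to numeric 0:0, and every mtime set to $SOURCE_DATE_EPOCH.
# Assumes GNU tar (--sort and --mtime are GNU extensions).
repro_tar() {
    tar --sort=name --mtime="@${SOURCE_DATE_EPOCH}" \
        --owner=0 --group=0 --numeric-owner \
        -C "$1" -cf "$2" .
}

# Usage (illustrative):
#   SOURCE_DATE_EPOCH=1700000000 repro_tar ./rootfs layer.tar
#   sha256sum layer.tar
```

With the mtimes normalized, touching a file between two runs no longer changes the archive digest; without it, the digests differ even though the contents are the same.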
Removal of logs, etc.
/var/log/alternatives.log, /var/log/apt/history.log, etc. will have to be removed due to non-deterministic timestamps.
/var/cache/ldconfig/aux-cache has to be removed too:
https://linux.debian.kernel.narkive.com/7wfNAf7A/bug-845034-initramfs-tools-please-ensure-initrd-images-are-reproducible#post3
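A minimal sketch of such a cleanup step, covering only the three paths named above (the "etc." means the real list is longer). The function name `remove_nondeterministic_files` and the optional root argument are assumptions added so the step can be tried outside an image build:

```shell
# remove_nondeterministic_files [ROOT]: delete the files listed in this
# proposal as non-reproducible. ROOT defaults to "/" (the image root).
remove_nondeterministic_files() {
    root="${1:-/}"
    # ${root%/} strips a trailing slash so that "/" does not yield "//var/...".
    rm -f "${root%/}/var/log/alternatives.log" \
          "${root%/}/var/log/apt/history.log" \
          "${root%/}/var/cache/ldconfig/aux-cache"
}

# Usage (illustrative): call it at the end of the same RUN step that installs
# packages, so the files never land in a committed layer.
```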
Reproducing file contents
Some Dockerfiles will need extra work for reproducing file contents.
e.g., sorting arrays, removing randomized mktemp, ...
e.g., in case of gcc:
Buildx CLI
Buildx CLI should be updated to allow attesting reproducibility with a few commands.
Notably, buildx build should have a flag like --repro from=gcc@sha256@... to import build args and base image digests from an SLSA provenance:
CI
We will also need to have a CI to periodically attest reproducibility with the proposed CLI above.