Proposal: hooks for `RUN` instructions (use cases: reproducible builds, cross-compilation, malware detection, ...) #4576
Comments
@tonistiigi @thaJeztah SGTY?
Would these hooks run inside the container that's run as part of the
Inside.
Yes, somewhat similar. (I noticed I accidentally edited @thaJeztah's original comment 😅. Now reverted.)
My initial impression is that this is not where we want to take Dockerfiles. Dockerfiles should be self-contained and guaranteed to work. This makes the outcome of a build undefined and likely broken if the author repository and hook come from a different codebase. If the hook is useful, then why doesn't the Dockerfile author integrate it into their build process, or make it something that can be turned on by some config variable and tested?

Even if used for some hand-written script, it will make builds inefficient because the "hooks" leak everywhere. The foundation of a good Dockerfile is to define minimal dependencies for every point of your build. That gives speed, efficient caching and small images.

From that point, if the problem is that you wish to build a Dockerfile from some external source that you can't modify but add some modifications, I would even consider some option to load a patch file over it before it is processed. At least in that case the patch can choose exactly what modifications are needed. Atm it is limited to what can be modified but once you add a modification it applies everywhere.

I'd like to understand more what the actual problem with reproducible builds is and how this is solving it. We want to make it easy for image authors to make their builds reproducible, but I'm not sure how this is helping (unless we ask image author to now call

If the fundamental problem is to improve code reuse and reduce the need for copy-paste to write good Dockerfiles, then I'm very interested in solving that problem.
I think it is the opposite. xx shows the limitation of this approach. You can't just mask over
I don't get the reproducibility part, but history should give a representation of how the image was built. If it does not, then it is incorrect. Latest versions also create provenance attestations, which supersede history: they are always precise and can't be modified by the frontend.
Because the upstream Dockerfile authors do not want to accept PRs that complicate
The purpose of the hook is to keep those upstream Dockerfiles unmodified, and to eliminate the need to negotiate the merging of such PRs.
The upstream image authors do not need to use hooks. Only downstream reproducers need to use hooks. No action is needed on the upstream image authors side, except for accepting small PRs such as
The problem is that modifying the history object breaks the OCI image config digest, and hence the manifest digest.
If my hook proposal isn't going to be accepted, I guess I'll try to reimplement this as a custom OCI runtime and maybe a snapshotter plugin, but that would ruin the user experience for buildx, as it would require specifying a custom
Sorry, confusing wording. I meant you "can't modify the repo where the Dockerfile is located but still want to make some changes to the Dockerfile".
Just regular
I don't get the difference between downstream fork and downstream hook. Who is the downstream here, eg. for your official-images patches, is it some
If one build produces reproducible timestamps and another does not, then they will be different, and it is correct that they are different if they were built differently. If some build-time artifacts should not end up in the final image, then multi-stage build patterns should be used for that, not mounting a secret.
I don't get the "custom OCI runtime and maybe a snapshotter plugin" part. If you want to create your own variants of official images, then the options are to fork, apply patches, or create an external buildkit frontend.
It is quite hard to maintain
I just expect the Docker Official Image upstream (such as
The buildx CLI may have a new flag like It is true that
I expect the Docker Official Image infra to eventually adopt
I don't expect that mounting secrets is necessary for repro builds.
No, I don't want to create my own variants of DOI.
I agree. But that would happen with this PR as hooks would be used to work around the upstream and not called by official images pipelines.
That looks way too opinionated logic for a flag. Builds should be configured by build-args/contexts, a generic repro strategy that could work is to use provenance attestation of a previously built image and provide reproduction guarantees from that data.
Looking at some of the feedback on your PRs, it looks like one of the issues is that maintainers suggest that for many DOI images the reproduction timestamp that makes sense for the image should be one defined by the source artifact (e.g. git commit of the upstream source, or file timestamp in targz, for example https://github.com/docker-library/golang/pull/505/files#diff-12a996ea1ea6ff196d20e1af5aaa3cc1deed6c9f547979cb19dba4bf7325a15cR76 ) rather than one provided manually with epoch
Just to throw out some ideas, one hacky approach for that could be:
Can we think of a better approach that covers this? Another approach could be to have some known stage name that can be used to collect metadata about the build (could be used for dynamic labels/env etc. in addition to setting epoch).
Right. The hook does not need to be called on the upstream DOI side. To sum up, what has to be done on the upstream DOI side is:
The snapshot server data cannot be retrieved from the provenance, as the upstream DOI will continue to use the upstream non-snapshot debian.org due to the performance issue and the flakiness of So, downstream reproducers will have to explicitly opt-in to
Will try to update the PRs next week to add fine-grained
I assume we are in general reluctant to add new instructions
I think the intention would be that
If there is a good use case then it is not out of the question. We of course need to make sure that the solution is flexible and future-proof before we commit to backwards compatibility.
DOI has already been setting
Right, but I guess this new instruction can be proposed separately. Is there any other remaining blocker toward merging the hook PR?
Defining an
We do not explicitly set
Regarding enabling
@tianon Thank you for replying.
Updated the PRs to take SOURCE_DATE_EPOCH="$(find /usr/src/bash -type f -exec stat -c '%Y' {} + | sort -nr | head -n1)"
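The shell one-liner above takes the newest modification time among the source files as `SOURCE_DATE_EPOCH`. Where GNU `find` and `stat -c` are not available, the same selection could be sketched in Python. The function name `newest_mtime` is illustrative only and is not part of any of the PRs.

```python
import os
import stat


def newest_mtime(root: str) -> int:
    """Return the newest mtime (whole seconds) of any regular file under root."""
    newest = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            info = os.lstat(os.path.join(dirpath, name))
            # Mirror `find -type f`: count regular files only, not symlinks.
            if stat.S_ISREG(info.st_mode):
                newest = max(newest, int(info.st_mtime))
    return newest
```

A build script could then export `str(newest_mtime("/usr/src/bash"))` as `SOURCE_DATE_EPOCH`. Unlike the shell version, this sketch returns `0` rather than an empty string when the tree contains no regular files.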
The current implementation of buildkit/exporter/containerimage/export.go Line 290 in 715276d
I think we can enable it by default when we are confident that the implementation is stable enough.
The
This issue was already fixed in:
Updated the PRs above to completely remove
Aside from reproducible builds, the hooks should also be useful for injecting MITM proxy certs that are needed on enterprise networks where all HTTPS traffic has to be decrypted and monitored. I'm hoping we can see some progress toward merging the PR:
Is there any remaining concern?
See docker-library/official-images issue 16044

- For Debian, `/var/log/*` is removed as they contain timestamps
- For Debian, `/var/cache/ldconfig/aux-cache` is removed as they contain inode numbers, etc.
- For Alpine, virtual package versions are pinned to "0" to eliminate the timestamp-based version numbers that appear in `/etc/apk/world` and `/lib/apk/db/installed`

> [!NOTE]
> The following topics are NOT covered by this commit:
>
> - To reproduce file timestamps in layers, BuildKit has to be executed with
>   `--output type=<TYPE>,rewrite-timestamp=true`.
>   Needs BuildKit v0.13 or later.
>
> - To reproduce the base image by the hash, reproducers may:
>   - modify the `FROM` instruction in Dockerfile manually
>   - or, use the `CONVERT` action of source policies to replace the base image.
>     <https://github.com/moby/buildkit/blob/v0.13.2/docs/build-repro.md>
>
> - To reproduce packages, see the `RUN` instruction hook proposed in
>   moby/buildkit#4576

Signed-off-by: Akihiro Suda <[email protected]>
I'm withdrawing this proposal and am going to implement a standalone translator that consumes a Dockerfile and generates a new Dockerfile:

    program-name-to-be-decided translate --hook=hook.json < Dockerfile > Dockerfile.new

The drawback of the new approach is that it can't reproduce the OCI Image Config digest as it can't retain the
I'm closing #4669 but I still want the
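To make the translator idea concrete, here is a minimal Python sketch of the rewrite step, following the `RUN foo` example from the proposal. It is not the actual tool. The names `wrap_run` and `translate` are hypothetical, the hook JSON schema is not modeled (the hook image is passed in directly), and the secret mount from the proposal's example is omitted. The sketch treats each physical line independently, so it only handles single-line shell-form `RUN` instructions. Exec-form `RUN`, heredocs, and continuation lines would need a real Dockerfile parser.

```python
import re

HOOK_DIR = "/dev/.dfhook"  # mount point used in the proposal's example


def wrap_run(line: str, hook_image: str) -> str:
    """Rewrite one shell-form RUN line so it executes through the hook entrypoint."""
    match = re.match(r"^(\s*)RUN\s+(.*)$", line, flags=re.IGNORECASE)
    if match is None:
        return line  # not a RUN instruction: leave untouched
    indent, rest = match.groups()
    # Keep any flags the author already wrote (e.g. --network=none).
    tokens = rest.split(" ")
    flags = []
    while tokens and tokens[0].startswith("--"):
        flags.append(tokens.pop(0))
    command = " ".join(tokens).lstrip()
    if command.startswith("["):
        return line  # exec-form RUN is out of scope for this sketch
    mount = f"--mount=from={hook_image},target={HOOK_DIR}"
    parts = ["RUN", mount, *flags, f"{HOOK_DIR}/entrypoint", command]
    return indent + " ".join(parts)


def translate(dockerfile: str, hook_image: str) -> str:
    """Apply wrap_run to every line of a Dockerfile."""
    return "\n".join(wrap_run(line, hook_image) for line in dockerfile.splitlines())
```

For example, `translate("FROM debian\nRUN foo", "example.com/hook")` leaves the `FROM` line alone and turns `RUN foo` into a `RUN` that mounts the hook image at `/dev/.dfhook` and calls `/dev/.dfhook/entrypoint foo`. In a compound command such as `a && b`, only `a` would pass through the entrypoint.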
I'd like to propose a hooking mechanism for `RUN` instructions of Dockerfile.

e.g.,

    buildctl build \
      --frontend dockerfile.v0 \
      --opt hook="$(cat hook.json)"

with `hook.json` as follows:

This will let the frontend treat `RUN foo` as:

    RUN \
      --mount=from=example.com/hook,target=/dev/.dfhook \
      --mount=type=secret,source=something,target=/etc/something \
      /dev/.dfhook/entrypoint foo

`docker history` will still show this as `RUN foo`.

> [!NOTE]
> The proposed JSON schema may change.
> See the PR for the latest status:
> `RUN` instructions #4669

Use cases
Reproducible builds
A hook can be used for wrapping the `apt-get` command to use `snapshot.debian.org` for reproducing package versions without modifying the Dockerfile.

The `/dev/.dfhook/entrypoint` script can be like this:

A hook may also push/pull dpkg blobs to an OCI registry (or whatever) for efficient caching.
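As a rough illustration of the snapshot idea (not the script from the proposal), the core of such an entrypoint could derive a `snapshot.debian.org` source line from a timestamp such as `SOURCE_DATE_EPOCH` and decide whether the wrapped command needs it. The function names, the default `bookworm` suite, and the use of the `check-valid-until=no` source option are assumptions made for this sketch.

```python
import os
from datetime import datetime, timezone


def snapshot_url(epoch: int, archive: str = "debian") -> str:
    """Return the snapshot.debian.org archive URL for a Unix timestamp."""
    stamp = datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"http://snapshot.debian.org/archive/{archive}/{stamp}/"


def sources_list(epoch: int, suite: str = "bookworm") -> str:
    """Build a one-line sources.list entry pinned to the snapshot."""
    # Snapshot Release files are usually past their Valid-Until date,
    # so the validity check is disabled for this source.
    return f"deb [check-valid-until=no] {snapshot_url(epoch)} {suite} main\n"


def plan(argv: list, epoch: int, suite: str = "bookworm"):
    """Return (sources.list text or None, argv) for the wrapped command.

    Only apt/apt-get invocations get a snapshot source; any other
    command passes through unchanged.
    """
    is_apt = bool(argv) and os.path.basename(argv[0]) in ("apt", "apt-get")
    return (sources_list(epoch, suite) if is_apt else None), argv
```

A real entrypoint would write the returned text over the image's APT sources and then `exec` the original command. The sketch leaves that side effect out, so the logic can be checked without touching `/etc/apt`.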

Cross-compilation
`xx-apt`, etc. (https://github.com/tonistiigi/xx) can be reimplemented as a hook.

Malware detection
A hook may use seccomp, etc. to hook the syscalls and detect malicious actions, etc.
Enterprise networking
Enterprise networks often require installing a MITM proxy cert.
This can be easily automated with a hook.
FAQs