[RFC] Allow defining mounts for whole stage #1209

Open
thaJeztah opened this issue Oct 15, 2019 · 7 comments

@thaJeztah
Member

thaJeztah commented Oct 15, 2019

Just a quick thought, not a fully written/designed proposal.

The experimental syntax currently allows using RUN --mount, which is great. Mounts allow using files as part of your build, without them ending up in (intermediate) image layers, while still being able to use the build cache.

However, having to specify the mount for each RUN can lead to repetition in some Dockerfiles, or make a Dockerfile too complicated.

Take the following example from the moby Dockerfile:

FROM base AS registry
ARG REGISTRY_COMMIT=ec87e9b6971d831f0eff752ddb54fb64693e51cd
ARG REGISTRY_COMMIT_SCHEMA1=47a064d4195a9b56133891bbb13620c3ac83a827
RUN --mount=type=cache,target=/root/.cache/go-build \
    --mount=type=cache,target=/go/pkg/mod \
        set -x \
        && export GOPATH="$(mktemp -d)" \
        && git clone https://github.com/docker/distribution.git "$GOPATH/src/github.com/docker/distribution" \
        && (cd "$GOPATH/src/github.com/docker/distribution" && git checkout -q "$REGISTRY_COMMIT") \
        && GOPATH="$GOPATH/src/github.com/docker/distribution/Godeps/_workspace:$GOPATH" \
           go build -buildmode=pie -o /build/registry-v2 github.com/docker/distribution/cmd/registry \
        && case $(dpkg --print-architecture) in \
               amd64|ppc64*|s390x) \
               (cd "$GOPATH/src/github.com/docker/distribution" && git checkout -q "$REGISTRY_COMMIT_SCHEMA1"); \
               GOPATH="$GOPATH/src/github.com/docker/distribution/Godeps/_workspace:$GOPATH"; \
                   go build -buildmode=pie -o /build/registry-v2-schema1 github.com/docker/distribution/cmd/registry; \
                ;; \
           esac \
        && rm -rf "$GOPATH"

While the above isn't a "beauty", there are some things to notice here:

  • mounts are used to preserve the go build cache, and go mod cache
  • a "temp" directory is created to clone the git repository (something that could likely be replaced by --mount=type=cache or --mount=type=tmpfs)
  • two separate binaries are built from the same source, but different commits (granted, not a common scenario).
  • all the steps are combined in a single RUN, so that the cloned repository can be cleaned up afterwards
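For the tmpfs option mentioned in the list above, a rough, untested sketch could look like the following. The /tmp/gopath target is a made-up path for illustration, and `base` is the stage from the moby Dockerfile, so this is a fragment rather than a standalone file. Because the tmpfs mount is discarded when the step finishes, the trailing `rm -rf` is no longer needed.

```dockerfile
FROM base AS registry
ARG REGISTRY_COMMIT=ec87e9b6971d831f0eff752ddb54fb64693e51cd
# /tmp/gopath is a hypothetical scratch location; it only exists for this RUN.
RUN --mount=type=cache,target=/root/.cache/go-build \
    --mount=type=tmpfs,target=/tmp/gopath \
        set -x \
        && export GOPATH=/tmp/gopath \
        && git clone https://github.com/docker/distribution.git "$GOPATH/src/github.com/docker/distribution" \
        && (cd "$GOPATH/src/github.com/docker/distribution" && git checkout -q "$REGISTRY_COMMIT") \
        && GOPATH="$GOPATH/src/github.com/docker/distribution/Godeps/_workspace:$GOPATH" \
           go build -buildmode=pie -o /build/registry-v2 github.com/docker/distribution/cmd/registry
```

The trade-off is that nothing in the tmpfs is kept between builds, so the repository is cloned again every time this step runs.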

Simplifying the example (taking the architecture check out), and using --mount=type=cache still gives me;

FROM base AS registry
ARG REGISTRY_COMMIT=ec87e9b6971d831f0eff752ddb54fb64693e51cd
ARG REGISTRY_COMMIT_SCHEMA1=47a064d4195a9b56133891bbb13620c3ac83a827
RUN --mount=type=cache,target=/root/.cache/go-build \
    --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/go/src/github.com/docker/distribution \
        set -x \
        && git clone https://github.com/docker/distribution.git "/go/src/github.com/docker/distribution" \
        && (cd "$GOPATH/src/github.com/docker/distribution" && git checkout -q "$REGISTRY_COMMIT") \
        && GOPATH="$GOPATH/src/github.com/docker/distribution/Godeps/_workspace:$GOPATH" \
           go build -buildmode=pie -o /build/registry-v2 github.com/docker/distribution/cmd/registry \
        && (cd "$GOPATH/src/github.com/docker/distribution" && git checkout -q "$REGISTRY_COMMIT_SCHEMA1") \
        && GOPATH="$GOPATH/src/github.com/docker/distribution/Godeps/_workspace:$GOPATH" \
           go build -buildmode=pie -o /build/registry-v2-schema1 github.com/docker/distribution/cmd/registry

The above could be simplified a bit further (the example above may not be the best), but ideally, I'd be able to split the code above into separate RUN lines, not having to && chain all script steps in a single RUN.

However, doing so requires me to repeat all the --mount options for each RUN, which makes it quite cluttered and repetitive;

FROM base AS registry
WORKDIR /go/src/github.com/docker/distribution
ARG REGISTRY_COMMIT=ec87e9b6971d831f0eff752ddb54fb64693e51cd
RUN --mount=type=cache,target=/go/src/github.com/docker/distribution \
        git clone https://github.com/docker/distribution.git . \
        && git checkout -q "$REGISTRY_COMMIT"

RUN --mount=type=cache,target=/root/.cache/go-build \
    --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/go/src/github.com/docker/distribution \
        GOPATH="/go/src/github.com/docker/distribution/Godeps/_workspace:$GOPATH" \
        go build -buildmode=pie -o /build/registry-v2 github.com/docker/distribution/cmd/registry

ARG REGISTRY_COMMIT_SCHEMA1=47a064d4195a9b56133891bbb13620c3ac83a827
RUN --mount=type=cache,target=/go/src/github.com/docker/distribution \
        git checkout -q "$REGISTRY_COMMIT_SCHEMA1"

RUN --mount=type=cache,target=/root/.cache/go-build \
    --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/go/src/github.com/docker/distribution \
        GOPATH="/go/src/github.com/docker/distribution/Godeps/_workspace:$GOPATH" \
        go build -buildmode=pie -o /build/registry-v2-schema1 github.com/docker/distribution/cmd/registry

Instead, perhaps it's possible to define mounts for a whole build-stage. Every RUN step in the stage would inherit those mounts;

FROM --mount=type=cache,target=/root/.cache/go-build \
     --mount=type=cache,target=/go/pkg/mod \
     --mount=type=cache,target=/go/src/github.com/docker/distribution \
     base AS registry

WORKDIR /go/src/github.com/docker/distribution
ARG REGISTRY_COMMIT=ec87e9b6971d831f0eff752ddb54fb64693e51cd
RUN git clone https://github.com/docker/distribution.git . \
 && git checkout -q "$REGISTRY_COMMIT"

RUN GOPATH="/go/src/github.com/docker/distribution/Godeps/_workspace:$GOPATH" \
    go build -buildmode=pie -o /build/registry-v2 github.com/docker/distribution/cmd/registry

ARG REGISTRY_COMMIT_SCHEMA1=47a064d4195a9b56133891bbb13620c3ac83a827
RUN git checkout -q "$REGISTRY_COMMIT_SCHEMA1"
RUN GOPATH="/go/src/github.com/docker/distribution/Godeps/_workspace:$GOPATH" \
    go build -buildmode=pie -o /build/registry-v2-schema1 github.com/docker/distribution/cmd/registry

Note that requiring the options to be directly after FROM is kinda ugly; perhaps an alternative syntax, allowing options to be passed at the end would work;

FROM base AS registry \
    --mount=type=cache,target=/root/.cache/go-build \
    --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/go/src/github.com/docker/distribution

Or, instead of putting this option on FROM, having a MOUNT keyword, or a STAGE keyword (to define "stage scoped" options) could be a solution;

FROM base AS registry
STAGE --mount=type=cache,target=/root/.cache/go-build \
      --mount=type=cache,target=/go/pkg/mod \
      --mount=type=cache,target=/go/src/github.com/docker/distribution

Inheritance

Haven't given this one much thought; should another stage using FROM stage-with-mounts inherit those mounts? I think it'd make sense;

  • allows defining a common base stage that has all options set
  • could allow switching between different options (build with, or without mounts)
  • downside: could this be an issue when building stages in parallel? (don't know the technical limitations) Although, it's likely not any different than defining the same options for each RUN

Note that the examples above focus on "caching", but secrets (--mount=type=secret) could greatly benefit from this; define the secret in your base image, and all stages will have access to the secret(s) that are needed during build.
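For comparison, this is roughly what secrets look like with the existing per-RUN syntax: every step that needs the secret has to repeat the flag. A minimal sketch, assuming the experimental frontend is enabled through the syntax line, and using `mysecret` as a placeholder id:

```dockerfile
# syntax=docker/dockerfile:experimental
FROM alpine

# The secret is only available while this step runs (by default at
# /run/secrets/<id>) and does not end up in an image layer.
RUN --mount=type=secret,id=mysecret \
    test -s /run/secrets/mysecret

# A second step that needs the same secret has to repeat the flag.
RUN --mount=type=secret,id=mysecret \
    wc -c < /run/secrets/mysecret
```

The secret itself would be passed at build time, for example with `docker build --secret id=mysecret,src=mysecret.txt .` (the file name is also a placeholder).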

@thaJeztah
Member Author

/cc @tonistiigi @tiborvass @cpuguy83

@FernandoMiguel
Collaborator

a few nits,

git clone will fail if the repository has already been cloned once, so with a cache mount the 1st run works but the 2nd run would fail.
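One possible way around this (an untested sketch, not something from the moby Dockerfile) is to clone only when the cache mount is still empty, and fetch otherwise:

```dockerfile
FROM base AS registry
WORKDIR /go/src/github.com/docker/distribution
ARG REGISTRY_COMMIT=ec87e9b6971d831f0eff752ddb54fb64693e51cd
# Clone only if the cache mount does not hold a repository yet, then fetch
# so that a newer commit can be checked out from an older cache.
RUN --mount=type=cache,target=/go/src/github.com/docker/distribution \
        ( [ -d .git ] || git clone https://github.com/docker/distribution.git . ) \
        && git fetch -q origin \
        && git checkout -q "$REGISTRY_COMMIT"
```

As in the examples above, `base` refers to a stage defined elsewhere in the Dockerfile.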

FROM --mount= design hides the mount point from the RUN stage, making it hard to read. I would prefer to have some sort of indicator we are mounting there

@thaJeztah
Member Author

git clone will fail if the repository has already been cloned once, so with a cache mount the 1st run works but the 2nd run would fail.

Ah; yes, haven't tried any of the code; I guess it works for illustration purposes 😂

FROM --mount= design hides the mount point from the RUN stage, making it hard to read. I would prefer to have some sort of indicator we are mounting there

Agreed; that concern played in my head. On the other hand, it's no different from defining an ENV higher up in the Dockerfile (same stage, or even a FROM <stage that has ENV>). Not sure what's best to make it more visible (suggestions definitely welcome; it was just a quick braindump/write-up).
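To illustrate the ENV comparison: a variable set in one stage is already inherited by any stage built FROM it, without anything on the RUN line hinting at that. A minimal sketch (the stage names and the variable are arbitrary choices):

```dockerfile
FROM alpine AS common
ENV CGO_ENABLED=0

FROM common AS build
# Nothing on this line shows that CGO_ENABLED was set by the "common" stage,
# yet the variable is available here.
RUN echo "CGO_ENABLED is $CGO_ENABLED"
```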

@tonistiigi
Member

Defining mounts for the full stage is potentially wasteful. Using mounts for processes that don't actually need them makes instructions cached by the wrong dependencies and makes cache mount locking more complex.

E.g., when doing FROM golang, it doesn't mean that when RUN apt-get update is run on that stage, it should use the go-specific cache mounts. Go-specific cache mounts should only be used when the go binary is invoked. It would also be cumbersome, and sometimes impossible, to split stages further so that only go commands are left in another stage. The go mounts themselves also shouldn't need to be defined in every stage that uses go, but in the golang image itself.
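With the existing syntax, that separation looks roughly like this: only the step that invokes go carries the go cache mounts. A sketch, where the /src directory and the COPY of the build context are assumptions for illustration:

```dockerfile
# syntax=docker/dockerfile:experimental
FROM golang:latest AS build
WORKDIR /src
COPY . .

# No go cache mounts here, so this step is not tied to them.
RUN apt-get update

# Only the go invocation gets the go-specific cache mounts.
RUN --mount=type=cache,target=/root/.cache/go-build \
    --mount=type=cache,target=/go/pkg/mod \
    go build .
```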

So instead, I think we should approach it by defining reusable code that can be invoked in RUN. Instead of running go binary directly, we want to run a go function that, in addition to invoking the binary, can use the correct mounts. These "functions" can be saved to a stage and maybe to an image so they are automatically inherited on FROM.

A quick example that I haven't thought through:

FROM golang:latest AS golang
DEFINE go --mount=... go
# DEFINE --mount=... go AS go  (alternative syntax)

FROM golang
RUN apt-get update
RUN @go build .
RUN --mount=... @go build

I've also seen some proposals for just defining flags (e.g. RUNFLAG --mount=), but the issue with them is that removing the flags once they are no longer needed complicates the syntax, so I think keying this on the binary invoked by RUN is a better approach.

Related:

@FernandoMiguel
Collaborator

@tonistiigi I really like the DEFINE approach.
But how does it cache across multi stage and multi builds?

@tonistiigi
Member

@FernandoMiguel This is all just a syntax in Dockerfile, no changes in how the builder works/caches internally.

@thaJeztah
Member Author

@tonistiigi this topic came up in a chat I had with @nebuk89. Wondering; could these definitions be somehow distributed, so that "pre-defined" languages / recipes could be pushed to Docker Hub, and other users could make use of them for a simplified workflow?

3 participants