[RFC] Allow defining mounts for whole stage #1209

thaJeztah · 2019-10-15T15:32:26Z

Just a quick thought, not a fully written/designed proposal.

The experimental syntax currently allows using RUN --mount, which is great. Mounts allow using files as part of your build, without them ending up in (intermediate) image layers, while still being able to use the build cache.

However, having to specify the mount for each RUN can lead to repetition in some Dockerfiles, or make a Dockerfile too complicated.

Take the following example from the moby Dockerfile;

FROM base AS registry
ARG REGISTRY_COMMIT=ec87e9b6971d831f0eff752ddb54fb64693e51cd
ARG REGISTRY_COMMIT_SCHEMA1=47a064d4195a9b56133891bbb13620c3ac83a827
RUN --mount=type=cache,target=/root/.cache/go-build \
    --mount=type=cache,target=/go/pkg/mod \
        set -x \
        && export GOPATH="$(mktemp -d)" \
        && git clone https://github.com/docker/distribution.git "$GOPATH/src/github.com/docker/distribution" \
        && (cd "$GOPATH/src/github.com/docker/distribution" && git checkout -q "$REGISTRY_COMMIT") \
        && GOPATH="$GOPATH/src/github.com/docker/distribution/Godeps/_workspace:$GOPATH" \
           go build -buildmode=pie -o /build/registry-v2 github.com/docker/distribution/cmd/registry \
        && case $(dpkg --print-architecture) in \
               amd64|ppc64*|s390x) \
               (cd "$GOPATH/src/github.com/docker/distribution" && git checkout -q "$REGISTRY_COMMIT_SCHEMA1"); \
               GOPATH="$GOPATH/src/github.com/docker/distribution/Godeps/_workspace:$GOPATH"; \
                   go build -buildmode=pie -o /build/registry-v2-schema1 github.com/docker/distribution/cmd/registry; \
                ;; \
           esac \
        && rm -rf "$GOPATH"

While the above isn't a "beauty", there are some things to notice here;

- mounts are used to preserve the go build cache and the go mod cache
- a "temp" directory is created to clone the git repository (something that could likely be replaced by --mount=type=cache or --mount=type=tmpfs)
- two separate binaries are built from the same source, but from different commits (granted, not a common scenario)
- all the steps are combined in a single RUN, so that the cloned repository can be cleaned up afterwards

Simplifying the example (taking the architecture check out), and using --mount=type=cache still gives me;

FROM base AS registry
ARG REGISTRY_COMMIT=ec87e9b6971d831f0eff752ddb54fb64693e51cd
ARG REGISTRY_COMMIT_SCHEMA1=47a064d4195a9b56133891bbb13620c3ac83a827
RUN --mount=type=cache,target=/root/.cache/go-build \
    --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/go/src/github.com/docker/distribution \
        set -x \
        && git clone https://github.com/docker/distribution.git "/go/src/github.com/docker/distribution" \
        && (cd "$GOPATH/src/github.com/docker/distribution" && git checkout -q "$REGISTRY_COMMIT") \
        && GOPATH="$GOPATH/src/github.com/docker/distribution/Godeps/_workspace:$GOPATH" \
           go build -buildmode=pie -o /build/registry-v2 github.com/docker/distribution/cmd/registry \
        && (cd "$GOPATH/src/github.com/docker/distribution" && git checkout -q "$REGISTRY_COMMIT_SCHEMA1") \
        && GOPATH="$GOPATH/src/github.com/docker/distribution/Godeps/_workspace:$GOPATH" \
           go build -buildmode=pie -o /build/registry-v2-schema1 github.com/docker/distribution/cmd/registry

The above could be simplified a bit further (the example above may not be the best), but ideally, I'd be able to split the code above into separate RUN lines, not having to && chain all script steps in a single RUN.

However, doing so requires me to repeat all the --mount options for each RUN, which makes it quite cluttered and repetitive;

FROM base AS registry
WORKDIR /go/src/github.com/docker/distribution
ARG REGISTRY_COMMIT=ec87e9b6971d831f0eff752ddb54fb64693e51cd
RUN --mount=type=cache,target=/go/src/github.com/docker/distribution \
        git clone https://github.com/docker/distribution.git . \
        && git checkout -q "$REGISTRY_COMMIT"

RUN --mount=type=cache,target=/root/.cache/go-build \
    --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/go/src/github.com/docker/distribution \
        GOPATH="/go/src/github.com/docker/distribution/Godeps/_workspace:$GOPATH" \
        go build -buildmode=pie -o /build/registry-v2 github.com/docker/distribution/cmd/registry

ARG REGISTRY_COMMIT_SCHEMA1=47a064d4195a9b56133891bbb13620c3ac83a827
RUN --mount=type=cache,target=/go/src/github.com/docker/distribution \
        git checkout -q "$REGISTRY_COMMIT_SCHEMA1"

RUN --mount=type=cache,target=/root/.cache/go-build \
    --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/go/src/github.com/docker/distribution \
        GOPATH="/go/src/github.com/docker/distribution/Godeps/_workspace:$GOPATH" \
        go build -buildmode=pie -o /build/registry-v2-schema1 github.com/docker/distribution/cmd/registry

Instead, perhaps it's possible to define mounts for a whole build-stage. Every RUN step in the stage would inherit those mounts;

FROM --mount=type=cache,target=/root/.cache/go-build \
     --mount=type=cache,target=/go/pkg/mod \
     --mount=type=cache,target=/go/src/github.com/docker/distribution \
     base AS registry

WORKDIR /go/src/github.com/docker/distribution
ARG REGISTRY_COMMIT=ec87e9b6971d831f0eff752ddb54fb64693e51cd
RUN git clone https://github.com/docker/distribution.git . \
 && git checkout -q "$REGISTRY_COMMIT"

RUN GOPATH="/go/src/github.com/docker/distribution/Godeps/_workspace:$GOPATH" \
    go build -buildmode=pie -o /build/registry-v2 github.com/docker/distribution/cmd/registry

ARG REGISTRY_COMMIT_SCHEMA1=47a064d4195a9b56133891bbb13620c3ac83a827
RUN git checkout -q "$REGISTRY_COMMIT_SCHEMA1"
RUN GOPATH="/go/src/github.com/docker/distribution/Godeps/_workspace:$GOPATH" \
    go build -buildmode=pie -o /build/registry-v2-schema1 github.com/docker/distribution/cmd/registry

Note that requiring the options to be directly after FROM is kinda ugly; perhaps an alternative syntax, allowing options to be passed at the end would work;

FROM base AS registry \
    --mount=type=cache,target=/root/.cache/go-build \
    --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/go/src/github.com/docker/distribution

Or, instead of putting this option on FROM, having a MOUNT keyword, or a STAGE keyword (to define "stage scoped" options) could be a solution;

FROM base AS registry
STAGE --mount=type=cache,target=/root/.cache/go-build \
      --mount=type=cache,target=/go/pkg/mod \
      --mount=type=cache,target=/go/src/github.com/docker/distribution

Inheritance

Haven't given this one much thought; should another stage using FROM stage-with-mounts inherit those mounts? I think it'd make sense;

- allows defining a common base stage that has all options set
- could allow switching between different options (build with, or without mounts)
- downside: could this be an issue when building stages in parallel? (don't know the technical limitations) Although, it's likely not any different than defining the same options for each RUN

Note that the examples above focus on "caching", but secrets (--mount=type=secret) could greatly benefit from this; define the secret in your base image, and all stages will have access to the secret(s) that are needed during build.
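For comparison, this is roughly what repeating a secret mount per RUN looks like with the current per-step syntax. It is only a sketch under assumptions: the secret id `token`, the stage names, and the `fetch-deps.sh` / `fetch-more.sh` scripts are hypothetical, `base` is the same placeholder stage used in the examples above, and the `# syntax=` line assumes the experimental frontend is what enables RUN --mount.

```dockerfile
# syntax=docker/dockerfile:experimental
FROM base AS deps
# The secret is exposed at /run/secrets/token for this step only.
# fetch-deps.sh is a hypothetical script assumed to exist in the image.
RUN --mount=type=secret,id=token \
    TOKEN="$(cat /run/secrets/token)" ./fetch-deps.sh

FROM deps AS extra
# A later stage has to declare the same secret mount again;
# nothing is inherited from the "deps" stage.
RUN --mount=type=secret,id=token \
    TOKEN="$(cat /run/secrets/token)" ./fetch-more.sh
```

A build like this would be started with BuildKit enabled and something like `docker build --secret id=token,src=./token.txt .`, where `./token.txt` is again a placeholder path.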


thaJeztah · 2019-10-15T15:33:14Z

/cc @tonistiigi @tiborvass @cpuguy83

FernandoMiguel · 2019-10-15T15:55:24Z

a few nits,

git clone will fail if you already cloned it once, so with a cache, the 1st run works but the 2nd run would fail.

FROM --mount= design hides the mount point from the RUN stage, making it hard to read. I would prefer to have some sort of indicator we are mounting there

thaJeztah · 2019-10-15T16:07:46Z

> git clone will fail if you already cloned it once, so with a cache, the 1st run works but the 2nd run would fail.

Ah; yes, haven't tried any of the code; I guess it works for illustration purposes 😂
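One possible way to make the clone step tolerate a cache mount that already holds the repository is sketched below. It is untested in this thread and rests on assumptions: the `# syntax=` line assumes the experimental frontend, `base` is the placeholder stage from the examples above, and the cache is assumed to contain either nothing or a complete earlier clone.

```dockerfile
# syntax=docker/dockerfile:experimental
FROM base AS registry
WORKDIR /go/src/github.com/docker/distribution
ARG REGISTRY_COMMIT=ec87e9b6971d831f0eff752ddb54fb64693e51cd
# Clone only while the cache mount is still empty; on later builds,
# refresh the existing clone instead, then check out the pinned commit.
RUN --mount=type=cache,target=/go/src/github.com/docker/distribution \
        if [ -d .git ]; then \
            git fetch -q origin; \
        else \
            git clone -q https://github.com/docker/distribution.git .; \
        fi \
        && git checkout -q "$REGISTRY_COMMIT"
```

git clone accepts an existing empty directory as its target, which is why the first build can still clone straight into the mounted WORKDIR.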

> FROM --mount= design hides the mount point from the RUN stage, making it hard to read. I would prefer to have some sort of indicator we are mounting there

Agreed; that concern played in my head; on the other hand, it's no different from defining an ENV higher up in the Dockerfile (same stage, or even a FROM <stage that has ENV>). Not sure what's best to make it more visible (suggestions definitely welcome; it was just a quick braindump/write-up).

tonistiigi · 2019-10-15T18:13:36Z

Defining mounts for the full stage is potentially wasteful. Attaching mounts to processes that don't actually need them makes the cache of those instructions depend on the wrong inputs and makes cache-mount locking more complex.

E.g. when doing FROM golang, it doesn't mean that when RUN apt-get update is run in that stage, it should use the Go-specific cache mounts. Go-specific cache mounts should only be used when the go binary is invoked. It would also be cumbersome, and sometimes impossible, to split stages further so that only go commands are left in a separate stage. As for the Go mounts themselves, they shouldn't need to be defined in every stage that uses Go, but in the golang image itself.
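To illustrate the scoping argument with the existing per-RUN syntax, here is a minimal sketch. The stage name, the /src working directory, and the `# syntax=` line are assumptions chosen for illustration; the cache targets are the ones used earlier in this issue.

```dockerfile
# syntax=docker/dockerfile:experimental
FROM golang AS build
# No Go cache mounts here: this step never invokes the go binary,
# so its cache should not depend on them.
RUN apt-get update
WORKDIR /src
COPY . .
# The Go cache mounts are attached only to the step that runs go.
RUN --mount=type=cache,target=/root/.cache/go-build \
    --mount=type=cache,target=/go/pkg/mod \
    go build ./...
```

With stage-wide mounts, the apt-get step would receive both cache mounts as well, which is the waste described above.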

So instead, I think we should approach it by defining reusable code that can be invoked in RUN. Instead of running go binary directly, we want to run a go function that, in addition to invoking the binary, can use the correct mounts. These "functions" can be saved to a stage and maybe to an image so they are automatically inherited on FROM.

A quick example that I haven't thought through:

FROM golang:latest AS golang
DEFINE go --mount=... go
# alternative: DEFINE --mount=... go AS go

FROM golang
RUN apt-get update
RUN @go build .
RUN --mount=... @go build

I've also seen some proposals for just defining flags (e.g. RUNFLAG --mount=), but the issue with them is that removing the flags once they are no longer needed complicates the syntax, so I think basing this on the binary invoked in RUN is the better approach.

- IMPORT/EXPORT (Proposal: add IMPORT/EXPORT commands to Dockerfile, moby#32100)
- possible nested builds variants

FernandoMiguel · 2019-10-15T21:54:59Z

@tonistiigi I really like the DEFINE approach.
But how does it cache across multi stage and multi builds?

tonistiigi · 2019-10-15T23:47:51Z

@FernandoMiguel This is all just a syntax in Dockerfile, no changes in how the builder works/caches internally.

thaJeztah · 2020-05-29T10:05:49Z

@tonistiigi this topic came up in a chat I had with @nebuk89. Wondering; could these definitions be somehow distributed? (so that "pre-defined" languages / recipes could be pushed to Docker Hub, and other users could make use of them for a simplified workflow?)

thaJeztah added area/dockerfile kind/enhancement labels Oct 15, 2019

sudo-bmitch mentioned this issue Feb 3, 2022

[RFC] Allow mounts to be defined for all steps on docker build CLI #2594

Open

jedevc mentioned this issue Nov 22, 2022

interpolation doesn't work in RUN #3301

Closed
