Allow controlling cache mounts storage location #1512
Comments
Just to clarify whether the scope of this proposal covers a use case I have. Go has content-addressed build and module caches, configured via environment variables. Having read the documentation at https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/experimental.md, it does not appear possible to delegate a cache mount to a host directory, so a BuildKit-managed Go build and module cache will likely largely duplicate the caches that already exist on the host. However, this proposal seems to be heading in the direction of sharing such a cache:
Under this proposal, can I confirm whether it would be possible to delegate these caches to host directories? I note a couple of requirements:
Apologies if this covers old ground (covered elsewhere); I'm rather new to the buildkit approach but happened upon this issue and it sounded like exactly the thing I was after. Many thanks |
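For context, the kind of Dockerfile this question is about might look roughly like the sketch below; the cache ids, targets, and Go version are illustrative assumptions based on the golang image's default cache locations, not anything prescribed by this issue:

# syntax=docker/dockerfile:1
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
# These cache mounts currently live inside BuildKit's own storage; the question
# above is whether they could instead be delegated to the host's Go caches.
RUN --mount=type=cache,id=gocache,target=/root/.cache/go-build \
    --mount=type=cache,id=gomod,target=/go/pkg/mod \
    go build -o /out/app ./...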
Hope this feature can land ASAP; it will be useful for CI caching (#1673 (comment)). For host directories, it could be hacked around like so:

# ensure host path
$ mkdir -p /tmp/gocache
# create volume
$ docker volume create --driver local \
--opt type=none \
--opt device=/tmp/gocache \
--opt o=bind \
myvolume
$ docker run -it --volume myvolume:/go/pkg/mod busybox touch /go/pkg/mod/test.txt
# test.txt will be created under host dir /tmp/gocache/
# maybe work
$ docker buildx build --cache-mount id=gocache,type=volume,volume=myvolume .

@tonistiigi should we control the mount target too? |
Does the proposed solution cater to the scenario of a build server that uses docker-in-docker? I'm not sure, tbh. |
Any news on this? This shouldn’t be a really big change, right? |
This would solve my life. |
Would love to see this. Would be a huge win for speeding up builds on CI |
This would be brilliant for build systems like Gradle and Maven building on e.g. GitHub Actions. They typically download all their dependencies to a cache dir. It's hard to benefit from layer caching: dependencies can be expressed in multiple files in a nested folder structure, so to avoid a maintenance nightmare it's generally necessary for the Dockerfile to copy in the whole source tree before resolving dependencies. I really want to use the same Dockerfile to build locally and on CI, which I think means I don't want to use the strategy suggested in docker/buildx#244 (comment) of loading & exporting the cache to named locations as commands in the Dockerfile: it might work in CI but would be much less efficient building locally, as well as adding a lot of noise to the Dockerfile. I'm currently caching the whole of the build cache directory in CI. I'm guessing this wouldn't be a great place for a buildx AND Go newbie to start contributing, though... |
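To make the Gradle case concrete, a cache mount along the following lines avoids re-downloading dependencies on a single builder, but, as discussed in this issue, that cache is not shared with the host or across CI runners. The image tag and cache path are assumptions based on the official gradle image:

# syntax=docker/dockerfile:1
FROM gradle:8-jdk17 AS build
WORKDIR /home/gradle/project
COPY . .
# Dependency cache lives under GRADLE_USER_HOME; only this builder benefits.
RUN --mount=type=cache,id=gradle-cache,target=/home/gradle/.gradle/caches \
    gradle build --no-daemon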
I would love to see the contents of cache mounts be exported as their own layer/blob via the --cache-to and --cache-from arguments. Perhaps even better if this got its own dedicated argument. |
I have hacked together a solution that seems to work for including cache mounts in the GitHub Actions cache. Because this cache will grow on every run, we prune it as part of the workflow. |
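For readers who haven't seen this trick, the rough shape of it is sketched below. This is not the poster's exact code; the stage names, cache id, and paths are placeholders. The idea is to copy the cache mount's contents into a throwaway stage so it can be extracted from, and later re-injected into, the builder:

# Dockerfile stages (sketch)
FROM busybox AS cache-export
# dump the cache mount's contents into an ordinary image layer
RUN --mount=type=cache,id=buildcache,target=/cache cp -a /cache /cache-dump

FROM busybox AS cache-import
# seed the cache mount from a directory restored into the build context by CI
COPY restored-cache/ /restored/
RUN --mount=type=cache,id=buildcache,target=/cache cp -a /restored/. /cache/

# CI steps (sketch)
$ docker build --target cache-export -t cache-export .
$ docker create --name cache-tmp cache-export
$ docker cp cache-tmp:/cache-dump ./restored-cache
$ docker rm cache-tmp
# prune ./restored-cache (e.g. by file age) before uploading it, so it doesn't grow forever
$ docker build --target cache-import .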
It would be nice to allow setting the cache mount storage location. |
hello, is that feature currently planned or being worked on? |
I think the design is accepted. Not being worked on atm. |
@tonistiigi can you clarify what you mean by this? This sounds like it is only considering use cases such as caching dependency installation, where downloading the dependencies from a repository would take about the same time as downloading the cache itself. What about situations where regenerating the cache takes significantly longer than importing it (e.g. code building)? The desire is to re-use parts of the cache without erasing it entirely, something that the remote instruction cache does not support. |
@chris13524 That's what "usually" means there, "not always". Even with a code-building cache you might have mixed results. Importing and exporting cache is not free. In addition, you need something that can clean up the older cache from your mount, or your cache just grows infinitely with every invocation. You can try this today by importing/exporting cache out of the builder into some persistent storage (or an additional image) with just an additional build request. docker/buildx#244 (comment) |
nice solution, but |
Just wanted to highlight a few things:
I hope that one day this feature will be implemented. :) |
By the way, in this issue it was proposed to create a volume to be able to mount a host dir as the cache target. But maybe it would require less work to just add a new cache mode to the current caching mechanism that would also export/import the cache volumes as part of the layer cache?
What's the best way to do this in 2024? |
@anthonyalayo - in 2024, this is still the way (that we know of) to do this. This is a heavily modified version of what @speller provided further up in the comments, I think back in 2022? 😂 In this implementation/script, we are using a single AWS account to store our "build cache" in an S3 bucket, with logical naming of nested folders (aka keys) inside S3. In this particular script, we pull down a "base" NodeJS docker image that we build in a different pipeline... then, on top of the base NodeJS image, we build our SPA JS app with this script and its accompanying Dockerfile. For our use case, we also push a copy of our images into each of our different AWS accounts (yes, I know... there are better ways, but we have "reasons" for this)... so that's why there are tuples of account name + account ID that we loop through towards the end of the script. The key takeaways from the script should be how we mount, process, push, and pull the cache. The rest is just gymnastics and sugar, based on your build requirements/process plus your needs around where to push and pull from. cc: @tsarlewey (sorry I didn't see your question when you tagged me last year!)

#!/bin/bash
##################################################################################################
# Build script for all Dockerfiles in this repository
#
# This script will build all Docker containers defined by the Dockerfiles in this repository.
# It enables Docker BuildKit when building containers and requires access to an S3 "artifact
# repository" in an AWS account you have access to.
#
# Dependencies:
# * docker
# * aws credentials (should be located at ~/.aws/credentials)
#
# Usage
# $> AWS_PROFILE_BUILDER=<your-aws-profile> ./build.sh
#
##################################################################################################
set -e -o pipefail
export DOCKER_BUILDKIT=1
export BUILDKIT_INLINE_CACHE=1
# Recitals
IMAGE_NAME="${1}"
ACCOUNTS=("some-account,<ACCT_ID>" "some-account,<ACCT_ID>" "some-account,<ACCT_ID>" "some-account,<ACCT_ID>")
BASE_IMAGES_ACCT_NAME=${BASE_IMAGES_ACCT_NAME:-"some-account-name"}
BASE_IMAGES_ACCT_ID=${BASE_IMAGES_ACCT_ID:-"some_account_id"}
BITBUCKET_REPO_SLUG=${BITBUCKET_REPO_SLUG:-"some_repo_name_slug"}
BITBUCKET_BRANCH=${BITBUCKET_BRANCH:-"some_default_branch"}
DOCKER_TAG=${DOCKER_TAG:-"latest"}
DOCKER_PUSH=${DOCKER_PUSH:-"true"}
DOCKER_REPO=${DOCKER_REPO:-"SOME_ID.dkr.ecr.us-west-2.amazonaws.com"}
REPO_CACHE_NAME=${REPO_CACHE_NAME:-"repo-cache"}
S3_BUILDKIT_CACHE=${S3_BUILDKIT_CACHE:-"ops-buildkit-cache"}
MEM_LIMIT=${MEM_LIMIT:-"6144"}
DOCKER_MEM_LIMIT=${DOCKER_MEM_LIMIT:-"6144"}
##################################################################################################
# Functions
##################################################################################################
check_aws_profiles() {
if [[ -z "${AWS_PROFILE_BUILDER}" ]]; then
echo "🛑 This script expects the AWS_PROFILE environment variable to be set"
exit 1
else
echo "Found AWS_PROFILE --> ${AWS_PROFILE_BUILDER}"
fi
}
fetch_ecr_token() {
local account_name="${1}"
local account_id="${2}"
echo "!! Fetching valid ECR login for ${account_name} with ID: ${account_id} !!"
aws ecr get-login-password --profile "${account_name}" | docker login --username AWS --password-stdin "${account_id}.dkr.ecr.us-west-2.amazonaws.com"
}
update_git_submodules() {
echo "!! Updating Submodules !!"
git submodule update --init --recursive
}
construct_docker_tag() {
# If not in a branch, get the tag
local tag
tag=$(git describe --tags --exact-match 2>/dev/null)
# If a tag is found, use it as the Docker tag
if [[ -n "${tag}" ]]; then
# Replace '.' with '-' and remove any 'v' prefix
tag=${tag//./-}
tag=${tag//v/}
# Grab the short commit hash
local commit_hash
commit_hash=$(git rev-parse --short HEAD)
# Construct the Docker tag
local docker_tag="${tag}-${commit_hash}"
# Output the Docker tag
echo "${docker_tag}"
else
# Fallback to the existing logic for branches
local branch
branch=$(git symbolic-ref --short HEAD)
# Replace '/' with '-' and sanitize branch name
branch=${branch//\//-}
branch=$(echo "${branch}" | tr '[:upper:]' '[:lower:]' | tr ' ' '-')
# Grab the short commit hash
local commit_hash
commit_hash=$(git rev-parse --short HEAD)
# Construct the Docker tag
local docker_tag="${branch}-${commit_hash}"
# Output the Docker tag
echo "${docker_tag}"
fi
}
DOCKER_TAG=$(construct_docker_tag)
export DOCKER_TAG
build_to_local() {
docker build "." \
--file "Dockerfile" \
--progress plain \
--pull \
--memory "${DOCKER_MEM_LIMIT}" \
--platform linux/amd64 \
--cache-from "${DOCKER_REPO}/${IMAGE_NAME}" \
--build-arg AWS_PROFILE_BUILDER="${AWS_PROFILE_BUILDER}" \
--build-arg MEM_LIMIT="${MEM_LIMIT}" \
--secret id=aws-config,src="${HOME}/.aws/config" \
--secret id=aws-creds,src="${HOME}/.aws/credentials" \
--tag "${DOCKER_REPO}/${IMAGE_NAME}:${DOCKER_TAG}" \
--tag "${IMAGE_NAME}:${DOCKER_TAG}" \
--tag "${IMAGE_NAME}:latest" \
--tag "${IMAGE_NAME}"
echo "🙌 BUILT ${image_name}:${2} LOCALLY!"
}
docker_build_tagged_stage() {
local target_stage
target_stage="${1}"
echo "target_stage: ${target_stage}"
local step_tag
step_tag="${target_stage}-${DOCKER_TAG}"
echo "step_tag: ${step_tag}"
echo "tag 1: ${DOCKER_REPO}/${IMAGE_NAME}:${step_tag}"
echo "tag 2: ${IMAGE_NAME}:${step_tag}"
echo "tag 3: ${IMAGE_NAME}:${target_stage}"
echo "[*] Building ${target_stage} stage with tag ${step_tag}"
docker build "." \
--file "Dockerfile" \
--target "${target_stage}" \
--platform linux/amd64 \
--progress plain \
--memory "${DOCKER_MEM_LIMIT}" \
--cache-from "${DOCKER_REPO}/${IMAGE_NAME}" \
--build-arg AWS_PROFILE_BUILDER="${AWS_PROFILE_BUILDER}" \
--secret id=aws-config,src="${HOME}/.aws/config" \
--secret id=aws-creds,src="${HOME}/.aws/credentials" \
--tag "${DOCKER_REPO}/${IMAGE_NAME}:${step_tag}" \
--tag "${IMAGE_NAME}:${step_tag}" \
--tag "${IMAGE_NAME}:${target_stage}"
# --load
echo "🙌 Built ${IMAGE_NAME}:${step_tag} to LOCAL!"
}
tag_and_push_to_ecr() {
local image_name="${1}"
local image_tag="${2}"
local ecr_repo="${3}.dkr.ecr.us-west-2.amazonaws.com"
# Tag and push to ECR
docker tag "${image_name}:${image_tag}" "${ecr_repo}/${image_name}:${image_tag}"
docker tag "${image_name}:${image_tag}" "${ecr_repo}/${image_name}:latest"
docker push --quiet "${ecr_repo}/${image_name}:${image_tag}"
docker push --quiet "${ecr_repo}/${image_name}:latest"
echo "🙌 Published ${image_name}:${image_tag} to ECR!"
}
# Copy the updated cache back to the host machine
process_cache() {
###############################
# Define local variables
###############################
local build_stage_name
build_stage_name="${1}"
local cache_dir
cache_dir="${2}"
local cache_container_name
cache_container_name="${build_stage_name}-container"
local cache_container_image
cache_container_image="${IMAGE_NAME}:${build_stage_name}-$DOCKER_TAG"
###############################
# Processing steps
###############################
# remove any existing cache from the host machine
echo "!! Removing existing local cache !!"
rm -rf "opt/$cache_dir/*"
docker_build_tagged_stage "$build_stage_name"
# remove previous temporary container with the same name if any
# return true to avoid failing the script if the container does not exist
docker rm -f "$cache_container_name" || true
# create a (stopped) temporary container from the tagged image containing the cache.
docker create -ti --name "$cache_container_name" "$cache_container_image"
# copy files from the container to the host
echo "!! Extracting latest cache !!"
docker cp -L "$cache_container_name:/tmp/$cache_dir" - | tar -x -m -C opt
# remove the temporary container
docker rm -f "$cache_container_name"
}
push_to_docker_repo() {
docker push "${DOCKER_REPO}/${IMAGE_NAME}:${DOCKER_TAG}"
echo "🙌 Published ${IMAGE_NAME}:${DOCKER_TAG} to ECR!"
}
pull_cache_from_s3() {
local cache_type
cache_type="${1}"
local cache_dir
cache_dir="${2}"
echo "!! Creating host-side directory !!"
mkdir -p "opt/${cache_dir}"
echo "!! Pulling cache from S3 !!"
AWS_PROFILE=${AWS_PROFILE_BUILDER} s5cmd --numworkers 512 cp \
"s3://${S3_BUILDKIT_CACHE}/${BITBUCKET_REPO_SLUG}/${cache_type}/${BITBUCKET_BRANCH}/*" \
"opt/${cache_dir}/" \
> /dev/null 2>&1 || true
echo "!! Done syncing ${cache_type} FROM S3 !!"
}
push_cache_to_s3() {
local cache_type
cache_type="${1}"
local cache_dir
cache_dir="${2}"
AWS_PROFILE=${AWS_PROFILE_BUILDER} s5cmd --numworkers 512 cp \
"opt/${cache_dir}/" \
"s3://${S3_BUILDKIT_CACHE}/${BITBUCKET_REPO_SLUG}/${cache_type}/${BITBUCKET_BRANCH}/" \
> /dev/null 2>&1 || true
echo "!! Done syncing ${cache_type} TO S3 !!"
}
##################################################################################################
# MAIN
##################################################################################################
main() {
###############################
# Build Setup
###############################
# construct docker tag
DOCKER_TAG=$(construct_docker_tag)
echo "🚀 Docker tag: ${DOCKER_TAG}"
# Check for all required aws profiles
check_aws_profiles
# Fetch ECR token from base DEV account for BASE nodeJS image
fetch_ecr_token "${BASE_IMAGES_ACCT_NAME}" "${BASE_IMAGES_ACCT_ID}"
# Update git submodules
echo "!! Updating Submodules !!"
update_git_submodules
echo "!! Syncing cache FROM S3 !!"
pull_cache_from_s3 "${REPO_CACHE_NAME}" "cache" &
# pull_cache_from_s3 "ng-cache" "ng-cache" &
wait
###############################
# Build Steps
###############################
echo "🛠️ Building linux/amd64 images for: ${IMAGE_NAME}:${DOCKER_TAG} ..."
###############################
# Prepare Caches
###############################
echo "[*] Preparing caches..."
docker_build_tagged_stage "spa-cache-prepare"
###############################
# Build Multi-Stage Image
###############################
echo "[*] Executing build"
build_to_local "${IMAGE_NAME}" "${DOCKER_TAG}"
###############################
# Build & Process Caches
###############################
echo "[*] Building caches..."
process_cache "spa-cache" "cache"
echo "!! Syncing cache TO S3 !!"
push_cache_to_s3 "${REPO_CACHE_NAME}" "cache"
################################################
# Push Final Image to All/Multiple AWS Accounts
################################################
for tuple in "${ACCOUNTS[@]}"; do
IFS=',' read -ra elements <<< "${tuple}"
account_name="${elements[0]}"
account_id="${elements[1]}"
# Fetch ECR token
fetch_ecr_token "${account_name}" "${account_id}"
echo "+ Executing ECR tag & push"
tag_and_push_to_ecr "${IMAGE_NAME}" "${DOCKER_TAG}" "${account_id}"
echo "🙌 Done Pushing linux/amd64 images for: ${IMAGE_NAME}:${DOCKER_TAG}" to "${account_name}" with account id "${account_id}"
done
###############################
# Post-Build Steps
###############################
CACHE_ITEM_COUNT=$(find opt/cache -type f | wc -l)
CACHE_SIZE=$(du -sh opt/cache | awk '{print $1}')
echo "[CACHE] Resulting build-time cache contains ${CACHE_ITEM_COUNT} files"
echo "[CACHE] Total size of all files in extracted cache: ${CACHE_SIZE}"
echo ""
echo ""
echo "🙌 Done building linux/amd64 images for: ${IMAGE_NAME}..."
}
###############################
# Run Main
###############################
main |
Thanks for the big offering @armenr , we all appreciate it. It's a bit wild that this has been desired for almost 4 years now and we still need to resort to things like this. Does anyone on the thread know a maintainer that can chime in? |
The cache-dance repo is linked above and looks pretty simple to use. Even if this was built into buildkit, you would still need something external that loads and saves to your storage, so this proposal doesn't simplify that use case. |
@tonistiigi if it was built into buildkit, a user could specify the cache mount storage location directly. |
Yeah @tonistiigi - the cache-dance repo looks pretty straightforward too. I shared our build script as well, since we (sadly) don't have the benefit or luxury of hosting on GitHub and using GitHub Actions. Instead, we're stuck in icky BitBucket Pipeline-land, and we needed something that would run on both our local machines as well as in our BitBucket pipelines. I'm sure ours isn't ideal, but we tried to balance robustness, completeness, and simplicity. |
The downside is that one must be very clever with multi-stage builds so that the cache is NOT present in the final image layers, whereas cache mounts can be leveraged even on single-stage images (along with other interesting features). But yes, I think a workaround along those lines is viable for now; see the sketch below. |
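As a concrete, hypothetical illustration of that multi-stage hygiene (image tags, stage names, and the npm cache path are assumptions): confine the materialized cache to a dedicated export stage that the runtime image never inherits or copies from:

# syntax=docker/dockerfile:1
FROM node:20 AS deps
WORKDIR /app
COPY package*.json ./
RUN --mount=type=cache,id=npm,target=/root/.npm npm ci

# Only this stage turns the cache mount into real layers (so CI can export it);
# the runtime image below never references it.
FROM deps AS cache-export
RUN --mount=type=cache,id=npm,target=/root/.npm cp -a /root/.npm /npm-cache

FROM node:20-slim AS runtime
WORKDIR /app
COPY --from=deps /app /app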
GitHub has a very effective cache action that can be used to cache the exported volumes. I agree having this in BuildKit the way proposed would be life-saving for many. For perspective, I'm currently using a local cache type and it works for layer caching, but 70-80% of the time and compute on our build is just downloading pip and npm dependencies, so without the ability to cache the mount volumes we get nearly zero of the benefits. My org won't let us use any GitHub Action either, so even if there are forks I might not be able to use them. Is there a reason why this shouldn't be implemented in buildkit? |
** Changes **
Docker does not natively support exporting cache mounts, or saving them to some external cache storage like the GHA cache. This commit uses a workaround described in [1] where the content in the cache mount is moved into a dumb container and then exported through its file system. To load recovered cache into the cache mount, another dumb container is built and its build process copies the recovered cache into the cache mount. This is very much a hack and shouldn't be necessary once this is natively supported by docker/buildx.
** Justification for not caching more build steps **
dnf, apt, and pip can theoretically also be cached this way. However, if the list of dependencies has not changed, the entire docker layer would be a cache hit and dnf/apt caches would not be beneficial. Considering that dependency changes should be rare, not caching them seems wise.
[1] moby/buildkit#1512
Any progress on this? I have a similar use case to the first comment, and having the ability to control where the cache mount is stored would help a lot. |
Bump: being able to set the cache-to/cache-from location would have an enormous impact overall. |
I'd also really like to see this. In case it helps anyone, here's an example of instead using a multi-stage build: https://gist.github.com/mnahkies/fbc11d747d7b875dcc1bbb304c21c349. I don't love including my cache in the build context, but it seems to work alright. |
For my use case, I'd like to share the apt download cache that BuildKit creates when baking an image with the apt cache I mount into my dev containers, so that I can easily reinstall .deb packages in any dev container while offline from the internet on my laptop, regardless of whether I previously downloaded them manually or downloaded them incidentally when building invalidated layers from Docker's instruction build cache. E.g.:

# Dockerfile
...
# Edit apt config for caching
RUN mv /etc/apt/apt.conf.d/docker-clean /etc/apt/ && \
echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' \
> /etc/apt/apt.conf.d/keep-cache && \
# Given apt repo snapshots, just cache apt update once
apt-get update
# Busted docker instruction build cache here
ARG BUST_CACHE_NONCE
# install bootstrap tools
RUN --mount=type=cache,sharing=locked,target=/var/cache/apt \
apt-get install -y --no-install-recommends \
gettext-base \
python3-rosdep \
python3-vcstool \
wget \
zstd
// .devcontainer/devcontainer.json
...
"mounts": [
{
// Cache apt downloads
"source": "apt-cache",
"target": "/var/cache/apt",
"type": "volume"
}
],

If we could direct buildkit to use a named volume or host bind mount at build time, then caches can be shared. |
I see this issue has been added to the v0.future milestone by @thompson-shaun. Hopefully this can be prioritized and moved to the v0.17 milestone given the huge interest in this feature from the community. Also, in my case the lack of this functionality is making docker builds much slower in CI than in local environments... |
related moby/moby#14080
Allowing exporting the contents of type=cache mounts has been asked for many times in different repositories and Slack: #1474, docker/buildx#244
Regular remote instruction cache does not work for cache mounts that are not tracked by cache key and are just a location on disk that can be shared by multiple builds.
Currently, the best approach to maintain this cache between nodes is to do it as part of the build. docker/buildx#244 (comment)
I don't think we should try to combine cache mounts with the remote cache backends. Usually, cache mounts are for throwaway cache and restoring it would take a similar time to just recreating it.
What we could do is allow users to control where the cache is located on disk, in case it is not on top of the snapshots.
We can introduce a cache mount backend concept behind a Go interface that different implementations can implement.
E.g. for a Dockerfile like
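(an illustrative sketch; the essential part is a cache mount with an id that the build invocation can reference)

FROM golang:1.22
COPY . /src
WORKDIR /src
RUN --mount=type=cache,id=gocache,target=/root/.cache/go-build \
    go build ./...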
you could invoke a build with
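(again illustrative, mirroring the proposed --cache-mount flag discussed in this issue; the flag does not exist in buildx today)

$ docker buildx build --cache-mount id=gocache,type=volume,volume=gocache-volume .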
In that case, the cache could use a Docker volume as a backend. I guess good drivers would be a volume in Docker and a bind mount from the host for non-Docker setups. If no --cache-mount is specified, the usual snapshotter-based location is used.
From the security perspective, the BuildKit API is considered secure by default for the host, so I think this would require daemon-side configuration to define which paths can be bound.
Another complexity is the buildx container driver, as we can't easily add new mounts to a running container at the moment. Possible solutions are to force these paths to be set on buildx create, or to do some hackery with mount propagation.