113 changes: 77 additions & 36 deletions Dockerfile
@@ -1,45 +1,86 @@
FROM --platform=$BUILDPLATFORM ubuntu as build
# syntax=docker/dockerfile:1

FROM --platform=${BUILDPLATFORM} ubuntu:24.04 AS builder-base
# Configure the shell to exit early if any command fails, or when referencing unset variables.
# Additionally `-x` outputs each command run, this is helpful for troubleshooting failures.
SHELL ["/bin/bash", "-eux", "-o", "pipefail", "-c"]

RUN \
--mount=target=/var/lib/apt/lists,type=cache,sharing=locked \
--mount=target=/var/cache/apt,type=cache,sharing=locked \
<<HEREDOC
# https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/reference.md#example-cache-apt-packages
# https://stackoverflow.com/questions/66808788/docker-can-you-cache-apt-get-package-installs#comment135104889_72851168
rm -f /etc/apt/apt.conf.d/docker-clean
echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache

apt update && apt install -y --no-install-recommends \
build-essential \
curl \
python3-venv \
cmake
HEREDOC
Comment on lines +8 to +22
Author


The cache mount here (and the workaround for `docker-clean` in this base image) doesn't bring any disk-usage savings to this layer. It does, however, keep the cache external instead of it being cleaned automatically, which is useful when the layer is invalidated, as it speeds up rebuilds.


ENV HOME="/root"
ENV PATH="$HOME/.venv/bin:$PATH"
WORKDIR $HOME

RUN apt update \
&& apt install -y --no-install-recommends \
build-essential \
curl \
python3-venv \
cmake \
&& apt clean \
&& rm -rf /var/lib/apt/lists/*

# Setup zig as cross compiling linker
RUN python3 -m venv $HOME/.venv
RUN .venv/bin/pip install cargo-zigbuild
ENV PATH="$HOME/.venv/bin:$PATH"
# Setup zig as cross compiling linker:
RUN <<HEREDOC
python3 -m venv $HOME/.venv
.venv/bin/pip install --no-cache-dir cargo-zigbuild
HEREDOC

# Install rust
ARG TARGETPLATFORM
RUN case "$TARGETPLATFORM" in \
"linux/arm64") echo "aarch64-unknown-linux-musl" > rust_target.txt ;; \
"linux/amd64") echo "x86_64-unknown-linux-musl" > rust_target.txt ;; \
*) exit 1 ;; \
esac
# Update rustup whenever we bump the rust version
COPY rust-toolchain.toml rust-toolchain.toml
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --target $(cat rust_target.txt) --profile minimal --default-toolchain none
# Install rust:
ENV PATH="$HOME/.cargo/bin:$PATH"
# Installs the correct toolchain version from rust-toolchain.toml and then the musl target
RUN rustup target add $(cat rust_target.txt)

# Build
COPY crates crates
COPY ./Cargo.toml Cargo.toml
COPY ./Cargo.lock Cargo.lock
RUN cargo zigbuild --bin uv --target $(cat rust_target.txt) --release
RUN cp target/$(cat rust_target.txt)/release/uv /uv
# TODO(konsti): Optimize binary size, with a version that also works when cross compiling
# RUN strip --strip-all /uv
COPY rust-toolchain.toml .
RUN <<HEREDOC
# Install rustup, but skip installing a default toolchain as we only want the version from `rust-toolchain.toml`:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --profile minimal --default-toolchain none

# When rustup installs the toolchain ensure it actually uses the minimal profile, avoiding excess layer weight:
# https://github.com/rust-lang/rustup/issues/3805#issuecomment-2094066914
echo 'profile = "minimal"' >> rust-toolchain.toml
echo 'targets = [ "aarch64-unknown-linux-musl", "x86_64-unknown-linux-musl" ]' >> rust-toolchain.toml
# Add the relevant musl target triples (for a building binary with static linking):
# Workaround until `ensure` arrives: https://github.com/rust-lang/rustup/issues/2686#issuecomment-788825744
rustup show
HEREDOC
Comment on lines +37 to +48
Author

@polarathene polarathene May 5, 2024


When installing rustup in the first line, as the commit history notes (and the PR description), setting --target was redundant; rustup emits a warning that it will ignore it.

Likewise, there was a mishap with the --profile minimal being ignored when rust-toolchain.toml exists and doesn't define its own profile key. If you'd like, that file could add this line instead? I've raised an issue upstream to get the regression in rustup fixed.

The next line added to rust-toolchain.toml is probably not something you'd want to add to the actual file, since both targets aren't mandatory 🤷‍♂️

  • I could instead add them individually with rustup target add ... in their respective stages that come afterwards
  • Due to the --default-toolchain none this still requires the rustup show workaround (UPDATE: Since Aug 2024, you can use rustup toolchain install instead of rustup show), otherwise you'd install the toolchain twice. An alternative is to use an ARG directive for passing in the toolchain version to use for --default-toolchain, that way you could just build with the current stable release, but if you really need to install a toolchain that matches rust-toolchain.toml, provide that via the ARG instead?
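
A minimal sketch of the ARG alternative mentioned above (the `RUST_VERSION` arg name is hypothetical, and this skips the `rustup show` workaround by installing a toolchain directly):

```dockerfile
# Hypothetical sketch: pin the toolchain via a build arg rather than
# `--default-toolchain none` plus the `rustup show` workaround.
ARG RUST_VERSION=stable
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs \
  | sh -s -- -y --profile minimal --default-toolchain "${RUST_VERSION}"
```

Building with `--build-arg RUST_VERSION=<version>` could then match whatever rust-toolchain.toml pins, if an exact match matters.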


# Handle individual images differences for ARM64 / AMD64:
FROM builder-base AS builder-arm64
ENV CARGO_BUILD_TARGET=aarch64-unknown-linux-musl

FROM builder-base AS builder-amd64
ENV CARGO_BUILD_TARGET=x86_64-unknown-linux-musl
Comment on lines +50 to +55
Author


This is effectively what you were doing earlier with the bash conditional statement on the TARGETPLATFORM arg. But instead of writing to a file (rust_target.txt), a new intermediary stage is introduced that sets the expected target as an appropriate ENV.

This actually removes the need for the --target option when building, but I kept it for clarity.

The next stage (builder-app) refers to the TARGETARCH implicit arg from BuildKit. Thus when building the Dockerfile the stage selection here is determined from the final stage being built.

It'll remain as being built from the same BUILDPLATFORM above, but diverge at this point. Due to the cache mounts used in builder-app, this isn't a big concern; you really only have a few MB repeated from the COPY instruction, followed by the binary cp (since the target/ cache mount excludes the original binary from the image).

A prior commit in this PR history had an alternative approach where both targets were built, and these separate stages were located at the end of the Dockerfile, where they instead used COPY --from with the location to their respective target binary.
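
A rough sketch of that earlier alternative, with illustrative stage names and paths (not the actual prior commit):

```dockerfile
# Illustrative only: one stage builds both targets, then small per-arch
# stages pick the matching binary via COPY --from.
FROM builder-base AS builder-both
RUN cargo zigbuild --release --bin uv --target aarch64-unknown-linux-musl \
 && cargo zigbuild --release --bin uv --target x86_64-unknown-linux-musl

FROM scratch AS final-arm64
COPY --from=builder-both /root/target/aarch64-unknown-linux-musl/release/uv /uv

FROM scratch AS final-amd64
COPY --from=builder-both /root/target/x86_64-unknown-linux-musl/release/uv /uv
```
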


# Build app:
FROM builder-${TARGETARCH} AS builder-app
COPY crates/ crates/
COPY Cargo.toml Cargo.lock .
ARG APP_NAME=uv
ARG CARGO_HOME=/usr/local/cargo
ARG RUSTFLAGS="-C strip=symbols -C relocation-model=static -C target-feature=+crt-static -C opt-level=z"
Author


I've used the RUSTFLAGS ENV here, although you can configure these in Cargo.toml + .cargo/config.toml (relocation-model and target-feature would still be rustflags IIRC). Presumably this is only relevant to the CI / Docker builds, so they're probably better managed here?

  • I'm not sure if opt-level with z (best size) is ideal? It should probably be compared/benched against a build optimized for performance; since performance is a focus of uv, I doubt a little size savings is worth it if performance regresses quite a bit. I'm not familiar enough with this project to know how you'd want to verify the impact.
  • +crt-static isn't necessary at the moment for the musl targets being built, but there has been talk of future changes for these targets to default to dynamic linking, so I've included it as more of a precaution; it also better communicates the intent of a statically linked build.
  • relocation-model=static tends to help save on binary size; I think this is OK. The default, AFAIK, is more of a security feature for memory layout (it's related to PIE, if you're familiar with that), but I'm not sure whether that's a concern for uv as an attack vector (some attacker with access to read memory).
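
If these were ever moved out of the Dockerfile, the split between Cargo profile settings and rustflags might look roughly like this (a sketch per standard Cargo conventions, not tested against this project):

```toml
# Cargo.toml -- strip and opt-level can live in a profile:
[profile.release]
strip = "symbols"
opt-level = "z"

# .cargo/config.toml -- these would remain rustflags:
[target.'cfg(target_env = "musl")']
rustflags = ["-C", "relocation-model=static", "-C", "target-feature=+crt-static"]
```
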

ARG TARGETPLATFORM
RUN \
--mount=type=cache,target="/root/.cache/zig",id="zig-cache" \
Collaborator


Suggested change
--mount=type=cache,target="/root/.cache/zig",id="zig-cache" \
--mount=type=cache,target="$HOME/.cache/zig",id="zig-cache" \

Worth using $HOME to be explicit

# Cache mounts (dirs for crates cache + build target):
# https://doc.rust-lang.org/cargo/guide/cargo-home.html#caching-the-cargo-home-in-ci
# CAUTION: As cargo uses multiple lock files (eg: `${CARGO_HOME}/{.global-cache,.package-cache,.package-cache-mutate}`), do not mount subdirs individually.
--mount=type=cache,target="${CARGO_HOME}",id="cargo-cache" \
# This cache mount is specific enough that you may not have any concurrent builds needing to share it, communicate that expectation explicitly:
--mount=type=cache,target="target/",id="cargo-target-${APP_NAME}-${TARGETPLATFORM}",sharing=locked \
# These are redundant as they're easily reconstructed from cache above. Use TMPFS mounts to exclude from cache mounts:
# TMPFS mount is a better choice than `rm -rf` command (which is risky on a cache mount that is shared across concurrent builds).
--mount=type=tmpfs,target="${CARGO_HOME}/registry/src" \
--mount=type=tmpfs,target="${CARGO_HOME}/git/checkouts" \
Comment on lines +64 to +76
Author


The cache mounting here is a bit excessive; I've added some context with comments. You're not really going to need to maintain much here AFAIK, so it's mostly out of sight/mind in practice.

If you're not familiar with this feature, the default sharing type is shared, which allows other concurrent builds to share the same data for each mount by matching id; if no id is configured, it implicitly uses the target path.

We have 3 caches here, zig, cargo, and the project specific target/ directory.

  • zig: I'm not familiar with its cache system. I assume from the path that it's a global cache and that it's fine for concurrent writers (aka sharing=shared, the implicit default).
  • cargo mounts the entire cargo home location as it doesn't yet have an isolated cache location.
    • This does mean if any other Dockerfile used the id of cargo-cache they'd both be sharing the same contents, including whatever is installed in bin/.
    • It'd normally not be safe for concurrent writers AFAIK, but due to the lock files cargo manages here, it's a non-issue.
  • The target/ location has an id assigned per TARGETPLATFORM, and is also isolated by APP_NAME (uv) as this is not useful to share with other Dockerfile unrelated to uv project.
    • While there shouldn't be any concurrent writers, if someone were to do several parallel builds with the RUSTFLAGS arg changed, the sharing=locked will block them, as they'd all want to compile the dependencies again (making the cache a bit less useful, but not as wasteful as individual caches via sharing=private, which aren't deterministic for repeat builds).
    • A previous commit did take a different approach, where the build process was part of the original builder stage and built both targets together, instead of relying on TARGETPLATFORM. If you don't mind always building for both targets, they could share a bit of common target/release/ cache contents AFAIK. The logic for the two commands below was complicated a bit more by a bash loop (since static builds require an explicit --target AFAIK, even when multiple targets are configured via rust-toolchain.toml / .cargo/config.toml?), so I opted to keep the logic below simple to grok by using separate TARGETPLATFORM caches (these can still build concurrently, as I demoed in the PR description).

The last two tmpfs mounts are rather self-explanatory. There is no value in caching these to disk.

If you do want something that looks simpler, Earthly has tooling to simplify rust builds and cache management. However, Dockerfile is more common knowledge to have and to troubleshoot/maintain, so adding another tool/abstraction didn't seem worthwhile to contribute.
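
To make the implicit defaults mentioned above concrete, these two mounts behave identically (a sketch; the trailing `true` is just a placeholder command):

```dockerfile
# Implicit: id defaults to the target path, sharing defaults to "shared":
RUN --mount=type=cache,target=/root/.cache/zig true

# Explicit equivalent:
RUN --mount=type=cache,target=/root/.cache/zig,id=/root/.cache/zig,sharing=shared true
```
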

<<HEREDOC
cargo zigbuild --release --bin "${APP_NAME}" --target "${CARGO_BUILD_TARGET}"
cp "target/${CARGO_BUILD_TARGET}/release/${APP_NAME}" "/${APP_NAME}"
HEREDOC
Comment on lines +77 to +80
Author


This is using --release, but I noticed in your Cargo.toml that you have a custom release profile using lto = "thin"; the build could be changed to use that profile instead, otherwise LTO should implicitly default to thin-local.
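
Selecting such a custom profile would look roughly like this (the profile name `dist` here is hypothetical; the actual name in this project's Cargo.toml may differ):

```toml
# Cargo.toml (illustrative): a custom profile inheriting release with thin LTO
[profile.dist]
inherits = "release"
lto = "thin"
```

The build step would then use `cargo zigbuild --profile dist --bin "${APP_NAME}" --target "${CARGO_BUILD_TARGET}"` in place of `--release` (and the `cp` path becomes `target/${CARGO_BUILD_TARGET}/dist/`).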


# Final stage - Image containing only uv + empty /io dir:
FROM scratch
COPY --from=build /uv /uv
COPY --from=builder-app /uv /uv
WORKDIR /io
ENTRYPOINT ["/uv"]