Remove chmod -R a+rwX /opt to eliminate ~22GB image layer #41
Merged
In Docker/BuildKit, every RUN instruction that modifies existing files creates a new layer containing the full diff of those changes. Running chmod -R on /opt after installing ROCm, PyTorch, vLLM, flash-attention, and bitsandbytes touches every file in the directory, causing the container runtime (overlayfs) to track all of them again in a new layer. This effectively doubles the storage cost of /opt in the final image. Files installed by pip and the ROCm tarball already have appropriate permissions (755 for executables/dirs, 644 for data files). The chmod was redundant and its removal saves ~20GB from the final image size, as confirmed by inspecting layer sizes with podman history.
The original Dockerfile had two instances of this command — one in the cleanup step (removed previously) and one after the RCCL install. Both create the same oversized layer for the same reason.
Owner
Yeah, this was not very smart of me. I seem to recall I wanted the folder to be writable for some benchmarks, but even so, making every single file writable makes no sense.
Summary
Removes the chmod -R a+rwX /opt from the final cleanup step.
Why this reduces image size
Docker/BuildKit stores images as a stack of immutable layers. Each RUN instruction produces a new layer containing only the diff of filesystem changes. The critical detail is how overlayfs tracks changes: even a pure permission change (no content change) counts as a modified file and is written into the new layer.
chmod -R a+rwX /opt runs after ROCm, PyTorch, vLLM, flash-attention, and bitsandbytes have all been installed into /opt. At that point /opt contains ~20GB of files. The chmod touches the metadata of every single one of them, so the resulting layer is essentially a full duplicate of /opt, just to record permission bits.
Inspecting the current image with podman history confirms this:
The 22GB chmod layer is larger than either of the directories it touches because overlayfs must record every modified inode.
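To get a rough feel for how large such a layer can become, the number of files and the amount of data under a directory can be measured before running a recursive chmod on it. The following is a minimal sketch, not part of this PR: chmod_footprint is a hypothetical helper name, and it assumes, as described above, that every touched file ends up in the new layer in full, so the result is only an estimate.

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR): estimate how much a recursive
# chmod on DIR would touch. Assumes each touched file is recorded in full
# in the new layer, so treat the result as a rough estimate.
chmod_footprint() {
    dir="$1"
    # Count regular files; names containing newlines would be miscounted.
    files=$(find "$dir" -type f | wc -l | tr -d ' ')
    # Total disk usage of the tree in KiB (first tab-separated field of du).
    kib=$(du -sk "$dir" | cut -f1)
    printf '%s files, %s KiB\n' "$files" "$kib"
}
```

Inside the build container this could be called as chmod_footprint /opt right before the chmod step, to compare against the layer size reported by podman history.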
Is the chmod actually needed?
No. Files installed by pip and extracted from the TheRock ROCm tarball already have correct permissions (755 for executables and directories, 644 for data files). The chmod was added defensively but is redundant in practice.
Impact
Removing it saves ~22GB from the final image with no functional change.
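The claim that the installed files already carry suitable permissions can be checked without modifying anything. The sketch below is an illustration only: list_unreadable is a hypothetical helper name, and it treats "suitable" as "files readable by everyone, directories readable and traversable by everyone", which is an assumption about what the image needs at runtime.

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR): print every entry under DIR
# that is missing world-readable bits. It only reads metadata, so running
# it does not add anything to an image layer the way chmod -R does.
list_unreadable() {
    dir="$1"
    # -perm -444: all three read bits set (regular files).
    # -perm -555: all read and execute bits set (directories).
    find "$dir" \( -type f ! -perm -444 \) -o \( -type d ! -perm -555 \)
}
```

Running list_unreadable /opt in the built image should print nothing if the permissions are as described. If it does print a few paths, a targeted chmod on just those paths would avoid rewriting the whole tree.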