
Remove chmod -R a+rwX /opt to eliminate ~22GB image layer #41

Merged: kyuz0 merged 3 commits into kyuz0:main from Lafunamor:fix-chmod-layer on Apr 18, 2026


@Lafunamor

Summary

Removes the chmod -R a+rwX /opt from the final cleanup step.

Why this reduces image size

Docker/BuildKit stores images as a stack of immutable layers. Each RUN instruction produces a new layer containing only the diff of filesystem changes. The critical detail is that overlayfs tracks changes at file granularity: even a pure permission change (no content change) counts as a modified file, so the whole file is copied up and written into the new layer.
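
To make "a pure permission change" concrete, here is a minimal sketch that runs the same chmod on a throwaway directory and checks that only the mode bits change while the file content stays byte-identical. It assumes GNU coreutils (stat -c, sha256sum); the demo directory and file names are hypothetical and stand in for /opt.

```shell
#!/bin/sh
# Sketch: a recursive chmod changes metadata only, never file contents.
# Assumes GNU coreutils; the paths below are made up for the demo.
set -eu

demo=$(mktemp -d)
mkdir -p "$demo/opt/lib"
printf 'payload\n' > "$demo/opt/lib/data.bin"
chmod 644 "$demo/opt/lib/data.bin"

# Record content hash and permission bits before the chmod.
sum_before=$(sha256sum "$demo/opt/lib/data.bin" | cut -d' ' -f1)
mode_before=$(stat -c '%a' "$demo/opt/lib/data.bin")

# Same command as in the Dockerfile, pointed at the demo tree.
chmod -R a+rwX "$demo/opt"

sum_after=$(sha256sum "$demo/opt/lib/data.bin" | cut -d' ' -f1)
mode_after=$(stat -c '%a' "$demo/opt/lib/data.bin")

echo "mode: $mode_before -> $mode_after"
[ "$sum_before" = "$sum_after" ] && echo "content unchanged"
rm -rf "$demo"
```

On a plain filesystem this costs nothing extra. In an image build, the same metadata-only change is what marks each file as modified for the new layer.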

chmod -R a+rwX /opt runs after ROCm, PyTorch, vLLM, flash-attention, and bitsandbytes have all been installed into /opt. At that point /opt contains ~22GB of files. The chmod touches the metadata of every single one of them, so the resulting layer is essentially a full duplicate of /opt, created just to record permission bits.

Inspecting the current image with podman history confirms this:

22.3GB   RUN chmod -R a+rwX /opt
15.5GB   COPY /opt/rocm ...
 6.8GB   COPY /opt/venv ...

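At 22.3GB, the chmod layer matches the combined size of the two COPY layers it touches (15.5GB + 6.8GB). That is what you would expect if overlayfs stores a full copy of every file whose metadata changed rather than just the new permission bits.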
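
As a quick check of the numbers, the sketch below adds up the two COPY layers from the podman history output above and compares the total with the chmod layer. The variable names are illustrative only; the sizes are copied from that output.

```shell
# Layer sizes in GB, copied from the podman history output above.
rocm_gb=15.5
venv_gb=6.8
chmod_gb=22.3

# The shell has no floating-point arithmetic, so let awk do the sum.
copied_gb=$(awk -v a="$rocm_gb" -v b="$venv_gb" 'BEGIN { printf "%.1f", a + b }')

echo "COPY layers total: ${copied_gb}GB, chmod layer: ${chmod_gb}GB"
```

The two figures agree, which is consistent with the chmod layer holding a second full copy of everything under /opt.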

Is the chmod actually needed?

No. Files installed by pip and extracted from the TheRock ROCm tarball already have correct permissions (755 for executables and directories, 644 for data files). The chmod was added defensively but is redundant in practice.
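
One way to check this claim rather than take it on faith is to list anything a non-owner could not read or traverse. The sketch below is an assumption about how such an audit could look, not something taken from the repository: audit_perms is a hypothetical helper, and the demo tree stands in for /opt. It covers read and traverse access only, not write access.

```shell
#!/bin/sh
# Sketch: list entries that a non-owner could not read or traverse.
# audit_perms is a hypothetical helper, not part of the repository.
audit_perms() {
    root=$1
    # Non-symlink entries missing the world-read bit (octal 004).
    find "$root" ! -type l ! -perm -004 -print
    # Directories missing the world-execute (traverse) bit (octal 001).
    find "$root" -type d ! -perm -001 -print
}

# Demo tree with the 755 permissions described above.
demo=$(mktemp -d)
mkdir -p "$demo/venv/bin"
printf '#!/bin/sh\n' > "$demo/venv/bin/tool"
chmod 755 "$demo" "$demo/venv" "$demo/venv/bin" "$demo/venv/bin/tool"

clean=$(audit_perms "$demo")      # nothing to report
chmod 600 "$demo/venv/bin/tool"   # simulate one badly packaged file
flagged=$(audit_perms "$demo")    # reports only that file
rm -rf "$demo"
```

Running audit_perms /opt inside the built image should print nothing if the installed files really do have the permissions described. Any path it prints could be fixed individually instead of with a recursive chmod.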

Impact

Removing it saves ~22GB from the final image with no functional change.

In Docker/BuildKit, every RUN instruction that modifies existing files
creates a new layer containing the full diff of those changes. Running
chmod -R on /opt after installing ROCm, PyTorch, vLLM, flash-attention,
and bitsandbytes touches every file in the directory, causing the
container runtime (overlayfs) to track all of them again in a new layer.
This effectively doubles the storage cost of /opt in the final image.

Files installed by pip and the ROCm tarball already have appropriate
permissions (755 for executables/dirs, 644 for data files). The chmod
was redundant and its removal saves ~22GB from the final image size,
as confirmed by inspecting layer sizes with podman history.

The original Dockerfile had two instances of this command: one in the
cleanup step (removed previously) and one after the RCCL install. Both
create the same oversized layer for the same reason.

kyuz0 (Owner) commented Apr 18, 2026

Yeah, this was not very smart of me. I seem to recall I wanted the folder to be writable for some benchmarks, but even so, making every single file writable makes no sense.

@kyuz0 kyuz0 merged commit ef8640c into kyuz0:main Apr 18, 2026
@Lafunamor Lafunamor deleted the fix-chmod-layer branch April 18, 2026 09:10