Convert to multi-stage build to reduce runtime image size#39

Closed
Lafunamor wants to merge 6 commits into
kyuz0:mainfrom
Lafunamor:multi-stage-build

@Lafunamor

Summary

  • Splits the Dockerfile into a builder stage and a runtime stage
  • Removes the pinned transformers==5.3.0 override (also covered by #38, "Remove pinned transformers==5.3.0"; if that merges first, this PR's diff will be cleaner)

What changes

Builder stage retains everything needed for compilation:

  • Full TheRock ROCm SDK including LLVM/Clang (used as CC/CXX to prevent ABI segfault)
  • vLLM source tree, bitsandbytes sources, flash-attention sources
  • All build tools (gcc, cmake, ninja, python3.12-devel, etc.)

Runtime stage starts from a clean Fedora base and receives only:

  • /opt/rocm — full copy, kept generous to avoid runtime surprises
  • /opt/venv — compiled venv with vLLM, PyTorch, flash-attention, bitsandbytes, ray

Everything else is shed: compiler toolchain, source trees, build caches, *-devel headers.
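
The split described above can be sketched roughly as follows. This is a minimal outline rather than the PR's actual Dockerfile: the Fedora tag, the registry path, the exact package list, and the assumption that the venv relies on the system python3.12 package are all illustrative, and the SDK install and build steps are elided.

```dockerfile
# syntax=docker/dockerfile:1

# Hypothetical base tag; the PR only says "a clean Fedora base".
ARG FEDORA_VERSION=43

# --- Builder stage: everything needed for compilation ---
FROM registry.fedoraproject.org/fedora:${FEDORA_VERSION} AS builder
# Build tools named in the PR description; the real list is longer.
RUN dnf install -y gcc cmake ninja-build python3.12-devel && dnf clean all
# ... install the TheRock ROCm SDK into /opt/rocm, create /opt/venv,
# ... then build vLLM, flash-attention and bitsandbytes (steps elided).

# --- Runtime stage: clean base plus the two copied trees ---
FROM registry.fedoraproject.org/fedora:${FEDORA_VERSION} AS runtime
# Assumption: the venv's interpreter is provided by the system python3.12 package.
RUN dnf install -y python3.12 && dnf clean all
COPY --from=builder /opt/rocm /opt/rocm
COPY --from=builder /opt/venv /opt/venv
```

Nothing installed or downloaded in the builder stage reaches the final image unless a COPY --from=builder line names it, which is what lets the compiler toolchain, source trees and build caches drop out.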

Expected impact

~20-30GB reduction in final image size. The main saving is the LLVM/Clang compiler bundled in /opt/rocm/llvm plus build tools that are only needed at compile time.

Transformers pin removal

The transformers==5.3.0 pin is removed because vLLM PR #30566 (merged 2026-04-15) updated the constraint from < 5 to != 5.0.*..5.5.0. The pin was both redundant and actively conflicting with the new constraint since 5.3.0 is now explicitly excluded.

adrian added 6 commits April 16, 2026 18:17

Builder stage retains the full ROCm SDK + LLVM/Clang toolchain, vLLM
source tree, and bitsandbytes sources needed for compilation. Runtime
stage starts from a clean Fedora base and receives only /opt/rocm and
/opt/venv via COPY --from=builder, shedding gcc, cmake, ninja, *-devel
packages, all source trees and build caches. Expected reduction: ~20-30GB.

Triton compiles GPU kernels at runtime using ROCm Clang and requires
system C headers (stdlib.h). These were implicitly available in the
single-stage build via gcc but missing from the runtime stage.

Removes ~6-7GB of content not needed at runtime:
- Test and benchmark client suites (/opt/rocm/clients, /opt/rocm/tests)
- Profiler and video decode tooling (rocprofiler-systems, rocdecode, rdc)
- Static libraries (.a files) in /opt/rocm/lib and /opt/rocm/lib/llvm/lib
- Test/bench/validate binaries and gtest data in /opt/rocm/bin
- Build tools (hipify-clang, rocblas-gemm-tune)

Triton compiles Python extension modules (hip_utils.cpython-*.so) at
runtime using ROCm Clang, which requires Python.h from python3.12-devel.

The final chmod created a 22GB layer by touching every file in /opt/rocm
and /opt/venv. Using --chmod=755 on the COPY instructions applies
permissions inline with no extra layer, saving ~22GB from the image.
@Lafunamor
Author

Closing in favour of the more focused #41 (chmod fix) which addresses the image size issue more conservatively. The multi-stage build approach needs more validation before it's ready to contribute upstream.

@Lafunamor Lafunamor closed this Apr 16, 2026
@Lafunamor Lafunamor deleted the multi-stage-build branch April 18, 2026 09:10