[AMD] Merge Dockerfiles for ROCm #19203

Merged
HaiShaw merged 3 commits into sgl-project:main from akao-amd:amd_merge_720_dockerfile
Feb 27, 2026

@akao-amd
Contributor

Motivation

rocm720.Dockerfile and rocm.Dockerfile have a lot in common. It is time to unify them.

Modifications

Making rocm720.Dockerfile a superset of rocm.Dockerfile was the design choice back in #17799. This PR mostly replaces rocm.Dockerfile with rocm720.Dockerfile and updates all of its users accordingly. Refactoring the workflow files is a reasonable next step but is not included in this PR.

Accuracy Tests

Both rocm700-mi35x and rocm720-mi35x pass the accuracy test of the DEEPSEEK-R1-MXFP4-Preview model.

Benchmarking and Profiling

N/A.

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist
Contributor

Summary of Changes

Hello @akao-amd, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request streamlines the Docker build process for ROCm environments by consolidating two previously separate Dockerfiles, rocm.Dockerfile and rocm720.Dockerfile, into a single, comprehensive rocm.Dockerfile. This change integrates ROCm 7.2 specific configurations and necessary dependency hotfixes directly into the main build definition, reducing redundancy and simplifying the overall build system. The modifications ensure that both ROCm 7.0 and 7.2 environments can be built from a single source, maintaining accuracy and compatibility across different ROCm versions.

Highlights

  • Dockerfile Unification: The rocm720.Dockerfile has been merged into rocm.Dockerfile, consolidating the build logic for different ROCm versions into a single Dockerfile. This simplifies maintenance and ensures consistency.
  • ROCm 7.2 Support Enhancements: The unified rocm.Dockerfile now includes specific build arguments, base images, and conditional logic to properly support ROCm 7.2, including installing the AMD SMI Python package and applying hot patches for AITER and PyTorch/Triton dependencies.
  • Triton and AITER Dependency Management: Updated the Triton repository and commit hash. Introduced hot patches to address specific dependency issues with PyTorch/Triton and AITER when building with ROCm 7.2, ensuring compatibility and correct functionality.
  • CI Script Simplification: The CI script amd_ci_start_container.sh was updated to remove the conditional selection of Dockerfiles, now always pointing to the unified rocm.Dockerfile.
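
The CI script change described in the last highlight can be sketched as follows. This is only an illustration: the function names are hypothetical and the real amd_ci_start_container.sh may select its Dockerfile differently. The two Dockerfile paths and the variant names rocm700-mi35x and rocm720-mi35x are taken from this PR.

```shell
#!/bin/sh
# Hypothetical helper names; the real CI script may be structured differently.

# Before the merge: pick a Dockerfile based on the ROCm variant.
select_dockerfile_before() {
    case "$1" in
        *rocm720*) echo "docker/rocm720.Dockerfile" ;;
        *)         echo "docker/rocm.Dockerfile" ;;
    esac
}

# After the merge: every variant builds from the unified Dockerfile.
select_dockerfile_after() {
    echo "docker/rocm.Dockerfile"
}

for variant in rocm700-mi35x rocm720-mi35x; do
    echo "$variant: $(select_dockerfile_before "$variant") -> $(select_dockerfile_after "$variant")"
done
```

With a single Dockerfile, the variant only needs to influence build arguments or the build target, not the file that is selected.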

Changelog
  • docker/rocm.Dockerfile
    • Updated Docker build command comments to reflect v0.5.8.post1 and added rocm720 specific build examples.
    • Introduced new ARG declarations for BASE_IMAGE_942_ROCM720 and BASE_IMAGE_950_ROCM720.
    • Added new FROM stages (gfx942-rocm720 and gfx950-rocm720) to define base images and environment variables for ROCm 7.2.
    • Updated TRITON_REPO to https://github.com/triton-lang/triton.git and TRITON_COMMIT to a new hash.
    • Implemented conditional installation of the AMD SMI Python package for rocm720 builds.
    • Added psutil and pybind11 as required dependencies for AITER.
    • Introduced a conditional hot patch for AITER's pa_mqa_logits.py for ROCm 7.2 builds.
    • Appended export PYTHONPATH=/sgl-workspace/aiter:${PYTHONPATH} to /etc/bash.bashrc.
    • Removed the previous Triton build section, replacing it with a later hot patch.
    • Added vim to the list of system packages installed.
    • Added tabulate to the list of Python tools installed.
    • Set ENV PYTORCH_ROCM_ARCH=gfx942;gfx950.
    • Included a Python script for a hot patch to modify the torch-ROCm wheel, relaxing its Triton dependency for ROCm 7.2.
    • Added a new hot patch section for Triton, building it from source for ROCm 7.2.
    • Added ENV SGLANG_DISABLE_CUDNN_CHECK=1.
  • docker/rocm720.Dockerfile
    • Removed the entire Dockerfile, as its functionality has been integrated into rocm.Dockerfile.
  • scripts/ci/amd/amd_ci_start_container.sh
    • Removed conditional logic for selecting rocm720.Dockerfile or rocm.Dockerfile, now always using rocm.Dockerfile.
Ignored Files
  • Ignored by pattern: .github/workflows/** (3)
    • .github/workflows/pr-test-amd-rocm720.yml
    • .github/workflows/release-docker-amd-rocm720-nightly.yml
    • .github/workflows/release-docker-amd.yml

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request successfully unifies rocm720.Dockerfile into rocm.Dockerfile, streamlining the Dockerfile management for ROCm builds. The changes involve updating base images, adding ROCm 7.2 specific patches for AMD SMI and AITER, and modifying the CI script to use the unified Dockerfile. The accuracy tests passed for the specified models, indicating functional correctness. The removal of rocm720.Dockerfile and the consolidation of logic are positive steps towards better maintainability.

Comment on lines +175 to +176
fi \
&& echo "export PYTHONPATH=/sgl-workspace/aiter:\${PYTHONPATH}" >> /etc/bash.bashrc
Contributor

medium

The PYTHONPATH export should ideally be placed in a more persistent configuration file like /etc/profile or a custom script that sources it, rather than /etc/bash.bashrc. While /etc/bash.bashrc works for interactive shells, it might not be sourced in non-interactive contexts (e.g., when running scripts or services), leading to unexpected behavior. Alternatively, if this is only for the build environment, it could be set as an ENV variable in the Dockerfile itself, or passed as a build argument to the Python setup.py command if applicable.

     && echo "export PYTHONPATH=/sgl-workspace/aiter:${PYTHONPATH}" >> /etc/profile
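
Whichever startup file is used, it may be worth checking whether the dollar sign in the echoed line is escaped, since that decides when ${PYTHONPATH} is expanded. The sketch below uses a temporary file as a stand-in for /etc/bash.bashrc or /etc/profile, and /build/time/path is a made-up value; neither comes from the PR.

```shell
#!/bin/sh
# Sketch: compare what each form of the line appends to a startup file.
# A temporary file stands in for /etc/bash.bashrc or /etc/profile (assumption).
rc_file="$(mktemp)"

# Made-up value that is set while the line is being written (build time).
PYTHONPATH="/build/time/path"

# Escaped: the literal text ${PYTHONPATH} is stored and is expanded later,
# each time a shell sources the file.
printf '%s\n' "export PYTHONPATH=/sgl-workspace/aiter:\${PYTHONPATH}" >> "$rc_file"

# Unescaped: the shell expands ${PYTHONPATH} now and stores the result.
printf '%s\n' "export PYTHONPATH=/sgl-workspace/aiter:${PYTHONPATH}" >> "$rc_file"

escaped_line="$(sed -n '1p' "$rc_file")"
unescaped_line="$(sed -n '2p' "$rc_file")"
echo "$escaped_line"
echo "$unescaped_line"
rm -f "$rc_file"
```

The escaped form keeps the reference to the runtime value of PYTHONPATH, while the unescaped form freezes whatever value was set when the image was built.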

Contributor Author

Solved.

@akao-amd akao-amd force-pushed the amd_merge_720_dockerfile branch from 8a57843 to 7a49043 Compare February 25, 2026 07:28
@HaiShaw
Collaborator

HaiShaw commented Feb 26, 2026

/tag-and-rerun-ci

Collaborator

@HaiShaw HaiShaw left a comment

Let's keep using release-docker-amd-nightly.yml rather than release-docker-amd-rocm720-nightly.yml.

@@ -117,17 +174,6 @@ RUN cd aiter \
sh -c "GPU_ARCHS=$GPU_ARCH_LIST python setup.py develop"; \
fi

Collaborator

The line && echo "export PYTHONPATH=/sgl-workspace/aiter:\${PYTHONPATH}" >> /etc/bash.bashrc was lost for the ROCm 7.2 case.
Please be careful when merging the two Dockerfiles.

Contributor Author

We (YC, BingXu) discussed yesterday that the line wasn't needed.

Collaborator

This is needed so sys.path on ROCm 7.2 is set properly to include /sgl-workspace/aiter

Contributor Author

Sorry, but can you elaborate? If that is the case, why doesn't the ROCm 7.0 image need it in the first place?
CC. @yctseng0211 @bingxche

&& cd python \
&& python setup.py install; \
fi

Collaborator

Should keep this for ROCm 7.0

Contributor Author

It was just moved to the bottom, not removed.

&& pip install -r python/requirements.txt \
&& pip install -e .; \
fi

Collaborator

L426-L509: please keep it identical to the ROCm 7.2 Dockerfile, and apply it only to the ROCm 7.2 build.

Contributor Author

No. I believe BUILD_TRITON is already a better flag for this block and serves its purpose correctly.

Currently, ROCm 7.2 images set BUILD_TRITON=1 while ROCm 7.0 images set BUILD_TRITON=0. The flag has a finer granularity that can serve us well. For example, if we don't want to update gfx942-rocm720 for some reason, it can stay as is while we may already have fixed the Triton dependency issue for gfx950-rocm720.
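
The per-target gating argued for here might look roughly like the sketch below. It is not the Dockerfile's actual logic: the function names are hypothetical, and the ROCm 7.0 target names gfx942 and gfx950 are assumed. Only BUILD_TRITON and the gfx942-rocm720 and gfx950-rocm720 stage names appear in this PR.

```shell
#!/bin/sh
# Sketch: gate the Triton source build on a per-target flag rather than on
# the ROCm version. The mapping function and the ROCm 7.0 target names
# (gfx942, gfx950) are assumptions made for illustration.

build_triton_flag() {
    case "$1" in
        gfx942-rocm720|gfx950-rocm720) echo 1 ;;
        *) echo 0 ;;
    esac
}

triton_step() {
    BUILD_TRITON="$(build_triton_flag "$1")"
    if [ "$BUILD_TRITON" = "1" ]; then
        echo "build Triton from source"
    else
        echo "skip Triton build"
    fi
}

for target in gfx942 gfx950 gfx942-rocm720 gfx950-rocm720; do
    echo "$target: $(triton_step "$target")"
done
```

Because the flag is decided per target, one target can be switched to BUILD_TRITON=0 once its dependency issue is resolved without touching the others.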

@akao-amd akao-amd requested a review from bingxche as a code owner February 26, 2026 01:24
@akao-amd
Contributor Author

@HaiShaw Thanks for the review. Fixed two positions (vllm removal and cancel prompt for pip install) accordingly. For the rest I commented without modification.

@akao-amd akao-amd force-pushed the amd_merge_720_dockerfile branch from a27cd12 to 525cbc6 Compare February 26, 2026 05:29
@akao-amd akao-amd force-pushed the amd_merge_720_dockerfile branch from 525cbc6 to a5cd2f1 Compare February 26, 2026 05:29
@bingxche
Collaborator

@HaiShaw HaiShaw merged commit 9b2fbf7 into sgl-project:main Feb 27, 2026
54 of 63 checks passed
magicYang1573 pushed a commit to magicYang1573/sglang that referenced this pull request Mar 9, 2026
Co-authored-by: bingxche <Bingxu.Chen@amd.com>
sammysun0711 pushed a commit to sammysun0711/sglang that referenced this pull request Mar 20, 2026
Co-authored-by: bingxche <Bingxu.Chen@amd.com>
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026
Co-authored-by: bingxche <Bingxu.Chen@amd.com>
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
Co-authored-by: bingxche <Bingxu.Chen@amd.com>