Make the main Dockerfile to work on aarch64 #7721

kyleliang-nv · 2025-07-02T17:18:38Z

Motivation

The current Dockerfile does not build for aarch64. This PR makes the main Dockerfile platform-agnostic,
so that it can be built for both x86 and aarch64 targets.

Modifications

Change the hard-coded x86_64 paths to be platform-dependent.
For the aarch64 build, install fzyzcjy's DeepEP fork at commit 1b14ad661c7640137fcfe93cccb2694ede1220b0.
In release-docker.yml, add b200 and gb200 with CUDA 12.9.1.
In release-docker-dev.yml, build both blackwell and blackwell-cu129, with blackwell-cu129 being a multi-platform Docker image.
Change release-docker-gb200.yml so that it no longer auto-triggers on version changes.
NOTE: This PR should make Dockerfile.gb200 and release-docker-gb200.yml obsolete. However, we will keep these two files for now and remove them in the future, once we are sure everything works with the current GitHub Actions.

Accuracy Tests

This PR was tested by changing the docker push target to https://hub.docker.com/repository/docker/chingkai1023/sglang/tags

Checklist

Format your code according to the Code Formatting with Pre-Commit.
Add unit tests as outlined in the Running Unit Tests.
Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
Please feel free to join our Slack channel at https://slack.sglang.ai to discuss your PR.

gemini-code-assist

Summary of Changes

Hello @kyleliang-nv, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Dockerfile's compatibility by making it platform-agnostic, specifically enabling builds for both x86 and aarch64 architectures. My changes primarily involve replacing hardcoded x86_64 references with dynamic $(uname -m) commands or conditional logic. This ensures that architecture-specific dependencies, symlinks, and package sources are correctly resolved during the Docker build process, greatly improving the portability of the build environment.

Highlights

Platform Agnostic Symlinks: I've modified the libmlx5.so symlink creation in the Dockerfile to dynamically use $(uname -m) instead of a hardcoded x86_64 path. This ensures the symlink is correctly created regardless of the underlying architecture.
Dynamic SGLang Kernel Wheel Installation: The SGLang kernel wheel download URL has been updated to incorporate $(uname -m) for the architecture suffix. This allows the Docker build to fetch the correct pre-built wheel for either x86 or aarch64 targets.
Multi-Architecture NVIDIA Repositories: I've adjusted the apt source list and apt-key commands for NVIDIA developer tools and CUDA repositories. These now conditionally select amd64 or arm64 based on the output of uname -m, ensuring the correct repository is used for package management on different architectures.
CMake Installation for Multiple Architectures: The CMake download URL and extraction paths have been changed to dynamically include $(uname -m). This enables the Dockerfile to download and install the appropriate CMake binary for the detected build architecture, enhancing cross-platform compatibility.
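The architecture-dependent pieces described in the highlights above can be sketched in shell roughly as follows. This is a minimal illustration, not the PR's actual Dockerfile code: the helper names `deb_arch` and `multiarch_libdir` are hypothetical, and the library directory layout is an assumption based on the usual Debian/Ubuntu multiarch convention.

```shell
#!/bin/sh
# Hypothetical helper: map a machine name (as printed by `uname -m`)
# to the Debian/Ubuntu package architecture name.
deb_arch() {
    case "$1" in
        x86_64)  echo "amd64" ;;
        aarch64) echo "arm64" ;;
        *)       echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: multiarch library directory for a machine name
# (assumes the standard /usr/lib/<machine>-linux-gnu layout).
multiarch_libdir() {
    echo "/usr/lib/$1-linux-gnu"
}

machine="$(uname -m)"
echo "machine=${machine}"
echo "libdir=$(multiarch_libdir "${machine}")"
# Fall back to "unknown" on machines outside the two targets discussed here.
echo "deb_arch=$(deb_arch "${machine}" 2>/dev/null || echo unknown)"
```

Keeping the mapping in one place avoids repeating `if [ "$(uname -m)" = ... ]` checks at every step that needs an architecture name.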

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request updates the Dockerfile to support aarch64 builds by replacing hardcoded x86_64 values with dynamic architecture detection. A critical issue was identified where the architecture check for the NVIDIA CUDA repository key uses arm64 instead of sbsa for aarch64. Suggestions were provided to improve the maintainability of the Dockerfile by using variables to reduce repetitive code.
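To illustrate the `sbsa` point raised in the review: NVIDIA's CUDA apt repositories for server-class Arm hosts live under an `sbsa` directory rather than `arm64`. The sketch below is an assumption-laden illustration, not the code from this PR; the helper names `cuda_repo_arch` and `cuda_repo_url` are hypothetical, and the `ubuntu2204` distro value is chosen only for the example.

```shell
#!/bin/sh
# Map `uname -m` output to the directory name used by NVIDIA's CUDA apt
# repositories. Assumption: aarch64 means a server-class (SBSA) Arm host.
cuda_repo_arch() {
    case "$1" in
        x86_64)  echo "x86_64" ;;
        aarch64) echo "sbsa" ;;
        *)       echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: build the repository URL for a distro name such as
# "ubuntu2204" (the distro value is an assumption, not taken from the PR).
cuda_repo_url() {
    arch="$(cuda_repo_arch "$2")" || return 1
    echo "https://developer.download.nvidia.com/compute/cuda/repos/$1/${arch}"
}

cuda_repo_url ubuntu2204 aarch64
```

Note that this mapping differs from the Debian package architecture name (`arm64`), which is why a single `amd64`/`arm64` switch is not enough for every repository path.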

docker/Dockerfile

ishandhanani

We should add this into CI and allow users to pull either platform from the lmsys docker repository. WDYT @zhyncs?

cliffwoolley · 2025-07-17T04:37:25Z

docker/Dockerfile

"Blackwell" encompasses more than sm_100. (Same comment below for TORCH_CUDA_ARCH_LIST.) Maybe we should be more expansive in the list here?

kyleliang-nv · 2025-07-18

Would CMAKE_CUDA_ARCHITECTURES=100;120 be sufficient?
And for TORCH_CUDA_ARCH_LIST, would TORCH_CUDA_ARCH_LIST="10.0 12.0" be sufficient?
Since this is a Blackwell build, I'm wondering whether I should also include sm90.

cliffwoolley · 2025-07-18

FP8 or FP4 kernels add complexity to the explanation because of arch and/or family conditional compilation.

For the kernels that aren't FP8 or FP4, 10.0 + 12.0 together covers a lot of ground for the Blackwell families, yes. Thor is the other one; its number had been 10.1, but it will be renumbered as SM 11.0 in CUDA 13 here shortly. (We could perhaps skip Thor for the moment but expect to add it later.)

As to the "what about 9.0", I think my best advice is to have one build that serves all the primary targets, so at least here that's Hopper and Blackwell together, rather than having the "blackwell" one not be the default.

kyleliang-nv · 2025-07-18

Thanks for the explanation, Cliff. I'll work on these changes.
Claiming it to be Hopper and Blackwell is a bit challenging right now because of the DeepEP library, which explicitly checks that TORCH_CUDA_ARCH_LIST is exactly "9.0" and otherwise disables some features:
https://github.com/deepseek-ai/DeepEP/blob/main/setup.py#L70-L73.
I'm not sure how much SGLang depends on this specific feature in the DeepEP package.
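The two variables discussed in this thread encode the same architecture list in different notations ("10.0 12.0" versus 100;120). A minimal sketch of deriving one from the other is shown below; the helper name `torch_to_cmake_archs` is hypothetical, and the sketch assumes a space-separated list of plain versions with no `+PTX` suffixes.

```shell
#!/bin/sh
# Hypothetical helper: convert a space-separated TORCH_CUDA_ARCH_LIST value
# (e.g. "10.0 12.0") into a CMAKE_CUDA_ARCHITECTURES value (e.g. "100;120").
# Assumption: entries are plain versions without "+PTX" suffixes.
torch_to_cmake_archs() {
    out=""
    for arch in $1; do
        # Drop the dot: "10.0" -> "100", "9.0" -> "90".
        num="$(printf '%s' "$arch" | tr -d '.')"
        # Join entries with ";" (no leading separator on the first entry).
        out="${out:+${out};}${num}"
    done
    printf '%s\n' "$out"
}

torch_to_cmake_archs "10.0 12.0"       # prints 100;120
torch_to_cmake_archs "9.0 10.0 12.0"   # prints 90;100;120
```

Deriving both values from a single list keeps the two settings from drifting apart when an architecture such as 9.0 is added or removed.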

docker/Dockerfile

ishandhanani · 2025-08-13T19:30:30Z

.github/workflows/release-docker-dev.yml

Should we call this blackwell-cu128?

kyleliang-nv · 2025-08-13

I think that's more consistent, but it would render the blackwell tag obsolete. Alternatively, we can tag it as blackwell-cu128 and add an additional blackwell tag for blackwell-cu129, so that blackwell is the one that is multi-platform (with cu129).

ishandhanani

Small comment on tag but LGTM

kyleliang-nv · 2025-09-30T16:05:31Z

Closing this PR since #10705 is already merged in

kyleliang-nv requested review from ByronHsu, HaiShaw and zhyncs as code owners July 2, 2025 17:18

gemini-code-assist bot reviewed Jul 2, 2025

ishandhanani approved these changes Jul 2, 2025

kyleliang-nv force-pushed the feature/update_sglang_docker_for_aarch64 branch 2 times, most recently from 9e7dee6 to 4af8600 Compare July 15, 2025 03:31

kyleliang-nv force-pushed the feature/update_sglang_docker_for_aarch64 branch from 4ebda8c to f817e6e Compare July 16, 2025 19:02

cliffwoolley reviewed Jul 17, 2025

cliffwoolley reviewed Jul 18, 2025

kyleliang-nv force-pushed the feature/update_sglang_docker_for_aarch64 branch from 390bf94 to bc6ffbe Compare July 18, 2025 22:28

kyleliang-nv mentioned this pull request Jul 18, 2025 in "Add GB200 wide-EP docker" #8157 (merged)

kyleliang-nv force-pushed the feature/update_sglang_docker_for_aarch64 branch 2 times, most recently from bf6d118 to 8dcbbd1 Compare July 25, 2025 16:17

kyleliang-nv marked this pull request as draft July 29, 2025 05:43

kyleliang-nv force-pushed the feature/update_sglang_docker_for_aarch64 branch 3 times, most recently from 3427207 to 63c339c Compare August 6, 2025 18:21

kyleliang-nv force-pushed the feature/update_sglang_docker_for_aarch64 branch 2 times, most recently from 29f5ee8 to 6c51a23 Compare August 13, 2025 16:38

kyleliang-nv changed the title ~~Make the main docker build file to work on aarch64~~ Make the main Dockerfile to work on aarch64 Aug 13, 2025

kyleliang-nv marked this pull request as ready for review August 13, 2025 18:55

kyleliang-nv requested a review from merrymercy as a code owner August 13, 2025 18:55

ishandhanani reviewed Aug 13, 2025

ishandhanani approved these changes Aug 13, 2025

kyleliang-nv force-pushed the feature/update_sglang_docker_for_aarch64 branch from e3f56bc to 1af3696 Compare August 14, 2025 16:45

zhyncs self-assigned this Aug 14, 2025

zhyncs added the high priority label Aug 14, 2025

kyleliang-nv force-pushed the feature/update_sglang_docker_for_aarch64 branch from b90f009 to c846af6 Compare August 15, 2025 20:35

kyleliang-nv force-pushed the feature/update_sglang_docker_for_aarch64 branch from c846af6 to 63072e8 Compare August 18, 2025 15:53

kyleliang-nv force-pushed the feature/update_sglang_docker_for_aarch64 branch from b18d7ed to ac12dc3 Compare September 3, 2025 23:29

kyleliang-nv mentioned this pull request Sep 4, 2025 in "[CI] Fix runner for sgl-kernel" #9887 (merged)

kyleliang-nv added 12 commits September 8, 2025 11:02

Make the main docker build file to work on aarch64

9d90319

Fix bad merge

6a9343b

Cleanup docker release github action

085cb3a

Fix release-docker-dev.yml docker image tag

6d01a9f

Remove whitespace

ff4c62c

Change release-docker-gb200 to not run automatically

de1365e

Change x86 build to run on ubuntu-latest

c6c44c9

Build hopper nvhsmem kernels for blackwell build type

00ef92b

Add sgl-kernel version arg

c08e8a1

Fix docker build

9e881ca

Rename docker tag

95a8368

Revert pipefail change

3e85f4b

kyleliang-nv force-pushed the feature/update_sglang_docker_for_aarch64 branch from 46744b3 to 3e85f4b Compare September 8, 2025 18:03

kyleliang-nv closed this Sep 30, 2025

kyleliang-nv deleted the feature/update_sglang_docker_for_aarch64 branch October 17, 2025 16:23
