docs: add AMD MI300X/MI325X/MI350X/MI355X to Hunyuan 3 Preview cookbook #23582
Open
andyluo7 wants to merge 1 commit into
End-to-end validated on a single 8xMI300X (gfx942) node and an 8xMI355X
(gfx950) node with TP=8, BF16, both with and without MTP speculative
decoding. MI325X (gfx942) and MI350X (gfx950) are listed by hardware
parity; the same image, file overlay, and flags apply to those GPUs.
Changes:
docs_new/cookbook/autoregressive/Tencent/Hunyuan3-Preview.mdx:
- metatag description: mention NVIDIA + AMD.
- Section 2 (SGLang Installation): add two AMD rows to the Docker
image table (rocm/sgl-dev mi30x and mi35x nightlies). Update the
table caption to reference PR sgl-project#23533 (model code) for AMD users.
- Section 3.2 (Configuration Tips): add a new "AMD MI300X /
MI325X / MI350X / MI355X" subsection with the three-step
recipe (pull AMD nightly image -> overlay PR sgl-project#23533 model files
-> set SGLANG_USE_AITER_AR=0), full sglang serve commands for
both no-MTP and MTP, and a forward-looking note that the workaround
will be unnecessary once PRs sgl-project#23533 and sgl-project#23581 ship in rocm/sgl-dev.
- Section 3.2 also adds a "Hardware Requirements: AMD BF16" block
mirroring the existing NVIDIA one.
docs_new/src/snippets/autoregressive/hunyuan3-preview-deployment.jsx:
- hardware items: add MI300X, MI325X, MI350X, MI355X.
- modelConfigs: add the four AMD platforms with tp=8, mem=0.85
(matching the recipe in the .mdx).
- generateCommand: detect AMD and prepend SGLANG_USE_AITER_AR=0 to
the generated command. Also tweak MTP defaults to (num-steps=1,
num-draft-tokens=2) on AMD, matching the model card's MTP recipe.
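
The generateCommand behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in hunyuan3-preview-deployment.jsx: the helper names (AMD_PLATFORMS, isAmd, mtpDefaults), the option object shape, the mapping of mem to --mem-fraction-static, and the reading of "3 / 4" as (num-steps, num-draft-tokens) on NVIDIA are all assumptions. Any other flags the real recipe needs are omitted.

```javascript
// Hypothetical sketch of the AMD handling; names and shapes are illustrative only.
const AMD_PLATFORMS = new Set(["MI300X", "MI325X", "MI350X", "MI355X"]);

function isAmd(hardware) {
  return AMD_PLATFORMS.has(hardware);
}

// MTP defaults as described in this PR: 1 step / 2 draft tokens on AMD.
// The non-AMD branch assumes "3 / 4" means (num-steps, num-draft-tokens).
function mtpDefaults(hardware) {
  return isAmd(hardware)
    ? { numSteps: 1, numDraftTokens: 2 }
    : { numSteps: 3, numDraftTokens: 4 };
}

// tp=8 and mem=0.85 are the AMD values from this PR; using them as
// defaults for every platform is an assumption of this sketch.
function generateCommand({ hardware, modelPath, tp = 8, mem = 0.85, mtp = false }) {
  const args = [
    "python3 -m sglang.launch_server",
    `--model-path ${modelPath}`,
    `--tp ${tp}`,
    `--mem-fraction-static ${mem}`,
  ];
  if (mtp) {
    const { numSteps, numDraftTokens } = mtpDefaults(hardware);
    args.push(`--speculative-num-steps ${numSteps}`);
    args.push(`--speculative-num-draft-tokens ${numDraftTokens}`);
  }
  const command = args.join(" ");
  // On AMD, prepend the workaround environment variable.
  return isAmd(hardware) ? `SGLANG_USE_AITER_AR=0 ${command}` : command;
}
```

Calling generateCommand({ hardware: "MI355X", modelPath: "<model-path>", mtp: true }) yields a command prefixed with SGLANG_USE_AITER_AR=0, while a non-AMD hardware label leaves the command unprefixed.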
Refs:
sgl-project#23533 (Hy3-preview model code support)
sgl-project#23580 (HIP CUDA-graph capture invalidation bug)
sgl-project#23581 (fix for the bug)
Summary
Adds AMD MI300X / MI325X / MI350X / MI355X to the Hunyuan 3 Preview cookbook entry that landed in #23532. End-to-end validated on a single 8×MI300X (gfx942) node and an 8×MI355X (gfx950) node with TP=8, BF16, both with and without MTP speculative decoding. MI325X (gfx942) and MI350X (gfx950) are listed by hardware parity; the same image, file overlay, and flags apply.
Changes
docs_new/cookbook/autoregressive/Tencent/Hunyuan3-Preview.mdx
- metatags.description: mention NVIDIA + AMD instead of NVIDIA only.
- §2 SGLang Installation: add two AMD rows to the Docker-image table:
  - AMD MI300X / MI325X → rocm/sgl-dev:v0.5.10.post1-rocm720-mi30x-20260423
  - AMD MI350X / MI355X → rocm/sgl-dev:v0.5.10.post1-rocm720-mi35x-20260423
  Update the table caption to reference #23533 (model code) for AMD users.
- §3.2 Configuration Tips: add a Hardware Requirements: AMD BF16 block mirroring the NVIDIA one, plus a new AMD MI300X / MI325X / MI350X / MI355X subsection with the three-step recipe:
  1. Pull the AMD nightly image.
  2. Overlay the #23533 (support Hy3 preview) model files onto the /sgl-workspace/sglang install + pip install -U "transformers>=5.6.0"
  3. Set SGLANG_USE_AITER_AR=0 (the default-flip is filed in #23581, [AMD] Default SGLANG_USE_AITER_AR to false to avoid HIP graph capture invalidation)
  Followed by full no-MTP and MTP python3 -m sglang.launch_server … examples and a forward-looking note that none of the workaround steps will be needed once #23533 and #23581 ship in rocm/sgl-dev.
docs_new/src/snippets/autoregressive/hunyuan3-preview-deployment.jsx
- hardware items: add MI300X, MI325X, MI350X, MI355X.
- modelConfigs: add the four AMD platforms with tp=8, mem=0.85 (matches the in-recipe text).
- generateCommand:
  - On AMD, prepend SGLANG_USE_AITER_AR=0 to the generated command.
  - On AMD, default MTP to --speculative-num-steps 1 --speculative-num-draft-tokens 2 (model card's MTP recipe), versus 3 / 4 on NVIDIA.
Why no AMD perf table in this PR
Performance on AMD will improve once #23581 lands (the workaround forces a non-AITER all-reduce path). Posting AMD perf numbers under the workaround configuration would be a snapshot of a transient state. We can add an AMD perf section in a follow-up PR after #23581 merges and the AMD-tuned MTP defaults stabilize.
Validation
The recipe text was sanity-tested against a real deployment: the server boots cleanly at TP=8 with CUDA-graph capture, and smoke and concurrent inference runs produce coherent output on both the MI300X and MI355X nodes.
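
For readers who want to repeat a similar check, a concurrent smoke test could look roughly like the sketch below. It is not the script used for this validation. It assumes the server exposes SGLang's OpenAI-compatible /v1/chat/completions endpoint; the helper names (buildRequest, smokeTest), the base URL, the model name, and the max_tokens value are hypothetical placeholders.

```javascript
// Hypothetical smoke-test sketch; endpoint usage and all names are assumptions.
function buildRequest(model, prompt) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: prompt }],
      max_tokens: 64,
    }),
  };
}

// Sends all prompts concurrently and resolves to the reply texts in prompt order.
// fetchImpl is injectable so the function can be exercised without a live server.
async function smokeTest({ baseUrl, model, prompts, fetchImpl = fetch }) {
  return Promise.all(
    prompts.map(async (prompt) => {
      const res = await fetchImpl(
        `${baseUrl}/v1/chat/completions`,
        buildRequest(model, prompt)
      );
      if (!res.ok) throw new Error(`HTTP ${res.status} for prompt: ${prompt}`);
      const data = await res.json();
      return data.choices[0].message.content;
    })
  );
}
```

A run against a local server might pass baseUrl "http://localhost:30000" (SGLang's default port, assuming it was not overridden) and a handful of short prompts, then inspect the replies by eye for coherence.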
Refs
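- #23533 (Hy3-preview model code support)
- #23580 (HIP CUDA-graph capture invalidation bug)
- #23581 (SGLANG_USE_AITER_AR default flipped on HIP)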