
docs: add AMD MI300X/MI325X/MI350X/MI355X to Hunyuan 3 Preview cookbook#23582

Open
andyluo7 wants to merge 1 commit into sgl-project:main from andyluo7:add-amd-hy3-preview-cookbook

Conversation

@andyluo7
Contributor

Summary

Adds AMD MI300X / MI325X / MI350X / MI355X to the Hunyuan 3 Preview cookbook entry that landed in #23532. End-to-end validated on a single 8×MI300X (gfx942) node and an 8×MI355X (gfx950) node with TP=8, BF16, both with and without MTP speculative decoding. MI325X (gfx942) and MI350X (gfx950) are included on the basis of hardware parity; the same image, file overlay, and flags apply.

Changes

docs_new/cookbook/autoregressive/Tencent/Hunyuan3-Preview.mdx

docs_new/src/snippets/autoregressive/hunyuan3-preview-deployment.jsx

  • hardware items: add MI300X, MI325X, MI350X, MI355X.
  • modelConfigs: add the four AMD platforms with tp=8, mem=0.85 (matches the in-recipe text).
  • generateCommand:
    • Detects AMD and prepends SGLANG_USE_AITER_AR=0 to the generated command.
    • On AMD MTP, uses --speculative-num-steps 1 --speculative-num-draft-tokens 2 (model card's MTP recipe), versus 3 / 4 on NVIDIA.
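The snippet change described above can be sketched as follows. This is a hypothetical reconstruction, not the actual hunyuan3-preview-deployment.jsx code: the function shape, the config fields (platform, model, tp, mem, mtp), and the launch-command template are illustrative assumptions; only the env-var prefix and the flag values come from the PR description.

```javascript
// Hypothetical sketch of the generateCommand behavior described above.
// Field names and the command template are illustrative assumptions.
const AMD_GPUS = ["MI300X", "MI325X", "MI350X", "MI355X"];

function generateCommand({ platform, model, tp, mem, mtp }) {
  const isAmd = AMD_GPUS.includes(platform);
  let cmd =
    `python3 -m sglang.launch_server --model-path ${model} ` +
    `--tp ${tp} --mem-fraction-static ${mem}`;
  if (mtp) {
    // AMD follows the model card's MTP recipe (1 step / 2 draft tokens);
    // NVIDIA keeps the 3 / 4 defaults.
    const [steps, draft] = isAmd ? [1, 2] : [3, 4];
    cmd += ` --speculative-num-steps ${steps} --speculative-num-draft-tokens ${draft}`;
  }
  // Workaround until #23581 lands: force the non-AITER all-reduce path.
  return isAmd ? `SGLANG_USE_AITER_AR=0 ${cmd}` : cmd;
}
```

Prepending the environment variable to the generated command string (rather than asking the user to export it separately) keeps the workaround visible and copy-pasteable; once #23581 merges, the AMD branch can drop the prefix.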

Why no AMD perf table in this PR

Performance on AMD will improve once #23581 lands (the workaround forces a non-AITER all-reduce path). Posting AMD perf numbers under the workaround configuration would be a snapshot of a transient state. We can add an AMD perf section in a follow-up PR after #23581 merges and the AMD-tuned MTP defaults stabilize.

Validation

The recipe text was sanity-tested against a real deployment: the server boots cleanly at TP=8 with CUDA-graph capture, and both smoke tests and concurrent inference requests produce coherent output on the MI300X and MI355X nodes.

Commit message

Changes:

  docs_new/cookbook/autoregressive/Tencent/Hunyuan3-Preview.mdx:
    - metatag description: mention NVIDIA + AMD.
    - Section 2 (SGLang Installation): add two AMD rows to the Docker
      image table (rocm/sgl-dev mi30x and mi35x nightlies) and update the
      table caption to reference PR sgl-project#23533 (model code) for AMD users.
    - Section 3.2 (Configuration Tips): add a new "AMD MI300X /
      MI325X / MI350X / MI355X" subsection with the three-step
      recipe (pull AMD nightly image -> overlay PR sgl-project#23533 model files
      -> set SGLANG_USE_AITER_AR=0), full sglang serve commands for
      both no-MTP and MTP, and a forward-looking note that the workaround
      will be unnecessary once PRs sgl-project#23533 and sgl-project#23581 ship in rocm/sgl-dev.
    - Section 3.2 also adds a "Hardware Requirements: AMD BF16" block
      mirroring the existing NVIDIA one.


Refs:
  sgl-project#23533 (Hy3-preview model code support)
  sgl-project#23580 (HIP CUDA-graph capture invalidation bug)
  sgl-project#23581 (fix for the bug)
andyluo7 requested a review from wisclmy0611 as a code owner (April 23, 2026, 19:31).
