
[Hy3-preview] Add AMD MI300X/MI325X/MI350X/MI355X support #368

Merged
esmeetu merged 2 commits into vllm-project:main from andyluo7:add-amd-mi300x-mi355x-hy3-preview on Apr 28, 2026

Conversation

@andyluo7
Contributor

Summary

Adds AMD MI300X / MI325X / MI350X / MI355X to the verified hardware list for the tencent/Hy3-preview recipe.

End-to-end validated on a single 8×MI300X (gfx942) node (SVR08) and an 8×MI355X (gfx950) node (Tensorwave mia1-p01-g07) with TP=8, BF16, both with and without MTP speculative decoding. MI325X (gfx942) and MI350X (gfx950) are listed as verified by hardware parity; the same image, build, and flags apply to those GPUs.

Changes

models/tencent/Hy3-preview.yaml only. No code changes elsewhere.

  • meta.hardware: add mi300x: verified, mi325x: verified, mi350x: verified, mi355x: verified.
  • meta.performance_headline: extended to mention AMD platforms.
  • hardware_overrides.amd:
    • install_note explaining that until vLLM PR #40681 (Hy3-preview model code) merges, AMD users must build vLLM editable from the PR branch on top of the published rocm/vllm-dev:nightly image. Includes the canonical reproducer (docker run + pip install) and the PYTHONPATH=/work/build/vllm workaround for the /app/vllm namespace conflict in the base image (without it, from vllm import SamplingParams silently breaks in subprocesses).
    • extra_env enables the AITER fast paths used during validation: VLLM_ROCM_USE_AITER=1, VLLM_ROCM_USE_AITER_MOE=1, VLLM_ROCM_USE_AITER_MHA=1, VLLM_ROCM_USE_AITER_RMSNORM=1, VLLM_ROCM_USE_AITER_LINEAR=1.
  • guide: adds a Serving on 8×AMD MI300X / MI325X / MI350X / MI355X section with both no-MTP and MTP command examples (a condensed sketch follows this list). The existing NVIDIA section is preserved unchanged.
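
For orientation, here is a condensed sketch of the AMD path described above, combining the install_note reproducer, the AITER environment, and the serve command in its post-review form (HF model id, dot-notation speculative flags). The device flags, the /work mount, and the exact pip invocation are illustrative assumptions rather than the verbatim recipe text; the fork URL is the one named in the commit message below.

```bash
# Sketch only: device flags, the /work mount, and the pip step are assumptions;
# the recipe's install_note is authoritative.
docker run -it --rm \
  --device=/dev/kfd --device=/dev/dri \
  -v /work:/work \
  -e PYTHONPATH=/work/vllm \
  rocm/vllm-dev:nightly bash

# Inside the container: editable install of the Hy3-preview model code from the PR branch,
# so it shadows the empty /app/vllm namespace shipped in the base image.
git clone -b feature/support_hy_v3 https://github.com/stevenkuang-tencent/vllm /work/vllm
cd /work/vllm && pip install -e .

# AITER fast paths used during validation (hardware_overrides.amd.extra_env).
export VLLM_ROCM_USE_AITER=1
export VLLM_ROCM_USE_AITER_MOE=1
export VLLM_ROCM_USE_AITER_MHA=1
export VLLM_ROCM_USE_AITER_RMSNORM=1
export VLLM_ROCM_USE_AITER_LINEAR=1

# Serve on 8 GPUs with MTP speculative decoding; drop the two speculative flags for the no-MTP case.
vllm serve tencent/Hy3-preview \
  --tensor-parallel-size 8 \
  --speculative-config.method mtp \
  --speculative-config.num_speculative_tokens 1
```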

Validation

$ node scripts/build-recipes-api.mjs
✓ JSON API: 78 models, 8 strategies
  /models.json
  /{hf_org}/{hf_repo}.json  (e.g. /moonshotai/Kimi-K2.5.json)
  /strategies.json
  /taxonomy.json

YAML parses cleanly:

$ python3 -c "import yaml; d=yaml.safe_load(open('models/tencent/Hy3-preview.yaml')); print(d['meta']['hardware'])"
{'h200': 'verified', 'mi300x': 'verified', 'mi325x': 'verified', 'mi350x': 'verified', 'mi355x': 'verified'}

Refs

  • vllm-project/vllm#40681 (Hy3-preview model code)

@vercel
Contributor

vercel Bot commented Apr 23, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project       Deployment  Actions            Updated (UTC)
vllm-recipes  Ready       Preview, Comment   Apr 28, 2026 3:36am


Contributor

@gemini-code-assist Bot left a comment


Code Review

This pull request adds support for AMD MI300 and MI350 series GPUs to the Hy3-preview model, updating hardware metadata and providing comprehensive installation and deployment instructions. The review feedback identifies a path mismatch in the installation script that would prevent the environment from being set up correctly and suggests improvements for consistency, such as using the official model ID in examples and standardizing the CLI flag format for speculative configuration.

Comment thread on models/tencent/Hy3-preview.yaml (Outdated)
Comment on lines +82 to +83
-e PYTHONPATH=/work/build/vllm rocm/vllm-dev:nightly bash
git clone -b feature/support_hy_v3 \

high

The PYTHONPATH environment variable is set to /work/build/vllm, but the git clone command on the following line creates the repository at /work/vllm. This mismatch will prevent the workaround for the /app/vllm namespace conflict from working correctly. Please ensure the paths are consistent (e.g., by changing the PYTHONPATH to /work/vllm).

          -e PYTHONPATH=/work/vllm rocm/vllm-dev:nightly bash
        git clone -b feature/support_hy_v3 \

Comment thread on models/tencent/Hy3-preview.yaml (Outdated)
export VLLM_ROCM_USE_AITER_RMSNORM=1
export VLLM_ROCM_USE_AITER_LINEAR=1

vllm serve /path/to/Hy3-preview \

medium

For consistency with the NVIDIA section and to provide a better user experience, consider using the Hugging Face model ID tencent/Hy3-preview instead of the placeholder path /path/to/Hy3-preview.

  vllm serve tencent/Hy3-preview \

Comment thread on models/tencent/Hy3-preview.yaml (Outdated)
MTP (recommended on AMD for lower latency, same flags as the NVIDIA path):

```bash
vllm serve /path/to/Hy3-preview \

medium

Consider using the model ID tencent/Hy3-preview here as well for consistency and ease of use.

  vllm serve tencent/Hy3-preview \

Comment thread on models/tencent/Hy3-preview.yaml (Outdated)
```bash
vllm serve /path/to/Hy3-preview \
--tensor-parallel-size 8 \
--speculative-config '{"method":"mtp","num_speculative_tokens":1}' \

medium

The AMD section uses a JSON string for --speculative-config, while the NVIDIA section (lines 208-209) uses the dot-notation (--speculative-config.method). For consistency across the guide, it is recommended to use the same format.

    --speculative-config.method mtp \
    --speculative-config.num_speculative_tokens 1 \
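
For reference, a side-by-side sketch of the two equivalent invocations being compared; both forms appear in the recipe and the guide, and the model id shown is the post-review one.

```bash
# JSON-string form (original AMD example)
vllm serve tencent/Hy3-preview \
  --tensor-parallel-size 8 \
  --speculative-config '{"method":"mtp","num_speculative_tokens":1}'

# Dot-notation form (matches the NVIDIA section)
vllm serve tencent/Hy3-preview \
  --tensor-parallel-size 8 \
  --speculative-config.method mtp \
  --speculative-config.num_speculative_tokens 1
```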

andyluo7 added a commit to andyluo7/recipes that referenced this pull request Apr 23, 2026
- Fix PYTHONPATH path mismatch: clone target is /work/vllm (not
  /work/build/vllm), so PYTHONPATH must point at /work/vllm to make
  the editable install actually shadow the empty /app/vllm namespace.
- Use the Hugging Face model id 'tencent/Hy3-preview' in both AMD
  serve commands instead of '/path/to/Hy3-preview', matching the
  style of the existing NVIDIA section.
- Switch the AMD MTP example to the dot-notation form
  (--speculative-config.method mtp / --speculative-config.num_speculative_tokens 1)
  to match the NVIDIA section's format.

Refs: vllm-project#368 (review)
@andyluo7
Contributor Author

Thanks @gemini-code-assist! Addressed all four comments in commit b7f1a48:

  1. PYTHONPATH mismatch (high) — fixed: changed PYTHONPATH=/work/build/vllm to PYTHONPATH=/work/vllm to match the actual clone path.
  2. Use HF model id (medium, AMD no-MTP block) — fixed: replaced /path/to/Hy3-preview with tencent/Hy3-preview.
  3. Use HF model id (medium, AMD MTP block) — fixed: same as above.
  4. Speculative-config format consistency (medium) — fixed: switched the AMD MTP example to the dot-notation form (--speculative-config.method mtp / --speculative-config.num_speculative_tokens 1) to match the NVIDIA section.

node scripts/build-recipes-api.mjs still passes: ✓ JSON API: 78 models, 8 strategies.

@gemini-code-assist
Contributor

Thanks for the update, @andyluo7. The changes in commit b7f1a48 correctly address the identified issues, and the consistency in the speculative-config format is appreciated. Since the node scripts/build-recipes-api.mjs check still passes, this looks good to go.

@esmeetu
Member

esmeetu commented Apr 28, 2026

@andyluo7 LGTM! Can you sign off your commits?

Tencent Hy3-preview works on AMD ROCm via vLLM PR #40681
(stevenkuang-tencent/vllm@feature/support_hy_v3). End-to-end
validated on a single 8xMI300X (gfx942) node and an 8xMI355X
(gfx950) node with TP=8, BF16, both with and without MTP
speculative decoding. MI325X and MI350X are listed as verified by
hardware parity (gfx942 / gfx950 respectively); the same image and
flags apply.

Changes:

  meta.hardware:
    + mi300x: verified
    + mi325x: verified
    + mi350x: verified
    + mi355x: verified

  meta.performance_headline: extended to mention AMD platforms.

  hardware_overrides.amd:
    install_note explaining that until PR #40681 merges, AMD users
    must build vLLM editable from the PR branch into the published
    rocm/vllm-dev:nightly image. Includes the canonical reproducer
    (docker run + pip install) and the PYTHONPATH workaround for the
    /app/vllm namespace conflict in the base image.
    extra_env enables the AITER fast paths used during validation:
      VLLM_ROCM_USE_AITER=1
      VLLM_ROCM_USE_AITER_MOE=1
      VLLM_ROCM_USE_AITER_MHA=1
      VLLM_ROCM_USE_AITER_RMSNORM=1
      VLLM_ROCM_USE_AITER_LINEAR=1

  guide:
    Adds a 'Serving on 8xAMD MI300X / MI325X / MI350X / MI355X'
    section with the standalone serve commands (with and without
    MTP). The existing NVIDIA section is preserved unchanged.

Refs: vllm-project/vllm#40681

Validated with: node scripts/build-recipes-api.mjs
Result: '✓ JSON API: 78 models, 8 strategies' with no errors.

Signed-off-by: Andy Luo <andy.linluo@gmail.com>
- Fix PYTHONPATH path mismatch: clone target is /work/vllm (not
  /work/build/vllm), so PYTHONPATH must point at /work/vllm to make
  the editable install actually shadow the empty /app/vllm namespace.
- Use the Hugging Face model id 'tencent/Hy3-preview' in both AMD
  serve commands instead of '/path/to/Hy3-preview', matching the
  style of the existing NVIDIA section.
- Switch the AMD MTP example to the dot-notation form
  (--speculative-config.method mtp / --speculative-config.num_speculative_tokens 1)
  to match the NVIDIA section's format.

Refs: vllm-project#368 (review)
Signed-off-by: Andy Luo <andy.linluo@gmail.com>
@andyluo7
Contributor Author

Done — signed off both commits and force-pushed (b7f1a48..cd7b6da). Thanks for the review!

andyluo7 force-pushed the add-amd-mi300x-mi355x-hy3-preview branch from b7f1a48 to cd7b6da on April 28, 2026 03:35
esmeetu merged commit e033d7c into vllm-project:main on Apr 28, 2026
4 checks passed
