[Hy3-preview] Add AMD MI300X/MI325X/MI350X/MI355X support #368
Conversation
Code Review
This pull request adds support for AMD MI300 and MI350 series GPUs to the Hy3-preview model, updating hardware metadata and providing comprehensive installation and deployment instructions. The review feedback identifies a path mismatch in the installation script that would prevent the environment from being set up correctly and suggests improvements for consistency, such as using the official model ID in examples and standardizing the CLI flag format for speculative configuration.
-e PYTHONPATH=/work/build/vllm rocm/vllm-dev:nightly bash
git clone -b feature/support_hy_v3 \
The PYTHONPATH environment variable is set to /work/build/vllm, but the git clone command on the following line creates the repository at /work/vllm. This mismatch will prevent the workaround for the /app/vllm namespace conflict from working correctly. Please ensure the paths are consistent (e.g., by changing the PYTHONPATH to /work/vllm).
Suggested change:

-e PYTHONPATH=/work/vllm rocm/vllm-dev:nightly bash
git clone -b feature/support_hy_v3 \

export VLLM_ROCM_USE_AITER_RMSNORM=1
export VLLM_ROCM_USE_AITER_LINEAR=1
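The namespace conflict behind this fix can be reproduced without the ROCm image at all: an empty package that appears earlier on `sys.path` shadows the real one, which is why `PYTHONPATH` must point at the cloned tree. A minimal sketch using throwaway temp directories (the package name `pkg` and directory names are illustrative, not the actual image layout):

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Build two fake installs of a package called "pkg":
#   broken/pkg: an empty package (like the stale /app/vllm namespace)
#   good/pkg:   a package that actually defines SamplingParams
root = Path(tempfile.mkdtemp())
(root / "broken" / "pkg").mkdir(parents=True)
(root / "broken" / "pkg" / "__init__.py").write_text("")  # empty, no symbols
(root / "good" / "pkg").mkdir(parents=True)
(root / "good" / "pkg" / "__init__.py").write_text(
    "class SamplingParams: pass\n"
)

def can_import(first: str, second: str) -> bool:
    """Try 'from pkg import SamplingParams' with a given PYTHONPATH order."""
    proc = subprocess.run(
        [sys.executable, "-c", "from pkg import SamplingParams"],
        env={"PYTHONPATH": f"{root / first}:{root / second}"},
        capture_output=True,
    )
    return proc.returncode == 0

# With the empty copy first (the bug), the import fails; putting the real
# tree first on PYTHONPATH (the fix) makes it succeed.
print(can_import("broken", "good"), can_import("good", "broken"))
```

The key point is that Python does not fall through to the second `sys.path` entry once a regular package matches, so the empty `/app/vllm` wins unless the cloned tree is listed first.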
```bash
vllm serve /path/to/Hy3-preview \
```

MTP (recommended on AMD for lower latency, same flags as the NVIDIA path):

```bash
vllm serve /path/to/Hy3-preview \
  --tensor-parallel-size 8 \
  --speculative-config '{"method":"mtp","num_speculative_tokens":1}' \
```
The AMD section uses a JSON string for --speculative-config, while the NVIDIA section (lines 208-209) uses the dot-notation (--speculative-config.method). For consistency across the guide, it is recommended to use the same format.
--speculative-config.method mtp \
--speculative-config.num_speculative_tokens 1 \

- Fix PYTHONPATH path mismatch: clone target is /work/vllm (not /work/build/vllm), so PYTHONPATH must point at /work/vllm to make the editable install actually shadow the empty /app/vllm namespace.
- Use the Hugging Face model id 'tencent/Hy3-preview' in both AMD serve commands instead of '/path/to/Hy3-preview', matching the style of the existing NVIDIA section.
- Switch the AMD MTP example to the dot-notation form (--speculative-config.method mtp / --speculative-config.num_speculative_tokens 1) to match the NVIDIA section's format.

Refs: vllm-project#368 (review)
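Per the review, the two spellings configure the same settings: the JSON string is just an inline encoding of the key/value pairs that the dot-notation flags set one at a time. A quick check of what the JSON form decodes to, using only the standard library (no vLLM required):

```python
import json

# The JSON form used in the AMD MTP example.
spec = json.loads('{"method":"mtp","num_speculative_tokens":1}')

# The dot-notation flags from the NVIDIA section set the same two keys:
#   --speculative-config.method mtp
#   --speculative-config.num_speculative_tokens 1
assert spec == {"method": "mtp", "num_speculative_tokens": 1}
print(spec["method"], spec["num_speculative_tokens"])
```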
Thanks @gemini-code-assist! Addressed all four comments in commit b7f1a48:
Thanks for the update, @andyluo7. The changes in commit
@andyluo7 LGTM! Can you sign off your commits?
Tencent Hy3-preview works on AMD ROCm via vLLM PR #40681
(stevenkuang-tencent/vllm@feature/support_hy_v3). End-to-end
validated on a single 8xMI300X (gfx942) node and an 8xMI355X
(gfx950) node with TP=8, BF16, both with and without MTP
speculative decoding. MI325X and MI350X are listed as verified by
hardware parity (gfx942 / gfx950 respectively); the same image and
flags apply.
Changes:
meta.hardware:
+ mi300x: verified
+ mi325x: verified
+ mi350x: verified
+ mi355x: verified
meta.performance_headline: extended to mention AMD platforms.
hardware_overrides.amd:
install_note explaining that until PR #40681 merges, AMD users
must build vLLM editable from the PR branch into the published
rocm/vllm-dev:nightly image. Includes the canonical reproducer
(docker run + pip install) and the PYTHONPATH workaround for the
/app/vllm namespace conflict in the base image.
extra_env enables the AITER fast paths used during validation:
VLLM_ROCM_USE_AITER=1
VLLM_ROCM_USE_AITER_MOE=1
VLLM_ROCM_USE_AITER_MHA=1
VLLM_ROCM_USE_AITER_RMSNORM=1
VLLM_ROCM_USE_AITER_LINEAR=1
guide:
Adds a 'Serving on 8xAMD MI300X / MI325X / MI350X / MI355X'
section with the standalone serve commands (with and without
MTP). The existing NVIDIA section is preserved unchanged.
Refs: vllm-project/vllm#40681
Validated with: node scripts/build-recipes-api.mjs
Result: '✓ JSON API: 78 models, 8 strategies' with no errors.
Signed-off-by: Andy Luo <andy.linluo@gmail.com>
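The five AITER toggles listed in the commit message above are plain environment variables, so they can be kept in one mapping and applied to the serve process in a single place. A sketch (the commented `subprocess` launch is illustrative only and is not executed here):

```python
import os

# AITER fast-path toggles from the commit message (values as validated).
AITER_ENV = {
    "VLLM_ROCM_USE_AITER": "1",
    "VLLM_ROCM_USE_AITER_MOE": "1",
    "VLLM_ROCM_USE_AITER_MHA": "1",
    "VLLM_ROCM_USE_AITER_RMSNORM": "1",
    "VLLM_ROCM_USE_AITER_LINEAR": "1",
}

def serve_env() -> dict:
    """Copy of the current environment with the AITER flags applied."""
    env = os.environ.copy()
    env.update(AITER_ENV)
    return env

# Illustrative launch, mirroring the guide's serve command:
# subprocess.run(["vllm", "serve", "tencent/Hy3-preview",
#                 "--tensor-parallel-size", "8"], env=serve_env())
print(sorted(k for k in serve_env() if k.startswith("VLLM_ROCM_USE_AITER")))
```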
- Fix PYTHONPATH path mismatch: clone target is /work/vllm (not /work/build/vllm), so PYTHONPATH must point at /work/vllm to make the editable install actually shadow the empty /app/vllm namespace.
- Use the Hugging Face model id 'tencent/Hy3-preview' in both AMD serve commands instead of '/path/to/Hy3-preview', matching the style of the existing NVIDIA section.
- Switch the AMD MTP example to the dot-notation form (--speculative-config.method mtp / --speculative-config.num_speculative_tokens 1) to match the NVIDIA section's format.

Refs: vllm-project#368 (review)

Signed-off-by: Andy Luo <andy.linluo@gmail.com>
Done — signed off both commits and force-pushed (b7f1a48 to cd7b6da).
Summary
Adds AMD MI300X / MI325X / MI350X / MI355X to the verified hardware list for the `tencent/Hy3-preview` recipe.

End-to-end validated on a single 8×MI300X (gfx942) node (SVR08) and an 8×MI355X (gfx950) node (Tensorwave mia1-p01-g07) with TP=8, BF16, both with and without MTP speculative decoding. MI325X (gfx942) and MI350X (gfx950) are listed as verified by hardware parity; the same image, build, and flags apply to those GPUs.

Changes

`models/tencent/Hy3-preview.yaml` only. No code changes elsewhere.

- `meta.hardware`: add `mi300x: verified`, `mi325x: verified`, `mi350x: verified`, `mi355x: verified`.
- `meta.performance_headline`: extended to mention AMD platforms.
- `hardware_overrides.amd`:
  - `install_note` explaining that until vLLM PR #40681 (Hy3-preview model code) merges, AMD users must build vLLM editable from the PR branch into the published `rocm/vllm-dev:nightly` image. Includes the canonical reproducer (`docker run` + `pip install`) and the `PYTHONPATH=/work/build/vllm` workaround for the `/app/vllm` namespace conflict in the base image (it silently breaks `from vllm import SamplingParams` in subprocesses otherwise).
  - `extra_env` enables the AITER fast paths used during validation: `VLLM_ROCM_USE_AITER=1`, `VLLM_ROCM_USE_AITER_MOE=1`, `VLLM_ROCM_USE_AITER_MHA=1`, `VLLM_ROCM_USE_AITER_RMSNORM=1`, `VLLM_ROCM_USE_AITER_LINEAR=1`.
- `guide`: adds a "Serving on 8×AMD MI300X / MI325X / MI350X / MI355X" section with both no-MTP and MTP command examples. The existing NVIDIA section is preserved unchanged.

Validation
YAML parses cleanly: `node scripts/build-recipes-api.mjs` reports '✓ JSON API: 78 models, 8 strategies' with no errors.

Refs

vllm-project/vllm#40681

models/tencent/Hunyuan-A13B-Instruct.yaml.
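As a final sanity check, the `meta.hardware` entries this PR adds can be asserted programmatically. A sketch with the values inlined (the YAML loader is omitted; the dict below mirrors only the PR's additions, not the full recipe):

```python
# Excerpt mirroring the meta.hardware additions from this PR.
recipe_meta = {
    "hardware": {
        "mi300x": "verified",
        "mi325x": "verified",
        "mi350x": "verified",
        "mi355x": "verified",
    }
}

def verified_amd(meta: dict) -> list:
    """AMD Instinct entries marked 'verified', sorted by name."""
    return sorted(name for name, status in meta["hardware"].items()
                  if name.startswith("mi") and status == "verified")

print(verified_amd(recipe_meta))  # all four Instinct parts
```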