
Adding the test-amd.yaml for test definitions for the AMD backend. (alternative PR)#26718

Merged
khluu merged 1 commit into vllm-project:main from Alexei-V-Ivanov-AMD:MAIN_20251013
Oct 14, 2025

Conversation

@Alexei-V-Ivanov-AMD (Collaborator) commented Oct 13, 2025

Adding the test-amd.yaml for test definitions for the AMD backend.


Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
@mergify mergify bot added the ci/build and rocm (Related to AMD ROCm) labels Oct 13, 2025
@gemini-code-assist gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request adds a new test definition file for the AMD backend. However, the file appears to be a direct copy from a configuration for an NVIDIA backend. It contains numerous references to NVIDIA-specific technologies, commands, and GPU architectures such as CUDA, CUTLASS, NCCL, nvidia-smi, A100, H200, and Blackwell. These are incorrect for an AMD environment and will likely cause test failures or incorrect test execution. The file needs a thorough review to replace all NVIDIA-specific elements with their AMD equivalents (e.g., using rocm-smi instead of nvidia-smi, ROCR_VISIBLE_DEVICES instead of CUDA_VISIBLE_DEVICES, and targeting AMD GPUs).
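The audit the review asks for can be sketched as a quick grep over the config. Everything below is a hypothetical stand-in: the sample file is a tiny fragment, not the actual test-amd.yaml contents, and the path is invented for illustration.

```shell
# Hypothetical audit sketch: flag lines in an AMD test config that still
# reference NVIDIA-specific tooling. The sample file is a stand-in for the
# real test-amd.yaml, which is far larger.
cat > /tmp/sample-amd.yaml <<'EOF'
commands:
  - nvidia-smi
  - CUDA_VISIBLE_DEVICES=0,1 pytest -v -s v1/shutdown
  - rocm-smi
EOF

# Print line numbers of any remaining NVIDIA-specific references
grep -nE 'CUDA|CUTLASS|NCCL|nvidia-smi' /tmp/sample-amd.yaml
```

Here the first two `commands` entries are flagged, while the `rocm-smi` line passes; running the same pattern over the full file would surface every spot the review lists.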

- tests/basic_correctness/test_cumem.py
commands:
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- pytest -v -s basic_correctness/test_cumem.py
Contributor (critical):

The test basic_correctness/test_cumem.py appears to exercise CUDA virtual-memory (cuMem*) allocation, as the cu prefix in cumem suggests. This test may not be relevant or may fail on an AMD backend.


- pytest -v -s v1/engine/test_engine_core_client.py::test_kv_cache_events_dp
- pytest -v -s distributed/test_utils.py
- pytest -v -s compile/test_basic_correctness.py
- pytest -v -s distributed/test_pynccl.py
Contributor (critical):

This test runs distributed/test_pynccl.py. NCCL is NVIDIA's collective communications library. For AMD, RCCL should be used. This test is likely incorrect for an AMD backend.

- python3 offline_inference/spec_decode.py --test --method eagle --num_spec_tokens 3 --dataset-name hf --dataset-path philschmid/mt-bench --num-prompts 80 --temp 0 --top-p 1.0 --top-k -1 --tp 1 --enable-chunked-prefill --max-model-len 2048
- python3 offline_inference/spec_decode.py --test --method eagle3 --num_spec_tokens 3 --dataset-name hf --dataset-path philschmid/mt-bench --num-prompts 80 --temp 0 --top-p 1.0 --top-k -1 --tp 1 --enable-chunked-prefill --max-model-len 2048

- label: Platform Tests (CUDA) # 4min
Contributor (critical):

The label for this test step is Platform Tests (CUDA). This is incorrect for a test file intended for the AMD backend. The label and the associated tests in this step should be updated to be AMD-specific.

agent_pool: mi325_8
# grade: Blocking
source_file_dependencies:
- csrc/quantization/cutlass_w8a8/moe/
Contributor (critical):

The source file dependency csrc/quantization/cutlass_w8a8/moe/ refers to CUTLASS, which is an NVIDIA library for CUDA kernels. This is inappropriate for an AMD backend test configuration.

# since torchao nightly is only compatible with torch nightly currently
# https://github.com/pytorch/ao/issues/2919, we'll have to skip new torchao tests for now
# we can only upgrade after this is resolved
- pip install --pre torchao==0.13.0.dev20250814 --index-url https://download.pytorch.org/whl/nightly/cu128
Contributor (critical):

This command installs a torchao wheel built for CUDA 12.8 (cu128). This is incorrect for an AMD backend and will fail. You should use a wheel built for ROCm.

  - pip install --pre torchao==<rocm_compatible_version> --index-url https://download.pytorch.org/whl/nightly/rocm<rocm_version>
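One way to avoid hard-coding the index tag is to derive it from the ROCm build's version. This is only a sketch: the version string below is a hypothetical stand-in for what `torch.version.hip` reports on a ROCm build of PyTorch.

```shell
# Hypothetical sketch: build the PyTorch nightly index URL for ROCm from a
# HIP version string. "6.2.41133" is a stand-in for torch.version.hip.
hip_version="6.2.41133"

# Keep only major.minor to form the index tag, e.g. "rocm6.2"
rocm_tag="rocm$(echo "$hip_version" | cut -d. -f1-2)"
echo "https://download.pytorch.org/whl/nightly/${rocm_tag}"
```

The resulting URL could then be passed to `pip install --index-url`, keeping the config in sync with whichever ROCm toolchain the image ships.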

Comment on lines +924 to +962
- label: Blackwell Test # 38 min
timeout_in_minutes: 60
working_dir: "/vllm-workspace/"
gpu: b200
# optional: true
source_file_dependencies:
- csrc/quantization/fp4/
- csrc/attention/mla/
- csrc/quantization/cutlass_w8a8/moe/
- vllm/model_executor/layers/fused_moe/cutlass_moe.py
- vllm/model_executor/layers/fused_moe/flashinfer_cutlass_moe.py
- vllm/model_executor/layers/fused_moe/flashinfer_cutlass_prepare_finalize.py
- vllm/model_executor/layers/quantization/utils/flashinfer_utils.py
- vllm/v1/attention/backends/flashinfer.py
- vllm/compilation/fusion.py
- vllm/compilation/fusion_attn.py
commands:
- nvidia-smi
- python3 examples/offline_inference/basic/chat.py
# Attention
# num_heads2 broken by https://github.com/flashinfer-ai/flashinfer/issues/1353
- pytest -v -s tests/kernels/attention/test_flashinfer.py -k 'not num_heads2'
- pytest -v -s tests/kernels/attention/test_flashinfer_trtllm_attention.py
- pytest -v -s tests/kernels/attention/test_cutlass_mla_decode.py
- pytest -v -s tests/kernels/attention/test_flashinfer_mla_decode.py
# Quantization
- pytest -v -s tests/kernels/quantization/test_cutlass_scaled_mm.py -k 'fp8'
- pytest -v -s tests/kernels/quantization/test_nvfp4_quant.py
- pytest -v -s tests/kernels/quantization/test_silu_mul_nvfp4_quant.py
- pytest -v -s tests/kernels/quantization/test_nvfp4_scaled_mm.py
- pytest -v -s tests/kernels/quantization/test_flashinfer_scaled_mm.py
- pytest -v -s tests/kernels/quantization/test_flashinfer_nvfp4_scaled_mm.py
- pytest -v -s tests/kernels/moe/test_nvfp4_moe.py
- pytest -v -s tests/kernels/moe/test_ocp_mx_moe.py
# Fusion
- pytest -v -s tests/compile/test_fusion_all_reduce.py
- pytest -v -s tests/compile/test_fusion_attn.py::test_attention_quant_pattern
- pytest -v -s tests/kernels/moe/test_flashinfer.py
- pytest -v -s tests/compile/test_silu_mul_quant_fusion.py
Contributor (critical):

This entire test step is labeled Blackwell Test and configured to run on a b200 GPU, which are NVIDIA's architecture and hardware. It also uses nvidia-smi and tests for NVIDIA-specific features like CUTLASS and TRTLLM. This entire block is irrelevant for an AMD backend and should be removed or replaced with AMD-equivalent tests.

- pytest -v -s ./compile/test_wrapper.py
- VLLM_TEST_SAME_HOST=1 torchrun --nproc-per-node=4 distributed/test_same_node.py | grep 'Same node test passed'
- pytest -v -s distributed/test_sequence_parallel.py
- CUDA_VISIBLE_DEVICES=0,1 pytest -v -s v1/shutdown
Contributor (critical):

The environment variable CUDA_VISIBLE_DEVICES is used here, which is specific to NVIDIA GPUs. For an AMD backend, you should use ROCR_VISIBLE_DEVICES or HIP_VISIBLE_DEVICES.

  - ROCR_VISIBLE_DEVICES=0,1 pytest -v -s v1/shutdown
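The suggested substitution can be applied mechanically across a copied config with sed. The file path and snippet below are hypothetical stand-ins, not the real file:

```shell
# Hypothetical sketch: rewrite the NVIDIA device-mask variable to its ROCm
# counterpart across a config file. Path and contents are stand-ins.
cat > /tmp/test-amd-snippet.yaml <<'EOF'
  - CUDA_VISIBLE_DEVICES=0,1 pytest -v -s v1/shutdown
EOF

# GNU sed in-place edit; every occurrence is rewritten
sed -i 's/CUDA_VISIBLE_DEVICES=/ROCR_VISIBLE_DEVICES=/g' /tmp/test-amd-snippet.yaml
cat /tmp/test-amd-snippet.yaml
```

A mechanical rewrite like this still needs a manual pass afterwards, since some occurrences (e.g. inside comments or NVIDIA-only test steps) should be deleted rather than renamed.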

Comment on lines +1204 to +1230
##### A100 test #####

- label: Distributed Tests (A100) # optional
gpu: a100
optional: true
num_gpus: 4
source_file_dependencies:
- vllm/
commands:
# NOTE: don't test llama model here, it seems hf implementation is buggy
# see https://github.com/vllm-project/vllm/pull/5689 for details
- pytest -v -s distributed/test_custom_all_reduce.py
- torchrun --nproc_per_node=2 distributed/test_ca_buffer_sharing.py
- TARGET_TEST_SUITE=A100 pytest basic_correctness/ -v -s -m 'distributed(num_gpus=2)'
- pytest -v -s -x lora/test_mixtral.py

- label: LM Eval Large Models # optional
gpu: a100
optional: true
num_gpus: 4
working_dir: "/vllm-workspace/.buildkite/lm-eval-harness"
source_file_dependencies:
- csrc/
- vllm/model_executor/layers/quantization
commands:
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- pytest -s -v test_lm_eval_correctness.py --config-list-file=configs/models-large.txt --tp-size=4
Contributor (critical):

This entire section is for testing on NVIDIA A100 GPUs. It specifies gpu: a100 and includes tests that might be NVIDIA-specific. This section should be adapted for AMD GPUs or removed if not applicable.

Comment on lines +1232 to +1240
##### H200 test #####
- label: Distributed Tests (H200) # optional
gpu: h200
optional: true
working_dir: "/vllm-workspace/"
num_gpus: 2
commands:
- pytest -v -s tests/distributed/test_context_parallel.py
- CUDA_VISIBLE_DEVICES=1,2 VLLM_ALL2ALL_BACKEND=deepep_high_throughput VLLM_USE_DEEP_GEMM=1 VLLM_LOGGING_LEVEL=DEBUG python3 examples/offline_inference/data_parallel.py --model Qwen/Qwen1.5-MoE-A2.7B --tp-size=1 --dp-size=2 --max-model-len 2048
Contributor (critical):

This test section is for NVIDIA H200 GPUs. It specifies gpu: h200 and uses CUDA_VISIBLE_DEVICES. This is incorrect for an AMD test file.

Comment on lines +1242 to +1250
##### B200 test #####
- label: Distributed Tests (B200) # optional
gpu: b200
optional: true
working_dir: "/vllm-workspace/"
num_gpus: 2
commands:
- pytest -v -s tests/distributed/test_context_parallel.py
- pytest -v -s tests/distributed/test_nccl_symm_mem_allreduce.py
Contributor (critical):

This test section is for NVIDIA B200 GPUs. It specifies gpu: b200 and includes a test for test_nccl_symm_mem_allreduce.py, which is NVIDIA-specific (NCCL). This is incorrect for an AMD test file.

@khluu khluu enabled auto-merge (squash) October 13, 2025 21:21
@github-actions github-actions bot added the ready (ONLY add when PR is ready to merge/full CI is needed) label Oct 13, 2025
@khluu khluu merged commit d3cc842 into vllm-project:main Oct 14, 2025
22 of 23 checks passed
Dhruvilbhatt pushed a commit to Dhruvilbhatt/vllm that referenced this pull request Oct 14, 2025
…d. (alternative PR) (vllm-project#26718)

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>
bbartels pushed a commit to bbartels/vllm that referenced this pull request Oct 16, 2025
…d. (alternative PR) (vllm-project#26718)

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
Signed-off-by: bbartels <benjamin@bartels.dev>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
…d. (alternative PR) (vllm-project#26718)

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
…d. (alternative PR) (vllm-project#26718)

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
…d. (alternative PR) (vllm-project#26718)

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
…d. (alternative PR) (vllm-project#26718)

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
…d. (alternative PR) (vllm-project#26718)

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>

Labels

ci/build · ready (ONLY add when PR is ready to merge/full CI is needed) · rocm (Related to AMD ROCm)
