Adding the test-amd.yaml for test definitions for the AMD backend. (alternative PR) #26718
Conversation
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
Code Review
This pull request adds a new test definition file for the AMD backend. However, the file appears to be a direct copy from a configuration for an NVIDIA backend. It contains numerous references to NVIDIA-specific technologies, commands, and GPU architectures such as CUDA, CUTLASS, NCCL, nvidia-smi, A100, H200, and Blackwell. These are incorrect for an AMD environment and will likely cause test failures or incorrect test execution. The file needs a thorough review to replace all NVIDIA-specific elements with their AMD equivalents (e.g., using rocm-smi instead of nvidia-smi, ROCR_VISIBLE_DEVICES instead of CUDA_VISIBLE_DEVICES, and targeting AMD GPUs).
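To make the suggested substitutions concrete, here is a sketch of how one basic step might look after AMD adaptation. This is illustrative only: the label and exact command layout are assumptions, though `rocm-smi` and `ROCR_VISIBLE_DEVICES` are standard ROCm tooling and the `mi325_8` agent pool appears elsewhere in this diff.

```yaml
# Illustrative sketch, not content from the PR: an NVIDIA-style step
# rewritten with AMD equivalents.
- label: Basic Correctness Test
  agent_pool: mi325_8                  # AMD MI325 pool referenced later in this file
  commands:
  - rocm-smi                           # AMD equivalent of nvidia-smi
  - export ROCR_VISIBLE_DEVICES=0,1    # replaces CUDA_VISIBLE_DEVICES
  - pytest -v -s basic_correctness/test_cumem.py
```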
```yaml
  - tests/basic_correctness/test_cumem.py
  commands:
  - export VLLM_WORKER_MULTIPROC_METHOD=spawn
  - pytest -v -s basic_correctness/test_cumem.py
  - pytest -v -s v1/engine/test_engine_core_client.py::test_kv_cache_events_dp
  - pytest -v -s distributed/test_utils.py
  - pytest -v -s compile/test_basic_correctness.py
  - pytest -v -s distributed/test_pynccl.py
  - python3 offline_inference/spec_decode.py --test --method eagle --num_spec_tokens 3 --dataset-name hf --dataset-path philschmid/mt-bench --num-prompts 80 --temp 0 --top-p 1.0 --top-k -1 --tp 1 --enable-chunked-prefill --max-model-len 2048
  - python3 offline_inference/spec_decode.py --test --method eagle3 --num_spec_tokens 3 --dataset-name hf --dataset-path philschmid/mt-bench --num-prompts 80 --temp 0 --top-p 1.0 --top-k -1 --tp 1 --enable-chunked-prefill --max-model-len 2048
```
```yaml
- label: Platform Tests (CUDA) # 4min
  agent_pool: mi325_8
  # grade: Blocking
  source_file_dependencies:
  - csrc/quantization/cutlass_w8a8/moe/
  # since torchao nightly is only compatible with torch nightly currently
  # https://github.com/pytorch/ao/issues/2919, we'll have to skip new torchao tests for now
  # we can only upgrade after this is resolved
  - pip install --pre torchao==0.13.0.dev20250814 --index-url https://download.pytorch.org/whl/nightly/cu128
```
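If this step is kept for AMD, the `cu128` wheel index would also need to change. A hypothetical ROCm variant is sketched below; the exact nightly index path and whether a matching torchao dev wheel is published for ROCm are assumptions, not verified facts.

```yaml
  # Hypothetical AMD counterpart of the torchao install step.
  # The rocm6.2 index path and nightly wheel availability are assumptions.
  - pip install --pre torchao --index-url https://download.pytorch.org/whl/nightly/rocm6.2
```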
```yaml
- label: Blackwell Test # 38 min
  timeout_in_minutes: 60
  working_dir: "/vllm-workspace/"
  gpu: b200
  # optional: true
  source_file_dependencies:
  - csrc/quantization/fp4/
  - csrc/attention/mla/
  - csrc/quantization/cutlass_w8a8/moe/
  - vllm/model_executor/layers/fused_moe/cutlass_moe.py
  - vllm/model_executor/layers/fused_moe/flashinfer_cutlass_moe.py
  - vllm/model_executor/layers/fused_moe/flashinfer_cutlass_prepare_finalize.py
  - vllm/model_executor/layers/quantization/utils/flashinfer_utils.py
  - vllm/v1/attention/backends/flashinfer.py
  - vllm/compilation/fusion.py
  - vllm/compilation/fusion_attn.py
  commands:
  - nvidia-smi
  - python3 examples/offline_inference/basic/chat.py
  # Attention
  # num_heads2 broken by https://github.com/flashinfer-ai/flashinfer/issues/1353
  - pytest -v -s tests/kernels/attention/test_flashinfer.py -k 'not num_heads2'
  - pytest -v -s tests/kernels/attention/test_flashinfer_trtllm_attention.py
  - pytest -v -s tests/kernels/attention/test_cutlass_mla_decode.py
  - pytest -v -s tests/kernels/attention/test_flashinfer_mla_decode.py
  # Quantization
  - pytest -v -s tests/kernels/quantization/test_cutlass_scaled_mm.py -k 'fp8'
  - pytest -v -s tests/kernels/quantization/test_nvfp4_quant.py
  - pytest -v -s tests/kernels/quantization/test_silu_mul_nvfp4_quant.py
  - pytest -v -s tests/kernels/quantization/test_nvfp4_scaled_mm.py
  - pytest -v -s tests/kernels/quantization/test_flashinfer_scaled_mm.py
  - pytest -v -s tests/kernels/quantization/test_flashinfer_nvfp4_scaled_mm.py
  - pytest -v -s tests/kernels/moe/test_nvfp4_moe.py
  - pytest -v -s tests/kernels/moe/test_ocp_mx_moe.py
  # Fusion
  - pytest -v -s tests/compile/test_fusion_all_reduce.py
  - pytest -v -s tests/compile/test_fusion_attn.py::test_attention_quant_pattern
  - pytest -v -s tests/kernels/moe/test_flashinfer.py
  - pytest -v -s tests/compile/test_silu_mul_quant_fusion.py
```
This entire test step is labeled Blackwell Test and configured to run on a b200 GPU, both of which are NVIDIA architecture and hardware. It also invokes nvidia-smi and exercises NVIDIA-specific features such as CUTLASS and TRTLLM. The whole block is irrelevant for an AMD backend and should be removed or replaced with AMD-equivalent tests.
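If an AMD counterpart were wanted rather than outright deletion, it might target an MI-series accelerator and drop the CUTLASS/TRTLLM/FlashInfer-specific suites. A rough sketch follows; the label, pool name, and which of these kernels actually have ROCm ports are assumptions.

```yaml
# Hypothetical AMD replacement for the Blackwell step (names are assumptions):
- label: MI325 Test
  timeout_in_minutes: 60
  working_dir: "/vllm-workspace/"
  agent_pool: mi325_8
  commands:
  - rocm-smi                     # instead of nvidia-smi
  - python3 examples/offline_inference/basic/chat.py
  # NVIDIA-only suites (CUTLASS, TRTLLM, FlashInfer, nvfp4) omitted;
  # keep only tests with ROCm kernel support.
```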
```yaml
  - pytest -v -s ./compile/test_wrapper.py
  - VLLM_TEST_SAME_HOST=1 torchrun --nproc-per-node=4 distributed/test_same_node.py | grep 'Same node test passed'
  - pytest -v -s distributed/test_sequence_parallel.py
  - CUDA_VISIBLE_DEVICES=0,1 pytest -v -s v1/shutdown
```
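For reference, the same shutdown step with the device mask translated to ROCm might read as follows. This is a sketch: `ROCR_VISIBLE_DEVICES` is the ROCm runtime's device mask, and `HIP_VISIBLE_DEVICES` exists at the HIP layer as an alternative.

```yaml
  # Sketch: ROCm device masking instead of CUDA_VISIBLE_DEVICES
  - ROCR_VISIBLE_DEVICES=0,1 pytest -v -s v1/shutdown
```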
```yaml
##### A100 test #####

- label: Distributed Tests (A100) # optional
  gpu: a100
  optional: true
  num_gpus: 4
  source_file_dependencies:
  - vllm/
  commands:
  # NOTE: don't test llama model here, it seems hf implementation is buggy
  # see https://github.com/vllm-project/vllm/pull/5689 for details
  - pytest -v -s distributed/test_custom_all_reduce.py
  - torchrun --nproc_per_node=2 distributed/test_ca_buffer_sharing.py
  - TARGET_TEST_SUITE=A100 pytest basic_correctness/ -v -s -m 'distributed(num_gpus=2)'
  - pytest -v -s -x lora/test_mixtral.py
```
```yaml
- label: LM Eval Large Models # optional
  gpu: a100
  optional: true
  num_gpus: 4
  working_dir: "/vllm-workspace/.buildkite/lm-eval-harness"
  source_file_dependencies:
  - csrc/
  - vllm/model_executor/layers/quantization
  commands:
  - export VLLM_WORKER_MULTIPROC_METHOD=spawn
  - pytest -s -v test_lm_eval_correctness.py --config-list-file=configs/models-large.txt --tp-size=4
```
```yaml
##### H200 test #####
- label: Distributed Tests (H200) # optional
  gpu: h200
  optional: true
  working_dir: "/vllm-workspace/"
  num_gpus: 2
  commands:
  - pytest -v -s tests/distributed/test_context_parallel.py
  - CUDA_VISIBLE_DEVICES=1,2 VLLM_ALL2ALL_BACKEND=deepep_high_throughput VLLM_USE_DEEP_GEMM=1 VLLM_LOGGING_LEVEL=DEBUG python3 examples/offline_inference/data_parallel.py --model Qwen/Qwen1.5-MoE-A2.7B --tp-size=1 --dp-size=2 --max-model-len 2048
```
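Swapping the device mask alone would not make this H200 step AMD-ready: DeepEP and DeepGEMM are, as far as I know, CUDA-only stacks, so those backend flags would have to be dropped or replaced with whatever all2all backend vLLM supports on ROCm. A minimal sketch keeping only the portable pieces (backend availability on ROCm is an open question):

```yaml
  # Sketch: data-parallel command with NVIDIA-only backend flags removed
  - ROCR_VISIBLE_DEVICES=1,2 VLLM_LOGGING_LEVEL=DEBUG python3 examples/offline_inference/data_parallel.py --model Qwen/Qwen1.5-MoE-A2.7B --tp-size=1 --dp-size=2 --max-model-len 2048
```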
```yaml
##### B200 test #####
- label: Distributed Tests (B200) # optional
  gpu: b200
  optional: true
  working_dir: "/vllm-workspace/"
  num_gpus: 2
  commands:
  - pytest -v -s tests/distributed/test_context_parallel.py
  - pytest -v -s tests/distributed/test_nccl_symm_mem_allreduce.py
```