
Conversation

@vllmellm
Contributor

@vllmellm vllmellm commented Sep 9, 2025

Purpose

This PR introduces _aiter_ops.py as proposed in the RFC here. The aiter_ops namespace provides several key benefits:

  • Centralized kernel registration: Ensures that kernels from the aiter package are properly registered

  • Environment availability checks: Encapsulates aiter support detection and environment compatibility validation

  • Reduced code duplication: Eliminates the need for duplicate helper functions, namely the device compatibility checks and environment variable checks repeated across different vLLM modules.
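
As a rough illustration of the last two points, the sketch below shows one way a single namespace could own the availability check instead of each module re-implementing it. The names `env_flag` and `AiterOpsSketch`, the method names, and the accepted truthy values are hypothetical and are not taken from `_aiter_ops.py`; only the `VLLM_ROCM_USE_AITER` variable and the `aiter` package name come from this PR.

```python
import importlib.util
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean flag from the environment (assumed: "1" or "true" enables it)."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("1", "true")


class AiterOpsSketch:
    """Hypothetical single home for aiter availability checks."""

    @staticmethod
    def package_available() -> bool:
        # find_spec returns None when the top-level package is not installed.
        return importlib.util.find_spec("aiter") is not None

    @classmethod
    def is_enabled(cls, on_rocm: bool) -> bool:
        # All three conditions must hold: ROCm device, opt-in flag, package present.
        return (
            on_rocm
            and env_flag("VLLM_ROCM_USE_AITER")
            and cls.package_available()
        )
```

Call sites would then ask `AiterOpsSketch.is_enabled(on_rocm=...)` rather than reading the environment variable and probing the device themselves.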

This implementation establishes the foundation for future refactoring efforts, where existing kernels throughout the vLLM repository will be migrated to use this unified approach for better maintainability and consistency.

This PR uses commit 5ee37dce from the aiter repo.

Test Plan

Test the models that are affected by this change using lm_eval on the gsm8k dataset.

environment setting

Step 1: run vllm serve

VLLM_USE_V1=1 \
VLLM_ROCM_USE_AITER=1 \
SAFETENSORS_FAST_GPU=1 \
VLLM_DISABLE_COMPILE_CACHE=1 \
vllm serve $MODEL_NAME --compilation-config '{"cudagraph_mode": "FULL_AND_PIECEWISE", "cudagraph_capture_sizes": [1,2,4,8,16,24,32]}' --trust-remote-code --swap-space 16 --distributed-executor-backend mp

Step 2: run lm_eval

lm_eval --model local-completions --tasks gsm8k --model_args model=$MODEL_NAME,base_url=http://localhost:8000/v1/completions --trust_remote_code --num_fewshot 5 --batch_size 256

Test Results

deepseek-ai/DeepSeek-V3 -tp 8 --block-size 1 --max-model-len 32768 --max_seq_len_to_capture 32768

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.9500 | ± 0.006 |
| | | strict-match | 5 | exact_match | 0.9507 | ± 0.006 |

meta-llama/Llama-4-Scout-17B-16E-Instruct -tp 8 --max-model-len 8192

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.9174 | ± 0.0076 |
| | | strict-match | 5 | exact_match | 0.8999 | ± 0.0083 |

meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 -tp 8 --max-model-len 8192

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.9272 | ± 0.0072 |
| | | strict-match | 5 | exact_match | 0.9287 | ± 0.0071 |

Qwen/Qwen3-235B-A22B-FP8 -tp 4

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.8704 | ± 0.0093 |
| | | strict-match | 5 | exact_match | 0.8635 | ± 0.0095 |

mistralai/Mixtral-8x7B-Instruct-v0.1 -tp 2

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.6452 | ± 0.0132 |
| | | strict-match | 5 | exact_match | 0.6422 | ± 0.0132 |

mistralai/Mixtral-8x7B-Instruct-v0.1 -tp 2 --quantization fp8

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.6058 | ± 0.0135 |
| | | strict-match | 5 | exact_match | 0.6027 | ± 0.0135 |

meta-llama/Meta-Llama-3.3-70B-Instruct -tp 2

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.9386 | ± 0.0066 |
| | | strict-match | 5 | exact_match | 0.9128 | ± 0.0078 |

meta-llama/Meta-Llama-3.3-70B-Instruct -tp 2 --quantization fp8

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.9424 | ± 0.0064 |
| | | strict-match | 5 | exact_match | 0.9128 | ± 0.0078 |

amd/Llama-3.3-70B-Instruct-FP8-KV -tp 2

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.9378 | ± 0.0067 |
| | | strict-match | 5 | exact_match | 0.9030 | ± 0.0082 |

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added the rocm (Related to AMD ROCm) and v1 labels Sep 9, 2025
Signed-off-by: vllmellm <[email protected]>
@mergify

mergify bot commented Sep 10, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @vllmellm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 10, 2025
@mergify mergify bot removed the needs-rebase label Sep 12, 2025
@vllmellm vllmellm marked this pull request as ready for review September 12, 2025 21:10
@vllmellm
Contributor Author

vllmellm commented Nov 3, 2025

@ProExpertProg @SageMoore Thank you for reviewing this PR so far. I have resolved the merge conflicts, and the PR is ready for another review round. I would appreciate your comments and, if possible, approval to merge.

@mergify

mergify bot commented Nov 5, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @vllmellm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 5, 2025
@mergify mergify bot removed the needs-rebase label Nov 5, 2025
@mergify

mergify bot commented Nov 7, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @vllmellm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 7, 2025
@mergify mergify bot removed the needs-rebase label Nov 7, 2025
@DarkLight1337 DarkLight1337 added the ready label (ONLY add when PR is ready to merge/full CI is needed) Nov 7, 2025
@tjtanaa tjtanaa enabled auto-merge (squash) November 9, 2025 16:36
@tjtanaa tjtanaa merged commit f080a83 into vllm-project:main Nov 10, 2025
67 checks passed
@mgoin
Member

mgoin commented Nov 10, 2025

@tjtanaa @vllmellm it seems this PR introduced mandatory imports of rocm.py, showing

WARNING 11-10 22:31:06 [rocm.py:39] Failed to import from amdsmi with ModuleNotFoundError("No module named 'amdsmi'")
WARNING 11-10 22:31:06 [rocm.py:50] Failed to import from vllm._rocm_C with ModuleNotFoundError("No module named 'vllm._rocm_C'")

by default on NVIDIA devices. Can we make these lazy imports? I opened #28428 for fp8_utils.py but then saw that this applies to more files.
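
For context, one common way to make such imports lazy is to defer them into the function that needs them, so that nothing is imported (and no warning is logged) until a ROCm code path actually runs. The sketch below is generic and only illustrates the pattern: `load_optional` and `rocm_ops` are hypothetical names, and this is not the change made in #28428.

```python
import importlib
from functools import lru_cache
from types import ModuleType
from typing import Optional


@lru_cache(maxsize=None)
def load_optional(module_name: str) -> Optional[ModuleType]:
    """Import a module on first use; return None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so it is covered here.
        return None


def rocm_ops(on_rocm: bool) -> Optional[ModuleType]:
    # On non-ROCm platforms, return early so the import is never attempted.
    if not on_rocm:
        return None
    return load_optional("vllm._rocm_C")
```

With this shape, importing the module that defines `rocm_ops` has no side effects on NVIDIA devices; the ROCm extension is only looked up when `on_rocm` is true.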

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Nov 13, 2025
…`_custom_ops` and `_ipex_ops` (vllm-project#24490)

Signed-off-by: vllmellm <[email protected]>
Co-authored-by: Luka Govedič <[email protected]>
Signed-off-by: xuebwang-amd <[email protected]>
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
…`_custom_ops` and `_ipex_ops` (vllm-project#24490)

Signed-off-by: vllmellm <[email protected]>
Co-authored-by: Luka Govedič <[email protected]>

Labels

  • deepseek: Related to DeepSeek models
  • documentation: Improvements or additions to documentation
  • ready: ONLY add when PR is ready to merge/full CI is needed
  • rocm: Related to AMD ROCm
  • v1

6 participants