[RFC][ROCm][AITER] Keep all AITER kernels in _aiter_ops class like _custom_ops and _ipex_ops
#24490
Conversation
This pull request has merge conflicts that must be resolved before it can be merged.
@ProExpertProg @SageMoore Thank you for reviewing this PR so far. I've resolved the merge conflicts and it's ready for another review round. I appreciate your comments and, if possible, your merge approval.
@tjtanaa @vllmellm It seems this PR introduced mandatory imports of rocm.py, which now occur by default on NVIDIA devices. Can we make these lazy imports? I opened #28428 for fp8_utils.py, but then saw that this applies to more files.
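For context, the lazy-import pattern being requested typically looks like the sketch below; the module path and helper name are illustrative assumptions, not the actual code touched by this PR:

```python
# Minimal sketch of the lazy-import pattern; the import target and helper
# name are hypothetical, not the exact vLLM code paths in question.

def fp8_linear_for_rocm(x, weight):
    # Import moved from module scope into the function body: the ROCm-specific
    # module is only imported when this code path actually executes, so
    # importing vLLM on an NVIDIA device no longer loads it at import time.
    from vllm.model_executor.layers.quantization.utils import fp8_utils  # assumed path
    return fp8_utils.apply(x, weight)  # hypothetical helper
```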
Purpose
This PR introduces `_aiter_ops.py` as proposed in the RFC here. The `aiter_ops` namespace provides several key benefits (a sketch of the pattern follows the list):

- Centralized kernel registration: ensures that kernels from the aiter package are properly registered.
- Environment availability checks: encapsulates aiter support detection and environment compatibility validation.
- Reduced code duplication: removes the duplicated helper functions that checked device compatibility and environment-variable enablement across different vLLM modules.
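A minimal sketch of what such a registry class can look like; the method names, the `aiter` call, and the cached-check pattern below are assumptions for illustration, not the exact contents of `_aiter_ops.py`:

```python
# Illustrative sketch only: names and signatures are assumptions,
# not the actual implementation in this PR.
from functools import cache

import torch

import vllm.envs as envs
from vllm.platforms import current_platform


class _aiter_ops:
    @staticmethod
    @cache
    def is_enabled() -> bool:
        # One shared check (ROCm device support plus the env-var opt-in)
        # replaces per-module copies of this logic.
        return current_platform.is_rocm() and envs.VLLM_ROCM_USE_AITER

    @staticmethod
    def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float) -> torch.Tensor:
        # aiter is imported lazily so platforms without the package
        # can still import this module.
        import aiter  # hypothetical call into the aiter package
        return aiter.rms_norm(x, weight, eps)
```

Caching the availability check means repeated hot-path calls pay for the platform and env-var inspection only once.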
This implementation establishes the foundation for future refactoring efforts, where existing kernels throughout the vLLM repository will be migrated to use this unified approach for better maintainability and consistency.
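Continuing the sketch above, a migrated call site could hypothetically shrink to a single registry query:

```python
# Hypothetical call site after migration: the layer consults the registry
# instead of duplicating platform/env-var checks locally.
if _aiter_ops.is_enabled():
    out = _aiter_ops.rms_norm(x, weight, eps)
else:
    out = native_rms_norm(x, weight, eps)  # illustrative fallback path
```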
This PR uses commit 5ee37dce from the `aiter` repo.

Test Plan
Test the models affected by this change, using lm_eval on the GSM8K dataset.
Environment setting
Step 1: run vllm serve
```bash
VLLM_USE_V1=1 \
VLLM_ROCM_USE_AITER=1 \
SAFETENSORS_FAST_GPU=1 \
VLLM_DISABLE_COMPILE_CACHE=1 \
vllm serve $MODEL_NAME \
  --compilation-config '{"cudagraph_mode": "FULL_AND_PIECEWISE", "cudagraph_capture_sizes": [1,2,4,8,16,24,32]}' \
  --trust-remote-code --swap-space 16 --distributed-executor-backend mp
```

Step 2: run lm_eval
```bash
lm_eval --model local-completions --tasks gsm8k \
  --model_args model=$MODEL_NAME,base_url=http://localhost:8000/v1/completions \
  --trust_remote_code --num_fewshot 5 --batch_size 256
```

Test Results
- deepseek-ai/DeepSeek-V3 -tp 8 --block-size 1 --max-model-len 32768 --max_seq_len_to_capture 32768
- meta-llama/Llama-4-Scout-17B-16E-Instruct -tp 8 --max-model-len 8192
- meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 -tp 8 --max-model-len 8192
- Qwen/Qwen3-235B-A22B-FP8 -tp 4
- mistralai/Mixtral-8x7B-Instruct-v0.1 -tp 2
- mistralai/Mixtral-8x7B-Instruct-v0.1 -tp 2 --quantization fp8
- meta-llama/Meta-Llama-3.3-70B-Instruct -tp 2
- meta-llama/Meta-Llama-3.3-70B-Instruct -tp 2 --quantization fp8
- amd/Llama-3.3-70B-Instruct-FP8-KV -tp 2