
[ROCm] Pass moe_buf to AITER to eliminate MoE output copy #39393

Closed
tpopp wants to merge 1 commit into vllm-project:main from tpopp:moe-buf-passthrough

tpopp (Contributor) commented Apr 9, 2026

Summary

  • Plumb moe_buf and moe_sorting_dispatch_policy through the vLLM AITER fused MoE interface so the kernel writes directly into the caller's output buffer, avoiding a device-to-device copy on every forward pass
  • Add VLLM_ROCM_AITER_MOE_DISPATCH_POLICY environment variable to control AITER MoE sorting dispatch policy
  • Depends on the AITER-side change "Allow callers to pass pre-allocated moe_buf to avoid output copy" (ROCm/aiter#2663)
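
The buffer passthrough in the first bullet can be illustrated with a minimal, framework-free sketch. Everything here is hypothetical: `fused_moe_stub`, `forward_with_copy`, and `forward_passthrough` are illustrative names, plain Python lists stand in for device tensors, and the doubling loop is a placeholder for the expert computation. It does not reproduce the AITER or vLLM signatures; it only shows why an optional caller-provided output buffer removes one copy per call.

```python
from typing import List, Optional


def fused_moe_stub(hidden: List[float],
                   moe_buf: Optional[List[float]] = None) -> List[float]:
    """Hypothetical stand-in for a fused MoE kernel call (not the AITER API)."""
    # If the caller supplies no buffer, the kernel allocates its own output.
    out = moe_buf if moe_buf is not None else [0.0] * len(hidden)
    if len(out) != len(hidden):
        raise ValueError("moe_buf must match the expected output shape")
    for i, value in enumerate(hidden):
        out[i] = 2.0 * value  # placeholder for the expert computation
    return out


def forward_with_copy(hidden: List[float], output: List[float]) -> List[float]:
    # Pattern being replaced: the kernel allocates its own result, which is
    # then copied into the caller's buffer.
    result = fused_moe_stub(hidden)
    output[:] = result  # the extra copy on every forward pass
    return output


def forward_passthrough(hidden: List[float], output: List[float]) -> List[float]:
    # Pattern described in this PR: the kernel writes into the caller's
    # buffer directly, so no separate copy is needed.
    return fused_moe_stub(hidden, moe_buf=output)
```

Both helpers leave the same values in `output`; the passthrough version simply returns the caller's own buffer object instead of filling it from a temporary.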

Test plan

  • Verify ROCm MoE models (e.g. Mixtral, Qwen3-Next) produce correct output
  • Benchmark to confirm reduced latency from eliminated copy
  • Verify non-ROCm paths are unaffected (new params default to no-op values)
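
The last check depends on the new parameters falling back to no-op values when nothing is configured. As a rough sketch of that idea for the environment variable, the helper below reads `VLLM_ROCM_AITER_MOE_DISPATCH_POLICY` with a fallback. The function name `read_dispatch_policy`, the integer type, and the default of `0` are assumptions made for illustration; the PR text does not state the variable's type or default.

```python
import os
from typing import Mapping, Optional

ENV_NAME = "VLLM_ROCM_AITER_MOE_DISPATCH_POLICY"


def read_dispatch_policy(environ: Optional[Mapping[str, str]] = None,
                         default: int = 0) -> int:
    """Hypothetical reader; integer type and default 0 are assumptions."""
    env = os.environ if environ is None else environ
    raw = env.get(ENV_NAME)
    # An unset or empty variable falls back to the assumed no-op default.
    if raw is None or raw.strip() == "":
        return default
    try:
        return int(raw)
    except ValueError as exc:
        raise ValueError(f"{ENV_NAME} must be an integer, got {raw!r}") from exc
```

Passing a plain dictionary as `environ` keeps the helper testable without touching the real process environment.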

Made with Cursor

Plumb `moe_buf` and `moe_sorting_dispatch_policy` through the
vLLM AITER fused MoE interface so the kernel writes directly
into the caller's output buffer. This avoids a device-to-device
copy of the full MoE output on every forward pass.

Also adds the `VLLM_ROCM_AITER_MOE_DISPATCH_POLICY` environment
variable to control the AITER MoE sorting dispatch policy.

Made-with: Cursor

Signed-off-by: Tres Popp <tres.popp@amd.com>
tpopp requested a review from tjtanaa as a code owner Apr 9, 2026 07:17
mergify bot added the rocm (Related to AMD ROCm) label Apr 9, 2026
github-project-automation bot moved this to Todo in AMD Apr 9, 2026
tpopp closed this Apr 9, 2026
github-project-automation bot moved this from Todo to Done in AMD Apr 9, 2026
gemini-code-assist (Contributor) commented

Warning

Gemini is experiencing higher than usual traffic and was unable to create the review. Please try again in a few hours by commenting /gemini review.


Labels

rocm Related to AMD ROCm

Projects

Status: Done


1 participant