[355_wip] triton fusion optimized fused_shared_experts#741

Merged
dllehr-amd merged 3 commits into 355_wip from shaoclee/355_wip_ds_fusion_1010 on Oct 22, 2025

@k50112113 k50112113 commented Oct 17, 2025

This PR includes:

  1. Eliminate the cast before the topk kernel for DSV3 (requires no AITER changes).
  2. Fused shared experts: add fused_gemm_a8w8_blockscale_a16w16 and fused_reduce_act_mul_fp8_group_quant from AITER for DSV3. The feature is controlled by the VLLM_ROCM_USE_AITER_TRITON_FUSED_SHARED_EXPERTS environment variable, which defaults to True.

AITER side:
PR against 355_wip: ROCm/aiter#1217
PR against 355_wip_triton: ROCm/aiter#1218
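
For readers unfamiliar with default-on feature flags, here is a minimal sketch of how a boolean environment variable such as VLLM_ROCM_USE_AITER_TRITON_FUSED_SHARED_EXPERTS could be read with a default of True. The helper names `env_flag` and `use_fused_shared_experts` and the accepted truthy strings are assumptions made for illustration; they are not taken from this PR's diff.

```python
import os

# Name of the flag as given in the PR description.
ENV_NAME = "VLLM_ROCM_USE_AITER_TRITON_FUSED_SHARED_EXPERTS"


def env_flag(name: str, default: bool = True) -> bool:
    """Hypothetical helper: interpret an environment variable as a boolean.

    Returns `default` when the variable is unset; otherwise treats
    "1" and "true" (case-insensitive) as True and anything else as False.
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true")


def use_fused_shared_experts() -> bool:
    """Hypothetical gate: True unless the flag is explicitly turned off."""
    return env_flag(ENV_NAME, default=True)
```

Under these assumptions, leaving the variable unset keeps the fused path enabled, while exporting it as 0 before launch would disable it.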

add VLLM_ROCM_USE_AITER_TRITON_FUSED_SHARED_EXPERTS
Collaborator

@dllehr-amd dllehr-amd left a comment

Approved! Thanks @k50112113

@dllehr-amd dllehr-amd merged commit 2e2bb33 into 355_wip Oct 22, 2025
0 of 2 checks passed
@gshtras gshtras deleted the shaoclee/355_wip_ds_fusion_1010 branch January 16, 2026 15:34
2 participants