Fix fused_scaled_matmul_reduce_scatter callsite #26506
Conversation
Signed-off-by: angelayi <[email protected]>
Code Review
This pull request correctly addresses a breaking change in the fused_scaled_matmul_reduce_scatter op signature from a newer PyTorch version by adding version-conditional logic. My main feedback is to refactor the duplicated code blocks into a shared helper function to improve maintainability.
if is_torch_equal_or_newer("2.8.0.dev"):
    # TODO: This fails in the dynamic shapes case because the shapes
    # get specialized
    output_shape = (
        torch.ops.aten.sym_size.int(input, 0),
        torch.ops.aten.sym_size.int(mat2, 1),
    )
    gemm_rs = torch.ops.symm_mem.fused_scaled_matmul_reduce_scatter(
        input,
        mat2,
        scale_a,
        scale_b,
        "avg",
        orig_scatter_dim=0,
        scatter_dim_after_maybe_reshape=0,
        output_shape=output_shape,
        out_dtype=self.dtype,
        group_name=self.tp.device_group.group_name,
    )
else:
    # For older versions, use the old signature
    gemm_rs = torch.ops.symm_mem.fused_scaled_matmul_reduce_scatter(
        input,
        mat2,
        scale_a,
        scale_b,
        "avg",
        scatter_dim=0,
        out_dtype=self.dtype,
        group_name=self.tp.device_group.group_name,
    )

return gemm_rs
This version-checking logic for fused_scaled_matmul_reduce_scatter is duplicated in CutlassScaledMMReduceScatterPattern.replacement (lines 320-353). To improve maintainability and reduce redundancy, consider extracting this logic into a shared helper function or method. This would centralize the call to the op, making future updates easier.
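For illustration, here is a minimal sketch of the kind of shared helper being suggested. The helper name is hypothetical, and the `vllm.utils` import path for `is_torch_equal_or_newer` is an assumption; treat this as a sketch rather than the actual vLLM code.

```python
import torch

from vllm.utils import is_torch_equal_or_newer  # assumed import path


def _fused_scaled_matmul_reduce_scatter(  # hypothetical helper name
    input: torch.Tensor,
    mat2: torch.Tensor,
    scale_a: torch.Tensor,
    scale_b: torch.Tensor,
    out_dtype: torch.dtype,
    group_name: str,
) -> torch.Tensor:
    if is_torch_equal_or_newer("2.8.0.dev"):
        # New signature (pytorch/pytorch#149247): explicit scatter dims plus
        # an output_shape hint.
        output_shape = (
            torch.ops.aten.sym_size.int(input, 0),
            torch.ops.aten.sym_size.int(mat2, 1),
        )
        return torch.ops.symm_mem.fused_scaled_matmul_reduce_scatter(
            input,
            mat2,
            scale_a,
            scale_b,
            "avg",
            orig_scatter_dim=0,
            scatter_dim_after_maybe_reshape=0,
            output_shape=output_shape,
            out_dtype=out_dtype,
            group_name=group_name,
        )
    # Older torch: the op takes a single scatter_dim argument.
    return torch.ops.symm_mem.fused_scaled_matmul_reduce_scatter(
        input,
        mat2,
        scale_a,
        scale_b,
        "avg",
        scatter_dim=0,
        out_dtype=out_dtype,
        group_name=group_name,
    )
```

Both `replacement` methods could then reduce to a single call such as `_fused_scaled_matmul_reduce_scatter(input, mat2, scale_a, scale_b, self.dtype, self.tp.device_group.group_name)`.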
💡 Codex Review
Here are some automated review suggestions for this pull request.
if is_torch_equal_or_newer("2.8.0.dev"):
    # TODO: This fails in the dynamic shapes case because the shapes
    # get specialized
    output_shape = (
        torch.ops.aten.sym_size.int(input, 0),
        torch.ops.aten.sym_size.int(mat2, 1),
    )
    gemm_rs = torch.ops.symm_mem.fused_scaled_matmul_reduce_scatter(
        input,
        mat2,
        scale_a,
        scale_b,
        "avg",
        orig_scatter_dim=0,
        scatter_dim_after_maybe_reshape=0,
        output_shape=output_shape,
        out_dtype=self.dtype,
Compute reduce-scatter output_shape per shard
When targeting the new fused_scaled_matmul_reduce_scatter signature, output_shape is derived directly from input and mat2 without accounting for the tensor-parallel world size. In the unfused graph this matmul output is immediately reduced along dim‑0, so each rank ultimately sees a first dimension of scaled_mm.size(0) // tp_world_size. Passing the pre‑scatter size (input.shape[0]) will request the wrong shape from the fused op and either misallocate or fail once torch 2.8 executes this branch. output_shape[0] should reflect the reduce-scatter result (divide by self.tp_size).
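A minimal sketch of the suggested per-shard shape computation, written as a standalone helper; the helper name and the explicit `tp_size` parameter are assumptions for illustration, not code from this PR.

```python
import torch


def _reduce_scatter_output_shape(  # hypothetical helper name
    input: torch.Tensor, mat2: torch.Tensor, tp_size: int
) -> tuple[int, int]:
    # The fused op returns the per-rank shard, so the scatter dimension
    # (dim 0 here) must be divided by the tensor-parallel world size.
    return (
        torch.ops.aten.sym_size.int(input, 0) // tp_size,
        torch.ops.aten.sym_size.int(mat2, 1),
    )
```

The same per-shard shape would apply to the cutlass variant discussed below, since both patterns consume the fused op's output at the post-scatter size.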
if is_torch_equal_or_newer("2.8.0.dev"):
    # TODO: This fails in the dynamic shapes case because the shapes
    # get specialized
    output_shape = (
        torch.ops.aten.sym_size.int(input, 0),
        torch.ops.aten.sym_size.int(mat2, 1),
    )
    gemm_rs = torch.ops.symm_mem.fused_scaled_matmul_reduce_scatter(
        input,
        mat2,
        scale_a,
        scale_b,
        "avg",
        orig_scatter_dim=0,
        scatter_dim_after_maybe_reshape=0,
        output_shape=output_shape,
        out_dtype=self.dtype,
Cutlass fused path also uses full matmul size for output_shape
The cutlass variant builds output_shape with the original matmul dimensions even though the fused operator returns the reduce-scatter shard. On a torch version ≥2.8 this means output_shape[0] stays the full input length instead of the per‑rank length (input.shape[0] // tp_world_size), leading to shape mismatches or execution failures when the new signature is exercised. The output shape passed to the op must match the size after scattering, not before.
Closing as #26038 already fixes it.
Purpose
The signature of `symm_mem::fused_scaled_matmul_reduce_scatter` was changed in pytorch/pytorch#149247 to additionally take the arguments `output_shape`, `orig_scatter_dim`, and `scatter_dim_after_maybe_reshape`. vLLM's async_tp pass has a replacement graph containing this op but uses the old signature (https://github.com/vllm-project/vllm/blob/main/vllm/compilation/collective_fusion.py#L172), causing an error. Although there is a test case, it looks like it is not actually run on CI?
Test Plan
`pytest tests/compile/test_async_tp.py -k test_async_tp_pass_replace` passes locally.

cc @ProExpertProg @cascade812