
[Bugfix] Fix illegal memory access #12758

Merged
Fridge003 merged 1 commit into sgl-project:main from elvischenv:elvischenv/fix-illegal-memory-access on Nov 7, 2025


@elvischenv elvischenv commented Nov 6, 2025

Motivation

Fixes the illegal memory access issue reported in #12695 and flashinfer-ai/flashinfer#2034.
The issue was caused by #12524.



@elvischenv elvischenv force-pushed the elvischenv/fix-illegal-memory-access branch from dbfc6c9 to 12ee626 on November 6, 2025 10:31
@elvischenv elvischenv force-pushed the elvischenv/fix-illegal-memory-access branch from 12ee626 to a6936cb on November 6, 2025 15:53
@@ -128,7 +128,7 @@ def flashinfer_allreduce_residual_rmsnorm(
     residual: torch.Tensor,
     weight: torch.Tensor,
     eps: float = 1e-6,
-    max_token_num: int = 2048,
+    max_token_num: int = 16384,
Contributor Author

Workaround (WAR) for the hang issue: increase max_token_num to allocate a larger workspace.

Collaborator

Should we add an assert here: assert input_tensor.shape[0] <= max_token_num?

Collaborator

That should already be covered by the code below:

    if input_tensor.shape[0] > max_token_num:
        logger.debug(
            "Input token(%d) is greater than max_token_num(%d), "
            "falling back to standard implementation",
            input_tensor.shape[0],
            max_token_num,
        )
        return None, None
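The guard quoted above can be sketched in isolation to show why a fallback return, rather than an assert, keeps oversized batches working. This is a minimal illustration and not the sglang implementation: `fused_allreduce_rmsnorm_stub` and `forward` are hypothetical names, plain Python lists stand in for tensors, and the "fused" and "standard" paths are placeholders.

```python
import logging
from typing import List, Optional, Tuple

logger = logging.getLogger(__name__)

Rows = List[List[float]]  # stand-in for a [num_tokens, hidden_dim] tensor


def fused_allreduce_rmsnorm_stub(
    rows: Rows,
    max_token_num: int = 16384,
) -> Tuple[Optional[Rows], Optional[Rows]]:
    """Hypothetical stand-in for a fused op backed by a fixed-size workspace."""
    num_tokens = len(rows)
    if num_tokens > max_token_num:
        # The workspace is assumed to be sized for max_token_num tokens, so a
        # larger batch must not reach the fused path; signal a fallback instead.
        logger.debug(
            "Input token(%d) is greater than max_token_num(%d), "
            "falling back to standard implementation",
            num_tokens,
            max_token_num,
        )
        return None, None
    # Placeholder for the fused computation: just copy the input.
    return [list(r) for r in rows], [list(r) for r in rows]


def forward(rows: Rows, max_token_num: int = 16384) -> str:
    """Caller side: take the standard path when the fused op declines."""
    out, residual = fused_allreduce_rmsnorm_stub(rows, max_token_num)
    if out is None:
        return "standard"  # placeholder for the unfused all-reduce + RMSNorm
    return "fused"
```

With this shape, an assert on the token count would turn a recoverable case into a crash, whereas the `(None, None)` return lets the caller fall back whenever the batch exceeds the workspace limit.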



elvischenv commented Nov 7, 2025

> Failed at GPTOSS CI test, please have a look
> https://github.com/sgl-project/sglang/actions/runs/19149049114/job/54736119046?pr=12758

This failure is in DeepseekV2:

  File "/actions-runner/_work/sglang/sglang/python/sglang/srt/models/deepseek_v2.py", line 793, in forward_normal_dual_stream
    final_hidden_states += shared_output
RuntimeError: The size of tensor a (3584) must match the size of tensor b (7168) at non-singleton dimension 1

This failure also seems to be related to #12524. cc @merrymercy


@elvischenv elvischenv force-pushed the elvischenv/fix-illegal-memory-access branch from 7683443 to 3e35e3a on November 7, 2025 01:43
@elvischenv elvischenv force-pushed the elvischenv/fix-illegal-memory-access branch from 3e35e3a to ddff77b on November 7, 2025 02:48
work around hanging issue of trtllm_allreduce_fusion

fix correctly
@elvischenv elvischenv force-pushed the elvischenv/fix-illegal-memory-access branch from ddff77b to b008fbe on November 7, 2025 04:04

@Fridge003 Fridge003 merged commit 1fa788e into sgl-project:main Nov 7, 2025
109 of 120 checks passed
@elvischenv elvischenv deleted the elvischenv/fix-illegal-memory-access branch November 7, 2025 06:26