
Breakable Cuda Graph Support for bs > 1 #24662

Merged
ispobock merged 4 commits into sgl-project:main from Oasis-Git:bcg-fix on May 11, 2026


@Oasis-Git
Collaborator

Motivation

Modifications

Accuracy Tests

Speed Tests and Profiling

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

Oasis-Git and others added 2 commits May 8, 2026 04:49
Capture only the inner transformer stack (layer_model) instead of the
outer *ForCausalLM.forward. The logits_processor / pooler now runs eagerly
after replay with the live forward_batch, so captured segments are
bs-invariant: their kernel launches depend only on num_tokens, not on
batch_size. Drops the bs>1 reject in can_run (the former bs=1-only
restriction), so multi-req prefill stays on graph instead of falling
back to eager.
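
To make the bs-invariance argument concrete, here is a minimal sketch, not the PR's code. `Batch`, `CAPTURED_NUM_TOKENS`, and `pick_bucket` are hypothetical names, and the idea that graphs are captured for a few fixed token counts with smaller batches padded up is an assumption the commit message does not state. The point it illustrates is only that the graph choice looks at num_tokens and never at batch_size.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    """Hypothetical stand-in for a prefill batch (not sglang's ForwardBatch)."""

    seq_lens: list
    is_target_verify: bool = False

    @property
    def batch_size(self):
        return len(self.seq_lens)

    @property
    def num_tokens(self):
        return sum(self.seq_lens)


# Assumption: graphs were captured for these token counts.
CAPTURED_NUM_TOKENS = (64, 128, 256)


def pick_bucket(num_tokens):
    """Return the smallest captured size that fits num_tokens, or None."""
    for size in CAPTURED_NUM_TOKENS:
        if num_tokens <= size:
            return size
    return None


def can_run(batch):
    """Decide graph vs. eager without ever reading batch.batch_size."""
    if batch.is_target_verify:  # reject mentioned in the commit message
        return False
    return pick_bucket(batch.num_tokens) is not None


single = Batch(seq_lens=[100])             # bs=1, 100 tokens
multi = Batch(seq_lens=[25, 25, 25, 25])   # bs=4, 100 tokens
```

In this sketch `single` and `multi` differ in batch size but land on the same captured size, so neither needs an eager fallback.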

Mechanics:
- Resolve self.layer_model the same way PCG does (patch_model boundary).
- replay() monkey-patches layer_model.forward with a closure that
  replays the captured CUDAGraph and returns the captured hidden_states;
  the outer model.forward then runs logits_processor/pooler eagerly.
- _run_forward calls layer_model.forward directly during capture, so
  it must re-apply @torch.no_grad() (the outer *ForCausalLM.forward
  carried it). Without that, MoE @torch.compile kernels using
  torch.sum(out=...) fail dynamo with "out= doesn't support autograd",
  and mamba state ops spuriously track grad and hang capture.
- Drops the static_seq_lens / static_extend_* / static_req_pool_indices
  / static_orig_seq_lens machinery — they only existed to give the
  in-graph logits_processor stable bs=1 addresses.
- can_run gains an is_target_verify reject (matches PCG), plus a
  per-reject counter and a periodic [BCG] replays/rejects log line to
  verify that the fix actually keeps prefill on graph under load.
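
The replay mechanics above (patch layer_model.forward with a closure, let the outer forward run the head eagerly) can be sketched in plain Python. This is an illustration under stated assumptions, not the PR's implementation: `FakeGraph` stands in for a captured CUDAGraph so the example runs without a GPU, and `LayerModel`, `CausalLM`, and `patched_forward` are hypothetical names.

```python
from contextlib import contextmanager


class FakeGraph:
    """Stand-in for a captured CUDA graph; counts replays instead of launching kernels."""

    def __init__(self):
        self.replays = 0

    def replay(self):
        self.replays += 1


class LayerModel:
    """Hypothetical inner transformer stack."""

    def forward(self, tokens):
        return [t * 2 for t in tokens]  # pretend hidden states


class CausalLM:
    """Hypothetical outer model: inner stack followed by an eager head."""

    def __init__(self):
        self.model = LayerModel()

    def forward(self, tokens):
        hidden = self.model.forward(tokens)
        return sum(hidden)  # pretend logits_processor, always run eagerly


@contextmanager
def patched_forward(layer_model, graph, captured_hidden):
    """Temporarily replace layer_model.forward with a graph-replaying closure."""

    def replay_forward(*args, **kwargs):
        graph.replay()
        return captured_hidden  # the output buffer the graph writes into

    layer_model.forward = replay_forward  # instance attribute shadows the method
    try:
        yield
    finally:
        del layer_model.forward  # fall back to the class-level forward


lm = CausalLM()
graph = FakeGraph()
with patched_forward(lm.model, graph, captured_hidden=[2, 4, 6]):
    out_replayed = lm.forward([1, 2, 3])  # inner stack replayed, head eager
out_eager = lm.forward([1, 1])            # patch removed, fully eager again
```

Restoring the original forward in a `finally` block keeps the model usable for eager execution even if the replayed call raises.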

Validated on g294 H100s, mgsm_en 200q, fa3, --mem-fraction-static 0.85,
threads=1 and threads=32:

| Config            | t32 score | t32 tput | t32 replays | t32 rejects |
|-------------------|-----------|----------|-------------|-------------|
| qwen3_8b_tp1      | 0.82      | 3448     | 300         | 0           |
| qwen3_30b_a3b_tp2 | 0.965     | 2711     | 300         | 0           |
| nemotronh_8b_tp2  | 0.285     | 3508     | 300         | 0           |

rejects=0 across all three configs confirms that multi-req prefill stays
on graph (vs. the prior bs>1 eager fallback). Scores match prior BCG
baselines within mgsm_en noise (~0.03); throughput is on par or slightly
better (qwen3_30b_a3b_tp2: +4.3% vs. prior BCG notes).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The replay/can_run_reject counter and periodic [BCG] log line were
instrumentation to verify multi-req prefill actually stays on graph
during validation. Validation passed (rejects=0 across qwen3_8b_tp1,
qwen3_30b_a3b_tp2, nemotronh_8b_tp2 at threads=32) — drop the
instrumentation now that the fix is confirmed.
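
For readers who want to reproduce that check locally, the kind of instrumentation described here could look roughly like the sketch below. It is an assumption-laden illustration: `ReplayStats`, the logger name, the `log_every` interval, and the exact log format are all hypothetical and not taken from the removed code.

```python
import logging
from collections import Counter

logger = logging.getLogger("bcg_sketch")  # hypothetical logger name


class ReplayStats:
    """Counts graph replays and can_run rejects, keyed by reject reason."""

    def __init__(self, log_every=100):
        self.log_every = log_every
        self.replays = 0
        self.rejects = Counter()

    def record_replay(self):
        self.replays += 1
        self._maybe_log()

    def record_reject(self, reason):
        self.rejects[reason] += 1
        self._maybe_log()

    def _maybe_log(self):
        # Emit one summary line every `log_every` recorded events.
        total = self.replays + sum(self.rejects.values())
        if total % self.log_every == 0:
            logger.info(
                "[BCG] replays=%d rejects=%s", self.replays, dict(self.rejects)
            )


stats = ReplayStats(log_every=2)
stats.record_replay()
stats.record_reject("target_verify")
```

A run that stays on graph would show the replay count growing while the reject counter stays empty.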

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Oasis-Git
Collaborator Author

/tag-run-ci-label

@github-actions (bot) added the run-ci label on May 8, 2026
@Oasis-Git
Collaborator Author

/rerun-failed-ci

@ispobock merged commit 5207f07 into sgl-project:main on May 11, 2026
519 of 601 checks passed
@Oasis-Git deleted the bcg-fix branch on May 11, 2026 05:33
if hasattr(language_model, "model")
and hasattr(language_model.model, "layers")
else language_model
)
Contributor

This is very hacky: it relies on matching attribute names as strings.
At the very least, you should raise a warning if the string match fails.
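
One way to act on this suggestion is sketched below. It is not the PR's code: the function name `resolve_layer_model` and the logger name are hypothetical, and returning `language_model.model` on a successful match is an assumption, since the quoted diff fragment starts mid-expression and does not show that part.

```python
import logging

logger = logging.getLogger("bcg_sketch")  # hypothetical logger name


def resolve_layer_model(language_model):
    """Return the inner stack if it looks like one, else warn and fall back."""
    inner = getattr(language_model, "model", None)
    if inner is not None and hasattr(inner, "layers"):
        return inner  # assumption: the matched branch yields language_model.model
    logger.warning(
        "%s has no .model.layers; falling back to the whole module",
        type(language_model).__name__,
    )
    return language_model
```

The fallback behavior is unchanged; the warning only makes the silent attribute-name mismatch visible in logs.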

LucQueen pushed a commit to LucQueen/sglang that referenced this pull request May 12, 2026
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
xjpang pushed a commit to xjpang/sglang that referenced this pull request May 13, 2026
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

3 participants