Gather static input buffers for cuda graph#13676

Merged
cctry merged 4 commits into main from shiyang/graph_input
Nov 22, 2025

@cctry cctry commented Nov 20, 2025

Motivation

To eliminate data races between the scheduler and the model runner, an effective approach is to fully decouple the two components by staging all dynamic model inputs in pre-allocated static buffers. This turns the interaction between the scheduler and the model runner into a single-producer, single-consumer (SPSC) pattern.

As a first step, this PR organizes the static buffers needed for the forward pass.
Note that this is not the complete list: some tensors, such as req_to_token, are static and already captured in the graph.
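
To illustrate the idea of staging dynamic inputs in pre-allocated buffers, here is a minimal sketch. It is not the code in this PR: `StaticInputBuffers`, `populate`, and the field names `input_ids` and `seq_lens` are hypothetical, and Python's standard `array` type stands in for the device tensors so the example runs without a GPU. The point it shows is that the buffers are allocated once at the maximum batch size and each batch is copied into them in place, so a consumer that holds a reference to the buffers keeps seeing the same storage.

```python
from array import array
from dataclasses import dataclass, field


@dataclass
class StaticInputBuffers:
    """Hypothetical container for forward inputs (names are illustrative)."""

    max_bs: int
    input_ids: array = field(init=False)
    seq_lens: array = field(init=False)

    def __post_init__(self):
        # Allocate once at the maximum batch size; never reallocated afterwards.
        self.input_ids = array("q", [0] * self.max_bs)
        self.seq_lens = array("q", [0] * self.max_bs)

    def populate(self, input_ids, seq_lens):
        """Copy one batch of dynamic inputs into the fixed storage in place."""
        bs = len(input_ids)
        if bs > self.max_bs or len(seq_lens) != bs:
            raise ValueError("batch does not fit the pre-allocated buffers")
        # Same-length slice assignment overwrites the first `bs` slots and
        # leaves the buffer length (and the remaining padding slots) unchanged.
        self.input_ids[:bs] = array("q", input_ids)
        self.seq_lens[:bs] = array("q", seq_lens)
        return bs


# Usage: the producer fills the buffers; the consumer reads the first `bs` slots.
buffers = StaticInputBuffers(max_bs=4)
bs = buffers.populate([11, 12, 13], [5, 7, 9])
```

In a setup like the one described above, the scheduler side would be the only writer to such buffers and the model runner the only reader, which is what makes the SPSC framing possible. How the two sides synchronize is outside this sketch.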

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist


2 participants