[Docs] Add Encoder (ViT) CUDA Graphs section to CUDA Graphs design doc#37914
Isotr0py merged 4 commits into vllm-project:main from
Conversation
Documentation preview: https://vllm--37914.org.readthedocs.build/en/37914/
Code Review
This pull request adds comprehensive documentation for Encoder (ViT) CUDA Graphs to the docs/design/cuda_graphs.md file. The new section details the motivation for using CUDA Graphs for vision encoders, their budget-based capture and replay design, model integration via the SupportsEncoderCudaGraph protocol, configuration options, and usage examples. There are no review comments to address.
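The review summary above refers to a `SupportsEncoderCudaGraph` protocol for model integration. As a rough illustrative sketch only (the actual vLLM interface may carry different members and is defined in the vLLM codebase, not here), such an opt-in marker could be expressed as a runtime-checkable Python protocol:

```python
# Hypothetical sketch of an opt-in protocol for encoder CUDA graph support.
# Names and members are illustrative assumptions, not vLLM's real API.
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsEncoderCudaGraph(Protocol):
    """Models exposing this attribute declare their vision encoder
    safe to capture and replay as a CUDA graph."""

    supports_encoder_cuda_graph: bool


class MyViTModel:
    # A model opts in simply by declaring the marker attribute.
    supports_encoder_cuda_graph = True


# The runtime can then feature-detect encoder CUDA graph support:
ok = isinstance(MyViTModel(), SupportsEncoderCudaGraph)
```

A protocol-based check like this lets the engine detect support without importing or subclassing a concrete base class.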
Code Review
This pull request adds a new section to the cuda_graphs.md documentation, detailing the "Encoder (ViT) CUDA Graphs" feature, including its motivation, design, model integration, configuration, and usage. The reviewer suggests improving the "About the Performance" section by linking to specific performance examples or benchmarks, or by adjusting the section title.
I think we can split the ViT CUDA graph doc into its own file, like docs/design/torch_compile_multimodal.md
I have moved it to a separate file docs/design/cuda_graphs_multimodal.md
Hello, I also want to help improve this feature (ViT CG) in vLLM, and maybe we can propose an RFC to track all the related PRs. What do you think? I suppose there are some other things we can do in the future:
Add documentation for the encoder CUDA graph feature (PR vllm-project#35963), covering budget-based capture/replay, greedy bin-packing, data-parallel support, SupportsEncoderCudaGraph protocol, configuration, and usage. Signed-off-by: Baorun Mu <bmu@nvidia.com>
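The commit message above mentions budget-based capture/replay with greedy bin-packing. As a rough illustration of the idea only (not vLLM's actual implementation; all names here are invented), greedy bin-packing groups variable-sized encoder inputs into bins that each fit within a pre-captured graph budget, so a small set of captured graphs can serve many input shapes:

```python
# Hypothetical sketch of greedy (first-fit decreasing) bin-packing of
# encoder inputs into a fixed capture budget. Illustrative only; the
# function and variable names are assumptions, not vLLM identifiers.

def pack_into_budgets(item_sizes, budget):
    """Greedily pack items (e.g. per-image patch counts) into bins
    whose total size never exceeds the capture budget."""
    bins = []
    for size in sorted(item_sizes, reverse=True):  # largest items first
        for b in bins:
            if sum(b) + size <= budget:
                b.append(size)  # reuse an already-open bin
                break
        else:
            bins.append([size])  # open a new bin (one graph replay)
    return bins


# Example: patch counts from four images, budget of 1024 patches per replay.
bins = pack_into_budgets([800, 300, 500, 200], budget=1024)
```

Fewer bins means fewer graph replays per batch; first-fit decreasing is a common heuristic that keeps the bin count close to optimal while staying cheap to compute.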
…raphs doc Add full benchmark setup (model, dataset, prompts, warmup, max_model_len), reproduce commands to the encoder CUDA Graphs performance section. Signed-off-by: Baorun Mu <bmu@nvidia.com>
Address review suggestion to clarify that the section covers vision encoder CUDA graphs specifically. Signed-off-by: Baorun Mu <bmu@nvidia.com>
Extract Vision Encoder (ViT) CUDA Graphs section from cuda_graphs.md into its own file cuda_graphs_multimodal.md, following the pattern of torch_compile_multimodal.md. Replace inline section with cross-reference link in the table of contents. Signed-off-by: Baorun Mu <bmu@nvidia.com>
Head branch was pushed to by a user without write access
Force-pushed from 6617306 to 5bc5ca0
Personally, I think GitHub Projects is a better tool for tracking issues and PRs. Currently, the features that we (i.e., NV) developed for the MLPerf Inference VLM benchmark are tracked by the Qwen3.5 project board (maintained by @ywang96 @vadiklyutiy), in addition to the NVIDIA project board (which I try to maintain from time to time, but which for the most part runs by itself through workflows); some of them are already tracked there, and some are planned to be (I was just asking @ywang96 for edit permission).

The thing with either a project board or an RFC (i.e., a tracking issue) is that it's only useful when someone actively maintains it, and I'm not sure the scope of ViT-cudagraph-related features is big enough to justify a standalone project-management effort (in the greater context of how busy everyone is). If you would like to track them, I would propose tracking them on the same Qwen3.5 board (I know the name is less than ideal, but at least that board is actively maintained and watched for multimodality features).

Otherwise, my leadership philosophy is about minimizing management overhead and empowering others instead of gatekeeping (not that this matters, but I'm just trying to explain my thought process). So simply submitting the PR(s) that would support video inference for Qwen3-VL ViT CG and (if needed) discussing through Slack might be good enough, and we'd be happy to help and provide feedback on concrete technical problems.
With that being said, @shen-shanshan if you would like to commit time and maintain a tracking issue for ViT-cudagraph-related features, please feel very welcome to go ahead :)
Thanks for your patient explanation! Currently, I'm working on vLLM multimodal-related things full-time, and I'm sure I can commit time to maintain a tracking issue (focused only on CG for ViT).
vllm-project#37914) Signed-off-by: Baorun Mu <bmu@nvidia.com>
vllm-project#37914) Signed-off-by: Baorun Mu <bmu@nvidia.com> Signed-off-by: Nithin Chalapathi <nithin.ch10@gmail.com>
vllm-project#37914) Signed-off-by: Baorun Mu <bmu@nvidia.com> Signed-off-by: Rishi Puri <riship@nvidia.com>
Summary
SupportsEncoderCudaGraph protocol, configuration options, and usage examples (CLI + Python)

cc @maxyanghu @wangshangsam @Isotr0py @ywang96
Test plan
admonitions, table of contents links)
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.