[Docs] Add Encoder (ViT) CUDA Graphs section to CUDA Graphs design doc #37914

Merged
Isotr0py merged 4 commits into vllm-project:main from CentML:bmu/vit-full-cudagraph-doc
Mar 25, 2026

b-mu (Contributor) commented Mar 23, 2026

Summary

  • Add a new "Encoder (ViT) CUDA Graphs" section to docs/design/cuda_graphs.md, documenting the encoder CUDA graph feature from [Feature] ViT Full CUDA Graph #35963
  • Covers motivation, budget-based capture/replay design, greedy bin-packing algorithm, data-parallel support,
    SupportsEncoderCudaGraph protocol, configuration options, and usage examples (CLI + Python)
  • Add table of contents entry linking to the new section
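
For readers unfamiliar with the budget-based design named in the summary, here is a minimal illustrative sketch of greedy bin-packing into fixed token budgets. It is not the vLLM implementation: the function names `pack_into_budgets` and `smallest_budget`, the example budget values, and the first-fit-decreasing strategy are all assumptions made for illustration. The actual algorithm is described in the new doc and in #35963.

```python
# Illustrative sketch only; NOT vLLM's implementation.
# Assumption: CUDA graphs are captured for a few fixed token budgets, and a
# group of images can be replayed only if its token total fits one budget.


def pack_into_budgets(token_counts: list[int], max_budget: int) -> list[list[int]]:
    """Group item indices so that each group's token total <= max_budget.

    Uses first-fit decreasing: visit items largest first and place each one
    into the first group that still has enough room.
    """
    order = sorted(
        range(len(token_counts)), key=lambda i: token_counts[i], reverse=True
    )
    groups: list[list[int]] = []   # indices assigned to each group
    remaining: list[int] = []      # free tokens left in each group
    for idx in order:
        size = token_counts[idx]
        if size > max_budget:
            # Hypothetical handling; a real system might fall back to eager mode.
            raise ValueError(f"item {idx} ({size} tokens) exceeds the budget")
        for g, room in enumerate(remaining):
            if size <= room:
                groups[g].append(idx)
                remaining[g] -= size
                break
        else:
            # No existing group has room, so open a new one.
            groups.append([idx])
            remaining.append(max_budget - size)
    return groups


def smallest_budget(total_tokens: int, budgets: list[int]) -> int:
    """Return the smallest captured budget that can hold total_tokens."""
    for budget in sorted(budgets):
        if total_tokens <= budget:
            return budget
    raise ValueError(f"no budget can hold {total_tokens} tokens")


if __name__ == "__main__":
    counts = [300, 700, 200, 500]  # hypothetical per-image token counts
    print(pack_into_budgets(counts, max_budget=1000))
    print(smallest_budget(1000, [512, 1024, 2048]))
```

Padding each group up to the smallest budget that fits is what lets a small, fixed set of captured graphs serve variable-sized inputs; the trade-off is wasted compute on the padded tokens.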

cc @maxyanghu @wangshangsam @Isotr0py @ywang96

Test plan

  • Built docs locally with mkdocs serve and verified the new section renders correctly (headings, code blocks,
    admonitions, table of contents links)
  • No existing content modified other than adding the table of contents entry

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

mergify Bot (Contributor) commented Mar 23, 2026

Documentation preview: https://vllm--37914.org.readthedocs.build/en/37914/

mergify Bot added the documentation and nvidia labels Mar 23, 2026
gemini-code-assist Bot (Contributor) left a comment

Code Review

This pull request adds comprehensive documentation for Encoder (ViT) CUDA Graphs to the docs/design/cuda_graphs.md file. The new section details the motivation for using CUDA Graphs for vision encoders, their budget-based capture and replay design, model integration via the SupportsEncoderCudaGraph protocol, configuration options, and usage examples. There are no review comments to address.

gemini-code-assist Bot (Contributor) left a comment

Code Review

This pull request adds a new section to the cuda_graphs.md documentation, detailing the "Encoder (ViT) CUDA Graphs" feature, including its motivation, design, model integration, configuration, and usage. The reviewer suggests improving the "About the Performance" section by linking to specific performance examples or benchmarks, or by adjusting the section title.

3 comment threads on docs/design/cuda_graphs.md (outdated)
Member commented:

I think we can split the ViT CUDA graph doc into its own file, like docs/design/torch_compile_multimodal.md

Contributor (author) replied:

I have moved it to a separate file, docs/design/cuda_graphs_multimodal.md

github-project-automation Bot moved this to Ready in NVIDIA Mar 24, 2026
Isotr0py enabled auto-merge (squash) March 24, 2026 11:48
github-actions Bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Mar 24, 2026
@b-mu b-mu mentioned this pull request Mar 24, 2026
5 tasks
shen-shanshan (Contributor) commented

@b-mu @wangshangsam

Hello, I would also like to help improve this feature (ViT CG) in vLLM, and maybe we can propose an RFC to track all the related PRs. What do you think?

I suppose there are some other things we can do in the future:

  1. Support video inference for Qwen3-VL ViT CG (I'm currently working on this).
  2. Support more models, e.g., Qwen3.5, GLM-V, Kimi K2.5.
  3. Run benchmark comparisons across more scenarios and offer a tuning guide so users can make better use of the feature, since it does not always improve performance (see [Feature] ViT Full CUDA Graph #35963 (comment)).

b-mu added 4 commits March 24, 2026 22:36
Add documentation for the encoder CUDA graph feature (PR vllm-project#35963),
covering budget-based capture/replay, greedy bin-packing, data-parallel
support, SupportsEncoderCudaGraph protocol, configuration, and usage.

Signed-off-by: Baorun Mu <bmu@nvidia.com>

…raphs doc

Add full benchmark setup (model, dataset, prompts, warmup, max_model_len),
reproduce commands to the encoder CUDA Graphs performance section.

Signed-off-by: Baorun Mu <bmu@nvidia.com>

Address review suggestion to clarify that the section covers vision encoder CUDA graphs specifically.

Signed-off-by: Baorun Mu <bmu@nvidia.com>

Extract Vision Encoder (ViT) CUDA Graphs section from cuda_graphs.md
into its own file cuda_graphs_multimodal.md, following the pattern of
torch_compile_multimodal.md. Replace inline section with cross-reference
link in the table of contents.

Signed-off-by: Baorun Mu <bmu@nvidia.com>

auto-merge was automatically disabled March 25, 2026 02:36

Head branch was pushed to by a user without write access

b-mu force-pushed the bmu/vit-full-cudagraph-doc branch from 6617306 to 5bc5ca0 March 25, 2026 02:36
Isotr0py enabled auto-merge (squash) March 25, 2026 02:41
Isotr0py merged commit 9d0351c into vllm-project:main Mar 25, 2026
9 checks passed
github-project-automation Bot moved this from Ready to Done in NVIDIA Mar 25, 2026
wangshangsam added the qwen and performance labels Mar 25, 2026
wangshangsam (Collaborator) commented Mar 25, 2026

@shen-shanshan

> maybe we can propose a RFC to track all the related PRs. How do you think?

Personally, I think GitHub Projects is a better tool for tracking issues and PRs. Currently, the features that we (i.e., NV) developed for the MLPerf Inference VLM benchmark are tracked on the Qwen3.5 project board (maintained by @ywang96 and @vadiklyutiy): some already are, and the rest are planned to be (I was just asking @ywang96 for edit permission). They are also tracked on the NVIDIA project board, which I try to maintain from time to time but which mostly runs by itself through workflows.

The thing with either a project board or an RFC (i.e., a tracking issue) is that it's only useful when someone actively maintains it, and I'm not sure the scope of ViT-cudagraph-related features is big enough to justify a standalone project management effort (in the greater context of how busy everyone is). If you would like to track them, I would propose tracking them on the same Qwen3.5 board (I know the name is less than ideal, but at least that board is actively maintained and watched for multimodality features). Otherwise, my leadership philosophy is to always minimize management overhead and empower others instead of acting as a gatekeeper (not that this matters, but I'm just trying to explain my thought process), so just submitting the PR(s) that support video inference for Qwen3-VL ViT CG and (if needed) discussing on Slack might be good enough, and we'd be happy to help and provide feedback on concrete technical problems.

github-project-automation Bot moved this from Backlog to Done in Qwen3.5 Mar 25, 2026
github-project-automation Bot moved this to Backlog in Qwen3.5 Mar 25, 2026
wangshangsam (Collaborator) commented

With that being said, @shen-shanshan if you would like to commit time and maintain a tracking issue for ViT-cudagraph-related features, please feel very welcome to go ahead :)

shen-shanshan (Contributor) commented

> With that being said, @shen-shanshan if you would like to commit time and maintain a tracking issue for ViT-cudagraph-related features, please feel very welcome to go ahead :)

Thanks for the patient explanation! I currently work full-time on vLLM multimodal features, so I can commit time to maintaining a tracking issue (focused only on CG for ViT).

RhizoNymph pushed a commit to RhizoNymph/vllm that referenced this pull request Mar 26, 2026
nithinvc pushed a commit to nithinvc/vllm that referenced this pull request Mar 27, 2026
vllm-project#37914)

Signed-off-by: Baorun Mu <bmu@nvidia.com>

Signed-off-by: Nithin Chalapathi <nithin.ch10@gmail.com>
JiantaoXu pushed a commit to JiantaoXu/vllm that referenced this pull request Mar 28, 2026
puririshi98 pushed a commit to puririshi98/vllm that referenced this pull request Apr 7, 2026
vllm-project#37914)

Signed-off-by: Baorun Mu <bmu@nvidia.com>
Signed-off-by: Rishi Puri <riship@nvidia.com>
mtparet pushed a commit to blackfuel-ai/vllm that referenced this pull request Apr 9, 2026
mystous pushed a commit to mystous/vllm_hybrid that referenced this pull request May 10, 2026
Labels

documentation (Improvements or additions to documentation), nvidia, performance (Performance-related issues), qwen (Related to Qwen models), ready (ONLY add when PR is ready to merge/full CI is needed)
