[Docs] Add Encoder (ViT) CUDA Graphs section to CUDA Graphs design doc#37914
Isotr0py merged 4 commits into vllm-project:main from
Conversation
Documentation preview: https://vllm--37914.org.readthedocs.build/en/37914/
Code Review
This pull request adds comprehensive documentation for Encoder (ViT) CUDA Graphs to the docs/design/cuda_graphs.md file. The new section details the motivation for using CUDA Graphs for vision encoders, their budget-based capture and replay design, model integration via the SupportsEncoderCudaGraph protocol, configuration options, and usage examples. There are no review comments to address.
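The review summary above refers to a `SupportsEncoderCudaGraph` protocol for model integration. As a rough illustrative sketch only (the actual vLLM interface may carry different members and is defined in the vLLM codebase, not here), such an opt-in marker could be expressed as a runtime-checkable Python protocol:

```python
# Hypothetical sketch of an opt-in protocol for encoder CUDA graph support.
# Names and members are illustrative assumptions, not vLLM's real API.
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsEncoderCudaGraph(Protocol):
    """Models exposing this attribute declare their vision encoder
    safe to capture and replay as a CUDA graph."""

    supports_encoder_cuda_graph: bool


class MyViTModel:
    # A model opts in simply by declaring the marker attribute.
    supports_encoder_cuda_graph = True


# The runtime can then feature-detect encoder CUDA graph support:
ok = isinstance(MyViTModel(), SupportsEncoderCudaGraph)
```

A protocol-based check like this lets the engine detect support without importing or subclassing a concrete base class.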
Code Review
This pull request adds a new section to the cuda_graphs.md documentation, detailing the "Encoder (ViT) CUDA Graphs" feature, including its motivation, design, model integration, configuration, and usage. The reviewer suggests improving the "About the Performance" section by linking to specific performance examples or benchmarks, or by adjusting the section title.
I think we can split the ViT CUDA graph doc into its own file, like docs/design/torch_compile_multimodal.md
I have moved it to a separate file docs/design/cuda_graphs_multimodal.md
Hello, I also want to help improve this feature (ViT CG) in vLLM, and maybe we can propose an RFC to track all the related PRs. What do you think? I suppose there are some other things we can do in the future:
Add documentation for the encoder CUDA graph feature (PR vllm-project#35963), covering budget-based capture/replay, greedy bin-packing, data-parallel support, SupportsEncoderCudaGraph protocol, configuration, and usage. Signed-off-by: Baorun Mu <bmu@nvidia.com>
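The commit message above mentions budget-based capture/replay with greedy bin-packing. As a rough illustration of the idea only (not vLLM's actual implementation; all names here are invented), greedy bin-packing groups variable-sized encoder inputs into bins that each fit within a pre-captured graph budget, so a small set of captured graphs can serve many input shapes:

```python
# Hypothetical sketch of greedy (first-fit decreasing) bin-packing of
# encoder inputs into a fixed capture budget. Illustrative only; the
# function and variable names are assumptions, not vLLM identifiers.

def pack_into_budgets(item_sizes, budget):
    """Greedily pack items (e.g. per-image patch counts) into bins
    whose total size never exceeds the capture budget."""
    bins = []
    for size in sorted(item_sizes, reverse=True):  # largest items first
        for b in bins:
            if sum(b) + size <= budget:
                b.append(size)  # reuse an already-open bin
                break
        else:
            bins.append([size])  # open a new bin (one graph replay)
    return bins


# Example: patch counts from four images, budget of 1024 patches per replay.
bins = pack_into_budgets([800, 300, 500, 200], budget=1024)
```

Fewer bins means fewer graph replays per batch; first-fit decreasing is a common heuristic that keeps the bin count close to optimal while staying cheap to compute.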
…raphs doc Add full benchmark setup (model, dataset, prompts, warmup, max_model_len), reproduce commands to the encoder CUDA Graphs performance section. Signed-off-by: Baorun Mu <bmu@nvidia.com>
Address review suggestion to clarify that the section covers vision encoder CUDA graphs specifically. Signed-off-by: Baorun Mu <bmu@nvidia.com>
Extract Vision Encoder (ViT) CUDA Graphs section from cuda_graphs.md into its own file cuda_graphs_multimodal.md, following the pattern of torch_compile_multimodal.md. Replace inline section with cross-reference link in the table of contents. Signed-off-by: Baorun Mu <bmu@nvidia.com>
Head branch was pushed to by a user without write access
Force-pushed from 6617306 to 5bc5ca0
Personally, I think GitHub Projects is a better tool for tracking issues and PRs. Currently, the features that we (i.e., NV) developed for the MLPerf Inference VLM benchmark are tracked by the Qwen3.5 project board (maintained by @ywang96 @vadiklyutiy), in addition to the NVIDIA project board (which I try to maintain from time to time, but which for the most part runs by itself through workflows); some of them are already tracked there, and some are planned to be (I was just asking @ywang96 for edit permission).

The thing with either a project board or an RFC (i.e., a tracking issue) is that it's only useful when someone actively maintains it, and I'm not sure the scope of ViT-cudagraph-related features is big enough to justify a standalone project-management effort (in the greater context of how busy everyone is). If you would like to track them, I would propose tracking them on the same Qwen3.5 board (I know the name is less than ideal, but at least that board is actively maintained and watched for multimodality features).

Otherwise, my leadership philosophy is about minimizing management overhead and empowering others instead of gatekeeping (not that this matters, but I'm just trying to explain my thought process). So simply submitting the PR(s) that would support video inference for Qwen3-VL ViT CG and (if needed) discussing through Slack might be good enough, and we'd be happy to help and provide feedback on concrete technical problems.
With that being said, @shen-shanshan if you would like to commit time and maintain a tracking issue for ViT-cudagraph-related features, please feel very welcome to go ahead :)
Thanks for your patient explanation! Currently, I'm working on vLLM multimodal-related things full-time, and I'm sure I can commit time to maintain a tracking issue (focused only on CG for ViT).
vllm-project#37914) Signed-off-by: Baorun Mu <bmu@nvidia.com>
vllm-project#37914) Signed-off-by: Baorun Mu <bmu@nvidia.com> Signed-off-by: Nithin Chalapathi <nithin.ch10@gmail.com>
vllm-project#37914) Signed-off-by: Baorun Mu <bmu@nvidia.com> Signed-off-by: Rishi Puri <riship@nvidia.com>
Summary
SupportsEncoderCudaGraph protocol, configuration options, and usage examples (CLI + Python)

cc @maxyanghu @wangshangsam @Isotr0py @ywang96
Test plan
admonitions, table of contents links)
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.