[Multimodal] Simplify ViT CUDA graph interfaces #41234
Isotr0py wants to merge 11 commits into vllm-project:main
Code Review
This pull request refactors the encoder CUDA graph interface to simplify model implementations. It replaces several specific methods, such as get_input_modality and get_max_frames_per_video, with a unified get_encoder_cudagraph_item_specs method and a consolidated encoder_forward method. The EncoderCudaGraphManager now auto-detects input keys based on configuration. Feedback suggests using ValueError instead of AssertionError for unreachable code in qwen3_vl.py and improving the specificity of error messages in the CUDA graph manager to aid debugging.
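The two review suggestions above (raise ValueError rather than rely on an assertion, and make error messages specific) can be illustrated with a minimal sketch. It is not the PR's code: `resolve_input_key` is a hypothetical helper, and the key names `pixel_values` and `pixel_values_videos` are assumed for illustration; only the name `input_key_by_modality` comes from the diff in this thread.

```python
from typing import Any

# Hypothetical modality -> input key mapping (assumed values for illustration).
INPUT_KEY_BY_MODALITY = {
    "image": "pixel_values",
    "video": "pixel_values_videos",
}


def resolve_input_key(
    mm_kwargs: dict[str, Any],
    input_key_by_modality: dict[str, str] = INPUT_KEY_BY_MODALITY,
) -> str:
    """Return the single configured input key that is present in mm_kwargs."""
    candidates = sorted(input_key_by_modality.values())
    present = [key for key in candidates if key in mm_kwargs]
    if len(present) != 1:
        # ValueError survives `python -O`, which strips assert statements,
        # and the message names both what was expected and what was received.
        raise ValueError(
            f"Expected exactly one of {candidates} in mm_kwargs, "
            f"got keys {sorted(mm_kwargs)}"
        )
    return present[0]
```

With this shape, a batch carrying neither key (or both) fails with a message that lists the offending keys instead of a bare AssertionError.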
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
cc @shen-shanshan @b-mu about ViT CUDA graph cleanup.
# actual inputs may be smaller. Zero then slice-copy so padded
# positions are invisible to attention (cu_seqlens masks them out).
input_key = self.config.input_key_by_modality[
input_key = input_key = self.config.input_key_by_modality[
Was input_key = input_key = ... changed by mistake?
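The "zero then slice-copy" step mentioned in the code comment of this diff can be sketched as follows. This is a simplified illustration under stated assumptions: a plain Python list stands in for the pre-allocated capture buffer, `fill_padded_buffer` is a hypothetical name, and the cu_seqlens masking itself is not modeled.

```python
def fill_padded_buffer(buffer: list[float], actual: list[float]) -> list[float]:
    """Zero a fixed-size buffer in place, then copy `actual` into its prefix."""
    if len(actual) > len(buffer):
        raise ValueError(
            f"Input of length {len(actual)} does not fit "
            f"buffer of length {len(buffer)}"
        )
    # Step 1: zero every position so stale data from a previous replay
    # cannot leak into the padded tail.
    for i in range(len(buffer)):
        buffer[i] = 0.0
    # Step 2: slice-copy the real input into the front of the buffer;
    # the buffer keeps its original length.
    buffer[: len(actual)] = actual
    return buffer
```

The buffer size stays fixed because a captured graph replays against the same memory; only the leading positions carry real data.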
def get_encoder_cudagraph_item_specs(
    self,
    mm_kwargs: dict[str, Any],
) -> int:
    """Return the number of items (e.g. images) in the batch."""
    ...

def get_encoder_cudagraph_per_item_output_tokens(
    self,
    mm_kwargs: dict[str, Any],
) -> list[int]:
    """Return output token count for each item.

    Used for greedy packing and DP load balancing.
    """
    ...

def get_encoder_cudagraph_per_item_input_sizes(
    self,
    mm_kwargs: dict[str, Any],
) -> list[int]:
    """Return input size (e.g. patch count) for each item.
) -> list["EncoderItemSpec"]:
Since #40830 has been merged, maybe we should also make Qwen2.5-VL adapt to these new interfaces.
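For readers following the interface change discussed in this thread, here is a rough sketch of how one spec list can stand in for the three separate getters. Only the name `EncoderItemSpec` appears in the diff; its fields, the `get_item_specs` helper, the `(t, h, w)` grid layout, and the `merge_size` parameter are assumptions made for illustration and may not match the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderItemSpec:
    # Hypothetical fields; the real class in the PR may differ.
    input_size: int      # e.g. patch count fed to the encoder
    output_tokens: int   # tokens produced for this item


def get_item_specs(
    grid_thw: list[tuple[int, int, int]],
    merge_size: int = 2,
) -> list[EncoderItemSpec]:
    """Build one spec per item from assumed (t, h, w) patch grids."""
    specs = []
    for t, h, w in grid_thw:
        patches = t * h * w
        specs.append(
            EncoderItemSpec(
                input_size=patches,
                # Assumes spatial merging of merge_size x merge_size patches.
                output_tokens=patches // (merge_size * merge_size),
            )
        )
    return specs


# The three former quantities fall out of the single list.
specs = get_item_specs([(1, 4, 4), (2, 4, 6)])
num_items = len(specs)
per_item_input_sizes = [s.input_size for s in specs]
per_item_output_tokens = [s.output_tokens for s in specs]
```

Returning one list keeps the item count, input sizes, and output token counts consistent by construction, since they are all derived from the same objects.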
Purpose
- Merge get_encoder_cudagraph_num_items, get_encoder_cudagraph_per_item_output_tokens and get_encoder_cudagraph_per_item_input_sizes into one get_encoder_cudagraph_item_specs function.
- Merge encoder_cudagraph_forward and encoder_eager_forward into one encoder_forward function.
Test Plan
Test Result
All tests should pass.