[VLM] Update Qwen3-VL max_num_video_tokens calculation for configurable video profiling #25557
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Should we update this as well?
So should we make it configurable or just increase the value for it?
Actually - let's make a new
This PR fixes the mismatch between the calculated number of tokens and the actual number of tokens generated by the ViT, but I'm getting these warnings. I think something is wrong with how we call the HF processor that triggers this warning, which could be confusing to users.
This PR will be reworked for Qwen3-VL after #25631 is merged.
This pull request has merge conflicts that must be resolved before it can be |
Signed-off-by: Roger Wang <hey@rogerw.io>
#25810 should solve the problem of the large mismatch between the ViT output length and the video soft token length, so I'm going to update this PR accordingly.
I've updated several pieces of logic:
target_video_size, _ = self.info._get_vision_info(
    image_width=target_width,
    image_height=target_height,
    num_frames=target_num_frames,
    image_processor=self.info.get_video_processor(),
)
This is in fact pretty important.
Previously we were sending a [32, 4096, 4096, 3] input tensor, which would OOM if we turn on DP ViT; this is now corrected to [24576, 32, 32, 3].
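To put the two shapes in perspective, here is a rough sketch that compares their element counts and approximate memory footprints. The shapes come from the comment above; the helper names and the 4-bytes-per-element (float32) figure are assumptions for illustration only, since the actual dtype used during profiling is not stated here.

```python
from math import prod


def tensor_numel(shape):
    """Number of elements in a dense tensor of the given shape."""
    return prod(shape)


def tensor_bytes(shape, bytes_per_element):
    """Approximate size in bytes of a dense tensor of the given shape."""
    return tensor_numel(shape) * bytes_per_element


# Shapes quoted in the review comment.
old_shape = (32, 4096, 4096, 3)
new_shape = (24576, 32, 32, 3)

# ASSUMPTION: float32 storage (4 bytes per element); the real dtype may differ.
BYTES_PER_ELEMENT = 4

old_numel = tensor_numel(old_shape)
new_numel = tensor_numel(new_shape)
old_gib = tensor_bytes(old_shape, BYTES_PER_ELEMENT) / 2**30
new_mib = tensor_bytes(new_shape, BYTES_PER_ELEMENT) / 2**20

print(f"old: {old_numel:,} elements, ~{old_gib:.0f} GiB at 4 bytes/element")
print(f"new: {new_numel:,} elements, ~{new_mib:.0f} MiB at 4 bytes/element")
print(f"reduction: ~{old_numel / new_numel:.1f}x fewer elements")
```

Under that assumption the old dummy input is about 6 GiB for a single tensor, while the corrected one is about 288 MiB, roughly a 21x reduction in elements.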
…le video profiling (#25557) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io> Signed-off-by: simon-mo <simon.mo@hey.com>