
Conversation

@yangsijia-serena (Collaborator) commented Oct 31, 2025

Motivation

Fixes #12390 and #11896.

Modifications

  1. Set MIN_PIXELS, MAX_PIXELS, and IMAGE_FACTOR according to the model's config (a minimal sketch of reading these parameters follows this list).
  2. Remove the resize_image logic from the qwen-vl series processors, since resizing is always performed inside transformers; also remove the now-unused functions and imports.
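
A minimal sketch of reading these parameters from the Hugging Face image processor (the checkpoint name is illustrative, and depending on the transformers version the pixel bounds may live in `image_processor.size` instead of the attributes shown):

```python
from transformers import AutoProcessor

# Illustrative: pull the resize parameters from the model's own
# preprocessor config instead of hard-coding them. The attribute names
# below are those exposed by transformers' Qwen2-VL image processor;
# other versions may expose them via image_processor.size instead.
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
image_processor = processor.image_processor

MIN_PIXELS = image_processor.min_pixels  # smallest allowed image area
MAX_PIXELS = image_processor.max_pixels  # largest allowed image area
# Every output dimension must be a multiple of patch_size * merge_size.
IMAGE_FACTOR = image_processor.patch_size * image_processor.merge_size
```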

Accuracy Tests

The results of test_vlm_models.py:

Before the fix:
[screenshot: accuracy results before the fix]

After the fix:
[screenshot: accuracy results after the fix]

Benchmarking and Profiling

Checklist

@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @yangsijia-serena, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue where image resizing for Qwen-VL series models was not correctly utilizing model-specific parameters. The changes refactor the image processing to dynamically fetch resizing configurations from the transformers library's image processor and eliminate redundant custom resizing code. This ensures that image preprocessing is accurately aligned with the model's requirements, improving the overall correctness of multimodal data handling.

Highlights

  • Dynamic Image Resizing Parameters: Image resizing parameters such as MIN_PIXELS, MAX_PIXELS, and IMAGE_FACTOR are now dynamically determined from the model's configuration, specifically leveraging the transformers image processor for Qwen-VL series models.
  • Removal of Redundant Resizing Logic: The custom image resizing logic previously present in qwen_vl.py has been removed, as the responsibility for image resizing will now be handled by the transformers library itself, streamlining the preprocessing pipeline.
  • Parameterized Image Resizing Function: The resize_image_async function in points_v15_chat.py has been updated to accept MIN_PIXELS, MAX_PIXELS, and IMAGE_FACTOR as explicit arguments, enabling model-specific resizing behavior across different processors (see the sketch after this list).
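
As a rough illustration of what the parameterized helper looks like, here is a sketch that mirrors the well-known smart_resize logic from the Qwen-VL utilities; the signature is an assumption for illustration, not the exact code in points_v15_chat.py:

```python
import asyncio
import math

from PIL import Image

def smart_resize(height: int, width: int, factor: int,
                 min_pixels: int, max_pixels: int) -> tuple[int, int]:
    """Round (height, width) to multiples of `factor`, then rescale so
    the total pixel count falls inside [min_pixels, max_pixels]."""
    h_bar = max(factor, round(height / factor) * factor)
    w_bar = max(factor, round(width / factor) * factor)
    if h_bar * w_bar > max_pixels:
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = math.floor(height / beta / factor) * factor
        w_bar = math.floor(width / beta / factor) * factor
    elif h_bar * w_bar < min_pixels:
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar

async def resize_image_async(image: Image.Image, min_pixels: int,
                             max_pixels: int, factor: int) -> Image.Image:
    # Run the blocking PIL resize in a worker thread.
    h, w = smart_resize(image.height, image.width, factor,
                        min_pixels, max_pixels)
    return await asyncio.to_thread(image.resize, (w, h))
```

Passing the bounds explicitly lets each processor supply values read from its own model config rather than sharing module-level constants.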

@gemini-code-assist (bot) left a review comment

Code Review

This pull request refactors the image resizing logic for Qwen-VL series models to use model-specific parameters. The changes involve dynamically calculating IMAGE_FACTOR, MIN_PIXELS, and MAX_PIXELS in QwenVLImageProcessor and updating the dependent code. The overall approach is sound. However, I've identified a potential issue in the initialization logic where fallback values use a hardcoded number, which could lead to inconsistencies. I've provided a suggestion to reorder the initialization to resolve this. The other changes, such as updating the call in points_v15_chat.py and removing the now-redundant resizing logic in qwen_vl.py, are well-aligned with the PR's objectives.
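
A sketch of the reordering the review points at: derive IMAGE_FACTOR first so that any pixel-budget fallbacks can be written in terms of it instead of as unrelated magic numbers. The variable name hf_image_processor and the fallback constants are illustrative, not the repository's actual defaults:

```python
# Hypothetical initialization order: factor first, budgets second,
# so the fallbacks stay self-consistent with a non-default patch size.
patch_size = getattr(hf_image_processor, "patch_size", 14)   # fallback is illustrative
merge_size = getattr(hf_image_processor, "merge_size", 2)    # fallback is illustrative
IMAGE_FACTOR = patch_size * merge_size

# Express the pixel-budget fallbacks in terms of IMAGE_FACTOR rather
# than hard-coded numbers, so they cannot drift out of sync.
MIN_PIXELS = getattr(hf_image_processor, "min_pixels", None) or 4 * IMAGE_FACTOR**2
MAX_PIXELS = getattr(hf_image_processor, "max_pixels", None) or 16384 * IMAGE_FACTOR**2
```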

@github-actions (bot) added the Multi-modal label on Nov 6, 2025
@yangsijia-serena changed the title from "fix: use model-specific params to resize images for qwen-vl series models" to "fix: resize images logic of qwen-vl series models" on Nov 6, 2025
@yhyang201 (Collaborator) commented Nov 10, 2025

Before:
[screenshot: qwen2.5vl results before the fix]
After:
[screenshot: qwen2.5vl results after the fix]

It doesn’t seem to affect qwen2.5vl.


Labels

Multi-modal, run-ci


Development

Successfully merging this pull request may close these issues.

[Bug] Qwen-VL resize outside image-processor will lead gap for VIT and total model's output
