
Conversation

@yangsijia-serena (Collaborator) commented Oct 31, 2025

Motivation

Fixes #12390 and #11896.

Modifications

  1. Set MIN_PIXELS, MAX_PIXELS, and IMAGE_FACTOR according to the model's config (a minimal sketch of reading these parameters follows this list).
  2. Remove the resize_image logic from the qwen-vl series processors, since resizing is always performed inside transformers; also remove the now-unused functions and imports.
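
A minimal sketch of reading these parameters from the Hugging Face image processor (the checkpoint name is illustrative, and depending on the transformers version the pixel bounds may live in `image_processor.size` instead of the attributes shown):

```python
from transformers import AutoProcessor

# Illustrative: pull the resize parameters from the model's own
# preprocessor config instead of hard-coding them. The attribute names
# below are those exposed by transformers' Qwen2-VL image processor;
# other versions may expose them via image_processor.size instead.
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
image_processor = processor.image_processor

MIN_PIXELS = image_processor.min_pixels  # smallest allowed image area
MAX_PIXELS = image_processor.max_pixels  # largest allowed image area
# Every output dimension must be a multiple of patch_size * merge_size.
IMAGE_FACTOR = image_processor.patch_size * image_processor.merge_size
```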

Accuracy Tests

The results of test_vlm_models.py:

Before the fix:
[screenshot: accuracy results before the fix]

After the fix:
[screenshot: accuracy results after the fix]

Benchmarking and Profiling

Checklist

@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @yangsijia-serena, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue where image resizing for Qwen-VL series models was not correctly utilizing model-specific parameters. The changes refactor the image processing to dynamically fetch resizing configurations from the transformers library's image processor and eliminate redundant custom resizing code. This ensures that image preprocessing is accurately aligned with the model's requirements, improving the overall correctness of multimodal data handling.

Highlights

  • Dynamic Image Resizing Parameters: Image resizing parameters such as MIN_PIXELS, MAX_PIXELS, and IMAGE_FACTOR are now dynamically determined from the model's configuration, specifically leveraging the transformers image processor for Qwen-VL series models.
  • Removal of Redundant Resizing Logic: The custom image resizing logic previously present in qwen_vl.py has been removed, as the responsibility for image resizing will now be handled by the transformers library itself, streamlining the preprocessing pipeline.
  • Parameterized Image Resizing Function: The resize_image_async function in points_v15_chat.py has been updated to accept MIN_PIXELS, MAX_PIXELS, and IMAGE_FACTOR as explicit arguments, enabling model-specific resizing behavior across different processors (see the sketch after this list).
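
As a rough illustration of what the parameterized helper looks like, here is a sketch that mirrors the well-known smart_resize logic from the Qwen-VL utilities; the signature is an assumption for illustration, not the exact code in points_v15_chat.py:

```python
import asyncio
import math

from PIL import Image

def smart_resize(height: int, width: int, factor: int,
                 min_pixels: int, max_pixels: int) -> tuple[int, int]:
    """Round (height, width) to multiples of `factor`, then rescale so
    the total pixel count falls inside [min_pixels, max_pixels]."""
    h_bar = max(factor, round(height / factor) * factor)
    w_bar = max(factor, round(width / factor) * factor)
    if h_bar * w_bar > max_pixels:
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = math.floor(height / beta / factor) * factor
        w_bar = math.floor(width / beta / factor) * factor
    elif h_bar * w_bar < min_pixels:
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar

async def resize_image_async(image: Image.Image, min_pixels: int,
                             max_pixels: int, factor: int) -> Image.Image:
    # Run the blocking PIL resize in a worker thread.
    h, w = smart_resize(image.height, image.width, factor,
                        min_pixels, max_pixels)
    return await asyncio.to_thread(image.resize, (w, h))
```

Passing the bounds explicitly lets each processor supply values read from its own model config rather than sharing module-level constants.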

@gemini-code-assist (bot) left a review comment

Code Review

This pull request refactors the image resizing logic for Qwen-VL series models to use model-specific parameters. The changes involve dynamically calculating IMAGE_FACTOR, MIN_PIXELS, and MAX_PIXELS in QwenVLImageProcessor and updating the dependent code. The overall approach is sound. However, I've identified a potential issue in the initialization logic where fallback values use a hardcoded number, which could lead to inconsistencies. I've provided a suggestion to reorder the initialization to resolve this. The other changes, such as updating the call in points_v15_chat.py and removing the now-redundant resizing logic in qwen_vl.py, are well-aligned with the PR's objectives.
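
A sketch of the reordering the review points at: derive IMAGE_FACTOR first so that any pixel-budget fallbacks can be written in terms of it instead of as unrelated magic numbers. The variable name hf_image_processor and the fallback constants are illustrative, not the repository's actual defaults:

```python
# Hypothetical initialization order: factor first, budgets second,
# so the fallbacks stay self-consistent with a non-default patch size.
patch_size = getattr(hf_image_processor, "patch_size", 14)   # fallback is illustrative
merge_size = getattr(hf_image_processor, "merge_size", 2)    # fallback is illustrative
IMAGE_FACTOR = patch_size * merge_size

# Express the pixel-budget fallbacks in terms of IMAGE_FACTOR rather
# than hard-coded numbers, so they cannot drift out of sync.
MIN_PIXELS = getattr(hf_image_processor, "min_pixels", None) or 4 * IMAGE_FACTOR**2
MAX_PIXELS = getattr(hf_image_processor, "max_pixels", None) or 16384 * IMAGE_FACTOR**2
```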

@github-actions (bot) added the Multi-modal label on Nov 6, 2025
@yangsijia-serena changed the title from "fix: use model-specific params to resize images for qwen-vl series models" to "fix: resize images logic of qwen-vl series models" on Nov 6, 2025
@yhyang201 (Collaborator) commented Nov 10, 2025

Before:
[screenshot: qwen2.5vl results before the fix]
After:
[screenshot: qwen2.5vl results after the fix]

It doesn’t seem to affect qwen2.5vl.


Labels

Multi-modal, run-ci


Development

Successfully merging this pull request may close these issues.

[Bug] Qwen-VL resize outside image-processor will lead gap for VIT and total model's output
