
[Model] Refactor Step3-VL processor to HF style #37579

Merged
DarkLight1337 merged 7 commits into vllm-project:main from DarkLight1337:step3-vl-processor
Mar 20, 2026

Conversation

@DarkLight1337
Member

@DarkLight1337 DarkLight1337 commented Mar 19, 2026

Purpose

  • Make the Step3-VL processor hold an image_processor so that it fits HF call semantics.
  • Make Step3-VL more efficient by avoiding unnecessary text/token construction and string concatenation.
  • Fix a typo in the InternVL and Kimi-K2.5 processors.
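
To illustrate the first two points, here is a minimal sketch of an HF-style processor that owns an image processor and builds token IDs directly instead of expanding placeholder strings. It is not the code from this PR: `ToyImageProcessor`, `ToyTokenizer`, `ToyProcessor`, the `<image>` placeholder, the patch size, and the token-count formula are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyImageProcessor:
    """Hypothetical image processor; images are (width, height) tuples."""
    patch_size: int = 14  # assumed value, for illustration only

    def num_tokens(self, width: int, height: int) -> int:
        # Ceil-divide each side by the patch size (assumed formula).
        cols = -(-width // self.patch_size)
        rows = -(-height // self.patch_size)
        return cols * rows

    def __call__(self, images):
        return {"num_image_tokens": [self.num_tokens(w, h) for (w, h) in images]}


@dataclass
class ToyTokenizer:
    """Hypothetical whitespace tokenizer that assigns IDs on first sight."""
    vocab: dict = field(default_factory=dict)

    def encode(self, text: str) -> list:
        return [self.vocab.setdefault(tok, len(self.vocab) + 1)
                for tok in text.split()]


class ToyProcessor:
    """Composes an image processor and a tokenizer, HF style."""

    def __init__(self, image_processor, tokenizer,
                 image_token="<image>", image_token_id=0):
        self.image_processor = image_processor
        self.tokenizer = tokenizer
        self.image_token = image_token
        self.image_token_id = image_token_id

    def __call__(self, text: str, images=()):
        images = list(images)
        if text.count(self.image_token) != len(images):
            raise ValueError("placeholder count must match image count")
        image_out = self.image_processor(images)
        counts = iter(image_out["num_image_tokens"])
        input_ids = []
        for i, segment in enumerate(text.split(self.image_token)):
            if i > 0:
                # Emit placeholder IDs directly; no repeated-string
                # expansion followed by re-tokenization.
                input_ids.extend([self.image_token_id] * next(counts))
            input_ids.extend(self.tokenizer.encode(segment))
        return {"input_ids": input_ids, **image_out}


processor = ToyProcessor(ToyImageProcessor(), ToyTokenizer())
out = processor("describe <image> please", images=[(28, 14)])
```

In this sketch the caller passes text and images to a single `__call__`, and the image placeholder is expanded as a list of IDs rather than by concatenating token strings and tokenizing the result.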

Test Plan

Checked using the example script

Test Result



Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
@DarkLight1337 DarkLight1337 added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Mar 19, 2026
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
@DarkLight1337 DarkLight1337 changed the title from "[Model] Refactor Step3-VL to HF style" to "[Model] Refactor Step3-VL processor to HF style" Mar 19, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the Step3-VL processor to align with the Hugging Face style by separating the image processing logic into its own class. While the refactoring is a good step towards better code organization, I've identified a critical bug in the new processor's __call__ method that would cause a runtime error. Additionally, there's a high-severity functional regression due to the removal of a method, which results in less accurate image token counting. I have provided detailed comments and code suggestions to address both of these issues.

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
@DarkLight1337
Member Author

Ready for review

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) March 20, 2026 05:28
@DarkLight1337 DarkLight1337 merged commit 30108fc into vllm-project:main Mar 20, 2026
56 checks passed
@DarkLight1337 DarkLight1337 deleted the step3-vl-processor branch March 20, 2026 06:05
chooper26 pushed a commit to intellistream/vllm-hust that referenced this pull request Mar 21, 2026
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Labels

ready (ONLY add when PR is ready to merge/full CI is needed)


2 participants