[2/3] Refactor InternVL-based processors #37324
Conversation
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Code Review
This pull request introduces a significant and well-executed refactoring of the InternVL-based processors. By centralizing common logic into a new InternVLProcessor and its related helper classes, the changes greatly improve code maintainability and reduce duplication. The new structure is more modular and easier to follow. I've identified one minor issue where a method is called redundantly, which I've commented on. Overall, this is a high-quality contribution.
/gemini review
Code Review
This pull request refactors the InternVL-based processors by splitting the logic into separate *ImageProcessor and *VideoProcessor classes. This is a good architectural improvement that centralizes common logic and removes redundant processor definitions for models like Eagle2.5 and SkyworkR1V. The changes are extensive but consistent across multiple model files. I've found one critical bug in the new InternVLProcessor that could lead to a runtime error.
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
/gemini review
Code Review
This pull request refactors the processor logic for several InternVL-based models, moving from model-specific processors to a more generic and reusable InternVLProcessor framework. This is a significant improvement for code maintainability and consistency. The changes are well-executed across multiple models. I've found one critical bug in the NVLMProcessor logic that could lead to an incorrect number of image tokens being generated.
    if video_token not in self.get_tokenizer().get_vocab():
        return None

    return video_token
Is this still used? I think we have unified all video tokens to use video_token = "<video>" in new processor?
Actually this should be effectively ctx_video_token (the token after replacement), let me rename it. It is not to be confused with the <video> placeholder (before replacement).
    while "<placeholder>" in new_prompt:
        replace_str = replace_strings.pop(0)
        new_prompt = new_prompt.replace("<placeholder>", replace_str, 1)
Since image_token and video_token are both different from ctx_image_token, I think we can directly replace the image/video token instead of using "<placeholder>" as an intermediary.
Nemotron VL uses <image> as ctx_image_token so we need to use a different one.
I can help check NVLM_D and Skywork tonight.
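The point of the discussion above can be demonstrated in isolation. The sketch below (a hypothetical helper, not the exact vLLM implementation) shows the neutral-intermediary scheme: because some models such as Nemotron VL use "<image>" itself as the context image token, replacing "<image>" directly could re-match text that was just substituted in, whereas a "<placeholder>" marker that never appears in any replacement string is always safe.

```python
# Sketch of the replacement scheme under discussion (illustrative only).
# "<placeholder>" is chosen so it cannot collide with any model's
# ctx_image_token / ctx_video_token, unlike "<image>" or "<video>".
def fill_placeholders(prompt, replace_strings):
    new_prompt = prompt
    queue = list(replace_strings)  # copy so the caller's list is untouched
    while "<placeholder>" in new_prompt:
        # consume one replacement per placeholder, left to right
        new_prompt = new_prompt.replace("<placeholder>", queue.pop(0), 1)
    return new_prompt
```

For example, `fill_placeholders("a <placeholder> b <placeholder>", ["X", "Y"])` yields `"a X b Y"`.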
Isotr0py
left a comment
Both NVLM_D and Skywork work, LGTM.
$ python examples/offline_inference/vision_language.py -m NVLM_D
INFO 03-18 14:06:53 [utils.py:233] non-default args: {'trust_remote_code': True, 'max_model_len': 4096, 'tensor_parallel_size': 4, 'limit_mm_per_prompt': {'image': 1, 'video': 0, 'audio': 0, 'vision_chunk': 0}, 'model': 'nvidia/NVLM-D-72B'}
INFO 03-18 14:06:54 [model.py:533] Resolved architecture: NVLM_D
INFO 03-18 14:06:54 [model.py:1582] Using max model len 4096
INFO 03-18 14:06:54 [scheduler.py:231] Chunked prefill is enabled with max_num_batched_tokens=16384.
INFO 03-18 14:06:54 [vllm.py:754] Asynchronous scheduling is enabled.
(EngineCore pid=2436208) INFO 03-18 14:06:58 [core.py:103] Initializing a V1 LLM engine (v0.17.2rc1.dev60+g17c47fb86) with config: model='nvidia/NVLM-D-72B', speculative_config=None, tokenizer='nvidia/NVLM-D-72B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=4096, download_dir=None, load_format=auto, tensor_parallel_size=4, pipeline_parallel_size=1, data_parallel_size=1, decode_context_parallel_size=1, dcp_comm_backend=ag_rs, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=nvidia/NVLM-D-72B, enable_prefix_caching=True, enable_chunked_prefill=True, pooler_config=None, compilation_config={'mode': <CompilationMode.VLLM_COMPILE: 3>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::olmo_hybrid_gdn_full_forward', 'vllm::kda_attention', 'vllm::sparse_attn_indexer', 'vllm::rocm_aiter_sparse_attn_indexer', 
'vllm::unified_kv_cache_update', 'vllm::unified_mla_kv_cache_update'], 'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_endpoints': [16384], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': False}, 'local_cache_dir': None, 'fast_moe_cold_start': True, 'static_all_moe_layers': []}
...
Rendering prompts: 100%|██████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 15.86it/s]
Processed prompts: 100%|█████████████████| 4/4 [00:01<00:00, 2.06it/s, est. speed input: 3841.81 toks/s, output: 131.62 toks/s]
--------------------------------------------------
The image portrays a cherry blossom tree in full bloom, with numerous pink flowers adorning its branches. The tree is positioned in front of a tall, white tower, which serves as a backdrop. The sky is clear and blue, providing a vibrant contrast to the pink blossoms and the tower. The cherry blossom tree is
--------------------------------------------------
The image features a tall, white tower with a distinctive design, surrounded by cherry blossom trees in full bloom. The cherry blossoms, with their pink flowers, create a beautiful contrast against the blue sky. The tower, known as the Tokyo Tower, is a famous landmark in Japan, often associated with the cherry blossom season
--------------------------------------------------
The image depicts a tall, white tower with a lattice structure, partially obscured by cherry blossom trees in full bloom. The cherry blossoms are pink and cover most of the frame, with the blue sky serving as the background. The tower, which is the focal point of the image, is framed by the cherry blossom branches
--------------------------------------------------
The image depicts a cherry blossom tree in full bloom, with numerous pink flowers adorning its branches. The blossoms are in various stages of blooming, creating a vibrant and picturesque scene. The tree's branches extend across the frame, with some reaching towards the top of the image. The sky in the background is a
--------------------------------------------------
$ python examples/offline_inference/vision_language.py -m skywork_chat
INFO 03-18 14:14:45 [model.py:533] Resolved architecture: SkyworkR1VChatModel
INFO 03-18 14:14:45 [model.py:1582] Using max model len 4096
INFO 03-18 14:14:45 [scheduler.py:231] Chunked prefill is enabled with max_num_batched_tokens=16384.
INFO 03-18 14:14:45 [vllm.py:754] Asynchronous scheduling is enabled.
generation_config.json: 100%|██████████████████████████████████████████████████████████████████| 181/181 [00:00<00:00, 2.59MB/s]
(EngineCore pid=2441252) INFO 03-18 14:14:47 [core.py:103] Initializing a V1 LLM engine (v0.17.2rc1.dev60+g17c47fb86) with config: model='Skywork/Skywork-R1V-38B', speculative_config=None, tokenizer='Skywork/Skywork-R1V-38B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=4096, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, decode_context_parallel_size=1, dcp_comm_backend=ag_rs, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=Skywork/Skywork-R1V-38B, enable_prefix_caching=True, enable_chunked_prefill=True, pooler_config=None, compilation_config={'mode': <CompilationMode.VLLM_COMPILE: 3>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::olmo_hybrid_gdn_full_forward', 'vllm::kda_attention', 'vllm::sparse_attn_indexer', 
'vllm::rocm_aiter_sparse_attn_indexer', 'vllm::unified_kv_cache_update', 'vllm::unified_mla_kv_cache_update'], 'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_endpoints': [16384], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': False}, 'local_cache_dir': None, 'fast_moe_cold_start': True, 'static_all_moe_layers': []}
...
--------------------------------------------------
Alright, let's take a look at this image. It seems to be a beautiful scene with cherry blossoms in the foreground and a tall tower in the background. The cherry blossoms are in full bloom, with their delicate pink flowers creating a soft, ethereal atmosphere. The tower appears to be a well-known landmark
--------------------------------------------------
Alright, so I'm looking at this image, and I want to figure out what's going on here. Let me start by breaking it down. The image is a close-up of cherry blossoms, which are in full bloom. The flowers are pink and delicate, and they're spread out across the branches of a
--------------------------------------------------
Alright, so I'm looking at this image, and I want to figure out what it's showing. Let me start by breaking it down. The first thing I notice is the abundance of pink flowers. They seem to be cherry blossoms, which are pretty common in springtime, especially in places like Japan. The
--------------------------------------------------
Alright, so I'm looking at this image, and I need to figure out what's going on. Let me start by breaking it down. The image is a close-up of a cherry blossom tree with lots of pink flowers. The branches are in the foreground, and through them, I can see a tall, cylindrical
--------------------------------------------------
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>
Signed-off-by: Vinay Damodaran <vrdn@hey.com>
Signed-off-by: EricccYang <yangyang4991@gmail.com>
Purpose
Follow-up to #37289
- Split the processors into *ImageProcessor and *VideoProcessor.
- Initialize processors via self.ctx.init_processor, in order to avoid extra kwargs causing errors.

Note: Nemotron Parse and Nano Nemotron VL's processor will be handled in a separate PR.
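The "avoid extra kwargs causing errors" point can be illustrated with a small sketch. The helper name and behavior below are assumptions for illustration, not vLLM's actual self.ctx.init_processor: the idea is simply to filter a kwargs dict down to the parameters a processor's constructor accepts before instantiating it, so unrelated keys do not raise a TypeError.

```python
import inspect

# Hypothetical illustration (not the vLLM implementation): construct a
# processor class, silently dropping kwargs its __init__ does not accept.
def init_processor(processor_cls, **kwargs):
    # signature(cls) reflects __init__ with `self` already excluded
    params = inspect.signature(processor_cls).parameters
    accepted = {k: v for k, v in kwargs.items() if k in params}
    return processor_cls(**accepted)


class DemoProcessor:
    """Toy processor with a single accepted kwarg, for demonstration."""

    def __init__(self, size=448):
        self.size = size
```

With this, `init_processor(DemoProcessor, size=336, unknown_kwarg=True)` succeeds and sets `size` to 336, while a direct `DemoProcessor(unknown_kwarg=True)` would raise.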
Test Plan
- python examples/offline_inference/vision_language.py (except for NVLM_D and Skywork, which are too big to load in memory)
- python examples/offline_inference/vision_language_multi_image.py for InternVL

Test Result
Essential Elements of an Effective PR Description Checklist
- Update supported_models.md and examples for a new model.