Optimize nemotron VL image/video preprocessing by milesial · Pull Request #40093 · vllm-project/vllm

milesial · 2026-04-17T04:14:17Z

Purpose

Compile and reorganize image/video preprocessing for Nemotron Nano VL to reduce the CPU time and memory it needs.

Fused resize+normalize+cast under @torch.compile: a CPU kernel for permute → bicubic → /255 → (x-mean)/std → dtype.
dtype conversion is integrated into the fusion to avoid a separate autocast later
contiguous is fused in to avoid a separate H2H (host-to-host) copy later
Skip torch.cat on the single-image / single-video path to avoid a redundant copy
Batched tokenizer call for video frame separators
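
The first four items can be sketched roughly as follows. This is a minimal illustration of the permute → bicubic → /255 → (x-mean)/std → dtype chain, not the code in this PR: the names `resize_normalize_cast` and `stack_chunks`, the argument layout, and the default `bfloat16` output dtype are all assumptions made for the example. The function is left in eager mode so it runs without a compiler toolchain; wrapping it with `torch.compile` is what would let the steps fuse.

```python
import torch
import torch.nn.functional as F


def resize_normalize_cast(
    frames: torch.Tensor,   # (N, H, W, C) uint8 frames (assumed layout)
    size: tuple[int, int],  # target (height, width)
    mean: torch.Tensor,     # per-channel mean, shape (C,)
    std: torch.Tensor,      # per-channel std, shape (C,)
    dtype: torch.dtype = torch.bfloat16,  # assumed default, configurable
) -> torch.Tensor:
    """Hypothetical helper: resize, normalize and cast in one function."""
    # NHWC -> NCHW, then float32 so bicubic interpolation is well defined.
    x = frames.permute(0, 3, 1, 2).float()
    x = F.interpolate(x, size=size, mode="bicubic", align_corners=False)
    x = x / 255.0
    x = (x - mean.view(1, -1, 1, 1)) / std.view(1, -1, 1, 1)
    # Cast and make contiguous here, so no separate cast or copy is needed later.
    return x.to(dtype).contiguous()


def stack_chunks(chunks: list[torch.Tensor]) -> torch.Tensor:
    """Hypothetical helper: skip torch.cat when there is only one chunk."""
    return chunks[0] if len(chunks) == 1 else torch.cat(chunks, dim=0)


# Optional, and the part that needs a working compiler toolchain:
# fused = torch.compile(resize_normalize_cast)
```

With a single image or video, `stack_chunks` returns the input tensor itself rather than a concatenated copy, which is the redundant copy the list above refers to.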

1 video of 512x512x512, H100
Before: apply_hf_processor_ms 898.57 898.63 4.58 905.18
After:  apply_hf_processor_ms 254.21 254.56 3.35 260.79

Signed-off-by: milesial <milesial@users.noreply.github.com>

gemini-code-assist

Code Review

This pull request refactors the preprocessing pipeline for Nano-Nemotron-VL by introducing a unified _bicubic_resize_and_normalize function and optimizing frame separator tokenization using batch encoding. It also adds support for configurable dtypes and reduces unnecessary tensor operations. Review feedback highlights significant concerns regarding the use of @torch.compile on the new preprocessing function, specifically citing the risk of excessive recompilation due to varying image dimensions and the operational overhead of requiring a C++ compiler in production environments.
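
The batch-encoding change mentioned in this summary can be illustrated with a small sketch. Everything here is hypothetical: `CountingTokenizer` is a stand-in that only mimics the common convention of calling a tokenizer on either one string or a list of strings, and the separator text produced by `frame_separators` is invented for the example rather than taken from the model's prompt format.

```python
class CountingTokenizer:
    """Hypothetical stand-in tokenizer that counts how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, text, add_special_tokens: bool = False) -> dict:
        self.calls += 1
        if isinstance(text, str):
            return {"input_ids": [ord(c) for c in text]}
        # A list of strings is encoded in one call.
        return {"input_ids": [[ord(c) for c in t] for t in text]}


def frame_separators(num_frames: int) -> list[str]:
    # Invented separator text, for illustration only.
    return [f"Frame {i + 1}: " for i in range(num_frames)]


def encode_one_by_one(tok, texts: list[str]) -> list[list[int]]:
    # One tokenizer call per frame separator.
    return [tok(t, add_special_tokens=False)["input_ids"] for t in texts]


def encode_batched(tok, texts: list[str]) -> list[list[int]]:
    # A single tokenizer call for all frame separators.
    return tok(texts, add_special_tokens=False)["input_ids"]
```

Both paths produce the same token ids; the batched path pays the per-call overhead once instead of once per frame, which matters for videos with hundreds of frames.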

netanel-haber

LGTM. I ran evals: VoxPopuli (audio+text), InfoVQA_VAL (image+text), and DailyOmni (video+audio+text) are on par before and after.

Optimize nemotron VL image/video preprocessing

10d50aa

Signed-off-by: milesial <milesial@users.noreply.github.com>

gemini-code-assist Bot reviewed Apr 17, 2026

Comment thread vllm/transformers_utils/processors/nano_nemotron_vl.py

Comment thread vllm/transformers_utils/processors/nano_nemotron_vl.py

netanel-haber approved these changes Apr 19, 2026

netanel-haber mentioned this pull request Apr 19, 2026

Optimize nemotron VL image/video preprocessing #40283

Merged

milesial closed this Apr 20, 2026

milesial wants to merge 1 commit into vllm-project:main from milesial:vk/7078-video-preprocess
