Skip to content

Optimize nemotron VL image/video preprocessing #40093

Closed
milesial wants to merge 1 commit into vllm-project:main from milesial:vk/7078-video-preprocess

milesial (Contributor) commented Apr 17, 2026

Purpose

Compile and reorganize image/video preprocessing for nemotron nano VL, reducing the amount of CPU time and memory needed.

  • Fused resize+normalize+cast under @torch.compile: CPU kernel covering permute → bicubic → /255 → (x-mean)/std → dtype.
  • dtype conversion integrated into the fusion to avoid a later separate autocast
  • contiguous() folded into the fusion to avoid a later separate host-to-host (H2H) copy
  • Skip torch.cat on the single-image / single-video path to avoid a redundant copy
  • Batched tokenizer call for video frame separators

1 video of 512x512x512, H100:

Before: apply_hf_processor_ms 898.57 898.63 4.58 905.18
After:  apply_hf_processor_ms 254.21 254.56 3.35 260.79
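
For a rough sense of scale, the ratio between the two runs can be worked out from the first reported value on each line. The PR does not label the four columns, so treating the first one as the headline figure is an assumption.

```python
# First reported apply_hf_processor_ms value from each benchmark line.
# Which statistic this column holds is not stated in the PR (assumption).
before_ms = 898.57
after_ms = 254.21

speedup = before_ms / after_ms
reduction_pct = (1 - after_ms / before_ms) * 100

print(f"speedup: {speedup:.2f}x, time saved: {reduction_pct:.1f}%")
# prints "speedup: 3.53x, time saved: 71.7%"
```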

Signed-off-by: milesial <milesial@users.noreply.github.com>

gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request refactors the preprocessing pipeline for Nano-Nemotron-VL by introducing a unified _bicubic_resize_and_normalize function and optimizing frame separator tokenization using batch encoding. It also adds support for configurable dtypes and reduces unnecessary tensor operations. Review feedback highlights significant concerns regarding the use of @torch.compile on the new preprocessing function, specifically citing the risk of excessive recompilation due to varying image dimensions and the operational overhead of requiring a C++ compiler in production environments.
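
The batch-encoding idea mentioned in the summary can be illustrated with a toy stand-in. `CountingTokenizer` and the `"Frame N: "` separator format are hypothetical; the only behavior borrowed from real Hugging Face tokenizers is that a single call accepts a list of strings and returns one `input_ids` list per string.

```python
class CountingTokenizer:
    """Hypothetical stand-in that mimics the call shape of a HF tokenizer."""

    def __init__(self):
        self.calls = 0

    def __call__(self, text, add_special_tokens=False):
        self.calls += 1
        if isinstance(text, str):
            return {"input_ids": self._encode(text)}
        return {"input_ids": [self._encode(t) for t in text]}

    @staticmethod
    def _encode(text):
        # Toy encoding: one id per character.
        return [ord(ch) for ch in text]


def frame_separators(num_frames):
    # Assumed separator format, purely illustrative.
    return [f"Frame {i + 1}: " for i in range(num_frames)]


def encode_per_frame(tokenizer, texts):
    # One tokenizer call per frame separator.
    return [tokenizer(t, add_special_tokens=False)["input_ids"] for t in texts]


def encode_batched(tokenizer, texts):
    # A single tokenizer call for all frame separators.
    return tokenizer(texts, add_special_tokens=False)["input_ids"]


texts = frame_separators(512)
per_frame_tok, batched_tok = CountingTokenizer(), CountingTokenizer()
same_ids = encode_per_frame(per_frame_tok, texts) == encode_batched(batched_tok, texts)
print(same_ids, per_frame_tok.calls, batched_tok.calls)  # True 512 1
```

Both paths yield identical token ids; the batched path simply pays the per-call overhead once instead of once per frame.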

Comment thread vllm/transformers_utils/processors/nano_nemotron_vl.py
Comment thread vllm/transformers_utils/processors/nano_nemotron_vl.py
netanel-haber (Contributor) left a comment

LGTM. I ran evals: VoxPopuli (audio+text), InfoVQA_VAL (image+text), and DailyOmni (video+audio+text) are on par before and after.
